Essential Software Engineering for Researchers: Summary of Commands

Key Points

Tools I: Packaging and virtual-environments
  • There are tens of thousands of Python packages

  • The choice is between reinventing the square wheel or reusing existing work

  • The state of an environment can be stored in a file

  • This stored environment is then easy to audit and recreate

Data Structures
  • Data structures are the representation of information the same way algorithms represent logic and workflows

  • Using the right data structure can vastly simplify code

  • Basic data structures include numbers, strings, tuples, lists, dictionaries and sets.

  • Advanced data structures include numpy array and panda dataframes

  • Classes are custom data structures

Tools II: Code Formatters
  • Code formatting means how the code is typeset

  • It influences how easily the code is read

  • It has no impact on how the code runs

  • Almost all editors and IDEs have some means to set up an automatic formatter

  • 5 minutes to set up the formatter is redeemed across the time of the project i.e. the cost is close to nothing

Structuring code
  • Code is read more often than written

  • The structure of the code should follow a well written explanation of its algorithm

  • Separate level of abstractions (or details) in a problem must be reflected in the structure of the code, e.g. with separate functions

  • Separable concerns (e.g. reading an input file to create X, creating X, computing stuff with X, saving results to file) must be reflected in the structure of the code

Tools III: Linters
  • Linting is about discovering errors and code-smells before running the code

  • It shortcuts the “edit-run-debug and repeat” workflow

  • Almost all editors and IDEs have some means to setup automatic linting

  • 5 minutes to setup a linter is redeemed across the time of the project i.e. the cost is close to nothing

Design Patterns
  • Many coders have come before

  • Transferable solutions to common problems have been identified

  • It is easier to apply a known pattern than to reinvent it, but first you have to spend some time learning about patterns.

  • Iterators and generators are convenient patterns to separate loops from compute and to avoid copy-pasting.

  • Dependency injections is a pattern to create modular algorithms

Testing Overview
  • Testing is the standard approach to software quality assurance

  • Testing helps to ensure that code performs its intended function: well-tested code is likely to be more reliable, correct and malleable

  • Good tests thoroughly exercise critical code

  • Code without any tests should arouse suspicion, but it is entirely possible to write a comprehensive but practically worthless test suite

  • Testing can contribute to performance, security and long-term stability as the size of the codebase and its network of contributors grows

  • Testing can ensure that software has been installed correctly, is portable to new platforms, and is compatible with new versions of its dependencies

  • In the context of research software, testing can be used to validate code i.e. ensure that it faithfully implements scientific theory

  • Unit (e.g. a function); Functional (e.g. a library); and Regression, (e.g. a bug) are three commonly used types of tests

  • Test coverage can provide a coarse- or fine-grained metric of comprehensiveness, which often provides a signal of code quality

  • Automated testing is another such signal: it lowers friction; ensures that breakage is identified sooner and isn’t released; and implies that machine-readable instructions exist for building and code and running the tests

  • Testing ultimately contributes to sustainability i.e. that software is (and remains) fit for purpose as its functionality and/or contributor-base grows, and its dependencies and/or runtime environments change

Writing unit tests
  • Testing is not only standard practice in mainstream software engineering, it also provides distinct benefits for any non-trivial research software

  • pytest is a powerful testing framework, with more functionality than Python’s built-in features while still catering for simple use cases

  • Testing with VS Code is straightforward and encourages good habits (writing tests as you code, and simplifying test-driven development)

  • It is important to have a set-up you can use for every project - so that it becomes as routine in your workflow as version control itself

  • pytest has a myriad of extensions that are worthy of mention such as Jupyter, benchmark, Hypothesis etc

  • Adding unit tests can verify the correctness of software and improve its structure: isolating logical distinct code for testing often involves untangling complex structures

Summary of Commands

Action Command
Create a new repository git init

Glossary

git
A widely used implementation of a Version Control System