Course overview
|
|
Tools I: Packaging and virtual environments
|
There are tens of thousands of Python packages
The choice is between reinventing a square wheel and reusing existing work
The state of an environment can be stored in a file
This stored environment is then easy to audit and recreate
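As a rough illustration of storing the state of an environment, the sketch below lists every installed distribution as a pinned `name==version` line, similar in spirit to what `pip freeze` prints. The function name `freeze_environment` is hypothetical, and real projects would normally rely on their packaging tool rather than on a hand-written script.

```python
from importlib import metadata


def freeze_environment():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to keep an auditable snapshot
    print("\n".join(freeze_environment()))
```

Saving the output to a file gives a record that can be reviewed and later used to rebuild a matching environment.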
|
Tools II: Code Formatters
|
Code formatting means how the code is typeset
It influences how easily the code is read
It has no impact on how the code runs
Almost all editors and IDEs have some means to set up an automatic formatter
The 5 minutes spent setting up a formatter are repaid many times over the lifetime of a project, i.e. the cost is close to nothing
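One way to check that formatting has no impact on how the code runs is to compare the syntax trees of two differently typeset versions of the same function. The snippet below is a minimal sketch; the `area` function and the helper `same_program` are made up for the illustration.

```python
import ast

# The same function, typeset carelessly and typeset consistently
messy = "def area(w,h) :\n    return   w*h\n"
tidy = "def area(w, h):\n    return w * h\n"


def same_program(source_a, source_b):
    """True when both sources parse to the same abstract syntax tree."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_program(messy, tidy))
```

Whitespace differences disappear once the code is parsed, which is why a formatter can rewrite the layout safely.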
|
Tools III: Linters
|
Linting is about discovering errors and code-smells before running the code
It shortcuts the “edit-run-debug and repeat” workflow
Almost all editors and IDEs have some means to set up automatic linting
The 5 minutes spent setting up a linter are repaid many times over the lifetime of a project, i.e. the cost is close to nothing
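To give a feel for how problems can be found without running the code, here is a toy static check that reports unused imports. It is only a sketch of the idea and not how any particular linter is implemented; `SOURCE` and `unused_imports` are hypothetical names.

```python
import ast

# Code under inspection: it is parsed, never executed
SOURCE = '''
import os
import math

def circle_area(radius):
    return math.pi * radius ** 2
'''


def unused_imports(source):
    """Return names that are imported but never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # 'import a.b' binds the name 'a'; 'import a as b' binds 'b'
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


print(unused_imports(SOURCE))
```

Real linters apply many such checks at once and report them in the editor as you type.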
|
Data Structures
|
Data structures represent information in the same way that algorithms represent logic and workflows
Using the right data structure can vastly simplify code
Basic data structures include numbers, strings, tuples, lists, dictionaries and sets
Advanced data structures include NumPy arrays and pandas DataFrames
Classes are custom data structures
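The points above can be sketched with a small, made-up example: a class acts as a custom data structure for one reading, while a dictionary and a set do the grouping with very little code. The `Measurement` class and its fields are assumptions chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """A hypothetical record: one temperature reading at one station."""
    station: str
    celsius: float

    @property
    def kelvin(self):
        return self.celsius + 273.15


readings = [Measurement("A", 20.0), Measurement("B", 25.0), Measurement("A", 22.0)]

# A dictionary groups the readings by station
by_station = {}
for reading in readings:
    by_station.setdefault(reading.station, []).append(reading.celsius)

# A set holds the distinct station names
stations = {reading.station for reading in readings}
```

Choosing a dictionary for grouping and a set for uniqueness removes the need for any manual searching or duplicate checks.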
|
Structuring code
|
Code is read more often than written
The structure of the code should follow a well-written explanation of its algorithm
Separate levels of abstraction (or detail) in a problem must be reflected in the structure of the code, e.g. with separate functions
Separable concerns (e.g. reading an input file to create X, creating X, computing stuff with X, saving results to file) must be reflected in the structure of the code
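A minimal sketch of separable concerns, assuming a hypothetical task of reading numbers from a text file, computing a summary and saving it as JSON. All function names and the file format are illustrative choices.

```python
import json
import tempfile
from pathlib import Path


def read_values(path):
    """Concern 1: read the input file and build the data (a list of floats)."""
    return [float(item) for item in Path(path).read_text().split()]


def compute_statistics(values):
    """Concern 2: pure computation, no file access."""
    return {"count": len(values), "mean": sum(values) / len(values)}


def save_results(results, path):
    """Concern 3: write the results to disk."""
    Path(path).write_text(json.dumps(results))


def main(input_path, output_path):
    """The top level reads like an explanation of the algorithm."""
    values = read_values(input_path)
    results = compute_statistics(values)
    save_results(results, output_path)
    return results


if __name__ == "__main__":
    # Small demonstration in a throwaway directory
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / "input.txt"
        source.write_text("1 2 3\n")
        print(main(source, Path(folder) / "output.json"))
```

Because `compute_statistics` never touches a file, it can be reused and tested on its own.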
|
Testing Overview
|
Testing is the standard approach to software quality assurance
Testing helps to ensure that code performs its intended function: well-tested code is likely to be more reliable, correct and malleable
Good tests thoroughly exercise critical code
Code without any tests should arouse suspicion, but it is entirely possible to write a comprehensive but practically worthless test suite
Testing can contribute to performance, security and long-term stability as the size of the codebase and its network of contributors grows
Testing can ensure that software has been installed correctly, is portable to new platforms, and is compatible with new versions of its dependencies
In the context of research software, testing can be used to validate code i.e. ensure that it faithfully implements scientific theory
Unit (e.g. a function), functional (e.g. a library) and regression (e.g. a bug) tests are three commonly used types of tests
Test coverage can provide a coarse- or fine-grained metric of comprehensiveness, which often provides a signal of code quality
Automated testing is another such signal: it lowers friction; ensures that breakage is identified sooner and isn’t released; and implies that machine-readable instructions exist for building the code and running the tests
Testing ultimately contributes to sustainability i.e. that software is (and remains) fit for purpose as its functionality and/or contributor-base grows, and its dependencies and/or runtime environments change
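As a small illustration of two of the test types named above, the sketch below shows a unit test and a regression test for one function. The function `parse_celsius` and the past bug it refers to are hypothetical; the tests use plain `assert` statements, which is the style pytest collects.

```python
def parse_celsius(text):
    """Parse strings such as '21.5 C' into a number of degrees Celsius."""
    return float(text.strip().rstrip("C"))


def test_parse_celsius_known_value():
    # Unit test: one function, one known answer
    assert parse_celsius("21.5 C") == 21.5


def test_parse_celsius_surrounding_whitespace():
    # Regression test: pins a (hypothetical) earlier bug in which
    # whitespace around the value made parsing fail
    assert parse_celsius("  -3 C \n") == -3.0
```

A regression test like the second one stays in the suite so that the same bug cannot quietly return.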
|
Writing unit tests
|
Testing is not only standard practice in mainstream software engineering, it also provides distinct benefits for any non-trivial research software
pytest is a powerful testing framework, with more functionality than Python’s built-in features while still catering for simple use cases
Testing with VS Code is straightforward and encourages good habits (writing tests as you code, and simplifying test-driven development)
It is important to have a set-up you can use for every project - so that it becomes as routine in your workflow as version control itself
pytest has a myriad of extensions worth mentioning, such as Jupyter, benchmark, Hypothesis, etc.
|
Unit Testing Challenge
|
Writing unit tests can reveal issues with code that otherwise appears to run correctly
Adding unit tests can improve software structure: isolating logically distinct code for testing often involves untangling complex structures
pytest can be used to add simple tests for Python code, but can also be leveraged for more complex uses like parametrising tests and adding tests to docstrings
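The sketch below shows tests written inside a docstring, using the standard-library `doctest` module, next to a table of cases checked in a loop. The function and the cases are made up for the illustration. With pytest, docstring examples can be collected with the `--doctest-modules` option, and the loop could instead be expressed as a parametrised test.

```python
import doctest


def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from Fahrenheit to Celsius.

    >>> fahrenheit_to_celsius(32)
    0.0
    >>> fahrenheit_to_celsius(212)
    100.0
    """
    return (fahrenheit - 32) * 5 / 9


# Input and expected output pairs; each pair is one test case
CASES = [(32, 0.0), (212, 100.0), (-40, -40.0)]


def test_fahrenheit_to_celsius_cases():
    for fahrenheit, expected in CASES:
        assert fahrenheit_to_celsius(fahrenheit) == expected


if __name__ == "__main__":
    # Run the examples embedded in the docstrings of this module
    doctest.testmod()
```

Docstring tests double as documentation, so the examples a reader sees are the ones that are checked.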
|
Advanced Topic: Design Patterns
|
Many coders have come before
Transferable solutions to common problems have been identified
It is easier to apply a known pattern than to reinvent it, but first you have to spend some time learning about patterns.
Iterators and generators are convenient patterns to separate looping from computation and to avoid copy-pasting.
Dependency injection is a pattern to create modular algorithms.
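The last two patterns can be sketched together. In this hypothetical example, a generator owns the looping logic, and the computation applied to each chunk is injected by the caller; `read_chunks`, `summarise` and `reducer` are illustrative names.

```python
def read_chunks(values, size):
    """Generator: owns the looping logic and yields one chunk at a time."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def summarise(values, size, reducer):
    """Dependency injection: the caller supplies the computation (reducer)."""
    return [reducer(chunk) for chunk in read_chunks(values, size)]


data = [1, 2, 3, 4, 5, 6, 7]
sums = summarise(data, 3, sum)    # chunks: [1, 2, 3], [4, 5, 6], [7]
maxima = summarise(data, 3, max)
```

The chunking loop is written once, and swapping `sum` for `max` changes the algorithm without touching it.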
|