Tools I: Packaging and virtual-environments

Overview

Teaching: 4 min
Exercises: 6 min
Questions
  • How do I use a package manager to install third-party tools and libraries?

Objectives
  • Use conda to install a reproducible environment

Python packages

Python virtual environments

Package managers

Package managers help you install packages. Some also help you create and manage virtual environments. The better-known Python package managers include conda, pip, and poetry.

                              conda      pip    poetry
audience                      research   all    developers
manage python packages
manage non-python packages
choose python version
manage virtual envs
easy interface
fast

Rules for choosing a package manager

  1. Choose one
  2. Stick with it

We chose conda because it is the de facto standard in science, and because it can natively install libraries such as fftw, vtk, or even Python, R, and Julia themselves.

It is also the de facto package manager on Imperial’s HPC cluster systems.

Example

Installing and using an environment

  1. If you haven’t already, see the setup guide for instructions on how to install conda, Visual Studio Code and Git.

  2. Create a new folder to use for this course. Avoid giving it a name that includes spaces. If you’re using an ICT-managed PC, the folder must be located in your user area on the C: drive, i.e. C:\Users\UserName. Note that files placed here are not persistent, so you must remember to take a copy before logging out. Start Visual Studio Code and select “Open folder…” from the welcome screen. Navigate to the folder you just created and press “Select Folder”.

  3. Press “New file” and copy in the text below. Save the file as environment.yml; the location should default to your newly created folder.

    name: course
    dependencies:
      - python>=3.6
      - flake8
      - pylint
      - black
      - mypy
      - requests
      - pip
      - pip:
        - -e git+https://github.com/ImperialCollegeLondon/R2T2.git#egg=r2t2
    
  4. Create a new virtual environment using conda:

    Windows users will want to start the app Anaconda Prompt from the Start Menu.

    Linux and Mac users should use a terminal app of their choice. You may see a warning with instructions. Please follow the instructions.

    conda env create -f [path to environment.yml]
    

    You can obtain [path to environment.yml] by right clicking the file tab near the top of Visual Studio Code and selecting “Copy Path” from the drop-down menu. Right click on the window for your command line interface to paste the path.

  5. We can now activate the environment:

    conda activate course
    
  6. Check that Python knows about the installed packages. Start a Python interpreter with the command python, then run:

    import requests
    

    We expect this to run and not fail. You can see the location of the installed package with:

    requests.__file__
    
    'C:\\ProgramData\\Anaconda3\\envs\\course\\lib\\site-packages\\requests\\__init__.py'
    

    The file path you see will vary but note that it is within a directory called course that contains the files for the virtual environment you have created. Exit the Python interpreter:

    exit()
    
  7. Finally, feel free to remove requests from environment.yml, then run

    conda env update -f [path to environment.yml]
    

    and see whether the package has been updated or removed.
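Steps 6 and 7 above can also be checked from a small script rather than by typing into the interpreter. The following is a minimal sketch, assuming it is run with the Python of the activated course environment. The helper name package_location is made up for this illustration; importlib.util.find_spec is part of the standard library.

```python
import importlib.util
import sys


def package_location(name):
    """Return the file a top-level package would be imported from, or None.

    `package_location` is a hypothetical helper written for this example.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # sys.prefix is the root directory of the environment running this script.
    print("environment:", sys.prefix)
    # The package names below are taken from environment.yml.
    for name in ("requests", "flake8", "black"):
        location = package_location(name)
        print(name, "->", location if location else "not installed")
```

Running it before and after step 7 shows whether requests is still importable from the course environment.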

Selecting an environment in Visual Studio Code

If you haven’t already, see the setup guide for instructions on how to install Visual Studio (VS) Code.

On Linux and Mac, one option is to first activate the conda environment, and then start VS Code from the same terminal:

    conda activate name_of_environment
    code .

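The simplest option on all platforms is to set the interpreter via the Command Palette: open it with Ctrl+Shift+P (Cmd+Shift+P on Mac), run “Python: Select Interpreter”, and pick the course environment from the list.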

If you already have a Python file open then it’s also possible to set the interpreter using the toolbar at the bottom of the window.
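Whichever route you take, it can be worth confirming which interpreter VS Code actually runs. The sketch below is one way to do that, assuming you save it as a Python file in the course folder and run it from VS Code. The function name interpreter_summary is hypothetical; CONDA_DEFAULT_ENV is an environment variable set by conda activate and may be missing if Python was started without activating the environment in a shell.

```python
import os
import sys


def interpreter_summary():
    """Collect a few facts that identify the running interpreter."""
    return {
        # Full path of the Python executable running this file.
        "executable": sys.executable,
        # Root directory of the environment that executable belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; None if the variable is not defined.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If the selection worked, the executable and prefix paths should sit inside a directory named after the environment, for example course.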

Installing an editable package

Editable packages are packages that you can modify during development and have Python recognize your changes the next time the package is imported.

Look at the last few lines of environment.yml. They install r2t2 in editable mode. The package is automatically downloaded from the web and installed next to environment.yml in the subfolder src/r2t2.

Try adding print("Hello!") to src/r2t2/r2t2/__init__.py.

Then start Python and run:

    import r2t2

Your greeting should appear: python did indeed take the modified file into account.
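The behaviour can be imitated without touching r2t2. The sketch below is only an illustration under stated assumptions: it builds a throwaway package called toy_pkg (a hypothetical name) in a temporary directory and puts that directory on sys.path, which roughly mirrors the effect of an editable install, namely that Python imports straight from the source folder. The mechanism pip actually uses differs in detail.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_editable_behaviour():
    """Return the greetings seen before and after editing the source file."""
    seen = []
    with tempfile.TemporaryDirectory() as src:
        # Build a tiny package; `toy_pkg` is a made-up name for this demo.
        package_dir = Path(src) / "toy_pkg"
        package_dir.mkdir()
        init_file = package_dir / "__init__.py"
        init_file.write_text('GREETING = "before edit"\n')

        # Make the source directory importable, as an editable install would.
        sys.path.insert(0, src)
        try:
            importlib.invalidate_caches()
            toy_pkg = importlib.import_module("toy_pkg")
            seen.append(toy_pkg.GREETING)

            # Edit the source in place, then re-import it. A freshly started
            # interpreter would pick up the change without the reload.
            init_file.write_text('GREETING = "Hello!"\n')
            toy_pkg = importlib.reload(toy_pkg)
            seen.append(toy_pkg.GREETING)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(src)
            sys.modules.pop("toy_pkg", None)
    return seen


if __name__ == "__main__":
    print(demo_editable_behaviour())
```

No reinstall happens between the two reads: the edited file is the installed file.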

Note that r2t2 was set up as a Python package with a standard directory structure and a setup.py file. It’s well worth investing 10 minutes in turning a Python script into a package, if only to make it shareable and installable in editable mode.

Choosing the installation directory for R2T2

It would be nice if we could choose the directory where the editable package goes, i.e. rather than have r2t2 install in src/r2t2 we might want to install it directly in an r2t2 subfolder.

Nominally, pip does allow us to do that with --src.

However, it is not (yet) possible to tell conda to tell pip to use a given option, as highlighted in this issue. But that’s where the fun begins: because conda is an open-source effort, you could pitch in and try to add a feature or a fix. There is a lot to learn just from lurking around issues of open-source projects, whether it is about the project itself, or even about language design. There is even more to learn from participating.
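In the meantime, one workaround is to call pip yourself from within the activated environment. The sketch below only builds and prints such a command; the function name editable_install_command is hypothetical, and the URL is the one from environment.yml. It relies on pip's --src option, which names the directory that editable projects are checked out into, so pip should place the checkout in a subfolder of that directory. Bear in mind that a package installed this way is no longer described by environment.yml.

```python
import sys

# URL taken from environment.yml.
R2T2_URL = "git+https://github.com/ImperialCollegeLondon/R2T2.git#egg=r2t2"


def editable_install_command(src_dir, requirement=R2T2_URL):
    """Build a pip command that checks editable projects out under src_dir."""
    return [
        sys.executable, "-m", "pip", "install",
        "--src", str(src_dir),  # directory to check out editable projects into
        "-e", requirement,      # install the requirement in editable mode
    ]


if __name__ == "__main__":
    command = editable_install_command(".")
    print(" ".join(command))
    # To actually run the install in the active environment, pass `command`
    # to subprocess.run(command, check=True) after importing subprocess.
```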

Key Points

  • There are tens of thousands of Python packages

  • The choice is between reinventing the square wheel or reusing existing work

  • The state of an environment can be stored in a file

  • This stored environment is then easy to audit and recreate