There are two ways to execute Jupyter notebooks on the RCS compute service.
Firstly, there’s the familiar web interface (College network or VPN only). Here you can choose an instance with multiple CPUs, large memory and/or a GPU before uploading or creating a notebook and then running it interactively. Jobs are limited to 8 hours in duration, but during this time you can close your browser and return at any point.
Secondly, you can run notebooks non-interactively on any of the usual job queues, and then use the web interface to view the results. To do this you’ll need to create a conda environment containing all the dependencies that your job needs, ensuring that you also include jupyter and ipykernel. For example:
conda create --name my_env python=3.7 ipykernel jupyter pandas matplotlib --yes
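As an optional sanity check before submitting anything, and assuming the environment name my_env created above, you can activate the environment and confirm that the Jupyter tooling is installed:
conda activate my_env
jupyter nbconvert --version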
And then use qsub to submit a job script like this:
#PBS -l walltime=01:00:00,select=1:ncpus=1:mem=4G
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# Make conda available in this non-interactive shell
eval "$($HOME/anaconda3/bin/conda shell.bash hook)"
# Execute the notebook in place, with no per-cell timeout
conda run --name my_env python -m jupyter nbconvert my_notebook.ipynb --execute --inplace --ExecutePreprocessor.kernel_name=python3 --ExecutePreprocessor.timeout=-1
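If the script above is saved as, say, run_notebook.pbs (the filename here is only illustrative), submission and monitoring follow the usual PBS workflow:
qsub run_notebook.pbs
qstat -u $USER
Once the job has finished, the executed notebook, now with its output cells filled in, can be viewed through the web interface described above.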
Further resources
- The Research Computing Service’s Essential Software Engineering for Researchers course explains more about virtual environments and package management, including Conda
- Running Jupyter notebooks on Imperial College’s compute cluster demonstrates a more advanced real-world example of machine learning with GPUs