Docker generally isn’t available on shared compute clusters, including the RCS compute service, largely because running it requires root-level privileges. Singularity is available, however, and can run Docker images. This is useful if a package you wish to use isn’t already accessible as a module or via Conda but is available from Docker Hub or elsewhere.
Here’s an example using Ensembl’s Variant Effect Predictor (VEP), which has many dependencies and can therefore be difficult to install otherwise.
First, log into the compute service and download an example VCF file to your current directory:
curl -O https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/99/examples/homo_sapiens_GRCh38.vcf
Then you can just run the Docker version of VEP using Singularity:
singularity run docker://ensemblorg/ensembl-vep /opt/vep/src/ensembl-vep/vep -i homo_sapiens_GRCh38.vcf --database
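If you expect to run VEP more than once, it may be worth converting the Docker image to a local Singularity image file first, so later runs reuse that file. The following is a minimal sketch, assuming Singularity 3 or later; the filename vep.sif and the function names vep_pull and vep are illustrative choices, not part of VEP or Singularity.

```shell
# Hypothetical local image filename; change it to any path you can write to.
VEP_IMAGE="${VEP_IMAGE:-vep.sif}"

# Download and convert the Docker image once; later calls reuse the file.
vep_pull() {
    if [ ! -f "$VEP_IMAGE" ]; then
        singularity pull "$VEP_IMAGE" docker://ensemblorg/ensembl-vep
    fi
}

# Run VEP inside the local image, passing all arguments straight through.
vep() {
    singularity exec "$VEP_IMAGE" /opt/vep/src/ensembl-vep/vep "$@"
}
```

After defining these functions, calling vep_pull once and then vep -i homo_sapiens_GRCh38.vcf --database should behave like the one-line command above, without resolving the docker:// address each time.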
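If you’re using this for real data then you should run it from a job script. For VEP you should also look into the --cache argument (use a local annotation cache rather than querying the remote database) and the --fork argument (split the work across several processes).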
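As a rough illustration of how those two arguments might be combined inside a job script, here is a sketch of a helper function. It assumes a VEP cache matching the image’s VEP version has already been downloaded to a directory on the cluster, and that the input file is in a location Singularity makes visible inside the container (typically your home or current directory). The function name vep_cached and the /cache mount point are hypothetical.

```shell
# Annotate a VCF using a local VEP cache and several forked processes.
# Arguments: <image> <cache_dir> <processes> <input.vcf> <output file>
vep_cached() {
    local image="$1" cache_dir="$2" processes="$3" input="$4" output="$5"
    # Bind the host cache directory to /cache (a hypothetical mount point)
    # so that VEP inside the container can read it.
    singularity exec --bind "$cache_dir:/cache" "$image" \
        /opt/vep/src/ensembl-vep/vep \
        -i "$input" -o "$output" \
        --cache --dir_cache /cache \
        --fork "$processes"
}
```

In a job script you would call it with the number of processes set to the number of cores you requested from the scheduler, for example vep_cached vep.sif "$HOME/vep_cache" 4 homo_sapiens_GRCh38.vcf output.txt, where the cache path is again only an example.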
Further resources
- Singularity User Guide and Sylabs Cloud (which hosts a more limited range of Singularity images)
- Running Scientific Applications on HPC Infrastructure Using Singularity: A Case Study (Jeremy Cohen, Department of Computing)
- Quay is another container registry. It hosts many applications from the Biocontainers project, which aims to deliver automated container builds of the numerous bioinformatics packages available through Bioconda.