Research Computing Tips: Using Nextflow on the RCS compute service

Nextflow is a popular tool for running multi-stage computational pipelines. It’s a scalable alternative to writing shell scripts for co-ordinating complex workflows, particularly those involving data processing. Because it supports pipeline resumption and job queuing, it is well suited to the RCS compute service.

Existing workflows can be run mostly unmodified, provided the following advice is followed:

Nextflow itself (i.e. the workflow coordinator) should be run in a job.
Processes to be run by the coordinator (i.e. in serial) should be run using the local executor.
Processes to be run in jobs created by the coordinator (i.e. in parallel) should be labelled with the name of the relevant job class, e.g. throughput.
The queueSize should be set to 50 or less.
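
As a rough illustration of the advice above, a nextflow.config along the following lines could be used. This is a minimal sketch, not the configuration from the Gist: the 'pbspro' executor name is an assumption based on jobs being submitted with qsub, and the cpus, memory and time values are hypothetical placeholders to be adjusted to fit the chosen job class.

```groovy
// nextflow.config (illustrative sketch; see the assumptions in the text above)

process {
    // By default, processes run inside the coordinator's own job (in serial).
    executor = 'local'

    // Processes labelled 'throughput' are submitted as separate jobs (in parallel).
    withLabel: 'throughput' {
        executor = 'pbspro'   // assumption: the service's scheduler is PBS Pro
        cpus     = 1          // hypothetical resource request
        memory   = '4 GB'     // hypothetical resource request
        time     = '1h'       // hypothetical resource request
    }
}

executor {
    // Limit how many jobs the coordinator keeps queued at any one time.
    queueSize = 50
}
```

A process opts in to parallel execution by including the directive label 'throughput' in its definition; processes without the label stay on the local executor.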

For a full example, including a file highlighting the changes required to the official tutorial, please see this Gist. To run this example on the compute service, simply clone it and submit the job script:

git clone https://gist.github.com/322369519b5dfd0195e3645d82bfe909.git nextflow-tutorial
cd nextflow-tutorial
qsub tutorial.pbs.sh

Further resources

Getting started on the RCS web pages
The #Nextflow channel on the Imperial Research Software Community Slack is a good place to ask further questions