Nextflow is a popular tool for running multi-stage computational pipelines. It’s a scalable alternative to writing shell scripts for coordinating complex workflows, particularly those involving data processing. It supports pipeline resumption and job queuing, so it is well suited for use on the RCS compute service.
Existing workflows can be run mostly unmodified, provided the following advice is followed:
- nextflow itself (i.e. the workflow coordinator) should be run in a job.
- Processes to be run by the coordinator (i.e. in serial) should be run using the local executor.
- Processes to be run in jobs created by the coordinator (i.e. in parallel) should be labelled with the name of the relevant job class, e.g. throughput.
- The queueSize should be set to less than or equal to 50.
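As a rough illustration of the configuration side of this advice, the sketch below writes a small nextflow.config from the shell. It is not the file from the Gist. It assumes the cluster is driven through Nextflow's pbspro executor, and the cpus, memory and time values are placeholders chosen only for the example.

```shell
# Write a hypothetical nextflow.config into the current directory.
# The quoted 'EOF' stops the shell from expanding anything inside the file.
cat > nextflow.config <<'EOF'
process {
    // Default: run processes inside the coordinator job itself.
    executor = 'local'

    // Processes labelled 'throughput' are submitted as separate jobs.
    // 'pbspro' and the resource values below are assumptions for illustration.
    withLabel: throughput {
        executor = 'pbspro'
        cpus     = 1
        memory   = '4 GB'
        time     = '1h'
    }
}

executor {
    // Keep at most 50 jobs queued at any one time.
    queueSize = 50
}
EOF
```

With a configuration along these lines, a process opts in to running as its own job by declaring label 'throughput' in its definition; unlabelled processes stay on the local executor.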
For a full example, including a file highlighting the changes required to the official tutorial, please see this Gist. To run this example on the compute service simply clone and run it:
git clone https://gist.github.com/322369519b5dfd0195e3645d82bfe909.git nextflow-tutorial
cd nextflow-tutorial
qsub tutorial.pbs.sh
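The actual tutorial.pbs.sh lives in the Gist. For orientation only, a coordinator job script might look roughly like the sketch below, which writes a hypothetical coordinator.pbs.sh. The resource requests, the workflow file name main.nf, and the assumption that nextflow is already on the PATH inside the job are all illustrative rather than taken from the Gist or from RCS guidance.

```shell
# Write a hypothetical coordinator job script (not the one from the Gist).
cat > coordinator.pbs.sh <<'EOF'
#!/bin/bash
# Placeholder resources: the coordinator mostly waits on the jobs it submits.
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=08:00:00

# PBS starts jobs in the home directory; return to the submission directory.
cd "$PBS_O_WORKDIR"

# Assumes nextflow is on PATH and main.nf is the workflow script.
# -resume reuses cached results from an earlier, interrupted run.
nextflow run main.nf -resume
EOF
```

A script like this would be submitted with qsub coordinator.pbs.sh, in the same way as the tutorial script.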
Further resources
- Getting started on the RCS web pages
- The #Nextflow channel on the Imperial Research Software Community Slack is a good place to ask further questions