Snakemake is a popular tool for running data analysis workflows. Like Nextflow, it’s a scalable alternative to writing shell scripts for co-ordinating complex workflows. Unlike Nextflow, it’s inspired by Make (so it is organised around “targets”, the output files you want to produce) and uses a Python-based language for defining pipelines (rather than Groovy). It also integrates well with Conda, optionally creating environments on demand for each stage of a workflow.
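To make the “targets” idea concrete, here is a minimal sketch in plain Python of how a Make-style tool can work backwards from a requested output to the steps needed to build it. This is not Snakemake’s API or implementation: the rule names, the file names and the `plan` function are all hypothetical, and the sketch ignores details such as timestamps, wildcards and cycle detection.

```python
# Toy illustration of Make-style, target-driven planning.
# NOT Snakemake's API: all rule and file names here are hypothetical.

RULES = {
    # rule name: (output file, list of input files)
    "map_reads": ("mapped.bam", ["reads.fastq", "genome.fa"]),
    "sort_reads": ("sorted.bam", ["mapped.bam"]),
    "call_variants": ("calls.vcf", ["sorted.bam", "genome.fa"]),
}


def plan(target, rules, existing):
    """Return the rule names to run, in order, to produce `target`.

    `existing` is the set of files assumed to be present already.
    """
    # Index the rules by the file each one produces.
    producers = {out: (name, inputs) for name, (out, inputs) in rules.items()}
    order = []

    def visit(path):
        if path in existing:
            return  # nothing to do: the file is already there
        if path not in producers:
            raise ValueError(f"no rule produces {path!r}")
        name, inputs = producers[path]
        if name in order:
            return  # this rule is already scheduled
        for dep in inputs:
            visit(dep)  # schedule whatever builds the inputs first
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    # Ask for the final output; the plan is derived from it.
    print(plan("calls.vcf", RULES, existing={"reads.fastq", "genome.fa"}))
```

Asking for `calls.vcf` yields the three rules in dependency order, whereas asking for `sorted.bam` when `mapped.bam` already exists yields only `sort_reads`. You name the output you want rather than listing the steps to run.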
The rcs-snakemake-tutorial repository demonstrates how to run a lightly modified version of the official Snakemake short tutorial on the RCS compute service.
Note that in this workflow Snakemake executes rules in parallel within the job it was started in; it doesn’t create new jobs. Having Snakemake create jobs itself is possible but requires an advanced feature (Cluster Execution) that we’ll describe in a future tip.
Further resources
- Running long jobs on the RCS compute service
- Getting started on the RCS web pages