Array jobs

Do you need to run a number of independent but similar jobs in parallel on the RCS compute service? By submitting a single array job you can efficiently run a large set of parameterised tasks.

For example, imagine that we want to identify prime numbers in a list of 100 numbers stored in a file (numbers.txt), and that we have a program (is_prime.py) that can check whether a number is prime:

$ head numbers.txt
923342417
169896723
827992835
244634106
829135501

$ python3 is_prime.py 923342417
923342417 True
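
The is_prime.py program itself is not shown in this guide; a minimal sketch (an assumption, using simple trial division, and printing the number alongside the result so the summary step described below can recover it) might look like:

```python
#!/usr/bin/env python3
"""Minimal sketch of is_prime.py (hypothetical; the real script is not shown).

Prints the tested number alongside True/False so a later summary step
can extract which numbers were found to be prime.
"""
import math
import sys


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


# Guarded so the module can also be imported without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    n = int(sys.argv[1])
    print(n, is_prime(n))
```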

We can create an array job of 100 sub-jobs by adding a #PBS -J 1-100 directive to our job script (is_prime.pbs.sh). Each sub-job is then passed an index between 1 and 100 via the PBS_ARRAY_INDEX environment variable, which it can use to extract the corresponding entry from our list of numbers (e.g. using sed). It passes that number to an instance of our program, appending the output to a results file shared by all the sub-jobs:

#PBS -J 1-100
N=$(sed -n "${PBS_ARRAY_INDEX}p" numbers.txt)
python3 is_prime.py "$N" >> results.txt
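
A fuller version of is_prime.pbs.sh might look like the following sketch; the resource requests are placeholders (an assumption, not taken from the text) and should be adjusted to suit the workload:

```shell
#!/bin/bash
#PBS -J 1-100
#PBS -l select=1:ncpus=1:mem=1gb   # placeholder resource request
#PBS -l walltime=00:10:00          # placeholder walltime

# PBS starts each sub-job in the home directory, so move to the
# directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Extract the line of numbers.txt matching this sub-job's index.
N=$(sed -n "${PBS_ARRAY_INDEX}p" numbers.txt)
python3 is_prime.py "$N" >> results.txt
```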

For the full example see this Gist. In reality each entry in the numbers.txt file would consist of a longer list of arguments and/or flags to be passed to the relevant program.

This approach can be combined with job dependencies by scheduling a job to run after all the sub-jobs have completed. For example, to summarise the results of the array job we can use a dependent non-array job script (summary.pbs.sh) containing the following:

echo "The following numbers were found to be prime:"
fgrep True results.txt | cut -d' ' -f1
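
Assuming is_prime.py prints each number alongside its True/False result, this pipeline can be tried out locally with a hand-made results.txt (the values below are illustrative, not from the text):

```shell
# Build a toy results.txt (two tested numbers, one prime) and run the
# same pipeline as summary.pbs.sh.
printf '17 True\n18 False\n' > results.txt
echo "The following numbers were found to be prime:"
fgrep True results.txt | cut -d' ' -f1    # prints: 17
```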

This should be submitted to run after the array job completes successfully:

$ qsub is_prime.pbs.sh
1913586[].pbs
$ qsub -W depend=afterok:1913586[] summary.pbs.sh

Note that array job IDs have a [] suffix, which must be included when referring to the job, e.g. qdel '1913586[]' (quoting the ID prevents some shells from treating the brackets as a glob pattern).

Further resources