Job dependencies
Understanding the basics of job dependencies, signals and checkpoints is important for a reliable, smooth and convenient workflow.
The SLURM dependency feature is useful when you need to chain jobs, for example when a preprocessing job using one core should be followed by a simulation job using 64 cores, and the results should then be post-processed in a single-core job. If you want to use advanced features, please consult the sbatch documentation.
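As a rough sketch of that pattern (pre.subm, sim.subm and post.subm are placeholder names, not scripts from this page), such a chain could be wired up like this:
#!/bin/bash
PRE=$(sbatch --parsable pre.subm)                              # 1-core preprocessing
SIM=$(sbatch --parsable --dependency=afterok:${PRE} sim.subm)  # 64-core simulation
sbatch --dependency=afterok:${SIM} post.subm                   # single-core post-processing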
Example 1: Preprocess a job
First we will create a text file with one job as preparation for a primary job, and then let the primary job read that text file. Let's save the pre-process job as preprocess.subm:
#!/bin/bash
#SBATCH --time=00:06:00
#SBATCH --ntasks=1
#SBATCH --output=input.next
sleep 300
echo "${SLURM_JOB_ID}: there is nothing to see here" > input.txt # write the input file for the follow-up job
The second job should now use the output (input.txt) from our “preprocess” job. For illustration we will simply print the file's content on 64 cores, with our submission script process.subm.
#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --ntasks=64 --nodes=1
#SBATCH --output=output.next
FILE="./input.txt"
srun --ntasks=${SLURM_NTASKS} --cpu-bind=cores \
bash -c "echo \$SLURM_PROCID: $(cat ${FILE})" # (1)
- Run cat on 64 tasks
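Note that $(cat ${FILE}) sits inside double quotes, so the batch shell expands it once before srun launches the tasks and all 64 tasks print the same string. If each task should read the file itself, a variant with single quotes (just a sketch, not part of the example above) defers the expansion to the individual tasks:
srun --ntasks=${SLURM_NTASKS} --cpu-bind=cores \
bash -c 'echo "${SLURM_PROCID}: $(cat ./input.txt)"'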
So now we get to know the --dependency=<dependency_list> option. For our task we need the process job to start only when the preprocess job has run successfully. To run a job only if another one returns EXIT_SUCCESS, we can specify the dependency as --dependency=afterok:<jobid>.
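Besides afterok there are further dependency types; the most common ones are sketched below with placeholder job IDs (see the sbatch documentation for the complete list):
sbatch --dependency=afterany:<jobid>   job.subm  # start when <jobid> has finished, regardless of its exit code
sbatch --dependency=afternotok:<jobid> job.subm  # start only if <jobid> failed
sbatch --dependency=after:<jobid>      job.subm  # start once <jobid> has started running
sbatch --dependency=singleton          job.subm  # start after all previously launched jobs with the same name and user have terminated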
For our example this means we submit the preprocess job with the --parsable option, so that the output of the submission command is limited to the job ID. We store this output directly in the PreJID variable:
PreJID=$(sbatch --parsable ./preprocess.subm)
Let's do our conditional job submission and enqueue our process job to start only if our preprocess job succeeds:
sbatch --dependency=afterok:${PreJID} ./process.subm
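While the preprocess job is still running you can check that the second job is being held back; it should be pending with the reason (Dependency). One possible way to narrow the squeue output:
squeue -u $USER -o "%.10i %.20j %.4t %.20r"   # job id, name, state, reason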
After the two jobs have run there should be a “./input.txt” with one line and an “output.next” repeating the content of “./input.txt” multiple times:
[...]
30: 5584: there is nothing to see here
31: 5584: there is nothing to see here
33: 5584: there is nothing to see here
35: 5584: there is nothing to see here
37: 5584: there is nothing to see here
39: 5584: there is nothing to see here
[...]
Example 2: Data staging
This example is a bit more complex. This time we will apply the following scheme with our jobs:
flowchart LR
A("prepare data") --> B{"for(i=0 ;i<=10; i++)"};
B --> |"true"|C("processing<br>data");
B --> |"false"|E["stage out data"];
C --> D{"<br>success?<br><br>"};
D --> |"yes"|B;
D --> |"no"|E("stage out data");
So we have three phases: a preparation or “stage-in” phase, a phase of 10 iterations processing data, and a phase of staging the data out.
Preparation
To work through this example we have to do a little preparation. First of all, let us create a submit directory and cd into it.
mkdir $HOME/datastaging && cd $HOME/datastaging
Pre-process script
In the next step we will create our (very simple) pre-processing script (prep.subm):
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --output=%J.prep
sleep 10
mkdir -p ${STAGE_IN_DIR} && cd ${STAGE_IN_DIR}
echo "0" > fb.txt   # seed the Fibonacci series with f(0)
echo "1" >> fb.txt  # ... and f(1)
Process script
The processing script (process.subm) just continues the Fibonacci series…
#!/bin/bash
#SBATCH --time=00:00:30
#SBATCH --ntasks=1
#SBATCH --output=/dev/null
sleep 15 # (1)
cd ${STAGE_IN_DIR}
f_0=$(tail -2 fb.txt | head -1)
f_1=$(tail -1 fb.txt)
let "f_n1 = $f_0 + $f_1" # (1)
echo "$f_n1" >> fb.txt
- A bit of waiting time for illustration purposes.
- Generate the next Fibonacci number.
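If you want to check the update step without submitting a job, the same commands can be run interactively against a throwaway test file (the path is just an example):
printf "0\n1\n" > /tmp/fb_test.txt          # seed values f(0) and f(1)
f_0=$(tail -2 /tmp/fb_test.txt | head -1)   # second-to-last value
f_1=$(tail -1 /tmp/fb_test.txt)             # last value
let "f_n1 = $f_0 + $f_1"                    # next Fibonacci number
echo "$f_n1" >> /tmp/fb_test.txt            # file now contains 0, 1, 1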
Staging data out
The last submission script (postprocess.subm), which runs when all other jobs have finished or one of them has failed, just copies the data back to our home directory.
#!/bin/bash
#SBATCH --time=00:01:20
#SBATCH --ntasks=1
#SBATCH --output=/dev/null
rsync -av --remove-source-files ${STAGE_IN_DIR}/fb.txt \
$HOME/datastaging/fb.${SLURM_JOB_ID} # (1)
- copy our fb file from the “unsafe” /workdir to our home directory.
Submitting
To put this all together we save the following launch.bash script in the same directory as our submission scripts:
#!/bin/bash
STAGE_IN_DIR="/workdir/$USER/datastaging" # (1)
A_JOBID=$(sbatch --parsable prep.subm) # (2)
JOBIDS=${A_JOBID}
for i in {1..10}; do
B_JOBID="$(sbatch --parsable --dependency=afterok:${A_JOBID} process.subm)"
JOBIDS="${JOBIDS}:${B_JOBID}" # (3)
A_JOBID=${B_JOBID}
done
sbatch --quiet --dependency=afterok:${A_JOBID}?afternotok:${JOBIDS} \
postprocess.subm # (4)
- define the compute directory; the export makes it visible to the batch jobs
- the first job ID is used for the preparation step
- build a chain of job IDs in an n:n+1:n+2:… manner
- start after ${A_JOBID} returns successfully or any job ID of the n:n+1:n+2:… chain fails, where the ? means “or”. This will for example expand to --dependency=afterok:32?afternotok:32:31:30:29:28:27:26:25:24:23
Add the --partition=dev option for small jobs.
Make it executable:
chmod +x ./launch.bash
And start it:
./launch.bash
Check job
If you use the squeue command you should see something like this:
[bt123456@festus01 datastaging]$ squeue
JOBID PARTITION NAME USER ACCOUNT ST TIME TIME_LEFT NODES EXEC_HOST NODELIST(REASON)
5763 normal postprocess.subm bt123456 vo_it-servicezentr PD 0:00 2:00 1 n/a (Dependency)
5762 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5761 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5760 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5759 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5758 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5757 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5756 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5755 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5754 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5753 normal process.subm bt123456 vo_it-servicezentr R 0:06 0:54 1 s76a05 s76a05
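If you want to see the exact dependency expression Slurm stored for the pending stage-out job, scontrol shows it in the Dependency field (job ID taken from the listing above):
scontrol show job 5763 | grep -o "Dependency=[^ ]*"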
When all jobs are done you should have a file named fb.<last jobid>, containing the Fibonacci series up to f(11).
0
1
1
2
3
5
8
13
21
34
55
89
Second condition
There is also a second dependency in this chain, namely afternotok:${JOBIDS}, which tells Slurm to run the stage-out job (postprocess.subm) if any job in the list fails. So let's start the chain again, but this time we cancel one of the process jobs:
./launch.bash
Now let's wait until a few jobs are done:
[bt123456@festus01 datastaging]$ squeue
JOBID PARTITION NAME USER ACCOUNT ST TIME TIME_LEFT NODES EXEC_HOST NODELIST(REASON)
5788 normal postprocess.subm bt123456 vo_it-servicezentr PD 0:00 2:00 1 n/a (Dependency)
5787 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5786 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5785 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5784 normal process.subm bt123456 vo_it-servicezentr R 0:02 0:58 1 s76a05 s76a05
and then kill the running one:
scancel 5784
This will return a “not ok” for the canceled job, so our condition to run “postprocess.subm” is fulfilled. If we take a look at the staged-out file we will see that it has been moved to our home directory and contains fewer numbers than in our previous example:
[bt123456@festus01 datastaging]$ cat fb.5788
0
1
1
2
3
5
8
13
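One more thing worth knowing: the remaining process jobs whose afterok dependency can no longer be fulfilled will not run. Depending on the cluster configuration they are either removed automatically or stay pending with the reason DependencyNeverSatisfied; in the latter case you can cancel them yourself, for example by job name (the default job name is the script name):
scancel --user=$USER --state=PENDING --name=process.subm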