Job dependencies
Understanding the basics of job dependencies, signals and checkpoints is important for a reliable, smooth and convenient workflow.
The SLURM dependency feature is useful when you need to chain jobs, for example when a preprocessing job using one core should be followed by a simulation job using 64 cores, and the results should then be post-processed in a single-core job. If you want to use advanced features, please consult the sbatch documentation.
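As a rough sketch of that pattern (pre.subm, sim.subm and post.subm are placeholder names, not scripts from this page), such a chain could be wired up like this:
#!/bin/bash
PRE=$(sbatch --parsable pre.subm)                              # 1-core preprocessing
SIM=$(sbatch --parsable --dependency=afterok:${PRE} sim.subm)  # 64-core simulation
sbatch --dependency=afterok:${SIM} post.subm                   # single-core post-processing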
Example 1: Preprocess a job
First we will create a text file with one job as preparation for a primary job, and then let the primary job read that text file. Let's save the pre-process job as preprocess.subm:
#!/bin/bash
#SBATCH --time=00:06:00
#SBATCH --ntasks=1
#SBATCH --output=input.next
sleep 300
echo "${SLURM_JOB_ID}: there is nothing to see here" > input.txt # write the input file for the follow-up job
The second job should now use the output (input.txt) from our “preprocess” job. For illustration we will simply print the file's content on 64 cores, with our submission script process.subm.
#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --ntasks=64 --nodes=1
#SBATCH --output=output.next
FILE="./input.txt"
srun --ntasks=${SLURM_NTASKS} --cpu-bind=cores \
bash -c "echo \$SLURM_PROCID: $(cat ${FILE})" # (1)
- Run cat on 64 tasks
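Note that $(cat ${FILE}) sits inside double quotes, so the batch shell expands it once before srun launches the tasks and all 64 tasks print the same string. If each task should read the file itself, a variant with single quotes (just a sketch, not part of the example above) defers the expansion to the individual tasks:
srun --ntasks=${SLURM_NTASKS} --cpu-bind=cores \
bash -c 'echo "${SLURM_PROCID}: $(cat ./input.txt)"'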
So now we get to know the --dependency=<dependency_list> option. For our task we need the process job to start only when the preprocess job has run successfully. To run a job only if another one returns EXIT_SUCCESS, we can specify the dependency as --dependency=afterok:<jobid>.
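Besides afterok there are further dependency types; the most common ones are sketched below with placeholder job IDs (see the sbatch documentation for the complete list):
sbatch --dependency=afterany:<jobid>   job.subm  # start when <jobid> has finished, regardless of its exit code
sbatch --dependency=afternotok:<jobid> job.subm  # start only if <jobid> failed
sbatch --dependency=after:<jobid>      job.subm  # start once <jobid> has started running
sbatch --dependency=singleton          job.subm  # start after all previously launched jobs with the same name and user have terminated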
For our example this means we submit the preprocess job with the --parsable option, so that the output of the submission command is limited to the job ID. We store this output directly in the PreJID variable:
PreJID=$(sbatch --parsable ./preprocess.subm)
Let's do our conditional job submission and enqueue our process job to start only if our preprocess job succeeds:
sbatch --dependency=afterok:${PreJID} ./process.subm
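While the preprocess job is still running you can check that the second job is being held back; it should be pending with the reason (Dependency). One possible way to narrow the squeue output:
squeue -u $USER -o "%.10i %.20j %.4t %.20r"   # job id, name, state, reason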
After the two jobs have run there should be a “./input.txt” with one line and an “output.next” repeating the content of “./input.txt” multiple times:
[...]
30: 5584: there is nothing to see here
31: 5584: there is nothing to see here
33: 5584: there is nothing to see here
35: 5584: there is nothing to see here
37: 5584: there is nothing to see here
39: 5584: there is nothing to see here
[...]
Example 2: Data staging
This example is a bit more complex. This time we will apply the following scheme with our jobs:
flowchart LR
A("prepare data") --> B{"for(i=0 ;i<=10; i++)"};
B --> |"true"|C("processing<br>data");
B --> |"false"|E["stage out data"];
C --> D{"<br>success?<br><br>"};
D --> |"yes"|B;
D --> |"no"|E("stage out data");
So we have three phases: a preparation or “stage-in” phase, a phase of 10 iterations processing data, and a phase of staging the data out.
Preparation
To work through this example we have to do a little preparation. First of all, let us create a submit directory and cd into it.
mkdir $HOME/datastaging && cd $HOME/datastaging
Pre-process script
In the next step we will create our (very simple) pre-processing script (prep.subm):
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --output=%J.prep
sleep 10
mkdir -p ${STAGE_IN_DIR} && cd ${STAGE_IN_DIR}
echo "0" > fb.txt   # seed the Fibonacci series with f(0)
echo "1" >> fb.txt  # ... and f(1)
Process script
The processing script (process.subm) just continues the Fibonacci series…
#!/bin/bash
#SBATCH --time=00:00:30
#SBATCH --ntasks=1
#SBATCH --output=/dev/null
sleep 15 # (1)
cd ${STAGE_IN_DIR}
f_0=$(tail -2 fb.txt | head -1)
f_1=$(tail -1 fb.txt)
let "f_n1 = $f_0 + $f_1" # (1)
echo "$f_n1" >> fb.txt
- A bit of waiting time for illustration purposes.
- Generate the next Fibonacci number.
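If you want to check the update step without submitting a job, the same commands can be run interactively against a throwaway test file (the path is just an example):
printf "0\n1\n" > /tmp/fb_test.txt          # seed values f(0) and f(1)
f_0=$(tail -2 /tmp/fb_test.txt | head -1)   # second-to-last value
f_1=$(tail -1 /tmp/fb_test.txt)             # last value
let "f_n1 = $f_0 + $f_1"                    # next Fibonacci number
echo "$f_n1" >> /tmp/fb_test.txt            # file now contains 0, 1, 1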
Staging data out
The last submission script (postprocess.subm), which runs when all other jobs have finished or one of them has failed, just copies the data back to our home directory.
#!/bin/bash
#SBATCH --time=00:01:20
#SBATCH --ntasks=1
#SBATCH --output=/dev/null
rsync -av --remove-source-files ${STAGE_IN_DIR}/fb.txt \
$HOME/datastaging/fb.${SLURM_JOB_ID} # (1)
- copy our fb file from the “unsafe” /workdir to our home directory.
Submitting
To put this all together we save the following launch.bash script in the same directory as our submission scripts:
#!/bin/bash
STAGE_IN_DIR="/workdir/$USER/datastaging" # (1)
A_JOBID=$(sbatch --parsable prep.subm) # (2)
JOBIDS=${A_JOBID}
for i in {1..10}; do
B_JOBID="$(sbatch --parsable --dependency=afterok:${A_JOBID} process.subm)"
JOBIDS="${JOBIDS}:${B_JOBID}" # (3)
A_JOBID=${B_JOBID}
done
sbatch --quiet --dependency=afterok:${A_JOBID}?afternotok:${JOBIDS} \
postprocess.subm # (4)
- define the compute directory; the export makes it visible to the batch jobs
- the first job ID is used for the preparation step
- build a chain of job IDs in an n:n+1:n+2:… manner
- start after ${A_JOBID} returns successfully or any job ID of the n:n+1:n+2:… chain fails, where the ? means “or”. This will for example expand to --dependency=afterok:32?afternotok:32:31:30:29:28:27:26:25:24:23
Add the --partition=dev option for small jobs.
Make it executable:
chmod +x ./launch.bash
And start it:
./launch.bash
Check job
If you use the squeue command you should see something like this:
[bt123456@festus01 datastaging]$ squeue
JOBID PARTITION NAME USER ACCOUNT ST TIME TIME_LEFT NODES EXEC_HOST NODELIST(REASON)
5763 normal postprocess.subm bt123456 vo_it-servicezentr PD 0:00 2:00 1 n/a (Dependency)
5762 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5761 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5760 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5759 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5758 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5757 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5756 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5755 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5754 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5753 normal process.subm bt123456 vo_it-servicezentr R 0:06 0:54 1 s76a05 s76a05
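If you want to see the exact dependency expression Slurm stored for the pending stage-out job, scontrol shows it in the Dependency field (job ID taken from the listing above):
scontrol show job 5763 | grep -o "Dependency=[^ ]*"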
When all jobs are done you should have a file named fb.<last jobid>, containing the Fibonacci series up to f(11).
0
1
1
2
3
5
8
13
21
34
55
89
Second condition
There is also a second dependency in this chain, namely afternotok:${JOBIDS}, which tells Slurm to run the stage-out job (postprocess.subm) if any job in the list fails. So let's start the chain again, but this time we cancel one of the process jobs:
./launch.bash
Now let's wait until a few jobs are done:
[bt123456@festus01 datastaging]$ squeue
JOBID PARTITION NAME USER ACCOUNT ST TIME TIME_LEFT NODES EXEC_HOST NODELIST(REASON)
5788 normal postprocess.subm bt123456 vo_it-servicezentr PD 0:00 2:00 1 n/a (Dependency)
5787 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5786 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5785 normal process.subm bt123456 vo_it-servicezentr PD 0:00 1:00 1 n/a (Dependency)
5784 normal process.subm bt123456 vo_it-servicezentr R 0:02 0:58 1 s76a05 s76a05
and then kill the running one:
scancel 5784
This will return a “not ok” for the canceled job, so our condition to run “postprocess.subm” is fulfilled. If we take a look at the staged-out file we will see that it has been moved to our home directory and contains fewer numbers than in our previous example:
[bt123456@festus01 datastaging]$ cat fb.5788
0
1
1
2
3
5
8
13
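One more thing worth knowing: the remaining process jobs whose afterok dependency can no longer be fulfilled will not run. Depending on the cluster configuration they are either removed automatically or stay pending with the reason DependencyNeverSatisfied; in the latter case you can cancel them yourself, for example by job name (the default job name is the script name):
scancel --user=$USER --state=PENDING --name=process.subm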