Slurm commands and options
srun/sbatch
The following options are intended to be used with sbatch or srun.
See also the FAQ page.
resource options
option | purpose | examples |
---|---|---|
--ntasks=<count> | Set the number of tasks for this job. | --ntasks=256 |
--ntasks-per-node=<count> | Set the number of tasks per node. | --ntasks-per-node=128 |
--cpus-per-task=<count> | Set the number of CPUs per task (default: 1). | --cpus-per-task=2 |
--constraint=<feature> | Only use nodes that have this feature. | --constraint=HighMem |
--exclude=<nodenames> | Do not use the given nodes. A path to a file containing a node list may also be given. | --exclude=r11n01 or --exclude=./exclude-list.txt |
--nodelist=<nodenames> | Only use the given nodes. A path to a file containing a node list may also be given. | --nodelist=r11n01 or --nodelist=./nodelist.txt |
--gres=<name>[[:type]:count] | Generic resource specifier (per node). | one GPU of any type: --gres=gpu:1; two H100 GPUs: --gres=gpu:h100:2 |
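One way to combine the resource options is a job script like the one below. It is a sketch under assumptions: the node names r11n01/r11n02, the HighMem feature and ./my_app are placeholders, 64 tasks with 2 CPUs each assume nodes with at least 128 cores, and the exclude file relies on the file form of --exclude described in the table. The snippet writes the node list and the job script, then submits the job if sbatch is available.

```shell
# Node list file for --exclude, one node name per line
# (r11n01 and r11n02 are example node names).
printf '%s\n' r11n01 r11n02 > exclude-list.txt

# Job script: 256 tasks, 64 per node (so 4 nodes), 2 CPUs per task,
# only on HighMem nodes and never on the nodes listed in the file.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=256
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=2
#SBATCH --constraint=HighMem
#SBATCH --exclude=./exclude-list.txt

# One OpenMP thread per CPU allocated to each task
export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"
# Pass the CPU count to srun explicitly (./my_app is a placeholder)
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" ./my_app
EOF

# Submit the job (skipped where sbatch is not installed).
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
fi
```

The CPU count is repeated on the srun line because some Slurm versions do not let srun inherit --cpus-per-task from sbatch.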
timing options
option | purpose | examples |
---|---|---|
--deadline=<TIME> | Remove the job if it cannot end before this deadline. Time formats: YYYY-MM-DD[THH:MM[:SS]] or HH:MM[:SS], optionally followed by AM or PM. | End before 10 January 2024, 13:00: --deadline=2024-01-10T13:00 |
--begin=<TIME> | Defer job allocation until the specified time. Time formats: as for --deadline, or relative to the submit time as now+<count> with an optional unit (seconds by default; minutes, hours, days or weeks). | Begin at 16:00: --begin=16:00; begin one hour after submission: --begin=now+1hour |
--time=<TIME> | Set the walltime limit of the job. | Limit the job to 9 hours: --time=09:00:00 |
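The timing options can be combined in a single submission. The sketch below assumes GNU date (to build the deadline string) and a placeholder job script job.sh: the job may start one hour after submission at the earliest, runs for at most 9 hours, and is removed if it cannot finish before 13:00 tomorrow.

```shell
# GNU date prints tomorrow 13:00 in the YYYY-MM-DDTHH:MM format accepted by --deadline.
deadline=$(date -d 'tomorrow 13:00' +%Y-%m-%dT%H:%M)
echo "deadline: $deadline"

# Start no earlier than one hour from now, run at most 9 hours,
# and drop the job if it cannot end before the deadline.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --begin=now+1hour --time=09:00:00 --deadline="$deadline" job.sh
fi
```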
other useful options (srun/sbatch)
option | purpose | examples |
---|---|---|
--reservation=<names> | Allocate resources for the job from the named reservation(s). | --reservation=lscale_test |
--partition=<partition_name> | Choose the partition to run the job in. (This may be necessary for some resources, such as H100 GPUs.) | --partition=dev |
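A short test job on the dev partition inside the lscale_test reservation (both names taken from the table) could look like the sketch below. It only works while the reservation exists and you are allowed to use it. The script reads SLURM_JOB_PARTITION and SLURM_JOB_RESERVATION, which Slurm sets inside a job, and reports none when run outside Slurm.

```shell
#!/bin/bash
#SBATCH --partition=dev
#SBATCH --reservation=lscale_test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Report where the job actually runs; both variables are unset outside a job.
partition="${SLURM_JOB_PARTITION:-none}"
reservation="${SLURM_JOB_RESERVATION:-none}"
echo "partition:   $partition"
echo "reservation: $reservation"
```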
signaling options (scancel)
These options are used with scancel to cancel or signal jobs.
See the advanced section for more details.
option | purpose | examples |
---|---|---|
--full | Also send the signal to the batch shell and its child processes. By default, signals other than KILL are not sent to the batch step. | --full |
--jobname=<job_name> | Restrict the scancel operation to jobs with this job name. | --jobname=RUN_Z |
--signal=<signal> | Name or number of the signal to send (default: KILL). | --signal=USR1 or --signal=10 |
--state=<job_state_name> | Only act on jobs in the given state: PENDING, RUNNING or SUSPENDED. | --state=PENDING |
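A common use of --signal together with --full is to ask a running batch job to react to a signal instead of being killed, for example with scancel --full --signal=USR1 <jobid>. The fragment below is a minimal sketch of the receiving side, assuming a bash batch script; the flag checkpoint_requested and the checkpoint action it stands for are hypothetical.

```shell
#!/bin/bash
# Batch-script fragment: note a checkpoint request when USR1 arrives, e.g. after
#   scancel --full --signal=USR1 <jobid>
# Without --full, signals other than KILL do not reach the batch shell.
checkpoint_requested=0

on_usr1() {
    echo "USR1 received, requesting checkpoint"
    # Hypothetical flag that the rest of the script would check
    checkpoint_requested=1
}
trap on_usr1 USR1
```

In a real job script, start the job step in the background (srun ./my_app &) and then call wait: bash runs a trap only after a foreground command has finished, whereas wait returns as soon as a trapped signal arrives. To cancel only your own pending jobs named RUN_Z, the filters can be combined: scancel --user=$USER --state=PENDING --jobname=RUN_Z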