
FAQ

What happens when my job exceeds the wall time?

It can happen that a job exceeds its wall time. In this case, slurm sends a SIGTERM signal followed by a delayed SIGKILL. Usually the SIGTERM already terminates the job, which often means that the job is simply killed without returning valuable data. In principle, a program can handle the SIGTERM signal (see signal handling) and, e.g., write out sufficient information for restarting from the point it had reached. The SIGKILL, however, terminates the job unconditionally.
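
As a minimal sketch (assuming a bash job script; the program name my_simulation and the 120-second warning interval are only placeholders), one common pattern is to ask slurm for an early SIGTERM via --signal and trap it in the script:

#!/bin/bash
#SBATCH --time=24:00:00
# ask slurm to send SIGTERM to the batch shell 120 seconds before the wall time is reached
#SBATCH --signal=B:TERM@120

# on SIGTERM: write restart information, then stop the program
trap 'echo "wall time almost reached" > restart.info; kill $PID' TERM

# run the program in the background so the trap can fire while it is running
./my_simulation &
PID=$!
wait $PID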

Are there recommendations about requesting CPU resources?

sequential codes

In the case of a sequential program, which is not what the clusters are primarily intended for, one should request a single node with --ntasks=1.
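
A minimal job script for this case could look roughly as follows (the time limit and the program name are placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

./my_sequential_program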

shared memory

If the code can run in parallel using OpenMP, it is restricted to a shared memory domain (usually a single node). It is best to allocate --cpus-per-task instead of --ntasks. Moreover, be aware that even though the main memory of a node appears as a single, logical memory, it usually consists of more than one physical memory block connected by an intra-node network. Therefore, running an OpenMP job (which runs well on a single-CPU machine) on a node with multiple CPUs and physical memory interfaces without taking special care usually leads to a dramatic loss in performance.

Use --cpus-per-task= to specify the number of cores/threads per process if you run a shared memory program.
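
A minimal sketch of such a job script (the core count, time limit, and program name are only placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00

# let OpenMP use exactly the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program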

distributed memory

For MPI jobs the optimal resources depend on the parallelizability of the code and the interplay between the node-level performance and the speed of the high-performance network in between. In this case, there is often a tradeoff between the width of a job (i.e., the number of nodes) and the required wall time. A wide job requires less time for the computation but usually has a long waiting time in the queue and could lose efficiency if the code does not scale sufficiently.

Use --ntasks= or --ntasks-per-node= to specify the number of processes if you run a distributed memory program.
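
As a sketch (assuming nodes with 64 cores; the node count, time limit, and program name are placeholders):

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --time=08:00:00

srun ./my_mpi_program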

mix distributed and shared

If you mix distributed and shared memory programming, you should also combine the resource options mentioned above.

Assuming your program should run on two nodes, starting 2 MPI processes on each node with 32 threads per process, the following resource reservation is recommended:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=32
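
In the job script itself, the thread count should then match the allocation; a minimal sketch (the program name is a placeholder):

# let each MPI process start as many threads as cores were allocated to it
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_program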

Why is this important? Why not simply always ask for ntasks?

Because ntasks will just allocate n CPUs, whereas a more detailed description allows slurm to distribute the processes in a much more performant way. Looking at the above example: slurm can pin one MPI process per socket and then pin the (sub-)threads to the cores of that socket. This can result in much faster inter-thread communication.
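
If you want to verify how slurm actually pins the tasks, srun can report the binding it chose; a minimal sketch (the program name is a placeholder):

# print the CPU binding selected for each task before it starts
srun --cpu-bind=verbose ./my_hybrid_program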

What would be a suitable wall time?

The suitable wall time depends strongly on the job requirements and the requested resources. One should always request a wall time that allows the job to finish with a reasonable buffer, which requires some experience. However, note that there is an upper limit for the wall time (usually 24 hours). In any case, simply using the wall time limit for every job is strongly discouraged, since there are mechanisms (e.g., backfilling) that allow short-duration jobs to bypass wider jobs that are waiting for all requested nodes to become free. Using the wall time limit as the default for all jobs prevents the scheduler from properly planning ahead and using techniques like backfilling, which ultimately results in suboptimal utilization of resources.
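
For illustration (the value is made up): a job that reliably finishes within about five hours could request a limit of six hours instead of the full 24-hour maximum; --time accepts, e.g., hh:mm:ss or days-hh:mm:ss.

#SBATCH --time=06:00:00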

Can I reserve resources independently of a specific job?

In some cases, e.g., to run specific tests, it may be helpful or necessary to reserve/block a specific set of nodes. If you need such a reservation, write a mail to hpc@uni-bayreuth.de.
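
Once a reservation has been created for you, jobs can be submitted into it; a sketch (the reservation name and the script name are placeholders):

# list active reservations and their names
scontrol show reservation
# submit a job into a reservation you have access to
sbatch --reservation=<reservation_name> job.sh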

Why is my job not starting?

There are several reasons that prevent a job from running even if the desired resources are (or seem to be) unoccupied.

The first possible reason is that the job requires resources that exist but are not available (e.g., due to one of the reservations mentioned above).

The second possibility is that there is a wide job in the queue with a higher priority than your own job. In this case slurm will keep resources free until the wider job can start. If the wall time of your own job is longer than the time until those resources become available, your job cannot bypass the wider job and must wait.
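
To see why a job is still pending, the reason reported by slurm can be inspected, e.g. (the job id is a placeholder):

# the last column shows the pending reason (e.g. Priority, Resources)
squeue -u $USER -o "%.18i %.9P %.8j %.8T %R"
# detailed view of a single job, including the Reason field
scontrol show job <jobid>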

Why don't I receive a job notification mail?

The most common cause of this is that the values for --mail-user= and --mail-type= have not been set correctly. Make sure to choose a valid address of the University of Bayreuth (xyz@uni-bayreuth.de) and to specify only the mail types mentioned in the documentation.
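
A typical combination would be (the mail types shown are just one possible choice):

#SBATCH --mail-user=xyz@uni-bayreuth.de
#SBATCH --mail-type=END,FAIL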