Mixed MPI OpenMP
This example is intended to introduce you to the use of Slurm for mixed (distributed + shared) memory programming with OpenMP and MPI.
As always: our examples are only MWEs to illustrate the most basic principles. They are not meant to be efficient or well-styled code in any manner.
Prepare
In this example, we want to use Intel MPI because thread pinning does not require as much configuration as with Open MPI. So let's load an Intel oneAPI module:
module load OneAPI_2025.0.0
Create a directory for your example and enter it:
mkdir ~/mpiomp.ex
cd ~/mpiomp.ex
Codes
C code
The following code demonstrates how to mix MPI and OpenMP; more precisely, how to nest OpenMP inside MPI. Please copy and save it to mpi_openmp_example.c inside your project directory.
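The listing below is a minimal sketch of such a program, reconstructed to match the annotations and the sample output shown further down; the exact contents of the original mpi_openmp_example.c may differ slightly.

#define _GNU_SOURCE                         /* needed so sched_getcpu() is declared */
#include <stdio.h>
#include <unistd.h>                         /* gethostname(), getpid(), syscall() */
#include <sched.h>                          /* sched_getcpu() */
#include <sys/syscall.h>                    /* SYS_gettid */
#include <mpi.h>
#include <omp.h>

/* Sketch: prints, for every OpenMP thread of every MPI rank,
 * HOSTNAME:CPU-CORE rank//thread (n threads) -> process id:system thread-id */
int main(int argc, char **argv)
{
    int rank;
    char hostname[256];

    MPI_Init(&argc, &argv);                 /* start distributed memory parallel region (MPI) */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(hostname, sizeof(hostname));

    #pragma omp parallel                    /* start shared memory parallel region (OpenMP) */
    {
        int thread   = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        long tid     = syscall(SYS_gettid); /* system thread id */

        printf("%s:%d %d//%d (%d) -> %d:%ld\n",
               hostname, sched_getcpu(), rank, thread, nthreads,
               (int)getpid(), tid);
    }

    fflush(stdout);                         /* flush before synchronizing output */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}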
- This is needed to make the sched_getcpu() call work, which returns the CPU core number.
- Start distributed memory parallel region (MPI).
- Start shared memory parallel region (OpenMP).
- Synchronize output.
Now let's compile this with Intel's MPI compiler wrapper for Intel's LLVM C compiler (icx):
mpiicx -qopenmp mpi_openmp_example.c -o ./ompmpi.x
Submission script
The next step is to write a suitable submission script and save it to mpi_openmp_example.submit.
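A sketch of such a script is shown below. The resource values (--nodes=2, --ntasks-per-node=2, --cpus-per-task=3) are those that produce the sample output at the end of this page; the job name, walltime and output file name are assumptions, and whether you launch with srun or mpirun depends on your site's Intel MPI setup.

#!/bin/bash
#SBATCH --job-name=mpi_openmp_example
#SBATCH --nodes=2                   # number of nodes
#SBATCH --ntasks-per-node=2         # MPI processes (tasks) per node
#SBATCH --cpus-per-task=3           # CPU cores (= OpenMP threads) per MPI process
#SBATCH --time=00:05:00             # assumed walltime, adapt as needed
#SBATCH --output=mpi_openmp_example.out

module load OneAPI_2025.0.0

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # OpenMP threads per MPI process
export OMP_PLACES=cores                       # each place a thread can run on is a core
export OMP_PROC_BIND=close                    # bind threads as close together as possible

srun ./ompmpi.x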
- It is very important to properly set --nodes=, --ntasks-per-node= and --cpus-per-task=. Please take a look at this cheat sheet and also the FAQ section.
- Set the number of OpenMP threads for each MPI process.
- Let each place a thread can run on be a core.
- Bind the OpenMP threads as close together as possible (at processor level).
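If you want to cross-check the pinning reported by the program, the OpenMP runtime can also print its own view of the thread binding. One optional way to do this, assuming an OpenMP-5.0-capable runtime such as Intel's, is to add the following line to the submission script:

export OMP_DISPLAY_AFFINITY=TRUE    # let the OpenMP runtime report where each thread is bound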
Submit and run
Now it's time to submit our example job:
sbatch mpi_openmp_example.submit
When the job has completed, the output file should look something like this:
HOSTNAME:CPU-CORE <rank>//<thread> (<n threads>) -> <process id>:<system thread-id>
s72b03.festus:128 2//0 (3) -> 1838426:1838426
s72b03.festus:129 2//1 (3) -> 1838426:1838434
s72b03.festus:131 3//0 (3) -> 1838427:1838427
s72b03.festus:130 2//2 (3) -> 1838426:1838437
s72b03.festus:132 3//1 (3) -> 1838427:1838435
s72b03.festus:133 3//2 (3) -> 1838427:1838436
s72b02.festus:129 0//1 (3) -> 1745943:1745949
s72b02.festus:128 0//0 (3) -> 1745943:1745943
s72b02.festus:131 1//0 (3) -> 1745944:1745944
s72b02.festus:130 0//2 (3) -> 1745943:1745951
s72b02.festus:132 1//1 (3) -> 1745944:1745950
s72b02.festus:133 1//2 (3) -> 1745944:1745952
As you can see, each node has started two of the four MPI processes (ranks), and each rank has started its own three threads.