Submitting jobs
Usually, jobs on the cluster are started by submitting a job or batch script to one of the partitions (queues). Slurm then takes care of reserving the requested resources and starting the application on the reserved nodes. In the case of Slurm, a job script is a bash script; it can be written locally (using your favorite plain-text editor) or directly on the cluster (using Vim, Emacs, nano, ...). A typical example script named job.sh is given below:
#!/bin/bash
#SBATCH --nodes=1 # the number of nodes you want to reserve
#SBATCH --ntasks-per-node=1 # the number of tasks/processes per node
#SBATCH --cpus-per-task=36 # the number of CPUs per task
#SBATCH --partition=normal # on which partition to submit the job
#SBATCH --time=24:00:00 # the maximum wall-clock time (time limit) your job will run
#SBATCH --job-name=MyJob123 # the name of your job
#SBATCH --mail-type=ALL # receive an email when your job starts, finishes normally or is aborted
#SBATCH --mail-user=your_account@uni-muenster.de # your mail address
# LOAD MODULES HERE IF REQUIRED
...
# START THE APPLICATION
...
The #!/bin/bash line tells the system to execute the script with bash. #SBATCH is a Slurm directive and is used to configure Slurm. Everywhere else, the # sign is used for comments.
You can submit your script to the batch system with the command: sbatch job.sh
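After submission, sbatch prints the numeric ID assigned to your job (e.g. "Submitted batch job 123456"). You can use this ID to monitor or cancel the job; a short sketch (the job ID shown is illustrative):
squeue -u $USER   # list your pending and running jobs
scancel 123456    # cancel a job by its ID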
MPI parallel Jobs
Start an MPI job with 72 MPI ranks distributed over 2 nodes for 1 hour on the normal partition. Instead of mpirun, the preferred command to start MPI jobs within Slurm is srun.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyMPIJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
# load needed modules
module load intel
# Previously needed for Intel MPI (as we do here) - not needed for OpenMPI
# export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
# run the application
srun /path/to/my/mpi/program
Note that srun here starts as many tasks as you requested with --ntasks-per-node. It is essentially a substitute for mpirun. Know what you are doing when you use it!
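Because srun inherits the task count from the #SBATCH directives, you can also override it for individual steps, e.g. to run a serial preparation step before the parallel run. A minimal sketch (the program paths are placeholders):
srun --ntasks=1 /path/to/serial/preparation   # runs on a single task
srun /path/to/my/mpi/program                  # runs on all 72 requested tasks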
OpenMP parallel Jobs
Start a job with a single task, 36 CPUs, and one thread per CPU for 1 hour on the normal partition.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyOpenMPJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
# Bind each thread to one core
export OMP_PROC_BIND=TRUE
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# load needed modules
module load intel
# run the application
/path/to/my/openmp/program
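If you want to verify that the thread count and core binding really take effect, you can ask the OpenMP runtime to report its settings. A minimal sketch using the standard OMP_DISPLAY_ENV variable (supported by OpenMP 4.0 and later runtimes):
export OMP_DISPLAY_ENV=TRUE   # runtime prints its settings (threads, binding) at startup
/path/to/my/openmp/program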
Hybrid MPI/OpenMP Jobs ๐
Start a job on 2 nodes, 9 MPI tasks per node, 4 OpenMP threads per task (2 × 9 × 4 = 72 cores in total).
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=9
#SBATCH --cpus-per-task=4 # reserve 4 CPUs per task for the OpenMP threads
#SBATCH --partition=normal
#SBATCH --time=01:00:00
#SBATCH --job-name=MyHybridJob123
#SBATCH --output=output.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
export OMP_NUM_THREADS=4
# load needed modules
module load intel
# run the application
srun /path/to/my/hybrid/program
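To keep the thread count consistent with the reservation if you later change the #SBATCH directives, you can derive it from the environment Slurm sets inside the job instead of hard-coding it. A sketch (falls back to 1 if --cpus-per-task was not given):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
srun /path/to/my/hybrid/program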
GPU jobs in the gpu2080 partition
You can use the following submit script for the gpu2080 partition (assuming that your code runs on a single GPU). Please be aware that the nodes have 32 CPU cores and 8 GPUs, so please don't use more than 4 cores per GPU! The same goes for memory: there are 240 GB of usable RAM, so don't use more than 30 GB per reserved GPU. Adjust the reserved time according to your needs.
#!/bin/bash
#SBATCH --partition=gpu2080
#SBATCH --nodes=1
#SBATCH --mem=30G
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:1
#SBATCH --time=0-01:00:00
#SBATCH --job-name=MyCUDAJob
#SBATCH --output=output.dat
#SBATCH --error=error.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
# load needed modules
module purge
ml palma/2023a
ml CUDA
ml foss
ml UCX-CUDA
ml CMake
# Use UCX to be compatible with Nvidia (formerly Mellanox) Infiniband adapters
export OMPI_MCA_pml=ucx
# run the application
./my_program
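To confirm which GPU Slurm assigned to your job, you can query the driver at the beginning of the script. A short sketch (assuming nvidia-smi is available on the GPU nodes; CUDA_VISIBLE_DEVICES is set by Slurm for --gres=gpu jobs):
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"   # GPU index/indices visible to the job
nvidia-smi                                          # driver view of the allocated GPU(s)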
Hybrid MPI/OpenMP/CUDA Jobs
Start a job on 2 nodes, with 2 MPI tasks per node, 4 OpenMP threads per task, and 2 GPUs per node:
#!/bin/bash
#SBATCH --partition=gpu2080
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:2
#SBATCH --job-name=MyMPIOpenMPCUDAJob
#SBATCH --output=output.dat
#SBATCH --error=error.dat
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
export OMP_NUM_THREADS=4
# load needed modules
module purge
ml palma/2022a
ml CUDA/11.7.0
ml foss/2022a
ml UCX-CUDA/1.12.1-CUDA-11.7.0
ml CMake/3.23.1
# Use UCX to be compatible with Nvidia (formerly Mellanox) Infiniband adapters
export OMPI_MCA_pml=ucx
# run the application using mpirun in this case
mpirun /path/to/my/hybrid/program
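With 2 tasks and 2 GPUs per node, you usually want each MPI rank to drive its own GPU. One common pattern is a small wrapper script that maps the node-local rank to a GPU; a sketch for Open MPI (the file name gpu_bind.sh is hypothetical, and OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI's mpirun):
#!/bin/bash
# gpu_bind.sh (hypothetical name): give each node-local MPI rank its own GPU
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
exec "$@"
Make the wrapper executable with chmod +x gpu_bind.sh and start the job with mpirun ./gpu_bind.sh /path/to/my/hybrid/program instead of launching the program directly.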
Interactive Jobs
You can request resources from Slurm interactively: Slurm will allocate the resources and open an interactive shell for you. On the login node, type the following into your shell:
salloc --nodes 1 --cpus-per-task 36 --time 00:30:00 --partition express
This will give you a session with 36 CPUs for 30 minutes on the express partition. You will automatically be forwarded to a compute node.
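Inside the interactive session, the usual Slurm environment variables are set, so you can start parallel steps with srun just as in a batch script. A short sketch (the program path is a placeholder):
echo "Allocated nodes: $SLURM_JOB_NODELIST"
srun /path/to/my/program   # runs on the allocated resources
exit                       # ends the session and releases the allocation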
Flexible Submission Script
If you want to change parameters without editing the script itself, you can pass command-line arguments to sbatch that override the #SBATCH directives in the script:
sbatch --cpus-per-task 16 submit_script.sh
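This makes it easy to run the same script with several resource configurations, e.g. for a small scaling study. A sketch (submit_script.sh is your existing job script):
for n in 4 8 16; do
    sbatch --cpus-per-task $n --job-name "scaling_$n" submit_script.sh
done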