Partitions (Queues) 🔗
The batch or job scheduling system on PALMA-II is called SLURM. If you are used to PBS/Maui and want to switch to SLURM, this document might help you. The job scheduler is used to start and manage computations on the cluster but also to distribute resources among all users depending on their needs. Computation jobs (but also interactive sessions) can be submitted to different queues (or partitions in the SLURM language), which have different hardware, access constraints and scheduling properties.
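As a starting point, a minimal batch script selects a partition with the `--partition` option. This is only a sketch; the partition name `normal` and the program name are placeholders you must replace with an actual partition from the tables below and your own executable.

```shell
#!/bin/bash
# Minimal SLURM job script (sketch). "normal" is a placeholder partition
# name; pick one of the partitions described on this page.
#SBATCH --partition=normal     # queue/partition to submit to
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks=1             # number of tasks (processes)
#SBATCH --time=01:00:00        # wall-clock limit (HH:MM:SS)
#SBATCH --job-name=myjob       # job name shown in the queue

./my_program                   # your actual computation
```

Submit the script with `sbatch job.sh` and check its status with `squeue -u $USER`.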
General purpose public CPU partitions 🔗
These partitions are accessible by all HPC users and can be used for all kinds of CPU-based HPC workloads. The high-core Zen nodes are particularly well suited for SMP (shared-memory multiprocessing) applications. The older Skylake nodes are mostly limited to 36 cores, which reduces their usefulness for applications that do not make use of MPI.
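An SMP job should request all of its cores on a single node. The following is a sketch under the assumption that the high-core Zen nodes are reachable via a partition here called `zen` and that the application uses OpenMP; adjust both to your setup.

```shell
#!/bin/bash
# Sketch of an SMP (shared-memory) job. The partition name "zen" and the
# core count are assumptions -- substitute the actual partition and size.
#SBATCH --partition=zen
#SBATCH --nodes=1              # SMP: all threads must share one node
#SBATCH --ntasks=1             # a single process...
#SBATCH --cpus-per-task=64     # ...with many cores
#SBATCH --time=04:00:00

# Common convention for OpenMP codes: match the thread count to the
# allocated cores.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_smp_program
```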
The number of nodes that are unavailable fluctuates over time as it includes nodes that only temporarily do not accept new jobs due to excessive load caused by existing jobs (typically due to high I/O activity).
Special purpose public CPU partitions 🔗
These partitions are also accessible by all HPC users, but impose some constraints on the workloads you can sensibly run on them. In particular, this includes the express partition, which is only suitable for very short computations (e.g., test runs), and the requeue partition(s), which allow you to run calculations on the group-exclusive hardware.
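For a quick test run, an interactive `srun` on the express partition is often the most convenient route. The time limit shown is an illustrative value, not the actual limit of the partition.

```shell
# Hedged example: run a short interactive test on the express partition.
# The 5-minute limit is illustrative; check the partition's actual
# maximum walltime before submitting.
srun --partition=express --ntasks=1 --time=00:05:00 ./my_test
```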
If your job is running on one of the requeue nodes while that node is requested by one of the exclusive group partitions, your job will be terminated and resubmitted, so use these partitions with care! Ideally, you should ensure that your application can be restarted without losing too much progress, e.g., by periodically dumping the current state of your simulation to disk so that you can continue from there.
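A requeue-tolerant job script can be sketched as follows. The partition name `requeue`, the `--restart` flag, and the file `checkpoint.dat` are hypothetical; the point is the pattern: mark the job as requeueable and resume from the last state dump if one exists.

```shell
#!/bin/bash
# Sketch of a preemption-tolerant job. "checkpoint.dat" and the
# "--restart" option are placeholders for your application's own
# checkpointing mechanism.
#SBATCH --partition=requeue
#SBATCH --requeue              # allow SLURM to resubmit after preemption
#SBATCH --open-mode=append     # keep output from earlier runs of this job
#SBATCH --time=24:00:00

if [ -f checkpoint.dat ]; then
    ./my_simulation --restart checkpoint.dat   # continue from last dump
else
    ./my_simulation                            # fresh start
fi
```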
General purpose public GPU partitions 🔗
These partitions are accessible by all HPC users and can be used for GPU-based HPC workloads.
The GPU nodes are very expensive and we only have a comparatively small number of them. Please use them only for tasks that require significant GPU compute power. We reserve the right to terminate jobs that allocate GPU resources but do not use them.
Special purpose public GPU partitions 🔗
Analogous to the special purpose CPU partitions, there are also equivalent special purpose GPU partitions.
You can allocate at most one job with 2 GPUs, 8 CPU cores, and 60 GB of RAM on this node.
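A resource request that stays within those limits might look like the following fragment; `<gpu-partition>` is a placeholder for the actual partition name.

```shell
# Hedged sketch of a request matching the stated per-job limit
# (2 GPUs, 8 CPU cores, 60 GB RAM). Replace <gpu-partition> with the
# real partition name.
#SBATCH --partition=<gpu-partition>
#SBATCH --gres=gpu:2           # at most 2 GPUs
#SBATCH --cpus-per-task=8      # at most 8 CPU cores
#SBATCH --mem=60G              # at most 60 GB of RAM
```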
Exclusive partitions 🔗
Finally, we have the exclusive partitions, where certain user groups have priority access.
You can also use `scontrol show partition` to get information on the partitions directly on the cluster. Even more details can be shown with `sinfo -p <partitionname>` or just `sinfo`.
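The commands above can be run on a login node; the output depends on the cluster configuration. The partition name `normal` is again only a placeholder.

```shell
# Inspect partitions directly on the cluster ("normal" is a placeholder):
sinfo                            # overview of all partitions and node states
sinfo -p normal                  # node details for a single partition
scontrol show partition normal   # full configuration of that partition
```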