[slurm-users] MaxCPUsPerNode Clarification


Willy Markuske

Jun 21, 2022, 10:11:32 AM
to slurm...@lists.schedmd.com

Hello All,

I'm trying to clarify how MaxCPUsPerNode can be configured. I'd like my "cpu" partition to be able to run on our GPU nodes while ensuring that some CPUs always remain available for the "gpu" partition. I know I can do this by setting MaxCPUsPerNode on the "cpu" partition to a value lower than the number of CPUs on the GPU nodes. However, I don't want that limit to apply to the CPU nodes as well, and that doesn't seem possible currently: a partition can only be defined once in slurm.conf, so its MaxCPUsPerNode applies to every node in it.

The desired configuration would be something like this:

Partition  Nodes        #CPUs available
cpu        cpu-[01-03]  64
cpu        gpu-[01-02]  32
gpu        gpu-[01-02]  64

It doesn't seem possible to set a partition's MaxCPUsPerNode limit on a per-node basis. Is the real solution to handle this with a separate partition or QOS?
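For illustration, here is a minimal slurm.conf sketch of the single-partition approach and why it falls short. Node definitions and all other partition options are omitted, and the exact lines are an assumption based on the node names in the table, not the poster's actual configuration.

```
# One "cpu" partition spanning both node types (illustrative sketch).
# MaxCPUsPerNode is a partition-wide setting, so the cap of 32 applies to
# cpu-[01-03] as well, not only to the GPU nodes.
PartitionName=cpu Nodes=cpu-[01-03],gpu-[01-02] MaxCPUsPerNode=32
PartitionName=gpu Nodes=gpu-[01-02]
```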

Regards,

--

Willy Markuske

HPC Systems Engineer

Research Data Services

P: (619) 519-4435

René Sitt

Jun 22, 2022, 4:50:44 AM
to slurm...@lists.schedmd.com

Hello,

The solution we currently use at our site is indeed a separate partition; applied to your example, it would look like this:

Partition  Nodes        #CPUs available
cpu        cpu-[01-03]  64
cpu_any    gpu-[01-02]  32 (set with MaxCPUsPerNode=32)
gpu        gpu-[01-02]  64

The trick is then for CPU-only jobs with <cores_per_node> <= 32 to set "--partition=cpu,cpu_any", which signals to the scheduler that they can run in either partition.
Combined with node weights, you can then make sure that CPU-only jobs prefer to fill up the cpu-<xy> nodes first, before using the cpu_any partition to take cores from the gpu-<xy> nodes.
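A minimal slurm.conf sketch of this layout might look like the following. It assumes 64 CPUs per node, as in the table, and uses illustrative Weight values (all else being equal, Slurm allocates the lowest-weight nodes that satisfy a job's requirements). Memory, GRES and other node attributes are omitted, and whether the weights give the desired ordering across the two partitions should be tested on your own system.

```
# Nodes: lower Weight is preferred, so cpu-* nodes should be filled first.
# Weight values are illustrative; CPU counts follow the table.
NodeName=cpu-[01-03] CPUs=64 Weight=10
NodeName=gpu-[01-02] CPUs=64 Weight=100

# cpu_any exposes the GPU nodes to CPU-only jobs, capped at 32 CPUs per node,
# which leaves the remaining CPUs for jobs in the gpu partition.
PartitionName=cpu     Nodes=cpu-[01-03]
PartitionName=cpu_any Nodes=gpu-[01-02] MaxCPUsPerNode=32
PartitionName=gpu     Nodes=gpu-[01-02]

# CPU-only jobs needing <= 32 cores per node then submit with:
#   sbatch --partition=cpu,cpu_any job.sh
```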

This also makes it possible to automatically change --partition=cpu to --partition=cpu,cpu_any when <cores_per_node> <= 32 via job_submit.lua. A good example to use as a starting template can be found, e.g., here: https://gist.github.com/mikerenfro/92d70562f9bb3f721ad1b221a1356de5 (although I'd be careful and test it first, as I cannot say whether it still works unmodified with current SLURM versions).
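The routing rule described in the previous paragraph can be sketched as plain decision logic. This is written in Python purely for illustration: a real job_submit.lua plugin would implement the same check in Lua inside its submit hook and read the requested partition and per-node CPU count from the job descriptor, which this sketch does not model. The function names, the simplified per-node CPU calculation, and the 32-CPU constant (chosen to match cpu_any's MaxCPUsPerNode=32) are all hypothetical.

```python
from typing import Optional

# Hypothetical constant: should match MaxCPUsPerNode of the cpu_any partition.
CPU_ANY_MAX_CPUS_PER_NODE = 32


def cores_per_node(ntasks_per_node: int = 1, cpus_per_task: int = 1) -> int:
    """CPUs a job asks for on each node (simplified: tasks x CPUs per task)."""
    return ntasks_per_node * cpus_per_task


def route_partitions(partition: Optional[str], cpus_on_node: int) -> Optional[str]:
    """Return the partition string a submit filter might set for a job.

    Jobs that requested only "cpu" and fit under the cpu_any cap are widened
    to "cpu,cpu_any"; everything else is returned unchanged.
    """
    if partition is None:
        # No explicit partition: leave it to the default (a site policy choice).
        return None
    requested = [p.strip() for p in partition.split(",") if p.strip()]
    if requested == ["cpu"] and cpus_on_node <= CPU_ANY_MAX_CPUS_PER_NODE:
        return "cpu,cpu_any"
    return partition


if __name__ == "__main__":
    print(route_partitions("cpu", cores_per_node(8, 4)))   # 32 CPUs -> cpu,cpu_any
    print(route_partitions("cpu", cores_per_node(16, 4)))  # 64 CPUs -> cpu
    print(route_partitions("gpu", cores_per_node(1, 8)))   # -> gpu
```

The sketch only widens jobs that asked for exactly "cpu"; jobs that already list several partitions, or that request the gpu partition, pass through untouched.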

Regards,
René Sitt

On 21.06.22 at 16:11, Willy Markuske wrote:
-- 
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg

Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de