[slurm-users] Q about setting up CPU limits


Dj Merrill

Sep 22, 2021, 2:58:34 PM9/22/21
to slurm...@lists.schedmd.com
Hi all,

I'm relatively new to Slurm, and my Internet searches so far have turned up
lots of examples from the client perspective but not from the admin
perspective on how to set this up. I'm hoping someone can point us in the
right direction.  This should be pretty simple...  :-)

We have a test cluster running Slurm 21.08.1 and are trying to figure out
how to set a limit of 200 CPU cores that can be requested in a partition.
Basically, if someone submits a thousand single-core jobs, 200 of them should
run and the other 800 should wait in the queue, with the next queued job
starting as each running job finishes. Likewise, if someone has a 180-core
job running and submits a 30-core job, the new job should wait in the queue
until the 180-core job finishes. If someone submits a job requesting 201 CPU
cores, it should fail and give an error.

According to the Slurm resource limits hierarchy, if a partition limit is
set, we should be able to set up a user association to override it, for
example in a case where we want someone to be able to access 300 CPU cores
in that partition.

I can see in the Slurm documentation how to set a maximum number of nodes per
partition, but I have not been able to find how to do the same with CPU cores.

My questions are:

1) How do we set up a CPU core limit on a partition that applies to all
users?

2) How do we set up a user association to allow a single person to use
more than the default CPU core limit set on the partition?

3) Is there a better way to accomplish this than the approach I'm describing?


For reference, Slurm accounting is set up and GPU allocations are working
properly, so I think we are close and just missing something obvious for
setting up the CPU core limits.


Thank you,


-Dj


Carsten Beyer

Sep 23, 2021, 7:19:36 AM9/23/21
to slurm...@lists.schedmd.com
Hi Dj,

A solution could be two QOSes. We use something similar to restrict
usage of GPU nodes (MaxTresPU=node=2). The examples below are from our
test cluster.

1) Create a QOS with, for example, MaxTresPU=cpu=200 and assign it to your
partition (a command sketch for both steps follows after step 2), e.g.:

[root@bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
      Name     MaxTRESPU
---------- -------------
    maxcpu        cpu=10
[root@bta0 ~]#
[root@bta0 ~]# scontrol show part maxtresputest
PartitionName=maxtresputest
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=maxcpu

If a user submits jobs requesting more CPUs than the limit allows, their
(new) jobs show 'QOSMaxCpuPerUserLimit' as the pending reason in squeue.

kxxxxxx@btlogin1% squeue
             JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
            125316 maxtrespu maxsubmi  kxxxxxx PD 0:00      1 (QOSMaxCpuPerUserLimit)
            125317 maxtrespu maxsubmi  kxxxxxx PD 0:00      1 (QOSMaxCpuPerUserLimit)
            125305 maxtrespu maxsubmi  kxxxxxx  R 0:45      1 btc30
            125306 maxtrespu maxsubmi  kxxxxxx  R 0:45      1 btc30

2) Create a second QOS with Flags=DenyOnLimit,OverPartQOS and
MaxTresPU=cpu=400. Assign it to a user who should be allowed to exceed the
partition limit of 200 CPUs; they will then be limited to 400 instead. That
user has to request this QOS explicitly when submitting new jobs, e.g.:

[root@bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
      Name                          Flags     MaxTRESPU
---------- ------------------------------ -------------
  overpart        DenyOnLimit,OverPartQOS        cpu=40
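
Putting both steps together, the sacctmgr side would look roughly like the
sketch below (the QOS names and the user match the examples above; the
partition name, job script and CPU counts are placeholders, so adjust to
your site):

# Step 1: QOS that caps each user at 200 CPUs, attached to the partition
sacctmgr add qos maxcpu
sacctmgr modify qos maxcpu set MaxTRESPerUser=cpu=200
# in slurm.conf (or via 'scontrol update PartitionName=...'):
#   PartitionName=maxtresputest Nodes=... QOS=maxcpu ...

# Step 2: QOS allowed to exceed the partition QOS, up to 400 CPUs
sacctmgr add qos overpart
sacctmgr modify qos overpart set Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser=cpu=400
# grant it to the user who may exceed the default limit:
sacctmgr modify user where name=kxxxxxx set QOS+=overpart

# that user then has to request the QOS explicitly at submit time:
sbatch --qos=overpart --ntasks=300 jobscript.sh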


Cheers,
Carsten

--
Carsten Beyer
Abteilung Systeme

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany

Phone: +49 40 460094-221
Fax: +49 40 460094-270
Email: be...@dkrz.de
URL: http://www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784

Dj Merrill

Sep 24, 2021, 4:33:37 PM9/24/21
to slurm...@lists.schedmd.com
Thank you Carsten.  I'll take a closer look at the QOS limit approach.

If I'm understanding the documentation correctly, partition limits (non-QOS)
are set via the slurm.conf file, and although there are options for limiting
the maximum number of nodes per job and the maximum CPUs per node, there
isn't an option within slurm.conf to limit the total number of CPUs that one
person can use, so my original approach will not work.
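
For example, the knobs I do see on a partition definition are per job or per
node only, something like this (partition and node names made up), with the
actual per-user CPU cap having to come from the QOS referenced by QOS=:

# illustrative slurm.conf line: MaxNodes caps nodes per job,
# MaxCPUsPerNode caps CPUs usable per node in this partition;
# neither caps one user's total CPUs, which is left to QOS=maxcpu
PartitionName=compute Nodes=node[01-16] MaxNodes=8 MaxCPUsPerNode=64 QOS=maxcpu State=UP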

The QOS option you mention seems to be the way to set a default limit for
everyone on the partition.

The only other approach I can see would be to set an association limit
for every account individually.
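
If we went that route, I assume it would be a per-association GrpTRES setting
along these lines (user and account names made up, and as far as I can tell
it would not be partition-specific):

# cap one user's total in-use CPUs via their association
sacctmgr modify user where name=alice account=research set GrpTRES=cpu=200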

Thank you,

-Dj

Pavel Vashchenkov

Sep 27, 2021, 7:22:39 AM9/27/21
to slurm...@lists.schedmd.com
Hi all

There are some jobs in the queue with the message "(Nodes required for job
are DOWN, DRAINED or reserved for jobs in higher priority partitions)",
but at the same time there are free nodes.

My question:
Is there a script that shows which resources a job requires and why exactly
the scheduler does not start it?

I mean, for example, I run a script with the ID of the job:
$ why_job_does_not_start.sh 12345

and it writes something like:
"There is not enough memory on the free nodes" or "There are not enough
nodes to start the job"


--
Pavel Vashchenkov

