[slurm-users] PropagateResourceLimits


Diego Zuccato

Apr 22, 2021, 9:06:50 AM
to Slurm User Community List
Hello all.

I'd need a clarification about PropagateResourceLimits.
If I set it to NONE, will cgroups still limit the resources a job can use
on the worker node(s), effectively decoupling limits on the frontend from
limits on the worker nodes?

I've been bitten by the default being ALL: when I tried to limit users on
the frontend to 1 GB soft / 4 GB hard of memory, jobs began to fail at
startup even when they requested 200 GB (which is available on the worker
nodes but not on the frontend)...
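
Limits like these are typically set in /etc/security/limits.conf on the login node. A sketch of what such a 1 GB soft / 4 GB hard setting might look like (the choice of the "as" address-space item is an assumption; values are in KB):

```
# /etc/security/limits.conf on the login node (illustrative sketch;
# the limit item used here, "as" = address space, is an assumption)
*    soft    as    1048576     # 1 GB soft
*    hard    as    4194304     # 4 GB hard
```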

Tks.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Ryan Novosielski

Apr 22, 2021, 10:56:27 AM
to Slurm User Community List
My recollection is that this parameter is talking about “ulimit” parameters, and doesn’t have to do with cgroups. The documentation is not as clear here as it could be about what this does, the mechanism by which it’s applied (a PAM module?), etc.



Prentice Bisbal

Apr 27, 2021, 11:31:37 AM
to slurm...@lists.schedmd.com
I don't think PAM comes into play here. Since Slurm is starting the
processes on the compute nodes as the user, etc., PAM is being bypassed.

Prentice

Diego Zuccato

Apr 28, 2021, 2:27:38 AM
to Slurm User Community List
On 27/04/2021 17:31, Prentice Bisbal wrote:

> I don't think PAM comes into play here. Since Slurm is starting the
> processes on the compute nodes as the user, etc., PAM is being bypassed.
Then maybe slurmd somehow goes through the PAM stack another way, since
limits on the frontend were propagated (as implied by
PropagateResourceLimits defaulting to ALL).
And I can confirm that setting it to NONE seems to have solved the
issue: users on the frontend get limited resources, and jobs on the
nodes get the resources they asked for.
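
For reference, the fix amounts to a single line in slurm.conf (a minimal sketch):

```
# slurm.conf -- do not copy the submit host's ulimits into jobs;
# jobs then run under the compute nodes' own limits (plus any cgroup limits)
PropagateResourceLimits=NONE
```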

Prentice Bisbal

Apr 29, 2021, 12:36:31 PM
to slurm...@lists.schedmd.com

On 4/28/21 2:26 AM, Diego Zuccato wrote:

> On 27/04/2021 17:31, Prentice Bisbal wrote:
>
>> I don't think PAM comes into play here. Since Slurm is starting the processes on the compute nodes as the user, etc., PAM is being bypassed.
> Then maybe slurmd somehow goes through the PAM stack another way, since limits on the frontend were propagated (as implied by PropagateResourceLimits defaulting to ALL).
> And I can confirm that setting it to NONE seems to have solved the issue: users on the frontend get limited resources, and jobs on the nodes get the resources they asked for.

In this case, Slurm is deliberately looking at the resource limits in effect on the submission host when the job is submitted, and then copying them to the job's environment. From the slurm.conf documentation (https://slurm.schedmd.com/slurm.conf.html):

PropagateResourceLimits
A comma-separated list of resource limit names. The slurmd daemon uses these names to obtain the associated (soft) limit values from the user's process environment on the submit node. These limits are then propagated and applied to the jobs that will run on the compute nodes.

Then later on, it indicates that all resource limits are propagated by default:

The following limit names are supported by Slurm (although some options may not be supported on some systems):
ALL
All limits listed below (default)

You should be able to verify this yourself in the following manner:

1. Start two separate shells on the submission host

2. Change the limits in one of the shells. For example, reduce core size to 0, with 'ulimit -c 0' in just one shell.

3. Then run 'srun ulimit -a' from each shell.

4. Compare the output. The shell where you reduced the limit should show that core size is now zero.
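
If you don't have a cluster handy, the inheritance mechanism itself can be seen locally; a minimal sketch in plain bash, with no srun involved:

```shell
#!/bin/bash
# Local illustration, no cluster needed: a child process inherits the
# parent's soft limits -- the same per-process values that slurmd reads
# on the submit node when PropagateResourceLimits is ALL.
ulimit -S -c 0            # tighten the soft core-file limit in this shell
bash -c 'ulimit -S -c'    # the child reports the inherited value: 0
```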

--

Prentice

Ryan Novosielski

Apr 29, 2021, 12:55:15 PM
to Slurm User Community List
It may not, specifically for PropagateResourceLimits – as I said, the docs are a little sparse on how this actually works – but you’re not correct that PAM doesn’t come into play for user jobs. If you have “UsePam = 1” set, and have an /etc/pam.d/slurm, as our site does, there is some amount of interaction here, and PAM definitely affects user jobs.

Prentice Bisbal

Apr 29, 2021, 1:22:24 PM
to slurm...@lists.schedmd.com
What I said in my last e-mail (which you probably haven't gotten to yet)
is similar to this case. On its own, Slurm wouldn't propagate resource
limits, but that has been added as a function. In your case, Slurm has
functionality built into it where you can tell it to use PAM; without
that functionality enabled, Slurm would bypass PAM entirely.

This is similar to SSH, where you can enable the UsePAM feature.

My reading of the documentation for PropagateResourceLimits is that
Slurm looks at the limits in the actual environment when the job is
submitted, not in /etc/security/limits.conf via PAM. In my previous
e-mail I provided a method to test this, but I haven't tested it
myself. Yet.


Prentice

Prentice Bisbal

Apr 29, 2021, 1:41:24 PM
to slurm...@lists.schedmd.com

So I decided to eat my own dog food, and tested this out myself. First of all, running ulimit through srun "naked" like that doesn't work, since ulimit is a bash shell builtin, so I had to write a simple shell script:

$ cat ulimit.sh 
#!/bin/bash
ulimit -a 

By default, core is set to zero in our environment as a good security practice and to keep users' core dumps from filling up the filesystem. My default ulimit settings:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128054
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Now I run my ulimit.sh script through srun

$ srun -N1 -n1 -t 00:01:00 --mem=1G ./ulimit.sh
srun: job 1249977 queued and waiting for resources
srun: job 1249977 has been allocated resources
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257092
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 1048576
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Now I set core size:

$ ulimit -c 1024
(base) [pbisbal@sunfire01 ulimit]$ ulimit -c
1024

And run ulimit.sh through srun again:

$ srun -N1 -n1 -t 00:01:00 --mem=1G ./ulimit.sh
srun: job 1249978 queued and waiting for resources
srun: job 1249978 has been allocated resources
core file size          (blocks, -c) 1024
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257092
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 1048576
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

This confirms that PropagateResourceLimits takes its values from the user's environment, not PAM. If you have UsePAM enabled as Ryan suggested in a previous e-mail, that puts *upper limits* on the values propagated by PropagateResourceLimits. According to the slurm.conf man page, it doesn't necessarily override the limits set in the environment when the job is submitted:

UsePAM
       If set to 1, PAM (Pluggable Authentication Modules for Linux)
       will be enabled. PAM is used to establish the upper bounds for
       resource limits. With PAM support enabled, local system
       administrators can dynamically configure system resource limits.
       Changing the upper bound of a resource limit will not alter the
       limits of running jobs; only jobs started after a change has been
       made will pick up the new limits. The default value is 0 (not
       to enable PAM support)....

So if I set core file size to 0 and /etc/security/limits.conf sets it to 1024, if UsePAM=1 and PropagateResourceLimits=ALL (the default for that setting), core file size will stay 0. If I set it to 2048 and UsePAM=1, then Slurm will reduce that limit to 1024.
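
To make the interplay concrete, the two pieces involved would look roughly like this (a hedged sketch; the contents of the PAM stack are site-specific and the values are illustrative):

```
# /etc/security/limits.conf -- hard ceiling enforced via PAM (example value)
*    hard    core    1024

# /etc/pam.d/slurm -- hypothetical minimal service file; pam_limits
# applies the ceilings above when Slurm invokes the "slurm" PAM service
session    required    pam_limits.so
```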

Note that setting UsePAM=1 alone isn't enough. You also need to configure a PAM service named slurm (i.e., /etc/pam.d/slurm), as Ryan pointed out.

Prentice

Ole Holm Nielsen

Apr 29, 2021, 3:31:54 PM
to slurm...@lists.schedmd.com
On 29-04-2021 18:54, Ryan Novosielski wrote:
> It may not for specifically PropagateResourceLimits – as I said, the docs are a little sparse on the “how” this actually works – but you’re not correct that PAM doesn’t come into play re: user jobs. If you have “UsePam = 1” set, and have an /etc/pam.d/slurm, as our site does, there is some amount of interaction here, and PAM definitely affects user jobs.

The "UsePAM" parameter seems to be discouraged in slurm.conf; see Tim
Wickberg's reply here: https://bugs.schedmd.com/show_bug.cgi?id=4098#c3

On compute nodes one should probably use pam_slurm_adopt instead.

I've collected some additional information in
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#pam-module-restrictions

/Ole
