[slurm-users] Cgroups not constraining memory & cores

Sean McGrath

Nov 8, 2022, 8:13:49 AM
to slurm...@lists.schedmd.com
Hi,

I can't get cgroups to constrain memory or cores. If anyone is able to point out what I am doing wrong I would be very grateful please.

Testing:

Request a core and 2G of memory, log into the allocated node and compile a binary that just allocates memory quickly:

$ salloc -n 1 --mem=2G
$ ssh $SLURM_NODELIST
$ cat stoopid-memory-overallocation.c
/*
 * Sometimes you need to over allocate the memory available to you.
 * This does so splendidly. I just hope you have limits set to kill it!
 */

#include <stdlib.h>
#include <string.h>

int main(void)
{
    while (1)
    {
        void *m = malloc(1024 * 1024);  /* grab 1 MiB at a time */
        memset(m, 0, 1024 * 1024);      /* touch it so the pages are really committed */
    }
    return 0;
}
$ gcc -o stoopid-memory-overallocation.x stoopid-memory-overallocation.c
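
One extra sanity check (not part of the original test; it assumes the cgroup v1 layout that Ubuntu 20.04 uses by default) is to confirm which cgroup the shell reached via ssh actually landed in:

$ grep -E 'memory|cpuset' /proc/self/cgroup

If those lines show a path like /slurm/uid_<uid>/job_<jobid>/..., the shell is inside the job's cgroup; if they show / or a systemd user slice, Slurm is not constraining it at all.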

Checking memory usage before as a baseline:

$ free -g
              total        used        free      shared  buff/cache   available
Mem:            251           1         246           0           3         248
Swap:             7           0           7

Launch the memory over-allocation, then check memory use again: 34G has been allocated, when I expected it to be constrained to 2G:

$ ./stoopid-memory-overallocation.x &
$ sleep 10 && free -g
              total        used        free      shared  buff/cache   available
Mem:            251          34         213           0           3         215
Swap:             7           0           7
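
Since free reports node-wide memory rather than the cgroup's view, it may also be worth reading the job's memory cgroup directly (a sketch assuming the cgroup v1 paths that appear later in this thread; substitute the actual job id):

$ cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_<jobid>/memory.limit_in_bytes
$ cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_<jobid>/memory.usage_in_bytes

With ConstrainRAMSpace in effect, the limit should be about 2 GiB for this allocation and the usage should never exceed it.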

Run a second copy of the process to check CPU constraints:

$ ./stoopid-memory-overallocation.x &

Checking with top, I can see that the two processes are running simultaneously, each at 100% CPU rather than sharing the single allocated core:

$ top
top - 13:04:44 up 13 days, 23:39, 2 users, load average: 0.63, 0.27, 0.11
Tasks: 525 total, 3 running, 522 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 5.5 sy, 0.0 ni, 93.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257404.1 total, 181300.3 free, 72588.6 used, 3515.2 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 183300.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 120978 smcgrat   20   0   57.6g  57.6g    968 R 100.0  22.9   0:22.63 stoopid-memory-
 120981 smcgrat   20   0   11.6g  11.6g    952 R 100.0   4.6   0:04.57 stoopid-memory-
...
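
A more direct way than eyeballing top (my addition, reusing the PIDs from the output above) is to query each process's CPU affinity:

$ taskset -cp 120978
$ taskset -cp 120981

or, equivalently:

$ grep Cpus_allowed_list /proc/120978/status /proc/120981/status

With ConstrainCores in effect, a one-core allocation should pin both processes to the same single core, so they would share it at roughly 50% each rather than both running at 100%.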

Is this actually a valid test case or am I doing something else wrong?

Thanks

Sean

Setup details:

Ubuntu 20.04.5 LTS (Focal Fossa).
slurm 21.08.8-2.
cgroup-tools version 0.41-10 installed.

The following was set in /etc/default/grub and update-grub run:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
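
To confirm those settings actually took effect after rebooting (an extra check, not from the original message), the running kernel command line and controller status can be inspected on the node:

$ cat /proc/cmdline
$ grep -E '^(memory|cpuset)' /proc/cgroups
$ mount -t cgroup

/proc/cgroups should list memory and cpuset as enabled, and both should appear among the mounted cgroup v1 hierarchies.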

Relevant parts of scontrol show conf

JobAcctGatherType = jobacct_gather/none
ProctrackType = proctrack/cgroup
TaskPlugin = task/cgroup
TaskPluginParam = (null type)


The contents of the full slurm.conf

ClusterName=neuro
SlurmctldHost=neuro01(192.168.49.254)
AuthType=auth/munge
CommunicationParameters=block_null_hash
CryptoType=crypto/munge
Epilog=/home/support/slurm/etc/slurm.epilog.local
EpilogSlurmctld=/home/support/slurm/etc/slurm.epilogslurmctld
JobRequeue=0
MaxJobCount=30000
MpiDefault=none
Prolog=/home/support/slurm/etc/prolog
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmUser=root
StateSaveLocation=/var/slurm_state/neuro
SwitchType=switch/none
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
RebootProgram=/sbin/reboot
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageHost=service01
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm.log
DefMemPerNode=257300
MaxMemPerNode=257300
NodeName=neuro-n01-mgt RealMemory=257300 Sockets=2 CoresPerSocket=16 State=UNKNOWN
NodeName=neuro-n02-mgt RealMemory=257300 Sockets=2 CoresPerSocket=16 State=UNKNOWN
PartitionName=compute Nodes=ALL Default=YES MaxTime=5760 State=UP Shared=YES


cgroup.conf file contents:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
TaskAffinity=no

Sean Maxwell

Nov 8, 2022, 8:34:14 AM
to Slurm User Community List
Hi Sean,

I don't see PrologFlags=Contain in your slurm.conf. It is one of the entries required to activate the cgroup containment: https://slurm.schedmd.com/cgroup.conf.html#OPT_/etc/slurm/slurm.conf
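
For reference, the relevant slurm.conf lines would then look something like this (a minimal sketch combining the entries already present in the posted config with the missing one, and requiring a restart of slurmctld and slurmd to take effect):

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
PrologFlags=Contain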

Best,

-Sean

Sean McGrath

Nov 11, 2022, 11:48:46 AM
to Slurm User Community List
Hi,

Many thanks for that pointer Sean. I had missed the PrologFlags=Contain setting so have added it to slurm.conf now.

I've also explicitly built slurm with pam support:

../configure --sysconfdir=/home/support/pkgs/slurm/etc --prefix=/home/support/pkgs/slurm/ubuntu_20.04/21.08.8-2 --localstatedir=/var/run/slurm --enable-pam && make && make install

It appears to me as if slurm tasks are launching within cgroups.

E.g. if I do

srun --mem=100 sleep 300&

And logging in to the node, I can see memory limits for the cgroups:

$ cat /sys/fs/cgroup/memory/slurm/uid_5446/memory.limit_in_bytes
9223372036854771712
$ cat /sys/fs/cgroup/memory/slurm/uid_5446/job_24/memory.limit_in_bytes
269907656704
$ cat /sys/fs/cgroup/memory/slurm/uid_5446/job_24/step_0/memory.limit_in_bytes
269907656704
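
For scale (my arithmetic, not part of the original message): 9223372036854771712 bytes is roughly 2^63, i.e. effectively unlimited, and 269907656704 bytes is about 251 GiB, which is the node's full 257404 MiB of RAM rather than the 100 MB requested with --mem. A quick way to convert such values, assuming GNU coreutils is installed:

$ numfmt --to=iec-i 9223372036854771712 269907656704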

But if I run the same over-allocation test, it is still allowed:

srun --mem=100 stoopid-memory-overallocation.x

More memory is being allocated on the node than the job should be allowed.
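
One further check that might narrow this down (my suggestion, reusing the cgroup v1 paths shown above; the job and step numbers will differ for a new srun): confirm that the PID started by srun is actually listed in the step's cgroup, and see what limit applies there:

$ cat /sys/fs/cgroup/memory/slurm/uid_5446/job_<jobid>/step_0/cgroup.procs
$ cat /sys/fs/cgroup/memory/slurm/uid_5446/job_<jobid>/step_0/memory.limit_in_bytes

If the PID is listed, the process is being contained; the remaining question is why the limit is the full node memory rather than the requested --mem value.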

I'm clearly doing something wrong here. Can anyone point out what it is please? Am I just using the wrong test methodology?

Thanks in advance

Sean

Sean Maxwell

Nov 11, 2022, 12:12:57 PM
to Slurm User Community List
Hi Sean,

A couple of ideas:

1) In your original cgroup.conf you have "TaskAffinity=no", but I'm not aware of that parameter for cgroup.conf and cannot find it documented. You may want to remove it.
2) Also in cgroup.conf, you may want to try adding "ConstrainSwapSpace=yes" so that the process cannot use swap in addition to RAM (see the sketch below).
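
The resulting cgroup.conf would then look roughly like this (a sketch of the two suggestions applied to the file posted earlier, not a tested configuration):

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes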

Hopefully one of those will resolve the issue so that the jobs end with OOM.

Best,

-Sean