[slurm-users] MaxMemPerCPU not enforced?


Angel de Vicente

Jul 24, 2023, 7:21:34 AM
to Slurm User Community List
Hello,

I'm trying to get Slurm to control the memory used per CPU, but it does
not seem to enforce the MaxMemPerCPU option set in slurm.conf.

This is running on Ubuntu 22.04 (cgroup v2), with Slurm 23.02.3.

Relevant configuration options:

,----cgroup.conf
| AllowedRAMSpace=100
| ConstrainCores=yes
| ConstrainRAMSpace=yes
| ConstrainSwapSpace=yes
| AllowedSwapSpace=0
`----

,----slurm.conf
| TaskPlugin=task/affinity,task/cgroup
| PrologFlags=X11
|
| SelectType=select/cons_res
| SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
| MaxMemPerCPU=500
| DefMemPerCPU=200
|
| JobAcctGatherType=jobacct_gather/linux
|
| EnforcePartLimits=ALL
|
| NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1
|
| PartitionName=batch Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1
| PartitionName=interactive Nodes=duna State=UP Default=NO MaxTime=08:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:2
`----


I can request an interactive session with 4GB/CPU (I would have
thought that "EnforcePartLimits=ALL" would stop me from doing that),
and once I'm in the interactive session I can run a test code that
uses 3GB without any issue (I can see with htop that the process does
indeed use a RES size of 3GB at 100% CPU). Any idea what the problem
could be, or how to start debugging this?

,----
| [angelv@xxx test]$ sinter -n 1 --mem-per-cpu=4000
| salloc: Granted job allocation 127544
| salloc: Nodes xxx are ready for job
|
| (sinter) [angelv@xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
`----

Many thanks,
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
Web.: http://research.iac.es/proyecto/polmag/

GPG: 0x8BDC390B69033F52

Groner, Rob

Jul 24, 2023, 9:31:15 AM
to Slurm User Community List
I'm not sure I can help with the rest, but the EnforcePartLimits setting will only reject a job at submission time that exceeds partition limits, not overall cluster limits. I don't see anything, offhand, in the interactive partition definition that is exceeded by your request for 4 GB/CPU.
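
If you want to see which limits are attached to the partition itself (and would therefore be checked at submission time) versus the cluster-wide setting, something along these lines should show it, using the partition name from your slurm.conf:

,----
| $ scontrol show partition interactive
| $ scontrol show config | grep -i MemPerCPU
`----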

Rob




Matthew Brown

Jul 24, 2023, 10:21:59 AM
to Slurm User Community List
Slurm will allocate more CPUs to cover the memory requirement. Use sacct's query fields to compare requested vs. allocated resources (ReqTRES vs. AllocTRES):

$ scontrol show part normal_q | grep MaxMem
   DefMemPerCPU=1920 MaxMemPerCPU=1920

$ srun -n 1 --mem-per-cpu=4000 --partition=normal_q --account=arcadm hostname
srun: job 1577313 queued and waiting for resources
srun: job 1577313 has been allocated resources
tc095

$ sacct -j 1577313 -o jobid,reqtres%35,alloctres%35
       JobID                             ReqTRES                           AllocTRES
------------ ----------------------------------- -----------------------------------
1577313         billing=1,cpu=1,mem=4000M,node=1    billing=3,cpu=3,mem=4002M,node=1
1577313.ext+                                        billing=3,cpu=3,mem=4002M,node=1
1577313.0                                                     cpu=3,mem=4002M,node=1


From the Slurm manuals (e.g. man srun):

  --mem-per-cpu=<size>[units]
      Minimum memory required per allocated CPU. ... Note that if the job's --mem-per-cpu value exceeds the configured MaxMemPerCPU, then the user's limit will be treated as a memory limit per task
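
If I am reading that right, this is where the cpu=3, mem=4002M in the sacct output above comes from (my arithmetic below, not a quote from the manual):

,----
| requested:  1 CPU x 4000 MB,  MaxMemPerCPU = 1920 MB
| ceil(4000 / 1920) = 3 CPUs allocated
| ceil(4000 / 3)    = 1334 MB per CPU  ->  3 x 1334 MB = 4002 MB
`----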

Angel de Vicente

Jul 24, 2023, 5:53:34 PM
to Matthew Brown, Slurm User Community List
Hello,

Matthew Brown <brow...@vt.edu> writes:

> Minimum  memory required per allocated CPU. ... Note that if the job's
> --mem-per-cpu value exceeds the configured MaxMemPerCPU, then  the
> user's  limit  will be treated as a memory limit per task

Ah, thanks, I should've read the documentation more carefully.

From my limited tests today, somehow the interactive queue now seems
OK, but not the 'batch' queue. For example, I just submitted three
jobs with different numbers of CPUs per job (4, 8 and 16 processes
respectively). MaxMemPerCPU is set to 2GB, and these jobs run the
'stress' command, consuming 3GB per process.

,----
| [user@xxx test]$ squeue
| JOBID PARTITION NAME USER ST TIME TIME_LIMIT CPUS QOS ACCOUNT NODELIST(REASON)
| 127564 batch test user R 9:25 15:00 16 normal ddgroup xxx
| 127562 batch test user R 9:25 15:00 4 normal ddgroup xxx
| 127563 batch test user R 9:25 15:00 8 normal ddgroup xxx
`----
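
For reference, each of these jobs was submitted with a script roughly
like the following (a simplified sketch, not the literal script, so
the exact options may differ a bit; only the CPU count changes between
the three jobs):

,----
| #!/bin/bash
| #SBATCH --partition=batch
| #SBATCH --ntasks=1
| #SBATCH --cpus-per-task=16        # 4, 8 or 16
| #SBATCH --time=00:15:00
|
| stress -m $SLURM_CPUS_PER_TASK -t 600 --vm-keep --vm-bytes 3G
`----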


It looks like Slurm is trying to kill the jobs, but somehow not all
the processes die (as you can see below, 2 of the 4 processes in job
127562 are still there after 9 minutes, 3 of the 8 processes in job
127563, and 6 of the 16 processes in job 127564):

,----
| [user@xxx test]$ ps -fea | grep stress
| user 1853317 1853314 0 22:35 ? 00:00:00 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853319 1853317 66 22:35 ? 00:06:17 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853320 1853317 65 22:35 ? 00:06:11 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853321 1853317 65 22:35 ? 00:06:11 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853328 1853317 65 22:35 ? 00:06:12 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853329 1853317 65 22:35 ? 00:06:12 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
| user 1853338 1853337 0 22:35 ? 00:00:00 stress -m 8 -t 600 --vm-keep --vm-bytes 3G
| user 1853340 1853338 68 22:35 ? 00:06:32 stress -m 8 -t 600 --vm-keep --vm-bytes 3G
| user 1853341 1853338 69 22:35 ? 00:06:34 stress -m 8 -t 600 --vm-keep --vm-bytes 3G
| user 1853347 1853316 0 22:35 ? 00:00:00 stress -m 4 -t 600 --vm-keep --vm-bytes 3G
| user 1853350 1853347 68 22:35 ? 00:06:29 stress -m 4 -t 600 --vm-keep --vm-bytes 3G
| user 1854560 1511070 0 22:45 pts/2 00:00:00 grep stress
`----

And these processes are indeed using 3GB each:

,----
| [user@xxx test]$ ps -v 1853319
| PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
| 1853319 ? R 6:25 8642 11 3149428 3146040 1.1 stress -m 16 -t 600 --vm-keep --vm-bytes 3G
`----

Any idea how to solve/debug this?

Angel de Vicente

Jul 25, 2023, 4:13:21 AM
to Slurm User Community List
Hello,

Angel de Vicente <angel.de...@iac.es> writes:

> From my limited tests today, somehow the interactive queue now seems
> OK, but not the 'batch' queue. For example, I just submitted three
> jobs with different numbers of CPUs per job (4, 8 and 16 processes
> respectively). MaxMemPerCPU is set to 2GB, and these jobs run the
> 'stress' command, consuming 3GB per process.

OK, this was again me reading the documentation too quickly: this is
the expected behaviour, as per the NOTE for ConstrainRAMSpace in
cgroup.conf.

[I was led off track by the fact that it behaves differently in
interactive and batch modes: in the interactive tests all the
processes were killed, while in batch mode only some of them got the
axe.]
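
In case it is useful to anyone else: a rough way to double-check that
the per-step cgroup limit is actually in place (just a sketch, the
exact cgroup path depends on the local layout, so the placeholders
below need filling in) is something like:

,----
| # cgroup v2: find the cgroup of one of the surviving stress processes
| $ cat /proc/<pid>/cgroup
| 0::/<path-to-step-cgroup>
|
| # and the memory limit that slurmstepd set for it
| $ cat /sys/fs/cgroup/<path-to-step-cgroup>/memory.max
`----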

Sorry for the noise,