[slurm-users] Strange memory limit behavior with --mem-per-gpu


Paul Raines

Apr 6, 2022, 3:30:42 PM
to slurm...@lists.schedmd.com

I have a user who submitted an interactive srun job using:

srun --mem-per-gpu 64 --gpus 1 --nodes 1 ....

From sacct for this job we see:

ReqTRES : billing=4,cpu=1,gres/gpu=1,mem=10G,node=1
AllocTRES : billing=4,cpu=1,gres/gpu=1,mem=64M,node=1

(where 10G I assume comes from the DefMemPerCPU=10240 set in slurm.conf)
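A quick way to confirm where that default comes from is to grep the
running configuration; this should also show MaxMemPerCPU and any
DefMemPerGPU if they are set:

$ scontrol show config | grep -i MemPer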

Now I think the user here made a mistake: 64M should be way too
little for the job, but it is running fine. They may have forgotten
the 'G' and meant to use 64G.
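For what it's worth, a bare number passed to --mem-per-gpu is taken as
megabytes, which matches the mem=64M in AllocTRES. The same sacct fields
show how the request was recorded (the width suffixes are only there to
keep the TRES strings from being truncated):

$ sacct -j 1120342 --format=JobID,ReqTRES%60,AllocTRES%60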

The user submitted two jobs just like this, and both are running on the
same node where I see:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5496 nms88 20 0 521.1g 453.2g 175852 S 100.0 30.0 1110:37 python
5555 nms88 20 0 484.7g 413.3g 182456 S 93.8 27.4 1065:22 python

and if I cd to /sys/fs/cgroup/memory/slurm/uid_5143603/job_1120342
for one of the jobs I see:

# cat memory.limit_in_bytes
1621429846016
# cat memory.usage_in_bytes
744443580416

(the node itself has 1.5TB of RAM total)
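Converting those values with coreutils numfmt (assuming it is installed)
confirms the limit is roughly the full 1.5T of the node, with around
690G currently in use:

$ numfmt --to=iec 1621429846016 744443580416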

So my question is: why did Slurm end up running the job this way? Why
was the cgroup limit not 64MB, which would have made the job fail
with OOM pretty quickly?

On someone else's job submitted with

srun -N 1 --ntasks-per-node=1 --gpus=1 --mem=128G --cpus-per-task=3 ...

on the node in the memory cgroup I see the expected

# cat memory.limit_in_bytes
137438953472

But I worry it could fail, since those other two jobs are essentially
consuming all the memory on the node.
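To see everything packed onto that node and the memory Slurm thinks each
job reserved, something like the following should work (<nodename> is a
placeholder, and the %m column may be empty or 0 for jobs submitted with
--mem-per-gpu):

$ squeue -w <nodename> -o "%.12i %.10u %.10m %R"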

---------------------------------------------------------------
Paul Raines http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129 USA





Paul Raines

Apr 7, 2022, 9:56:51 AM
to slurm...@lists.schedmd.com

Basically, it appears using --mem-per-gpu instead of just --mem gives
you unlimited memory for your job.

$ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00
--ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G
--mail-type=FAIL --pty /bin/bash
rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
/sys/fs/cgroup/memory/slurm/uid_5829/job_1134067
rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067/memory.limit_in_bytes
1621419360256

That is a limit of 1.5TB, which is all the memory on rtx-07, not the
8G I effectively asked for (1 GPU at 8G per GPU).

Using --mem works as expected:

$ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00
--ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem=8G --mail-type=FAIL
--pty /bin/bash
rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
/sys/fs/cgroup/memory/slurm/uid_5829/job_1134068
rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134068/memory.limit_in_bytes
8589934592
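For comparison, the accounting records for the two test jobs should show
the difference in how the request was stored, using the same fields as in
my first mail:

$ sacct -j 1134067,1134068 --format=JobID,ReqTRES%60,AllocTRES%60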

Bjørn-Helge Mevik

Apr 8, 2022, 4:02:57 AM
to slurm...@schedmd.com
Paul Raines <rai...@nmr.mgh.harvard.edu> writes:

> Basically, it appears using --mem-per-gpu instead of just --mem gives
> you unlimited memory for your job.
>
> $ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00
> --ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G
> --mail-type=FAIL --pty /bin/bash
> rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
> /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067
> rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067/memory.limit_in_bytes
> 1621419360256
>
> That is a limit of 1.5TB which is all the memory on rtx-07, not
> the 8G I effectively asked for at 1 GPU and 8G per GPU.

Which version of Slurm is this? We noticed a behaviour similar to this
on Slurm 20.11.8, but when we tested it on 21.08.1, we couldn't
reproduce it. (We also noticed an issue with --gpus-per-task that
appears to have been fixed in 21.08.)
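
If you are not sure which Slurm version the cluster is actually running,
either of these should tell you:

$ sinfo --version
$ scontrol version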

--
B/H

Paul Raines

Apr 8, 2022, 8:49:00 AM
to Bjørn-Helge Mevik, slurm...@schedmd.com

Sorry, I should have stated that before. I am running Slurm 20.11.3,
which I compiled myself back in June 2021, on CentOS 8 Stream.

I will try to arrange an upgrade in the next few weeks.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)