[slurm-users] Enforce gpu usage limits (with GRES?)

Analabha Roy

Feb 1, 2023, 12:13:20 PM
to slurm...@lists.schedmd.com
Hi,

I'm new to slurm, so I apologize in advance if my question seems basic.

I just purchased a single-node 'cluster' consisting of a 64-core CPU and an NVIDIA RTX5k GPU (Turing architecture, I think). The vendor supplied it with Ubuntu 20.04 and slurm-wlm 19.05.5. Now I'm trying to adjust the configuration to suit the needs of my department.

I'm trying to bone up on GRES scheduling by reading the GRES manual page, but am confused about a few things.

My slurm.conf file has the following lines put in it by the vendor:

###################
# COMPUTE NODES
GresTypes=gpu
NodeName=shavak-DIT400TR-55L CPUs=64 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=95311 Gres=gpu:1
#PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

PartitionName=CPU Nodes=ALL Default=Yes MaxTime=INFINITE  State=UP

PartitionName=GPU Nodes=ALL Default=NO MaxTime=INFINITE  State=UP
#####################

So they created two partitions that are essentially identical. They also put just the following line in gres.conf:

###################
NodeName=shavak-DIT400TR-55L      Name=gpu        File=/dev/nvidia0
###################

That's all. However, this configuration does not appear to constrain anyone in any manner. As a regular user, I can still use srun or sbatch to start GPU jobs from the "CPU" partition, and nvidia-smi shows that a simple CuPy script that multiplies matrices, submitted with sbatch to the CPU partition, can access the GPU just fine. Note that the environment variable CUDA_VISIBLE_DEVICES does not appear to be set in any job step. I tested this by starting an interactive srun shell in both the CPU and GPU partitions and running "echo $CUDA_VISIBLE_DEVICES", and got bupkis for both.
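Roughly, the test looked like this (a sketch; CPU and GPU are the partition names from the slurm.conf above):

srun -p CPU --pty bash
echo $CUDA_VISIBLE_DEVICES    # prints nothing
exit
srun -p GPU --pty bash
echo $CUDA_VISIBLE_DEVICES    # prints nothing here either
exit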


What I need is to constrain jobs to chunks of the GPU's cores/RAM so that multiple jobs can share the GPU.

As I understand from the gres manpage, simply adding "AutoDetect=nvml" (NVML should come with the NVIDIA HPC SDK, right? I installed it with apt-get...) to gres.conf should allow Slurm to detect the GPU's internal specifications automatically. Is that all, or do I need to configure an mps GRES as well? Will that be enough to shut the GPU off from jobs that don't request any gres (perhaps by setting CUDA_VISIBLE_DEVICES), or is additional configuration needed for that? Do I really need the extra "GPU" partition that the vendor put in for any of this, or is there a way to bind GRES resources to a particular partition so that simply launching jobs in that partition is enough?
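In other words, would something like the following gres.conf be enough (just a sketch based on my reading of the manpage; the mps line is a guess on my part)?

==> /etc/slurm/gres.conf <==
AutoDetect=nvml
# or keep the explicit line the vendor put in:
# NodeName=shavak-DIT400TR-55L Name=gpu File=/dev/nvidia0
# and maybe an MPS gres for sharing the card between jobs (guess):
# NodeName=shavak-DIT400TR-55L Name=mps Count=100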

Thanks for your attention.
Regards
AR

--
Analabha Roy
Assistant Professor
Golapbag Campus, Barddhaman 713104
West Bengal, India

Holtgrewe, Manuel

Feb 2, 2023, 6:20:21 AM
to slurm...@lists.schedmd.com

Hi,


If by "share the GPU" you mean exclusive allocation to a single job, then I believe you are missing the cgroup configuration for isolating access to the GPU.


Below the relevant parts (I believe) of our configuration.


There is also a way to time- and space-slice GPUs, but I would suggest getting things set up without slicing first.


I hope this helps.


Manuel


==> /etc/slurm/cgroup.conf <==
# https://bugs.schedmd.com/show_bug.cgi?id=3701
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"

==> /etc/slurm/cgroup_allowed_devices_file.conf <==
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/dev/nvidia*

==> /etc/slurm/slurm.conf <==

ProctrackType=proctrack/cgroup

# Memory is enforced via cgroups, so we should not do this here; see [*]
#
# /etc/slurm/cgroup.conf: ConstrainRAMSpace=yes
#
# [*] https://bugs.schedmd.com/show_bug.cgi?id=5262
JobAcctGatherParams=NoOverMemoryKill

TaskPlugin=task/cgroup

JobAcctGatherType=jobacct_gather/cgroup
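
A job then has to request the GPU explicitly to see the device, e.g. (a minimal example, assuming the GPU partition and gpu gres from your slurm.conf):

srun -p GPU --gres=gpu:1 nvidia-smi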


--
Dr. Manuel Holtgrewe, Dipl.-Inform.
Bioinformatician
Core Unit Bioinformatics – CUBI
Berlin Institute of Health / Max Delbrück Center for Molecular Medicine in the Helmholtz Association / Charité – Universitätsmedizin Berlin

Visiting Address: Invalidenstr. 80, 3rd Floor, Room 03 028, 10117 Berlin
Postal Address: Chariteplatz 1, 10117 Berlin

E-Mail: manuel.h...@bihealth.de
Phone: +49 30 450 543 607
Fax: +49 30 450 7 543 901
Web: cubi.bihealth.org  www.bihealth.org  www.mdc-berlin.de  www.charite.de


Analabha Roy

Feb 2, 2023, 12:52:40 PM
to Slurm User Community List
Hi,

Thanks for the reply. Yes, your advice helped! Much obliged. Not only was the cgroup configuration necessary, but the option

ConstrainDevices=yes

in cgroup.conf was also needed to enforce the GPU GRES. Now, GPU jobs that don't pass a gres parameter to srun fail. An improvement!
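
For reference, my cgroup.conf now looks roughly like this (the lines from your excerpt plus that extra one):

==> /etc/slurm/cgroup.conf <==
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
ConstrainDevices=yes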

However, I still can't keep GPU jobs out of the "CPU" partition. Is there a way to link a partition to a GRES or something?

Alternatively, can I define two node names in slurm.conf that point to the same physical node, with only one of them carrying the gpu GRES? That way I could link the GPU partition to the GRES-configured node name only.

Thanks in advance,
AR

PS: If the Slurm devs are reading this, may I suggest adding a reference to cgroups on the GRES documentation page?

Markus Kötter

Feb 3, 2023, 3:05:56 AM
to slurm...@lists.schedmd.com
Hi,


limits ain't easy.

> https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmLimits.html#precedence


I think there are multiple options, starting with not having GPU resources in the CPU partition.

Or creating a QOS with

MaxTRES=gres/gpu:A100=0,gres/gpu:K80=0,gres/gpu=0

and attaching it to the CPU partition.

And the configuration will need some additional settings as well:

# slurm.conf
AccountingStorageEnforce=associations,limits,qos,safe
AccountingStorageTRES=gres/gpu,gres/gpu:A100,gres/gpu:K80

# cgroup.conf
ConstrainDevices=yes

Most likely there are some others I'm missing.
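
A sketch of the sacctmgr side (the QOS name "nogpu" is arbitrary; adjust the TRES names to the GPU types you actually have):

sacctmgr add qos nogpu
sacctmgr modify qos nogpu set MaxTRES=gres/gpu=0

and then attach it to the partition in slurm.conf with QOS=nogpu on the PartitionName=CPU line.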


MfG
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security

Analabha Roy

Feb 4, 2023, 5:08:46 AM
to Slurm User Community List

Hi,

Thanks, your advice worked. I used sacctmgr to create a QOS called 'nogpu' and set MaxTRES=gres/gpu=0, then attached it to the CPU partition in slurm.conf as

PartitionName=CPU Nodes=ALL Default=Yes QOS=nogpu MaxTime=INFINITE  State=UP

And it works! Trying to run GPU jobs in the CPU partition now fails. QOSes are nice!
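
For reference, the QOS and its attachment to the partition can be checked with something like:

sacctmgr show qos nogpu format=Name,Priority,MaxTRES
scontrol show partition CPU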

The only thing is that the nogpu QOS has a priority of 0. Should it be higher?


AR