[slurm-users] Reserving resources for use by non-slurm stuff

Shooktija S N via slurm-users

Apr 17, 2024, 7:45:00 AM
to slurm...@lists.schedmd.com
Hi, I am running Slurm (v22.05.8) on 3 nodes each with the following specs:
OS: Proxmox VE 8.1.4 x86_64 (based on Debian 12)
CPU: AMD EPYC 7662 (128)
GPU: NVIDIA GeForce RTX 4070 Ti
Memory: 128 GB

This is /etc/slurm/slurm.conf on all 3 computers without the comment lines:
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=debug3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP

I want to reserve a few cores and a few gigabytes of RAM for use only by the OS, so that they cannot be used by jobs managed by Slurm. What configuration do I need to achieve this?

Is it possible to similarly reserve a 'percentage' of the GPU that Slurm cannot exceed, so that the OS has some GPU resources?

Is it possible to have these configs be different for each of the 3 nodes?

Thanks!

Sean Maxwell via slurm-users

Apr 17, 2024, 10:04:48 AM
to Shooktija S N, slurm...@lists.schedmd.com
Hi Shooktija,

On Wed, Apr 17, 2024 at 7:45 AM Shooktija S N via slurm-users <slurm...@lists.schedmd.com> wrote:
> NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
> PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>
> I want to reserve a few cores and a few gigabytes of RAM for use only by the OS, so that they cannot be used by jobs managed by Slurm. What configuration do I need to achieve this?

You can set aside CPUs and memory for the OS with the CpuSpecList (or CoreSpecCount) and MemSpecLimit parameters on the node definition in slurm.conf: https://slurm.schedmd.com/slurm.conf.html

> Is it possible to similarly reserve a 'percentage' of the GPU that Slurm cannot exceed, so that the OS has some GPU resources?

Not that I know of.

> Is it possible to have these configs be different for each of the 3 nodes?

Yes. You will need to define the nodes with 3 separate NodeName definitions instead of one definition covering all 3.
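
As a rough sketch (the CoreSpecCount/MemSpecLimit numbers below are made up purely for illustration), the per-node definitions could look something like:

NodeName=server1 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 State=UNKNOWN Gres=gpu:1
NodeName=server2 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=2 MemSpecLimit=4096 State=UNKNOWN Gres=gpu:1
NodeName=server3 RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=2 MemSpecLimit=4096 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Each NodeName line can then carry its own reservation values.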
 
Best,

-Sean

Paul Raines via slurm-users

Apr 17, 2024, 4:10:57 PM
to slurm-users

On a single Rocky8 workstation with one GPU, where we wanted interactive
ssh logins to have a small portion of its resources (shell, compiling,
simple data manipulation, console desktop, etc.) and the rest to go to
SLURM, we did this:

- Set it to use cgroupv2
* modify /etc/default/grub to add systemd.unified_cgroup_hierarchy=1
to GRUB_CMDLINE_LINUX. Remake grub with grub2-mkconfig (a rough sketch
of these commands follows this block)
* create file /usr/etc/cgroup_cpuset_init with the lines

#!/bin/bash
echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control

* Modify/create /etc/systemd/system/slurmd.service.d/override.conf
so it has:

[Service]
ExecStartPre=-/usr/etc/cgroup_cpuset_init
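
A rough sketch of those setup commands (the grub.cfg path below is for a BIOS-boot Rocky8 box, EFI installs write it elsewhere, and the "..." stands for whatever options are already in GRUB_CMDLINE_LINUX):

# /etc/default/grub
GRUB_CMDLINE_LINUX="... systemd.unified_cgroup_hierarchy=1"

chmod +x /usr/etc/cgroup_cpuset_init
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl daemon-reload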

- figure out exactly which cores to keep for "free user" use and which
cores to give to SLURM. Also use GPU sharding in SLURM so the GPU can be shared.

* install hwloc-ls
* run 'hwloc-ls' to translate physical cores 0-9 to logical cores.
For me, physical 0-9 mapped to logical 0,2,4,6,8,10,12,14,16,18
* in /etc/slurm.conf the NodeName definition has

CPUs=128 Boards=1 SocketsPerBoard=1 CoresPerSocket=64 ThreadsPerCore=2 \
RealMemory=257267 MemSpecLimit=20480 \
CpuSpecList=0,2,4,6,8,10,12,14,16,18 \
TmpDisk=6000000 Gres=gpu:nvidia_a2:1,shard:nvidia_a2:32

reserving those 10 cores and 20 GB of RAM for "free user" use.

* gres.conf has the lines:

AutoDetect=nvml
Name=shard Count=32

* Need to add gres/shard to GresTypes= too. Job submissions use
the option --gres=shard:N where N is less than 32
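
For example (just a sketch; this assumes gpu is already listed in GresTypes and uses the shard count from gres.conf above):

GresTypes=gpu,shard

srun --gres=shard:4 --mem=8G -c 4 --pty bash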

- Set up systemd to restrict "free users" to cores 0-9 and the 20GB

* Run: systemctl set-property user.slice MemoryHigh=20480M
* Run for every individual user on the system

systemctl set-property user-$uid.slice AllowedCPUs=0-9

where $uid is that user's user ID. We do this in a script
that also runs sacctmgr to add them to the SLURM system (a rough
sketch of such a script is below).
I could not just set AllowedCPUs on user.slice itself, which is what I
first tried, because that restricted the root user too and caused
weird behavior with a lot of system tools. So far the root/daemon
processes work fine within the 20GB limit, so that
MemoryHigh=20480M setting is one and done.
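
A rough sketch of the kind of per-user script mentioned above (the script name and the Slurm account are placeholders, not our actual setup):

#!/bin/bash
# add-login-user.sh <username>  -- illustrative only
user="$1"
uid=$(id -u "$user") || exit 1
# confine the user's interactive sessions to the reserved cores
systemctl set-property user-"$uid".slice AllowedCPUs=0-9
# register the user in Slurm accounting (account name is a placeholder)
sacctmgr -i add user name="$user" account=general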

Then reboot.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)




