Hi all,
When I try to start slurmd, it fails because it cannot find the cgroup/v2 plugin. Any suggestions on where to start troubleshooting?
x8000c0s0b0n0:~ # slurmd -V
slurm 24.11.0
x8000c0s0b0n0:~ # slurmd -D -vvv
slurmd: debug: Log file re-opened
slurmd: debug2: hwloc_topology_init
slurmd: debug2: hwloc_topology_load
slurmd: debug2: hwloc_topology_export_xml
slurmd: debug: CPUs:288 Boards:1 Sockets:4 CoresPerSocket:72 ThreadsPerCore:1
slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
slurmd: error: cannot find cgroup plugin for cgroup/v2
slurmd: error: cannot create cgroup context for cgroup/v2
slurmd: error: Unable to initialize cgroup plugin
slurmd: error: slurmd initialization failed
x8000c0s0b0n0:~ # mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
x8000c0s0b0n0:~ # grep cgroup /etc/slurm/slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
Thanks!
Jordan
On further inspection I found:
slurmd: debug3: Trying to load plugin /usr/lib64/slurm/cgroup_v2.so
slurmd: debug4: /usr/lib64/slurm/cgroup_v2.so: Does not exist or not a regular file.
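A quick way to see which cgroup plugins the install actually ships is to look for the plugin files directly. This is only a sketch: the helper name list_cgroup_plugins is made up for illustration, and it assumes the plugin directory is /usr/lib64/slurm (taken from the debug3 line above) and that the v1 plugin file is named cgroup_v1.so.

```shell
#!/bin/sh
# Hypothetical helper: report which Slurm cgroup plugin files exist in a directory.
# The default directory is the one slurmd printed in its debug3 output.
list_cgroup_plugins() {
    dir="${1:-/usr/lib64/slurm}"
    for name in cgroup_v1 cgroup_v2; do
        if [ -f "$dir/$name.so" ]; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}

list_cgroup_plugins /usr/lib64/slurm
```

If cgroup_v2 shows as missing while cgroup_v1 is present, the build or package most likely did not include the v2 plugin, which would match the error above.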
The cgroup_v2.so plugin file did not exist on this node, so I created a cgroup.conf file that selects the cgroup/v1 plugin instead:
x8000c0s0b0n0:/etc/slurm # cat cgroup.conf
CgroupPlugin=cgroup/v1
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedRAMSpace=95
Then I mounted the cgroup v1 freezer hierarchy by hand:
mkdir -p /sys/fs/cgroup/freezer
mount -t cgroup -o freezer cgroup /sys/fs/cgroup/freezer
Now slurmd starts.
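Since a manual mount like this does not survive a reboot, it may be worth checking for the freezer hierarchy before starting slurmd. The following is a minimal sketch, not anything Slurm provides: the function name freezer_mounted is hypothetical, and it assumes the usual /proc/mounts layout (filesystem type in field 3, mount options in field 4).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a cgroup v1 hierarchy with the freezer
# controller is mounted. Reads /proc/mounts unless another file is given.
freezer_mounted() {
    mounts="${1:-/proc/mounts}"
    # Match only v1 mounts (type "cgroup") whose options include "freezer".
    awk '$3 == "cgroup" && $4 ~ /(^|,)freezer(,|$)/ { found = 1 }
         END { exit !found }' "$mounts"
}

if freezer_mounted; then
    echo "freezer hierarchy is mounted"
else
    echo "freezer hierarchy is not mounted"
fi
```

If the check reports that the hierarchy is not mounted after a reboot, the mkdir and mount commands above would need to be run again, or added to whatever the site uses for boot-time setup.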
- Jordan
From: Webb, Jordan via slurm-users <slurm...@lists.schedmd.com>
Date: Monday, January 20, 2025 at 3:59 PM
To: slurm...@schedmd.com <slurm...@schedmd.com>
Subject: [EXTERNAL] [slurm-users] Slurmd cannot find cgroup plugin