[slurm-users] Fwd: task/cgroup plugin causes "srun: error: task 0 launch failed: Plugin initialization failed" error on Ubuntu 22.04


Tim Schneider

Jun 15, 2023, 6:04:47 PM
to slurm...@lists.schedmd.com
Hi,

I maintain the Slurm cluster of my research group. I recently upgraded to Ubuntu 22.04 and Slurm 21.08.5, and since then I have been unable to launch jobs. When launching a job, I receive the following error:

$ srun --nodes=1 --ntasks-per-node=1 -c 1 --mem-per-cpu 1G --time=01:00:00 --pty -p amd -w cn02 --pty bash -i
srun: error: task 0 launch failed: Plugin initialization failed

Strangely, I cannot find any indication of this problem in the logs (attached). The problem must be related to the task/cgroup plugin, as it does not occur when I disable that plugin.

After reading the documentation, I tried adding the cgroup_enable=memory swapaccount=1 kernel parameters, but the problem persisted.
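
For anyone retracing this step: kernel parameters only take effect once the bootloader configuration has been regenerated and the node rebooted, so it can be worth confirming that they reached the running kernel. The following is a minimal sketch, not something from the original post. The helper name `has_kernel_param` and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE_FILE]   (hypothetical helper)
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the override is an assumption
# added here only so the function can be tried on a sample file.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then match the whole line literally.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Report on the two parameters mentioned above.
for p in cgroup_enable=memory swapaccount=1; do
    if has_kernel_param "$p"; then
        echo "$p: present on the running kernel command line"
    else
        echo "$p: not present on the running kernel command line"
    fi
done
```

As the rest of the thread shows, these parameters were not the cause here, so a "present" result does not rule out the plugin error.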

I would be very grateful for any advice on where to look, since I have no idea how to investigate this issue further.

Thanks a lot in advance.

Best,

Tim


cgroup.conf
slurmd.log
slurmctld.log

Reed Dier

Jun 15, 2023, 8:12:37 PM
to Slurm User Community List
I don’t have any direct advice off-hand, but I figure I will try to help steer the conversation in the right direction for figuring it out.

Since you mention 21.08.5, I’m going to assume that you are using the slurm-wlm packages from the Ubuntu repos rather than building Slurm yourself?

And have all the components (slurmctld(s), slurmdbd, slurmd(s)) been upgraded as well?

The only thing that immediately comes to mind is that I remember reading a good bit about Ubuntu 22.04’s use of cgroups v2. As I understand it, cgroups v2 is very different from cgroups v1, and plenty of people have had issues with v1/v2 mismatches in Slurm and other applications.
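
A quick way to see which hierarchy a node is running is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU `stat`, which reports `cgroup2fs` for a pure cgroups v2 mount and typically `tmpfs` when v1 controllers are mounted underneath. The function name `cgroup_mode_from_fstype` and its output labels are illustrative, not part of any tool.

```shell
#!/bin/sh
# cgroup_mode_from_fstype FSTYPE   (hypothetical helper)
# Maps the filesystem type of /sys/fs/cgroup to a short label.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # v1 controllers mounted below
        *)         echo "unknown" ;;
    esac
}

# Ask GNU stat for the filesystem type; fall back to "unknown" on failure.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "cgroup mode: $(cgroup_mode_from_fstype "$fstype")"
```

This only describes the root mount. Individual v1 controllers can still be mounted below a v2 root, so the full mount list remains the more reliable source.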


Hope that at least steers the conversation in a good direction.

Reed

abel pinto

Jun 15, 2023, 9:28:58 PM
to Slurm User Community List
Indeed, the issue seems to be that Ubuntu 22.04 no longer uses cgroups v1 by default. Does Slurm support cgroups v2? It seems so: https://slurm.schedmd.com/cgroup_v2.html

/Abel


Tim Schneider

Jun 17, 2023, 4:08:56 PM
to slurm...@lists.schedmd.com

Hi,

I just want to wrap this up in case someone has the same issue in the future.

As Reed pointed out, Ubuntu 22.04 uses cgroups v2 by default rather than cgroups v1. At the same time, the slurm-wlm package in the Ubuntu repositories (Slurm 21.08.5) uses cgroups v1, which makes its task/cgroup plugin incompatible with the default Ubuntu 22.04 setup.

My solution was to build Slurm 22.05 manually, while ensuring that libdbus-1-dev is installed (as otherwise cgroups v2 support does not get built). This takes a bit more time but seems to work so far.
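
For readers who want to repeat this, the steps could look roughly like the sketch below. It is not the exact procedure used here: the helper names `build_slurm` and `has_cgroup_v2_plugin`, the example paths, and the assumption that plugins end up in `PREFIX/lib/slurm` with the v2 plugin named `cgroup_v2.so` are illustrative and should be checked against your own build.

```shell
#!/bin/sh
# build_slurm SRC_DIR PREFIX   (hypothetical helper; never called automatically)
# Refuses to build unless libdbus-1-dev is installed, because otherwise
# cgroups v2 support does not get built.
build_slurm() {
    src=$1
    prefix=$2
    if ! dpkg -s libdbus-1-dev >/dev/null 2>&1; then
        echo "libdbus-1-dev is missing; install it before running configure" >&2
        return 1
    fi
    # Run the usual autotools steps in a subshell to keep the caller's cwd.
    ( cd "$src" && ./configure --prefix="$prefix" \
        && make -j"$(nproc)" && make install )
}

# has_cgroup_v2_plugin PLUGIN_DIR   (hypothetical helper)
# Succeeds if the cgroups v2 plugin file exists in PLUGIN_DIR.
# Assumption: the plugin file is called cgroup_v2.so.
has_cgroup_v2_plugin() {
    [ -f "$1/cgroup_v2.so" ]
}
```

With an unpacked source tree, usage might be `build_slurm ./slurm-src /opt/slurm && has_cgroup_v2_plugin /opt/slurm/lib/slurm`, where both paths are placeholders. If the second check fails after a successful build, a missing D-Bus development package at configure time is the first thing to rule out.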

Thanks a lot Reed & Abel for your advice!

Best,

Tim

On 6/16/23 10:42, Tim Schneider wrote:

Hi again,

I just realized that the author of https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1 wrote at some point that he built Slurm 22 instead of using the Ubuntu repo version. So I guess I will have to look into that.

Best,

Tim

On 6/16/23 10:36, Tim Schneider wrote:

Hi Abel and Reed,

thanks a lot for your quick replies!

I did indeed just install slurm-wlm from the Ubuntu repos.

Following the advice of https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1, I tried disabling cgroups v1 on Ubuntu, but that just leads to an error during startup of slurmd:

slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/proctrack_cgroup.so
slurmd: error: unable to mount freezer cgroup namespace: Invalid argument
slurmd: error: unable to create freezer cgroup namespace
slurmd: error: Couldn't load specified plugin name for proctrack/cgroup: Plugin init() callback failed
slurmd: error: cannot create proctrack context for proctrack/cgroup
slurmd: error: slurmd initialization failed

So it seems that slurmd is using cgroups v1. This is also reflected in the mounts (for the output below, cgroups v1 is enabled again):

$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
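
The same information can be pulled out of /proc/mounts in a more compact form, which makes a mixed setup like the one above (a v2 root plus a v1 freezer controller) easy to spot. This is a small sketch. The function name `list_cgroup_mounts` and its optional file argument are assumptions made for illustration.

```shell
#!/bin/sh
# list_cgroup_mounts [MOUNTS_FILE]   (hypothetical helper)
# Prints "<fstype> <mountpoint>" for every cgroup (v1) and cgroup2 (v2)
# mount. MOUNTS_FILE defaults to /proc/mounts, whose fields are:
# device, mountpoint, fstype, options, dump, pass.
list_cgroup_mounts() {
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $3, $2 }' "${1:-/proc/mounts}"
}

# Only query the live system if /proc/mounts is readable.
[ -r /proc/mounts ] && list_cgroup_mounts
```

Lines starting with `cgroup ` are v1 controllers, and a line starting with `cgroup2 ` is the unified v2 hierarchy.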

What is still confusing to me is that the slurmd logs indicate no error when I try running with cgroups v1 enabled and the error only appears on the slurmctld side.

Do you know how I can enable cgroups v2 in Slurm? To me it seems that this is what the author of https://groups.google.com/g/slurm-users/c/0dJhe5r6_2Q?pli=1 did.

Best,

Tim

