[slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)


Tim Wickberg

Oct 11, 2023, 4:02:30 PM
to slurm-a...@schedmd.com, slurm...@schedmd.com
Slurm versions 23.02.6 and 22.05.10 are now available to address a
number of filesystem race conditions that could let an attacker take
control of an arbitrary file, or remove entire directories' contents
(CVE-2023-41914).

SchedMD customers were informed on September 27th and provided a patch
on request; this process is documented in our security policy [1].

--------
CVE-2023-41914:

A number of race conditions have been identified within the
slurmd/slurmstepd processes that can lead to the user taking ownership
of an arbitrary file on the system. A related issue can lead to the user
overwriting an arbitrary file on the compute node (although with data
that is not directly under their control). A related issue can also lead
to the user deleting all files and sub-directories of an arbitrary
target directory on the compute node.

Thank you to François Diakhate (CEA) for reporting the original issue to
us. A number of related issues were found during an extensive audit of
Slurm's filesystem handling code in reaction to that report, and are
included here in this same disclosure.
--------
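The bugs described above are classic time-of-check/time-of-use (TOCTOU)
races: a privileged process resolves a user-controlled path more than
once, and the user swaps a symlink in between the resolutions. The C
sketch below illustrates the general pattern and the usual hardening
(open with O_NOFOLLOW, then operate on the file descriptor). It is not
SchedMD's actual code; the function names and the path argument are
hypothetical.

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* UNSAFE: the path is resolved by stat() and again by chown(). The
 * user can replace it with a symlink to any file in between, and
 * chown() will follow the link and hand that file over. */
int unsafe_give_file_to_user(const char *path, uid_t uid, gid_t gid)
{
    struct stat sb;
    if (stat(path, &sb) < 0)
        return -1;
    return chown(path, uid, gid);
}

/* SAFER: open the object once, refusing symlinks, then verify and
 * operate on the file descriptor so the path is never resolved a
 * second time. */
int safer_give_file_to_user(const char *path, uid_t uid, gid_t gid)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;                /* fails with ELOOP on a symlink */

    struct stat sb;
    if (fstat(fd, &sb) < 0 || !S_ISREG(sb.st_mode)) {
        close(fd);
        return -1;
    }

    int rc = fchown(fd, uid, gid);  /* acts on the fd, not the path */
    close(fd);
    return rc;
}

The directory-removal variant is typically hardened the same way: hold
an O_DIRECTORY descriptor for the target and use openat()/unlinkat()
relative to it rather than re-resolving full paths.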

SchedMD only issues security fixes for the supported releases (currently
23.02 and 22.05). Due to the complexity of these fixes, we do not
recommend attempting to backport the fixes to older releases, and
strongly encourage sites to upgrade to fixed versions immediately.

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

> * Changes in Slurm 23.02.6
> ==========================
> -- Fix CpusPerTres= not updatable with scontrol update.
> -- Fix unintentional gres removal when validating the gres job state.
> -- Fix --without-hpe-slingshot configure option.
> -- Fix cgroup v2 memory calculations when transparent huge pages are used.
> -- Fix parsing of sgather --timeout option.
> -- Fix regression from 22.05.0 that caused the srun --cpu-bind "=verbose"
> and "=v" options to give different CPU bind masks.
> -- Fix "_find_node_record: lookup failure for node" error message appearing
> for all dynamic nodes during reconfigure.
> -- Avoid segfault if loading serializer plugin fails.
> -- slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/licenses'.
> -- slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/job/{job_id}'.
> -- slurmrestd - Change format of multiple fields in 'GET
> /slurmdb/v0.0.39/associations' and 'GET /slurmdb/v0.0.39/qos' to handle
> infinite and unset states.
> -- When a node fails in a job with --no-kill, preserve the extern step on the
> remaining nodes to avoid breaking features that rely on the extern step
> such as pam_slurm_adopt, x11, and job_container/tmpfs.
> -- auth/jwt - Ignore 'x5c' field in JWKS files.
> -- auth/jwt - Treat 'alg' field as optional in JWKS files.
> -- Allow job_desc.selinux_context to be read from the job_submit.lua script.
> -- Skip check in slurmstepd that causes a large number of errors in the munge
> log: "Unauthorized credential for client UID=0 GID=0". This error will
> still appear on slurmd/slurmctld/slurmdbd start up and is not a cause for
> concern.
> -- slurmctld - Allow startup with zero partitions.
> -- Fix some MIG profile names in Slurm not matching NVIDIA MIG profiles.
> -- Prevent slurmscriptd processing delays from blocking other threads in
> slurmctld while trying to launch {Prolog|Epilog}Slurmctld.
> -- Fix sacct printing ReqMem field when memory doesn't exist in requested TRES.
> -- Fix how heterogeneous steps in an allocation with CR_PACK_NODE or -mpack
> are created.
> -- Fix slurmctld crash from race condition within job_submit_throttle plugin.
> -- Fix --with-systemdsystemunitdir when requesting a default location.
> -- Fix not being able to cancel an array task by the jobid (i.e. not
> <jobid>_<taskid>) through scancel, job launch failure or prolog failure.
> -- Fix cancelling the whole array job when the array task is the meta job
> and it fails job or prolog launch and is not requeueable. Cancel only
> the specific task instead.
> -- Fix regression in 21.08.2 where MailProg did not run for mail-type=end for
> jobs with non-zero exit codes.
> -- Fix incorrect setting of memory.swap.max in cgroup/v2.
> -- Fix jobacctgather/cgroup collection of disk/io, gpumem, gpuutil TRES values.
> -- Fix -d singleton for heterogeneous jobs.
> -- Downgrade info logs about a job meeting a "maximum node limit" in the
> select plugin to DebugFlags=SelectType. These info logs could spam the
> slurmctld log file under certain circumstances.
> -- prep/script - Fix [Srun|Task]<Prolog|Epilog> missing SLURM_JOB_NODELIST.
> -- gres - Rebuild GRES core bitmap for nodes at startup. This fixes error:
> "Core bitmaps size mismatch on node [HOSTNAME]", which causes jobs to enter
> state "Requested node configuration is not available".
> -- slurmctld - Allow startup with zero nodes.
> -- Fix filesystem handling race conditions that could lead to an attacker
> taking control of an arbitrary file, or removing entire directories'
> contents. CVE-2023-41914.

> * Changes in Slurm 22.05.10
> ===========================
> -- Fix filesystem handling race conditions that could lead to an attacker
> taking control of an arbitrary file, or removing entire directories'
> contents. CVE-2023-41914.

Taras Shapovalov

Oct 12, 2023, 2:37:59 AM
to slurm-a...@schedmd.com, slurm...@schedmd.com, Slurm User Community List
Are the older versions affected as well?


Best regards,

Taras

From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Tim Wickberg <t...@schedmd.com>
Sent: Thursday, October 12, 2023 00:01
To: slurm-a...@schedmd.com <slurm-a...@schedmd.com>; slurm...@schedmd.com <slurm...@schedmd.com>
Subject: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)
 

Bjørn-Helge Mevik

Oct 12, 2023, 3:56:53 AM
to slurm...@schedmd.com
Taras Shapovalov <tshap...@nvidia.com> writes:

> Are the older versions affected as well?

Yes, all older versions are affected.

--
B/H

Taras Shapovalov

Oct 13, 2023, 6:22:44 AM
to Slurm User Community List
Oh, does this mean that no one should use Slurm versions <= 21.08 any more?

Best regards,

Taras



From: slurm-users on behalf of Bjørn-Helge Mevik
Sent: Thursday, October 12, 2023 11:56
To: slurm...@schedmd.com
Subject: Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

Ole Holm Nielsen

Oct 13, 2023, 6:42:34 AM
to slurm...@lists.schedmd.com
On 10/13/23 12:22, Taras Shapovalov wrote:
> Oh, does this mean that no one should use Slurm versions <= 21.08 any more?

SchedMD recommends using the currently supported versions (currently
22.05 or 23.02). Next month 23.11 will be released, and 22.05 will become
unsupported.

The question for sites is whether they can accept running software that
contains known security holes. That goes for Slurm as well as for all
other software, such as the Linux kernel. We don't yet know the CVSS
score for CVE-2023-41914, but SchedMD's description of the fixes sounds
pretty serious.

IMHO, your organization's IT security policy should be consulted in order
to answer your question.

/Ole

Ryan Novosielski

Oct 13, 2023, 11:02:22 AM
to Slurm User Community List
If you look at the downloads page, this has happened before:

[screenshot of the downloads page, not preserved in this archive]

This should probably be updated as well, to indicate the new floor
after this CVE.

But the point is, basically, you’re going to hit this if you don’t upgrade at least every ~18 months.

--
#BlackLivesMatter
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'

Gerhard Strangar

Oct 13, 2023, 1:09:16 PM
to slurm...@lists.schedmd.com
Tim Wickberg wrote:

> A number of race conditions have been identified within the
> slurmd/slurmstepd processes that can lead to the user taking ownership
> of an arbitrary file on the system.

Is this any different from CVE-2023-41915 in PMIx, or is it the same
issue under an additional number? Or did someone mistype the number? I
couldn't find any information on CVE-2023-41914.

Gerhard

Bjørn-Helge Mevik

Oct 16, 2023, 2:30:41 AM
to slurm...@schedmd.com
Taras Shapovalov <tshap...@nvidia.com> writes:

> Oh, does this mean that no one should use Slurm versions <= 21.08 any more?

That of course depends on your security requirements, but I, at least,
wouldn't use those older versions in production any more. (We actually
did upgrade from 21.08 to 23.02 on a couple of our clusters due to this.)

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

Groner, Rob

Oct 16, 2023, 11:22:37 AM
to slurm...@lists.schedmd.com
It is my understanding that it is a different issue from the PMIx one. So to
be fully protected, you would need to build the latest/fixed PMIx and rebuild
Slurm against it (or just keep PMIx disabled), AND run this latest version of
Slurm with the fix for Slurm's own vulnerability.

Rob


From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Gerhard Strangar <g...@arcor.de>
Sent: Friday, October 13, 2023 1:08 PM
To: slurm...@lists.schedmd.com <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)
 

Christopher Samuel

Oct 16, 2023, 12:46:25 PM
to slurm...@lists.schedmd.com
On 10/16/23 08:22, Groner, Rob wrote:

> It is my understanding that it is a different issue than pmix.

That's my understanding too. The PMIx issue wasn't in Slurm; it was in
the PMIx code that Slurm was linked to. This CVE is for Slurm itself.

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA


Kilian Cavalotti

Oct 16, 2023, 7:54:33 PM
to Slurm User Community List
Those CVEs are indeed for different software (one for PMIx, one for
Slurm), even though they're ultimately about the same kind of underlying
problem (chown() being used instead of lchown(), which could lead to a
user taking ownership of privileged files).
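
A minimal sketch of that distinction, assuming a hypothetical spool
path that an unprivileged user can pre-populate (illustration only, not
code from either project):

#include <unistd.h>

void give_output_to_user(uid_t uid, gid_t gid)
{
    /* If the user has planted a symlink at spooldir/output pointing
     * at /etc/shadow, chown() follows it, and the *target* becomes
     * user-owned. Dangerous. */
    chown("spooldir/output", uid, gid);

    /* lchown() changes the ownership of the link itself and never
     * dereferences it, so the planted symlink gains nothing. */
    lchown("spooldir/output", uid, gid);
}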

The Slurm patches include more fixes related to permissions and race
conditions, but both vulnerabilities were discovered and reported by the
same person (Hi François! ;).

Cheers,
--
Kilian
