[slurm-users] Slurm version 24.11.1 is now available


Marshall Garey via slurm-users

Jan 23, 2025, 3:45:53 PM
to slurm-a...@schedmd.com, slurm...@schedmd.com
We are pleased to announce the availability of Slurm version 24.11.1.

This release fixes several possible crashes of slurmctld and slurmrestd; a
regression in 24.11 that caused file transfers to a job with sbcast not to
join the job container namespace; MPI applications using Intel OPA, PSM2,
and OMPI 5.x when run through srun; and various minor to moderate bugs.

Downloads are available at https://www.schedmd.com/downloads.php .

--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support

> * Changes in Slurm 24.11.1
> ==========================
> -- With client commands, MIN_MEMORY will show mem_per_tres if specified.
> -- Fix errno message about bad constraint
> -- slurmctld - Fix crash and possible split brain issue if the
> backup controller handles an scontrol reconfigure while in control
> before the primary resumes operation.
> -- Fix stepmgr not getting dynamic node addrs from the controller
> -- stepmgr - avoid "Unexpected missing socket" errors.
> -- Fix `scontrol show steps` with dynamic stepmgr
> -- Deny jobs using the "R:" option of --signal if PreemptMode=OFF
> globally.
> -- Force jobs using the "R:" option of --signal to be preemptable
> by requeue or cancel only. If PreemptMode on the partition or QOS is off
> or suspend, the job will default to using PreemptMode=cancel. (See the
> example below the changelog.)
> -- If --mem-per-cpu exceeds MaxMemPerCPU, the number of cpus per
> task will always be increased even if --cpus-per-task was specified. This
> is needed to ensure each task gets the expected amount of memory. (See
> the worked example below the changelog.)
> -- Fix compilation issue on OpenSUSE Leap 15
> -- Fix jobs using more nodes than needed when not using -N
> -- Fix issue with an allocation being allocated fewer resources
> than needed when using --gres-flags=enforce-binding.
> -- select/cons_tres - Fix errors with the MaxCpusPerSocket partition
> limit. Used cpus/cores weren't counted properly, and free ones weren't
> limited to those available, when a socket was partially allocated or the
> job request went beyond this limit.
> -- Fix issue when jobs were preempted for licenses even if there
> were enough licenses available.
> -- Fix srun ntasks calculation inside an allocation when nodes are
> requested using a min-max range.
> -- Print correct number of digits for TmpDisk in sdiag.
> -- Fix a regression in 24.11 which caused file transfers to a job
> with sbcast to not join the job container namespace.
> -- data_parser/v0.0.40 - Prevent a segfault in the slurmrestd when
> dumping data with v0.0.40+complex data parser.
> -- Remove logic to force lowercase GRES names.
> -- data_parser/v0.0.42 - Prevent the association id from always
> being dumped as NULL when parsing in complex mode. Instead it will now
> dump the id. This affects the following endpoints (see the request
> example below the changelog):
> GET slurmdb/v0.0.42/association
> GET slurmdb/v0.0.42/associations
> GET slurmdb/v0.0.42/config
> -- Fixed a job requeuing issue that merged job entries into the
> same SLUID when all nodes in a job failed simultaneously.
> -- When a job completes, try to give idle nodes to reservations with
> the REPLACE flag before allowing them to be allocated to jobs.
> -- Avoid expensive lookup of all associations when dumping or
> parsing for v0.0.42 endpoints.
> -- Avoid expensive lookup of all associations when dumping or
> parsing for v0.0.41 endpoints.
> -- Avoid expensive lookup of all associations when dumping or
> parsing for v0.0.40 endpoints.
> -- Fix segfault when testing jobs against nodes with invalid gres.
> -- Fix performance regression while packing larger RPCs.
> -- Document the new mcs/label plugin.
> -- job_container/tmpfs - Fix Xauthority file being created
> outside the container when EntireStepInNS is enabled.
> -- job_container/tmpfs - Fix spank_task_post_fork not always
> running in the container when EntireStepInNS is enabled.
> -- Fix a job potentially getting stuck in CG on permissions
> errors while setting up X11 forwarding.
> -- Fix error on X11 shutdown if Xauthority file was not created.
> -- slurmctld - Fix memory or fd leak if an RPC is received that
> is not registered for processing.
> -- Inject OMPI_MCA_orte_precondition_transports when using PMIx. This fixes
> MPI applications using Intel OPA, PSM2 and OMPI 5.x when run through srun.
> -- Don't skip the first partition_job_depth jobs per partition.
> -- Fix gres allocation issue after controller restart.
> -- Fix issue where jobs requesting cpus-per-gpu hang in queue.
> -- switch/hpe_slingshot - Treat HTTP status forbidden the same as
> unauthorized, allowing for a graceful retry attempt.
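
As an illustration of the --signal "R:" items above: a submission of the
form below is now denied outright when PreemptMode=OFF globally, and is
otherwise forced to be preemptable by requeue or cancel. The signal, grace
period, and script name here are made up for illustration:

   sbatch --signal=R:USR1@300 job.sh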
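
For the --mem-per-cpu change above, a worked example with made-up numbers:
if the partition enforces MaxMemPerCPU=2000 (MB), the request below is now
always satisfied by raising the CPUs per task to 4 (4 x 2000 MB per task),
even though --cpus-per-task=1 was given explicitly:

   srun --mem-per-cpu=8000 --cpus-per-task=1 ./my_task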
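
The v0.0.42 association endpoints listed above can be exercised against
slurmrestd roughly as follows. The URL, port, and JWT authentication setup
are assumptions and depend on how slurmrestd is deployed at your site:

   export $(scontrol token)   # sets SLURM_JWT for JWT authentication
   curl -s \
        -H "X-SLURM-USER-NAME: $USER" \
        -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
        http://localhost:6820/slurmdb/v0.0.42/associations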


--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Ole Holm Nielsen via slurm-users

Jan 24, 2025, 3:03:29 AM
to slurm...@lists.schedmd.com
Hi Marshall,

Could you update the NEWS file?
https://github.com/SchedMD/slurm/blob/master/NEWS

Thanks,
Ole

On 1/23/25 21:41, Marshall Garey via slurm-users wrote:
> We are pleased to announce the availability of Slurm version 24.11.1.
>
> This release fixes several possible crashes of slurmctld and slurmrestd; a
> regression in 24.11 that caused file transfers to a job with sbcast not to
> join the job container namespace; MPI applications using Intel OPA, PSM2,
> and OMPI 5.x when run through srun; and various minor to moderate bugs.
>
> Downloads are available at https://www.schedmd.com/downloads.php .

--

Tim Wickberg via slurm-users

Jan 24, 2025, 5:06:56 PM
to slurm...@lists.schedmd.com
https://github.com/SchedMD/slurm/blob/slurm-24.11/NEWS is current.

We've changed how the release branches are managed, which means that the
changes for each maintenance release aren't reflected in the master
branch version of that file. The release-branch-specific NEWS is being
updated for the existing stable releases as each new maintenance release
is tagged. (It's now generated from the Changelog: commit trailers
instead of being edited directly as commits are pushed.)
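
For example, those trailers can be pulled from the release branch with
something like the command below; the revision range is only illustrative,
and only the "Changelog:" trailer key itself comes from the description
above:

   git log --format='%(trailers:key=Changelog,valueonly)' origin/slurm-24.11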

There will likely be further changes to NEWS and RELEASE_NOTES for 25.05
when released this spring, but we haven't settled on exactly what that
will look like yet.

- Tim

On 1/24/25 01:01, Ole Holm Nielsen via slurm-users wrote:
> Hi Marshall,
>
> Could you update the NEWS file?
> https://github.com/SchedMD/slurm/blob/master/NEWS
>
> Thanks,
> Ole

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
