[slurm-users] Slurm versions 24.11.2 and 24.05.6 are now available

Marshall Garey via slurm-users

Feb 25, 2025, 4:15:22 PM
to slurm...@schedmd.com, slurm-a...@schedmd.com
We are pleased to announce the availability of Slurm versions 24.11.2
and 24.05.6.

24.11.2 fixes a variety of minor to major bugs. Fixed regressions
include loading non-default QOS on pending jobs from pre-24.11 state,
pending jobs displaying QOS=(null) when not explicitly requesting a QOS,
running jobs that requested multiple partitions potentially having an
incorrect partition when slurmctld is restarted, and burst_buffer.lua
failing if slurm.conf is in a non-standard location. This release also
fixes a few crashes in slurmctld: crashing when a job that can preempt
requests --test-only, crashing when the scheduler evaluates a job on
nodes with suspended jobs, and crashing due to a long-standing bug
causing a job record without job_resrcs.

24.05.6 fixes sattach with auth/slurm, a slurmrestd crash when using
data_parser/v0.0.40, a slurmctld crash when using job suspension, a
performance regression for RPCs with large amounts of data, and some
other moderate severity bugs.

Downloads are available at https://www.schedmd.com/downloads.php .

--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support

> * Changes in Slurm 24.11.2
> ==========================
> -- Fix segfault when submitting --test-only jobs that can preempt.
> -- Fix regression introduced in 23.11 that prevented the following
> flags from being added to a reservation on an update:
> DAILY, HOURLY, WEEKLY, WEEKDAY, and WEEKEND.
> -- Fix crash and issues when evaluating a job's suitability for running
> on nodes that already have suspended jobs.
> -- Slurmctld will ensure that healthy nodes are not reported as
> UnavailableNodes in job reason codes.
> -- Fix handling of jobs submitted to a current reservation with
> flags OVERLAP,FLEX or OVERLAP,ANY_NODES when it overlaps nodes with a
> future maintenance reservation. When a job submission had a time limit that
> overlapped with the future maintenance reservation, it was rejected. Now
> the job is accepted but stays pending with the reason "ReqNodeNotAvail,
> Reserved for maintenance".
> -- pam_slurm_adopt - avoid errors when explicitly setting
> some arguments to the default value.
> -- Fix QOS preemption with PreemptMode=SUSPEND.
> -- slurmdbd - When changing a user's name update lineage
> at the same time.
> -- Fix regression in 24.11 in which burst_buffer.lua does not
> inherit the SLURM_CONF environment variable from slurmctld and fails to run
> if slurm.conf is in a non-standard location.
> -- Fix memory leak in slurmctld if select/linear and the
> PreemptParameters=reclaim_licenses options are both set in slurm.conf.
> Regression in 24.11.1.
> -- Fix running jobs, that requested multiple partitions, from
> potentially being set to the wrong partition on restart.
> -- switch/hpe_slingshot - Fix compatibility with newer cxi
> drivers, specifically when specifying disable_rdzv_get.
> -- Add ABORT_ON_FATAL environment variable to capture a backtrace
> from any fatal() message.
> -- Fix printing invalid address in rate limiting log statement.
> -- sched/backfill - Fix node state PLANNED not being cleared from
> fully allocated nodes during a backfill cycle.
> -- select/cons_tres - Fix future planning of jobs with bf_licenses.
> -- Prevent redundant "on_data returned rc: Rate limit exceeded,
> please retry momentarily" error message from being printed in
> slurmctld logs.
> -- Fix loading non-default QOS on pending jobs from pre-24.11 state.
> -- Fix pending jobs displaying QOS=(null) when not explicitly
> requesting a QOS.
> -- Fix segfault issue from job record with no job_resrcs.
> -- Fix failing sacctmgr delete/modify/show account operations
> with where clauses.
> -- Fix regression in 24.11 in which Slurm daemons started catching
> the SIGTSTP, SIGTTIN, and SIGUSR1 signals and ignoring them, whereas before
> they did not ignore them. This also caused slurmctld to be unable to shut
> down after a SIGTSTP, because slurmscriptd caught the signal and stopped
> while slurmctld ignored it. Unify and fix these situations and restore
> the previous behavior for these signals.
> -- Document that SIGQUIT is no longer ignored by slurmctld,
> slurmdbd, and slurmd in 24.11. As of 24.11.0rc1, SIGQUIT is identical to
> SIGINT and SIGTERM for these daemons, but this change was not documented.
> -- Fix not considering nodes marked for reboot without ASAP
> in the scheduler.
> -- Remove the boot^ state on unexpected node reboot after
> return to service.
> -- Do not allow new jobs to start on a node which is being rebooted
> with the flag nextstate=resume.
> -- Prevent lower priority job running after cancelling an ASAP reboot.
> -- Fix srun jobs starting on nextstate=resume rebooting nodes.

>
> * Changes in Slurm 24.05.6
> ==========================
> -- data_parser/v0.0.40 - Prevent a segfault in the slurmrestd when
> dumping data with v0.0.40+complex data parser.
> -- Fix sattach when using auth/slurm.
> -- scrun - Add support for the '--all' argument to the kill subcommand.
> -- Fix performance regression while packing larger RPCs.
> -- Fix crash and issues when evaluating a job's suitability for running
> on nodes that already have suspended jobs.
> -- Fixed a job requeuing issue that merged job entries into the
> same SLUID when all nodes in a job failed simultaneously.
> -- switch/hpe_slingshot - Fix compatibility with newer cxi
> drivers, specifically when specifying disable_rdzv_get.
> -- Add ABORT_ON_FATAL environment variable to capture a backtrace
> from any fatal() message.

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Markus Köberl via slurm-users

Feb 26, 2025, 4:13:17 AM
to slurm...@lists.schedmd.com, Marshall Garey
On Tuesday, 25 February 2025 22:10:02 CET Marshall Garey via slurm-users
wrote:
> We are pleased to announce the availability of Slurm versions 24.11.2
> and 24.05.6.

On the download page, the wrong md5sum is displayed for slurm-24.11.2.tar.bz2.
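
(For anyone wanting to double-check their own copy against the value on the
download page, running

  md5sum slurm-24.11.2.tar.bz2

on the tarball is enough to compare.)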


regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus....@tugraz.at

Tim McMullan via slurm-users

Feb 26, 2025, 8:10:14 AM
to slurm...@lists.schedmd.com, Markus Köberl
Thank you, Markus. I fixed the error and figured out how it happened, so it shouldn't happen that way again!

Thanks again,
--Tim

--
Tim McMullan

Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support

Steven Jones via slurm-users

Mar 3, 2025, 8:06:20 PM
to slurm...@schedmd.com
I am trying to add slurmdbd to my first attempt of slurmctld.

I have mariadb 10.11 running and permissions set.

MariaDB [(none)]> CREATE DATABASE slurm_acct_db;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| slurm_acct_db      |
+--------------------+
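
By "permissions set" I mean a grant roughly along the lines of the accounting
guide, with my own user and password, e.g.:

MariaDB [(none)]> GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY 'some_pass';
Query OK, 0 rows affected (0.001 sec)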



When I try to start slurmdbd it fails.

[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; disabled; preset: disabled)
     Active: inactive (dead)
[root@vuwunicoslurmd3 ~]# systemctl enable --now slurmdbd
Created symlink /etc/systemd/system/multi-user.target.wants/slurmdbd.service → /usr/lib/systemd/system/slurmdbd.service.
[root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
○ slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; preset: disabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2025-03-04 00:54:38 UTC; 1s ago
             └─ ConditionPathExists=/etc/slurm/slurmdbd.conf was not met

Mar 04 00:54:38 vuwunicoslurmd3.ods.vuw.ac.nz systemd[1]: Slurm DBD accounting daemon was skipped because of an unmet co>
[root@vuwunicoslurmd3 ~]#

So there seems to be a hole in the guide. Is some config needed?



regards

Steven 


Kamil Wilczek via slurm-users

Mar 4, 2025, 3:32:24 AM
to Steven Jones, slurm...@schedmd.com
Hello,

yes, you need to configure the SlurmDBD daemon:

https://slurm.schedmd.com/slurmdbd.html
https://slurm.schedmd.com/slurmdbd.conf.html

Accounting setup (enforcing limits for example) requires the
database, but some additional steps are also required to
get the whole system working.

The systemd service is not starting because the configuration
file is missing:

ConditionPathExists=/etc/slurm/slurmdbd.conf
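
A bare-bones slurmdbd.conf could look roughly like this -- only a sketch, so
adjust DbdHost/StorageHost, the user, and especially StoragePass to match
whatever you granted in MariaDB, and see the slurmdbd.conf man page for the
rest of the options:

# /etc/slurm/slurmdbd.conf -- example values only
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurmdbd.pid
# database connection
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageLoc=slurm_acct_db
StorageUser=slurm
StoragePass=some_pass

The file must be owned by the SlurmUser and not be readable by group or
others, since it contains the database password. You will also need
AccountingStorageType=accounting_storage/slurmdbd in slurm.conf so that
slurmctld actually talks to slurmdbd.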

Kind regards,
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]

Steffen Grunewald via slurm-users

Mar 4, 2025, 3:45:35 AM
to Steven Jones, slurm...@schedmd.com
TIL about the "--now" option to "systemctl enable"... thanks for this one! ;)
although I admit to preferring a step-by-step approach (and I'd only enable a unit
if it's been successfully started once, to avoid complaints at reboot)...

You wrote that you configured MySQL but didn't mention SlurmDBD config.
Does the file that is being complained about exist (on that machine)?

> So there seems to be a hole in the guide. Some config is needed?

To be honest, I've been following Ole's detailed setup instructions since
Adam and Eve - not the ones directly from the horse's mouth.
Whatever, I'd first try to track down that ConditionPathExists issue...

Best, Steffen

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~

Ole Holm Nielsen via slurm-users

Mar 4, 2025, 4:36:21 AM
to slurm...@lists.schedmd.com
On 3/4/25 09:43, Steffen Grunewald via slurm-users wrote:
>> Following the setup at, https://slurm.schedmd.com/accounting.html#mysql-configuration
>>
>> When I try to start slurmdbd it fails.
>>
>> [root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
>> ○ slurmdbd.service - Slurm DBD accounting daemon
>> Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; disabled; preset: disabled)
>> Active: inactive (dead)
>> [root@vuwunicoslurmd3 ~]# systemctl enable --now slurmdbd
>> Created symlink /etc/systemd/system/multi-user.target.wants/slurmdbd.service → /usr/lib/systemd/system/slurmdbd.service.
>> [root@vuwunicoslurmd3 ~]# systemctl status slurmdbd
>> ○ slurmdbd.service - Slurm DBD accounting daemon
>> Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; preset: disabled)
>> Active: inactive (dead)
>> Condition: start condition failed at Tue 2025-03-04 00:54:38 UTC; 1s ago
>> └─ ConditionPathExists=/etc/slurm/slurmdbd.conf was not met
>
> TIL about the "--now" option to "systemctl enable"... thanks for this one! ;)
> although I admit to prefer a step-by-step approach (and I'd only enable a unit
> if it's been successfully started once, to avoid complaints at reboot)...
>
> You wrote that you configured MySQL but didn't mention SlurmDBD config.
> Does the file that is being complained about exist (on that machine)?
>
>> So there seems to be a hole in the guide. Some config is needed?
>
> To be honest, I've been following Ole's detailed setup instructions since
> Adam and Eve - not the ones directly from the horse's mouth.
> Whatever, I'd first try to track down that ConditionPathExists issue...

The Systemd error message "ConditionPathExists=/etc/slurm/slurmdbd.conf
was not met" is a critical error! Check that the file exists and is owned
by the user slurm and group slurm, for example:

$ ls -l /etc/slurm/slurmdbd.conf
-rw-------. 1 slurm slurm 504 Feb 28 2023 /etc/slurm/slurmdbd.conf

Make sure that you configured slurmdbd.conf correctly, see this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#slurmdbd-configuration
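
If the file exists but with the wrong owner or mode, something like the
following (assuming the standard /etc/slurm path and the slurm user) should
fix it before retrying the service:

$ chown slurm:slurm /etc/slurm/slurmdbd.conf
$ chmod 600 /etc/slurm/slurmdbd.conf
$ systemctl restart slurmdbd
$ journalctl -u slurmdbd -e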

IHTH,
Ole