[slurm-users] Slurmctld process error 'double free or corruption' on RHEL 9 (Rocky Linux)


William VINCENT via slurm-users

Jul 15, 2024, 4:45:45 AM
to slurm...@lists.schedmd.com
Hello

I am writing to report an issue with the slurmctld process on our RHEL 9
(Rocky Linux) system.

Twice in the past 5 days, the Slurmctld process has encountered an error
that resulted in the service stopping. The error message displayed was
"double free or corruption (out)". This error has caused significant
disruption to our jobs, and we are concerned about its recurrence.

We have tried troubleshooting the issue, but we have not been able to
identify the root cause of the problem. We would appreciate any
assistance or guidance you can provide to help us resolve this issue.

Please let us know if you need any additional information or if there
are any specific steps we should take to diagnose the problem further.

Thank you for your attention to this matter.

Best regards,

_________________________

Jul 09 22:12:01 admin slurmctld[711010]: double free or corruption (fasttop)
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Main process exited, code=killed, status=6/ABRT
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Failed with result 'signal'.
Jul 09 22:12:01 admin systemd[1]: slurmctld.service: Consumed 11min 26.451s CPU time.

.....

Jul 14 10:15:01 admin slurmctld[1633720]: double free or corruption (out)
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Main process exited, code=killed, status=6/ABRT
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Failed with result 'signal'.
Jul 14 10:15:02 admin systemd[1]: slurmctld.service: Consumed 7min 27.596s CPU time.
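
When crashes like these recur, it can help to list every occurrence with its timestamp so the times can be compared with job submissions or scheduled tasks. The following is a minimal sketch, not a diagnosis: the function name heap_aborts is made up for illustration, and the pattern covers only the two glibc messages quoted in this thread.

```shell
# Print the timestamp and message of every glibc heap abort reported by
# slurmctld in journal text read from stdin.
# Assumption: the pattern covers only the messages quoted in this thread,
# not every message glibc can emit.
heap_aborts() {
    grep -E 'slurmctld\[[0-9]+\]: (double free or corruption|free\(\): invalid)' |
        awk '{
            msg = $0
            sub(/^.*slurmctld\[[0-9]+\]: /, "", msg)   # keep only the message text
            print $1, $2, $3, "-", msg
        }'
}

# Typical use on the controller host (needs read access to the journal):
#   journalctl -u slurmctld --no-pager | heap_aborts
```

The function only filters text, so it can also be run on a saved copy of the journal.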

_________________________

slurmctld -V
slurm 22.05.9

________________________

cat /etc/slurm/slurm.conf |grep -v '#'


ClusterName=xxx
SlurmctldHost=admin
SlurmctldParameters=enable_configless
SlurmUser=slurm
AuthType=auth/munge
CryptoType=crypto/munge


SlurmctldPort=6817
StateSaveLocation=/var/spool/slurmctld
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldDebug=verbose
DebugFlags=NO_CONF_HASH


SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdDebug=verbose

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core,CR_LLN
DefMemPerCPU=1024
MaxMemPerCPU=4096
GresTypes=gpu


ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=15
JobCompType=jobcomp/none

TaskPlugin=task/cgroup
LaunchParameters=use_interactive_step

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=admin
AccountingStoragePort=6819
AccountingStorageEnforce=associations
AccountingStorageTRES=gres/gpu



MailProg=/usr/bin/mailx
EnforcePartLimits=YES
MaxArraySize=200000
MaxJobCount=500000
MpiDefault=none
ReturnToService=2
SwitchType=switch/none
TmpFS=/tmpslurm/
UsePAM=1



InactiveLimit=0
KillWait=30
MessageTimeout=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0



PriorityType=priority/multifactor
PriorityFlags=FAIR_TREE,MAX_TRES
PriorityDecayHalfLife=1-0
PriorityWeightFairshare=10000




NodeName=xxx  NodeHostname=xxx  CPUs=4 Sockets=4 RealMemory=3500 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx  NodeHostname=xxx  CPUs=2 Sockets=2 RealMemory=1700 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx  NodeHostname=xxx  CPUs=4 Sockets=4 RealMemory=1700 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN
NodeName=xxx  NodeHostname=xxx  CPUs=4 Sockets=4 RealMemory=3500 TmpDisk=1 CoresPerSocket=1 ThreadsPerCore=1 State=DRAIN


NodeName=r9nc-24-[1-12] NodeHostname=r9nc-24-[1-12] Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 CPUs=24 RealMemory=180000 State=UNKNOWN
NodeName=r9nc-48-[1-4]  NodeHostname=r9nc-48-[1-4] Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 CPUs=48 RealMemory=480000 State=UNKNOWN
NodeName=r9ng-1080-[1-7]   NodeHostname=r9ng-1080-[1-7] Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 CPUs=20 RealMemory=180000 State=UNKNOWN Gres=gpu:1080ti:4
NodeName=r9ng-1080-8   NodeHostname=r9ng-1080-8 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 CPUs=20 RealMemory=176687 State=UNKNOWN Gres=gpu:1080ti:1

PartitionName=24CPUNodes      Nodes=r9nc-24-[1-12]        State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=7500 DefMemPerCPU=7500 TRESBillingWeights="CPU=1.0,Mem=0.125G" Default=YES
PartitionName=48CPUNodes      Nodes=r9nc-48-[1-4]         State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=10000 DefMemPerCPU=8000 TRESBillingWeights="CPU=1.0,Mem=0.125G"
PartitionName=GPUNodes   Nodes=r9ng-1080-[1-7]            State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=9000 DefMemPerCPU=9000
PartitionName=GPUNodes1080-dev   Nodes=r9ng-1080-8        State=UP MaxTime=UNLIMITED OverSubscribe=NO MaxMemPerCPU=9000 DefMemPerCPU=9000 Hidden=Yes

_________________________

sinfo
PARTITION        AVAIL  TIMELIMIT  NODES  STATE NODELIST
24CPUNodes*         up   infinite     12   idle r9nc-24-[1-12]
48CPUNodes          up   infinite      2   idle r9nc-48-[1-2]
GPUNodes            up   infinite      4   idle r9ng-1080-[4-7]
GPUNodes1080-dev    up   infinite      1   idle r9ng-1080-8


--
William VINCENT
Administrateur systèmes et réseaux

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Ole Holm Nielsen via slurm-users

Jul 15, 2024, 5:25:38 AM
to slurm...@lists.schedmd.com
On 7/15/24 10:43, William VINCENT via slurm-users wrote:
> I am writing to report an issue with the Slurmctld process on our RHEL 9
> (Rocky Linux) .
>
> Twice in the past 5 days, the Slurmctld process has encountered an error
> that resulted in the service stopping. The error message displayed was
> "double free or corruption (out)". This error has caused significant
> disruption to our jobs, and we are concerned about its recurrence.
>
> We have tried troubleshooting the issue, but we have not been able to
> identify the root cause of the problem. We would appreciate any assistance
> or guidance you can provide to help us resolve this issue.
>
> Please let us know if you need any additional information or if there are
> any specific steps we should take to diagnose the problem further.

You're running Slurm 22.05.9 on RockyLinux 9 (is that Rocky 9.4, or another minor release?).
Such an old Slurm version probably hasn't been tested much on EL9 systems.

For security reasons you ought to upgrade to a recent Slurm version, just
search for "CVE" in https://github.com/SchedMD/slurm/blob/master/NEWS to
find out about security holes in older versions.

You can upgrade by 2 major releases in a single step, so you can go to
23.11.8. Upgrading Slurm is fairly easy, and I've collected various
pieces of advice in the Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slurm

Hopefully a newer Slurm version is going to solve your issue.

I hope this helps,
Ole
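
The two-major-release rule mentioned above can be checked mechanically before planning an upgrade. This is a rough sketch under stated assumptions: the release list is typed in by hand for illustration (it is not queried from SchedMD), and the function names are hypothetical.

```shell
# Slurm major releases in order. Hard-coded by hand for this sketch
# (an assumption); extend the list as new releases appear.
SLURM_RELEASES="20.11 21.08 22.05 23.02 23.11 24.05"

# major_of VERSION -> first two version fields, e.g. 22.05.9 -> 22.05
major_of() {
    echo "$1" | cut -d. -f1,2
}

# release_index MAJOR -> 1-based position of MAJOR in SLURM_RELEASES;
# fails if the release is not in the list.
release_index() {
    i=0
    for r in $SLURM_RELEASES; do
        i=$((i + 1))
        if [ "$r" = "$1" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# can_upgrade_directly FROM TO -> succeeds if TO is the same release as FROM
# or at most two major releases newer; returns 2 for unknown releases.
can_upgrade_directly() {
    from=$(release_index "$(major_of "$1")") || return 2
    to=$(release_index "$(major_of "$2")") || return 2
    steps=$((to - from))
    [ "$steps" -ge 0 ] && [ "$steps" -le 2 ]
}

# Example:
#   can_upgrade_directly 22.05.9 23.11.8 && echo "single-step upgrade is possible"
```

With this list, going from 22.05.9 to 24.05.1 counts three steps, so the check fails and an intermediate upgrade would be needed.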

William V via slurm-users

Jul 15, 2024, 5:37:43 AM
to slurm...@lists.schedmd.com
Thank you for your response, I hadn't considered that version 22 could be the problem.

I am aware that we are not up to date, but we use the EPEL repo for our RPM packages. Originally, we did not want to install .rpm directly because our policy is to apply security updates every night via the repositories, but unfortunately, in this case, it does not work. I think it is because only one person is responsible for maintaining the packages for RHEL.

I have already reported the security issue, but at the moment it does not seem possible to update: https://bugzilla.redhat.com/show_bug.cgi?id=2280545

It appears from another ticket that the compilation fails for version 24: https://bugzilla.redhat.com/show_bug.cgi?id=2259935

If the compilation fails, will the RPM package work on RHEL 9?

Ole Holm Nielsen via slurm-users

Jul 15, 2024, 6:46:50 AM
to slurm...@lists.schedmd.com
On 7/15/24 11:35, William V via slurm-users wrote:
> Thank you for your response, I hadn't considered that version 22 could be the problem.
>
> I am aware that we are not up to date, but we use the EPEL repo for our RPM packages. Originally, we did not want to install .rpm directly because our policy is to apply security updates every night via the repositories, but unfortunately, in this case, it does not work. I think it is because only one person is responsible for maintaining the packages for RHEL.

You should *NOT* use Slurm packages from the EPEL repository!! The Slurm
documentation recommends excluding those packages, see
https://slurm.schedmd.com/upgrades.html#epel_repository
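
A quick way to see whether a host already excludes the EPEL Slurm packages is to look for an exclude line in the repo file. The sketch below assumes the usual EPEL repo file location and uses a hypothetical function name; the exclude=slurm* setting itself is the one described on the Slurm page linked above.

```shell
# epel_excludes_slurm REPOFILE -> succeeds if the repo file has an
# exclude (or excludepkgs) line that mentions slurm packages.
epel_excludes_slurm() {
    grep -Eq '^[[:space:]]*exclude(pkgs)?[[:space:]]*=.*slurm' "$1"
}

# Typical use; the path is the usual EPEL location, which is an assumption:
#   epel_excludes_slurm /etc/yum.repos.d/epel.repo ||
#       echo "consider adding 'exclude=slurm*' to the [epel] section"
```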

> I have already reported the security issue, but at the moment it does not seem possible to update: https://bugzilla.redhat.com/show_bug.cgi?id=2280545

RedHat doesn't provide support for Slurm, and if necessary you should
contact SchedMD to obtain Slurm support.

> It appears from another ticket that the compilation fails for version 24: https://bugzilla.redhat.com/show_bug.cgi?id=2259935

I think this ticket only reports problems regarding older Slurm releases?

> If the compilation fails, will the RPM package work on RHEL 9?

You should build your own Slurm RPM packages, and compilation failure
would indicate a bug somewhere!

Just as a test, I've now built RPM packages of the currently supported
Slurm releases 23.11.8 and 24.05.1 on a RockyLinux 9.4 system. The RPMs
have built without any issues or compilation errors at all! I haven't
tested these RPMs on our production cluster which runs EL8 :-)

I recommend that you consult the Slurm documentation page[1] and my Wiki
page for Slurm installation:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/ Remember to
install all prerequisite packages before building Slurm, as explained in
the Wiki!

Best regards,
Ole

[1] https://slurm.schedmd.com/documentation.html
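
For readers who have not built Slurm RPMs before, the build described above comes down to fetching a release tarball and handing it to rpmbuild. This is a minimal sketch: the function names are hypothetical, the download URL layout is assumed from SchedMD's download site, and the prerequisite packages must already be installed on the build host.

```shell
# slurm_tarball_url VERSION -> download URL for a release tarball.
# Assumption: the URL layout follows SchedMD's download site.
slurm_tarball_url() {
    echo "https://download.schedmd.com/slurm/slurm-$1.tar.bz2"
}

# build_slurm_rpms VERSION -> fetch the tarball and build RPMs from it.
# Nothing runs when this file is sourced; call the function on a build
# host that has rpm-build and the Slurm prerequisites installed.
build_slurm_rpms() {
    curl -fLO "$(slurm_tarball_url "$1")" &&
        rpmbuild -ta "slurm-$1.tar.bz2"
}

# Example:
#   build_slurm_rpms 24.05.1
```

By default rpmbuild writes the resulting packages under ~/rpmbuild/RPMS/.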

William V via slurm-users

Jul 15, 2024, 7:11:04 AM
to slurm...@lists.schedmd.com
Wow, thank you so much for all this information and the installation wiki.
I have a lot of work to do to change the infrastructure, I hope it will go smoothly.

William V via slurm-users

Jul 16, 2024, 10:21:55 AM
to slurm...@lists.schedmd.com
How can I propose modifications to the wiki?
For example, for RHEL 9 it is missing 'dnf install dbus-devel', which is needed to compile with "cgroup v2" support.

Ole Holm Nielsen via slurm-users

Jul 16, 2024, 1:52:53 PM
to slurm...@lists.schedmd.com
On 16-07-2024 16:20, William V via slurm-users wrote:
> How can I propose modifications to the wiki?
> For example, for RHEL9, it is missing 'dnf install dbus-devel' for compil with "cgroup v2" .

On my RockyLinux 9.4 system there was no requirement for the dbus-devel
RPM package (it isn't installed) when I built the Slurm RPMs. How did
you experience this requirement?

/Ole

Jeffrey T Frey via slurm-users

Jul 16, 2024, 2:13:52 PM
to Ole.H....@fysik.dtu.dk, slurm...@lists.schedmd.com
I can confirm on a freshly-installed RockyLinux 9.4 system, the dbus-devel package was not installed by default. The Development Tools


# dnf repoquery --groupmember dbus-devel
Last metadata expiration check: 2:04:16 ago on Tue 16 Jul 2024 12:02:50 PM EDT.
dbus-devel-1:1.12.20-8.el9.i686
dbus-devel-1:1.12.20-8.el9.x86_64
@platform-devel


# dnf group list
Last metadata expiration check: 2:03:23 ago on Tue 16 Jul 2024 12:02:50 PM EDT.
Available Environment Groups:
Minimal Install
Workstation
Custom Operating System
Virtualization Host
Installed Environment Groups:
Server with GUI
Server
Installed Groups:
Legacy UNIX Compatibility
Console Internet Tools
Container Management
Development Tools
Headless Management
RPM Development Tools
System Tools
Available Groups:
.NET Development
Graphical Administration Tools
Network Servers
Scientific Support
Security Tools
Smart Card Support


So the package was _not_ present in any of the groups that got installed, and "Platform Development" isn't in the group list in the first place.

William V via slurm-users

Jul 17, 2024, 2:45:46 AM
to slurm...@lists.schedmd.com
I had exactly this problem:
https://www.reddit.com/r/SLURM/comments/152ef0c/problems_installing_slurm/

Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot find cgroup plugin for cgroup/v2
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot create cgroup context for cgroup/v2
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Unable to initialize cgroup plugin
Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: slurmd initialization failed

So on the machine where I compile the packages, I installed dbus-devel, recompiled, and then reinstalled on the machine, and it works now.

Another thing: to install devel packages on Rocky Linux (I don't know about other RHEL derivatives), you need to use the command "dnf install xxx --enablerepo=devel".
Example:
dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth --enablerepo=devel;
dnf install http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel;

Ole Holm Nielsen via slurm-users

Jul 17, 2024, 4:13:18 AM
to slurm...@lists.schedmd.com
On 7/17/24 08:43, William V via slurm-users wrote:
> I had exactly this problem :
> https://www.reddit.com/r/SLURM/comments/152ef0c/problems_installing_slurm/
>
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot find cgroup plugin for cgroup/v2
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: cannot create cgroup context for cgroup/v2
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: Unable to initialize cgroup plugin
> Jul 16 11:28:31 occitest slurmd[54981]: slurmd: error: slurmd initialization failed
>
> So on the machine where I compile the packages, I installed dbus-devel, I recompiled and then reinstalled on the machine and it works now.

Thanks a lot for this observation! Now I see (thanks, Bas!) in
https://slurm.schedmd.com/quickstart_admin.html#prereqs that:

> cgroup Task Constraining: The task/cgroup plugin will be built if the hwloc development library is present. cgroup/v2 support also requires the bpf and dbus development libraries.

Therefore one *must* install the following packages for cgroup/v2 support:

$ dnf install hwloc-devel libbpf dbus-devel

I've now added these RPM prerequisites in the Wiki page
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#install-prerequisites
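
Because a build without the dbus development library silently omits the cgroup/v2 plugin, it may be worth checking for the plugin file before rolling packages out to nodes. The sketch below is an illustration only: the function name is hypothetical and the default directory is an assumption (the usual RPM location on x86_64); the file name follows Slurm's type_name.so plugin naming.

```shell
# has_cgroup_v2_plugin [PLUGINDIR] -> succeeds if the cgroup/v2 plugin file
# exists. The default directory is an assumption; check PluginDir in your
# own configuration if the plugins live elsewhere.
has_cgroup_v2_plugin() {
    [ -f "${1:-/usr/lib64/slurm}/cgroup_v2.so" ]
}

# Typical use on a node after installing the freshly built RPMs:
#   has_cgroup_v2_plugin || echo "cgroup_v2.so missing: rebuild with dbus-devel installed"
```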

> Another thing, to install devel packages on Rocky Linux (I don't know about other RHEL), you need to use the command: "dnf install xxx --enablerepo=devel".
> Ex:
> dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth --enablerepo=devel;
> dnf install http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel;

That's an interesting issue which we should explore to determine the best
practices!

In my tests on a Rocky 9.4 system I didn't need to use
"--enablerepo=devel", even though I see there exists a repo file
/etc/yum.repos.d/rocky-devel.repo containing, however, a warning message:

> name=Rocky Linux $releasever - Devel WARNING! FOR BUILDROOT ONLY DO NOT LEAVE ENABLED

When I try to install a devel RPM package, it comes from the appstream
repo (defined in /etc/yum.repos.d/rocky.repo) instead:

$ dnf install dbus-devel
Last metadata expiration check: 2:09:03 ago on Wed 17 Jul 2024 07:40:07 AM CEST.
Dependencies resolved.
================================================================================
Package Architecture Version Repository Size
================================================================================
Installing:
dbus-devel x86_64 1:1.12.20-8.el9 appstream 33 k

On AlmaLinux 9.4 the appstream repo is defined in
/etc/yum.repos.d/almalinux-appstream.repo so it's readily available.

Could you perhaps examine in more detail your system's appstream repo and
why you need --enablerepo=devel ?

Thanks,
Ole

William V via slurm-users

Jul 17, 2024, 10:24:19 AM
to slurm...@lists.schedmd.com
Hello,
Here is the log when I try to install without the devel repo:
dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel
Last metadata expiration check: 3:49:07 ago on Wed 17 Jul 2024 12:27:52 PM CEST.
Package gcc-11.4.1-3.el9.x86_64 is already installed.
Package python3-3.9.18-3.el9_4.1.x86_64 is already installed.
Package openssl-1:3.0.7-27.el9.x86_64 is already installed.
Package munge-0.5.13-13.el9.x86_64 is already installed.
Package munge-libs-0.5.13-13.el9.x86_64 is already installed.
No match for argument: munge-devel
No match for argument: lua-devel
No match for argument: rrdtool-devel
Package infiniband-diags-48.0-1.el9.x86_64 is already installed.
Package libibumad-48.0-1.el9.x86_64 is already installed.
No match for argument: perl-Switch
Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed.
No match for argument: http-parser-devel
No match for argument: json-c-devel
No match for argument: freeipmi-devel
Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed.
All matches were filtered out by modular filtering for argument: mariadb-devel
Error: Unable to find a match: munge-devel lua-devel rrdtool-devel perl-Switch http-parser-devel json-c-devel freeipmi-devel mariadb-devel

After installation:
dnf info json-c-devel
Last metadata expiration check: 2:53:58 ago on Wed 17 Jul 2024 01:23:45 PM CEST.
Installed Packages
Name : json-c-devel
Version : 0.14
Release : 11.el9
Architecture : x86_64
Size : 128 k
Source : json-c-0.14-11.el9.src.rpm
Repository : @System
From repo : devel
Summary : Development files for json-c
URL : https://github.com/json-c/json-c
License : MIT
Description : This package contains libraries and header files for
: developing applications that use json-c.

Default Rocky repo definition:
[appstream]
name=Rocky Linux $releasever - AppStream
mirrorlist=https://mirrors.rockylinux.org/mirrorlist?arch=$basearch&repo=AppStream-$releasever$rltype
#baseurl=http://dl.rockylinux.org/$contentdir/$releasever/AppStream/$basearch/os/
gpgcheck=1
enabled=1
countme=1
metadata_expire=6h
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9


Log when I install with the devel repo enabled:
dnf install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel libjwt-devel freeipmi-devel libssh2-devel man2html munge munge-libs munge-devel mariadb-server mariadb-devel --enablerepo=devel
Last metadata expiration check: 1:00:58 ago on Wed 17 Jul 2024 03:15:50 PM CEST.
Package gcc-11.4.1-3.el9.x86_64 is already installed.
Package python3-3.9.18-3.el9_4.1.x86_64 is already installed.
Package openssl-1:3.0.7-27.el9.x86_64 is already installed.
Package munge-0.5.13-13.el9.x86_64 is already installed.
Package munge-libs-0.5.13-13.el9.x86_64 is already installed.
Package infiniband-diags-48.0-1.el9.x86_64 is already installed.
Package libibumad-48.0-1.el9.x86_64 is already installed.
Package xorg-x11-xauth-1:1.1-10.el9.x86_64 is already installed.
Package mariadb-server-3:10.5.22-1.el9_2.x86_64 is already installed.
Dependencies resolved.
=================================================================================================================
Package Architecture Version Repository Size
=================================================================================================================
Installing:
freeipmi-devel x86_64 1.6.14-2.el9 devel 234 k
gtk2-devel x86_64 2.24.33-8.el9 appstream 2.7 M
http-parser-devel x86_64 2.9.4-6.el9 devel 14 k
hwloc x86_64 2.4.1-5.el9 baseos 189 k
hwloc-devel x86_64 2.4.1-5.el9 appstream 251 k
json-c-devel x86_64 0.14-11.el9 devel 45 k
libjwt-devel x86_64 1.12.1-11.el9 epel 15 k
libssh2-devel x86_64 1.11.0-1.el9 epel 55 k
lua x86_64 5.4.4-4.el9 appstream 187 k
lua-devel x86_64 5.4.4-4.el9 devel 21 k
man2html x86_64 1.6-29.g.el9 epel 30 k
mariadb-devel x86_64 3:10.5.22-1.el9_2 devel 1.0 M
munge-devel x86_64 0.5.13-13.el9 devel 23 k
ncurses-devel x86_64 6.2-10.20210508.el9 appstream 516 k
numactl x86_64 2.0.16-3.el9 baseos 67 k
numactl-devel x86_64 2.0.16-3.el9 appstream 21 k
openssl-devel x86_64 1:3.0.7-27.el9 appstream 3.0 M
pam-devel x86_64 1.5.1-19.el9 appstream 140 k
perl-ExtUtils-MakeMaker noarch 2:7.60-3.el9 appstream 289 k
perl-Switch noarch 2.17-23.el9 devel 26 k
readline-devel x86_64 8.1-4.el9 appstream 194 k
rpm-build x86_64 4.16.1.3-29.el9 appstream 59 k
rrdtool-devel x86_64 1.7.2-21.el9 devel 19 k
Installing dependencies:
annobin x86_64 12.31-2.el9 appstream 1.0 M
apr x86_64 1.7.0-12.el9_3 appstream 122 k
apr-util x86_64 1.6.1-23.el9 appstream 94 k
apr-util-bdb x86_64 1.6.1-23.el9 appstream 12 k
atk-devel x86_64 2.36.0-5.el9 appstream 173 k
brotli x86_64 1.0.9-6.el9 appstream 312 k
brotli-devel x86_64 1.0.9-6.el9 appstream 31 k
bzip2 x86_64 1.0.8-8.el9 baseos 52 k
bzip2-devel x86_64 1.0.8-8.el9 appstream 214 k
cairo-devel x86_64 1.17.4-7.el9 appstream 190 k
debugedit x86_64 5.0-5.el9 appstream 76 k
dwz x86_64 0.14-3.el9 appstream 127 k
ed x86_64 1.14.2-12.el9 baseos 74 k
efi-srpm-macros noarch 6-2.el9_0 appstream 22 k
elfutils x86_64 0.190-2.el9 baseos 543 k
fontconfig-devel x86_64 2.14.0-2.el9_1 appstream 127 k
fonts-srpm-macros noarch 1:2.0.5-7.el9.1 appstream 27 k
freetype-devel x86_64 2.10.4-9.el9 appstream 1.1 M
fribidi-devel x86_64 1.0.10-6.el9.2 appstream 25 k
gcc-plugin-annobin x86_64 11.4.1-3.el9 appstream 46 k
gdb-minimal x86_64 10.2-13.el9 appstream 3.5 M
gdk-pixbuf2-devel x86_64 2.42.6-4.el9_4 appstream 63 k
ghc-srpm-macros noarch 1.5.0-6.el9 appstream 7.8 k
glib2-devel x86_64 2.68.4-14.el9 appstream 470 k
go-srpm-macros noarch 3.2.0-3.el9 appstream 26 k
graphite2-devel x86_64 1.3.14-9.el9 appstream 21 k
harfbuzz-devel x86_64 2.7.4-10.el9 appstream 305 k
http-parser x86_64 2.9.4-6.el9 appstream 37 k
httpd x86_64 2.4.57-8.el9 appstream 45 k
httpd-core x86_64 2.4.57-8.el9 appstream 1.4 M
httpd-filesystem noarch 2.4.57-8.el9 appstream 12 k
httpd-tools x86_64 2.4.57-8.el9 appstream 80 k
info x86_64 6.7-15.el9 baseos 224 k
kernel-srpm-macros noarch 1.0-13.el9 appstream 15 k
libX11-devel x86_64 1.7.0-9.el9 appstream 939 k
libXau-devel x86_64 1.0.9-8.el9 appstream 13 k
libXcomposite-devel x86_64 0.4.5-7.el9 appstream 16 k
libXcursor-devel x86_64 1.2.0-7.el9 appstream 22 k
libXext-devel x86_64 1.3.4-8.el9 appstream 72 k
libXfixes-devel x86_64 5.0.3-16.el9 appstream 12 k
libXft-devel x86_64 2.3.3-8.el9 appstream 18 k
libXi-devel x86_64 1.7.10-8.el9 appstream 99 k
libXinerama-devel x86_64 1.1.4-10.el9 appstream 13 k
libXrandr-devel x86_64 1.5.2-8.el9 appstream 19 k
libXrender-devel x86_64 0.9.10-16.el9 appstream 16 k
libblkid-devel x86_64 2.37.4-18.el9 appstream 17 k
libdatrie-devel x86_64 0.2.13-4.el9 appstream 132 k
libffi-devel x86_64 3.4.2-8.el9 appstream 28 k
libicu-devel x86_64 67.1-9.el9 appstream 830 k
libmount-devel x86_64 2.37.4-18.el9 appstream 18 k
libpng-devel x86_64 2:1.6.37-12.el9 appstream 290 k
libselinux-devel x86_64 3.6-1.el9 appstream 113 k
libsepol-devel x86_64 3.6-1.el9 appstream 39 k
libssh2 x86_64 1.11.0-1.el9 epel 132 k
libthai-devel x86_64 0.1.28-8.el9 appstream 117 k
libtiff-devel x86_64 4.4.0-12.el9 appstream 514 k
libxcb-devel x86_64 1.13.1-9.el9 appstream 1.0 M
libxml2-devel x86_64 2.9.13-6.el9_4 appstream 827 k
lua-rpm-macros noarch 1-6.el9 appstream 9.0 k
lua-srpm-macros noarch 1-6.el9 appstream 8.5 k
mailcap noarch 2.1.49-5.el9 baseos 32 k
man2html-core x86_64 1.6-29.g.el9 epel 58 k
mariadb-connector-c-devel x86_64 3.2.6-1.el9_0 appstream 55 k
ncurses-c++-libs x86_64 6.2-10.20210508.el9 appstream 36 k
ocaml-srpm-macros noarch 6-6.el9 appstream 7.8 k
openblas-srpm-macros noarch 2-11.el9 appstream 7.3 k
pango-devel x86_64 1.48.7-3.el9 appstream 140 k
patch x86_64 2.7.6-16.el9 appstream 127 k
pcre-cpp x86_64 8.44-3.el9.3 appstream 26 k
pcre-devel x86_64 8.44-3.el9.3 appstream 470 k
pcre-utf16 x86_64 8.44-3.el9.3 appstream 184 k
pcre-utf32 x86_64 8.44-3.el9.3 appstream 175 k
pcre2-devel x86_64 10.40-5.el9 appstream 471 k
pcre2-utf32 x86_64 10.40-5.el9 appstream 202 k
perl-AutoSplit noarch 5.74-481.el9 appstream 20 k
perl-Benchmark noarch 1.23-481.el9 appstream 25 k
perl-CPAN-Meta-YAML noarch 0.018-461.el9 appstream 26 k
perl-Devel-PPPort x86_64 3.62-4.el9 appstream 211 k
perl-ExtUtils-Command noarch 2:7.60-3.el9 appstream 14 k
perl-ExtUtils-Constant noarch 0.25-481.el9 appstream 45 k
perl-ExtUtils-Install noarch 2.20-4.el9 appstream 44 k
perl-ExtUtils-Manifest noarch 1:1.73-4.el9 appstream 34 k
perl-ExtUtils-ParseXS noarch 1:3.40-460.el9 appstream 182 k
perl-File-Compare noarch 1.100.600-481.el9 appstream 12 k
perl-Filter x86_64 2:1.60-4.el9 appstream 81 k
perl-I18N-Langinfo x86_64 0.19-481.el9 appstream 21 k
perl-JSON-PP noarch 1:4.06-4.el9 appstream 65 k
perl-Test-Harness noarch 1:3.42-461.el9 appstream 267 k
perl-Text-Balanced noarch 2.04-4.el9 appstream 48 k
perl-deprecate noarch 0.04-481.el9 appstream 13 k
perl-locale noarch 1.09-481.el9 appstream 12 k
perl-srpm-macros noarch 1-41.el9 appstream 8.2 k
perl-version x86_64 7:0.99.28-4.el9 appstream 62 k
pixman-devel x86_64 0.40.0-6.el9_3 appstream 16 k
pyproject-srpm-macros noarch 1.12.0-1.el9 appstream 13 k
python-rpm-macros noarch 3.9-53.el9 appstream 15 k
python-srpm-macros noarch 3.9-53.el9 appstream 17 k
python3-packaging noarch 20.9-5.el9 appstream 69 k
python3-rpm-generators noarch 12-9.el9 appstream 27 k
python3-rpm-macros noarch 3.9-53.el9 appstream 10 k
qt5-srpm-macros noarch 5.15.9-1.el9 appstream 7.9 k
rdma-core-devel x86_64 48.0-1.el9 appstream 373 k
redhat-rpm-config noarch 207-1.el9 appstream 66 k
rocky-logos-httpd noarch 90.15-2.el9 appstream 24 k
rust-srpm-macros noarch 17-4.el9 appstream 9.3 k
sysprof-capture-devel x86_64 3.40.1-3.el9 appstream 59 k
systemtap-sdt-devel x86_64 5.0-4.el9 appstream 74 k
xorg-x11-proto-devel noarch 2022.2-1.el9 appstream 263 k
xz-devel x86_64 5.2.5-8.el9_0 appstream 52 k
zip x86_64 3.0-35.el9 baseos 263 k
zlib-devel x86_64 1.2.11-40.el9 appstream 44 k
Installing weak dependencies:
apr-util-openssl x86_64 1.6.1-23.el9 appstream 14 k
mariadb-connector-c-doc noarch 3.2.6-1.el9_0 devel 98 k
mod_http2 x86_64 2.0.26-2.el9_4 appstream 162 k
mod_lua x86_64 2.4.57-8.el9 appstream 59 k
perl-CPAN-Meta noarch 2.150010-460.el9 appstream 176 k
perl-CPAN-Meta-Requirements noarch 2.140-461.el9 appstream 31 k
perl-Encode-Locale noarch 1.05-21.el9 appstream 19 k
perl-Time-HiRes x86_64 4:1.9764-462.el9 appstream 57 k
perl-devel x86_64 4:5.32.1-481.el9 appstream 659 k
perl-doc noarch 5.32.1-481.el9 appstream 4.5 M

Transaction Summary
=================================================================================================================
Install 144 Packages
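
To see at a glance which packages in a transaction like the one above are supplied by a given repository, the table can be filtered by its repository column. This is a small sketch with a hypothetical function name; it assumes the six-field row layout shown in this log (name, architecture, version, repository, size, unit) and works on a saved copy of the dnf output.

```shell
# pkgs_from_repo REPO -> names of the packages that a dnf transaction table
# (read from stdin) lists as coming from REPO.
# Assumes six whitespace-separated fields per package row:
#   name arch version repository size unit
pkgs_from_repo() {
    awk -v repo="$1" 'NF == 6 && $4 == repo { print $1 }'
}

# Typical use on a saved copy of the output (the file name is hypothetical):
#   pkgs_from_repo devel < dnf-install.log
```

Applied to the listing above, this picks out the rows whose repository column reads devel, such as munge-devel, lua-devel and json-c-devel.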

Ole Holm Nielsen via slurm-users

Jul 17, 2024, 11:04:08 AM
to slurm...@lists.schedmd.com
Hi William,

Maybe you need to enable the CodeReady Linux Builder (CRB) repository for
AlmaLinux/RockyLinux 9? Look for CRB in
https://wiki.rockylinux.org/rocky/repo/

The command for EL9 is: dnf config-manager --set-enabled crb

For EL8 enable this instead: dnf config-manager --set-enabled powertools

I believe the packages you list below as supplied by "devel" will be
installed automatically once you have enabled CRB on EL9 or PowerTools on
EL8. Can you verify this?

I should add the above information to my Wiki page.

Best regards,
Ole
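
The EL8/EL9 distinction above can be wrapped in a small helper so the same build script works on both. This is a sketch with hypothetical function names; it assumes /etc/os-release provides VERSION_ID and that the config-manager plugin (dnf-plugins-core) is installed.

```shell
# builder_repo EL_MAJOR -> id of the repo that carries the extra -devel
# packages, as given in the message above (crb on EL9, powertools on EL8).
builder_repo() {
    case "$1" in
        8) echo powertools ;;
        9) echo crb ;;
        *) echo "unsupported EL major version: $1" >&2; return 1 ;;
    esac
}

# enable_builder_repo -> enable that repo on the running system.
# Reads VERSION_ID from /etc/os-release; needs root privileges.
# Nothing runs when this file is sourced.
enable_builder_repo() {
    major=$(. /etc/os-release && echo "${VERSION_ID%%.*}") &&
        repo=$(builder_repo "$major") &&
        dnf config-manager --set-enabled "$repo"
}
```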

William V via slurm-users

Jul 18, 2024, 2:16:49 AM
to slurm...@lists.schedmd.com
Yes! That works with the CRB repo.
Thanks

Ole Holm Nielsen via slurm-users

Jul 18, 2024, 2:46:47 AM
to slurm...@lists.schedmd.com
On 18-07-2024 08:15, William V via slurm-users wrote:
> yes ! that work with crb repo

Thanks for the test, I'm glad the package installation works as
expected now!

I've corrected the repository documentation in the Wiki page at
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#install-prerequisites

Best regards,
Ole

William V via slurm-users

Aug 26, 2024, 8:05:22 AM
to slurm...@lists.schedmd.com
Hello,

Thanks again for your documentation, I deployed 24.05.2 last week.
But this weekend slurmctld crashed with only the following in the logs:

"Aug 25 15:33:02 slurmadmin slurmctld[79950]: free(): invalid next size (fast)"

Also, I regularly get the messages below in my logs, even though these two machines are VMs in the same subnet. The slurmadmin machine is the one that runs both slurmctld and slurmd, so it cannot lose contact with itself. Meanwhile, my compute nodes are never disconnected.
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:slurmadmin RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmjupyter RPC:REQUEST_PING : Communication connection failure
/var/log/slurm/slurmctld.log:[2024-08-25T14:12:02.009] agent/is_node_resp: node:vmdev RPC:REQUEST_PING : Communication connection failure

Should I open a new topic for this?

Thank you in advance.
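
Before opening a separate topic, it may help to quantify how often each node is affected. The sketch below only counts the REQUEST_PING failures of the form quoted above per node; it does not explain them. The function name is hypothetical.

```shell
# ping_failures_by_node -> count REQUEST_PING connection failures per node
# in slurmctld log text read from stdin, most affected node first.
ping_failures_by_node() {
    grep 'RPC:REQUEST_PING : Communication connection failure' |
        sed -n 's/.*node:\([^ ]*\) RPC:REQUEST_PING.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Typical use, with the log path from the configuration shown earlier:
#   ping_failures_by_node < /var/log/slurm/slurmctld.log
```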

William V via slurm-users

Oct 7, 2024, 2:01:24 AM
to slurm...@lists.schedmd.com
Hello,
This weekend I had the same error, even though I am on 24.x now:

Oct 04 19:41:02 systemd[1]: slurmctld.service: Failed with result 'core-dump'.
Oct 04 19:41:02 systemd[1]: slurmctld.service: Main process exited, code=dumped, status=6/ABRT
Oct 04 19:41:02 slurmctld[1981869]: double free or corruption (fasttop)
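
Unlike the July crashes (code=killed), this one was recorded with code=dumped, which suggests a core file may be available for a backtrace. The following is a hedged sketch with hypothetical function names; it assumes systemd-coredump is handling core files on the host and that gdb is installed for the interactive step.

```shell
# core_was_dumped -> succeeds if journal text on stdin shows that the main
# process exited with code=dumped (a core was written), as opposed to
# code=killed.
core_was_dumped() {
    grep -q 'Main process exited, code=dumped'
}

# slurmctld_core_info -> show what systemd-coredump recorded for the most
# recent slurmctld crash. Nothing runs when this file is sourced.
slurmctld_core_info() {
    coredumpctl info slurmctld
}

# For a full backtrace, open the core in gdb and, at the gdb prompt, run
# "thread apply all bt":
#   coredumpctl debug slurmctld
```

A backtrace taken with debug symbols installed is usually the most useful item to attach to a bug report.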