[slurm-users] slurmdbd not connecting to mysql (mariadb)


Radhouane Aniba via slurm-users

May 29, 2024, 5:06:54 PM
to slurm...@lists.schedmd.com
Hi everyone,
I am trying to get slurmdbd to run on my local home server, but I am really struggling.
Note: I am a novice Slurm user.
My slurmdbd always times out even though all the details in the conf file are correct.

My log looks like this

[2024-05-29T20:51:30.088] Accounting storage MYSQL plugin loaded
[2024-05-29T20:51:30.088] debug2: ArchiveDir = /tmp
[2024-05-29T20:51:30.088] debug2: ArchiveScript = (null)
[2024-05-29T20:51:30.088] debug2: AuthAltTypes = (null)
[2024-05-29T20:51:30.088] debug2: AuthInfo = (null)
[2024-05-29T20:51:30.088] debug2: AuthType = auth/munge
[2024-05-29T20:51:30.088] debug2: CommitDelay = 0
[2024-05-29T20:51:30.088] debug2: DbdAddr = localhost
[2024-05-29T20:51:30.088] debug2: DbdBackupHost = (null)
[2024-05-29T20:51:30.088] debug2: DbdHost = head-node
[2024-05-29T20:51:30.088] debug2: DbdPort = 7032
[2024-05-29T20:51:30.088] debug2: DebugFlags = (null)
[2024-05-29T20:51:30.088] debug2: DebugLevel = 6
[2024-05-29T20:51:30.088] debug2: DebugLevelSyslog = 10
[2024-05-29T20:51:30.088] debug2: DefaultQOS = (null)
[2024-05-29T20:51:30.088] debug2: LogFile = /var/log/slurmdbd.log
[2024-05-29T20:51:30.088] debug2: MessageTimeout = 100
[2024-05-29T20:51:30.088] debug2: Parameters = (null)
[2024-05-29T20:51:30.088] debug2: PidFile = /run/slurmdbd.pid
[2024-05-29T20:51:30.088] debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
[2024-05-29T20:51:30.088] debug2: PrivateData = none
[2024-05-29T20:51:30.088] debug2: PurgeEventAfter = 1 months*
[2024-05-29T20:51:30.088] debug2: PurgeJobAfter = 12 months*
[2024-05-29T20:51:30.088] debug2: PurgeResvAfter = 1 months*
[2024-05-29T20:51:30.088] debug2: PurgeStepAfter = 1 months
[2024-05-29T20:51:30.088] debug2: PurgeSuspendAfter = 1 months
[2024-05-29T20:51:30.088] debug2: PurgeTXNAfter = 12 months
[2024-05-29T20:51:30.088] debug2: PurgeUsageAfter = 24 months
[2024-05-29T20:51:30.088] debug2: SlurmUser = root(0)
[2024-05-29T20:51:30.089] debug2: StorageBackupHost = (null)
[2024-05-29T20:51:30.089] debug2: StorageHost = localhost
[2024-05-29T20:51:30.089] debug2: StorageLoc = slurm_acct_db
[2024-05-29T20:51:30.089] debug2: StoragePort = 3306
[2024-05-29T20:51:30.089] debug2: StorageType = accounting_storage/mysql
[2024-05-29T20:51:30.089] debug2: StorageUser = slurm
[2024-05-29T20:51:30.089] debug2: TCPTimeout = 2
[2024-05-29T20:51:30.089] debug2: TrackWCKey = 0
[2024-05-29T20:51:30.089] debug2: TrackSlurmctldDown= 0
[2024-05-29T20:51:30.089] debug2: acct_storage_p_get_connection: request new connection 1
[2024-05-29T20:51:30.089] debug2: Attempting to connect to localhost:3306
[2024-05-29T20:51:30.090] slurmdbd version 19.05.5 started
[2024-05-29T20:51:30.090] debug2: running rollup at Wed May 29 20:51:30 2024
[2024-05-29T20:51:30.091] debug2: Everything rolled up
[2024-05-29T20:51:49.673] Terminate signal (SIGINT or SIGTERM) received
[2024-05-29T20:51:49.673] debug: rpc_mgr shutting down



my config file looks like this

ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
# Authentication info
AuthType=auth/munge
# slurmDBD info
DbdAddr=localhost
DbdHost=head-node
DbdPort=7032
SlurmUser=root
MessageTimeout=100
DebugLevel=5
#DefaultQOS=normal,standby
LogFile=/var/log/slurmdbd.log
PidFile=/run/slurmdbd.pid
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StoragePass=slurmdbpass
StorageUser=slurm
StorageLoc=slurm_acct_db


I used standard names and passwords to get started and I will change them later,

but every time I try to start slurmdbd.service it crashes and I get the log that I shared above.

I use these versions

slurmdbd -V
slurm-wlm 19.05.5
mysql Ver 15.1 Distrib 10.3.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Everything else is working properly, but I cannot get slurmdbd to work, and at this point I have exhausted everything I could think of trying :) Looking for some expert insights :)


Any idea what I am doing wrong here? Also, I didn't compile any Slurm package; I used the binaries from the apt repos.

Any help will be appreciated

Cheers

Rad

--

James Lam via slurm-users

May 29, 2024, 9:21:57 PM
to slurm...@lists.schedmd.com

1. Is your MySQL database running?
2. Slurm 19.x is far obsolete; you should use at least 21.x.

aradwen--- via slurm-users

May 29, 2024, 9:57:52 PM
to slurm...@lists.schedmd.com
Yes, the MySQL database is running.
I can update and check, but I guess the update will break a couple of configs. I need to check whether this is safe to do; it is only my homelab, but still :)

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Ole Holm Nielsen via slurm-users

May 30, 2024, 1:27:00 AM
to slurm...@lists.schedmd.com
This might be the firewall blocking communication to slurmdbd?

You may perhaps find some useful information in this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/

/Ole

mercan via slurm-users

May 30, 2024, 2:53:19 AM
to Radhouane Aniba, slurm...@lists.schedmd.com

Hi;

Did you check whether you can connect to the database from head-node with your conf parameters?

mysql --user=slurm --password=slurmdbpass  slurm_acct_db

Also, check for a firewall and SELinux, and stop them if they are running.

Last, you can stop slurmdbd and then run it in a terminal with:

slurmdbd -D -vvv

Regards;

C. Ahmet Mercan
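
A minimal sketch of the connection check suggested above, built from the Storage* settings in slurmdbd.conf. It adds --protocol=TCP so the client goes to the same host and port that slurmdbd logs ("Attempting to connect to localhost:3306"), rather than the Unix socket the mysql client uses by default for localhost. The helper names conf_get and print_mysql_check are hypothetical, and the parser assumes plain Key=value lines with the same capitalisation as the config posted earlier.

```shell
#!/bin/sh
# Print the value of a key from a slurmdbd.conf-style file.
# Assumes "Key=value" lines; comment lines starting with '#' are skipped.
conf_get() {
    # $1 = key name, $2 = path to the config file
    awk -F= -v key="$1" '
        /^[[:space:]]*#/ { next }
        $1 == key { sub(/^[^=]*=/, ""); print; exit }
    ' "$2"
}

# Print (not run) a mysql client command that mirrors the Storage* settings.
# --password without a value makes the client prompt, so the password
# stays out of the shell history.
print_mysql_check() {
    conf="$1"
    printf 'mysql --protocol=TCP --host=%s --port=%s --user=%s --password %s -e "SELECT 1"\n' \
        "$(conf_get StorageHost "$conf")" \
        "$(conf_get StoragePort "$conf")" \
        "$(conf_get StorageUser "$conf")" \
        "$(conf_get StorageLoc "$conf")"
}
```

For example, print_mysql_check /etc/slurm-llnl/slurmdbd.conf prints a command that can be pasted into the terminal; the path is the one that appears in the systemctl output in this thread and may differ on other installs.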

Radhouane Aniba via slurm-users

May 30, 2024, 7:50:49 AM
to mercan, slurm...@lists.schedmd.com
Thank you Ahmet,
I don't have a firewall active.
Because slurmdbd cannot connect to the database, I am not able to get it activated through systemctl. I will share the output of slurmdbd -D -vvv shortly, but overall it always says it is trying to connect to the db, retries a couple of times, and then crashes.

R.

mercan via slurm-users

May 30, 2024, 8:19:58 AM
to Radhouane Aniba, slurm...@lists.schedmd.com

Did you try to connect to the database using the mysql command?

mysql --user=slurm --password=slurmdbpass  slurm_acct_db


C. Ahmet Mercan

Radhouane Aniba via slurm-users

May 30, 2024, 9:55:29 AM
to mercan, slurm...@lists.schedmd.com
Yes, I can connect to my database using mysql --user=slurm --password=slurmdbpass  slurm_acct_db, and after checking the firewall question, there is no firewall blocking mysql.

Also, here is the output of slurmdbd -D -vvv (note that I can only run this with sudo):

sudo slurmdbd -D -vvv
slurmdbd: debug: Log file re-opened
slurmdbd: debug: Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 50331648
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
slurmdbd: Accounting storage MYSQL plugin loaded
slurmdbd: debug2: ArchiveDir = /tmp
slurmdbd: debug2: ArchiveScript = (null)
slurmdbd: debug2: AuthAltTypes = (null)
slurmdbd: debug2: AuthInfo = (null)
slurmdbd: debug2: AuthType = auth/munge
slurmdbd: debug2: CommitDelay = 0
slurmdbd: debug2: DbdAddr = localhost
slurmdbd: debug2: DbdBackupHost = (null)
slurmdbd: debug2: DbdHost = hannibal-hn
slurmdbd: debug2: DbdPort = 7032
slurmdbd: debug2: DebugFlags = (null)
slurmdbd: debug2: DebugLevel = 6
slurmdbd: debug2: DebugLevelSyslog = 10
slurmdbd: debug2: DefaultQOS = (null)
slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
slurmdbd: debug2: MessageTimeout = 100
slurmdbd: debug2: Parameters = (null)
slurmdbd: debug2: PidFile = /run/slurmdbd.pid
slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
slurmdbd: debug2: PrivateData = none
slurmdbd: debug2: PurgeEventAfter = 1 months*
slurmdbd: debug2: PurgeJobAfter = 12 months*
slurmdbd: debug2: PurgeResvAfter = 1 months*
slurmdbd: debug2: PurgeStepAfter = 1 months
slurmdbd: debug2: PurgeSuspendAfter = 1 months
slurmdbd: debug2: PurgeTXNAfter = 12 months
slurmdbd: debug2: PurgeUsageAfter = 24 months
slurmdbd: debug2: SlurmUser = root(0)
slurmdbd: debug2: StorageBackupHost = (null)
slurmdbd: debug2: StorageHost = localhost
slurmdbd: debug2: StorageLoc = slurm_acct_db
slurmdbd: debug2: StoragePort = 3306
slurmdbd: debug2: StorageType = accounting_storage/mysql
slurmdbd: debug2: StorageUser = slurm
slurmdbd: debug2: TCPTimeout = 2
slurmdbd: debug2: TrackWCKey = 0
slurmdbd: debug2: TrackSlurmctldDown= 0
slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: slurmdbd version 19.05.5 started
slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
slurmdbd: debug2: Everything rolled up


It goes like this for some time and then it crashes with this message

slurmdbd: Terminate signal (SIGINT or SIGTERM) received
slurmdbd: debug: rpc_mgr shutting down


--
Rad Aniba, PhD

mercan via slurm-users

May 30, 2024, 12:08:04 PM
to Radhouane Aniba, slurm...@lists.schedmd.com

You should fix this error. It is not a warning, it is an error:

"slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout"

You can find more information in the Slurm documentation:

https://slurm.schedmd.com/accounting.html#slurm-accounting-configuration-before-build


C. Ahmet Mercan
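
For reference, the accounting page linked above recommends raising these InnoDB settings in the MariaDB/MySQL server configuration. The values below are the ones given in recent versions of that page as far as can be told (older versions suggested a smaller buffer pool), so treat them as an assumption and check the page for your Slurm version. The file name is hypothetical; on a Debian/Ubuntu MariaDB install, any file under /etc/mysql/mariadb.conf.d/ that the server reads should do.

```ini
# /etc/mysql/mariadb.conf.d/99-slurm.cnf  (hypothetical file name)
[mysqld]
innodb_buffer_pool_size=4096M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
```

The database server has to be restarted before slurmdbd will see the new values. On a small home server a 4096M buffer pool may be more than the machine can spare, in which case a smaller value is a reasonable compromise even if slurmdbd keeps complaining about it.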


On 30.05.2024 at 16:53, Radhouane Aniba via slurm-users wrote:

Brian Andrus via slurm-users

May 30, 2024, 12:21:01 PM
to slurm...@lists.schedmd.com

That SIGTERM message means something is telling slurmdbd to quit.

Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told to shut down. If you are running it in the foreground, a ^C does that. If you run kill or killall on it, you will get that same message.

Brian Andrus

Radhouane Aniba via slurm-users

May 30, 2024, 5:04:25 PM
to Brian Andrus, slurm...@lists.schedmd.com
Thank you Ahmet and Brian,

Ahmet, which conf file in particular is slurmdbd reading from? I parsed all the cnf files for mysql and I cannot find the values it is displaying here:

slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 50331648
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout


sudo tree /etc/mysql/*
/etc/mysql/conf.d
├── mysql.cnf
└── mysqldump.cnf
/etc/mysql/debian.cnf
/etc/mysql/debian-start
/etc/mysql/FROZEN
/etc/mysql/mariadb.cnf
/etc/mysql/mariadb.conf.d
├── 50-client.cnf
├── 50-mysql-clients.cnf
├── 50-mysqld_safe.cnf
└── 50-server.cnf
/etc/mysql/my.cnf
/etc/mysql/my.cnf.fallback
/etc/mysql/mysql.cnf
/etc/mysql/mysql.conf.d
├── mysql.cnf
└── mysqld.cnf


--
Rad Aniba, PhD
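
The values in that log do not come from a file that slurmdbd reads. They appear to be what the running database server reports, and they match MariaDB's built-in defaults, which apply when no cnf file sets them; that would explain why they show up in none of the files listed. A small sketch for making the byte counts readable; to_mib is a hypothetical helper, and the mysql invocation in the comment assumes the slurm account from this thread.

```shell
#!/bin/sh
# Convert a byte count, as printed by slurmdbd or SHOW VARIABLES, to MiB.
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

to_mib 134217728   # innodb_buffer_pool_size from the log
to_mib 50331648    # innodb_log_file_size from the log

# To see what the server itself reports (needs a running server):
#   mysql --user=slurm --password -e "SHOW GLOBAL VARIABLES LIKE 'innodb_%'"
```

The two calls print 128 and 48, that is, a 128 MiB buffer pool and a 48 MiB log file.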

Radhouane Aniba via slurm-users

May 30, 2024, 8:26:25 PM
to Brian Andrus, slurm...@lists.schedmd.com
OK, I made some progress here.

I removed and purged slurmdbd, mysql, mariadb, etc. and started from scratch.
I added the recommended mysqld settings.

I started slurmdbd manually with sudo slurmdbd -D /path/to/conf and everything worked well.

When I tried to start the service with sudo systemctl start slurmdbd.service, it didn't work:

sudo systemctl status  slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
     Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC; 2min 5s ago
    Process: 6258 ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)

May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed out. Terminating.
May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with result 'timeout'.
May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting daemon.

Even though it is the same command?!

Any idea?

--
Rad Aniba, PhD

Radhouane Aniba via slurm-users

May 30, 2024, 11:59:47 PM
to Ryan Novosielski, Brian Andrus, Slurm User Community List
Manually running it through sudo slurmdbd -D /path/to/conf is very quick on my fresh install.

Trying to start slurmdbd through systemctl takes 3 minutes and then it crashes and fails.

Is there an alternative to systemctl to start slurmdbd in the background?

But most importantly, I want to know why it takes so long through systemctl. Maybe I can increase the timeout limit?

On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski <novo...@rutgers.edu> wrote:
It may take longer to start than systemd allows for. How long does it take to start from the command line? It’s common to need to run it manually for upgrades to complete.

--
#BlackLivesMatter
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
     `'


--
Rad Aniba, PhD

Ryan Novosielski via slurm-users

May 31, 2024, 12:03:06 AM
to Radhouane Aniba, Brian Andrus, Slurm User Community List
Are you looking at the log/what appears on the screen, and do you know for a fact that it is all the way up (it should say “version <whatever> started” at the end)?

If that’s not it, you could have a permissions thing or something.

I do not expect you’d need to extend the timeout for a normal run. I suspect it is doing something.



Radhouane Aniba via slurm-users

May 31, 2024, 12:05:31 AM
to Ryan Novosielski, Brian Andrus, Slurm User Community List
Yes when I run it manually it says something like this

[2024-05-31T00:20:01.142] Accounting storage MYSQL plugin loaded
[2024-05-31T00:20:01.146] slurmdbd version 19.05.5 started

But when I try to do it through systemctl

[2024-05-31T00:21:30.953] Terminate signal (SIGINT or SIGTERM) received
[2024-05-31T00:21:30.953] debug:  rpc_mgr shutting down


--
Rad Aniba, PhD
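
Putting these timestamps next to the systemctl status output earlier in the thread: systemd logged "Starting" at 00:20:00, slurmdbd logged "started" at 00:20:01, and the terminate signal arrived at 00:21:30. The gap can be checked with a few lines of shell; elapsed_seconds is a hypothetical helper and assumes a date command that accepts -d (GNU coreutils).

```shell
#!/bin/sh
# Seconds between two "YYYY-MM-DDTHH:MM:SS" timestamps (both taken as UTC).
elapsed_seconds() {
    start=$(date -u -d "$(echo "$1" | tr 'T' ' ')" +%s) || return 1
    end=$(date -u -d "$(echo "$2" | tr 'T' ' ')" +%s) || return 1
    echo $(( end - start ))
}

elapsed_seconds 2024-05-31T00:20:00 2024-05-31T00:21:30
```

This prints 90, which equals systemd's default start timeout of 90 seconds. That suggests the daemon came up fine under systemd and was then stopped because systemd never considered the start complete.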

Radhouane Aniba via slurm-users

May 31, 2024, 12:21:04 AM
to Ryan Novosielski, Brian Andrus, Slurm User Community List
I also run both commands using sudo, so I am assuming permissions should not be the issue? My cluster user is root (I know, not good, but I'm testing things out).
--
Rad Aniba, PhD

Benjamin Smith via slurm-users

May 31, 2024, 4:04:01 AM
to slurm...@lists.schedmd.com, ara...@gmail.com

It could be systemd doing that. Since slurmdbd is being started with -D, I would verify that slurmdbd.service has Type=simple and not Type=forking. The systemctl status output earlier in the thread shows systemd starting slurmdbd with -D.

If that's the slurmdbd package from Ubuntu you might find that it's got the opposite config to what you expect.

Ben.
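
A rough sketch of that check. With Type=forking, systemd waits for the started process to fork and exit, which a daemon kept in the foreground by -D never does, so the start eventually times out. The function name check_unit is hypothetical, the parsing only looks at plain Type= and ExecStart= lines, and a unit with no Type= line is not flagged because its default depends on other settings.

```shell
#!/bin/sh
# Report whether a unit file combines Type=forking with a foreground (-D) start.
# $1 = path to a unit file, e.g. saved from: systemctl cat slurmdbd.service
check_unit() {
    unit_type=$(sed -n 's/^Type=//p' "$1" | tail -n 1)
    exec_start=$(sed -n 's/^ExecStart=//p' "$1" | tail -n 1)
    # Pad with spaces so "-D" is matched as a whole argument.
    case " $exec_start " in
        *" -D "*) foreground=yes ;;
        *)        foreground=no ;;
    esac
    if [ "$unit_type" = "forking" ] && [ "$foreground" = "yes" ]; then
        echo "mismatch: Type=forking but ExecStart uses -D"
    else
        echo "no forking/-D mismatch found"
    fi
}
```

One way to use it is to save the unit with systemctl cat slurmdbd.service > /tmp/slurmdbd.unit and run check_unit /tmp/slurmdbd.unit. If it reports a mismatch, either set Type=simple under [Service] in a drop-in created with sudo systemctl edit slurmdbd.service, or remove -D from ExecStart and keep Type=forking.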

-- 
Benjamin Smith <bsm...@ed.ac.uk>
Computing Officer, AT-7.12a
Research and Teaching Unit
School of Informatics, University of Edinburgh
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.