[slurm-users] slurmdbd does not work


Giuseppe G. A. Celano

Dec 2, 2021, 3:41:39 PM
to slurm...@lists.schedmd.com
Hi everyone,

I am having trouble getting slurmdbd to work. This is the error I get:

error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files
error: cannot find accounting_storage plugin for accounting_storage/mysql
error: cannot create accounting_storage context for accounting_storage/mysql
fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

I have installed MySQL (apt install mysql) on Ubuntu 20.04.3 and followed the instructions on the Slurm website; MySQL is running (port 3306), and these are the relevant parts of my .conf files:

slurm.conf

# LOGGING AND ACCOUNTING
AccountingStorageHost=localhost
AccountingStoragePort=3306
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log

slurmdbd.conf

AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
DbdPort=3306
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
PluginDir=/usr/lib/slurm
SlurmUser=slurm
StoragePass=password
StorageType=accounting_storage/mysql
StorageUser=slurm
StorageLoc=slurm_acct_db

I changed the port to 3306 because otherwise slurmdbd could not communicate with mysql. If I run sacct, for example, I get:

sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
sacct: error: slurm_persist_conn_open: No response to persist_init
sacct: error: Sending PersistInit msg: No error
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
sacct: error: Sending PersistInit msg: No error
sacct: error: DBD_GET_JOBS_COND failure: Unspecified error

Does anyone have a suggestion to solve this problem? Thank you very much.

Best,
Giuseppe
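
A note on the ports in the two files above. According to the slurmdbd.conf documentation, DbdPort is the port slurmdbd itself listens on (default 6819), while StoragePort is the port of the MySQL/MariaDB server (default 3306). AccountingStoragePort in slurm.conf is meant to point at slurmdbd, not at the database. With everything set to 3306, sacct and slurmctld may end up talking to MySQL directly, which would be consistent with the truncated reads shown above. The following is a minimal Python sketch of that check, not a definitive tool; parse_conf and check_dbd_ports are hypothetical helper names and are not part of Slurm.

```python
# Default ports as documented for slurmdbd and MySQL/MariaDB.
SLURMDBD_DEFAULT_PORT = 6819
MYSQL_DEFAULT_PORT = 3306


def parse_conf(text):
    """Parse 'Key=Value' lines, skipping blanks and '#' comments.

    Simplification: anything after '#' is dropped, so a value that
    contains '#' (for example a password) would be cut short.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Slurm treats parameter names case-insensitively.
        settings[key.strip().lower()] = value.strip()
    return settings


def check_dbd_ports(slurmdbd_conf_text):
    """Return a list of warnings about port settings in slurmdbd.conf."""
    conf = parse_conf(slurmdbd_conf_text)
    dbd_port = int(conf.get("dbdport", SLURMDBD_DEFAULT_PORT))
    storage_port = int(conf.get("storageport", MYSQL_DEFAULT_PORT))
    problems = []
    if dbd_port == storage_port:
        problems.append(
            "DbdPort (%d) equals the database port; slurmdbd would try to "
            "listen on the port the database already uses" % dbd_port
        )
    return problems


if __name__ == "__main__":
    posted = "DbdHost=localhost\nDbdPort=3306\nStorageType=accounting_storage/mysql\n"
    for problem in check_dbd_ports(posted):
        print("warning:", problem)
```

Under these assumptions, a more conventional layout would leave DbdPort at its default (or 6819), set StoragePort=3306 if the database port needs stating at all, and use the slurmdbd port for AccountingStoragePort in slurm.conf.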

Brian Andrus

Dec 2, 2021, 4:34:22 PM
to slurm...@lists.schedmd.com


Your Slurm needs to be built with MySQL support. If you have mysql-devel installed, the build should pick it up; otherwise you can specify the location with --with-mysql when you configure/build Slurm.

Brian Andrus

Giuseppe G. A. Celano

Dec 3, 2021, 9:43:19 AM
to Slurm User Community List
Thanks for the answer, Brian. I have now added --with-mysql_config=/etc/mysql/my.cnf, but the problem is still there, and now slurmctld does not work either, with the error:

[2021-12-03T15:36:41.018] accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
[2021-12-03T15:36:41.019] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.019] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
[2021-12-03T15:36:41.019] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.020] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
[2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.020] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
[2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.020] error: DBD_GET_TRES failure: No error
[2021-12-03T15:36:41.021] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
[2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.021] error: DBD_GET_QOS failure: No error
[2021-12-03T15:36:41.021] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
[2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.021] error: DBD_GET_USERS failure: No error
[2021-12-03T15:36:41.022] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
[2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.022] error: DBD_GET_ASSOCS failure: No error
[2021-12-03T15:36:41.022] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
[2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
[2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
[2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No error
[2021-12-03T15:36:41.022] fatal: You are running with a database but for some reason we have no TRES from it.  This should only happen if the database is down and you don't have any state files.
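
About the --with-mysql_config value used above: as far as the Slurm configure help describes it, this option expects the directory that contains the mysql_config binary, not the path to my.cnf. A small Python sketch for finding that directory is given here; find_mysql_config_dir is a hypothetical helper name, and the sketch assumes mysql_config ends up on PATH once the client development package is installed.

```python
import os
import shutil


def find_mysql_config_dir(path=None):
    """Return the directory holding mysql_config, or None if it is not found.

    `path` is an optional PATH-style string to search instead of $PATH.
    """
    found = shutil.which("mysql_config", path=path)
    if found is None:
        return None
    return os.path.dirname(os.path.abspath(found))


if __name__ == "__main__":
    directory = find_mysql_config_dir()
    if directory is None:
        print("mysql_config not found; the MySQL/MariaDB development files may be missing")
    else:
        print("candidate configure option: --with-mysql_config=" + directory)
```

If the helper returns None, the mysql plugin is unlikely to be built, because configure has no way to learn the client library flags.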


Brian Andrus

Dec 3, 2021, 11:13:42 AM
to slurm...@lists.schedmd.com

You will need to also reinstall/restart slurmdbd with the updated binary.

Look in the slurmdbd logs to see what is happening there. I suspect it had errors updating/creating the database and tables. If you have no data in it yet, you can just DROP the database and restart slurmdbd.

Brian Andrus

Giuseppe G. A. Celano

Dec 3, 2021, 6:08:06 PM
to Slurm User Community List
The problem is the lack of /usr/lib/slurm/accounting_storage_mysql.so

I have installed many mariadb-related packages, but that file is not created by slurm after installation: is there a point in the documentation where the installation procedure for the database is made explicit?


Sean Crosby

Dec 3, 2021, 6:23:09 PM
to Slurm User Community List
Did you run

./configure (with any other options you normally use)
make
make install

on your DBD server after you installed the mariadb-devel package?




Giuseppe G. A. Celano

Dec 3, 2021, 7:20:54 PM
to Slurm User Community List
After installation of libmariadb-dev, I have reinstalled the entire slurm with ./configure + options, make, and make install. Still, accounting_storage_mysql.so is missing.


Brian Andrus

Dec 3, 2021, 7:33:30 PM
to slurm...@lists.schedmd.com

Which version of Mariadb are you using?

Brian Andrus

Giuseppe G. A. Celano

Dec 3, 2021, 7:40:53 PM
to Slurm User Community List
10.4.22

Paul Edmon

Dec 3, 2021, 8:03:55 PM
to slurm...@lists.schedmd.com

I would check that you also have MariaDB-shared installed on the host you build on, prior to your build. They changed the way the packaging is done in MariaDB, and Slurm needs to detect the files in MariaDB-shared for configure to actually build the mysql libs.

-Paul Edmon-

Sean Crosby

Dec 3, 2021, 8:04:54 PM
to Slurm User Community List
Try installing the libmariadb-dev-compat package and running configure/make again. It provides "libmysqlclient.so", whereas libmariadb-dev provides "libmariadb.so".


Giuseppe G. A. Celano

Dec 3, 2021, 8:31:12 PM
to Slurm User Community List
I have installed almost all of the possible packages, but that file doesn't show up:

libdbd-mariadb-perl/focal,now 1.11-3ubuntu2 amd64 [installed]
libmariadb-dev-compat/unknown,now 1:10.4.22+maria~focal amd64 [installed]
libmariadb-dev/unknown,now 1:10.4.22+maria~focal amd64 [installed]
libmariadb3-compat/unknown,now 1:10.4.22+maria~focal amd64 [installed]
libmariadb3/unknown,now 1:10.4.22+maria~focal amd64 [installed,automatic]
libmariadbclient18/unknown,now 1:10.4.22+maria~focal amd64 [installed]
libmariadbd-dev/unknown,now 1:10.4.22+maria~focal amd64 [installed]
libmariadbd19/unknown,now 1:10.4.22+maria~focal amd64 [installed]
mariadb-client-10.4/unknown,now 1:10.4.22+maria~focal amd64 [installed,automatic]
mariadb-client-core-10.4/unknown,now 1:10.4.22+maria~focal amd64 [installed]
mariadb-client/unknown,unknown,unknown,now 1:10.4.22+maria~focal all [installed]
mariadb-common/unknown,unknown,unknown,now 1:10.4.22+maria~focal all [installed]
mariadb-plugin-connect/unknown,now 1:10.4.22+maria~focal amd64 [installed]
mariadb-server-10.4/unknown,now 1:10.4.22+maria~focal amd64 [installed]
mariadb-server-core-10.4/unknown,now 1:10.4.22+maria~focal amd64 [installed]
mariadb-server/unknown,unknown,unknown,now 1:10.4.22+maria~focal all [installed]
odbc-mariadb/focal,now 3.1.4-1 amd64 [installed]

Gennaro Oliva

Dec 4, 2021, 6:36:57 AM
to Slurm User Community List
Ciao Giuseppe,

On Sat, Dec 04, 2021 at 02:30:40AM +0100, Giuseppe G. A. Celano wrote:
> I have installed almost all of the possible packages, but that file doesn't
> show up:

can you please specify which options you are using with ./configure?

If you don't specify any prefix (--prefix option), the default location
for your installation is /usr/local, so you should find the plugins under
/usr/local/lib/slurm

Did you try the slurm-wlm package shipped with Ubuntu?
It comes with the mysql plugin.
Best regards
--
Gennaro Oliva
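
The point about install prefixes can be checked mechanically. Below is a minimal Python sketch that looks for the plugin file in several candidate directories; find_plugin is a hypothetical helper name, and the directory list is an assumption taken from this thread (the PluginDir in the posted slurmdbd.conf and the default /usr/local prefix), not an exhaustive list.

```python
from pathlib import Path


def find_plugin(plugin_type, plugin_dirs):
    """Return every existing path for a plugin such as 'accounting_storage/mysql'.

    The file name is derived by replacing '/' with '_' and appending '.so',
    which matches the accounting_storage_mysql.so name seen in this thread.
    """
    filename = plugin_type.replace("/", "_") + ".so"
    hits = []
    for directory in plugin_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    # Assumed locations: PluginDir from slurmdbd.conf and the default prefix.
    dirs = ["/usr/lib/slurm", "/usr/local/lib/slurm"]
    hits = find_plugin("accounting_storage/mysql", dirs)
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("accounting_storage_mysql.so not found in", dirs)
```

If the file turns up under a directory other than the configured PluginDir, the build probably succeeded and only the PluginDir setting or the --prefix needs to be reconciled.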

Giuseppe G. A. Celano

Dec 4, 2021, 11:32:31 AM
to Slurm User Community List
Hi Gennaro,

That helped: slurm-wlm has accounting_storage_mysql.so, and I moved it to the location requested by the first slurm installation. Everything seems to work, even if I had to change the location of the .conf files, probably because this is required by the new slurm-wlm installation. I am not sure whether I should try to uninstall my previous installation and reinstall slurm-wlm...

Giuseppe G. A. Celano

Dec 5, 2021, 9:46:56 PM
to Slurm User Community List
Hi,

I have reinstalled Slurm using the Ubuntu package slurm-wlm (and some related ones). After solving some problems with the directories where the PID files are stored, the services slurmdbd, slurmctld, and slurmd work, although I keep getting the message "Can't open PID file /run/slurm/slurmd.pid (yet?) after start: Operation not permitted", even though the directory has slurm as owner and group. However, I cannot use the commands sinfo, srun, etc., because I get the errors:

sinfo: symbol lookup error: sinfo: undefined symbol: slurm_conf
srun: symbol lookup error: srun: undefined symbol: xfree_ptr
sacct: symbol lookup error: sacct: undefined symbol: slurm_destroy_selected_step

Does anyone know the reason for that? Thanks.

Best,
Giuseppe

Gennaro Oliva

Dec 6, 2021, 3:39:39 AM
to Slurm User Community List
Ciao Giuseppe,

On Mon, Dec 06, 2021 at 03:46:02AM +0100, Giuseppe G. A. Celano wrote:
> sinfo: symbol lookup error: sinfo: undefined symbol: slurm_conf
> srun: symbol lookup error: srun: undefined symbol: xfree_ptr
> sacct: symbol lookup error: sacct: undefined symbol:
> slurm_destroy_selected_step
>
> Does anyone know the reason for that? Thanks.

please check that you are using the client tools from the slurm package
and not those coming from the source installation. The command:

which srun

should return /usr/bin/srun and not /usr/local/bin/srun

In the latter case, remove everything related to Slurm under /usr/local:

/usr/local/share/doc/slurm*
/usr/local/sbin/slurm*
/usr/local/lib/libslurm*
/usr/local/lib/slurm
/usr/local/include/slurm

/usr/local/bin/scancel
/usr/local/bin/sprio
/usr/local/bin/sdiag
/usr/local/bin/srun
/usr/local/bin/squeue
/usr/local/bin/sbcast
/usr/local/bin/sview
/usr/local/bin/salloc
/usr/local/bin/scontrol
/usr/local/bin/sreport
/usr/local/bin/sbatch
/usr/local/bin/strigger
/usr/local/bin/sacctmgr
/usr/local/bin/sacct
/usr/local/bin/sattach
/usr/local/bin/scrontab
/usr/local/bin/sh5util
/usr/local/bin/sstat
/usr/local/bin/sinfo
/usr/local/bin/sshare

Look also for files under:

/usr/local/share/man/

Best regards,
--
Gennaro Oliva
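
The "which srun" check above can be extended to all client tools at once. The sketch below lists commands that exist in more than one PATH directory, which is the situation that tends to produce the "undefined symbol" errors when a source-built binary picks up the packaged libslurm or the other way round. It is a rough Python sketch under that assumption; command_locations and shadowed are hypothetical helper names, and the command list is a subset of the files named above.

```python
import os


def command_locations(name, path=None):
    """List every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in hits:
            hits.append(candidate)
    return hits


def shadowed(names, path=None):
    """Map each command found in more than one directory to its locations."""
    result = {}
    for name in names:
        hits = command_locations(name, path)
        if len(hits) > 1:
            result[name] = hits
    return result


if __name__ == "__main__":
    tools = ["sinfo", "srun", "sacct", "sbatch", "squeue", "scontrol", "sacctmgr"]
    for name, hits in shadowed(tools).items():
        # The first entry is the one the shell would actually run.
        print(name + ": " + " -> ".join(hits))
```

An empty result means each tool resolves to a single copy; any command listed with /usr/local/bin first is still coming from the source installation.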

Giuseppe G. A. Celano

Dec 6, 2021, 5:52:26 AM
to Slurm User Community List
Grazie Gennaro,

It's working!


