Hi all!
We have a master node and two compute nodes in a Slurm configuration. We have configured the accounting system with a MariaDB database but, for some unknown reason, the accounting data is not being saved in the DB. We believe it may be related to the following error, which appears in /var/log/slurmctld.log:
...............
...............
[2021-03-12T12:22:24.044] error: slurmdbd: DBD_ID_RC is -1
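To see how often this error shows up, and whether other errors accompany it, the log can be filtered with a small script. This is only a minimal Python sketch that assumes every log line has the `[timestamp] level: message` shape shown above; the function name `collect_errors` is hypothetical.

```python
import re

# Matches lines shaped like the one quoted above, e.g.
# [2021-03-12T12:22:24.044] error: slurmdbd: DBD_ID_RC is -1
ERROR_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+error:\s+(?P<msg>.*)$")


def collect_errors(lines):
    """Return (timestamp, message) pairs for every 'error:' log line."""
    found = []
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            found.append((match.group("ts"), match.group("msg")))
    return found
```

It can be fed an open file handle, for example `collect_errors(open("/var/log/slurmctld.log"))`, and the resulting messages can then be grouped or counted as needed.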
We currently have these parameters configured:
[root@ohpc-master-1 slurm]# sacctmgr show configuration
Configuration data as of 2021-03-12T11:34:55
AccountingStorageBackupHost = (null)
AccountingStorageHost = localhost
AccountingStorageLoc = N/A
AccountingStoragePass = (null)
AccountingStoragePort = 6819
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AuthType = auth/munge
MessageTimeout = 10 sec
PluginDir = /usr/lib64/slurm
PrivateData = none
SlurmUserId = slurm(202)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 18.08.8
TCPTimeout = 2 sec
TrackWCKey = 0
SlurmDBD configuration:
ArchiveDir = /tmp
ArchiveEvents = No
ArchiveJobs = No
ArchiveResvs = No
ArchiveScript = (null)
ArchiveSteps = No
ArchiveSuspend = No
ArchiveTXN = No
ArchiveUsage = No
AuthInfo = (null)
AuthType = auth/munge
BOOT_TIME = 2021-03-12T11:04:17
CommitDelay = No
DbdAddr = localhost
DbdBackupHost = (null)
DbdHost = localhost
DbdPort = 6819
DebugFlags = (null)
DebugLevel = verbose
DebugLevelSyslog = unknown
DefaultQOS = (null)
LogFile = /var/log/slurm/slurmdbd.log
MaxQueryTimeRange = UNLIMITED
MessageTimeout = 10 secs
Parameters = (null)
PidFile = /var/run/slurmdbd.pid
PluginDir = /usr/lib64/slurm
PrivateData = none
PurgeEventAfter = NONE
PurgeJobAfter = NONE
PurgeResvAfter = NONE
PurgeStepAfter = NONE
PurgeSuspendAfter = NONE
PurgeTXNAfter = NONE
PurgeUsageAfter = NONE
SLURMDBD_CONF = /etc/slurm/slurmdbd.conf
SLURMDBD_VERSION = 18.08.8
SlurmUser = slurm(202)
StorageBackupHost = (null)
StorageHost = localhost
StorageLoc = slurm_acct_db
StoragePort = 3306
StorageType = accounting_storage/mysql
StorageUser = slurm
TCPTimeout = 2 secs
TrackWCKey = No
TrackSlurmctldDown = No
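As a quick consistency check on the output above, the two halves of the listing can be parsed and compared, for instance to confirm that the port slurmctld uses for accounting matches the port slurmdbd listens on. The following is a minimal Python sketch, assuming the `Key = Value` layout shown above; the function name `parse_config` and the section labels `slurmctld` and `slurmdbd` are hypothetical.

```python
def parse_config(text):
    """Split 'sacctmgr show configuration' output into two dicts.

    Lines before the 'SlurmDBD configuration:' marker go into the
    'slurmctld' dict, lines after it into the 'slurmdbd' dict.
    Lines without an '=' sign (prompt, header) are skipped.
    """
    sections = {"slurmctld": {}, "slurmdbd": {}}
    current = "slurmctld"
    for line in text.splitlines():
        if line.strip() == "SlurmDBD configuration:":
            current = "slurmdbd"
            continue
        key, sep, value = line.partition("=")
        if sep:
            sections[current][key.strip()] = value.strip()
    return sections
```

With the listing above, `parse_config(output)["slurmctld"]["AccountingStoragePort"]` and `parse_config(output)["slurmdbd"]["DbdPort"]` are both `"6819"`, so the ports agree in our case.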
With these associations:
[juanignacio_sanchezmorales@ohpc-master-1 ~]$ sacctmgr show associations
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
clusterhpc root 1 normal
clusterhpc root root 1 normal
clusterhpc general 1 normal
clusterhpc general juanignac+ 1 normal
And users:
[juanignacio_sanchezmorales@ohpc-master-1 ~]$ sacctmgr list user
      User   Def Acct     Admin
---------- ---------- ---------
juanignac+                 None
      root            Administ+
Apparently everything is configured and working, but for some reason the accounting statistics are not being collected. Does anyone know what might be going on?
Thanks!
Kind regards.
Juan Ignacio.
--
Juan Ignacio Sánchez Morales.
IT solutions & HPC System Administrator