[slurm-users] how to set slurmdbd.conf if using two slurmdb node with HA database?


hermes via slurm-users

Feb 18, 2025, 5:02:34 AM
to slurm...@lists.schedmd.com

The deployment scenario is as follows:

 

nodeA                                nodeB
(slurmctld)                          (backup slurmctld)
    | \-------------------------------/ |
    | /                               \ |
nodeC                                nodeD
(slurmdbd)                           (backup slurmdbd)
(mysql)  <--multi master replica-->  (mysql)

 

Since the database is multi-master replicated, each slurmdbd should only talk to the MySQL instance on its own node.

 

In that case, how should we set up slurmdbd.conf? The conf file contains the options “DbdAddr”, “DbdHost” and “DbdBackupHost”.

Should they be consistent between the two slurmdbd nodes (nodeC and nodeD)? Such as:

slurmdbd.conf on nodeC       | slurmdbd.conf on nodeD
DbdAddr = nodeC              | DbdAddr = nodeC
DbdHost = nodeC              | DbdHost = nodeC
DbdBackupHost = nodeD        | DbdBackupHost = nodeD
StorageHost = nodeC          | StorageHost = nodeD

 

Or maybe just use a different conf on each node and not use “DbdBackupHost” at all, like:

slurmdbd.conf on nodeC       | slurmdbd.conf on nodeD
DbdAddr = nodeC              | DbdAddr = nodeD
DbdHost = nodeC              | DbdHost = nodeD
StorageHost = nodeC          | StorageHost = nodeD

 

I’m quite confused about the usage of DbdAddr and DbdHost. What is the difference between them, and why does only DbdHost have a backup counterpart?

 

Another confusing point is how DbdBackupHost works. I guess it is slurmctld that is responsible for selecting the available slurmdbd. Since slurm.conf already contains “AccountingStorageHost” and “AccountingStorageBackupHost”, why do we need to set a backup slurmdbd again on the slurmdbd side?

 

Daniel Letai via slurm-users

Feb 19, 2025, 5:23:51 AM
to slurm...@lists.schedmd.com

I'm not sure it will work, didn't test it, but could you just do `dbdhost=localhost` to solve this?

hermes via slurm-users

Feb 19, 2025, 8:59:02 PM
to Daniel Letai, slurm...@lists.schedmd.com

Do you mean the second configuration scheme?

I think configuring `dbdhost=localhost` is the same as configuring `DbdAddr=nodeC` and `DbdAddr=nodeD` on the two nodes respectively.

The key point is whether we should set the DbdBackupHost option, and how it works.

 

 

From: Daniel Letai <da...@letai.org.il>
Sent: 2025-02-19 18:21
To: slurm...@lists.schedmd.com
Subject: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

Daniel Letai via slurm-users

Feb 20, 2025, 8:58:06 AM
to taleint...@sjtu.edu.cn, slurm...@lists.schedmd.com

It's functionally the same with one difference - the configuration file is unmodified between nodes, allowing for simple deployment of nodes, and automation.


Regarding the backuphost - that depends on your setup. If you can ensure the slurmdbd service will stop if the local db replica is not healthy, you shouldn't need backuphost. Conversely, if there is no health check to ensure replica readiness, configure the backuphost. This will require using a different conf file for each node, unless setting up a more robust HA clustering scheme.


The other option is to separate the dbd from the db. Put the dbd on the ctld nodes (A,B) and let nodes C,D only be DB master replica (not dbd).


In slurm.conf on nodes A,B you will then have:

AccountingStorageHost = localhost

(without AccountingStorageBackupHost)


And in slurmdbd.conf you will have:

DbdHost = localhost

(without DbdBackupHost)

StorageHost = nodeC

StorageBackupHost = nodeD


This would mean identical slurm.conf and slurmdbd.conf on both nodes A,B, and no slurm conf files or processes on nodes C,D.
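
A minimal, untested sketch of that colocated layout, collecting the settings above into the two files. The two `...Type` lines are assumptions (the usual plugin names for a cluster that reports to slurmdbd with a MySQL/MariaDB store) and were not stated in this thread, the `/etc/slurm` location is assumed, and whether `localhost` is accepted for `DbdHost` has not been verified here.

```
# /etc/slurm/slurm.conf on nodeA and nodeB (identical; path is an assumption)
# Assumed plugin line, not from this thread:
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
# No AccountingStorageBackupHost

# /etc/slurm/slurmdbd.conf on nodeA and nodeB (identical; path is an assumption)
DbdHost=localhost
# No DbdBackupHost
# Assumed plugin line, not from this thread:
StorageType=accounting_storage/mysql
StorageHost=nodeC
StorageBackupHost=nodeD
```

With this layout, failover between the two DB replicas is handled only by slurmdbd (via `StorageBackupHost`), and failover between the two slurmdbd processes follows whichever slurmctld is active.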


This setup assumes that the entire stack (ctld+dbd) is either working or not, which is usually true, as either the node is functioning or not. If the ctld is working but the dbd is not, you will lose connection to the DB. If the ctld is not working, the other ctld will take charge and use its local dbd, so that scenario is covered.

Adding AccountingStorageBackupHost pointing to the other node is of course possible, but will mean different slurm.conf files which slurm will complain about.


It will mean that most of the time you will not load balance on the multi-master DB replicas. Whether that is a consideration or not is for you to decide.

hermes via slurm-users

Feb 20, 2025, 9:48:09 PM
to Daniel Letai, slurm...@lists.schedmd.com

Thank you for your insightful suggestions. Placing both slurmdbd and slurmctld on the same node is indeed a new structure that we hadn’t considered before, and it seems to provide a much clearer logic for deployment.

Regarding the usage of DbdBackupHost, I would like to confirm my understanding of how it works. Does it mean that the DbdBackupHost option is only consulted when the slurmdbd service detects that its local database (specified by StorageHost) is unavailable? And I guess in that case the first slurmdbd service would act as a proxy that forwards requests to the DbdBackupHost and returns the data from there to slurmctld?

 

 

From: Daniel Letai <da...@letai.org.il>
Sent: 2025-02-20 21:56
To: taleint...@sjtu.edu.cn
Cc: slurm...@lists.schedmd.com
Subject: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

Brian Andrus via slurm-users

Feb 21, 2025, 12:11:39 AM
to slurm...@lists.schedmd.com

Daniel,

One way to set up a true HA is to configure master-master SQL instances on both head nodes. Then have each slurmdbd point to the other SQL instance as the backup host.

This is likely not necessary as all data going to slurmdbd is cached if slurmdbd is unavailable. In the real world, this generally gives ample time to recover without issue.

Brian Andrus

Daniel Letai via slurm-users

Feb 21, 2025, 1:06:15 AM
to taleint...@sjtu.edu.cn, slurm...@lists.schedmd.com

There are two backup-host settings.


DbdBackupHost is used if the slurmdbd service is unavailable (timeout). In that case the slurmctld will try to connect to the slurmdbd on another node.

StorageBackupHost, on the other hand, is what you describe - timeout when connecting to a DB replica will make slurmdbd switch to using the other replica.

Daniel Letai via slurm-users

Feb 21, 2025, 1:12:14 AM
to slurm...@lists.schedmd.com

Agreed


And slurmdbd also caches if the DB is down, if I remember correctly.

hermes via slurm-users

Feb 21, 2025, 1:44:30 AM
to Daniel Letai, slurm...@lists.schedmd.com

But there is even a third pair of backup options if we count slurm.conf: AccountingStorageHost and AccountingStorageBackupHost. I think this is what slurmctld refers to when it finds the primary slurmdbd unavailable.

Will slurmctld also read slurmdbd.conf? It seems to be the dedicated configuration file for slurmdbd, so I think DbdBackupHost should not influence slurmctld’s behaviour. Otherwise AccountingStorageBackupHost and DbdBackupHost would be complete duplicates.

 

张天阳

Network Information Center, Computing Services Department

 

From: Daniel Letai <da...@letai.org.il>
Sent: 2025-02-21 14:04
To: taleint...@sjtu.edu.cn
Cc: slurm...@lists.schedmd.com
Subject: Re: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

Daniel Letai via slurm-users

Feb 21, 2025, 7:35:46 AM
to taleint...@sjtu.edu.cn, slurm...@lists.schedmd.com

Looking at the code, it would seem the DbdBackupHost (in slurmdbd.conf) is used to determine whether or not to run in standby mode.

https://github.com/SchedMD/slurm/blob/ea17bbffc381deae54e126b227d5290bf9525326/src/slurmdbd/slurmdbd.c#L296-L314

https://github.com/SchedMD/slurm/blob/ea17bbffc381deae54e126b227d5290bf9525326/src/slurmdbd/backup.c#L54-L55


Going back to your initial question - you can be consistent across all nodes and use the first option (explicitly set nodeC as primary and nodeD as backup on all four nodes). This might result in slurmdbd using a 'remote' replica if the local DB is not responding, rather than switching to the other node and using the dbd+DB on the same backup node. It's a matter of performance vs. robustness.


In short:


slurm.conf->AccountingStorageBackupHost is used by slurmctld to connect to a backup dbd process on timeout to the primary dbd (which is set by AccountingStorageHost)

slurmdbd.conf->DbdBackupHost is used by slurmdbd process to determine if it should initially run in backup mode

slurmdbd.conf->StorageBackupHost is used by slurmdbd to point to a backup DB replica on timeout to primary DB
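
Laid out against the original four-node layout, those settings could look roughly like the sketch below. It is untested and shows only the first (consistent) option; the `StorageBackupHost` line is implied by the description of slurmdbd falling back to a remote replica rather than given in the original question, and all other site-specific settings are omitted.

```
# slurm.conf (same on all nodes)
# slurmctld connects to the slurmdbd on this host first
AccountingStorageHost=nodeC
# slurmctld falls back to the slurmdbd on this host on timeout
AccountingStorageBackupHost=nodeD

# slurmdbd.conf (same on nodeC and nodeD)
DbdHost=nodeC
# read by slurmdbd itself to decide whether to start in backup mode
DbdBackupHost=nodeD
# DB replica that slurmdbd uses first
StorageHost=nodeC
# DB replica that slurmdbd switches to on timeout to the primary DB
StorageBackupHost=nodeD
```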


In terms of performance, the "worst" case is when the primary slurmdbd is down: slurmctld connects to a remote slurmdbd, which in turn connects to a remote DB.

The entire discussion using localhost was to avoid this scenario, by ensuring there is at most only a single remote.


Consider your initial configuration (A,B ctld), (C,D dbd+DB)

If nodeC only has slurmdbd down, then nodeA would switch to using nodeD for slurmdbd, but nodeD would connect to nodeC for DB.

Whether this is an issue in your use case is up to you, but this would be the most robust configuration.

hermes via slurm-users

Feb 23, 2025, 8:36:28 PM
to Daniel Letai, slurm...@lists.schedmd.com

Thank you so much for your thorough explanations and insightful suggestions.

After this discussion, I now have a clearer understanding of the differences between these backup configuration options.

 

 

From: Daniel Letai <da...@letai.org.il>
Sent: 2025-02-21 20:34
To: taleint...@sjtu.edu.cn
Cc: slurm...@lists.schedmd.com
Subject: Re: Re: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

Kevin Buckley via slurm-users

Feb 25, 2025, 3:16:24 AM
to slurm...@lists.schedmd.com, Daniel Letai, taleint...@sjtu.edu.cn
On 2025/02/20 21:55, Daniel Letai via slurm-users wrote:
> ...
>
> Adding AccountingStorageBackupHost pointing to the other node is of course
> possible, but will mean different slurm.conf files which slurm will complain
> about.

Just thought to note that, in general, it is useful to be aware
that one way to avoid Slurm complaining about per-host differences
is to have your slurm.conf Include a file containing the different
per-host settings.


So, you have a line

Include /etc/slurm/slurm-acct_strge_backup_host.conf

in the slurm.conf on both hosts,


but have different file content, in this case the address in
the one line

AccountingStorageBackupHost=IP.AD.RE.SS

in the included file on each of the two hosts.
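
Put together, the layout described above might look like the following sketch. The included file name is the one from this message; the `/etc/slurm/slurm.conf` location is an assumption, and the two addresses are placeholders from the documentation address range, not real hosts.

```
# /etc/slurm/slurm.conf (identical on both hosts; path is an assumption)
Include /etc/slurm/slurm-acct_strge_backup_host.conf

# /etc/slurm/slurm-acct_strge_backup_host.conf on the first host
# Placeholder address for the other host:
AccountingStorageBackupHost=192.0.2.12

# /etc/slurm/slurm-acct_strge_backup_host.conf on the second host
# Placeholder address for the other host:
AccountingStorageBackupHost=192.0.2.11
```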


The SlurmCtld won't complain about that, but the SlurmDs will run
against a different config on each of the nodes.


Background:

Older Crays used to have some Slurm infrastructure running on a node,
"inside the Cray box", that was accessed via different IP addresses,
depending on whether you were a compute node, so "in-the-box" or an
"eLogin" node, so "out-of-the-box" and that was how we overcame that.

We use the same construct now (on newer HPE/Crays) for Account Gathering
where not all node hardware supports it, and so we can include

AcctGatherEnergyType=acct_gather_energy/none

or

AcctGatherEnergyType=acct_gather_energy/pm_counters

depending on the node.


Same slurm.conf: no complaining from the SlurmCtld.




--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Daniel Letai via slurm-users

Feb 27, 2025, 12:17:35 PM
to Kevin Buckley, slurm...@lists.schedmd.com, taleint...@sjtu.edu.cn

Nice!

Did not know that, good to know, thanks.
