[slurm-users] Slurmdbd High Availability


Shaghuf Rahman

Apr 13, 2023, 5:49:46 AM
to Slurm User Community List
Hi,

I am setting up slurmdbd in my system and need some input.

My current setup is:
server1: 192.168.123.12 (slurmctld)
server2: 192.168.123.13 (slurmctld)
server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and server2
database: MySQL

I have one more server, server4: 192.168.123.15, which I need to make into a secondary database server. I want to configure server4 so that it syncs the database and gives me either an active-active or an active-passive slurmdbd.

Could anyone please help me with the steps to configure this, and also with how I am going to sync my database on both servers simultaneously?

Thanks & Regards,
Shaghuf Rahman

Ole Holm Nielsen

Apr 13, 2023, 7:17:12 AM
to slurm...@lists.schedmd.com
On 4/13/23 11:49, Shaghuf Rahman wrote:
> I am setting up slurmdbd in my system and need some input.
>
> My current setup is:
> server1: 192.168.123.12 (slurmctld)
> server2: 192.168.123.13 (slurmctld)
> server3: 192.168.123.14 (slurmdbd), which is pointing to both server1 and
> server2
> database: MySQL
>
> I have one more server, server4: 192.168.123.15, which I need to make into
> a secondary database server. I want to configure server4 so that it syncs
> the database and gives me either an active-active or an active-passive
> slurmdbd.
>
> Could anyone please help me with the *steps* to configure this, and also
> with how I am going to *sync* my *database* on both servers simultaneously?

Slurm administrators have different opinions about the usefulness versus
complexity of HA setups. You could read SchedMD's presentation from page
38 onwards: https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf

Some noteworthy slides state:

> Separating slurmctld and slurmdbd in normal production use
> is recommended.
> Master/backup slurmctld is common, and - as long as the
> performance for StateSaveLocation is kept high - not that
> difficult to implement.

> For slurmdbd, the critical element in the failure domain is
> MySQL, not slurmdbd. slurmdbd itself is stateless.

> IMNSHO, the additional complexity of a redundant MySQL
> deployment is more likely to cause an outage than it is to
> prevent one.
> So don’t bother setting up a redundant slurmdbd, keep
> slurmdbd + MySQL local to a single server.
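
For reference, a primary/backup slurmctld pair in slurm.conf is roughly just
the following (a minimal sketch; host names and the shared path are
placeholders, and StateSaveLocation must be fast storage reachable from both
controllers):

   SlurmctldHost=server1
   SlurmctldHost=server2
   StateSaveLocation=/shared/slurm/statesave

The first SlurmctldHost listed is the primary, the second the backup.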

I hope this helps.

/Ole

Brian Andrus

Apr 13, 2023, 11:03:46 AM
to slurm...@lists.schedmd.com

I think you mean both slurmctld servers are pointing to the one slurmdbd server.

Ole is right about the usefulness of HA, especially for slurmdbd, as slurmctld will cache the writes to the database if it is down.

To do what you want, you need to look at configuring your database to be HA. That is a different topic and would be dictated by what database setup you are using. Understand that the backend database is a tool used by Slurm and not part of Slurm, so any HA in that area needs to be done by the database.

Once that is done, simply run two separate slurmdbd servers, each pointing at the HA database. One would be primary and the other a failover (AccountingStorageBackupHost), although technically both could be active at the same time.
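
Roughly, the slurm.conf side of that would look something like this (a sketch
only; host names are placeholders):

   AccountingStorageType=accounting_storage/slurmdbd
   AccountingStorageHost=server3
   AccountingStorageBackupHost=server4

Each slurmdbd.conf would then set StorageHost to the HA database endpoint
rather than to a purely local MySQL instance.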

Brian Andrus

Tina Friedrich

Apr 14, 2023, 7:19:25 AM
to slurm...@lists.schedmd.com
Or run your database server on something like VMware ESXi (which is what
we do). Instant HA, and I don't even need multiple servers for it :)

I don't mean to be flippant, and I realise it's not addressing the mysql
HA question (but that got answered). However, a lot of us will have some
sort of failure-and-load-balancing VM estate anyway, or not? Using that
does - at least in my mind - solve the same problem (just via a slightly
different route).

Other than that I'd agree that HA solutions - of the pacemaker &
mirrored block devices sort - tend to make things less reliable instead
of more.

Tina
>> Could anyone please help me with the *steps* how to configure and also
>> how am i going to *sync* my *database* on both the servers simultaneously.

Daniel Letai

Apr 15, 2023, 4:49:12 PM
to slurm...@lists.schedmd.com

My go-to solution is setting up a Galera cluster using two slurmdbd servers (each pointing to its local DB) and a third quorum server. It's fairly easy to set up and doesn't rely on block-level replication, HA semantics or shared storage.
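
As a minimal sketch, the MariaDB/Galera side looks something like the
following on the two database nodes (the provider path, cluster name and host
names are placeholders and vary per distribution; the third, quorum-only node
can run garbd, the Galera arbitrator, instead of a full database):

   [galera]
   wsrep_on=ON
   wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
   wsrep_cluster_name=slurm_acct
   wsrep_cluster_address=gcomm://server3,server4
   binlog_format=ROW
   default_storage_engine=InnoDB
   innodb_autoinc_lock_mode=2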


Just my 2 cents

Shaghuf Rahman

Apr 17, 2023, 3:54:51 AM
to slurm...@lists.schedmd.com
Hi,

Thanks everyone who shared the information with me.
Really appreciate it.

Thanks,
Shaghuf Rahman

Xaver Stiensmeier

Apr 17, 2023, 5:12:45 AM
to slurm...@lists.schedmd.com
Dear slurm-users list,

is it possible to somehow have two default partitions? Ideally in a way
that Slurm schedules to partition1 by default and only to partition2 when
partition1 can't handle the job right now.

Best regards,
Xaver Stiensmeier


Xaver Stiensmeier

Apr 17, 2023, 5:31:18 AM
to slurm...@lists.schedmd.com

I found a solution that works for me, but it doesn't really answer the question:

It's the all_partitions option for JobSubmitPlugins (https://slurm.schedmd.com/slurm.conf.html#OPT_all_partitions). It works for me because in my case every partition should be a default, but it doesn't really answer the original question, which was how to have multiple default partitions while other, non-default partitions may also exist.
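
For reference, enabling it is just one line in slurm.conf:

   JobSubmitPlugins=all_partitions

which, as I understand it, sets a job's default partition to all partitions in
the cluster whenever the submitter didn't request one explicitly.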

Best regards,
Xaver Stiensmeier

Xaver Stiensmeier

Apr 17, 2023, 5:37:12 AM
to slurm...@lists.schedmd.com
Dear slurm-users list,

let's say I want to submit a large batch job that should run on 8 nodes.
I have two partitions, each holding 4 nodes. Slurm will now tell me that
"Requested node configuration is not available". However, my desired
output would be that slurm makes use of both partitions and allocates
all 8 nodes.

Best regards,
Xaver Stiensmeier


Ozeryan, Vladimir

Apr 17, 2023, 5:44:20 AM
to Slurm User Community List
You should be able to specify both partitions in your sbatch submission script, unless there is some other configuration preventing this.
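
For example (partition names are placeholders), either in the script

   #SBATCH --partition=partition1,partition2

or on the command line as sbatch --partition=partition1,partition2 job.sh;
Slurm should then start the job in whichever of the listed partitions can
run it first.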

-----Original Message-----
From: slurm-users <slurm-use...@lists.schedmd.com> On Behalf Of Xaver Stiensmeier
Sent: Monday, April 17, 2023 5:37 AM
To: slurm...@lists.schedmd.com
Subject: [EXT] [slurm-users] Submit sbatch to multiple partitions


Ole Holm Nielsen

Apr 17, 2023, 5:56:20 AM
to slurm...@lists.schedmd.com
On 4/17/23 11:36, Xaver Stiensmeier wrote:
> let's say I want to submit a large batch job that should run on 8 nodes.
> I have two partitions, each holding 4 nodes. Slurm will now tell me that
> "Requested node configuration is not available". However, my desired
> output would be that slurm makes use of both partitions and allocates
> all 8 nodes.

A compute node can be a member of multiple partitions; this is how you can
handle your case.

Suppose you have 4 nodes in part1 and 4 nodes in part2. Then you can
create a new partition "partbig" which contains all 8 nodes.

You may want to configure restrictions on "partbig" if you don't want
every user to submit to it, or configure a lower maximum time for jobs.
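
As a rough slurm.conf sketch (node names, the time limit and the AllowGroups
restriction are only placeholders):

   PartitionName=part1   Nodes=node[01-04] Default=YES
   PartitionName=part2   Nodes=node[05-08]
   PartitionName=partbig Nodes=node[01-08] MaxTime=1-00:00:00 AllowGroups=bigjobs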

I hope this helps,
Ole

Bjørn-Helge Mevik

Apr 17, 2023, 6:24:52 AM
to slurm...@schedmd.com
"Ozeryan, Vladimir" <Vladimir...@jhuapl.edu> writes:

> You should be able to specify both partitions in your sbatch submission script, unless there is some other configuration preventing this.

But Slurm will still only run the job in *one* of the partitions - it
will never "pool" two partitions and let the job run on all nodes. All
nodes of a job must belong to the same partition. (Another thing I
found out recently is that if you specify multiple partitions for an
array job, then all array subjobs will run in the same partition.)

As Ole suggests: creating a "super partition" containing all nodes will
work.

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo


Diego Zuccato

Apr 17, 2023, 9:29:02 AM
to slurm...@lists.schedmd.com
I used to set
SBATCH_PARTITION=list,of,partitions
in /etc/environment.
But it seems to override user choice, so users won't be able to specify
a partition for their jobs :(

Diego
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Ward Poelmans

Apr 17, 2023, 10:26:45 AM
to slurm...@lists.schedmd.com
Hi Xaver,

On 17/04/2023 11:36, Xaver Stiensmeier wrote:

> let's say I want to submit a large batch job that should run on 8 nodes.
> I have two partitions, each holding 4 nodes. Slurm will now tell me that
> "Requested node configuration is not available". However, my desired
> output would be that slurm makes use of both partitions and allocates
> all 8 nodes.


It depends on what you mean here. If you want a single job to be able to use all nodes in both partitions, have a look at heterogeneous jobs: https://slurm.schedmd.com/heterogeneous_jobs.html

If you want a job to be able to start in multiple partitions, this can be done by specifying a list of partitions at job submission. We let the Lua job submit plugin fill in a list of partitions if the user didn't specify one themselves. These partitions have different priorities, so we can redirect jobs to the 'optimal' partition if its resources are available.



if job_desc.partition == nil and job_desc.clusters ~= nil then
    job_desc.partition = "partition1,partition2"
end
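
For completeness, that snippet lives inside the site job_submit.lua, roughly
like this (a sketch only; the partition names are placeholders, and
JobSubmitPlugins=lua must be set in slurm.conf):

   function slurm_job_submit(job_desc, part_list, submit_uid)
       -- only fill in a default partition list if the user gave none
       if job_desc.partition == nil and job_desc.clusters ~= nil then
           job_desc.partition = "partition1,partition2"
       end
       return slurm.SUCCESS
   end

   function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
   end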



Ward

Shaghuf Rahman

May 17, 2023, 10:24:39 AM
to Slurm User Community List
Thanks, Ole, for your input.

I'm looking for the best-fit solution, so I have a quick question related to the slurmctld backup as well.

I tested the read/write speed on our NAS storage and on a local HDD, and it turns out the local HDD is much faster: the r/w speed on the NAS storage is about 250 MB/s, while the local HDD gives about 800-900 MB/s.

1. I have a NAS flash box with an r/w speed of around 300-400 MB/s, so I wanted to know whether this will satisfy the requirements for setting up the slurmctld backup. Will there be any issues or impact?
2. Is it fine to implement it on the NAS storage?
3. What are the prerequisites for setting up the slurmctld backup?

Looking forward to hearing from you,

Thanks,
Shaghuf Rahman