[slurm-users] SLURM in K8s, any advice?


Viessmann Hans-Nikolai (PSI)

Nov 14, 2022, 3:43:17 AM11/14/22
to slurm...@lists.schedmd.com, Germann Elsa Sylvia (PSI)
Good Morning,

I'm working on a project at work to run SLURM cluster management components
(slurmctld and slurmdbd) as K8s pods, which manage a cluster of physical compute
nodes. I've come upon a few discussions of doing this (or more generally running
SLURM in containers); I especially found this one
(see https://groups.google.com/g/slurm-users/c/uevFWPHHr2U/m/fkwusc0JDwAJ)
very helpful.

Are there any further details or advice anyone has on such a setup?

Thank you and kind regards,
Hans

---------------------------------------------------------------------------------------------
Paul Scherrer Institut
Hans-Nikolai Viessmann
High Performance Computing & Emerging Technologies
Building/Room: OHSA/D02
Forschungsstrasse 111
5232 Villigen PSI
Switzerland

Telephone: +41 56 310 41 24
E-Mail: hans-nikola...@psi.ch
GPG: 46F7 826E 80E1 EE45 2DCA 1BFC A39B E4B6 EA0C E4C4

Nicolas Greneche

Nov 14, 2022, 7:00:52 AM11/14/22
to slurm...@lists.schedmd.com, christop...@univ-paris13.fr
Hi Hans,

I work on this topic; it is my PhD subject. Here are some links:

A positioning paper from the beginning of my thesis:

https://www.computer.org/csdl/proceedings-article/sbac-pad/2020/992400a281/1o8qfAgSll6

More up to date, a study on the containerization of three major HPC
schedulers (including SLURM):

https://link.springer.com/chapter/10.1007/978-3-031-12597-3_13

And even more up to date (in fact, I'm presenting the paper at 9:15 AM
at SC'22 today), an application to an autoscaling containerized OAR
batch scheduler in the cloud (though it should easily extend to SLURM):

https://sites.google.com/view/supercompcloud

To cut a long story short: you will have to create a Pod containing
slurmctld and munge. Optionally, you may add a Pod with mysql and slurmdbd.
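
As a rough sketch, such a controller Pod could look like the following
(image names, key handling, and volume layout are placeholders, not a
tested manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slurmctld
spec:
  containers:
    - name: munge
      image: registry.example.com/munge:latest   # placeholder image
      volumeMounts:
        - name: munge-key                        # pre-provisioned munge.key
          mountPath: /etc/munge
        - name: munge-socket                     # socket dir shared with slurmctld
          mountPath: /run/munge
    - name: slurmctld
      image: registry.example.com/slurm:latest   # placeholder image
      volumeMounts:
        - name: munge-socket
          mountPath: /run/munge
  volumes:
    - name: munge-key
      secret:
        secretName: munge-key
    - name: munge-socket
      emptyDir: {}
```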

Then you have Pods containing slurmd and munge containers. My advice is
to use configless mode for slurmd: it avoids having to distribute and
synchronize slurm.conf. The drawback is that you still have to configure
munge, but it's a good trade-off: the munge key is stable over time,
while slurm.conf can change (and would otherwise have to be redistributed).
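
Configless mode boils down to one parameter on the controller side and
one flag on each slurmd; the service name below is a placeholder for the
controller's k8s Service:

```shell
# In the controller's slurm.conf, enable configless serving:
#   SlurmctldParameters=enable_configless
#
# Each compute pod then starts slurmd with no local slurm.conf,
# fetching the configuration from the controller instead:
slurmd --conf-server slurmctld-svc:6817
```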

Beware of the fact that slurmctld must be restarted for major topology
modifications (this may have been fixed in more recent releases). My
advice is to run slurmctld as a child of a supervising process (djb
daemontools is perfect for this: https://cr.yp.to/daemontools.html).
This way you can restart slurmctld without losing job state.
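
With daemontools, that amounts to a small run script in the service
directory, e.g. /service/slurmctld/run (the path is illustrative);
supervise re-executes it whenever slurmctld exits:

```shell
#!/bin/sh
# daemontools "run" script: supervise executes this and restarts
# slurmctld automatically whenever the process exits.
exec 2>&1                       # merge stderr into the log pipeline
exec /usr/sbin/slurmctld -D     # -D keeps slurmctld in the foreground
```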

Be extra careful with network name resolution. There is a delay between
a pod's creation and its name becoming resolvable. You may use
initContainers to wait until resolution succeeds before starting the
main container of the pod.
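
A minimal sketch of such an initContainer command, assuming getent is
available in the image (the function name and retry counts are made up):

```shell
# Poll until a hostname resolves, then return 0 so the main
# container can start; give up after a fixed number of tries.
wait_for_dns() {
    host="$1"
    tries="${2:-30}"
    i=0
    until getent hosts "$host" > /dev/null 2>&1; do
        i=$((i + 1))
        if [ "$i" -ge "$tries" ]; then
            echo "no DNS for $host after $tries attempts" >&2
            return 1
        fi
        sleep 2
    done
    echo "resolved $host"
}

# e.g. as the initContainer command: wait_for_dns slurmctld-svc
```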

Feel free to reach out to me if you want to go into further detail!

Best Regards,

Urban Borštnik

Nov 16, 2022, 10:46:30 AM11/16/22
to slurm...@lists.schedmd.com

Hi Hans,


We run Slurm in k8s at ETH Zurich to manage physical compute nodes. The link you included and Nicolas's follow-up already cover the basics.


We build several Docker containers based on CentOS 7 (for now) with Slurm compiled from source for the following services:

  • slurmdbd
  • slurmctld
  • slurmd (used for testing as “containerized nodes”)
All these containers include an sssd daemon that interfaces with the central LDAP, though we are looking at ways to streamline this part.

We use several helper containers, such as mariadb, a prometheus exporter, a file server for the code and configuration (used to transfer these to the physical nodes), and a controller that configures users, accounts, QOS, … into Slurm.


PVCs hosted on an NFS appliance provide data persistence.


A Helm chart is used to deploy to a local test k8s instance, a test/staging cluster, and the production cluster. The chart and containers are site-specific, but I am happy to share the relevant code & config with you if you contact me by PM.
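
Deployment with such a chart typically reduces to one command per environment; the chart path, release name, and values file below are hypothetical:

```shell
# Install the release if absent, otherwise upgrade it in place;
# per-environment settings live in the values file.
helm upgrade --install slurm ./charts/slurm -f values-staging.yaml
```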


With kind regards,

Urban

-- 
ETH Zurich, Dr. Urban Borštnik
High Performance Computing, Scientific IT Services
OCT G35, Binzmühlestrasse 130, 8092 Zurich, Switzerland
Phone +41 44 632 3512, http://www.id.ethz.ch/
urban.b...@id.ethz.ch

Viessmann Hans-Nikolai (PSI)

Nov 17, 2022, 7:33:12 AM11/17/22
to slurm...@lists.schedmd.com
Hi Nicolas and Urban,

Thank you for your replies!

Kind regards,
Hans

