[slurm-users] Ideal NFS exported StateSaveLocation size.

Richard Chang

Oct 24, 2022, 12:13:19 AM
to slurm...@lists.schedmd.com
Hi,

Is there a rule of thumb for the size of an NFS-exported directory
that will be used as the StateSaveLocation?

I have a two-node slurmctld setup, and both nodes will mount an
NFS-exported directory as the state save location.

Let me know your thoughts.

Thanks & regards,

RC




Greg Wickham

Oct 24, 2022, 1:14:09 AM
to Slurm User Community List

Hi Richard,

We have just over 400 nodes, and the StateSaveLocation directory holds ~600 MB of data.

The SlurmdSpoolDir share has about 17 GB in use across the nodes, but that also includes the logs for each node (without the log files it's < 1 GB).

   -Greg
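
For anyone who wants a comparable figure from their own cluster, here is a minimal sketch that sums the file sizes under a directory, roughly what `du` reports as apparent size. The function name is made up for illustration. The directory to measure is site-specific; `scontrol show config` reports the configured StateSaveLocation.

```python
import os
import stat


def directory_size_bytes(root):
    """Sum the apparent sizes of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                # State files come and go while slurmctld runs; skip any that vanish.
                continue
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total
```

Calling `directory_size_bytes(path) / 1e6` on the StateSaveLocation gives a number in MB to set against the ~600 MB quoted above. It has to run as a user that can read the directory.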

Ole Holm Nielsen

Oct 24, 2022, 3:32:40 AM
to slurm...@lists.schedmd.com
On 10/24/22 06:12, Richard Chang wrote:
> Is there a thumb rule for the size of the directory that is NFS exported,
> and to be used as StateSaveLocation.
>
> I have a two node Slurmctld setup and both will mount an NFS exported
> directory as the state save location.

It is definitely a BAD idea to put Slurm's StateSaveLocation on a slow NFS
directory! SchedMD recommends using local NVMe or SSD disks, because this
file system will see many I/O operations per second (IOPS)!

I recommend reading "Field Notes 6: From The Frontlines of Slurm
Support" by Jason Booth, SchedMD, available from
https://slurm.schedmd.com/publications.html. See the Hardware section on
pages 18-20, which recommends:

Fast path to the StateSaveLocation
■ IOPS this filesystem can sustain is a major bottleneck to job throughput
● At least 2 directories and two files created per job
● The corresponding unlink() calls will add to the load

/Ole
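
To get a rough feel for whether a candidate file system can keep up with the per-job pattern quoted above (two directories and two files created, then unlinked), a small timing loop can help. This is only a sketch under assumptions: the layout (one file per directory), the 1 KiB payload, the fsync on each file, and the function names are all made up for illustration and are not what slurmctld actually does internally.

```python
import os
import tempfile
import time


def per_job_cycle(base, job_id):
    """Create two directories and two files under base, then remove them again."""
    dirs = [os.path.join(base, f"job.{job_id}.{i}") for i in range(2)]
    files = []
    for d in dirs:
        os.mkdir(d)
    for d in dirs:
        path = os.path.join(d, "state")
        with open(path, "wb") as fh:
            fh.write(b"x" * 1024)      # assumed payload size
            fh.flush()
            os.fsync(fh.fileno())      # force the write through to storage
        files.append(path)
    for path in files:
        os.unlink(path)
    for d in dirs:
        os.rmdir(d)


def cycles_per_second(target_dir, cycles=200):
    """Time repeated create/remove cycles in a scratch directory under target_dir."""
    with tempfile.TemporaryDirectory(dir=target_dir) as base:
        start = time.perf_counter()
        for job_id in range(cycles):
            per_job_cycle(base, job_id)
        elapsed = time.perf_counter() - start
    return cycles / elapsed
```

Running `cycles_per_second()` once against a directory on local SSD and once against a directory on the NFS mount gives two numbers to compare. The result is only a relative indicator, not a prediction of real job throughput.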

Ward Poelmans

Oct 24, 2022, 3:57:57 AM
to slurm...@lists.schedmd.com
So what is the recommended way if you want to have HA with slurmctld?

Ward

Diego Zuccato

Oct 24, 2022, 3:58:13 AM
to slurm...@lists.schedmd.com
Il 24/10/2022 09:32, Ole Holm Nielsen ha scritto:

> It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
> NFS directory! SchedMD recommends to use local NVME or SSD disks
> because there will be many IOPS to this file system!

IIUC it does have to be shared between controllers, right?

Possibly use an NVMe-backed (or, even better, NVDIMM-backed) NFS share. Or
a replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid :)

Diego

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Ole Holm Nielsen

Oct 24, 2022, 4:14:35 AM
to slurm...@lists.schedmd.com
On 10/24/22 09:57, Diego Zuccato wrote:
> Il 24/10/2022 09:32, Ole Holm Nielsen ha scritto:
>
> > It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
> > NFS directory!  SchedMD recommends to use local NVME or SSD disks
> > because there will be many IOPS to this file system!
>
> IIUC it does have to be shared between controllers, right?
>
> Possibly use NVME-backed (or even better NVDIMM-backed) NFS share. Or
> replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid  :)

IOPS is the key parameter! Local NVMe or SSD should beat any networked
storage. The original question refers to having StateSaveLocation on a
standard (slow) NFS drive, AFAICT.

I don't know how many people prefer to run two slurmctld hosts (primary
and backup). I certainly don't. Slurm does have a configurable
SlurmctldTimeout parameter so that you can reboot the server quickly when
needed.

It would be nice if people with experience in HA storage for slurmctld
could comment.

/Ole

Paul Edmon

Oct 24, 2022, 9:37:56 AM
to slurm...@lists.schedmd.com
HA for slurmctld is not multi-datacenter HA but rather a traditional HA
setup where you have two server heads off of one storage brick
(connected by SAS cables or another fast interconnect). Multi-datacenter
HA has issues with keeping things in sync due to latency and IOPS (as
noted earlier in the thread).

So the HA setup for slurmctld will protect you from the server hosting
the slurmctld getting hosed, not the entire rack going down or the
datacenter going down.

-Paul Edmon-
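
In slurm.conf terms, a primary/backup pair is expressed by listing SlurmctldHost more than once (the first entry is the primary), and both hosts must see the same StateSaveLocation. The sketch below pulls those two settings out of a slurm.conf text so they can be checked by eye. The host names and path in the example are hypothetical, and the parser is deliberately simplified: it assumes one `Key=Value` per line.

```python
def controller_settings(conf_text):
    """Collect SlurmctldHost entries and StateSaveLocation from slurm.conf text."""
    hosts, state_dir = [], None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()             # parameter names are case-insensitive
        value = value.strip()
        if key == "slurmctldhost":
            hosts.append(value)
        elif key == "statesavelocation":
            state_dir = value
    return hosts, state_dir


# Hypothetical host names and path, for illustration only.
EXAMPLE = """\
SlurmctldHost=ctl1
SlurmctldHost=ctl2
StateSaveLocation=/shared/slurmctld
"""

hosts, state_dir = controller_settings(EXAMPLE)
print("controllers (primary first):", hosts)
print("shared state directory:", state_dir)
```

With two hosts listed, the directory reported as StateSaveLocation is the one that has to live on the shared storage described above.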

Brian Andrus

Oct 24, 2022, 10:21:21 AM
to slurm...@lists.schedmd.com

FWIW, I have used NFS/Gluster/Lustre for a StateSaveLocation at various
times on various clusters.

I have never had an issue with any of them, and I run clusters of up to
1000+ nodes. I have even used the same share to symlink all the
nodes' slurm.conf with no issue.

Of course, YMMV, but if you aren't having excessive traffic to the
share, you should be good. I have yet to discover what would be
excessive enough to impact things.

The only use I have had for the HA is being able to keep the cluster
running/happy during maintenance.

Brian Andrus