[slurm-users] spreading jobs out across the cluster


Stephen Berg, Code 7309

Jun 14, 2023, 6:50:20 AM
to Slurm User Community List
I'm currently testing a new slurm setup before converting an existing
pbs/torque grid over.  Right now I've got 8 nodes in one partition, 48
cores on each.  There's a second partition of older systems configured
as 4 core nodes so the users can run some serial jobs.

During some testing I've noticed that jobs always seem to take the nodes
in a top down fashion.  If I queue up a bunch of 3 node jobs they take
nodes 1, 2 and 3 for one job, 4,5 and 6 for another. Nodes 7 and 8 never
get used.  I'd like to have slurm spread the jobs out across the nodes
in a round robin fashion or even randomly.  My config is really basic
right now, I'm using defaults for most everything.

Which settings could get the jobs spread out across the nodes in each
partition a bit more fairly?

--
Stephen Berg, IT Specialist, Ocean Sciences Division, Code 7309
Naval Research Laboratory
W: (228) 688-5738
DSN: (312) 823-5738
C: (228) 365-0162
Email: stephe...@nrlssc.navy.mil <- (Preferred contact)
Flank Speed: stephen.p...@us.navy.mil

Loris Bennett

Jun 14, 2023, 7:45:01 AM
to Slurm User Community List
Hi Stephen,

"Stephen Berg, Code 7309" <stephe...@nrlssc.navy.mil> writes:

> I'm currently testing a new slurm setup before converting an existing
> pbs/torque grid over.  Right now I've got 8 nodes in one partition, 48
> cores on each.  There's a second partition of older systems configured
> as 4 core nodes so the users can run some serial jobs.
>
> During some testing I've noticed that jobs always seem to take the
> nodes in a top down fashion.  If I queue up a bunch of 3 node jobs
> they take nodes 1, 2 and 3 for one job, 4,5 and 6 for another. Nodes 7
> and 8 never get used.  I'd like to have slurm spread the jobs out
> across the nodes in a round robin fashion or even randomly.  My config
> is really basic right now, I'm using defaults for most everything.
>
> Which settings could get the jobs spread out across the nodes in each
> partition a bit more fairly?

You can set

LLN=YES

("least loaded nodes") on the partition's configuration line (see 'man
slurm.conf').
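
As a rough sketch of what that might look like for the 8-node, 48-core
partition described above: the node names (node[01-08]) and the partition
name (batch) are made up for illustration, and the other parameters are
placeholders rather than recommendations.

```
# slurm.conf (sketch only; node and partition names are hypothetical)
NodeName=node[01-08] CPUs=48 State=UNKNOWN

# LLN=YES asks the scheduler to prefer the least loaded nodes
# (those with the most idle CPUs) when placing jobs in this partition.
PartitionName=batch Nodes=node[01-08] Default=YES LLN=YES State=UP
```

After editing the file, 'scontrol reconfigure' should pick up the change,
and 'scontrol show partition batch' should then report LLN=YES. If the
same behaviour is wanted in every partition, the man page also describes
CR_LLN as an option to SelectTypeParameters.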

However, this is often not what you want. If you maximise the number
of nodes in use, you won't be able to save energy by powering down nodes
which are not required. What is your use-case for wanting to spread the
jobs out?

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
