[slurm-users] random allocation of resources

Benjamin Nacar

Dec 1, 2021, 2:08:04 PM
to slurm...@lists.schedmd.com
Hi,

Is there a scheduling option such that, when there are multiple nodes
that are equivalent in terms of available and allocated resources, Slurm
would select randomly from among those nodes?

I've noticed that if no other jobs are running, and I submit a single
job via srun, with no parameters to specify anything other than the
defaults, the job *always* runs on the first node in slurm.conf. This
seems like it would lead to some hosts getting overused and others
getting underused. I'd like the stress on our hardware to be reasonably
evenly distributed.

Thanks,
~~ bnacar

--
Benjamin Nacar
Systems Programmer
Computer Science Department
Brown University
401.863.7621

Guillaume COCHARD

Dec 1, 2021, 2:18:55 PM
to Slurm User Community List
Hello,

I think you are looking for the LLN option (Least Loaded Nodes): https://slurm.schedmd.com/slurm.conf.html#OPT_LLN
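For example, the partition definition in slurm.conf would look something like this (the partition name, node list and other options here are only placeholders for your own setup):

    PartitionName=batch Nodes=node[01-04] Default=YES State=UP LLN=YES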

Guillaume

----- Original Message -----
From: "Benjamin Nacar" <benjami...@brown.edu>
To: slurm...@lists.schedmd.com
Sent: Wednesday, December 1, 2021 20:07:23
Subject: [slurm-users] random allocation of resources

mercan

Dec 1, 2021, 3:06:25 PM
to Slurm User Community List, Benjamin Nacar
Hi;

Slurm selects nodes according to each node's weight parameter. I don't
know of any setting that changes the selection other than adjusting
those weights, and that isn't really suited to picking nodes at random.

That said, I'm not aware of any real need to select nodes randomly;
modern hardware is not as delicate as you might think. Nevertheless, if
you want to rotate usage across the nodes, you can change the node
weights periodically, for example every three months.
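
For example, the relevant node lines in slurm.conf would look something
like this (node names, CPU counts and weights are only placeholders; a
lower weight makes Slurm prefer that node):

    NodeName=node01 CPUs=16 Weight=1
    NodeName=node02 CPUs=16 Weight=2
    NodeName=node03 CPUs=16 Weight=3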

Regards,

Ahmet M.



On 1.12.2021 22:07, Benjamin Nacar wrote:

Benjamin Nacar

Dec 1, 2021, 3:07:13 PM
to Slurm User Community List
Based on some quick experiments, that doesn't do what I'm looking for. I
set LLN=YES for the default partition and ran my test job several times,
waiting each time for it to finish before submitting it again (so that
all compute nodes were idle), and it still ended up on the same (first
in the file) node every time.

(The documentation is ambiguous on this, but my reading of LLN is that
it measures "least loaded" according to how many CPUs Slurm itself has
allocated, not by the actual load average according to "uptime" or some
other reporting tool. Experiments seem to bear this out - I was watching
and comparing the load average on the different available compute nodes
in between running my test jobs.)

~~ bnacar

Brian Andrus

Dec 1, 2021, 6:28:08 PM
to slurm...@lists.schedmd.com
That would make sense, as Slurm is not aware of anything outside itself.
Slurmd does not report any ongoing status of resources; it is slurmctld
that keeps track of what it has allocated.

If you truly want something like this, you could have a wrapper script
look at available nodes, pick a random one and set the job to use that node.
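
Something along these lines could work as a starting point (an untested
sketch, not an official tool; it assumes sinfo and srun are in PATH and
treats "idle" nodes as the pool to choose from):

    #!/usr/bin/env python3
    # random_node_srun.py (hypothetical name): pick a random idle node
    # and forward the user's srun arguments to it via --nodelist.
    import random
    import subprocess
    import sys

    # One node name per line: -N is node-oriented output, -h drops the
    # header, -t idle keeps only idle nodes.
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-t", "idle", "-o", "%N"],
        capture_output=True, text=True, check=True,
    ).stdout
    nodes = sorted(set(filter(None, out.splitlines())))
    if not nodes:
        sys.exit("no idle nodes available")

    node = random.choice(nodes)
    # Pass the caller's arguments through unchanged, pinned to the chosen node.
    subprocess.run(["srun", "--nodelist=" + node, *sys.argv[1:]], check=False)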

Brian Andrus

Christopher Samuel

Dec 1, 2021, 6:43:30 PM
to slurm...@lists.schedmd.com
On 12/1/21 3:27 pm, Brian Andrus wrote:

> If you truly want something like this, you could have a wrapper script
> look at available nodes, pick a random one and set the job to use that
> node.

Alternatively you could have a cron job that adjusts the nodes' `Weight`
values periodically to change which ones Slurm will prefer to use over
time (everything else being equal, Slurm picks the nodes with the lowest
weight).
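
A rough sketch of such a cron job (untested; the node names are
placeholders, and weights set with scontrol may not survive a slurmctld
restart, so you may also want to keep slurm.conf in sync):

    #!/usr/bin/env python3
    # rotate_weights.py (hypothetical name): shift node weights once a day
    # so a different node becomes the "cheapest" one over time.
    import subprocess
    import time

    nodes = ["node01", "node02", "node03", "node04"]  # placeholder node names
    shift = int(time.time() // 86400) % len(nodes)    # advances by one each day

    for i, node in enumerate(nodes):
        weight = ((i + shift) % len(nodes)) + 1       # lower weight = preferred
        subprocess.run(
            ["scontrol", "update", f"NodeName={node}", f"Weight={weight}"],
            check=True,
        )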

Hope this helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
