Hi,
We'd like to have just one of the partitions oversubscribe its nodes.
Those nodes are not shared with any other partition.
The SLURM documentation
(https://slurm.schedmd.com/cons_res_share.html)
seems to indicate that the least-loaded algorithm is always used when
oversubscribe=force. I believe oversubscribe=force is what we want, except
that we need it to pack each node fully first.
Thanks for pointing out the -m option. Our jobs are each submitted with a
separate sbatch, so unfortunately I don't see how we can use it in this case.
What we want to be able to do is run 8 (or 12) jobs on, say, a 4-core
node, but only on the nodes in this one partition. The other
partitions should continue to run N jobs on an N-core node.
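
To make the numbers concrete, here is a rough Python sketch of the difference. It is a toy model and not Slurm's actual scheduling code: the function names, the 16-job workload and the 10-node partition limit are all made up for illustration, and "least-loaded" is simplified to "send each job to the node with the fewest jobs".

```python
import math


def nodes_used_pack_first(num_jobs, cores_per_node, oversubscribe_factor):
    """Nodes needed if each node is filled to its oversubscription limit
    before another node is started (the behavior we would like)."""
    slots_per_node = cores_per_node * oversubscribe_factor
    return math.ceil(num_jobs / slots_per_node)


def nodes_used_least_loaded(num_jobs, cores_per_node, oversubscribe_factor, max_nodes):
    """Nodes started if every job goes to the node that currently has the
    fewest jobs (a simplified stand-in for a least-loaded policy)."""
    slots_per_node = cores_per_node * oversubscribe_factor
    if num_jobs > slots_per_node * max_nodes:
        raise ValueError("more jobs than the partition can hold")
    loads = [0] * max_nodes
    for _ in range(num_jobs):
        # Pick the first node with the smallest current job count.
        target = loads.index(min(loads))
        loads[target] += 1
    return sum(1 for load in loads if load > 0)


# Hypothetical workload: 16 single-core jobs, 4-core nodes, 2 jobs per core,
# and a partition that may grow to 10 nodes.
packed = nodes_used_pack_first(16, cores_per_node=4, oversubscribe_factor=2)
spread = nodes_used_least_loaded(16, cores_per_node=4, oversubscribe_factor=2, max_nodes=10)
print(packed, spread)
```

In this toy model the pack-first policy needs 2 nodes for the 16 jobs, while the least-loaded policy starts all 10, which is the cost difference we are trying to avoid in the cloud.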
Herc
> <html style="direction: ltr;"> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
> <style id="bidiui-paragraph-margins" type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>
> </head>
> <body bidimailui-charset-is-forced="true" style="direction: ltr;">
> <p>I could be missing something here, but if you refer to the <b>SelectTypeParameters=cr_lln
> </b>you could just try cr_pack_nodes.</p>
> <p><a class="moz-txt-link-freetext" href="
https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes">
https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Pack_Nodes</a><br>
> </p> <p><br> </p>
> <p>If you want it on a per-partition configuration, I'm not sure
> that's possible, you might need to set a distribution (-m) in your
> job submit script/wrapper (E.g., -m block:*:*,pack)</p>
> <p><a class="moz-txt-link-freetext" href="
https://slurm.schedmd.com/sbatch.html#OPT_distribution">
https://slurm.schedmd.com/sbatch.html#OPT_distribution</a><br>
> </p> <p><br> </p>
> <p>If you're referring to something else entirely, could you
> elaborate on the least-loaded configuration in your setup?</p>
> <p><br> </p> <p><br> <b></b></p>
> <div class="moz-cite-prefix">On 24/02/2022 23:35:30, Herc
> Silverstein wrote:<br> </div> <blockquote type="cite"
> cite="
mid:3145b0e8-6ae0-f233...@schrodinger.com">
> <meta http-equiv="content-type" content="text/html; charset=UTF-8">
> <p>Hi,</p>
> <p>We would like to do over-subscription on a cluster that's
> running in the cloud. The cluster dynamically spins up and down
> cpu nodes as needed. What we see is that the least-loaded
> algorithm causes the maximum number of nodes specified in the
> partition to be spun up and each loaded with N jobs for the N
> cpu's in a node before it "doubles back" and starts
> over-subscribing.</p>
> <p>What we actually want is for the <i>minimum </i>number of
> nodes to be used and for it to fully load (to the limit of the
> oversubscription setting) one node before starting up another.
> That is, we really want a "most-loaded" algorithm. This would
> allow us to reduce the number of nodes we need to run and reduce
> costs.</p>
> <p>Is there a way to get this behavior somehow?</p>
> <p>Herc</p> <p><br> </p> <p><br> </p>
> </blockquote> <pre class="moz-signature" cols="72">-- Regards,
> Daniel Letai +972 (0)505 870 456</pre> </body> </html>