[slurm-users] Guarantee minimum amount of GPU resources to a Slurm account


Stephan Roth

Sep 12, 2023, 10:15:13 AM
to slurm...@lists.schedmd.com
Dear Slurm users,

I'm looking to guarantee the availability of GPU resources to a Slurm
account, while still allowing that account to use other available GPU
resources as well.

The guaranteed GPU resources should cover at least one type, optionally
up to three types, as in:
Gres=gpu:type_1:N,gpu:type_2:P,gpu:type_3:Q

The version of Slurm I'm using is 20.11.9.


Ideas I came up with so far:

Placing a reservation seems like the simplest solution, but it forces
users of the account to decide whether to submit their jobs inside or
outside the reservation, based on a manual check of the GPU resources
currently available in the cluster.

Changing the partition setup by moving nodes into a new partition for
the exclusive use of the account is an overhead I'd like to avoid, as
this is a time-limited scenario, even though it looks like a workable
solution when combined with a job_submit.lua extension that prioritizes
partitions for users of said account.


I haven't looked at QOS yet; I'm hoping for a shortcut from anyone who
already has a working solution to my problem.

If you have such a solution, would you mind sharing it?

Thanks,
Stephan

Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)

Sep 12, 2023, 10:28:36 AM
to Slurm User Community List
Is this what you want?
Magnetic Reservations 

The default behavior for reservations is that jobs must request a reservation in order to run in it. The MAGNETIC flag allows you to create a reservation that will allow jobs to run in it without requiring that they specify the name of the reservation. The reservation will only "attract" jobs that meet the access control requirements.

(from https://slurm.schedmd.com/reservations.html)
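
A minimal sketch of creating such a reservation (untested; the
reservation, account, and node names are placeholders):

  # Reserve four GPU nodes for the account; MAGNETIC lets its jobs
  # land in the reservation without naming it explicitly.
  scontrol create reservation ReservationName=gpu_guarantee \
      StartTime=now Duration=UNLIMITED \
      Accounts=gpu_account Nodes=gpu-node[01-04] \
      Flags=MAGNETIC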

On Sep 12, 2023, at 10:14 AM, Stephan Roth <stepha...@ee.ethz.ch> wrote:

Dear Slurm users,

I'm looking to fulfill the requirement of guaranteeing availability of GPU resources to a Slurm account, while allowing this account to use other available GPU resources as well.

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil


Stephan Roth

Sep 12, 2023, 11:23:36 AM
to slurm...@lists.schedmd.com
Thanks Noam, this looks promising!

I'll have to test whether a job allowed to use such a reservation will
run outside of it when the reservation's resources are all occupied, or
whether it will queue up waiting to run inside the reservation.


On 12.09.23 16:28, Bernstein, Noam CIV USN NRL (6393) Washington DC
(USA) wrote:
> Is this what you want?
>
> Magnetic Reservations
>
> The default behavior for reservations is that jobs must request a
> reservation in order to run in it. The MAGNETIC flag allows you to
> create a reservation that will allow jobs to run in it without
> requiring that they specify the name of the reservation. The
> reservation will only "attract" jobs that meet the access control
> requirements.
>
>
> (from https://slurm.schedmd.com/reservations.html
> <https://slurm.schedmd.com/reservations.html>)
>
>> On Sep 12, 2023, at 10:14 AM, Stephan Roth <stepha...@ee.ethz.ch

Chris Samuel

Sep 12, 2023, 8:25:12 PM
to slurm...@lists.schedmd.com
On 12/9/23 9:22 am, Stephan Roth wrote:

> Thanks Noam, this looks promising!

I would suggest that as well as the "magnetic" flag you may want the
"flex" flag on the reservation too, in order to let jobs that match it
run on GPUs outside of the reservation.
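
Roughly (untested, reusing the placeholder reservation name from
earlier in the thread):

  # FLEX allows jobs matching the reservation to also use
  # resources outside of it.
  scontrol update ReservationName=gpu_guarantee Flags=MAGNETIC,FLEX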

All the best,
Chris

Stephan Roth

Sep 13, 2023, 2:37:51 AM
to slurm...@lists.schedmd.com
Thanks Chris, this completes what I was looking for.

Should have had a better look at the scontrol man page.

Best,
Stephan

Markus Kötter

Sep 13, 2023, 3:09:18 AM
to slurm...@lists.schedmd.com
Hi,


Currently, reservations do not work for GRES:

https://bugs.schedmd.com/show_bug.cgi?id=5771

23.11 might change this.


Best regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security

Stephan Roth

Sep 13, 2023, 4:11:37 AM
to slurm...@lists.schedmd.com
Markus, thanks for the heads-up.

I intend to either reserve specific nodes with GPUs or use features.
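
Roughly one of these (untested; node names and the feature tag are
placeholders, and feature-based reservations may behave differently
across Slurm versions):

  # Variant 1: pin the reservation to specific GPU nodes
  scontrol update ReservationName=gpu_guarantee Nodes=gpu-node[01-02]

  # Variant 2: ask for a node count plus a node feature instead
  scontrol update ReservationName=gpu_guarantee NodeCnt=2 Features=a100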

Best,
Stephan

Loris Bennett

Sep 13, 2023, 4:40:39 AM
to Slurm User Community List
We have a dedicated partition for some GPU nodes which belong to an
individual PI and only members of the PI's group can use the partition.
However, the nodes are also members of a 'scavenger' partition, which
can be used by anyone, albeit with certain restrictions, such as a
shorter maximum run-time.
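
For concreteness, the setup looks roughly like this in slurm.conf
(node and partition names invented here, restrictions simplified):

  NodeName=gpu-pi[01-04] Gres=gpu:type_1:4
  # Dedicated partition, restricted to the PI's account
  PartitionName=pi_gpu Nodes=gpu-pi[01-04] AllowAccounts=pi_account MaxTime=7-00:00:00 PriorityTier=2
  # Scavenger partition over the same nodes, open to all, with a
  # shorter maximum run-time
  PartitionName=scavenger Nodes=gpu-pi[01-04] MaxTime=12:00:00 PriorityTier=1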

What are the pros and cons of the reservation approach compared with the
above partition-based approach?

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
