[slurm-users] Assistance with Node Restrictions and Priority for Users in Floating Partition


Manisha Yadav via slurm-users

Jan 27, 2025, 8:33:36 AM
to slurm...@lists.schedmd.com, Pankaj Dorlikar
Dear Team,

I have a scenario where I need to give multiple users from different projects priority access to just 3 nodes. At any given time, only 3 nodes may be in use in that partition; if one user is occupying all 3 nodes, other users' jobs submitted to that partition should remain in the queue.

To achieve this, I attempted to use a QoS: I created a floating partition with some of the nodes and configured a QoS with priority. I also set a limit of GrpTRES=gres/gpu=24, since each node has 8 GPUs and there are 3 nodes in total. I then attached the QoS to the partition and assigned it to the users who need access.
I also tried MaxTRES=gres/gpu=24.

While this setup works as expected for CPUs in the testing environment, it is not functioning as intended in production and is not effectively restricting node usage in the partition.
Could anyone provide suggestions or guidance on how to properly implement node restrictions along with priority?

Thank you for your assistance.

Best regards,
Manisha Yadav


Bjørn-Helge Mevik via slurm-users

Jan 28, 2025, 2:16:17 AM
to slurm...@schedmd.com
Manisha Yadav via slurm-users <slurm...@lists.schedmd.com> writes:

> To achieve this, I attempted to use QoS by creating a floating
> partition with some of the nodes and configuring a QoS with
> priority. I also set a limit with GrpTRES=gres/gpu=24, given that each
> node has 8 GPUs, and there are 3 nodes in total.

If there are more nodes with GPUs, this will not prevent these users
from getting GPUs on more than 3 nodes; it will only prevent them from
getting more than 24 GPUs. It will not prevent them from running
CPU-only jobs on other nodes either. I think using
GrpTRES=gres/gpu=24,node=3 (or perhaps simply GrpTRES=node=3) should
work.
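
For instance, something along these lines (a sketch; "test" is just an
example QoS name):

  sacctmgr modify qos test set GrpTRES=gres/gpu=24,node=3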

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo


Manisha Yadav via slurm-users

Jan 31, 2025, 5:34:06 AM
to Bjørn-Helge Mevik, slurm...@schedmd.com
Hi,

Thanks for your valuable reply! Based on your input, I made the following changes to the system configuration:
Created a new QoS:
Priority: 200
Restriction: 3 nodes, 24 GPUs

Here are the commands I used:
sacctmgr add qos test
sacctmgr modify qos test set priority=200
sacctmgr modify qos test set GrpTRES=cpu=24
sacctmgr modify qos test set GrpTRES=gres/gpu=24,node=3

Attached the QoS to users from different groups as their default QoS.
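
For reference, the default QoS was attached roughly like this (the
username "alice" is a placeholder):

  sacctmgr modify user alice set QOS+=test DefaultQOS=test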

Created a floating partition with all the nodes from the default partition and attached the same QoS to this partition. The configuration is as follows:

PartitionName=testingp MaxTime=7-0:00:00 DefaultTime=01:00:00 AllowQos=test State=UP Nodes=node1,node2,node3,node4,node5,node5 DefCpuPerGPU=16 MaxCPUsPerNode=192

However, when the users submit their jobs to the testingp partition, they are not receiving the expected priority. Their jobs are stuck in the queue and are not being allocated resources, while users without any priority are able to get resources on the default partition.

Could you please confirm if my setup is correct, or if any modifications are required on my end?
My Slurm version is 21.08.6.

--
Regards,
Manisha Yadav



Bjørn-Helge Mevik via slurm-users

Feb 3, 2025, 3:16:44 AM
to slurm...@schedmd.com
Manisha Yadav <mani...@cdac.in> writes:

> Could you please confirm if my setup is correct, or if any modifications are required on my end?

I don't see anything wrong with the part of the setup that you've shown.

Have you checked with `sprio -l -j <jobids>` whether the jobs get the
extra qos priority? If not, perhaps the multifactor priority plugin
isn't in use, or the qos weight is zero. See, e.g.,
https://slurm.schedmd.com/qos.html
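
Concretely, something like (the job id is just an example):

  sprio -l -j 12345
  scontrol show config | grep -E 'PriorityType|PriorityWeightQOS'

PriorityType should be priority/multifactor, and PriorityWeightQOS
should be non-zero for the qos priority to have any effect.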

> My slurm version is slurm 21.08.6

Oh, that is old. I'd seriously consider upgrading. For instance, this
is too old to get security patches.

--
B/H

Manisha Yadav via slurm-users

Mar 12, 2025, 5:53:26 AM
to Bjørn-Helge Mevik, slurm...@schedmd.com
Hi,

I am encountering the following error while configuring the burst buffer:

root@test-vm1:/opt/slurm-21.08.8/etc# /opt/slurm-21.08.8/sbin/slurmctld -D
slurmctld: Job accounting information stored, but details not gathered
slurmctld: slurmctld version 21.08.8 started on cluster sddg-cluster
slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
slurmctld: No memory enforcing mechanism configured.
slurmctld: Recovered state of 2 nodes
slurmctld: Recovered information about 0 jobs
slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
slurmctld: error: Couldn't find the specified plugin name for burst_buffer/lua looking at all files
slurmctld: error: cannot find burst_buffer plugin for burst_buffer/lua
slurmctld: error: cannot create burst_buffer context for burst_buffer/lua
slurmctld: Recovered state of 0 reservations
slurmctld: read_slurm_conf: backup_controller not specified
slurmctld: select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
slurmctld: Running as primary controller
slurmctld: error: Couldn't find the specified plugin name for burst_buffer/lua looking at all files
slurmctld: error: cannot find burst_buffer plugin for burst_buffer/lua
slurmctld: error: cannot create burst_buffer context for burst_buffer/lua
slurmctld: fatal: failed to initialize burst buffer plugin


Here are the details related to the burst buffer configuration:

root@test-vm1:/opt/slurm-21.08.8/etc# pwd
/opt/slurm-21.08.8/etc

root@test-vm1:/opt/slurm-21.08.8/etc# ls burst_buffer.lua
burst_buffer.lua

root@test-vm1:/opt/slurm-21.08.8/etc# cat slurm.conf | grep BurstBufferType=burst_buffer/lua
BurstBufferType=burst_buffer/lua


Could you please advise on how to resolve this issue?

Thank you!

Laura Hild via slurm-users

Mar 12, 2025, 2:15:34 PM
to Manisha Yadav, Bjørn-Helge Mevik, slurm...@schedmd.com
Hi Manisha. Does your Slurm build/installation have burst_buffer_lua.so?
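
A quick way to check (the path is a guess based on your install
prefix; PluginDir defaults to <prefix>/lib/slurm unless overridden in
slurm.conf):

  ls /opt/slurm-21.08.8/lib/slurm/ | grep burst_buffer

If burst_buffer_lua.so isn't there, the build most likely didn't find
the Lua development libraries; installing them (e.g. liblua5.3-dev or
lua-devel, depending on distro) and rebuilding Slurm should produce
the plugin.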

Manisha Yadav via slurm-users

Aug 28, 2025, 1:58:03 AM
to Bjørn-Helge Mevik, Bjørn-Helge Mevik via slurm-users
Hi Bjørn-Helge,

I would like to know if it is possible to configure Slurm job notifications via SMS.

___
Regards,
Manisha Yadav


Benson Muite via slurm-users

Aug 28, 2025, 2:11:08 AM
to Manisha Yadav, Bjørn-Helge Mevik, Bjørn-Helge Mevik via slurm-users


On Thu, Aug 28, 2025, at 8:54 AM, Manisha Yadav via slurm-users wrote:
> Hi Bjørn-Helge,
>
> I would like to know if it is possible to configure Slurm job
> notifications via SMS.

It may be easier to integrate an app, for example using XMPP or other
chat protocols/services that have efficient mobile applications.
SMS integrations are also possible, though you would likely want to
go through a specific bulk SMS provider that has good rates and
reliable delivery for the users concerned.

>
> ___
> Regards,
> Manisha Yadav

Bjørn-Helge Mevik via slurm-users

Aug 28, 2025, 2:59:26 AM
to slurm...@schedmd.com
Manisha Yadav via slurm-users <slurm...@lists.schedmd.com> writes:

> Hi Bjørn-Helge,
>
> I would like to know if it is possible to configure Slurm job notifications via SMS.

It should be possible. Slurm itself only calls a program/script
(MailProg in slurm.conf, default "/bin/mail") when it is supposed to
send mail, and you can write your own script that does whatever you
want. For instance, on two of our clusters we have written a script
that contacts a locally developed REST API for sending messages to
users.
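
A minimal sketch of what such a script could look like (the gateway
URL, its parameters, and the phone lookup are entirely made up;
substitute your provider's API):

  #!/bin/bash
  # Slurm invokes MailProg roughly as: mailprog -s "<subject>" <recipient>,
  # with the message body on stdin, so:
  subject="$2"      # e.g. "Slurm Job_id=123 Name=myjob Ended, ..."
  recipient="$3"    # the address given with --mail-user
  phone=$(lookup_phone "$recipient")   # site-specific lookup; hypothetical helper
  curl -s "https://sms.example.com/send" \
       --data-urlencode "to=$phone" \
       --data-urlencode "text=$subject"

Then point MailProg in slurm.conf at the script.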

--
B/H