[slurm-users] priority access and QoS

Marko Markoc

Feb 22, 2023, 4:52:34 PM
to slurm...@lists.schedmd.com
Hi All,

Currently our environment has only a single default "free" tier of access to our resources, and we are looking to add a higher-priority tier. Jobs from users who have "purchased" a certain number of service units (SUs) would preempt jobs from users in the free tier. I was thinking of using Slurm QoS to achieve this, adding users/groups via sacctmgr to the newly created QoS tier, but I wanted to check with all of you whether there is a better way to accomplish this in Slurm. Also, could GrpTRESMins be used to automatically keep track of SU usage by a given user or group, or is there a better usage-tracking mechanism?

Thank You all,
Marko
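
A minimal sketch of the QoS approach described above, not a tested recipe. The QoS names (free, paid), the account name (paidlab), the priority value, and the budget of 600000 CPU-minutes standing in for purchased service units are all hypothetical. It also assumes slurm.conf sets PreemptType=preempt/qos and an AccountingStorageEnforce value that includes limits and qos, and that free-tier jobs run under the free QoS. The script only prints the sacctmgr commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: QoS names, account name, priority, and budget are hypothetical.
set -eu

# Print each command instead of executing it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_paid_tier() {
    account="$1"       # account that purchased service units
    cpu_minutes="$2"   # purchased budget, expressed as CPU-minutes

    # Create the two tiers; jobs in "paid" may preempt jobs in "free".
    run sacctmgr -i add qos free
    run sacctmgr -i add qos paid
    run sacctmgr -i modify qos where name=paid set Priority=100 Preempt=free

    # Allow the paying account to submit under the paid QoS.
    run sacctmgr -i modify account where name="$account" set QOS+=paid

    # Cap the account's total usage at the purchased budget.
    run sacctmgr -i modify account where name="$account" set GrpTRESMins=cpu="$cpu_minutes"
}

setup_paid_tier paidlab 600000
```

Users in the paying account would then request the tier with --qos=paid, or have it set as their default QoS. Usage counted against GrpTRESMins may decay depending on settings such as PriorityDecayHalfLife, so treating it as a hard SU budget is an assumption worth checking against the documentation for your Slurm version.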

Styrk, Daryl

Feb 27, 2023, 2:07:08 PM
to Slurm User Community List

Marko,

I’m in a similar situation. We have many accounts with dedicated hardware, and we recently had a case where a user with dedicated hardware submitted hundreds of jobs that overflowed onto the community hardware, causing an unexpected backlog. I believe QoS will help us with that as well. I’ve been researching and reading about best practices.

Regards,
Daryl

Jason Simms

Feb 27, 2023, 2:28:59 PM
to Slurm User Community List
Hello all,

I haven't found any guidance that represents the current "better practice," but this does seem to be a common use case, and I imagine there are multiple ways to accomplish it. For example, you could certainly do it with QoS, but you can likely also do it with some other weighting scheme based on, e.g., account. At my last position, I accomplished this with a partition that contained the purchased nodes and permitted only a specific account. That partition had its own PriorityTier setting, and the cluster was configured to preempt based on partition priority. So, even though the same nodes were also in a different partition, if a user in the account requested resources, their job would preempt (if needed) jobs from users not in that account. These are sample configuration lines to illustrate (obviously simplified):

PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

PartitionName=node PriorityTier=50 Nodes=node[01-06]
PartitionName=smithlab AllowAccounts=smithlab PriorityTier=100 Nodes=node06

I never heard from a user that this failed to preempt when necessary, so I presume it works as advertised (in this case, if a user from smithlab ran a job on node06, it would preempt jobs from non-smithlab users if the requested resources were unavailable). Note that the user needs to specify the smithlab account in, e.g., the batch submission file or on the command line, especially if the same username is also associated with a non-smithlab account.
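
As a minimal sketch of what such a submission might look like, the batch script below requests the smithlab account and, as an assumption, also targets the smithlab partition, since the higher PriorityTier is attached to that partition. The job name, resource requests, and the echo line are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical job name and placeholder resource requests.
#SBATCH --job-name=smithlab-demo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Account permitted by AllowAccounts on the smithlab partition.
#SBATCH --account=smithlab
# Assumption: the job targets the PriorityTier=100 partition.
#SBATCH --partition=smithlab

# Slurm sets these variables inside a job; the fallbacks only keep the
# script runnable outside Slurm for a quick check.
summary="account=${SLURM_JOB_ACCOUNT:-unset} partition=${SLURM_JOB_PARTITION:-unset}"
echo "$summary"
```

The same options can be given on the command line instead, e.g. sbatch --account=smithlab --partition=smithlab job.sh, where job.sh is a hypothetical script name.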

If someone can explain why this approach isn't advisable, or if there is a preferred approach, I would welcome feedback.

Warmest regards,
Jason

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
Schedule a meeting: https://calendly.com/jlsimms