[slurm-users] QOS time limit tighter than partition limit

Ross Dickson

Dec 16, 2021, 5:59:09 PM
to slurm...@schedmd.com
I would like to impose a time limit stricter than the partition limit on a certain subset of users. I should be able to do this with a QOS, but I can't get it to work. What am I missing?

At https://slurm.schedmd.com/resource_limits.html it says,
"Slurm's hierarchical limits are enforced in the following order ...:

1. Partition QOS limit
2. Job QOS limit
3. User association
4. Account association(s), ascending the hierarchy
5. Root/Cluster association
6. Partition limit
7. None

Note: If limits are defined at multiple points in this hierarchy, the point in this list where the limit is first defined will be used."  

And there's a little more later about the Partition limit being an upper bound on everything.

This says to me that if:
* there is a large time limit on a partition,
* there is a smaller time limit on the job QOS, and
* the partition has no associated QOS,
then the MaxWall on the Job QOS should have effect.  
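To make that reading concrete, here is a minimal sketch of the "first defined wins" rule as quoted from the documentation. It is only an illustration of the quoted list, not Slurm's actual implementation; the function name `first_defined_limit` is hypothetical, limits are expressed in minutes, and an empty string stands for "no limit defined at this level".

```shell
# Hypothetical helper: walk the hierarchy in the documented order and
# return the first limit that is defined (non-empty).
# Argument order: partition QOS, job QOS, user association,
# account association, root/cluster association, partition limit.
first_defined_limit() {
    for limit in "$@"; do
        if [ -n "$limit" ]; then
            echo "$limit"
            return 0
        fi
    done
    # Level 7 in the quoted list: nothing is defined anywhere.
    echo "UNLIMITED"
}

# Values from this thread: no partition QOS, job QOS MaxWall of 1 day
# (1440 minutes), no association limits, partition MaxTime of 7 days
# (10080 minutes).
first_defined_limit "" 1440 "" "" "" 10080
```

Under this reading the job QOS value (1440 minutes) is selected before the partition limit is ever consulted, which is the behaviour expected in the question.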

But that's not what I observe. I've created a QOS 'nonpaying' with MaxWall=1-0:0:0, and set MaxTime=7-0:0:0 on partition 'general'. I set the association on user1 so that their job will get QOS 'nonpaying', then submit a job with --time=7-0:0:0, and it runs:

$ scontrol show partition general | egrep 'QoS|MaxTime'
   AllocNodes=ALL Default=YES QoS=N/A
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
$ sacctmgr show qos nonpaying format=name,flags,maxwall
      Name                Flags     MaxWall
---------- -------------------- -----------
 nonpaying                       1-00:00:00
$ scontrol show job 33 | egrep 'QOS|JobState|TimeLimit'
   Priority=4294901728 Nice=0 Account=acad1 QOS=nonpaying
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:00:40 TimeLimit=7-00:00:00 TimeMin=N/A
$ scontrol show config | grep AccountingStorageEnforce
AccountingStorageEnforce = associations,limits,qos
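The mismatch in the output above can be checked mechanically. The following is a rough sketch, assuming only the two time formats that appear in this thread (D-HH:MM:SS and HH:MM:SS); the helper names `slurm_time_to_seconds` and `exceeds_limit` are hypothetical and not part of Slurm.

```shell
# Hypothetical helper: convert D-HH:MM:SS or HH:MM:SS to seconds.
# Other formats Slurm accepts (e.g. plain minutes) are not handled here.
slurm_time_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeeds when the first time value is strictly larger than the second.
exceeds_limit() {
    [ "$(slurm_time_to_seconds "$1")" -gt "$(slurm_time_to_seconds "$2")" ]
}

# Job 33's TimeLimit versus the MaxWall of QOS 'nonpaying', as shown above.
if exceeds_limit "7-00:00:00" "1-00:00:00"; then
    echo "TimeLimit exceeds QOS MaxWall: the QOS limit is not being enforced"
fi
```

With the values from the output above, the job's 7-day TimeLimit is larger than the 1-day MaxWall, so the check reports that the QOS limit is not in effect.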

Help!?

--
Ross Dickson, Computational Research Consultant
ACENET  --   Compute Canada  --  Dalhousie University

Fulcomer, Samuel

Dec 16, 2021, 6:13:34 PM
to Slurm User Community List
I've not parsed your message very far, but...

for i in $(cat limit_users); do
    sacctmgr modify user where user=$i partition=foo account=bar set grptresrunmins=cpu=Nlimit
done

Fulcomer, Samuel

Dec 16, 2021, 6:15:31 PM
to Slurm User Community List
...and you shouldn't be able to do this with a QoS (I think as you want it to), as "grptresrunmins" applies to the aggregate of everything using the QoS.

Ross Dickson

Dec 17, 2021, 4:08:56 PM
to Slurm User Community List
Thanks for the suggestions, Samuel.  It turns out the root of the problem was elsewhere:  Although I had updated slurm.conf with 'AccountingStorageEnforce = associations,limits,qos' and 'scontrol show config' said the same, I had neglected to restart slurmctld, so it *wasn't* actually in effect.  If you're listening, SchedMD, that is IMO a bug with 'scontrol show config'.  But also, silly me for not reading the docs and the log files better.
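One way to catch this kind of stale configuration is to compare the modification time of slurm.conf with the start time of slurmctld. The sketch below is only an illustration under assumptions: the function name `config_newer_than_daemon` is hypothetical, the timestamps in the example are made up, and the commands in the comments assume a Linux controller with GNU `stat`, `date`, procps `ps`, and a config file at /etc/slurm/slurm.conf.

```shell
# Hypothetical helper: succeeds when the config file was modified after
# the daemon started, meaning the daemon may still run with old settings.
# Both arguments are Unix epoch seconds.
config_newer_than_daemon() {
    [ "$1" -gt "$2" ]
}

# How the two values might be obtained (assumed path and tools; adjust
# for your site):
#   conf_mtime=$(stat -c %Y /etc/slurm/slurm.conf)
#   daemon_start=$(date -d "$(ps -o lstart= -C slurmctld)" +%s)

# Example with made-up timestamps: config edited after the daemon started.
if config_newer_than_daemon 1639690000 1639600000; then
    echo "slurm.conf changed after slurmctld started; a restart may be needed"
fi
```

If the check fires, restarting slurmctld (as was done in this thread) brings the running daemon in line with the file.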

Cheers all!
Ross


