Hi Matthew,
we use a QOS for this and assign it to the Slurm user who needs to exceed
the partition time limit. You can also set a time limit in the QOS itself,
so that the user cannot exceed the partition limit by too much. Example
from our system, where the partition run limit is 8 hours per job:
# grep -i qos slurm.conf
PriorityWeightQOS=1000
AccountingStorageEnforce=limits,qos
#
# sacctmgr -s show qos ch0636 format=Name,Flags%32,MaxWall
      Name                            Flags     MaxWall
---------- -------------------------------- -----------
    ch0636   DenyOnLimit,PartitionTimeLimit    12:00:00
#
# scontrol show part compute
PartitionName=compute
AllowGroups=ALL DenyAccounts=bmx825,mh1010 AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
Hidden=NO
MaxNodes=512 MaxTime=08:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=m[10000-11420,11440-11545,11553-11577]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
OverSubscribe=EXCLUSIVE
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=74496 TotalNodes=1552 SelectTypeParameters=NONE
DefMemPerCPU=1280 MaxMemPerCPU=5360
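The PartitionTimeLimit flag lets jobs that use this QOS exceed the MaxTime
of the partition, and MaxWall in the QOS then caps their wall time (12 hours
here). With DenyOnLimit, a job that requests more than that is rejected at
submission instead of being left pending. The QOS is assigned to the user
like this: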
# sacctmgr -s show user foo format=User,Account,MaxJobs,QOS%30
      User    Account MaxJobs                            QOS
---------- ---------- ------- ------------------------------
       foo     ch0636      20          ch0636,express,normal
       foo  noaccount       0                         normal
       foo     mh0469      20                 express,normal
#
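The listings above only show the resulting state. A minimal sketch of how such a QOS might be created and attached with sacctmgr follows. The names (ch0636, foo) and the 12-hour MaxWall are taken from the output above. The DRY_RUN variable and the run/setup_qos helpers are assumptions of this sketch, not part of Slurm, and the exact commands used on our system may have differed.

```shell
#!/bin/sh
# Sketch only: create a QOS that may exceed the partition MaxTime and
# attach it to one user/account association.
# DRY_RUN, run and setup_qos are hypothetical helpers of this sketch.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"   # show what would be run
    else
        "$@"                   # needs Slurm and admin rights
    fi
}

# usage: setup_qos QOS USER ACCOUNT MAXWALL
setup_qos() {
    qos=$1; user=$2; account=$3; maxwall=$4
    # Create the QOS (-i commits without asking for confirmation).
    run sacctmgr -i add qos "$qos"
    # Allow exceeding the partition MaxTime, cap wall time in the QOS,
    # and reject jobs over the cap at submission.
    run sacctmgr -i modify qos "$qos" set Flags=DenyOnLimit,PartitionTimeLimit MaxWall="$maxwall"
    # Add the QOS to the user's association with the given account.
    run sacctmgr -i modify user where name="$user" account="$account" set qos+="$qos"
}

setup_qos ch0636 foo ch0636 12:00:00
```

Run it with DRY_RUN=0 to actually apply the changes. Afterwards, 'sacctmgr -s show user foo' should list the new QOS for that account.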
The user also needs to request the QOS with '#SBATCH --qos=ch0636' in their
job script.
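As an illustration, a job script using the QOS could look roughly like the sketch below. The job name, the 10-hour time request, the node count and the report_job placeholder are assumptions chosen for this example. The account is set to ch0636 because that is the association carrying the QOS for user foo.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=long_run
# Partition whose MaxTime is 08:00:00.
#SBATCH --partition=compute
# Account whose association includes the QOS.
#SBATCH --account=ch0636
# QOS carrying the PartitionTimeLimit flag.
#SBATCH --qos=ch0636
# More than the partition's 8 hours, less than the QOS MaxWall of 12 hours.
#SBATCH --time=10:00:00
#SBATCH --nodes=1

# Placeholder payload: replace with the real application.
report_job() {
    echo "Job ${SLURM_JOB_ID:-unknown} running on $(uname -n)"
}

report_job
```

With DenyOnLimit set, asking for more than 12:00:00 here should make sbatch reject the job right away.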
Cheers,
Carsten
--
Carsten Beyer
Systems Department
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany
Phone: +49 40 460094-221
Fax: +49 40 460094-270
Email: be...@dkrz.de
URL: http://www.dkrz.de
Managing Director: Prof. Dr. Thomas Ludwig
Registered office: Hamburg
Hamburg District Court, HRB 39784