Hi Dj,
the solution could be to use two QOSes. We use something similar to
restrict usage of our GPU nodes (MaxTresPU=node=2). The examples below are
from our test cluster.
1) Create a QOS with, e.g., MaxTresPU=cpu=200 and assign it to your
partition as its partition QOS. On our test cluster the limit is cpu=10:
[root@bta0 ~]# sacctmgr -s show qos maxcpu format=Name,MaxTRESPU
Name MaxTRESPU
---------- -------------
maxcpu cpu=10
[root@bta0 ~]#
[root@bta0 ~]# scontrol show part maxtresputest
PartitionName=maxtresputest
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=maxcpu
If a user's jobs together request more CPUs than that, the additional
(new) jobs stay pending with the reason 'QOSMaxCpuPerUserLimit' in squeue:
kxxxxxx@btlogin1% squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
 125316 maxtrespu maxsubmi  kxxxxxx PD   0:00     1 (QOSMaxCpuPerUserLimit)
 125317 maxtrespu maxsubmi  kxxxxxx PD   0:00     1 (QOSMaxCpuPerUserLimit)
 125305 maxtrespu maxsubmi  kxxxxxx  R   0:45     1 btc30
 125306 maxtrespu maxsubmi  kxxxxxx  R   0:45     1 btc30
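A minimal sketch of the admin side of step 1, assuming Slurm accounting via slurmdbd with AccountingStorageEnforce including at least "limits" (otherwise QOS limits are not enforced). The QOS and partition names come from the example above; the node list in the slurm.conf comment and the DRY_RUN/run helper are hypothetical additions for illustration, so by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of step 1: a partition QOS with a per-user CPU cap.
# Assumes slurmdbd accounting with AccountingStorageEnforce including
# "limits"; without it, QOS limits are not enforced.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

QOS=maxcpu        # QOS name from the example above
LIMIT=cpu=200     # per-user cap; the test cluster uses cpu=10

# Create the QOS with the per-user TRES limit (-i: commit without prompting).
run sacctmgr -i add qos "$QOS" MaxTRESPerUser="$LIMIT"

# Make it the partition QOS in slurm.conf (node list is hypothetical), e.g.
#   PartitionName=maxtresputest Nodes=btc[30-31] QOS=maxcpu
# and let the controller re-read its configuration:
run scontrol reconfigure
```

Setting the QOS on the partition rather than on each user means every job in the partition is capped, whatever QOS the job itself runs under.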
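2) Create a second QOS with Flags=DenyOnLimit,OverPartQOS and
MaxTresPU=cpu=400. OverPartQOS lets the limits of this QOS override those
of the partition QOS, and DenyOnLimit rejects, at submission time, jobs
that do not conform to the QOS limits instead of leaving them pending.
Assign the QOS to a user who should be able to exceed the 200-CPU limit;
that user is then limited to 400 CPUs instead. The user has to request
this QOS explicitly when submitting new jobs. On our test cluster the
limit is cpu=40: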
[root@bta0 ~]# sacctmgr -s show qos overpart format=Name,Flags%30,MaxTRESPU
Name Flags MaxTRESPU
---------- ------------------------------ -------------
overpart DenyOnLimit,OverPartQOS cpu=40
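A sketch of step 2 under the same assumptions as step 1 (slurmdbd accounting with limits enforced). USER_NAME, job.sh, and the DRY_RUN/run helper are hypothetical placeholders; the script prints the commands by default rather than executing them.

```shell
#!/bin/sh
# Sketch of step 2: an override QOS that lifts one user above the
# partition QOS cap, up to a higher per-user limit.
# USER_NAME and job.sh are hypothetical placeholders.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

QOS=overpart
LIMIT=cpu=400          # the test cluster uses cpu=40
USER_NAME=someuser     # hypothetical user allowed to exceed the partition cap

# OverPartQOS: this QOS's limits override the partition QOS limits.
# DenyOnLimit: nonconforming jobs are rejected at submission, not left pending.
run sacctmgr -i add qos "$QOS" Flags=DenyOnLimit,OverPartQOS MaxTRESPerUser="$LIMIT"

# Add the QOS to the QOS list of the user's associations.
run sacctmgr -i modify user where name="$USER_NAME" set qos+="$QOS"

# The user must request the QOS explicitly when submitting.
run sbatch --partition=maxtresputest --qos="$QOS" job.sh
```

Presumably, jobs the same user submits without `--qos=overpart` run under their default QOS and remain subject to the partition cap.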
Cheers,
Carsten
--
Carsten Beyer
Systems Department (Abteilung Systeme)
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45a * D-20146 Hamburg * Germany
Phone: +49 40 460094-221
Fax:   +49 40 460094-270
Email: be...@dkrz.de
URL:   http://www.dkrz.de