[slurm-users] User association with partition and Qos


Amjad Syed

Aug 27, 2021, 6:29:10 AM
to slurm...@schedmd.com
Hello all

We are having trouble understanding how user associations work with partitions and QoS.

Currently we have a partition with 30 GPU cards.

We have defined a QoS, gpu-rtx, that allows a user to reserve 2 cards:

sacctmgr show qos gpu-rtx format=MaxTRESPU%60

                                                   MaxTRESPU
       -----------------------------------------------------
                                           cpu=96,gres/gpu=2
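A minimal sketch of how a QoS like gpu-rtx could be created and capped per user with sacctmgr, assuming it is run by a Slurm administrator on a host where sacctmgr can reach slurmdbd. The `run` and `create_qos` helpers and the `DRY_RUN` variable are hypothetical names for illustration. `-i` commits without the confirmation prompt, and `MaxTRESPerUser` is the long name of the `MaxTRESPU` field shown above.

```shell
#!/usr/bin/env bash
# Sketch only: create a QoS and cap the TRES each user may hold under it.
# Assumes a Slurm admin account; DRY_RUN=1 prints commands instead of running them.

# run CMD...: execute the command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_qos NAME TRES  (hypothetical helper)
#   NAME: QoS name, e.g. gpu-rtx
#   TRES: per-user limit, e.g. cpu=96,gres/gpu=2
create_qos() {
    local name=$1 tres=$2
    run sacctmgr -i add qos "$name"
    run sacctmgr -i modify qos where name="$name" set MaxTRESPerUser="$tres"
}
```

For example, `create_qos gpu-rtx cpu=96,gres/gpu=2` and `create_qos gpu-rtx-reserved cpu=192,gres/gpu=8` would set up the two limits discussed in this thread; running them once with `DRY_RUN=1` shows the exact commands first.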




We have defined a user, test, that is associated with this QoS:


sacctmgr show assoc user=test format=user,qos

Qos
gpu-rtx



Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:


sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60

                                                   MaxTRESPU
       -----------------------------------------------------
                                          cpu=192,gres/gpu=8

User test is not associated with the gpu-rtx-reserved QoS, so he should not be able to use more than gpu=2.
Both of these QoS are now allowed for the partition in slurm.conf:

PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved



But we found that even though the user is not associated with gpu-rtx-reserved, if he requests gpu-rtx-reserved in his Slurm script, he can reserve 8 GPU cards.


So our question is: can users associated with one of the partition's QoS use another QoS allowed on that partition even if they are not associated with it? In other words, can we define only one QoS per partition, and not more than one?


I hope I was able to explain.


Any advice? We want the partition to allow more than one QoS, each with different limits, and users associated with one QoS should not be able to use the other.


Majid




Sean Crosby

Aug 27, 2021, 7:54:37 AM
to Slurm User Community List
Hi Amjad,

Make sure you have qos in the AccountingStorageEnforce entry of slurm.conf, e.g.

AccountingStorageEnforce=associations,limits,qos,safe
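One way to confirm that the running controller actually has `qos` in its enforcement list is to read `scontrol show config`, which prints lines of the form `Key = value`. The `enforces_qos` helper below is a hypothetical sketch that assumes that layout; it matches `qos` as a whole comma-separated option.

```shell
#!/usr/bin/env bash
# Sketch: read `scontrol show config` output on stdin and succeed (exit 0)
# only if AccountingStorageEnforce lists the "qos" option.
enforces_qos() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "AccountingStorageEnforce" {
            n = split($2, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "qos") found = 1
        }
        END { exit !found }'
}

# Usage on a live cluster:
#   scontrol show config | enforces_qos && echo "QoS access is enforced"
```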

Sean




Amjad Syed

Aug 27, 2021, 10:32:50 AM
to Slurm User Community List
Hi Sean,

Thanks for the suggestion; it seems to work now.

Majid

Amjad Syed

Aug 31, 2021, 3:04:00 AM
to Slurm User Community List
Hello, me again.

I just found out that when our slurmctld restarts, all the QoS are gone.

I mean that users who are associated with the QoS cannot submit jobs with sbatch; they get this error:

sbatch: error: Batch job submission failed: Invalid qos specification


Do we need to make any more changes in slurm.conf so that the QoS becomes permanent?

Amjad

Sean Crosby

Aug 31, 2021, 3:20:33 AM
to Slurm User Community List
What does sacctmgr show for the user you added to have access to the QoS, and what does Slurm show for the partition config?

sacctmgr show account withassoc -p
scontrol show part gpu-rtx6000-2

Sean


Amjad Syed

Aug 31, 2021, 3:47:12 AM
to Slurm User Community List
Hi Sean

Here is the output for the gpu-rtx-reserved QoS:

sacctmgr show account withassoc -p | grep gpu-rtx-reserved


default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
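Because a plain `grep gpu-rtx` would also match `gpu-rtx-reserved`, comparing each QoS name exactly is safer when checking an association. This sketch assumes the association list is requested with an explicit format and no header, for example `sacctmgr -nP show assoc where user=cjr13geu format=user,qos`, so each line is `User|QOS`; the `assoc_has_qos` name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read "User|QOS" lines (sacctmgr -nP ... format=user,qos) on stdin
# and succeed only if some association lists exactly the QoS given as $1.
assoc_has_qos() {
    awk -F'|' -v want="$1" '
        {
            n = split($2, q, ",")
            for (i = 1; i <= n; i++)
                if (q[i] == want) found = 1
        }
        END { exit !found }'
}

# Usage on a live cluster:
#   sacctmgr -nP show assoc where user=cjr13geu format=user,qos \
#       | assoc_has_qos gpu-rtx-reserved && echo "associated"
```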





scontrol show part gpu-rtx6000-2

PartitionName=gpu-rtx6000-2
   AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=g[15-29]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED
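The same exact-match idea can be applied on the partition side. This hypothetical `part_allows_qos` helper splits `scontrol show part` output into `Key=Value` tokens and succeeds if `AllowQos` is `ALL` or names the requested QoS. It assumes no value contains spaces, which holds for the fields shown above.

```shell
#!/usr/bin/env bash
# Sketch: read `scontrol show part NAME` output on stdin and succeed if the
# partition's AllowQos is ALL or contains exactly the QoS given as $1.
part_allows_qos() {
    # Put each Key=Value token on its own line, then inspect AllowQos.
    tr -s ' \t' '\n\n' | awk -F'=' -v want="$1" '
        $1 == "AllowQos" {
            if ($2 == "ALL") found = 1
            n = split($2, q, ",")
            for (i = 1; i <= n; i++)
                if (q[i] == want) found = 1
        }
        END { exit !found }'
}

# Usage on a live cluster:
#   scontrol show part gpu-rtx6000-2 | part_allows_qos gpu-rtx-reserved
```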




On a different note, we have the following in slurm.conf:


AccountingStorageUser=slurm


But we have been adding QoS and assigning users to them as root. Can this be an issue?




Amjad

Sean Crosby

Aug 31, 2021, 4:36:21 AM
to Slurm User Community List
Hi Amjad,

AccountingStorageUser is the user used to connect to the accounting database. It only applies to database-type storage plugins, so if slurm.conf uses AccountingStorageType=accounting_storage/slurmdbd, it is ignored.

The output you showed says that user cjr13geu in cluster uea_cluster has access to the QoS.

How are you adding the QoS to other users? The way to do it would be:

sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved

or

sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved

if you want to give it to every user in <accountname>
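To apply the per-user form to several users at once, a loop over usernames is enough. This sketch uses the `where name=... account=...` form of `sacctmgr modify user` from the sacctmgr man page; the `grant_qos` and `run` helpers, the `DRY_RUN` variable, and the account and user names in the example are placeholders, not from the thread.

```shell
#!/usr/bin/env bash
# Sketch: add a QoS to several users' associations in one account.
# DRY_RUN=1 prints the sacctmgr commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grant_qos ACCOUNT QOS USER...  (hypothetical helper)
grant_qos() {
    local account=$1 qos=$2 user
    shift 2
    for user in "$@"; do
        # qos+= appends to the association's QoS list instead of replacing it.
        run sacctmgr -i modify user where name="$user" account="$account" \
            set qos+="$qos"
    done
}

# Example (placeholder names):
#   DRY_RUN=1 grant_qos myaccount gpu-rtx-reserved alice bob
```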

Sean


Amjad Syed

Aug 31, 2021, 5:18:32 AM
to Slurm User Community List
Hi Sean

We have been adding it using the following command:

sacctmgr modify user set qos+=gpu-rtx-reserved

We have a single account that all our users are associated with, plus the root account for admins.



Is that the issue? Do we need to associate each user with an account?

Amjad Syed

Aug 31, 2021, 6:04:45 AM
to Slurm User Community List
Just a correction:

We use:
sacctmgr modify user=<username> set qos+=gpu-rtx6000-2
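In this thread gpu-rtx6000-2 is the partition's name, and it does not appear among the QoS names listed earlier (gpu, gpu-k40-1, gpu-rtx, gpu-rtx-reserved, hmem, ht, uea_def_qos). Before adding a name with qos+=, it may be worth confirming that it exists as a QoS. This hypothetical `qos_exists` sketch assumes QoS names are read one per line from `sacctmgr -nP show qos format=name` and checks for an exact, whole-line match.

```shell
#!/usr/bin/env bash
# Sketch: read QoS names, one per line (sacctmgr -nP show qos format=name),
# on stdin and succeed only if the name given as $1 is among them.
qos_exists() {
    # -q quiet, -x whole-line match, -F fixed string (no regex surprises).
    grep -qxF -- "$1"
}

# Usage on a live cluster:
#   sacctmgr -nP show qos format=name | qos_exists gpu-rtx6000-2 \
#       || echo "no QoS named gpu-rtx6000-2"
```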

Amjad