[slurm-users] What is the 'Root/Cluster association' level in Resource Limits document mean?

taleint...@sjtu.edu.cn

Feb 8, 2022, 2:31:13 AM
to slurm...@lists.schedmd.com

Hi all,

 

According to the Resource Limits page ( https://slurm.schedmd.com/resource_limits.html ), there is a Root/Cluster association level below the account level that provides default limits. But how can I check or modify this “cluster association”? With the command sacctmgr show association, I can only list the users’ associations.

Consider the scenario where we want to set a default node-count limit for all users. A command such as sacctmgr modify user set grptres="node=8" does set the limit on all users at once, but it overwrites the existing per-user limits on specific accounts, so it is not a satisfactory solution. If the “cluster association” exists, it may be exactly what we want. So how do we set the “cluster association”?

Paul Brunk

Feb 9, 2022, 9:28:10 PM
to Slurm User Community List

Hi:

 

You can use e.g. 'sacctmgr show -s users', and you'll see each user's cluster association as one of the output columns.  If the name were 'yourcluster', then you could do: sacctmgr modify cluster name=yourcluster set grpTres="node=8".
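A minimal sketch of the two steps discussed in this thread (inspect the cluster association, then set a group limit on it). The cluster name 'yourcluster' and the limit node=8 are the placeholders used above; the run wrapper and the APPLY variable are hypothetical helpers added only so the script prints the commands by default instead of changing the accounting database.

```shell
#!/bin/sh
# Placeholders taken from the example above; override via the environment.
CLUSTER="${CLUSTER:-yourcluster}"
LIMIT="${LIMIT:-node=8}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# List the cluster (root) associations together with their group TRES limits.
run sacctmgr show cluster format=cluster,grptres

# Set the cluster-wide group limit on the named cluster.
run sacctmgr modify cluster "name=$CLUSTER" set "GrpTRES=$LIMIT"
```

Run it as-is to review the commands, then rerun with APPLY=1 on a host where sacctmgr has administrator access.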

 

==

Paul Brunk, system administrator

Georgia Advanced Resource Computing Center

Enterprise IT Svcs, the University of Georgia

 

 

On 2/8/22, 2:33 AM, "slurm-users" <slurm-use...@lists.schedmd.com> wrote:

…[H]ow to check or modify this “cluster association”? Using command sacctmgr show association, I can only list all users’ association.

taleint...@sjtu.edu.cn

Feb 10, 2022, 3:42:56 AM
to Paul Brunk, Slurm User Community List

Well, ‘sacctmgr modify cluster name=***’ is exactly what we wanted, and, inspired by this command, we found that ‘sacctmgr show cluster’ clearly lists all the cluster associations.

But during testing we found another problem. When a limit is defined at both the cluster level and the user level, the smaller one takes effect; the user association does not take precedence over the lower-level one. For example:

> sacctmgr show association format=cluster,account,user,grptres,qos
   Cluster    Account       User       GrpTRES                  QOS
---------- ---------- ---------- ------------- --------------------
    sjtupi       root               gres/gpu=1               normal
    sjtupi   acct-hpc                                        normal
    sjtupi   acct-hpc     hpczty    gres/gpu=2               normal

The cluster association defines a 1-GPU limit and the user association defines a 2-GPU limit, and a 2-GPU job is then blocked:

> scontrol show job 6567880
JobId=6567880 JobName=test
   UserId=hpczty(3861) GroupId=hpczty(3861) MCS_label=N/A
   Priority=127 Nice=0 Account=acct-hpc QOS=normal
   JobState=PENDING Reason=AssocGrpGRES Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   …
   NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=7G,node=1,billing=1,gres/gpu=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=7G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   …

According to the official documentation ( https://slurm.schedmd.com/resource_limits.html ), the user association at hierarchy level 3 should take precedence over the cluster association at level 5. Is this a bug, or is the documentation wrong?
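The observed behaviour can be modelled with a small sketch, under one assumption: a GrpTRES limit is checked at every level of the association tree, since the sacctmgr man page describes GrpTRES as applying in aggregate to an association and all of its children. Under that reading, a job has to fit under each limit that is set, so the tightest one wins. The function names effective_limit and job_fits are hypothetical, and the model ignores resources already held by other running jobs; it is an illustration of the test result above, not a statement of how slurmctld is implemented.

```shell
#!/bin/sh
# effective_limit LIMIT...: print the smallest limit among the levels given.
# An empty argument means "no limit set at that level" and is skipped.
effective_limit() {
    min=""
    for lim in "$@"; do
        [ -z "$lim" ] && continue
        if [ -z "$min" ] || [ "$lim" -lt "$min" ]; then
            min=$lim
        fi
    done
    echo "${min:-unlimited}"
}

# job_fits REQUEST LIMIT...: succeed only if REQUEST fits under every set limit.
job_fits() {
    req=$1; shift
    eff=$(effective_limit "$@")
    [ "$eff" = "unlimited" ] || [ "$req" -le "$eff" ]
}

# Values from the listing above: root=1 GPU, acct-hpc=unset, hpczty=2 GPUs.
effective_limit 1 "" 2
if job_fits 2 1 "" 2; then
    echo "job runs"
else
    echo "job pends (AssocGrpGRES)"
fi
```

With the values from this thread the sketch reports an effective limit of 1 and a pending 2-GPU job, which matches the Reason=AssocGrpGRES state shown by scontrol.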

 

From: Paul Brunk <pbr...@uga.edu>
Sent: February 10, 2022 10:28
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] What is the 'Root/Cluster association' level in Resource Limits document mean?
