[slurm-users] MaxTime only for a user

Gestió Servidors

Feb 25, 2021, 4:11:57 AM
to slurm...@lists.schedmd.com

Hi,

I need to configure a SLURM partition to allow jobs that need more than an hour, but only for one specific user. By default, that partition has “MaxTime=10:00”, but a user now needs to run some tests in the same partition that will last about one hour. If I configure “MaxTime” in slurm.conf, that value applies to all users.

Can I configure SLURM in some way to allow that?

Thanks.

Ole Holm Nielsen

Feb 25, 2021, 5:01:30 AM
to slurm...@lists.schedmd.com
I think so, please see https://slurm.schedmd.com/resource_limits.html and
look for the MaxWallDurationPerJob limit. You have to set that limit on
the user's association.

/Ole

Diego Zuccato

Feb 25, 2021, 5:17:56 AM
to Slurm User Community List, Ole Holm Nielsen
On 25/02/21 11:00, Ole Holm Nielsen wrote:

> I think so, please see https://slurm.schedmd.com/resource_limits.html
> and look for the MaxWallDurationPerJob limit.  You have to set that
> limit on the user's association.
IIUC the limit in the assoc can't override the one in the partition.
So the definition will have to be reversed: set the partition limit to
the max allowed (1h) and limit all users except one in the assoc.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
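
The reversed approach Diego describes (raise the partition MaxTime to one hour, then cap every user except one on their association) could be scripted roughly as below. This is only a sketch: the user names are hypothetical, the commands are built as strings and never executed, and the `10:00` value simply mirrors the original partition limit from the first message. It assumes the partition itself is separately set to `MaxTime=01:00:00` in slurm.conf.

```python
import shlex


def reversed_limit_commands(users, exempt_user, short_limit="10:00"):
    """Build (but do not run) one sacctmgr command per non-exempt user.

    The exempt user gets no association limit, so only the partition
    MaxTime bounds their jobs.
    """
    commands = []
    for user in users:
        if user == exempt_user:
            continue  # leave this user bounded by the partition limit only
        argv = [
            "sacctmgr", "-i", "modify", "user", f"name={user}",
            "set", f"MaxWallDurationPerJob={short_limit}",
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical user list; "carol" is the user who needs the longer runs.
    for cmd in reversed_limit_commands(["alice", "bob", "carol"], "carol"):
        print(cmd)
```

Printing the commands first makes it possible to review them before running anything against the accounting database.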

Ole Holm Nielsen

Feb 25, 2021, 5:25:01 AM
to Slurm User Community List
On 2/25/21 11:17 AM, Diego Zuccato wrote:
> On 25/02/21 11:00, Ole Holm Nielsen wrote:
>
>> I think so, please see https://slurm.schedmd.com/resource_limits.html
>> and look for the MaxWallDurationPerJob limit.  You have to set that
>> limit on the user's association.
> IIUC the limit in the assoc can't override the one in the partition.
> So the definition will have to be reversed: set the partition limit to
> the max allowed (1h) and limit all users except one in the assoc.

You are right! According to the resource_limits page the limits have a
specified precedence. But now I see that there is an exception listed:

> Note: The precedence order specified above is respected except for the following limits: Max[Time|Wall], [Min|Max]Nodes. For these limits, even if the job is enforced with QOS and/or Association limits, it can't go over the limit imposed at Partition level, even if it listed at the bottom. So the default for these 3 types of limits is that they are upper bound by the Partition one. This Partition level bound can be ignored if the respective QOS PartitionTimeLimit and/or Partition[Max|Min]Nodes flags are set, then the job would be enforced the limits imposed at QOS and/or association level respecting the order above.


/Ole
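
To make the rule in the quoted note concrete, here is a small model of how the effective wall-time cap might be derived. It is a simplified sketch based only on what this thread states (QOS over association, partition as an upper bound unless the QOS carries the PartitionTimeLimit flag); the function and parameter names are made up for illustration and it is not Slurm's actual implementation.

```python
def effective_max_time(partition_max, assoc_max=None, qos_max=None,
                       qos_partition_time_limit=False):
    """Return the wall-time cap in minutes that would apply to a job.

    None for assoc_max / qos_max means "no limit set at that level".
    """
    # A QOS limit takes precedence over an association limit when both exist.
    limit = qos_max if qos_max is not None else assoc_max
    if limit is None:
        # Simplifying assumption: with nothing set on the QOS or the
        # association, the partition MaxTime is what applies.
        return partition_max
    if qos_partition_time_limit:
        # The QOS flag lifts the partition upper bound.
        return limit
    # Default behaviour: the partition limit is an upper bound.
    return min(limit, partition_max)


if __name__ == "__main__":
    # Original question: 10-minute partition, 60 minutes on the association.
    print(effective_max_time(10, assoc_max=60))
    # Same partition, but a QOS with 60 minutes and PartitionTimeLimit set.
    print(effective_max_time(10, qos_max=60, qos_partition_time_limit=True))
```

Under this model, raising only the association limit leaves the job capped at the partition's 10 minutes, which is the behaviour Diego predicted.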

Loris Bennett

Feb 25, 2021, 5:28:31 AM
to Slurm User Community List
But if you have set MaxWallDurationPerJob in a QOS, the value set in the
association will be ignored, because the QOS has higher priority than
associations, as described on the page Ole mentions above.

This makes it slightly difficult to impose per-user restrictions. Our
current approach to, say, preventing a user from starting any more jobs
is to remove all the partition QOS from the user association. This is a
little clunky, so I suspect there may be a more elegant way.

Cheers,

Loris

--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Loris Bennett

Feb 25, 2021, 5:48:41 AM
to Slurm User Community List
Ole Holm Nielsen <Ole.H....@fysik.dtu.dk> writes:

Thanks for pointing that out, Ole. So I should be able to stop a user
from submitting jobs by setting, say, MaxTime to 0, which is rather
neater than tweaking the QOS.

Gestió Servidors

Feb 25, 2021, 5:49:07 AM
to slurm...@lists.schedmd.com

Hi,

I have tested with "sacctmgr modify user name=my_user set MaxWallDurationPerJob=01:00" (in other words, user “my_user” gets only 1 minute per job), but after that I submitted a job as “my_user” with a “sleep” of 50 minutes and the job has NOT been cancelled, so something is wrong.

The associations for “my_user” are now:

   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- -----------
    q50004 caosmembe+      druiz                    1                                    00:01:00                                                                 00:01:00

More help, please…

Thanks a lot!
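
Since the thread uses both `10:00` and `01:00` as limits, it may help to spell out how Slurm time strings are read: a two-field value is minutes:seconds, not hours:minutes. The parser below is an illustrative sketch of the documented formats (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds); the function name is hypothetical and special values such as UNLIMITED are not handled.

```python
def slurm_time_to_seconds(spec):
    """Convert a Slurm time string to seconds (sketch, numeric forms only)."""
    days = 0
    if "-" in spec:
        # days-hours[:minutes[:seconds]]
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError(f"unsupported time format: {spec!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in spec.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unsupported time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for value in ("01:00", "10:00", "01:00:00"):
        print(value, slurm_time_to_seconds(value))
```

So `01:00` is indeed one minute, as intended in the message above, and a one-hour limit would be written `01:00:00`. Separately, if a limit shows up in the association but running jobs are not stopped, one thing worth checking is whether `AccountingStorageEnforce` in slurm.conf includes `limits`, since association limits are only enforced when it does.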

Gestió Servidors

Feb 25, 2021, 12:29:40 PM
to slurm...@lists.schedmd.com
Hi,

After configuring "MaxWallDurationPerJob" without getting the desired result (a job with a long sleep keeps running even though MaxWallDurationPerJob=1 (1 minute)), I have now tested changing time_limit from within a "lua" script.
My "lua" script contains these lines:
[...]
if (job_desc.user_id == 1008) then
    job_desc.time_limit = 1
end
[...]

My ID is 1008, so the time_limit for my job should be changed to 1 minute... but it is not :( :( :(

Help... please...
