[slurm-users] Prevent users from updating their jobs

Jordi Blasco

Dec 16, 2021, 3:34:22 PM
to slurm...@schedmd.com
Hi everyone,

I was wondering if there is a way to prevent users from updating their jobs with "scontrol update job". 

Here is the justification.

A hypothetical user submits a job requesting a regular node, but then realises that the large-memory nodes or the GPU nodes are idle. With the command above, the user can move the job onto one of those resources simply to avoid waiting, without any real need for them.

Any suggestions to prevent that?

Cheers,

Jordi

sbatch --mem=1G -t 0:10:00 --wrap="srun -n 1 sleep 360"
scontrol update job 791 Features=smp

[user01@slurm-simulator ~]$ sacct -j 791 -o "jobid,nodelist,user"
       JobID        NodeList      User
------------ --------------- ---------
791                    smp-1    user01

Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)

Dec 16, 2021, 3:44:52 PM
to Slurm User Community List
Is there a meaningful difference between using "scontrol update" and just killing the job and resubmitting with those resources already requested?

Carlos Fenoy

Dec 16, 2021, 3:49:42 PM
to Slurm User Community List
As far as I remember, you can use the job_submit Lua plugin to prevent any changes to jobs.

On Thu, 16 Dec 2021 at 21:47, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) <noam.be...@nrl.navy.mil> wrote:
Is there a meaningful difference between using "scontrol update" and just killing the job and resubmitting with those resources already requested?
--
Carles Fenoy
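
A minimal sketch of the modify hook Carlos is describing, for job_submit.lua, assuming the bluntest possible policy: deny every user-initiated change and let only root through. The return-code convention (0 allows, non-zero denies) follows the snippet further down the thread, and the error codes exposed to Lua vary between Slurm releases, so treat this as a starting point rather than a drop-in solution.

-- job_submit.lua sketch: reject any "scontrol update job" not issued by
-- root (uid 0); you may also want to allow your SlurmUser uid here
function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    if modify_uid == 0 then
        return 0          -- administrative changes are allowed
    end
    return 1              -- any non-zero return denies the requested change
end

-- pass-through submit hook; the plugin generally expects both functions
-- to be defined in the same script
function slurm_job_submit ( job_desc, part_list, submit_uid )
    return 0
end

Most sites will want to block only specific fields rather than every change; the next message shows that pattern.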

Bill Wichser

Dec 16, 2021, 3:57:51 PM
to slurm...@lists.schedmd.com
Indeed. We use this and BELIEVE that it works, lol!

Bill

-- allow root to modify anything; for everyone else, deny any update
-- request that tries to set a QOS (0 allows the change, non-zero denies it)
function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    if modify_uid == 0 then
        return 0
    end
    if job_desc.qos ~= nil then
        return 1
    end
    return 0
end
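
The same pattern should extend to the fields from the original example: in the modify hook, the job_desc table only appears to carry the attributes the user is actually trying to change, so a per-field non-nil check is enough. A sketch covering QOS, features/constraints and partition, to be verified against your Slurm version:

function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    if modify_uid == 0 then
        return 0
    end
    -- deny user-initiated changes to QOS, features or partition
    if job_desc.qos ~= nil
            or job_desc.features ~= nil
            or job_desc.partition ~= nil then
        return 1
    end
    return 0
end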


On 12/16/21 15:49, Carlos Fenoy wrote:
> As far as I remember, you can use the job_submit Lua plugin to prevent
> any changes to jobs.

Fulcomer, Samuel

Dec 16, 2021, 4:04:58 PM
to Slurm User Community List
There's no clear answer to this. It depends a bit on how you've segregated your resources.

In our environment, GPU and bigmem nodes are in their own partitions. There's nothing to prevent a user from specifying a list of potential partitions at submission time, so they wouldn't even need a post-submission "scontrol update jobid" to push a job into a partition that violates the spirit of the service.

Our practice has been to periodically look at running jobs to see if they are using (or have used, in the case of bigmem) less than their requested resources, and send them a nastygram telling them to stop doing that.

Creating a Lua submission filter that, e.g., blocks jobs in the gpu partition that don't request GPUs only helps to weed out the naive users. A subversive user could request a GPU and then use only the allocated cores and memory. There's no way to deal with that other than monitoring running jobs and nastygrams, with removal of access after repeated offenses.
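
A sketch of that kind of submit-time filter, with the usual caveats: the partition name "gpu" is hypothetical, and the field carrying --gres requests depends on the Slurm release (job_desc.gres in older versions, job_desc.tres_per_node in newer ones), so check it against your installation before relying on it.

-- job_submit.lua sketch: refuse jobs aimed at a (hypothetical) "gpu"
-- partition unless they request at least one GPU
function slurm_job_submit ( job_desc, part_list, submit_uid )
    local gres = job_desc.tres_per_node or job_desc.gres
    if job_desc.partition ~= nil
            and string.find(job_desc.partition, "gpu", 1, true) ~= nil
            and (gres == nil or string.find(gres, "gpu", 1, true) == nil) then
        slurm.log_user("jobs in the gpu partition must request a GPU")
        return 1          -- non-zero denies the submission
    end
    return 0
end

A complete policy would likely also need to look at the other tres_per_* fields (tres_per_job, tres_per_task, tres_per_socket), which the --gpus-style options populate instead of tres_per_node.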

Diego Zuccato

Dec 17, 2021, 3:45:40 AM
to Slurm User Community List, Fulcomer, Samuel
Well, there could be a way: make users "pay" (in some way) for the resources they request. Payment can take any form: in our case, the more resources a user allocates, the less priority their group gets. If enough users are affected by the bad behaviour, they'll become your allies: if they have access to tools like seff to check other users' job efficiency, and they notice their own jobs have low priority, they'll be the ones sending nastygrams to their colleagues and you won't have to do anything.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Loris Bennett

Dec 17, 2021, 4:17:28 AM
to Slurm User Community List
Hi Samuel,

"Fulcomer, Samuel" <samuel_...@brown.edu> writes:

[snip (5 lines)]

> Our practice has been to periodically look at running jobs to see if
> they are using (or have used, in the case of bigmem) less than their
> requested resources, and send them a nastygram telling them to stop
> doing that.

[snip (28 lines)]

We already do a fair bit of nastygramming (although I like to think of it
more as infogramming or, at worst, annoyagramming) to let users know
about things like their disk usage, the impending expiry of their
university account, or the fact that they haven't logged in for a
certain period.

I have also started generating histograms of CPU and memory efficiency
from seff data along with a short textual report indicating whether the
user needs to look more carefully at the resources he or she is
requesting.

I would be interested in the following:

1. Do you use some kind of framework to automate the actual sending of
the nastygrams?

2. What metrics do you use for deciding whether a nastygram regarding
resource usage needs to be sent?

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de
