[slurm-users] How can we put limits on interactive jobs?


Ole Holm Nielsen via slurm-users

Apr 25, 2025, 5:17:29 AM
to slurm...@schedmd.com
We would like to put limits on interactive jobs (started by salloc) so
that users don't leave unused interactive jobs behind on the cluster by
mistake.

I can't offhand find any configurations that limit interactive jobs, such
as enforcing a timelimit.

Perhaps this could be done in job_submit.lua, but I couldn't find any
job_desc parameters in the source code which would indicate if a job is
interactive or not.

Question: How do people limit interactive jobs, or identify orphaned jobs
and kill them?

Thanks a lot,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Ewan Roche via slurm-users

Apr 25, 2025, 5:40:19 AM
to Ole.H....@fysik.dtu.dk, slurm...@schedmd.com
Hello Ole,
The way I identify interactive jobs is to check in job_submit.lua
whether the job script is empty.

If that is the case, the job is assigned to an interactive QoS that
limits time and resources and allows only one running job per user.


if job_desc.script == nil or job_desc.script == '' then

    slurm.log_info("slurm_job_submit: jobscript is missing, assuming interactive job")
    slurm.log_user("Launching an interactive job")

    if job_desc.partition == "gpu" then
        job_desc.qos = "gpu_interactive"
    end

    if job_desc.partition == "cpu" then
        job_desc.qos = "cpu_interactive"
    end
end

return slurm.SUCCESS
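
For reference, such a QoS can be set up with sacctmgr; a rough sketch
(QoS name and limits here are placeholders, adjust for your site):

sacctmgr add qos cpu_interactive
sacctmgr modify qos cpu_interactive set MaxWall=04:00:00 MaxJobsPerUser=1 MaxTRESPerUser=cpu=8

The QoS also has to be reachable by the users, e.g. via the partition's
AllowQos or the users' associations.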

Thanks

Ewan

Loris Bennett via slurm-users

Apr 25, 2025, 5:56:31 AM
to slurm...@lists.schedmd.com
Hi Ole,

Ole Holm Nielsen via slurm-users
<slurm...@lists.schedmd.com> writes:
> We would like to put limits on interactive jobs (started by salloc) so
> that users don't leave unused interactive jobs behind on the cluster
> by mistake.
>
> I can't offhand find any configurations that limit interactive jobs,
> such as enforcing a timelimit.
>
> Perhaps this could be done in job_submit.lua, but I couldn't find any
> job_desc parameters in the source code which would indicate if a job
> is interactive or not.
>
> Question: How do people limit interactive jobs, or identify orphaned
> jobs and kill them?

We would be interested in this too.

Currently we have a very makeshift solution: a script that simply pipes
all running job IDs to 'sjeff'
(https://github.com/ubccr/stubl/blob/master/bin/sjeff) every 30s. This
produces output like the following:

Username   Mem_Request   Max_Mem_Use   CPU_Efficiency   Number_of_CPUs_In_Use
able       3600M         0.94Gn        99.22%           (142.88 of 144)
baker      8G            0.90Gn        0.60%            (0.02 of 4)
charlie    varied        32.92Gn       42.54%           (5.96 of 14)
...
== CPU efficiency: data above from Fri 25 Apr 11:17:09 CEST 2025 ==

where efficiencies under 50% are printed in red. As long as one only
has about a screenful of users, it is fairly easy to spot users with a
low CPU efficiency, whether it be due to idle interactive jobs or caused
by something else.
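
For the curious, the wrapper is little more than a polling loop, roughly
along these lines (the exact sjeff invocation is from memory and may
need adjusting):

watch -n 30 "squeue --noheader --states=RUNNING --format='%A' | sjeff"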

Apart from that, we have a partition called 'interactive' which has an
appropriately short MaxTime. We don't actually lie to our users by
saying that they have to use this partition, but we don't advertise the
fact that they could use any of the other partitions for interactive
work. This is obviously also even more makeshift :-)
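
For completeness, such a partition is just an ordinary slurm.conf
definition with a short MaxTime, along the lines of (node list and
limits are placeholders):

PartitionName=interactive Nodes=node[001-016] MaxTime=04:00:00 DefaultTime=01:00:00 State=UP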

Cheers,

Loris

> Thanks a lot,
> Ole
>
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin

René Sitt via slurm-users

Apr 25, 2025, 7:05:33 AM
to slurm...@lists.schedmd.com
Hello,

We also do it this way, by checking whether job_desc.script is empty. I
have no idea if this is foolproof in any way (and use cases like, say,
someone starting a Jupyter or RStudio instance via a script are not
covered), but hopefully, users who are inventive enough to find ways
around this are also receptive enough to accept more reasonable and
robust solutions for their workflows. Aside from setting a reasonable
time limit, I'd say the most important limitation to steer users away
from overusing interactive jobs is enforcing (either via partition or
via QoS) that only one interactive job per user can be running at any
given time.

Cheers,
René

On 25.04.25 at 11:37, Ewan Roche via slurm-users wrote:
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg

Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de

Ole Holm Nielsen via slurm-users

Apr 25, 2025, 8:39:59 AM
to ewan....@agroscope.admin.ch, slurm...@schedmd.com
Hi all,

Thanks for the great suggestions! It seems that the Slurm job_submit.lua
script is the most flexible way to check for interactive jobs and to
change job parameters such as QOS, time_limit, etc.

I've added this Lua function to our job_submit.lua script and it seems to
work fine:

-- Check for interactive jobs
-- Policy: Interactive jobs are limited to 4 hours
function check_interactive_job (job_desc, submit_uid, log_prefix)
    if (job_desc.script == nil or job_desc.script == '') then
        local time_limit = 240
        -- userinfo is defined elsewhere in the complete job_submit.lua script
        slurm.log_info("%s: user %s submitted an interactive job", log_prefix, userinfo)
        slurm.log_user("NOTICE: Job script is missing, assuming an interactive job")
        slurm.log_user("        Job timelimit is set to %d minutes", time_limit)
        job_desc.time_limit = time_limit
    end
    return slurm.SUCCESS
end

The complete script is available at
https://github.com/OleHolmNielsen/Slurm_tools/blob/master/plugins/job_submit.lua
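
A minimal sketch of how such a helper can be wired into
slurm_job_submit() (the complete script linked above does considerably
more than this):

function slurm_job_submit(job_desc, part_list, submit_uid)
    local log_prefix = "slurm_job_submit"
    return check_interactive_job(job_desc, submit_uid, log_prefix)
end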

Interestingly, Slurm by default (we're at 24.11.4) assigns the job name
"interactive" (job_desc.name in job_submit.lua) to interactive jobs
submitted by salloc; from the salloc manual page:

> The default job name is the name of the "command" specified on the command line.

Users can of course override this with the --job-name parameter.
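
So in principle the job name can serve as an additional hint in
job_submit.lua; a small illustrative check (not reliable on its own,
precisely because the name can be overridden):

if job_desc.name == "interactive" then
    slurm.log_info("Job name suggests an interactive salloc job")
end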

Best regards,
Ole