[slurm-users] What is an easy way to prevent users from running programs on the master/login node?


Cristóbal Navarro

unread,
Apr 23, 2021, 10:38:27 PM4/23/21
to Slurm User Community List
Hi Community,
I have a set of users who are still not so familiar with Slurm, and yesterday they bypassed srun/sbatch and ran their CPU program directly on the head/login node, thinking it would still run on a compute node. I am aware that I will need to teach them some basic usage, but in the meantime, how have you solved this type of user behavior? Is there a preferred way to restrict the master/login node's resources, or actions, to regular users?

many thanks in advance
--
Cristóbal A. Navarro

Benson Muite

unread,
Apr 24, 2021, 1:31:16 AM4/24/21
to slurm...@lists.schedmd.com
Hi Cristóbal,
Not sure if there is a preferred way. One method is to use cgroups on
the login nodes. This is also integrated in SLURM and may be useful on
the compute nodes, but for the login node, you should use cgroups
directly. Some documentation:
https://events.static.linuxfound.org/images/stories/pdf/lfcs2012_georgiou.pdf?a
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
https://opensource.com/article/20/10/cgroups
https://slurm.schedmd.com/cgroups.html
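For the login node itself, the raw cgroup v2 interface these documents describe boils down to a few filesystem writes. A minimal sketch (limits are illustrative; on a real system you would operate on /sys/fs/cgroup as root, while here a scratch directory stands in so the commands are safe to try):

```shell
# Stand-in for /sys/fs/cgroup on a real system (requires root + cgroup v2 there)
CG=$(mktemp -d)
mkdir -p "$CG/user-1000"

echo "4G" > "$CG/user-1000/memory.max"           # cap RAM at 4 GiB
echo "200000 100000" > "$CG/user-1000/cpu.max"   # quota/period in usec: 2 CPUs
# echo $$ > "$CG/user-1000/cgroup.procs"         # real cgroupfs only: move this shell in

cat "$CG/user-1000/memory.max"                   # → 4G
```

On a real cgroupfs the kernel enforces these limits for every process listed in `cgroup.procs`; systemd's per-user slices (discussed later in this thread) automate exactly this bookkeeping.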

Ole Holm Nielsen

unread,
Apr 24, 2021, 4:03:44 AM4/24/21
to slurm...@lists.schedmd.com
We restrict user limits in /etc/security/limits.conf so users can't run
very long or very big tasks on the login nodes:

# Normal user limits
* hard cpu 20
* hard rss 50000000
* hard data 50000000
* soft stack 40000000
* hard stack 50000000
* hard nproc 250

/Ole

Patrick Begou

unread,
Apr 25, 2021, 3:47:18 AM4/25/21
to slurm...@lists.schedmd.com
Hi,

I also saw a cluster setup where mpirun or mpiexec commands were
replaced by a shell script just saying "please use srun or sbatch...".
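A minimal sketch of such a wrapper (the wording, and the idea of installing it in place of mpirun on the login nodes, are assumptions about that setup; shown as a function so it can be tried inline):

```shell
# Sketch of a stub installed as e.g. /usr/local/bin/mpirun (path assumed)
mpirun_stub() {
    echo "mpirun is disabled on the login nodes." >&2
    echo "Please launch parallel jobs through srun or sbatch instead." >&2
    return 1
}

mpirun_stub 2>&1 || echo "(wrapper refuses to run, exiting non-zero)"
```

Returning non-zero matters: build systems and scripts that shell out to mpirun then fail loudly instead of silently running on the login node.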

Patrick

Marcus Wagner

unread,
Apr 26, 2021, 2:02:19 AM4/26/21
to slurm...@lists.schedmd.com
Hi,

we also have a wrapper script, together with a number of "MPI-Backends".
If mpiexec is called on the login nodes, only the first process is started on the login node; the rest run on the MPI backends.

Best
Marcus
--
Dipl.-Inf. Marcus Wagner

IT Center
Gruppe: Systemgruppe Linux
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social Media Kanäle des IT Centers:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

Prentice Bisbal

unread,
Apr 27, 2021, 11:35:53 AM4/27/21
to slurm...@lists.schedmd.com

I think someone asked this exact question a few weeks ago. The best solution I know of is to use Arbiter, which was created for exactly this situation. It uses cgroups to limit resource usage, but it adjusts those limits based on login node utilization and each user's behavior ("bad" users get their resources limited more severely when they do "bad" things).

I will be deploying it myself very soon.

https://dylngg.github.io/resources/arbiterTechPaper.pdf

Prentice

Prentice Bisbal

unread,
Apr 27, 2021, 11:40:36 AM4/27/21
to slurm...@lists.schedmd.com
Using limits.conf is not a very good approach. Limits in
/etc/security/limits.conf apply to each individual shell, so an
individual user can still abuse a login node by running tasks in
multiple shells. Cgroups, which are implemented in the kernel and take a
system-wide view of resource usage, are a much better option.

Also, /etc/security/limits.conf is applied by PAM, so if someone gets
onto a system in a way that bypasses PAM, these limits will not be
applied to those shells. One way to bypass PAM is to use SSH with
public/private keys.

Prentice

Prentice Bisbal

unread,
Apr 27, 2021, 11:46:13 AM4/27/21
to slurm...@lists.schedmd.com
This is not a good approach. There are plenty of jobs you can run that
will hog a system's resources without using MPI. MATLAB and Mathematica
both support parallel computation and don't need MPI to do so.
Then there are OpenMP and other threaded applications that don't need
mpirun/mpiexec to launch them.

Limiting the number of processes or threads is not the only concern. You
can easily run a single-threaded task that hogs all the RAM. Or a user
may use bbcp to transfer a large amount of data, choking the network
interface.

Using cgroups is really the only reliable way to limit users, and
Arbiter seems like the best way to automatically manage cgroup-imposed
limits.

I haven't used arbiter myself, but I've seen presentations on it, and
I'm preparing to deploy it myself.

https://dylngg.github.io/resources/arbiterTechPaper.pdf

Prentice

Prentice Bisbal

unread,
Apr 27, 2021, 11:48:45 AM4/27/21
to slurm...@lists.schedmd.com
But won't that first process be able to use 100% of a core? What if
enough users do this such that every core is at 100% utilization? Or,
what if the application is MPI + OpenMP? In that case, that one process
on the login node could spawn multiple threads that use the remaining
cores on the login node.

Prentice

Cristóbal Navarro

unread,
Apr 27, 2021, 1:45:23 PM4/27/21
to Slurm User Community List
Many thanks to all,
I will try to set cgroups properly
best
--
Cristóbal A. Navarro

Alan Orth

unread,
May 19, 2021, 12:01:25 PM5/19/21
to Ole Holm Nielsen, Slurm User Community List
Regarding setting limits for users on the head node. We had this for years:

# CPU time in minutes
*       -   cpu         30
root    -   cpu         unlimited

But we eventually found that this was even causing long-running jobs like rsync/scp to fail when users were copying data to the cluster. For a while I blamed our network people, but then I did some tests and found that it was the limits that were responsible. I have removed this and other limits for now but I ruthlessly kill heavy processes that my users run on there. I will look into using cgroups on the head node.

Cheers,
--

Marcus Wagner

unread,
May 20, 2021, 1:14:05 AM5/20/21
to slurm...@lists.schedmd.com
Hi Prentice,

you are right, and I looked into the wrapper script (not my part; I never did anything in that thing).
In fact, the MPI processes are spawned on the backend nodes; the only process remaining on the login/frontend node is the spawner process.

The wrapper checks the load on the nodes and then creates a corresponding hostfile:
Host nrm214: current load 0.53 => 96 slots left
Host nrm215: current load 0.14 => 96 slots left
Host nrm212: current load 0.09 => 96 slots left
Host nrm213: current load 0.13 => 96 slots left

Used hosts:
nrm214 0 (current load is: 0.53)
nrm215 0 (current load is: 0.14)
nrm212 2.0 (current load is: 0.09)
nrm213 0 (current load is: 0.13)

Writing to /tmp/mw445520/login_60004/hostfile-613910

Contents:
nrm212:2

And then spawns the job:
Command: /opt/intel/impi/2018.4.274/compilers_and_libraries/linux/mpi/bin64/mpirun -launcher ssh -machinefile /tmp/mw445520/login_60004/hostfile-63375 -np 2 <code>
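The load-check-then-hostfile step could be sketched roughly like this (the 1.0 load threshold and the slots-per-host argument are assumptions based on the output above, not details of the real wrapper):

```shell
# Build an mpirun machinefile from "host load" pairs on stdin,
# granting $1 slots to each host whose 1-minute load is low enough.
make_hostfile() {
    awk -v slots="$1" '$2 < 1.0 { printf "%s:%d\n", $1, slots }'
}

# Canned data mirroring the wrapper output shown above:
printf 'nrm212 0.09\nnrm213 0.13\nnrm214 5.20\n' | make_hostfile 96
```

The real wrapper then passes the resulting file to mpirun via -machinefile, as in the command line above.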


I hope to have cleared things up a little bit.


Best
Marcus

Timo Rothenpieler

unread,
May 20, 2021, 8:41:18 AM5/20/21
to slurm...@lists.schedmd.com
I just put a drop-in config file for systemd into
/etc/systemd/system/user-.slice.d/user-limits.conf

> [Slice]
> CPUQuota=800%
> MemoryHigh=48G
> MemoryMax=56G
> MemorySwapMax=0

Accompanied by another drop-in that resets all those limits for root.

This enforces that no single user can use up all CPUs (limited to 8
Hyperthreads) and RAM, and can't cause the system to swap.
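The reset drop-in for root could look like this (filename assumed; per systemd.resource-control, an empty CPUQuota= and infinity values clear the limits for user-0.slice, i.e. root):

```
# /etc/systemd/system/user-0.slice.d/no-limits.conf  (filename assumed)
[Slice]
CPUQuota=
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
```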

Other than that, we leave it to the users' due diligence not to trash
the login nodes, which has worked fine so far.
They occasionally compile stuff on the login nodes in preparation for
runs, so I don't want to limit them too much.

mercan

unread,
May 20, 2021, 10:03:55 AM5/20/21
to Slurm User Community List
Hi;

We use a bash script to watch and kill users' processes if they exceed
our CPU and memory limits. This solution also ensures that total CPU or
memory usage cannot be exhausted, whether by a lot of well-behaved users
or by a single bad user:

https://github.com/mercanca/kill_for_loginnode.sh

Ahmet M.



Bas van der Vlies

unread,
May 20, 2021, 10:28:32 AM5/20/21
to slurm...@lists.schedmd.com
same here, we use the systemd user slice in our PAM stack:
```
# Setup for local and ldap logins
session required pam_systemd.so
session required pam_exec.so seteuid type=open_session /etc/security/limits.sh
```

limits.sh:
```
#!/bin/sh -e

PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)

if [ "${PAM_UID}" -ge 1000 ]; then
    /bin/systemctl set-property "user-${PAM_UID}.slice" CPUQuota=400% CPUAccounting=true MemoryLimit=16G MemoryAccounting=true
fi
```

and we also kill processes that use too much time, excluding some processes:
*
https://github.com/basvandervlies/cf_surfsara_lib/blob/master/doc/services/sara_user_consume_resources.md
--
Bas van der Vlies
| HPCV Supercomputing | Internal Services | SURF |
https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
| bas.van...@surf.nl

Timo Rothenpieler

unread,
May 20, 2021, 11:30:35 AM5/20/21
to slurm...@lists.schedmd.com
You shouldn't need this script and pam_exec.
You can set those limits directly in the systemd config to match every user.

Bas van der Vlies

unread,
May 20, 2021, 12:02:44 PM5/20/21
to slurm...@lists.schedmd.com
I know, but see the script: we only do this for UIDs >= 1000.

Stefan Staeglich

unread,
Jun 11, 2021, 8:02:20 AM6/11/21
to Slurm User Community List
Hi Prentice,

thanks for the hint. I'm evaluating this too.

It seems that Arbiter doesn't distinguish between RAM that's really in use and RAM
that's only used as cache. Or is my impression wrong?

Best,
Stefan
Stefan Stäglich, Universität Freiburg, Institut für Informatik
Georges-Köhler-Allee, Geb.52, 79110 Freiburg, Germany

E-Mail : stae...@informatik.uni-freiburg.de
WWW : gki.informatik.uni-freiburg.de
Telefon: +49 761 203-8223
Fax : +49 761 203-8222




Juergen Salk

unread,
Jun 11, 2021, 7:02:55 PM6/11/21
to Slurm User Community List
Hi,

I can't speak specifically for Arbiter, but to the best of my knowledge
this is just how cgroup memory limits work in general, i.e. both
anonymous memory and page cache always count against the cgroup
memory limit.

This also applies for memory constraints imposed to compute jobs if
ConstrainRAMSpace=yes is set in cgroup.conf.
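You can see the split yourself: cgroup v2 reports anonymous memory and page cache as separate counters in memory.stat, even though both are charged against memory.max. A small sketch (the slice path below is an example):

```shell
# Pull the "anon" and "file" (page cache) counters out of a memory.stat file
anon_and_file() {
    awk '$1 == "anon" || $1 == "file" { print $1, $2 }'
}

# On a real system:
#   anon_and_file < /sys/fs/cgroup/user.slice/user-1000.slice/memory.stat
# Canned sample of the file format, so this runs anywhere:
printf 'anon 1048576\nfile 4194304\nkernel 8192\n' | anon_and_file
```

A large "file" value relative to "anon" suggests the cgroup is mostly holding reclaimable cache, which the kernel will drop under pressure before the limit actually bites.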

Best regards
Jürgen



Prentice Bisbal

unread,
Jul 1, 2021, 1:41:29 PM7/1/21
to slurm...@lists.schedmd.com
I'm not sure. I just installed Arbiter myself only a few weeks ago, and
I'm still learning it. The systems it's installed on haven't gone live
yet, so I haven't had many "learning opportunities" yet. Arbiter is
using cgroups, so I would imagine that depends on whether cgroups
distinguishes between the two or not. But I'm not a cgroups expert,
either. ;(

Prentice

Stefan Staeglich

unread,
Feb 7, 2022, 8:49:03 AM2/7/22
to Slurm User Community List
Hi,

I've just noticed that the repository https://gitlab.chpc.utah.edu/arbiter2
seems to be down. Does someone know more?

Thank you!

Best,
Stefan

Michael Robbert

unread,
Feb 7, 2022, 10:51:20 AM2/7/22
to Slurm User Community List

They moved Arbiter2 to Github. Here is the new official repo: https://github.com/CHPC-UofU/arbiter2

 

Mike

Stefan Staeglich

unread,
Feb 18, 2022, 7:53:13 AM2/18/22
to Slurm User Community List
Hi Mike,

thank you very much :)

Stefan
