[slurm-users] Rate Limiting of RPC calls

Kota Tsuyuzaki

Feb 9, 2021, 8:00:05 PM
to slurm...@lists.schedmd.com
Hello guys,

In our cluster, a new member sometimes accidentally issues too many Slurm RPC calls (sbatch, sacct, etc.), and then slurmctld,
slurmdbd, and MySQL can become overloaded.
To prevent that situation, I'm looking for something like a per-user RPC rate limit. Does Slurm support such a rate-limiting feature?
If not, is there a way to conserve Slurm's server-side resources?

Best,
Kota

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsu...@hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------




Paul Edmon

Feb 9, 2021, 8:08:28 PM
to slurm...@lists.schedmd.com
We've hit this several times before. The tricks we've used to deal with
it are:

1. Being on the latest release: A lot of work has gone into improving
RPC throughput; if you aren't running the latest 20.11 release, I highly
recommend upgrading.  20.02 was also pretty good at this.

2. max_rpc_cnt/defer: I would recommend using either of these
SchedulerParameters settings, as they give the scheduler more time to
breathe (see the example slurm.conf line after this list).

3. I would make sure that your MySQL settings are such that your DB is
fully cached in memory and not hitting disk (a sketch of the relevant
my.cnf settings follows the list).  I also recommend running your DB on
the same server as your ctld.  We've found that this can improve
throughput.

4. We put a caching version of squeue in place which gives users
almost-live data rather than live data.  This additional buffer layer
helps cut down traffic.  It is something we rolled in-house, with a
database that updates every 30 seconds.

5. Recommend that users submit jobs that last for more than 10 minutes
and use job arrays instead of looping over sbatch (see the example after
this list).  This will reduce thrashing.
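
For point 2, a minimal slurm.conf sketch, purely illustrative (the
max_rpc_cnt value needs tuning for your own cluster):

   # Back off the scheduler when slurmctld is busy with RPCs: "defer"
   # skips the scheduling attempt normally made at each submission, and
   # max_rpc_cnt=N defers scheduling while N or more RPC threads are
   # active.  Either option can also be used on its own.
   SchedulerParameters=defer,max_rpc_cnt=150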
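
For point 3, the knobs that usually matter are the InnoDB buffer pool
and log settings; a my.cnf sketch (the sizes are placeholders; make the
buffer pool big enough that the accounting DB fits in memory):

   [mysqld]
   innodb_buffer_pool_size=4096M
   innodb_log_file_size=64M
   innodb_lock_wait_timeout=900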
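
For point 5, the job-array advice boils down to this (job.sh is a
made-up script name):

   # Instead of one sbatch call, i.e. one RPC, per work item:
   #   for i in $(seq 1 1000); do sbatch job.sh $i; done
   # submit a single array job and read $SLURM_ARRAY_TASK_ID inside job.sh:
   sbatch --array=1-1000 job.sh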

Those are my recommendations for how to deal with this.

-Paul Edmon-

Christopher Samuel

Feb 9, 2021, 8:34:11 PM
to slurm...@lists.schedmd.com
On 2/9/21 5:08 pm, Paul Edmon wrote:

> 1. Being on the latest release: A lot of work has gone into improving
> RPC throughput; if you aren't running the latest 20.11 release, I highly
> recommend upgrading.  20.02 was also pretty good at this.

We've not gone to 20.11 on production systems yet, but I can vouch for
20.02 being far better than previous versions for scheduling performance.

We also use the cli_filter lua plugin to implement our own RPC-limiting
mechanism, using a local directory of per-user files. The big advantage
of this is that it does the rate limiting client side, so excess requests
don't get sent to the slurmctld in the first place. Yes, it is
theoretically possible for users to discover and work around this, but
the intent here is to catch accidental/naive use rather than anything
malicious.
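
For anyone curious, a very rough sketch of that idea (not our actual
code): the cli_filter/lua plugin calls slurm_cli_pre_submit() on the
submit host before the request goes to the slurmctld, so you can refuse
a submission right there. The directory, interval, and logging calls
below are illustrative; check the cli_filter.lua example shipped with
the Slurm source for the exact callback signatures and the slurm.*
helpers that are available.

   -- cli_filter.lua (sketch): refuse a submission if this user already
   -- submitted less than MIN_INTERVAL seconds ago, via a per-user stamp file.
   local RATE_DIR = "/var/run/slurm_cli_rate"   -- made-up local directory
   local MIN_INTERVAL = 2                       -- made-up threshold, in seconds

   function slurm_cli_pre_submit(options, pack_offset)
       local user = os.getenv("USER") or "unknown"
       local stamp = RATE_DIR .. "/" .. user
       local now = os.time()
       local f = io.open(stamp, "r")
       if f then
           local last = tonumber(f:read("*l")) or 0
           f:close()
           if now - last < MIN_INTERVAL then
               slurm.log_error("Submitting too fast; wait a moment or use a job array.")
               return slurm.ERROR
           end
       end
       f = io.open(stamp, "w")
       if f then
           f:write(tostring(now))
           f:close()
       end
       return slurm.SUCCESS
   end

   -- The plugin also expects these callbacks to be defined:
   function slurm_cli_setup_defaults(options, early) return slurm.SUCCESS end
   function slurm_cli_post_submit(offset, job_id, step_id) return slurm.SUCCESS end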

Also, getting users to use `sacct` rather than `squeue` to check what
state a job is in can help a lot, as it reduces the load on the slurmctld.
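
For example (the job id below is just a placeholder), something like:

   sacct -j 12345 -X -o JobID,State,Elapsed
   # -X / --allocations shows only the allocation, not every step

answers the usual "is my job still running?" question from the slurmdbd
without touching the slurmctld at all.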

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Kevin Buckley

Feb 9, 2021, 9:32:48 PM
to Slurm User Community List, Christopher Samuel
On 2021/02/10 09:33, Christopher Samuel wrote:
>
> Also, getting users to use `sacct` rather than `squeue` to check what
> state a job is in can help a lot, as it reduces the load on the slurmctld.

That offers an interesting take on the two utilities, Chris,
in that:

1) It should be possible to write a wrapper, or even a binary,
that gives the user the squeue format using the API calls
which sacct targets, for a subset of squeue functionality ?

2) How much of the functionality of squeue would be lost if
SchedMD had only provided an "sacct with squeue formatting"
and how much of the lost functionality would really be
missed?
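
For what it's worth, you can already get something fairly squeue-shaped
out of sacct alone; a rough approximation (the field list below is just
my guess at the closest match, nothing official):

   # Roughly squeue-like columns for pending/running jobs, served by slurmdbd:
   sacct -X -s PD,R -o JobID,Partition,JobName%20,User,State,Elapsed,NNodes,NodeList

The main thing you lose is the live, in-memory scheduling view that only
the slurmctld has.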

Kevin
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre

Kota Tsuyuzaki

Feb 12, 2021, 4:14:23 AM
to Slurm User Community List, Christopher Samuel
Thanks Guys!

All of this information is valuable. I'll review our settings and try to tune our Slurm cluster for better performance.

Best,
Kota

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsu...@hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------

Kota Tsuyuzaki

Feb 17, 2021, 1:03:17 AM
to Slurm User Community List, Christopher Samuel
Hello guys,


> > 1) It should be possible to write a wrapper, or even a binary,
> > that gives the user the squeue format using the API calls
> > which sacct targets, for a subset of squeue functionality ?
> >
> > 2) How much of the functionality of squeue would be lost if
> > SchedMD had only provided an "sacct with squeue formatting"
> > and how much of the lost functionality would really be
> > missed?
> >

Thinking about using sacct instead of squeue, I got some interesting results from a resource-usage point of view.
First, sacct does communicate with slurmdbd directly, so it reduces the communication to slurmctld.
However, when I ran plain sacct queries (e.g. just the `sacct` command) that returned more than a few hundred job records, CPU usage on the MySQL DB went up. Looking at the query log in MySQL, it appears that slurmdbd issued many similar SELECT queries for each RPC; the only difference between the SELECT queries within one `sacct` execution seems to be job_db_inx, so something (slurmdbd? a MySQL subquery?) issues additional SELECT queries in proportion to the number of records. Moreover, when I ran sacct with a single job id (i.e. `sacct -j <jobid>`), the MySQL CPU spike was much smaller. So I'm starting to think that, from the MySQL point of view, sacct may exhaust resources faster than squeue, because, if I understand correctly, squeue doesn't affect MySQL query performance at all.
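
One thing that should at least bound the work per call (an assumption
on my part, I haven't measured it) is to keep each sacct query narrow,
e.g.:

   # Restrict the query to one user, an explicit start time, and
   # allocation rows only (user name and date are placeholders):
   sacct -u someuser -X -S 2021-02-17T00:00:00 -o JobID,State,Elapsed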

Any thoughts?

Kevin Buckley

Feb 19, 2021, 3:21:49 AM
to Slurm User Community List, Kota Tsuyuzaki, Christopher Samuel
On 2021/02/17 14:02, Kota Tsuyuzaki wrote:
> .. So I'm starting to think that, from the MySQL point of view, sacct
> may exhaust resources faster than squeue, because, if I understand
> correctly, squeue doesn't affect MySQL query performance at all.
>
> Any thoughts?
>
> Best,
> Kota


Pulling together some things from the various docs and man pages.


In the "Performance" section of the manual pages for sacct and squeue,
a clear disticntion is made between the two utilities, in that:

sacct makes an RPC to slurmdbd

squeue makes an RPC to slurmctld

(although the diagram in the quickstart guide differs, in
suggesting that sacct has links with all three of slurmctld,
slurmdbd, and the slurmds ?)

but both man pages warn that too many invocations of either
utility can cause a degradation in the performance of their
respective daemon.

So yes, by replacing squeue with an sacct wrapper, you could
reduce the load on the slurmctld but degrade the performance
of the slurmdbd.

However, it's the slurmctld (the control daemon) that's doing
the real work, the scheduling of your jobs, whilst the slurmdbd,
and the database behind it, is merely storing accounting info
that the slurmctld sends it.

Furthermore, if the slurmdbd isn't responding to the slurmctld,
then the slurmctld will cache the accounting info until the
slurmdbd does respond.

However, if the slurmctld isn't doing its job, because it's
overloaded, then the throughput of whole of your cluster is
affected and the slurmctld won't have new accounting info to
send to the slurmdbd.

Ideally, then, you don't want either slurmctld or slurmdbd
overloaded, but if you are forced to choose, you would send
more of the load towards the slurmdbd.