[slurm-users] Determining Cluster Usage Rate


Sid Young

May 13, 2021, 6:09:32 PM
to Slurm User Community List

Hi All,

Is there a way to define an effective "usage rate" of an HPC cluster using the data captured in the Slurm database?

Primarily I want to see if it can help in presenting a case to the business for buying more hardware for the HPC  :)

Sid Young

Doug Meyer

May 13, 2021, 11:17:36 PM
to Slurm User Community List
Probably need to define the problem a bit better.  sreport has very good functionality; see the bottom of the man page for examples.  You can group organisations into accounting groups to map like use, and use wckeys to provide accounting for specific users' billing groups.  Configure TRES billing to get a chargeback/shareback view.  I recently found this very good site on chargeback, with tools to help:  Usage Charging Policy - ULHPC Technical Documentation (uni.lu)
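The TRES billing idea can be illustrated with a small sketch. This follows the documented TRESBillingWeights behaviour only in general terms (weighted TRES are summed by default, or the largest single term is taken with PriorityFlags=MAX_TRES); the weights and job numbers below are made up:

```python
# Sketch: how a job's "billing" TRES value is derived from per-partition
# TRESBillingWeights, e.g. TRESBillingWeights="CPU=1.0,Mem=0.25G".
# By default Slurm sums the weighted TRES terms; with PriorityFlags=MAX_TRES
# it takes the largest single term instead.

def billing_units(alloc, weights, max_tres=False):
    """alloc and weights are keyed by TRES name, e.g. {"cpu": 8, "mem_gb": 32}."""
    parts = [alloc[tres] * w for tres, w in weights.items()]
    return max(parts) if max_tres else sum(parts)

# A hypothetical 8-CPU / 32 GB job at CPU=1.0 and Mem=0.25 per GB:
charge = billing_units({"cpu": 8, "mem_gb": 32}, {"cpu": 1.0, "mem_gb": 0.25})
```

Summing the billed units per account over a reporting period then gives a simple chargeback figure.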

Hope it helps.

Doug

Sid Young

May 13, 2021, 11:31:38 PM
to Slurm User Community List
Yes, on reflection I should have said utilization rather than usage! I've been researching which combination of metrics would give me an overall utilization figure for the HPC.
Sadly it's not as clear-cut as I would have hoped.

Does anyone have any ideas?



Sid Young

Christopher Samuel

May 14, 2021, 2:20:05 AM
to slurm...@lists.schedmd.com
On 5/13/21 3:08 pm, Sid Young wrote:

> Hi All,

Hiya,
I have a memory that it's possible to use "sreport" to show how much
time jobs spent waiting for which TRES - in other words whether
they were waiting for CPUs, memory, GPUs, etc. (or some combination).

Ah, here you go:

sreport -t percent -T ALL cluster utilization

That breaks things down by all the trackable resources on your system.

Hope that helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Diego Zuccato

May 14, 2021, 2:52:51 AM
to Slurm User Community List, Christopher Samuel
Il 14/05/2021 08:19, Christopher Samuel ha scritto:

> sreport -t percent -T ALL cluster utilization
"sreport: fatal: No valid TRES given" :(

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Ole Holm Nielsen

May 14, 2021, 4:25:01 AM
to slurm...@lists.schedmd.com
On 14-05-2021 08:52, Diego Zuccato wrote:
> Il 14/05/2021 08:19, Christopher Samuel ha scritto:
>
>> sreport -t percent -T ALL cluster utilization
> "sreport: fatal: No valid TRES given" :(

This works correctly on our cluster:

$ sreport -t percent -T ALL cluster utilization
--------------------------------------------------------------------------------
Cluster Utilization 2021-05-13T00:00:00 - 2021-05-13T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name      Allocated     Down  PLND Dow      Idle  Reserved       Reported
--------- -------------- -------------- -------- --------- --------- --------- --------------
 niflheim            cpu         98.22%    0.11%     0.00%     0.00%     1.67%        100.00%
 niflheim            mem         86.52%    0.10%     0.00%    13.38%     0.00%        100.00%
 niflheim         energy          0.00%    0.00%     0.00%     0.00%     0.00%          0.00%
 niflheim        billing         92.70%    0.04%     0.00%     7.26%     0.00%        100.00%
 niflheim        fs/disk          0.00%    0.00%     0.00%     0.00%     0.00%          0.00%
 niflheim           vmem          0.00%    0.00%     0.00%     0.00%     0.00%          0.00%
 niflheim          pages          0.00%    0.00%     0.00%     0.00%     0.00%          0.00%
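For scripting a utilization figure rather than eyeballing the table, the same report can be emitted machine-readably with sreport's parsable flags (`-p`/`-P`). A minimal sketch of pulling out the Allocated column; the pipe-delimited header names are assumed to match the columns above, and the two report-title banner lines are taken to be stripped already:

```python
# Sketch: extract per-TRES "Allocated" percentages from parsable output of
# "sreport -P -t percent -T ALL cluster utilization", after the report
# title banner has been removed.  SAMPLE mirrors the table in this thread.

SAMPLE = """\
Cluster|TRES Name|Allocated|Down|PLND Down|Idle|Reserved|Reported
niflheim|cpu|98.22%|0.11%|0.00%|0.00%|1.67%|100.00%
niflheim|mem|86.52%|0.10%|0.00%|13.38%|0.00%|100.00%
"""

def allocated_by_tres(text):
    lines = text.strip().splitlines()
    header = lines[0].split("|")
    tres_col = header.index("TRES Name")
    alloc_col = header.index("Allocated")
    # Map each TRES name to its allocated percentage as a float.
    return {row[tres_col]: float(row[alloc_col].rstrip("%"))
            for row in (line.split("|") for line in lines[1:])}
```

Feeding a year's worth of these into a spreadsheet is one way to build the "buy more hardware" case the original post asked about.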


Referring to https://slurm.schedmd.com/tres.html, which TRES are defined
on your cluster?

$ sacctmgr show tres

I get this output:

    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8

/Ole

Diego Zuccato

May 14, 2021, 4:46:05 AM
to Ole.H....@fysik.dtu.dk, Slurm User Community List
Il 14/05/21 10:24, Ole Holm Nielsen ha scritto:

> Referring to https://slurm.schedmd.com/tres.html, which TRES are defined
> on your cluster?
It just doesn't recognize 'ALL'. It works if I specify the resources.

root@str957-cluster:/var/log# sacctmgr show tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8
root@str957-cluster:/var/log# sreport -t percent -T ALL cluster utilization
sreport: fatal: No valid TRES given
root@str957-cluster:/var/log# sreport -t percent -T cpu,mem cluster utilization
--------------------------------------------------------------------------------
Cluster Utilization 2021-05-13T00:00:00 - 2021-05-13T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name    Allocated     Down PLND Dow        Idle  Reserved     Reported
--------- -------------- ------------ -------- -------- ----------- --------- ------------
      oph            cpu       81.93%    0.00%    0.00%      15.85%     2.22%      100.00%
      oph            mem       80.60%    0.00%    0.00%      19.40%     0.00%      100.00%

BYtE,
Diego

Paul Edmon

May 14, 2021, 9:46:24 AM
to slurm...@lists.schedmd.com

XDMoD can give these sorts of stats.  I also have some Diamond collectors we use in concert with Grafana to pull data and plot it, which is useful for seeing large-scale usage trends:

https://github.com/fasrc/slurm-diamond-collector

-Paul Edmon-

Christopher Samuel

May 14, 2021, 6:44:05 PM
to slurm...@lists.schedmd.com
On 5/14/21 1:45 am, Diego Zuccato wrote:

> It just doesn't recognize 'ALL'. It works if I specify the resources.

That's odd, what does this say?

sreport --version

All the best,

Christopher Samuel

May 14, 2021, 6:47:41 PM
to slurm...@lists.schedmd.com
On 5/14/21 1:45 am, Diego Zuccato wrote:

> Usage reported in Percentage of Total
> --------------------------------------------------------------------------------
>   Cluster      TRES Name    Allocated     Down PLND Dow        Idle  Reserved     Reported
> --------- -------------- ------------ -------- -------- ----------- --------- ------------
>       oph            cpu       81.93%    0.00%    0.00%      15.85%     2.22%      100.00%
>       oph            mem       80.60%    0.00%    0.00%      19.40%     0.00%      100.00%

The "Reserved" column is the one you're interested in, it's indicating
that for the 13th some jobs were waiting for CPUs, not memory.

You can look at a longer reporting period by specifying a start date,
something like:

sreport -t percent -T cpu,mem cluster utilization start=2021-01-01
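If you want a month-by-month trend rather than one long window, the start/end arguments can be generated mechanically. A small sketch; the command shape follows the example on the line above, and the year is illustrative:

```python
# Sketch: build one sreport invocation per month of a year, so the
# utilization trend can be tabulated or plotted over time.
from datetime import date

def monthly_sreport_cmds(year, tres="cpu,mem"):
    cmds = []
    for month in range(1, 13):
        start = date(year, month, 1)
        # End is the first day of the following month (sreport's end is exclusive).
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        cmds.append(f"sreport -t percent -T {tres} cluster utilization "
                    f"start={start} end={end}")
    return cmds
```

Each command can then be run (or piped through the parsable flags) to collect one data point per month.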

All the best,

Juergen Salk

May 15, 2021, 5:54:48 PM
to Slurm User Community List
* Christopher Samuel <ch...@csamuel.org> [210514 15:47]:

> > Usage reported in Percentage of Total
> > --------------------------------------------------------------------------------
> >   Cluster      TRES Name    Allocated     Down PLND Dow        Idle  Reserved     Reported
> > --------- -------------- ------------ -------- -------- ----------- --------- ------------
> >       oph            cpu       81.93%    0.00%    0.00%      15.85%     2.22%      100.00%
> >       oph            mem       80.60%    0.00%    0.00%      19.40%     0.00%      100.00%
>
> The "Reserved" column is the one you're interested in, it's indicating that
> for the 13th some jobs were waiting for CPUs, not memory.

Hi Chris,

The wording in the documentation is somewhat nebulous, but my
understanding is that the "Reserved" column in sreport indicates the
amount of resources that were actually idle but reserved by Slurm for
scheduling purposes and thus unavailable for immediate job
allocations. I assume this includes, for example, the time the
scheduler needs to free sufficient resources for the highest-priority
job that is waiting for the requested number of nodes to become
available. I think there might be more reasons for Slurm to mark
resources reserved (but not including resource reservations created
with scontrol, as these are reported as "Allocated" resources by
sreport unless created with the MAINT or IGNORE_JOBS flags).

Anyway, as far as I understand the documentation, the sreport
"Reserved" column by itself does not necessarily indicate the degree
of (over-)utilization of the cluster, as it does not take into account
the number of jobs in the queue for which Slurm has not yet started
blocking idle resources. So, confusingly, there is a difference between
"Reserved" in sreport and in sacct: in sreport, "Reserved" refers to
idle but reserved cluster resources, whereas in sacct "Reserved" means
the waiting time of jobs. Or do I understand this wrong?

However, there is also "Overcommited" in the sreport man page, which
looks promising by its description - although its exact definition
is also not completely clear to me right away:

--- snip ---

Overcommited

Time of eligible jobs waiting in the queue over the Reserved time.
Unlike Reserved, this has no limit. It is typically useful to
determine whether your system is overloaded and by how much.

--- snip ---

This field is not included by default in the report but can be added with
the Format option, e.g.

sreport -t percent -T ALL cluster utilization Format=TRESName,Allocated,PlannedDown,Down,Idle,Reserved,Overcommitted,Reported

(Note: There seems to be a typo in the sreport man page. It should read
"Overcommitted" rather than "Overcommited".)

Best regards
Jürgen



Juergen Salk

May 16, 2021, 7:29:24 AM
to Slurm User Community List
* Juergen Salk <juerge...@uni-ulm.de> [210515 23:54]:
> * Christopher Samuel <ch...@csamuel.org> [210514 15:47]:
>
> > > Usage reported in Percentage of Total
> > > --------------------------------------------------------------------------------
> > >   Cluster      TRES Name    Allocated     Down PLND Dow        Idle  Reserved     Reported
> > > --------- -------------- ------------ -------- -------- ----------- --------- ------------
> > >       oph            cpu       81.93%    0.00%    0.00%      15.85%     2.22%      100.00%
> > >       oph            mem       80.60%    0.00%    0.00%      19.40%     0.00%      100.00%
> >
> > The "Reserved" column is the one you're interested in, it's indicating that
> > for the 13th some jobs were waiting for CPUs, not memory.
>
>
> However, there is also "Overcommited" in the sreport man page which
> looks promising by description - although its exact definition
> is also not completely clear to me right away:
>
> --- snip ---
>
> Overcommited
>
> Time of eligible jobs waiting in the queue over the Reserved time.
> Unlike Reserved, this has no limit. It is typically useful to
> determine whether your system is overloaded and by how much.
>
> --- snip ---

And I just noticed that this description of "Overcommited" in the sreport(1)
man page first came in with versions 20.02.7 and 20.11.1, respectively.

In versions prior to 20.02.7 and 20.11.1 it still read:

--- snip ---

Overcommited

Time that the nodes were over allocated, either with the -O,
--overcommit flag at submission time or OverSubscribe set to FORCE
in the slurm.conf. This time is not counted against the total
reported time.

--- snip ---

So I assume the description of "Overcommited" in the sreport(1) man page was
simply wrong in older versions (unless its semantics have changed with
versions 20.02.7 and 20.11.1) ...

Best regards
Jürgen



Diego Zuccato

May 17, 2021, 3:00:47 AM
to Slurm User Community List, Christopher Samuel
Il 15/05/21 00:43, Christopher Samuel ha scritto:

>> It just doesn't recognize 'ALL'. It works if I specify the resources.
> That's odd, what does this say?
> sreport --version
slurm-wlm 18.08.5-2
That's the package from Debian stable (we don't have the manpower to
handle manually-compiled packages).
As Ole said, it's an old version. I'd love to be able to keep up with
the newest releases, but ... :(

Ole Holm Nielsen

May 17, 2021, 3:26:16 AM
to slurm...@lists.schedmd.com
On 5/17/21 8:59 AM, Diego Zuccato wrote:
> Il 15/05/21 00:43, Christopher Samuel ha scritto:
>
>>> It just doesn't recognize 'ALL'. It works if I specify the resources.
>> That's odd, what does this say?
>> sreport --version
> slurm-wlm 18.08.5-2
> That's the package from Debian stable (we don't have the manpower to
> handle manually-compiled packages).
> As Ole said, it's an old version. I'd love to be able to keep up with the
> newest releases, but ... :(

I hope that someone on the list can help you build Debian packages. When
you find the time, you must upgrade by at most 2 Slurm versions at a time,
so you have to upgrade in two steps, for example 18.08->19.05->20.11.

My Slurm upgrade instructions refer to CentOS, but the overall process
would be the same for all Linuxes:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
Please read carefully the existing documentation from SchedMD linked to in
this page.

I upgrade Slurm frequently and have no problems doing so. We're at
20.11.7 now. You should avoid 20.11.{0-2} due to a bug in MPI.

/Ole

Diego Zuccato

May 17, 2021, 6:38:07 AM
to Slurm User Community List, Ole Holm Nielsen
Il 17/05/21 09:25, Ole Holm Nielsen ha scritto:

> I hope that someone on the list can help you build Debian packages.
The problem is not just rebuilding Slurm: if I rebuild Slurm, I have to
rebuild OpenMPI, OpenIB and a lot of other stuff that I don't know in
the needed detail.

> When you find the time, you must upgrade by at most 2 Slurm versions at
> a time, so you have to upgrade in two steps, for example
> 18.08->19.05->20.11.
I usually just stop everything for the upgrade, then upgrade to whatever
Debian is shipping at the moment. If the history is lost, it's not a big
issue (that's what DB backups are for :) ).

> My Slurm upgrade instructions refer to CentOS, but the overall process
> would be the same for all Linuxes:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
> Please read carefully the existing documentation from SchedMD linked to
> in this page.
Tks.

> I upgrade Slurm frequently and have no problems doing so.  We're at
> 20.11.7 now.  You should avoid 20.11.{0-2} due to a bug in MPI.
That's really useful info.