[slurm-users] Getting usage reporting from sacct/sreport


Thomas Arildsen

Mar 25, 2023, 11:19:33 AM
to slurm...@lists.schedmd.com
I am experimenting with getting information from a Slurm cluster on how
many resources each user has been consuming. I would like to get the
accumulated amount of CPU and GPU time over specified periods. GPU
resources reported by type of GPU would be extra helpful.
I am currently looking at sacct where I try options like:

sacct -a --starttime=2023-03-21T00:00 --format="user,totalcpu,tresusageintot%100"

"tresusageintot" shows me:
"cpu=00:00:20,energy=0,fs/disk=0,mem=0,pages=3465,vmem=285140K ", so
GPU information does not seem to be included and I have found no other
option that can tell me.
Also, it shows me individual job steps which I would really just like
to aggregate. In fact I would just like to aggregate per user and
ignore individual jobs.

I have also tried `sreport`, but I cannot really get anything useful
out of it at the user level. For example:

sreport user TopUsage
--------------------------------------------------------------------------------
Top 10 Users 2023-03-21T00:00:00 - 2023-03-21T23:59:59 (86400 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account      Used   Energy
--------- --------- --------------- --------------- --------- --------

It just gives me an empty table with no user information. I am guessing
something is not configured right here to be storing that data.

I have "AccountingStorageTRES=gres/gpu" in slurm.conf. I am not sure
what more I should perhaps put here.

I hope someone can advise on what I am missing here and how I can best
get the usage stats I am hoping for.
Best regards,

Thomas

--
Special Consultant | CLAAUDIA

Phone: (+45) 9940 9844 | Email: ta...@its.aau.dk | Web:
https://www.claaudia.aau.dk/
Aalborg University | Fredrik Bajers Vej 1, A1.65, 9220 Aalborg Ø,
Denmark

Mike Mikailov

Mar 26, 2023, 10:14:04 AM
to Slurm User Community List, ta...@its.aau.dk
Hi Thomas et al,

I have just written a Linux shell script which does exactly what you are asking for.

Please use the "--allocations" option of the sacct command so that only job allocations (not individual job steps) are reported; these records can then be aggregated per user.

You may also use the awk command to sum up all CPU usage.

A more advanced awk command may also sum up all GPU usage.

I have also placed the script on GitHub, but it is private for now until we clear it for public release.

Normalizing trackable resources (TRES) by means of TRES weights is needed for fairer usage reports. In that case the “billing” value represents the combined billing unit (the max or sum of the individual weighted TRES). Note that by default this value equals the number of CPUs allocated to the job.
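
A minimal sketch of that kind of aggregation, not the private script mentioned above: the function below assumes allocation-level records in the form User|ElapsedRaw|AllocTRES (as sacct prints them with -X, -n and -P) and sums CPU and GPU time per user. The function name is made up for illustration. Only the generic gres/gpu entry is counted; typed entries such as gres/gpu:<type> would need separate handling.

```shell
# Sum CPU-hours and GPU-hours per user from allocation-level sacct records.
# Expected input on stdin, one job per line: User|ElapsedRaw|AllocTRES
# Such input could come from (assumed invocation, adjust the time window):
#   sacct -a -X -n -P --starttime=2023-03-21T00:00 \
#         --format=user,elapsedraw,alloctres
sum_usage_per_user() {   # hypothetical helper name
    awk -F'|' '
    $1 != "" {
        cpus = 0; gpus = 0
        # AllocTRES looks like: billing=4,cpu=4,gres/gpu=2,mem=16G,node=1
        n = split($3, tres, ",")
        for (i = 1; i <= n; i++) {
            split(tres[i], kv, "=")
            if (kv[1] == "cpu")      cpus = kv[2]
            if (kv[1] == "gres/gpu") gpus = kv[2]
        }
        # allocated count * elapsed seconds = resource-seconds for this job
        cpu_s[$1] += cpus * $2
        gpu_s[$1] += gpus * $2
    }
    END {
        for (u in cpu_s)
            printf "%s cpu_hours=%.2f gpu_hours=%.2f\n", u, cpu_s[u] / 3600, gpu_s[u] / 3600
    }' | sort
}
```

Piping the sacct command from the comment into sum_usage_per_user should give one line per user. Note that this counts allocated time (resources held by the job), not CPU time actually consumed.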

Thanks,
-Mike
USA


Juergen Salk

Mar 26, 2023, 11:49:44 AM
to Slurm User Community List
Hi Thomas,

I think sreport should actually do what you want out of the box if you
have permissions to retrieve that information for other users than
yourself.

In my understanding, sacct is meant for individual job and job step
accounting while sreport is more suitable for aggregated cluster usage
accounting. Thus, sreport also accounts for reservation hours which
sacct does not.

sreport should also be able to report on consumed GRES-hours, such as
GPU hours in your case, but you'll probably have to use the '-T' option
in order to include that information in the report.
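
As a hedged illustration of the '-T' option: the sketch below assumes sreport is asked for parsable output with the columns Login, TresName and Used (see the invocation in the comment), and turns that into one line per user. The function name is hypothetical, and the dates are only placeholders taken from the period discussed in this thread.

```shell
# Pivot sreport output into one line per user with CPU and GPU hours.
# Expected input on stdin, one record per line: Login|TRES Name|Used
# Such input could come from (assumed invocation):
#   sreport -n -P -t hours -T cpu,gres/gpu cluster AccountUtilizationByUser \
#           start=2023-03-21 end=2023-03-22 format=Login,TresName,Used
pivot_sreport_usage() {   # hypothetical helper name
    awk -F'|' '
    $1 == "" { next }                 # skip account-level rows without a login
    { seen[$1] = 1 }
    $2 == "cpu"      { cpu[$1] += $3 }
    $2 == "gres/gpu" { gpu[$1] += $3 }
    END {
        for (u in seen)
            printf "%s cpu_hours=%d gpu_hours=%d\n", u, cpu[u], gpu[u]
    }' | sort
}
```

If a user belongs to several accounts, the rows are summed, so each user appears once in the result.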

In case it matters, our AccountingStorageTRES looks like this:

AccountingStorageTRES=gres/scratch,gres/gpu

(We also account for local scratch space allocations as a GRES.)

These are the commands that we usually point our users to when
they ask for their historical resource utilization:

https://wiki.bwhpc.de/e/BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_retrieve_historical_resource_usage_for_a_specific_user_or_account.3F

(But omit 'user=<username>' or 'account=<account>' for a report on all
users or accounts.)

Hope that helps.

Best regards
Jürgen



Thomas Arildsen

May 3, 2023, 5:50:10 AM
to slurm...@lists.schedmd.com
Hi Mike

Thanks for the suggestion. I think something else may be missing here on
my end. With `sacct` I can actually get the usage of individual jobs with
TRES information, but there must be something else causing GPU not to be
included in the information I get.
When I include the "--allocations" option, the TRES information
disappears from my output.
In any case, I think it would kind of be re-implementing the job of
`sreport` this way, so I will look further into making `sreport` work
for me.

Best regards,

Thomas


Thomas Arildsen

May 3, 2023, 5:57:54 AM
to slurm...@lists.schedmd.com
Hi Jürgen

Thanks for your feedback. I think you are right that I should probably
be using `sreport` for this. I think there must be some other reason
that `sreport` is not showing me any actual output. Perhaps the
explanation could be that we currently do not have users organised in
accounts. We just have one big pile of users. I will look further into this.
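
One possible way to check that hypothesis is to list the user/account associations stored in the accounting database. The sketch below assumes input in the form Cluster|Account|User, as sacctmgr prints it with -n and -P, and shows which accounts each user is associated with. The function name is made up for illustration. A user who runs jobs but is missing from this list would be a lead worth following, though that alone does not prove it is the cause of the empty report.

```shell
# List each user together with the accounts they are associated with.
# Expected input on stdin, one association per line: Cluster|Account|User
# Such input could come from (assumed invocation):
#   sacctmgr -n -P show associations format=Cluster,Account,User
users_with_accounts() {   # hypothetical helper name
    awk -F'|' '
    # rows with an empty User field describe the account itself; skip them
    $3 != "" { accts[$3] = (accts[$3] == "" ? $2 : accts[$3] "," $2) }
    END { for (u in accts) printf "%s %s\n", u, accts[u] }' | sort
}
```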

Best regards,

Thomas

