[slurm-users] Re: Using more cores/CPUs that requested with


Gestió Servidors via slurm-users

Mar 26, 2025, 3:50:42 AM
to slurm...@lists.schedmd.com

Hello,

 

Thanks for your answers. I will try now!! One more question: is there any way to check whether cgroup restrictions are working properly during a running job or during the Slurm scheduling process?

 

Thanks again!

 

Cutts, Tim via slurm-users

Mar 26, 2025, 7:35:03 AM
to Gestió Servidors, slurm...@lists.schedmd.com

Cgroups don’t take effect until the job has started. It’s a bit clunky, but you can do things like this:

 

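inspect_job_cgroup_memory ()
{
    # squeue prints a header line first, so keep only the last line (the job).
    set -- $(squeue "$@" -O JobId,UserName | sed -n '$p')
    # Run a step inside the job as its owner. --overlap lets the step share the
    # job's CPUs (otherwise, on Slurm 20.11 and later, it may wait for CPUs the
    # job is already using). The path below assumes the cgroup v1 layout.
    sudo -u "$2" srun --overlap --pty --jobid "$1" bash -c 'cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.usage_in_bytes'
}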

 

There are lots of other files in that filesystem hierarchy to report on other things like cpusets, IO etc.

 

Obviously if you’re not the admin of the system, you can only do this for your own jobs, and then you don’t need the sudo part of the shell function.
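The function above reads cgroup v1 paths. On nodes running Slurm's cgroup/v2 plugin the hierarchy is unified, so a rough counterpart is to look up the calling process's cgroup in /proc/self/cgroup and print the cpuset and memory files at that level and at each parent. This is a sketch under stated assumptions, not a tested tool: the helper name report_cgroup_v2 is hypothetical, and it assumes cgroup v2 is mounted at /sys/fs/cgroup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): show the CPU and memory limits
# that apply to the calling process, assuming a unified cgroup v2 hierarchy.
# Usage: report_cgroup_v2 [cgroup_root] [cgroup_path]
report_cgroup_v2 () {
    local root=${1:-/sys/fs/cgroup}
    # Under cgroup v2, /proc/self/cgroup holds a single line "0::<path>".
    local rel=${2:-$(sed -n 's/^0:://p' /proc/self/cgroup)}
    local dir="$root$rel" f
    echo "cgroup: $rel"
    # Effective CPUs are inherited downwards, so the leaf value is what counts.
    if [ -r "$dir/cpuset.cpus.effective" ]; then
        echo "cpuset.cpus.effective: $(cat "$dir/cpuset.cpus.effective")"
    fi
    # Limits are normally written on a parent (job or step) cgroup rather than
    # on the leaf, so walk up towards the root and print each non-empty value.
    while [ "$dir" != "$root" ] && [ "$dir" != "/" ]; do
        for f in cpuset.cpus memory.max; do
            if [ -r "$dir/$f" ] && [ -n "$(cat "$dir/$f")" ]; then
                echo "${dir#"$root"} $f: $(cat "$dir/$f")"
            fi
        done
        dir=$(dirname "$dir")
    done
}
```

To run it inside a job, something like `srun --jobid <jobid> --overlap bash -c "$(declare -f report_cgroup_v2); report_cgroup_v2"` should work. That srun creates its own step cgroup, which is why the function walks upwards: the job-level cpuset.cpus and memory.max are the values to compare with what was requested. Where the job cgroups sit (for example under system.slice/slurmstepd.scope) is an assumption here; the function itself only relies on /proc/self/cgroup.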

 

Tim

 

-- 

Tim Cutts

Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform

AstraZeneca

 

Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue |


AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.

This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com

Laura Hild via slurm-users

Mar 26, 2025, 9:48:45 AM
to Gestió Servidors, slurm...@lists.schedmd.com
In addition to checking under /sys/fs/cgroup like Tim said, if this is just to convince yourself that you got the CPU restriction working, you could also open `top` on the host running the job and observe that %CPU is now held to 200.0 or lower (or, if there are multiple processes per job, that their values sum to that) instead of 4800 or whatever all the cores would give.
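To get a number instead of watching top, one rough experiment is to start more CPU-bound processes than the job was given and compare the CPU time they accumulate with the elapsed wall-clock time. The sketch below assumes bash; cpu_share_check is a hypothetical name, and the busy loops are plain bash rather than any Slurm tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start N busy-loop processes for about S seconds and
# print how many CPUs they got in total ((user + sys CPU time) / wall time).
# Inside a job limited to 2 CPUs, N=4 should print about 2.0, not 4.0.
cpu_share_check () {
    local n=${1:-4} secs=${2:-5} out
    local LC_ALL=C                # keep "." as the decimal separator
    local TIMEFORMAT='%R %U %S'   # bash 'time' prints: real user sys (seconds)
    out=$( { time (
        for ((i = 0; i < n; i++)); do
            # Each background subshell spins until its deadline passes.
            ( end=$((SECONDS + secs)); while ((SECONDS < end)); do :; done ) &
        done
        wait                      # reap the children so their CPU time is counted
    ); } 2>&1 )
    awk '{ printf "%.1f\n", ($1 > 0 ? ($2 + $3) / $1 : 0) }' <<< "$out"
}
```

For example, a throwaway batch job submitted with `sbatch -c 2` whose script defines the function and runs `cpu_share_check 4 10` should print roughly 2.0 (the number of CPUs actually allocated) if the core restriction works, and closer to 4.0 if it does not, assuming the node has at least four idle cores.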


--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Shunran Zhang via slurm-users

Mar 26, 2025, 10:54:16 AM
to Gestió Servidors, Slurm User Community List
If you are letting systemd take over most things, you have systemd-cgtop, which works better than top for your case. There is also systemd-cgls for non-interactive listing.

Also, could you check whether you are using cgroup v2? Looking at the cgroup entries in `mount` would suffice, as cgroup (v1) is likely not supposed to be used in newer deployments of Slurm.
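Following the suggestion to check the mount, the layout can also be told apart from the filesystem type of the cgroup mount point. A minimal sketch, assuming the usual /sys/fs/cgroup location; cgroup_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which cgroup layout a node has mounted.
# The default mount point is an assumption that holds on most distributions.
cgroup_version () {
    local mnt=${1:-/sys/fs/cgroup} fstype
    fstype=$(stat -fc %T "$mnt" 2>/dev/null) || { echo "unknown"; return 1; }
    case $fstype in
        cgroup2fs) echo "v2" ;;   # unified hierarchy only
        tmpfs)
            # v1 controllers are mounted under a tmpfs; an extra "unified"
            # subdirectory mounted as cgroup2 means hybrid mode.
            if [ "$(stat -fc %T "$mnt/unified" 2>/dev/null)" = cgroup2fs ]; then
                echo "hybrid"
            else
                echo "v1"
            fi ;;
        *) echo "unknown ($fstype)" ;;
    esac
}
```

`findmnt -t cgroup2` or `mount | grep cgroup` show the same information by hand. On a v2 node, systemd-cgls can then be pointed at the part of the tree where the Slurm jobs live (for example /system.slice/slurmstepd.scope, which is an assumption about the local setup).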



Shunran Zhang via slurm-users

Mar 26, 2025, 12:21:24 PM
to Williams, Jenny Avis, Gestió Servidors, Slurm User Community List
Ugh, I think I had not caught up with the docs.

I started with a system that defaulted to cgroup v1, but the Slurm doc for that plugin was NOT available at that time, so I converted everything to cgroup v2.

It appears that both are supported and that the documentation issue is more on the dev side than the admin side.

Thanks for pointing that out. I misread the "coming soon" note on the cgroup v1 plugin and its "legacy" naming as "do not use". It should be fine.

On Thu, Mar 27, 2025 at 0:48, Williams, Jenny Avis <jenny_w...@unc.edu> wrote:

“ … As cgroup is likely not supposed to be used in newer deployments of Slurm.”

 

I am curious about this statement. Would someone expand on this, to either support or counter it?

 

Jenny Williams

UNC Chapel Hill

 

 

