[slurm-users] Nodes TRES double what is requested

jack.mellor--- via slurm-users

Jul 10, 2024, 4:27:52 AM
to slurm...@lists.schedmd.com
Hi,

We are running Slurm 23.02.6. Our nodes have hyperthreading disabled and slurm.conf has CPUs=32 for each node (each node has two 16-core processors). When we allocate a job, e.g. salloc -n 32, it allocates a whole node, but sinfo shows double the allocation in TRES (cpu=64). sinfo also shows the node as having 4294967264 idle CPUs.

I'm not sure if it's a known bug or an issue with our config. I have tried various things, like setting Sockets/Boards in slurm.conf.
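
For illustration, the node definition is essentially of this shape (hostnames, node count and memory below are made up, not our real values):

  # illustrative node line only - real names and RealMemory differ
  NodeName=node[01-16] CPUs=32 RealMemory=256000 State=UNKNOWN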

Thanks
Jack

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

Brian Andrus via slurm-users

Jul 10, 2024, 11:31:34 AM
to slurm...@lists.schedmd.com
Jack,

To make sure things are set right, run 'slurmd -C' on the node and use
that output in your config.

It can also give you insight into what is actually being detected on the node versus
what you may expect.
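
For a dual-socket, 16-cores-per-socket node with hyperthreading off, that output
should look something like the following (hostname, memory and uptime here are
only examples, not taken from your cluster):

  # example 'slurmd -C' output (values illustrative)
  NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257500
  UpTime=12-03:45:10

If it instead reports CPUs=64 or ThreadsPerCore=2, the node itself is still
presenting SMT, which would explain the doubled TRES.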

Brian Andrus

Diego Zuccato via slurm-users

Jul 11, 2024, 2:09:02 AM
to slurm...@lists.schedmd.com
Hint: round the RAM reported by 'slurmd -C' down a bit, or you risk the nodes
not coming back up after an upgrade that leaves a little less free RAM than
configured.
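
For example (figures invented): if 'slurmd -C' reports RealMemory=257500,
configure something a bit lower, e.g.

  # keep a safety margin below what slurmd -C reports (values illustrative)
  NodeName=node[01-16] CPUs=32 RealMemory=250000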

Diego


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Kevin Buckley via slurm-users

Jul 11, 2024, 4:12:17 AM
to slurm...@lists.schedmd.com

What does an

scontrol show node

tell you about the node(s)?

On our systems, where, sadly, our vendor is unable/unwilling
to turn off SMT/hyperthreading, we see the following (not all fields shown)
for a fully allocated AMD EPYC 7763 node, so 128 physical cores:

CoresPerSocket=64
CPUAlloc=256 CPUEfctv=256 CPUTot=256
Sockets=2 Boards=1
ThreadsPerCore=2
CfgTRES=cpu=256
AllocTRES=cpu=256

so I guess the question would be,
depending on exactly what you see:

have you explicitly set, or tried setting,

ThreadsPerCore=1

in the config?
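
That is, a node definition roughly along these lines (name and memory invented):

  # spell out the topology rather than only CPUs= (values illustrative)
  NodeName=node[01-16] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=250000

followed by restarting slurmctld and the slurmds so the new definition is picked up.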

--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
SMS: +61 4 7497 6266
Eml: kevin....@pawsey.org.au

Emyr James via slurm-users

Jul 12, 2024, 6:04:29 AM
to slurm...@lists.schedmd.com, Diego Zuccato
Not sure if this is correct, but I think you need to leave a bit of RAM for the OS to use, so it's best not to let Slurm allocate all of it. I usually take 8 GB off to allow for that - negligible when our nodes have at least 768 GB of RAM. At least this is my experience when using cgroups.
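
For example (sizes invented), on a 768 GB node:

  # 786432 MB physical; advertise ~8 GB less so the OS keeps some headroom (values illustrative)
  NodeName=node[01-16] CPUs=32 RealMemory=778240
  # alternatively, keep RealMemory as reported and reserve the headroom explicitly:
  # MemSpecLimit=8192

Either way, check the slurm.conf man page for your version for how MemSpecLimit interacts with cgroup memory limits.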

Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation


From: Diego Zuccato via slurm-users <slurm...@lists.schedmd.com>
Sent: 11 July 2024 08:06
To: slurm...@lists.schedmd.com <slurm...@lists.schedmd.com>
Subject: [slurm-users] Re: Nodes TRES double what is requested
 