[slurm-users] Problems with gres.conf


Gestió Servidors via slurm-users

May 10, 2024, 2:18:36 AM
to slurm...@lists.schedmd.com

Hello,

 

I am trying to rewrite my gres.conf file.

 

Before changes, this file was just like this:

NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0 Cores=0-11

NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1 Cores=12-23

NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0 Cores=0-11

NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1 Cores=12-23

NodeName=node-gpu-3 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-11

NodeName=node-gpu-4 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-7

# you can see that nodes node-gpu-1 and node-gpu-2 have two GPUs each, whereas nodes node-gpu-3 and node-gpu-4 have only one GPU each


And my slurm.conf was this:

[...]

AccountingStorageTRES=gres/gpu

GresTypes=gpu

NodeName=node-gpu-1 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceRTX2070:1,gpu:GeForceGTX1080Ti:1

NodeName=node-gpu-2 CPUs=24 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceGTX1080Ti:1,gpu:GeForceGTX1080:1

NodeName=node-gpu-3 CPUs=12 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=23000 Gres=gpu:GeForceRTX3080:1

NodeName=node-gpu-4 CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7800 Gres=gpu:GeForceRTX3080:1

NodeName=node-worker-[0-22] CPUs=12 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=47000

[...]
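
One way to cross-check these NodeName lines is to run `slurmd -C` on each compute node; it prints the hardware configuration exactly as slurmd detects it:

# Prints a ready-to-paste NodeName line with the detected CPUs, Boards,
# SocketsPerBoard, CoresPerSocket, ThreadsPerCore and RealMemory values:
slurmd -C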

 

With this configuration, everything seems to work fine, except that slurmctld.log reports:

[...]

error: _node_config_validate: gres/gpu: invalid GRES core specification (0-11) on node node-gpu-3

error: _node_config_validate: gres/gpu: invalid GRES core specification (12-23) on node node-gpu-1

error: _node_config_validate: gres/gpu: invalid GRES core specification (12-23) on node node-gpu-2

error: _node_config_validate: gres/gpu: invalid GRES core specification (0-7) on node node-gpu-4

[...]

 

However, even with these errors, users can still submit jobs and request GPU resources.
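
For reference, this "invalid GRES core specification" error usually means a Cores= range indexes logical CPUs (threads) rather than physical cores: Slurm validates the IDs against Sockets x CoresPerSocket, so a 24-CPU node with two threads per core only has core IDs 0-11, and node-gpu-3 and node-gpu-4 only have 0-5 and 0-3. A minimal sketch of the same gres.conf under that reading (not verified on this cluster):

NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0 Cores=0-5
NodeName=node-gpu-1 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1 Cores=6-11
NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0 Cores=0-5
NodeName=node-gpu-2 AutoDetect=off Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1 Cores=6-11
NodeName=node-gpu-3 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-5
NodeName=node-gpu-4 AutoDetect=off Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0 Cores=0-3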


Now, I have tried to reconfigure gres.conf and slurm.conf in this way:

gres.conf:

Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0

Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1

Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0

Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1

Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0

Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0

# there is no NodeName attribute

 

slurm.conf:

[...]

NodeName=node-gpu-1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceRTX2070:1,gpu:GeForceGTX1080Ti:1

NodeName=node-gpu-2 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=96000 TmpDisk=47000 Gres=gpu:GeForceGTX1080Ti:1,gpu:GeForceGTX1080:1

NodeName=node-gpu-3 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=23000 Gres=gpu:GeForceRTX3080:1

NodeName=node-gpu-4 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7800 Gres=gpu:GeForceRTX3080:1

NodeName=node-worker-[0-22] SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=47000

# there is no CPUs attribute

[...]


With this new configuration, the nodes with a GPU start the slurmd.service daemon correctly, but the nodes without a GPU (node-worker-[0-22]) can’t start slurmd.service and return this error:

[...]

error: Waiting for gres.conf file /dev/nvidia0

fatal: can't stat gres.conf file /dev/nvidia0: No such file or directory

[...]

 

It seems Slurm expects the “node-worker” nodes to also have an NVIDIA GPU, but these nodes have none... So, where is my configuration error?
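
Per gres.conf(5), a line without NodeName applies to every node in the cluster, which would explain this: the workers also try to open /dev/nvidia0. A sketch that keeps the per-node scoping while still dropping the Cores= ranges (untested):

NodeName=node-gpu-1 Name=gpu Type=GeForceRTX2070 File=/dev/nvidia0
NodeName=node-gpu-1 Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia1
NodeName=node-gpu-2 Name=gpu Type=GeForceGTX1080Ti File=/dev/nvidia0
NodeName=node-gpu-2 Name=gpu Type=GeForceGTX1080 File=/dev/nvidia1
NodeName=node-gpu-3 Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0
NodeName=node-gpu-4 Name=gpu Type=GeForceRTX3080 File=/dev/nvidia0
# node-worker-[0-22] now match no line, so they no longer look for /dev/nvidia0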

 

I have read about the syntax and examples at https://slurm.schedmd.com/gres.conf.html, but it seems I’m doing something wrong.

 

Thanks!!


Patryk Bełzak via slurm-users

Jun 4, 2024, 3:28:55 AM
to Gestió Servidors, slurm...@lists.schedmd.com
Hi,
I believe that setting the cores explicitly in gres.conf gives you better control over the hardware configuration; I wouldn't trust Slurm to work that one out on its own.

We keep "Cores" in our gres.conf. All you have to do is proper NUMA discovery (as long as your hardware has NUMA) and then assign the correct cores to the correct GPUs. One simple way to discover CPU affinity with GPUs is the command `nvidia-smi topo -m`, which displays the hardware topology. You need a relatively new NVIDIA driver, though.
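
For example (illustrative commands; the exact `nvidia-smi topo -m` layout varies between driver versions):

# Show which CPUs and NUMA node each GPU is attached to:
nvidia-smi topo -m
# Cross-check the core/NUMA layout with hwloc and lscpu:
lstopo --no-io
lscpu --extended=CPU,CORE,SOCKET,NODE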

Also keep in mind that Intel made a mess of NUMA and core bindings on newer hardware. We have a system with 2 NUMA nodes where one node has the even core numbers and the other the odd ones. Because of that, the cores cannot be merged into a range like [1-128] and have to be comma-separated, [1,3,5,7,(..),128]. That kind of layout doesn't fit the `nvidia-smi` output. I think Slurm's affinity discovery may rely on something similar to `nvidia-smi`, because when I assigned the correct cores from one NUMA node (all 64), discovered with hwloc, I got the same error in slurmctld. I haven't investigated the impact of this error; as you mentioned, the resources can still be used.
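
In gres.conf that ends up looking something like this (hypothetical node name and core lists, since Cores= also accepts comma-separated IDs):

NodeName=numa-box Name=gpu File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14
NodeName=numa-box Name=gpu File=/dev/nvidia1 Cores=1,3,5,7,9,11,13,15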

Best regards,
Patryk.

On 24/05/20 04:17, Gestió Servidors via slurm-users wrote:

Gestió Servidors via slurm-users

Jun 5, 2024, 5:48:54 AM
to slurm...@lists.schedmd.com

Hi,

 

my GPU testing system (named “gpu-node”) is a simple computer with one socket and an "Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz" processor. Running "lscpu", I can see there are 4 cores per socket, 2 threads per core, and 8 CPUs:

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                8

On-line CPU(s) list:   0-7

Thread(s) per core:    2

Core(s) per socket:    4

Socket(s):             1

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 26

Model name:            Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz


My “gres.conf” file is:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-1

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=2-3

 

Running “numactl -H” on the “gpu-node” host reports:

available: 1 nodes (0)

node 0 cpus: 0 1 2 3 4 5 6 7

node 0 size: 7809 MB

node 0 free: 6597 MB

node distances:

node   0

  0:  10
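
To see how those 8 logical CPUs map onto the 4 physical cores, lscpu can print the topology explicitly:

# The CORE column shows the physical core behind each logical CPU; with
# 4 cores and 2 threads per core, only core IDs 0-3 exist, and the other
# four logical CPUs are the sibling threads of those same cores.
lscpu --extended=CPU,CORE,SOCKET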

 

CPUs 0-1 are assigned to the first GPU and 2-3 to the second. However, “lscpu” shows 8 CPUs… If I rewrite “gres.conf” in this way:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-3

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=4-7

 

when I run “scontrol reconfigure”, the slurmctld log reports this error message:

[2024-06-05T11:42:18.558] error: _node_config_validate: gres/gpu: invalid GRES core specification (4-7) on node gpu-node

 

So I think Slurm can only see physical cores and not threads, so my node can only serve 4 cores (per “lscpu”), but in gres.conf I need to write “CPUs”, not “Cores”… isn’t it?

 

But if “numactl -H” shows 8 CPUs, why can’t I use these 8 CPUs in “gres.conf”?
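
For context: in current gres.conf(5) the parameter is spelled Cores= (CPUs= is the older name for the same thing), and the IDs are validated against physical cores rather than logical CPUs. A sketch that stays within core IDs 0-3 and should therefore pass validation; each core carries two hardware threads, so all 8 logical CPUs remain usable:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 Cores=0-1
NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 Cores=2-3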

 

Sorry about this long email.

 

Thanks.
