[slurm-users] How to bind GPUs with CPU cores

William Zhang

Oct 14, 2022, 5:42:19 AM

to slurm...@lists.schedmd.com

Dear all,

Each of our compute nodes has 128 CPU cores and 8 NVIDIA GPUs.

I have configured 8 NUMA nodes, as shown below.

[root@g0025 ~]# numactl --hardware

available: 8 nodes (0-7)

node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

node 0 size: 64130 MB

node 0 free: 62086 MB

node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

node 1 size: 64492 MB

node 1 free: 62306 MB

node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

node 2 size: 64508 MB

node 2 free: 62487 MB

node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

node 3 size: 64496 MB

node 3 free: 62443 MB

node 4 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

node 4 size: 64508 MB

node 4 free: 62283 MB

node 5 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95

node 5 size: 64508 MB

node 5 free: 61074 MB

node 6 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111

node 6 size: 64508 MB

node 6 free: 62284 MB

node 7 cpus: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

node 7 size: 64507 MB

node 7 free: 62354 MB

node distances:

node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

[root@g0025 ~]# nvidia-smi topo -m

      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity  NUMA Affinity
GPU0   X    SYS   SYS   SYS   SYS   SYS   SYS   SYS   48-63         3
GPU1  SYS    X    SYS   SYS   SYS   SYS   SYS   SYS   32-47         2
GPU2  SYS   SYS    X    SYS   SYS   SYS   SYS   SYS   16-31         1
GPU3  SYS   SYS   SYS    X    SYS   SYS   SYS   SYS   0-15          0
GPU4  SYS   SYS   SYS   SYS    X    SYS   SYS   SYS   112-127       7
GPU5  SYS   SYS   SYS   SYS   SYS    X    SYS   SYS   96-111        6
GPU6  SYS   SYS   SYS   SYS   SYS   SYS    X    SYS   80-95         5
GPU7  SYS   SYS   SYS   SYS   SYS   SYS   SYS    X    64-79         4

Legend:

X = Self

SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)

NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node

PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)

PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)

PIX = Connection traversing at most a single PCIe bridge

NV# = Connection traversing a bonded set of # NVLinks

Users can submit a job that uses one GPU with 6 to 16 CPU cores.

I configured gres.conf as follows:

[root@g0038 ~]# cat /etc/slurm/gres.conf

Name=gpu File=/dev/nvidia0 COREs=0-15

Name=gpu File=/dev/nvidia1 COREs=16-31

Name=gpu File=/dev/nvidia2 COREs=32-47

Name=gpu File=/dev/nvidia3 COREs=48-63

Name=gpu File=/dev/nvidia4 COREs=64-79

Name=gpu File=/dev/nvidia5 COREs=80-95

Name=gpu File=/dev/nvidia6 COREs=96-111

Name=gpu File=/dev/nvidia7 COREs=112-127
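As a sanity check, the core ranges in gres.conf can be compared with the CPU affinity that `nvidia-smi topo -m` reports. The Python sketch below does this for the values posted above. It rests on two assumptions that the thread does not confirm: that GPU N in nvidia-smi corresponds to /dev/nvidiaN (the two can be enumerated differently), and that g0025 and g0038 have the same hardware layout. The helper names are illustrative only and are not part of Slurm.

```python
import re

# CPU affinity per GPU index, copied from the `nvidia-smi topo -m` output above.
TOPO_AFFINITY = {
    0: "48-63", 1: "32-47", 2: "16-31", 3: "0-15",
    4: "112-127", 5: "96-111", 6: "80-95", 7: "64-79",
}

# The gres.conf lines exactly as posted.
GRES_CONF = """\
Name=gpu File=/dev/nvidia0 COREs=0-15
Name=gpu File=/dev/nvidia1 COREs=16-31
Name=gpu File=/dev/nvidia2 COREs=32-47
Name=gpu File=/dev/nvidia3 COREs=48-63
Name=gpu File=/dev/nvidia4 COREs=64-79
Name=gpu File=/dev/nvidia5 COREs=80-95
Name=gpu File=/dev/nvidia6 COREs=96-111
Name=gpu File=/dev/nvidia7 COREs=112-127
"""

GRES_LINE = re.compile(r"File=/dev/nvidia(\d+)\s+Cores=(\S+)", re.IGNORECASE)


def parse_gres_conf(text):
    """Return {device minor number: core range string} for each GPU line."""
    mapping = {}
    for line in text.splitlines():
        match = GRES_LINE.search(line)
        if match:
            mapping[int(match.group(1))] = match.group(2)
    return mapping


def find_mismatches(gres, topo):
    """List (gpu, configured, reported) tuples where the two ranges differ."""
    return [
        (gpu, gres[gpu], topo.get(gpu))
        for gpu in sorted(gres)
        if gres[gpu] != topo.get(gpu)
    ]


def suggested_lines(topo):
    """Build candidate gres.conf lines from the reported affinity (assumption:
    nvidia-smi GPU index N is the same device as /dev/nvidiaN)."""
    return [
        f"Name=gpu File=/dev/nvidia{gpu} Cores={cores}"
        for gpu, cores in sorted(topo.items())
    ]


if __name__ == "__main__":
    gres = parse_gres_conf(GRES_CONF)
    for gpu, configured, reported in find_mismatches(gres, TOPO_AFFINITY):
        print(f"GPU {gpu}: gres.conf has {configured}, nvidia-smi reports {reported}")
    print("\n".join(suggested_lines(TOPO_AFFINITY)))
```

For the values posted, all eight GPUs are flagged. If the index assumption holds on these nodes, it may be worth confirming which core range is actually local to each device before relying on the binding.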

How can we achieve the following behavior?

For example:

A job requests 6 CPUs and 1 GPU, and it runs on GPU 0 with CPU IDs 0-5.

The second job requests 8 CPUs and 1 GPU. If it runs on GPU 1, we want it to get CPU IDs 16-23.

The third job requests 6 CPUs and 1 GPU. If it runs on GPU 2, we want it to get CPU IDs 32-37.

The next job requests 12 CPUs and 2 GPUs. If it runs on GPUs 3-4, we want it to get CPU IDs 48-53,64-69.

Is this possible?
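The placement described in these examples comes down to simple arithmetic: each GPU owns a block of 16 cores, and a job takes the first few cores from the block of each GPU it was given. Here is a minimal Python sketch of that arithmetic, assuming the gres.conf layout above (GPU i owns cores 16*i through 16*i+15) and an even split of CPUs across the job's GPUs. `expected_cpu_ids` is a hypothetical helper, not a Slurm function, and it ignores cores already taken by other jobs.

```python
CORES_PER_GPU = 16  # from the gres.conf above: one 16-core block per GPU


def expected_cpu_ids(gpu_ids, num_cpus, cores_per_gpu=CORES_PER_GPU):
    """Return the hoped-for CPU ID list (as a string) for a job.

    gpu_ids  -- GPU indices allocated to the job, e.g. [3, 4]
    num_cpus -- total CPUs requested, split evenly across the GPUs
    """
    if num_cpus % len(gpu_ids) != 0:
        raise ValueError("CPUs do not divide evenly across the GPUs")
    per_gpu = num_cpus // len(gpu_ids)
    if per_gpu > cores_per_gpu:
        raise ValueError("more CPUs per GPU than one core block holds")

    ranges = []
    for gpu in gpu_ids:
        first = gpu * cores_per_gpu          # first core of this GPU's block
        ranges.append(f"{first}-{first + per_gpu - 1}")
    return ",".join(ranges)


if __name__ == "__main__":
    # The four example jobs from the post.
    for gpus, cpus in [([0], 6), ([1], 8), ([2], 6), ([3, 4], 12)]:
        print(gpus, cpus, expected_cpu_ids(gpus, cpus))
```

Run on the four example jobs, this gives 0-5, 16-23, 32-37 and 48-53,64-69, which are the CPU IDs asked for above.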

Ward Poelmans

Oct 14, 2022, 5:59:17 AM

to slurm...@lists.schedmd.com

Hi William,

On 14/10/2022 11:41, William Zhang wrote:

> How can we achieve the following behavior?
> For example:
> A job requests 6 CPUs and 1 GPU, and it runs on GPU 0 with CPU IDs 0-5.
> The second job requests 8 CPUs and 1 GPU. If it runs on GPU 1, we want it to get CPU IDs 16-23.
> The third job requests 6 CPUs and 1 GPU. If it runs on GPU 2, we want it to get CPU IDs 32-37.
> The next job requests 12 CPUs and 2 GPUs. If it runs on GPUs 3-4, we want it to get CPU IDs 48-53,64-69.
>
> Is this possible?

Have a look at the --gres-flags=enforce-binding option of sbatch.

Ward

William Zhang

Oct 14, 2022, 7:05:31 AM

to slurm...@lists.schedmd.com

Hi Ward,

I tried --gres-flags=enforce-binding, but it doesn't work.

The first job requested 1 GPU and 6 CPUs. It got CPU IDs 0-5 and GPU ID 0.

The second job also requested 1 GPU and 6 CPUs. It got CPU IDs 6-11, but I had hoped for CPU IDs 16-21.

[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh

Submitted batch job 198106

[zhangyc@ln01 numa]$

[zhangyc@ln01 numa]$ scontrol show job 198106 -d

JobId=198106 JobName=run.sh

UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A

Priority=4294704732 Nice=0 Account=zhangyc QOS=normal WCKey=*

JobState=RUNNING Reason=None Dependency=(null)

Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0

DerivedExitCode=0:0

RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A

SubmitTime=2022-10-14T18:38:52 EligibleTime=2022-10-14T18:38:52

AccrueTime=2022-10-14T18:38:52

StartTime=2022-10-14T18:38:56 EndTime=Unknown Deadline=N/A

SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:38:56

Partition=gpu_c128 AllocNode:Sid=ln01:26986

ReqNodeList=g0036 ExcNodeList=(null)

NodeList=g0036

BatchHost=g0036

NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1

Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*

JOB_GRES=gpu:1

Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)

MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0

Features=(null) DelayBoot=00:00:00

OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

Command=./run.sh

WorkDir=/data/run01/zhangyc/numa

StdErr=/data/run01/zhangyc/numa/slurm-198106.out

StdIn=/dev/null

StdOut=/data/run01/zhangyc/numa/slurm-198106.out

Power=

GresEnforceBind=Yes

CpusPerTres=gpu:6

TresPerJob=gpu:1

NtasksPerTRES:0

[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh

Submitted batch job 198107

[zhangyc@ln01 numa]$ scontrol show job 198107 -d

JobId=198107 JobName=run.sh

UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A

Priority=4294704731 Nice=0 Account=zhangyc QOS=normal WCKey=*

JobState=RUNNING Reason=None Dependency=(null)

Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0

DerivedExitCode=0:0

RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A

SubmitTime=2022-10-14T18:39:05 EligibleTime=2022-10-14T18:39:05

AccrueTime=2022-10-14T18:39:05

StartTime=2022-10-14T18:39:05 EndTime=Unknown Deadline=N/A

SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:39:05

Partition=gpu_c128 AllocNode:Sid=ln01:26986

ReqNodeList=g0036 ExcNodeList=(null)

NodeList=g0036

BatchHost=g0036

NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*

TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1

Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*

JOB_GRES=gpu:1

Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)

MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0

Features=(null) DelayBoot=00:00:00

OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)

Command=./run.sh

WorkDir=/data/run01/zhangyc/numa

StdErr=/data/run01/zhangyc/numa/slurm-198107.out

StdIn=/dev/null

StdOut=/data/run01/zhangyc/numa/slurm-198107.out

Power=

GresEnforceBind=Yes

CpusPerTres=gpu:6

TresPerJob=gpu:1

NtasksPerTRES:0
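The mismatch can be checked mechanically from the `scontrol show job -d` output. The Python sketch below pulls CPU_IDs and the GPU index out of the allocation line and reports any CPUs that fall outside the core range configured for that GPU. It assumes the gres.conf layout posted earlier (GPU i owns cores 16*i through 16*i+15) and the line format shown above; the helper names are illustrative, not part of Slurm.

```python
import re

# Core block per GPU index, taken from the gres.conf posted earlier (assumption:
# the same file is in effect on g0036).
GPU_CORES = {gpu: set(range(16 * gpu, 16 * gpu + 16)) for gpu in range(8)}

# Matches e.g. "Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)"
ALLOC_RE = re.compile(r"CPU_IDs=(\S+).*?GRES=\S*?\(IDX:([\d,-]+)\)")


def expand_ids(spec):
    """Expand an ID list such as '0-5,8' into [0, 1, 2, 3, 4, 5, 8]."""
    ids = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(part))
    return ids


def stray_cpus(alloc_line, gpu_cores=GPU_CORES):
    """Return the allocated CPUs that are not local to the allocated GPUs."""
    match = ALLOC_RE.search(alloc_line)
    if match is None:
        raise ValueError("no CPU_IDs/GRES allocation found in line")
    cpus = expand_ids(match.group(1))
    gpus = expand_ids(match.group(2))
    allowed = set()
    for gpu in gpus:
        allowed |= gpu_cores[gpu]
    return [cpu for cpu in cpus if cpu not in allowed]


if __name__ == "__main__":
    job_198106 = "Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)"
    job_198107 = "Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)"
    print("198106 stray CPUs:", stray_cpus(job_198106))
    print("198107 stray CPUs:", stray_cpus(job_198107))
```

For job 198106 the list is empty. For job 198107 all six CPUs (6-11) are reported, because GPU 1 is configured with cores 16-31.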

From: slurm-users, on behalf of Ward Poelmans
Sent: Friday, October 14, 2022, 17:58
To: slurm...@lists.schedmd.com
Subject: Re: [slurm-users] How to bind GPUs with CPU cores
