[slurm-users] How to bind GPUs with CPU cores


William Zhang

Oct 14, 2022, 5:42:19 AM
to slurm...@lists.schedmd.com
Dear all,
     Our compute nodes have 128 CPU cores and 8 NVIDIA GPUs.
     I have configured 8 NUMA nodes, as shown below:
[root@g0025 ~]# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 64130 MB
node 0 free: 62086 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 64492 MB
node 1 free: 62306 MB
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 size: 64508 MB
node 2 free: 62487 MB
node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 3 size: 64496 MB
node 3 free: 62443 MB
node 4 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 4 size: 64508 MB
node 4 free: 62283 MB
node 5 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 5 size: 64508 MB
node 5 free: 61074 MB
node 6 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
node 6 size: 64508 MB
node 6 free: 62284 MB
node 7 cpus: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 7 size: 64507 MB
node 7 free: 62354 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10
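The distance matrix splits the eight NUMA nodes into two groups of four. The distance is 12 within nodes 0-3 and within nodes 4-7, and 32 across the two groups. That pattern is consistent with two CPU packages of 64 cores each, although numactl does not state this directly. The following is a minimal Python sketch that recovers the grouping from the matrix. The threshold of 20 is an arbitrary cut-off chosen for this particular matrix, not a numactl or Slurm setting.

```python
# Distance matrix copied from the `numactl --hardware` output above.
DISTANCES = [
    [10, 12, 12, 12, 32, 32, 32, 32],
    [12, 10, 12, 12, 32, 32, 32, 32],
    [12, 12, 10, 12, 32, 32, 32, 32],
    [12, 12, 12, 10, 32, 32, 32, 32],
    [32, 32, 32, 32, 10, 12, 12, 12],
    [32, 32, 32, 32, 12, 10, 12, 12],
    [32, 32, 32, 32, 12, 12, 10, 12],
    [32, 32, 32, 32, 12, 12, 12, 10],
]


def group_numa_nodes(distances, threshold=20):
    """Group NUMA nodes whose distance from the group's first node is below threshold.

    Assumes the "near" relation forms clean blocks, as it does in this matrix;
    the threshold is an illustrative cut-off between 12 and 32.
    """
    groups, assigned = [], set()
    for i, row in enumerate(distances):
        if i in assigned:
            continue
        group = [j for j, d in enumerate(row) if d < threshold]
        assigned.update(group)
        groups.append(group)
    return groups


if __name__ == "__main__":
    print(group_numa_nodes(DISTANCES))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```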

[root@g0025 ~]# nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity
GPU0     X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     48-63   3
GPU1    SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     32-47   2
GPU2    SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     16-31   1
GPU3    SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     0-15    0
GPU4    SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     112-127 7
GPU5    SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     96-111  6
GPU6    SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     80-95   5
GPU7    SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      64-79   4

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
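For CPU binding, the last two columns of this table are the ones that matter: they give the CPU cores and the NUMA node local to each GPU. The following is a minimal Python sketch that extracts them from the text. The parsing is ad hoc for this layout and does not rely on any official nvidia-smi format. The GPU index shown here uses nvidia-smi's numbering, which is not guaranteed to match the N in /dev/nvidiaN. `nvidia-smi -q` reports each card's Minor Number, which can be used for that comparison.

```python
# GPU rows copied from the `nvidia-smi topo -m` output above (header and legend omitted).
TOPO_ROWS = """\
GPU0     X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     48-63   3
GPU1    SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     32-47   2
GPU2    SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     16-31   1
GPU3    SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     0-15    0
GPU4    SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     112-127 7
GPU5    SYS     SYS     SYS     SYS     SYS      X      SYS     SYS     96-111  6
GPU6    SYS     SYS     SYS     SYS     SYS     SYS      X      SYS     80-95   5
GPU7    SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      64-79   4
"""


def expand_cpu_list(text):
    """Expand a CPU list such as '48-63' or '0-5,16-21' into a list of ints."""
    cpus = []
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def gpu_affinity(rows):
    """Map nvidia-smi GPU index -> (local CPU IDs, NUMA node) from the last two columns."""
    result = {}
    for line in rows.splitlines():
        fields = line.split()
        # Keep only GPU rows; the header's last field ("Affinity") is not a number.
        if len(fields) < 3 or not fields[0].startswith("GPU") or not fields[-1].isdigit():
            continue
        result[int(fields[0][3:])] = (expand_cpu_list(fields[-2]), int(fields[-1]))
    return result


if __name__ == "__main__":
    for gpu, (cpus, numa) in sorted(gpu_affinity(TOPO_ROWS).items()):
        print(f"GPU{gpu}: CPUs {cpus[0]}-{cpus[-1]}, NUMA node {numa}")
```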


Users may submit jobs that request one GPU and 6 to 16 CPU cores.
I have configured gres.conf as follows:

[root@g0038 ~]# cat /etc/slurm/gres.conf
Name=gpu File=/dev/nvidia0 COREs=0-15
Name=gpu File=/dev/nvidia1 COREs=16-31
Name=gpu File=/dev/nvidia2 COREs=32-47
Name=gpu File=/dev/nvidia3 COREs=48-63
Name=gpu File=/dev/nvidia4 COREs=64-79
Name=gpu File=/dev/nvidia5 COREs=80-95
Name=gpu File=/dev/nvidia6 COREs=96-111
Name=gpu File=/dev/nvidia7 COREs=112-127
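Read literally, these COREs ranges do not match the CPU Affinity column of the `nvidia-smi topo -m` table. For example, nvidia-smi lists GPU0 next to cores 48-63 on NUMA node 3, while this gres.conf binds /dev/nvidia0 to cores 0-15. Whether this is a genuine mismatch depends on whether nvidia-smi index N and /dev/nvidiaN refer to the same card. Note also that the two outputs were captured on different nodes (g0025 and g0038). The following is a minimal Python sketch that lines the two up. It assumes, explicitly, that index N corresponds to /dev/nvidiaN; this is an assumption to verify, not a given.

```python
# gres.conf lines copied from above.
GRES_CONF = """\
Name=gpu File=/dev/nvidia0 COREs=0-15
Name=gpu File=/dev/nvidia1 COREs=16-31
Name=gpu File=/dev/nvidia2 COREs=32-47
Name=gpu File=/dev/nvidia3 COREs=48-63
Name=gpu File=/dev/nvidia4 COREs=64-79
Name=gpu File=/dev/nvidia5 COREs=80-95
Name=gpu File=/dev/nvidia6 COREs=96-111
Name=gpu File=/dev/nvidia7 COREs=112-127
"""

# "CPU Affinity" column of the `nvidia-smi topo -m` table, keyed by nvidia-smi index.
# ASSUMPTION: nvidia-smi index N is the same card as /dev/nvidiaN (verify before trusting).
SMI_AFFINITY = {0: "48-63", 1: "32-47", 2: "16-31", 3: "0-15",
                4: "112-127", 5: "96-111", 6: "80-95", 7: "64-79"}

DEV_PREFIX = "/dev/nvidia"


def parse_gres_conf(text):
    """Return {N: cores string} for each 'Name=gpu File=/dev/nvidiaN ...' line."""
    mapping = {}
    for line in text.splitlines():
        # key=value tokens; keys are lower-cased here so COREs and Cores compare alike.
        opts = {k.lower(): v for k, _, v in (t.partition("=") for t in line.split())}
        dev = opts.get("file", "")
        if opts.get("name") == "gpu" and dev.startswith(DEV_PREFIX):
            mapping[int(dev[len(DEV_PREFIX):])] = opts.get("cores")
    return mapping


def mismatches(gres, smi):
    """List (N, gres.conf cores, nvidia-smi affinity) wherever the two differ."""
    return [(n, gres[n], smi.get(n)) for n in sorted(gres) if gres[n] != smi.get(n)]


if __name__ == "__main__":
    for n, conf, smi in mismatches(parse_gres_conf(GRES_CONF), SMI_AFFINITY):
        print(f"/dev/nvidia{n}: gres.conf Cores={conf}, nvidia-smi affinity={smi}")
```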


How can we achieve the following?
For example:
The first job requests 6 CPUs and 1 GPU. It runs on GPU 0 and uses CPUs 0-5.
The second job requests 8 CPUs and 1 GPU. If it runs on GPU 1, we want it to use CPUs 16-23.
The third job requests 6 CPUs and 1 GPU. If it runs on GPU 2, we want it to use CPUs 32-37.
The next job requests 12 CPUs and 2 GPUs. If it runs on GPUs 3-4, we want it to use CPUs 48-53,64-69.
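The examples follow a consistent rule. The requested CPUs are split evenly over the allocated GPUs, and each GPU's share is taken from the low end of the cores bound to that GPU in gres.conf. The following is a minimal Python sketch of that target placement. It is not something Slurm computes, and `desired_cpus` and `to_ranges` are hypothetical helpers. The sketch reproduces the four examples, but it ignores cores already taken by other jobs on the node.

```python
# Cores bound to each GPU index in the gres.conf above: GPU i -> 16*i .. 16*i+15.
GPU_CORES = {gpu: range(16 * gpu, 16 * gpu + 16) for gpu in range(8)}


def desired_cpus(gpus, ncpus):
    """Split ncpus evenly over gpus, taking the lowest bound cores of each GPU."""
    if ncpus % len(gpus):
        raise ValueError("this sketch assumes the CPUs divide evenly across GPUs")
    per_gpu = ncpus // len(gpus)
    cpus = []
    for gpu in gpus:
        cores = GPU_CORES[gpu]
        if per_gpu > len(cores):
            raise ValueError(f"GPU {gpu} has only {len(cores)} bound cores")
        cpus.extend(cores[:per_gpu])
    return cpus


def to_ranges(cpus):
    """Compress a non-empty sorted CPU list into range notation, e.g. '48-53,64-69'."""
    parts = []
    start = prev = cpus[0]
    for c in cpus[1:]:
        if c == prev + 1:
            prev = c
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = c
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


if __name__ == "__main__":
    for gpus, n in [([0], 6), ([1], 8), ([2], 6), ([3, 4], 12)]:
        print(gpus, to_ranges(desired_cpus(gpus, n)))
```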


Can this be implemented?

Ward Poelmans

Oct 14, 2022, 5:59:17 AM
to slurm...@lists.schedmd.com
Hi William,

On 14/10/2022 11:41, William Zhang wrote:

> How can we achieve the following?
> For example:
> The first job requests 6 CPUs and 1 GPU. It runs on GPU 0 and uses CPUs 0-5.
> The second job requests 8 CPUs and 1 GPU. If it runs on GPU 1, we want it to use CPUs 16-23.
> The third job requests 6 CPUs and 1 GPU. If it runs on GPU 2, we want it to use CPUs 32-37.
> The next job requests 12 CPUs and 2 GPUs. If it runs on GPUs 3-4, we want it to use CPUs 48-53,64-69.
>
>
> Can this be implemented?

Have a look at the --gres-flags=enforce-binding option of sbatch.

Ward

William Zhang

Oct 14, 2022, 7:05:31 AM
to slurm...@lists.schedmd.com
Hi Ward,
I tried --gres-flags=enforce-binding, but it doesn't work.
The first job requests 1 GPU and 6 CPUs. It gets CPUs 0-5 and GPU 0.
The second job requests 1 GPU and 6 CPUs. It gets CPUs 6-11 (and GPU 1), but I want it to get CPUs 16-21.
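The run.sh used in these tests is not shown. As a hypothetical stand-in, a job script could call a small Python probe like the one below. It prints the cores the kernel allows the process to run on (`os.sched_getaffinity`, Linux only) and the GPU list that Slurm exported in `CUDA_VISIBLE_DEVICES`. This provides a runtime cross-check of the CPU_IDs that scontrol reports. The `probe` function name is illustrative only.

```python
import os


def probe():
    """Collect the job ID, the CPUs this process may run on, and the visible GPUs."""
    return {
        "job": os.environ.get("SLURM_JOB_ID"),           # None outside a Slurm job
        "cpus": sorted(os.sched_getaffinity(0)),          # Linux-only affinity mask
        "gpus": os.environ.get("CUDA_VISIBLE_DEVICES"),   # None if not set
    }


if __name__ == "__main__":
    info = probe()
    print(f"job={info['job']} cpus={info['cpus']} CUDA_VISIBLE_DEVICES={info['gpus']}")
```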

[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
Submitted batch job 198106
[zhangyc@ln01 numa]$
[zhangyc@ln01 numa]$ scontrol show job 198106 -d
JobId=198106 JobName=run.sh
   UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A
   Priority=4294704732 Nice=0 Account=zhangyc QOS=normal WCKey=*
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2022-10-14T18:38:52 EligibleTime=2022-10-14T18:38:52
   AccrueTime=2022-10-14T18:38:52
   StartTime=2022-10-14T18:38:56 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:38:56
   Partition=gpu_c128 AllocNode:Sid=ln01:26986
   ReqNodeList=g0036 ExcNodeList=(null)
   NodeList=g0036
   BatchHost=g0036
   NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=gpu:1
     Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)
   MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=./run.sh
   WorkDir=/data/run01/zhangyc/numa
   StdErr=/data/run01/zhangyc/numa/slurm-198106.out
   StdIn=/dev/null
   StdOut=/data/run01/zhangyc/numa/slurm-198106.out
   Power=
   GresEnforceBind=Yes
   CpusPerTres=gpu:6
   TresPerJob=gpu:1
   NtasksPerTRES:0


[zhangyc@ln01 numa]$ sbatch -w g0036 -p gpu_c128 --gpus=1 -n 6 --gres-flags=enforce-binding ./run.sh
Submitted batch job 198107
[zhangyc@ln01 numa]$ scontrol show job 198107 -d
JobId=198107 JobName=run.sh
   UserId=zhangyc(1004) GroupId=zhangyc(1004) MCS_label=N/A
   Priority=4294704731 Nice=0 Account=zhangyc QOS=normal WCKey=*
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:04 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2022-10-14T18:39:05 EligibleTime=2022-10-14T18:39:05
   AccrueTime=2022-10-14T18:39:05
   StartTime=2022-10-14T18:39:05 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-10-14T18:39:05
   Partition=gpu_c128 AllocNode:Sid=ln01:26986
   ReqNodeList=g0036 ExcNodeList=(null)
   NodeList=g0036
   BatchHost=g0036
   NumNodes=1 NumCPUs=6 NumTasks=6 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,mem=60000M,node=1,billing=6,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=gpu:1
     Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)
   MinCPUsNode=1 MinMemoryNode=60000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=./run.sh
   WorkDir=/data/run01/zhangyc/numa
   StdErr=/data/run01/zhangyc/numa/slurm-198107.out
   StdIn=/dev/null
   StdOut=/data/run01/zhangyc/numa/slurm-198107.out
   Power=
   GresEnforceBind=Yes
   CpusPerTres=gpu:6
   TresPerJob=gpu:1
   NtasksPerTRES:0
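The two `-d` detail lines make the problem concrete. Job 198106 received CPUs 0-5 with GPU index 0, which lies inside the 0-15 range that gres.conf binds to /dev/nvidia0. Job 198107 received CPUs 6-11 with GPU index 1, which lies outside the 16-31 range bound to /dev/nvidia1. The following is a small hypothetical checker for that condition. It assumes that IDX:N refers to /dev/nvidiaN and uses the 16-cores-per-GPU layout of the gres.conf above.

```python
import re

# ASSUMPTION: GRES index N is /dev/nvidiaN, bound to cores 16*N .. 16*N+15 per gres.conf.
GPU_CORES = {gpu: set(range(16 * gpu, 16 * gpu + 16)) for gpu in range(8)}


def expand(spec):
    """Expand '0-5' or '0-2,8' into a set of ints."""
    out = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        out.update(range(int(lo), int(hi or lo) + 1))
    return out


def check_binding(detail_line):
    """Return (ok, stray CPUs) for a per-node line from `scontrol show job -d`."""
    cpus = expand(re.search(r"CPU_IDs=(\S+)", detail_line).group(1))
    gpus = expand(re.search(r"IDX:([\d,-]+)", detail_line).group(1))
    bound = set().union(*(GPU_CORES[g] for g in gpus))
    stray = sorted(cpus - bound)
    return not stray, stray


if __name__ == "__main__":
    for line in ("Nodes=g0036 CPU_IDs=0-5 Mem=60000 GRES=gpu:1(IDX:0)",
                 "Nodes=g0036 CPU_IDs=6-11 Mem=60000 GRES=gpu:1(IDX:1)"):
        print(line, "->", check_binding(line))
```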




From: slurm-users on behalf of Ward Poelmans
Sent: Friday, October 14, 2022 17:58
To: slurm...@lists.schedmd.com
Subject: Re: [slurm-users] How to bind GPUs with CPU cores
