[slurm-dev] Confine tasks to one socket with -B option


Mamerto Bacallado

Mar 14, 2016, 5:37:23 AM
to slurm-dev

Hi all,

I am working on a machine with dual-socket nodes, 12 cores per socket, with hyperthreading enabled.
The Slurm version is 14.11.9 and the relevant configuration is:

TaskPlugin=task/affinity
TaskPluginParam=Cpusets

cpuinfo shows the CPUs labelled as:

                       SOCKET 1                                   SOCKET 2
----------------------------------------------------------------------------------------
Core id  | 00 01 02 03 04 05 06 07 08 09 10 11  |  00 01 02 03 04 05 06 07 08 09 10 11 |
-------- -------------------------------------- ---------------------------------------| 
Thread 0 | 00 01 02 03 04 05 06 07 08 09 10 11  |  12 13 14 15 16 17 18 19 20 21 22 23 |
Thread 1 | 24 25 26 27 28 29 30 31 32 33 34 35  |  36 37 38 39 40 41 42 43 44 45 46 47 |
----------------------------------------------------------------------------------------

To launch 6 tasks on one socket only, I run:

$ srun -N1 -n6  -B 1:3:2 --exclusive -p operation -o log hostname

assuming the -B option will set 1 socket per node, 3 cores per socket and 2 threads per core.
Nevertheless, the log file shows that the 6 tasks ran on both sockets, assigned cyclically:

cpu_bind=MASK - node001, task  0  0 [100602]: mask 0x1 set           --> cpuid=00
cpu_bind=MASK - node001, task  1  1 [100603]: mask 0x1000 set        --> cpuid=12 
cpu_bind=MASK - node001, task  2  2 [100604]: mask 0x1000000 set     --> cpuid=24
cpu_bind=MASK - node001, task  3  3 [100605]: mask 0x1000000000 set  --> cpuid=36
cpu_bind=MASK - node001, task  4  4 [100606]: mask 0x2 set           --> cpuid=01  
cpu_bind=MASK - node001, task  5  5 [100607]: mask 0x2000 set        --> cpuid=13

                       SOCKET 1                                   SOCKET 2
----------------------------------------------------------------------------------------
Core id  | 00 01 02 03 04 05 06 07 08 09 10 11  |  00 01 02 03 04 05 06 07 08 09 10 11 |
-------- -------------------------------------- ---------------------------------------| 
Thread 0 | x  x                                 |  x  x                                |
Thread 1 | x                                    |  x                                   |
----------------------------------------------------------------------------------------
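
The cpuid annotations next to the masks above can be double-checked with a short Python sketch. It assumes the CPU numbering from the cpuinfo table in this post (ids 0-23 are thread 0, ids 24-47 are thread 1, 12 cores per socket); the helper names `decode_mask` and `locate` are made up for illustration and are not part of Slurm.

```python
# Layout taken from the cpuinfo table in this post: 2 sockets x 12 cores x 2 threads.
CORES_PER_SOCKET = 12
SOCKETS = 2


def decode_mask(mask_hex):
    """Return the CPU ids whose bits are set in a cpu_bind mask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def locate(cpu_id):
    """Map a logical CPU id to (socket, core, thread) for the layout above."""
    per_thread_row = CORES_PER_SOCKET * SOCKETS  # 24 CPU ids per thread row
    thread, rest = divmod(cpu_id, per_thread_row)
    socket, core = divmod(rest, CORES_PER_SOCKET)
    return socket + 1, core, thread  # sockets numbered from 1, as in the table


# Masks copied from the log above, in task order.
masks = ["0x1", "0x1000", "0x1000000", "0x1000000000", "0x2", "0x2000"]

for task, mask_hex in enumerate(masks):
    for cpu in decode_mask(mask_hex):
        socket, core, thread = locate(cpu)
        print(f"task {task}: cpuid={cpu:02d} socket={socket} "
              f"core={core:02d} thread={thread}")
```

Under that numbering, tasks 0, 2 and 4 land on socket 1 and tasks 1, 3 and 5 on socket 2, which matches the diagram.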

Am I misinterpreting how the -B option works?

Regards,
Mam

Perry, Martin

Mar 14, 2016, 2:21:31 PM
to slurm-dev

The -B option is a constraint on node selection. You specified --exclusive, so Slurm allocated the entire node to your job. It then applied the default distribution method, cyclic, to select the threads to bind to your tasks. To select threads in the same socket for binding, try specifying block as the second distribution method: -m block:block or -m *:block. See the CPU Management Guide for more information and examples: http://slurm.schedmd.com/cpu_management.html

Martin Perry
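
To illustrate the difference between a cyclic and a block distribution across sockets, here is a toy Python model of the node described in this thread. It is only a sketch and not Slurm's actual selection algorithm: it assumes one CPU per task, the CPU numbering from the original post, and that the two threads of a core are taken before moving to the next core. The function names are hypothetical. The cyclic case reproduces the binding order seen in the log; the block case is what this simple model predicts, not measured output.

```python
CORES_PER_SOCKET = 12
SOCKETS = 2


def socket_cpus(socket):
    """Logical CPU ids of one socket, core by core, both threads adjacent."""
    base = socket * CORES_PER_SOCKET
    thread1_offset = SOCKETS * CORES_PER_SOCKET  # thread 1 ids start at 24
    cpus = []
    for core in range(CORES_PER_SOCKET):
        cpus += [base + core, base + core + thread1_offset]
    return cpus


def place(ntasks, method):
    """Return the CPU id given to each task under a toy distribution model."""
    per_socket = [socket_cpus(s) for s in range(SOCKETS)]
    if method == "block":
        # Fill the first socket completely before touching the next one.
        order = [cpu for cpus in per_socket for cpu in cpus]
    elif method == "cyclic":
        # Round-robin over the sockets, one CPU from each in turn.
        order = [cpu for group in zip(*per_socket) for cpu in group]
    else:
        raise ValueError(f"unknown method: {method}")
    return order[:ntasks]


print("cyclic:", place(6, "cyclic"))  # [0, 12, 24, 36, 1, 13]
print("block: ", place(6, "block"))   # [0, 24, 1, 25, 2, 26]
```

In this model the block case keeps all 6 tasks on CPUs 0-11 and 24-35, i.e. on the first socket.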
