installing likwid on a cluster for likwid-mpirun


Vihan Pandey

Nov 17, 2015, 12:32:13 PM
to likwid-users
Dear all,

I am planning to install likwid on a cluster so that I can use likwid-mpirun to get power measurements from each compute node.

I will of course be installing likwid on the head node, but would I need to install likwid on each of the compute nodes as well?

Please note I am using LSF as my job manager. Please also let me know if you require any more information.

Thanks and Cheers!

- vihan

Thomas Röhl

Nov 17, 2015, 2:25:39 PM
to likwid-users
Hi Vihan,

Yes, LIKWID also has to be installed on all cluster nodes. Cluster nodes commonly get their software over the network, exported by a central host. You can install LIKWID there, but you have to check that the network filesystem supports the suid bit (NFSv3 yes, NFSv4 no). The access daemon requires this bit. Of course, you can also include LIKWID in the nodes' disk image.
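
The suid requirement can be checked on a node before running anything. The following is a minimal Python sketch, not part of LIKWID. It assumes a Linux node where `/proc/mounts` is readable; the function names are made up for illustration.

```python
import os
import stat


def mount_options_for(path, mounts_text):
    """Return the mount options of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point, opts = fields[1], fields[3].split(",")
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # Keep the most specific (longest) matching mount point.
            if len(point) > len(best_point):
                best_point, best_opts = point, opts
    return best_opts


def suid_usable(path, mounts_text):
    """True if the mount holding path is not mounted with nosuid."""
    return "nosuid" not in mount_options_for(path, mounts_text)


def has_suid_bit(path):
    """True if the file at path has the set-user-ID bit set."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)
```

On a compute node you could read `/proc/mounts` with `open("/proc/mounts").read()` and pass it to `suid_usable()` together with the installed path of `likwid-accessD`, then call `has_suid_bit()` on the same path. Both checks should be true for the access daemon to work from a network mount.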

likwid-mpirun basically uses the path to the host file, which the job manager commonly exports in the environment. It checks the environment variables PBS_NODEFILE, LOADL_HOSTFILE and SLURM_HOSTFILE. For your cluster you can easily integrate your own environment variable by replacing one of them in likwid-mpirun (it is a Lua script, so it is editable). I can also integrate it if you report the name of the LSF environment variable back to me.
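
The lookup described here can be sketched as follows. This is an illustrative Python sketch of the logic, not the Lua code in likwid-mpirun. The variable name `LSB_DJOB_HOSTFILE` for LSF is an assumption that does not come from this thread; please verify it inside a real LSF job (for example with `env | grep LSB`) before relying on it.

```python
import os

# Variables named in this thread, in the order they are listed.
HOSTFILE_VARS = ["PBS_NODEFILE", "LOADL_HOSTFILE", "SLURM_HOSTFILE"]

# Assumption: LSF exports the per-job host file path under this name.
# Not confirmed in the thread; check it in a running LSF job.
LSF_HOSTFILE_VAR = "LSB_DJOB_HOSTFILE"


def find_hostfile(env=None, extra_vars=()):
    """Return (variable, path) for the first host file variable that is set."""
    env = os.environ if env is None else env
    for name in list(HOSTFILE_VARS) + list(extra_vars):
        path = env.get(name)
        if path:
            return name, path
    return None


def read_hosts(text):
    """Return the unique host names from host file text, keeping their order."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        if host not in hosts:
            hosts.append(host)
    return hosts
```

Calling `find_hostfile(extra_vars=[LSF_HOSTFILE_VAR])` inside a job shows which variable, if any, would be picked up. That is the same information needed to patch the Lua script.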

Moreover, you may have to change the paths to the MPI start wrapper you use. likwid-mpirun checks the environment variables MPIHOME and MPI_BASE for the path. Internally it is extended to $MPI_BASE/bin/[mpiexec|mpirun], depending on the MPI implementation, which is usually detected automatically or can be given on the command line.
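
To see which launcher path this expansion would produce on a given node, a small Python sketch like the one below can help. It only mirrors the description above. The precedence of MPIHOME over MPI_BASE is an assumption made for the sketch, and the function name is hypothetical.

```python
def mpi_launcher(env, launcher="mpiexec"):
    """Build <base>/bin/<launcher> from MPIHOME or MPI_BASE, if either is set.

    Assumption: MPIHOME is consulted before MPI_BASE. The thread does not
    say which one takes precedence.
    """
    if launcher not in ("mpiexec", "mpirun"):
        raise ValueError("launcher must be 'mpiexec' or 'mpirun'")
    for name in ("MPIHOME", "MPI_BASE"):
        base = env.get(name)
        if base:
            return f"{base.rstrip('/')}/bin/{launcher}"
    return None  # neither variable is set
```

For example, `mpi_launcher(dict(os.environ), "mpirun")` (after `import os`) returns the path that should exist on every compute node, or `None` if neither variable is set.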

In our clusters we made sure that the mount path on the cluster nodes is the same as on the central exporting host to avoid problems with the adjusted paths in the Lua scripts during installation of LIKWID.

I have to admit that likwid-mpirun is not as well tested as other LIKWID tools because we concentrate our work on node-level performance monitoring and optimization. It would be nice to hear about your experiences with likwid-mpirun.

Greetings,
Thomas

Vihan Pandey

Nov 19, 2015, 5:56:09 AM
to likwid-users, ajay
Hi Thomas,

We are currently experimenting with likwid on our cluster and would be happy to share anything we learn :-)

We just ran into one problem: we installed likwid on the head node and get:

$ likwid-powermeter ls -al
The Intel Westmere EX processor does not support reading power data

Though interestingly, when I do :

$ likwid-perfctr -i
--------------------------------------------------------------------------------
CPU name:    Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
CPU type:    Intel Westmere EX processor
CPU clock:    2.00 GHz
CPU family:    6
CPU model:    47
CPU short:    westmereEX
CPU stepping:    2
CPU features:    ACPI MMX SSE SSE2 HTT TM RDTSCP MONITOR VMX EIST TM2 SSSE3 SSE4.1 SSE4.2 AES SSE3
--------------------------------------------------------------------------------
PERFMON version:    3
PERFMON number of counters:    4
PERFMON width of counters:    48
PERFMON number of fixed counters:    3
--------------------------------------------------------------------------------
Supported Intel processors:
    Intel Core 2 65nm processor
    Intel Core 2 45nm processor
    Intel Xeon MP processor
    Intel Atom 45nm processor
    Intel Atom 32nm processor
    Intel Atom 22nm processor
    Intel Core Bloomfield processor
    Intel Core Lynnfield processor
    Intel Core Westmere processor
    Intel Nehalem EX processor
    Intel Westmere EX processor
    Intel Core SandyBridge processor
    Intel Xeon SandyBridge EN/EP processor
    Intel Core IvyBridge processor
    Intel Xeon IvyBridge EN/EP/EX processor
    Intel Core Haswell processor
    Intel Xeon Haswell EN/EP/EX processor
    Intel Atom (Silvermont) processor
    Intel Atom (Airmont) processor
    Intel Xeon Phi (Knights Corner) Coprocessor
    Intel Core Broadwell processor
    Intel Xeon D Broadwell processor
    Intel Xeon Broadwell EN/EP/EX processor
    Intel Skylake processor

Supported AMD processors:
    AMD Opteron single core 130nm processor
    AMD Opteron Dual Core Rev E 90nm processor
    AMD Opteron Dual Core Rev F 90nm processor
    AMD Barcelona processor
    AMD Shanghai processor
    AMD Istanbul processor
    AMD Magny Cours processor
    AMD Interlagos processor
    AMD Family 16 model - Kabini processor

Intel Westmere EX processor is listed as supported.

Other commands also have some issues :

$ likwid-perfctr -C 1 -g ENERGY -O ls -al
--------------------------------------------------------------------------------
CPU name:    Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
CPU type:    Intel Westmere EX processor
CPU clock:    2.00 GHz
ERROR: No valid eventset given on commandline. Exiting...

$ likwid-perfctr -c 0-31 -g ENERGY -O ls -al
--------------------------------------------------------------------------------
CPU name:    Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
CPU type:    Intel Westmere EX processor
CPU clock:    2.00 GHz
ERROR: No valid eventset given on commandline. Exiting...

likwid is being installed on an NFS v3 mounted volume.

Some more info :

# cat /proc/cpuinfo

model name      : Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz
stepping        : 2
cpu MHz         : 1997.689
cache size      : 18432 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.37
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

I am not pasting all the entries; they go up to:

processor    : 63

Please let me know if you need any more info. I am CC'ing the resident sysadmin.


Thanks and Cheers!

- vihan


Thomas Röhl

Nov 19, 2015, 6:28:31 AM
to likwid-users, aj...@iucaa.in
Hi Vihan, Hi Sysadmin

The Intel Westmere EX processor does not have the RAPL interface, which provides measurements of the energy consumption. That is why likwid-powermeter exits immediately. The Westmere EX processor is of course listed by likwid-perfctr, because all of its hardware performance counters are supported. The ENERGY group is not available for Westmere EX because the RAPL interface is missing; RAPL was introduced with the Intel SandyBridge architecture.
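
A quick, LIKWID-independent way to look for RAPL on a node is to check the Linux powercap sysfs tree. The Python sketch below assumes a kernel that ships the powercap framework with the intel_rapl driver loaded. An empty result therefore does not prove that the hardware lacks RAPL, since the kernel may simply be too old or the driver may not be loaded. This is not how LIKWID itself detects RAPL, and the function names are illustrative.

```python
import os


def rapl_zones(names):
    """Filter directory names down to intel-rapl zones such as 'intel-rapl:0'."""
    return sorted(n for n in names if n.startswith("intel-rapl:"))


def rapl_zones_on_host(powercap_root="/sys/class/powercap"):
    """List RAPL zones exposed by the powercap framework, or [] if none."""
    try:
        entries = os.listdir(powercap_root)
    except OSError:
        # Directory missing or unreadable: no powercap support visible.
        return []
    return rapl_zones(entries)
```

On a SandyBridge or newer node with a recent kernel, `rapl_zones_on_host()` would be expected to return at least one zone. On the Westmere EX nodes discussed here it should return an empty list.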

You can use LIKWID for measuring other performance metrics, but energy consumption won't be possible on your machines.

Greetings,
Thomas

Vihan Pandey

Nov 19, 2015, 6:31:20 AM
to likwid-users, aj...@iucaa.in
Hi Thomas,

Thanks for this.

Cheers!

- vihan

Vihan Pandey

Nov 19, 2015, 6:45:05 AM
to likwid-users, aj...@iucaa.in
Dear Thomas,

We have another cluster, an AMD one :

$ cat /proc/cpuinfo |less

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : Quad-Core AMD Opteron(tm) Processor 2384
stepping        : 2
cpu MHz         : 2700.097
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4

apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 5405.18
TLB size        : 1024 4K pages

clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

It goes up to processor    : 7

It's an AMD processor; would this perhaps support likwid-based energy and power measurement?

Cheers!

- vihan

Thomas Röhl

Nov 19, 2015, 7:29:38 AM
to likwid-users, aj...@iucaa.in
Hi Vihan,

I'm sorry to say that no AMD processor provides an interface to measure the energy consumption.

Maybe your Westmere EX machines provide access to a coarse energy measurement facility using the IPMI interface.
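
If the nodes' BMCs support the DCMI power extension (an assumption; not every board does), `ipmitool dcmi power reading` reports a node-level power value. The Python sketch below parses that output. It is independent of LIKWID, needs `ipmitool` plus sufficient privileges on the node, and the function names are illustrative.

```python
import re
import subprocess

# Matches the line "Instantaneous power reading:   <N> Watts" in ipmitool output.
_POWER_RE = re.compile(r"Instantaneous power reading:\s*(\d+)\s*Watts")


def parse_dcmi_power(output):
    """Return the instantaneous power in watts from DCMI output, or None."""
    match = _POWER_RE.search(output)
    return int(match.group(1)) if match else None


def read_node_power():
    """Run ipmitool locally and return watts, or None if it is unavailable."""
    try:
        result = subprocess.run(
            ["ipmitool", "dcmi", "power", "reading"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ipmitool is not installed, not executable, or the BMC did not answer.
        return None
    if result.returncode != 0:
        return None
    return parse_dcmi_power(result.stdout)
```

The reading covers the whole node and is much coarser than RAPL, so it is suited to job-level averages rather than per-kernel measurements.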

Greetings,
Thomas