Hi Vince & Philip,
TL;DR: After many trial-and-error steps, I got PAPI to work. I am using
version 4.4.0, which is the only one of the versions I tried that both
compiles and works correctly.
The sysadmins installed libpapi from the SLES 11 repositories
(libpapi-3.7.2-0.10.1, papi-devel-3.7.2-0.10.1, papi-3.7.2-0.10.1), but
it didn't work correctly. It detected only 3 counters (including
PAPI_TOT_INS and PAPI_TOT_CYC), all of which returned zero, and it
printed this error:
PAPI Error:
pfm_find_full_event(RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS,0x7fffa09bd3f0):
event not found.
According to the software releases on the PAPI website, 4.1.0 is the
first version to support Westmere, but it did not compile cleanly. I
then tried 4.1.1, but had to add -Wno-unused-but-set-variable to DBG in
libpfm-3.y/config.mk to make it compile. The output was the same as for
3.7.2 from the repo (which I could not compile myself).
I later tried 4.1.2 and hit the same problems. Then I went straight to
the latest 4.x release (4.4.0), and voilà! Everything worked with no
hassle. Maybe the reason is that the processors are Westmere EX?
The output of `ctests/zero`:
Test case 0: start, stop.
-----------------------------------------------
Default domain is: 1 (PAPI_DOM_USER)
Default granularity is: 1 (PAPI_GRN_THR)
Using 20000000 iterations of c += a*b
-------------------------------------------------------------------------
Test type : 1
PAPI_FP_INS : 60000028
PAPI_TOT_CYC : 200020238
Real usec : 73533
Real cycles : 196041287
Virt usec : 75182
Virt cycles : 200437878
-------------------------------------------------------------------------
Verification: PAPI_TOT_CYC should be roughly real_cycles
zero.c PASSED
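As a quick sanity check on the verification line, here is a minimal Python sketch that computes how far PAPI_TOT_CYC is from the real and virtual cycle counts. The numbers are copied from the output above; the helper name `relative_diff` and the 5% tolerance are my own illustrative choices, not what zero.c actually uses.

```python
# Values copied from the ctests/zero output above.
papi_tot_cyc = 200_020_238
real_cycles = 196_041_287
virt_cycles = 200_437_878


def relative_diff(measured: int, reference: int) -> float:
    """Return |measured - reference| / reference."""
    return abs(measured - reference) / reference


# Illustrative tolerance only; this is an assumption, not the zero.c threshold.
TOLERANCE = 0.05

dev_real = relative_diff(papi_tot_cyc, real_cycles)
dev_virt = relative_diff(papi_tot_cyc, virt_cycles)

print(f"PAPI_TOT_CYC vs real cycles: {dev_real:.2%}")
print(f"PAPI_TOT_CYC vs virt cycles: {dev_virt:.2%}")
print("roughly equal:", dev_real < TOLERANCE and dev_virt < TOLERANCE)
```

Both deviations come out at a few percent or less, which is consistent with the test reporting PASSED.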
The output of `libpfm4/perf_examples/self`:
INITIAL: 15,773 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling,
raw=15,773, ena=11,538, run=11,538)
INITIAL: 21,688 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling,
raw=21,688, ena=62,870, run=62,870)
Final counts:
FINAL: 26,654,801,768 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling,
raw=26,654,801,768, ena=9,997,788,016, run=9,997,788,016)
FINAL: 39,948,683,352 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling,
raw=39,948,683,352, ena=9,997,783,658, run=9,997,783,658)
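To cross-check the FINAL counts, here is a small Python sketch that derives the scaling percentage, instructions per cycle, and the implied clock rate. It assumes that `ena` and `run` are the perf_events enabled and running times in nanoseconds, and that the scaling percentage is computed as (1 - run/ena) * 100; the function names are hypothetical and only for illustration.

```python
# FINAL values copied from the libpfm4/perf_examples/self output above.
cycles_raw, cycles_ena, cycles_run = 26_654_801_768, 9_997_788_016, 9_997_788_016
instr_raw, instr_ena, instr_run = 39_948_683_352, 9_997_783_658, 9_997_783_658


def scaling_percent(ena: int, run: int) -> float:
    """Share of the enabled time during which the event was NOT counting."""
    return (1.0 - run / ena) * 100.0 if ena else 0.0


def scaled_count(raw: int, ena: int, run: int) -> float:
    """Extrapolate a raw count to the full enabled time (multiplexing correction)."""
    return raw * ena / run if run else 0.0


# run == ena for both events, so no multiplexing correction is needed.
cycles = scaled_count(cycles_raw, cycles_ena, cycles_run)
instructions = scaled_count(instr_raw, instr_ena, instr_run)

ipc = instructions / cycles
# Assumption: ena is in nanoseconds, so cycles per nanosecond equals GHz.
ghz = cycles / cycles_ena

print(f"scaling: {scaling_percent(cycles_ena, cycles_run):.2f}%")
print(f"IPC: {ipc:.3f}")
print(f"implied clock: {ghz:.3f} GHz")
```

Under those assumptions the implied clock lands close to the 2.67 GHz nominal frequency of the CPU, which suggests the cycle counter is behaving sensibly.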
For `utils/papi_component_avail`:
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI Version : 4.4.0.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz (47)
CPU Revision : 2.000000
CPUID Info : Family: 6 Model: 47 Stepping: 2
CPU Megahertz : 2666.887939
CPU Clock Megahertz : 2666
Hdw Threads per core : 1
Cores per Socket : 8
NUMA Nodes : 12
CPUs per Node : 8
Total CPUs : 96
Number Hardware Counters : 7
Max Multiplex Counters : 32
--------------------------------------------------------------------------------
Name: perf_events.c
--------------------------------------------------------------------------------
component.c PASSED
If you want the output of `papi_avail`, or of the tests under version
4.1.1/4.1.2, let me know and I will attach them.