Hi!
I've been trying to use PAPI to gather events for all applications
running on a specific core. My idea was to run my program pinned
to a specific core and at the same time gather events with PAPI. I
tried following the techniques and tips discussed multiple times on the
mailing list, but without luck so far: usually I can't observe an
increase in the number of cycles/instructions retired for any core. I can
only generate reliable data for computations performed inside the PAPI
measurement program. My problem seems to be similar to one already
discussed on the mailing list in 2014 [1]. I can't use PAPI_GRN_SYS_CPU,
and PAPI_GRN_SYS does not catch activity on other cores. I saw this
problem discussed a couple of times already [2,3]. Yet, no solution
seems to work for me.
Do you have any suggestions as to whether something might be
fundamentally wrong with my configuration? A missing dependency, a
problem with permissions? Below I describe my configuration and the
solutions I attempted.
Best,
Marcin
a) "papi_command_line INSTRUCTIONS_RETIRED:cpu=$CPU" always returns 0
unless it's executed on $CPU
b) running perf_event_system_wide gives suspicious results. Even if I
force it to run on another core, it still returns similar numbers for
the CPU attach test.
Curiously enough, I started seeing incorrect numbers (a lot of them
were negative) after changing the attach test to use DOM_ALL. For the
affinity test with DOM_USER I see incorrect numbers after changing the
source code to attach to a core different than the one used for
affinity.
Trying PAPI_TOT_INS with different domains:
Default: 200000216
PAPI_DOM_USER: 200000216
PAPI_DOM_KERNEL: 55722
PAPI_DOM_USER|PAPI_DOM_KERNEL: 200051703
PAPI_DOM_ALL: 200055784
Trying different granularities:
PAPI_GRN_THR: 200000209
PAPI_GRN_PROC: Unable to set PAPI_GRN_PROC
PAPI_GRN_SYS: 200000209
PAPI_GRN_SYS_CPU: Unable to set PAPI_GRN_SYS_CPU
PAPI_GRN_SYS plus CPU attach:
GRN_SYS, DOM_USER, CPU 0 attach: 200000344
GRN_SYS, DOM_USER, CPU 0 affinity: 200000219
GRN_SYS, DOM_ALL, CPU 0 affinity: 200053737
Validating:
DOM_USER|DOM_KERNEL (200051703) > DOM_USER (200000216)
PASSED
c) papi/src/ctests/attach_cpu always returns 0 unless used as "taskset
-c ${i} ~/software/papi/src/ctests/attach_cpu ${i}"
d) papi/src/ctests/attach_cpu_sys_validate fails with the following output:
Setting Eventset[0] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[1] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[2] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[3] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[4] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[5] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[6] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[7] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[8] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[9] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[10] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[11] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[12] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[13] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[14] granularity to: 8 (PAPI_GRN_SYS)
Setting Eventset[15] granularity to: 8 (PAPI_GRN_SYS)
Event: PAPI_TOT_INS: 200000236 on Cpu: 0
Event: PAPI_TOT_INS: 200000556 on Cpu: 1
Event: PAPI_TOT_INS: 200000861 on Cpu: 2
Event: PAPI_TOT_INS: 200001196 on Cpu: 3
Event: PAPI_TOT_INS: 200001501 on Cpu: 4
Event: PAPI_TOT_INS: 200001836 on Cpu: 5
Event: PAPI_TOT_INS: 200002156 on Cpu: 6
Event: PAPI_TOT_INS: 200002476 on Cpu: 7
Event: PAPI_TOT_INS: 200002796 on Cpu: 8
Event: PAPI_TOT_INS: 200003116 on Cpu: 9
Event: PAPI_TOT_INS: 200003436 on Cpu: 10
Event: PAPI_TOT_INS: 200003741 on Cpu: 11
Event: PAPI_TOT_INS: 200004076 on Cpu: 12
Event: PAPI_TOT_INS: 200004396 on Cpu: 13
Event: PAPI_TOT_INS: 200004716 on Cpu: 14
Event: PAPI_TOT_INS: 200005023 on Cpu: 15
Average: 200002632
Error! 16 events were the same
FAILED!!!
Line # 146 Error: Too similar
Some tests require special hardware, permissions, OS, compilers
or library versions. PAPI may still function perfectly on your
system without the particular feature being tested here.
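
To put a number on "Too similar", here is a small Python sketch (my own
helper, not part of PAPI) that parses the Event lines above and reports
the average and the relative spread across CPUs. The names parse_counts
and relative_spread are made up for illustration, and I don't know what
tolerance the test itself applies, so this only quantifies how close
the 16 values are.

```python
import re

# "Event:" lines copied verbatim from the attach_cpu_sys_validate run above.
REPORT = """\
Event: PAPI_TOT_INS: 200000236 on Cpu: 0
Event: PAPI_TOT_INS: 200000556 on Cpu: 1
Event: PAPI_TOT_INS: 200000861 on Cpu: 2
Event: PAPI_TOT_INS: 200001196 on Cpu: 3
Event: PAPI_TOT_INS: 200001501 on Cpu: 4
Event: PAPI_TOT_INS: 200001836 on Cpu: 5
Event: PAPI_TOT_INS: 200002156 on Cpu: 6
Event: PAPI_TOT_INS: 200002476 on Cpu: 7
Event: PAPI_TOT_INS: 200002796 on Cpu: 8
Event: PAPI_TOT_INS: 200003116 on Cpu: 9
Event: PAPI_TOT_INS: 200003436 on Cpu: 10
Event: PAPI_TOT_INS: 200003741 on Cpu: 11
Event: PAPI_TOT_INS: 200004076 on Cpu: 12
Event: PAPI_TOT_INS: 200004396 on Cpu: 13
Event: PAPI_TOT_INS: 200004716 on Cpu: 14
Event: PAPI_TOT_INS: 200005023 on Cpu: 15
"""


def parse_counts(report):
    """Return {cpu: count} for every 'Event: NAME: COUNT on Cpu: N' line."""
    pattern = re.compile(r"Event: \S+: (-?\d+) on Cpu: (\d+)")
    return {int(cpu): int(count) for count, cpu in pattern.findall(report)}


def relative_spread(counts):
    """Return (integer average, (max - min) / average) over all CPUs."""
    values = list(counts.values())
    average = sum(values) // len(values)
    return average, (max(values) - min(values)) / average


counts = parse_counts(REPORT)
average, spread = relative_spread(counts)
print(f"cpus={len(counts)} average={average} spread={spread:.6%}")
```

On the output above the average matches the 200002632 printed by the
test, and the gap between the lowest and highest CPU is well below
0.01% of that average. To me this looks as if every event set were
counting the same thread instead of its own core, but that is only my
reading of the numbers.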
e) running the attached test program produces small and maybe correct
values as long as the measurement program is executed on the measured
core. Otherwise I quickly observe negative values for all three
counters. When not setting the granularity, the result is always zero.
Details:
- /proc/sys/kernel/perf_event_paranoid is set to -1
- running PAPI as root or as a regular user makes no difference
- "perf stat -a --per-core" seems to work and "perf stat -a -C 5 sleep
$TIME" shows a significant change in activity when a test program is
pinned to that core with "taskset"
- outputs of "papi_avail" and "papi_components_avail" are attached.
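
For anyone who wants to reproduce the perf check above without taskset,
this is roughly the kind of pinned workload I mean, as a minimal Python
sketch. It assumes Linux (os.sched_setaffinity is not available
elsewhere); pin_to_cpu and busy_loop are hypothetical helper names, and
CPU 5 is simply the core from my perf example, with a fallback to the
first allowed CPU.

```python
import os
import time


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the new affinity set."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def busy_loop(seconds):
    """Spin for roughly `seconds` so the pinned core shows activity."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


if __name__ == "__main__":
    # Use CPU 5 when this process is allowed to run there, else the first allowed CPU.
    allowed = sorted(os.sched_getaffinity(0))
    target = 5 if 5 in allowed else allowed[0]
    print("pinned to", pin_to_cpu(target))
    print("iterations", busy_loop(0.2))
```

With a longer duration in busy_loop, "perf stat -a -C 5 sleep $TIME"
run in another shell should show the extra activity on that core, which
is the behaviour I cannot reproduce through PAPI.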
[1] https://groups.google.com/a/icl.utk.edu/forum/#!topic/ptools-perfapi/zQLIzyQ823M
[2] http://icl.cs.utk.edu/papi/forum/viewtopic.php?f=2&t=1260
[3] https://ptools-perfapi.eecs.utk.narkive.com/KVILwZ0U/how-to-use-counters-for-all-processes-on-core