PAPI support for AMD Zen2/3 counters?


Jose Gracia

Jul 15, 2021, 12:21:28 PM
to ptools-...@icl.utk.edu, Björn Dick
Dear PAPI team,

we would like to access HWCs related to L1/L2 caches but also uncore
events on AMD Zen2 (EPYC 7742) processors. However, no presets seem to
be available. See output of papi_avail below.

Are there any plans to support more AMD Zen2/3 counters through PAPI?
Can we assist with testing, etc?

Cheers,
Jose Gracia



$ cat /proc/sys/kernel/perf_event_paranoid
0

$ papi_avail -a
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI version : 6.0.0.1
Operating system : Linux 4.18.0-193.28.1.el8_2.x86_64
Vendor string and code : AuthenticAMD (2, 0x2)
Model string and code : AMD EPYC 7742 64-Core Processor (49, 0x31)
CPU revision : 0.000000
CPUID : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz : 3393
CPU Min MHz : 3393
Total cores : 256
SMT threads per core : 2
Cores per socket : 64
Sockets : 2
Cores per NUMA region : 32
NUMA regions : 8
Running in a VM : no
Number Hardware Counters : 5
Max Multiplex Counters : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

================================================================================
PAPI Preset Events
================================================================================
Name Code Deriv Description (Note)

PAPI_TLB_DM 0x80000014 No Data translation lookaside buffer misses
PAPI_TLB_IM 0x80000015 Yes Instruction translation lookaside buffer misses
PAPI_BR_TKN 0x8000002c No Conditional branch instructions taken
PAPI_BR_MSP 0x8000002e No Conditional branch instructions mispredicted
PAPI_TOT_INS 0x80000032 No Instructions completed
PAPI_BR_INS 0x80000037 No Branch instructions
PAPI_TOT_CYC 0x8000003b No Total cycles
--------------------------------------------------------------------------------
Of 7 available events, 1 is derived.


$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version : 6.0.0.1
Operating system : Linux 4.18.0-193.28.1.el8_2.x86_64
Vendor string and code : AuthenticAMD (2, 0x2)
Model string and code : AMD EPYC 7742 64-Core Processor (49, 0x31)
CPU revision : 0.000000
CPUID : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz : 3393
CPU Min MHz : 3393
Total cores : 256
SMT threads per core : 2
Cores per socket : 64
Sockets : 2
Cores per NUMA region : 32
NUMA regions : 8
Running in a VM : no
Number Hardware Counters : 5
Max Multiplex Counters : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

Compiled-in components:
Name: perf_event Linux perf_event CPU counters
Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
\-> Disabled: No uncore PMUs or events found

Active components:
Name: perf_event Linux perf_event CPU counters
Native: 135, Preset: 7, Counters: 5
PMUs supported: perf, perf_raw,
amd64_fam17h_zen2


--------------------------------------------------------------------------------

--

Dr. Jose Gracia email: gra...@hlrs.de
HLRS, Uni Stuttgart http://www.hlrs.de/people/gracia
Nobelstrasse 19 phone: +49 711 685 87208
70569 Stuttgart fax: +49 711 685 65832
Germany pgp key ID: FBDADD6F

Satish Kamath

Feb 1, 2022, 10:31:41 AM
to ptools-perfapi, Jose Gracia, Björn Dick
Dear PAPI team,

I agree with Jose. I also get the same:

[satishk@tcn20 ~]$ papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 4.18.0-305.25.1.el8_4.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD EPYC 7H12 64-Core Processor (49, 0x31)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz              : 2600
CPU Min MHz              : 1500
Total cores              : 128
SMT threads per core     : 1
Cores per socket         : 64
Sockets                  : 2
Cores per NUMA region    : 16
NUMA regions             : 8
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No    No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No    No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  No    No   Level 2 data cache misses
PAPI_L2_ICM  0x80000003  No    No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  No    No   Level 1 cache misses
PAPI_L2_TCM  0x80000007  No    No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  No    No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  No    No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   Yes  Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  No    No   Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  No    No   Level 1 load misses
PAPI_L1_STM  0x80000018  No    No   Level 1 store misses
PAPI_L2_LDM  0x80000019  No    No   Level 2 load misses
PAPI_L2_STM  0x8000001a  No    No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  No    No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  No    No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  No    No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  No    No   Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  No    No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  No    No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  No    No   Unconditional branch instructions
PAPI_BR_CN   0x8000002b  No    No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   No   Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  No    No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  No    No   Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  No    No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  No    No   Floating point instructions
PAPI_LD_INS  0x80000035  No    No   Load instructions
PAPI_SR_INS  0x80000036  No    No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  No    No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  No    No   Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  No    No   Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  No    No   Level 2 data cache hits
PAPI_L1_DCA  0x80000040  No    No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  No    No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  No    No   Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  No    No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  No    No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  No    No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  No    No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  No    No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  No    No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  No    Yes  Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  No    No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  No    No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  No    No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  No    No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  No    No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  No    No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  No    No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  No    No   Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  No    Yes  Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  No    No   Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  No    No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  No    No   Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  No    No   Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  No    No   Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  No    No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  No    No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  No    No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  No    No   Floating point operations
PAPI_SP_OPS  0x80000067  No    No   Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  No    No   Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  No    No   Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  No    No   Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  No    No   Reference clock cycles
--------------------------------------------------------------------------------
Of 108 possible events, 7 are available, of which 3 are derived.

[satishk@tcn20 ~]$
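For comparing listings like the one above across machines, a small parser can help. This is only a rough sketch: it assumes the full `papi_avail` table layout shown here (Name, Code, Avail, Deriv, Description), not the shorter `papi_avail -a` layout, and the names `parse_presets` and `available_presets` are made up for illustration.

```python
import re

# Matches rows such as:
# PAPI_TLB_IM  0x80000015  Yes   Yes  Instruction translation lookaside buffer misses
ROW = re.compile(r"^(PAPI_\w+)\s+(0x[0-9a-fA-F]+)\s+(Yes|No)\s+(Yes|No)\s+(.*)$")

def parse_presets(text):
    """Return (name, available, derived) tuples from full papi_avail output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, _code, avail, deriv, _desc = m.groups()
            rows.append((name, avail == "Yes", deriv == "Yes"))
    return rows

def available_presets(text):
    """Return only the names of presets marked as available."""
    return [name for name, avail, _ in parse_presets(text) if avail]

# A few rows copied from the listing above.
SAMPLE = """\
PAPI_L1_DCM  0x80000000  No    No   Level 1 data cache misses
PAPI_TLB_DM  0x80000014  Yes   No   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   Yes  Instruction translation lookaside buffer misses
PAPI_L1_ICH  0x80000049  No    Yes  Level 1 instruction cache hits
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
"""

if __name__ == "__main__":
    print(available_presets(SAMPLE))
```

Feeding it the saved output of `papi_avail` from two builds makes it easy to diff which presets each one exposes.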

Is there going to be further support for AMD Zen2/3 architectures within PAPI?

With best regards,
Satish Kamath

Advisor SURF Amsterdam 

Anthony Danalis

Feb 1, 2022, 1:29:08 PM
to Satish Kamath, ptools-perfapi, Jose Gracia, Björn Dick
Hello,

Are you using a recent enough version of PAPI? Can you try pulling
from the git repo (git clone g...@bitbucket.org:icl/papi.git)?
Do you link against an external libpfm4?

On a local machine that is also Zen2 (Family/Model/Stepping 23/49/0)
we get the following cache preset events:
PAPI_L1_ICM 0x80000001 Yes No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 Yes No Level 2 data cache misses
PAPI_L2_ICM 0x80000003 Yes No Level 2 instruction cache misses
PAPI_L2_DCH 0x8000003f Yes No Level 2 data cache hits
PAPI_L1_DCA 0x80000040 Yes No Level 1 data cache accesses
PAPI_L2_DCR 0x80000044 Yes No Level 2 data cache reads
PAPI_L2_ICH 0x8000004a Yes No Level 2 instruction cache hits
PAPI_L2_ICR 0x80000050 Yes No Level 2 instruction cache reads

However, on our machine papi_component_avail lists the following PMUs
under perf_event:
PMUs supported: perf, perf_raw,
amd64_fam17h_zen2
On your output, I noticed that "amd64_fam17h_zen2" is missing.
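To check the PMU list without reading the output by eye, something like the following could be used. It is a sketch under assumptions: it looks for the literal "PMUs supported:" label as printed above, treats a trailing comma as a sign that the list was wrapped onto the next line (as happens in pasted e-mail), and `supported_pmus` is a hypothetical helper name.

```python
def supported_pmus(text):
    """Extract PMU names from papi_component_avail output.

    Handles the list being wrapped onto a following line, as happens
    when the output is pasted into an e-mail.
    """
    lines = text.splitlines()
    pmus = []
    for i, line in enumerate(lines):
        if "PMUs supported:" not in line:
            continue
        chunk = line.split("PMUs supported:", 1)[1]
        # A trailing comma means the list continues on the next line.
        j = i + 1
        while chunk.rstrip().endswith(",") and j < len(lines):
            chunk += " " + lines[j]
            j += 1
        pmus.extend(p.strip() for p in chunk.split(",") if p.strip())
    return pmus

WRAPPED = "PMUs supported: perf, perf_raw,\namd64_fam17h_zen2\n"

if __name__ == "__main__":
    print("amd64_fam17h_zen2" in supported_pmus(WRAPPED))
```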

Anthony

Satish Kamath

Feb 1, 2022, 5:37:31 PM
to ptools-perfapi, adanalis, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath
Hi Anthony,

Thank you for your reply. We are using PAPI-6.0.0.1 compiled with GCCcore-10.3.0. We do not link against external libpfm4.

I forgot to attach the result of papi_component_avail.

[satishk@tcn20 ~]$ papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 4.18.0-305.25.1.el8_4.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD EPYC 7H12 64-Core Processor (49, 0x31)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz              : 2600
CPU Min MHz              : 1500
Total cores              : 128
SMT threads per core     : 1
Cores per socket         : 64
Sockets                  : 2
Cores per NUMA region    : 16
NUMA regions             : 8
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found

Active components:
Name:   perf_event              Linux perf_event CPU counters
                                Native: 135, Preset: 7, Counters: 5
                                PMUs supported: perf, perf_raw, amd64_fam17h_zen2


--------------------------------------------------------------------------------
 
I do get amd64_fam17h_zen2.

With best regards,
Satish Kamath

Anthony Danalis

Feb 1, 2022, 5:38:50 PM
to Satish Kamath, ptools-perfapi, Jose Gracia, Björn Dick
Did you download PAPI as a tarball, or did you clone from the repo? If
the former, can you please try the latest version from the repo?

thanks,
Anthony

Дмитрий Хаби

Feb 15, 2022, 11:15:39 AM
to ptools-perfapi, adanalis, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath
I used the latest source from https://bitbucket.org/icl/papi/downloads/ (icl-papi-8cdd7e90b398).
The results are mostly the same as above. It looks like the core architecture (AMD Zen2) is supported,
but what about the EPYC Uncore architecture (see below: Disabled: No uncore PMUs or events found)?
Can it depend on the kernel version?
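One way to see what the running kernel itself exposes, independent of PAPI, is to list the PMUs registered under sysfs. The sketch below assumes the usual Linux location `/sys/bus/event_source/devices`; the prefix filter is a guess on my part (AMD uncore PMUs are commonly named like `amd_df` or `amd_l3`, Intel ones `uncore_*`), and `list_pmus` and `likely_uncore` are hypothetical helper names.

```python
import os

def list_pmus(sysfs_dir="/sys/bus/event_source/devices"):
    """Return the sorted names of the PMUs the running kernel has registered."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except FileNotFoundError:
        # Not Linux, or perf events are not enabled in this kernel.
        return []

def likely_uncore(pmus):
    """Filter PMU names by prefix. Assumption: non-core PMUs start with
    'amd_' (e.g. amd_df, amd_l3) or 'uncore_'."""
    return [p for p in pmus if p.startswith(("amd_", "uncore_"))]

if __name__ == "__main__":
    pmus = list_pmus()
    print("all PMUs:   ", pmus)
    print("uncore-like:", likely_uncore(pmus))
```

If nothing uncore-like shows up here, the kernel is not exposing those PMUs at all. If they do show up, the cause is more likely on the PAPI/libpfm4 side or in permissions.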
Thank you

papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 4.18.0-348.2.1.el8_5.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD EPYC 7742 64-Core Processor (49, 0x31)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz              : 2250
CPU Min MHz              : 1500
Total cores              : 128
SMT threads per core     : 2
Cores per socket         : 64
Sockets                  : 1
Cores per NUMA region    : 32
NUMA regions             : 4
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found

Active components:
Name:   perf_event              Linux perf_event CPU counters
                                Native: 138, Preset: 17, Counters: 5
                                PMUs supported: perf, perf_raw, amd64_fam17h_zen2

Anthony Danalis

Feb 15, 2022, 11:24:55 AM
to Дмитрий Хаби, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath
This looks like a permissions issue. What is the value in
/proc/sys/kernel/perf_event_paranoid?
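For reference, a short sketch of reading that value and what each level roughly means for an unprivileged user. The level summaries paraphrase the kernel's sysctl documentation; the function names are made up for illustration. Uncore events are CPU-wide rather than per-process, so they generally need a level of 0 or lower (or extra privileges). Note that the original poster reported a value of 0 and still saw the uncore component disabled, so this setting alone may not explain every case in this thread.

```python
def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current perf_event_paranoid level as an integer."""
    with open(path) as f:
        return int(f.read().strip())

def describe_paranoid(level):
    """Summarize what an unprivileged user may measure at a given level."""
    if level <= -1:
        return "no restrictions"
    if level == 0:
        return "CPU-wide (system-wide) events allowed"
    if level == 1:
        return "per-process user and kernel measurement only"
    # Level 2 and above (some distributions define stricter levels too).
    return "per-process user-space measurement only"

if __name__ == "__main__":
    level = read_paranoid()
    print(level, "->", describe_paranoid(level))
```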

Thanks,
Anthony

Anthony Danalis

Mar 21, 2022, 1:34:17 PM
to Radim Vavrik, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath, Дмитрий Хаби
Hello Radim,

Are you using the latest PAPI from the repo?
The other user reporting issues on the Zen2 had problems with the
un-core events, but was able to see 17 presets under the core
(perf_event) component.

thanks,
Anthony

On Mon, Mar 21, 2022 at 12:06 PM Radim Vavrik <radim...@gmail.com> wrote:
>
> Hello all,
> we have the same issue, note only 7 presets available:
> papi_component_avail
> Available components and hardware information.
> --------------------------------------------------------------------------------
> PAPI version : 6.0.0.1
> Operating system : Linux 3.10.0-1160.59.1.el7.x86_64
> Vendor string and code : AuthenticAMD (2, 0x2)
> Model string and code : AMD EPYC 7H12 64-Core Processor (49, 0x31)
> CPU revision : 0.000000
> CPUID : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
> CPU Max MHz : 2595
> CPU Min MHz : 2595
> Total cores : 128
> SMT threads per core : 1
> Cores per socket : 64
> Sockets : 2
> Cores per NUMA region : 16
> NUMA regions : 8
> Running in a VM : no
> Number Hardware Counters : 5
> Max Multiplex Counters : 384
> Fast counter read (rdpmc): no
> --------------------------------------------------------------------------------
>
> Compiled-in components:
> Name: perf_event Linux perf_event CPU counters
> Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
> \-> Disabled: No uncore PMUs or events found
>
> Active components:
> Name: perf_event Linux perf_event CPU counters
> Native: 135, Preset: 7, Counters: 5
> PMUs supported: perf, perf_raw, amd64_fam17h_zen2
>
>
> --------------------------------------------------------------------------------
>
> The value in /proc/sys/kernel/perf_event_paranoid is 2
>
> Thanks for any suggestions,
> Radim
>
> On Tuesday, February 15, 2022 at 17:24:55 UTC+1, adanalis wrote:

Radim Vavrik

Mar 21, 2022, 1:36:37 PM
to ptools-perfapi, adanalis, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath, Дмитрий Хаби
Hello all,
we have the same issue, note only 7 presets available:
papi_component_avail
Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 3.10.0-1160.59.1.el7.x86_64
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD EPYC 7H12 64-Core Processor (49, 0x31)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 23/49/0, 0x17/0x31/0x00
CPU Max MHz              : 2595
CPU Min MHz              : 2595
Total cores              : 128
SMT threads per core     : 1
Cores per socket         : 64
Sockets                  : 2
Cores per NUMA region    : 16
NUMA regions             : 8
Running in a VM          : no
Number Hardware Counters : 5
Max Multiplex Counters   : 384
Fast counter read (rdpmc): no
--------------------------------------------------------------------------------

Compiled-in components:
Name:   perf_event              Linux perf_event CPU counters
Name:   perf_event_uncore       Linux perf_event CPU uncore and northbridge
   \-> Disabled: No uncore PMUs or events found

Active components:
Name:   perf_event              Linux perf_event CPU counters
                                Native: 135, Preset: 7, Counters: 5
                                PMUs supported: perf, perf_raw, amd64_fam17h_zen2


--------------------------------------------------------------------------------

The value in /proc/sys/kernel/perf_event_paranoid is 2

Thanks for any suggestions,
Radim

On Tuesday, February 15, 2022 at 17:24:55 UTC+1, adanalis wrote:
This looks like a permissions issue. What is the value in

Satish Kamath

Mar 24, 2023, 4:25:30 PM
to ptools-perfapi, Radim Vavrik, adanalis, ptools-perfapi, Jose Gracia, Björn Dick, Satish Kamath, Дмитрий Хаби
Dear Anthony,

I have been following the conversation here. The value of /proc/sys/kernel/perf_event_paranoid is 2.

I can discuss changing the settings with our system admins. Is CAP_SYS_ADMIN the only change that is required? Also, if PAPI is already installed in its current state, do I need to re-install it after the above change?

Also, are any other settings required?
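In case it helps the discussion with the admins, here is a rough sketch for checking which capabilities a process actually holds. It only decodes the `CapEff` line of `/proc/<pid>/status`; the bit numbers are the ones from `linux/capability.h` (CAP_SYS_ADMIN is 21, and CAP_PERFMON, added in Linux 5.8, is 38). The helper names are hypothetical, and this does not by itself say which capability PAPI needs.

```python
CAP_SYS_ADMIN = 21  # bit number from linux/capability.h
CAP_PERFMON = 38    # bit number from linux/capability.h (Linux 5.8 and later)

def effective_caps(status_text):
    """Return the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")

def has_cap(mask, bit):
    """True if the capability with the given bit number is set in mask."""
    return bool((mask >> bit) & 1)

if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print("CAP_SYS_ADMIN:", has_cap(mask, CAP_SYS_ADMIN))
    print("CAP_PERFMON:  ", has_cap(mask, CAP_PERFMON))
```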

With best regards,
Satish Kamath


