PAPI KNL Uncore

Michael Knobloch

Aug 3, 2017, 5:12:16 AM
to ptools-...@icl.utk.edu
Hi all,

I am trying to measure the CAS uncore events on KNL, but have failed so far.

papi_native_avail tells me that the events are there:

-bash-4.2$ ./papi_native_avail | grep CAS
| knl_unc_imc0::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |
| knl_unc_imc1::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |
| knl_unc_imc2::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |
| knl_unc_imc3::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |
| knl_unc_imc4::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |
| knl_unc_imc5::UNC_M_CAS_COUNT |
| DRAM RD_CAS and WR_CAS Commands. |
| Counts total number of DRAM CAS commands issued on this channel |
| Counts number of DRAM write CAS commands on this channel |

But papi_command_line fails, telling me the event does not exist:

-bash-4.2$ ./papi_command_line knl_unc_imc0::UNC_M_CAS_COUNT
Failed adding: knl_unc_imc0::UNC_M_CAS_COUNT
because: Event does not exist
command_line.c PASSED

I get the same error for every uncore event I have tried so far. Adding RAPL
events the same way worked (as root).

I'm running PAPI 5.5.1 on an Intel Xeon Phi 7250:

--------------------------------------------------------------------------------
PAPI Version : 5.5.1.0
Vendor string and code : GenuineIntel (1)
Model string and code : Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz (87)
CPU Revision : 1.000000
CPUID Info : Family: 6 Model: 87 Stepping: 1
CPU Max Megahertz : 1600
CPU Min Megahertz : 1000
Hdw Threads per core : 4
Cores per Socket : 68
Sockets : 1
NUMA Nodes : 2
CPUs per Node : 136
Total CPUs : 272
Running in a VM : no
Number Hardware Counters : 5
Max Multiplex Counters : 384
--------------------------------------------------------------------------------

Compiled-in components:
Name: perf_event Linux perf_event CPU counters
Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
Name: rapl Linux RAPL energy measurements
\-> Disabled: Can't open fd for cpu0: Permission denied

Active components:
Name: perf_event Linux perf_event CPU counters
Native: 119, Preset: 26, Counters: 5
PMU's supported: ix86arch, perf, perf_raw, knl

Name: perf_event_uncore Linux perf_event CPU uncore and northbridge
Native: 3134, Preset: 0, Counters: 250
PMU's supported: knl_unc_imc0, knl_unc_imc1, knl_unc_imc2, knl_unc_imc3, knl_unc_imc4, knl_unc_imc5,
knl_unc_edc_eclk0, knl_unc_edc_eclk1, knl_unc_edc_eclk2, knl_unc_edc_eclk3, knl_unc_edc_eclk4, knl_unc_edc_eclk5,
knl_unc_edc_eclk6, knl_unc_edc_eclk7, knl_unc_edc_uclk0, knl_unc_edc_uclk1, knl_unc_edc_uclk2, knl_unc_edc_uclk3,
knl_unc_edc_uclk4, knl_unc_edc_uclk5, knl_unc_edc_uclk6, knl_unc_edc_uclk7,
knl_unc_cha0, knl_unc_cha1, knl_unc_cha2, knl_unc_cha3, knl_unc_cha4, knl_unc_cha5,
knl_unc_cha6, knl_unc_cha7, knl_unc_cha8, knl_unc_cha9, knl_unc_cha10, knl_unc_cha11,
knl_unc_cha12, knl_unc_cha13, knl_unc_cha14, knl_unc_cha15, knl_unc_cha16, knl_unc_cha17


--------------------------------------------------------------------------------


Does anyone have an idea what I might be doing wrong and how to fix this
issue?

Thanks,

Michael



Brian Van Straalen

Aug 3, 2017, 11:37:22 AM
to Michael Knobloch, ptools-...@icl.utk.edu
Have you tried

"knl_unc_imc0::UNC_M_CAS_COUNT:RD:cpu=0"

Brian Van Straalen


--
Brian Van Straalen         Lawrence Berkeley Lab
BVStr...@lbl.gov        Computational Research
(510) 486-4976            Division (crd.lbl.gov)

Michael Knobloch

Aug 3, 2017, 12:11:35 PM
to Brian Van Straalen, ptools-...@icl.utk.edu
Thanks, that did it.

I had tried the cpu qualifier without any of the ALL, RD, WR options,
which leads to an invalid argument error. But giving one of those options
together with the cpu qualifier works.

Do you have any pointers on what the cpu values mean? I got
results up to cpu=303 and invalid argument errors after that, but the
7250 "only" has 68 cores.

-Michael
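
A minimal sketch of how the fully qualified event names from this exchange could be generated, assuming the six knl_unc_imc PMUs listed by papi_native_avail and the RD/WR options mentioned above. The helper name qualified_events is hypothetical and only builds strings; it does not call PAPI.

```python
def qualified_events(num_imc=6, umasks=("RD", "WR"), cpu=0):
    """Build event names of the form PMU::EVENT:UMASK:cpu=N.

    num_imc, umasks, and cpu are illustrative defaults taken from the
    thread (imc0..imc5, RD/WR, cpu=0), not values mandated by PAPI.
    """
    return [
        f"knl_unc_imc{i}::UNC_M_CAS_COUNT:{umask}:cpu={cpu}"
        for i in range(num_imc)
        for umask in umasks
    ]


if __name__ == "__main__":
    # Each printed name could be passed to papi_command_line as an argument.
    for name in qualified_events():
        print(name)
```

Each generated string carries both an option (RD or WR) and a cpu qualifier, which is the combination reported to work in this thread.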
> send an email to ptools-perfap...@icl.utk.edu
> <mailto:ptools-perfapi%2Bunsu...@icl.utk.edu>.
> To post to this group, send email to ptools-...@icl.utk.edu
> <mailto:ptools-...@icl.utk.edu>.
> <https://groups.google.com/a/icl.utk.edu/group/ptools-perfapi/>.
>
>
>
>
> --
> Brian Van Straalen Lawrence Berkeley Lab
> BVStr...@lbl.gov <mailto:BVStr...@lbl.gov> Computational Research
> (510) 486-4976 Division (crd.lbl.gov <http://crd.lbl.gov>)

Vince Weaver

Aug 3, 2017, 12:35:20 PM
to Michael Knobloch, Brian Van Straalen, ptools-...@icl.utk.edu
On Thu, 3 Aug 2017, Michael Knobloch wrote:

> Do you have any pointers on what the values for cpu are doing? I got
> results up to cpu=303 and invalid argument errors after that, but the
> 7250 "only" has 68 cores.

You can look in the
/sys/devices/system/cpu/
directory to see how many CPUs Linux thinks you have.

Vince
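
The directory check suggested above can be sketched as follows. This is only an illustration, assuming a Linux sysfs layout in which each logical CPU appears as a cpuN subdirectory; the function name list_cpu_ids is hypothetical.

```python
import os
import re


def list_cpu_ids(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return the sorted logical CPU numbers found as cpuN entries.

    Entries such as cpufreq or cpuidle are ignored because they do not
    match the cpu<digits> pattern.
    """
    pattern = re.compile(r"^cpu(\d+)$")
    ids = []
    for entry in os.listdir(sysfs_cpu_dir):
        match = pattern.match(entry)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    default_dir = "/sys/devices/system/cpu"
    if os.path.isdir(default_dir):
        ids = list_cpu_ids(default_dir)
        print(f"{len(ids)} logical CPUs visible to Linux")
```

Comparing the highest number returned here with the largest cpu= value that the event accepts shows whether the two ranges agree.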

Brian Van Straalen

Aug 3, 2017, 2:35:35 PM
to Michael Knobloch, ptools-...@icl.utk.edu
I'm still figuring out how to correctly measure bandwidth with my hybrid MPI+OpenMP program. I'm running on Haswells at the moment. I have a 32-core Haswell processor. I have run with 4 MPI ranks and 8 OpenMP threads, and there are 8 MCDRAM controllers, imc[0-7].

So, how do I measure my DRAM bandwidth for a job? I don't think the PAPI counters are thread or process aware, so how many counters can I have counting? The processor has 64 CPUs (hyperthreading is turned on, but I use proc bind with the Intel OMP runtime to pin thread i to core i and i+32).

So, yeah, I'm still not sure what cpu= is doing in the counter name.

Brian
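
One possible way to turn CAS counts into a bandwidth figure is sketched below. It rests on an assumption not stated in this thread: that each CAS command transfers one 64-byte cache line. The function name and the sample numbers are hypothetical, and the counts would have to be summed over all memory channels for the measured interval.

```python
CACHE_LINE_BYTES = 64  # assumption: one CAS command moves one 64-byte line


def dram_bandwidth_gb_per_s(cas_counts, elapsed_seconds):
    """Estimate DRAM bandwidth in GB/s from per-channel CAS counts.

    cas_counts: iterable of CAS command counts, one per channel
                (read and write counts may both be included).
    elapsed_seconds: wall-clock length of the measured interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_bytes = sum(cas_counts) * CACHE_LINE_BYTES
    return total_bytes / elapsed_seconds / 1e9


if __name__ == "__main__":
    # Made-up counts for six channels over a two-second interval.
    sample_counts = [5_000_000] * 6
    print(dram_bandwidth_gb_per_s(sample_counts, 2.0))
```

Because the uncore counters are not tied to a thread or process, an estimate like this describes all traffic on the package during the interval, not only the traffic of one job.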



Michael Knobloch

Aug 3, 2017, 4:30:05 PM
to Vince Weaver, Brian Van Straalen, ptools-...@icl.utk.edu
/sys/devices/system/cpu/ shows 272 entries, so just what I'd expect from
the 68 cores with 4 threads each.

Still wondering where the additional 32 cpus are coming from and whether
the cpu qualifier of the counters is tied to the entries
in /sys/devices/system/cpu/.

Anyway, my understanding is that the uncore counters cannot be mapped to
individual cores, so I'm still struggling to understand what the cpu
qualifier is doing.

-Michael

Michael Knobloch

Aug 3, 2017, 4:57:04 PM
to Brian Van Straalen, ptools-...@icl.utk.edu
Yeah, that's exactly the same issue we're tackling. Interested in
collaborating?

-Michael


Vince Weaver

Aug 3, 2017, 5:28:31 PM
to Michael Knobloch, Brian Van Straalen, ptools-...@icl.utk.edu
On Thu, 3 Aug 2017, Michael Knobloch wrote:

> /sys/devices/system/cpu/ shows 272 entries, so just what I'd expect from
> the 68 cores with 4 threads each.
>
> Still wondering where the additional 32 cpus are coming from and whether
> there is a binding of the cpu qualifier of the counters and the entries
> in /sys/devices/system/cpu/.
>
> Anyway, my understanding is that the uncore counters cannot be mapped to
> individual cores, so I'm still struggling to understand what the cpu
> qualifier is doing.

Each package has its own set of uncore counters. I am not an expert on
KNL, but for example on a high-end Haswell-EP server you might have two
packages, each with 16 cores, and each core with 2 threads, for a total
of 64 CPUs seen by Linux. In this case there are two uncores, one for
each package.

So when specifying the event, you specify the CPU number to properly
indicate which package you want the measurements from. The PAPI and perf
interface is not great for this. For example in the case I gave above,
specifying from CPU=0 to CPU=31 would give you the results for the first
uncore (they are aliased to give the same results) and CPU=32 to CPU=63
would give you the results for the second uncore.

I assume KNL is similar but I don't know.

I'm not sure where your extra cores are coming from. The
perf_event_open() call takes the CPU field directly and the kernel should
reject any that are invalid. The actual CPU= parsing is done by libpfm4
so it's possible there are some bugs there too.

You can find the package/cpu mapping under
/sys/devices/system/cpu/cpu0/topology/

Vince
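
The package/cpu mapping described above can be read from sysfs along these lines. This is a sketch, assuming each cpuN/topology/ directory contains a physical_package_id file (the file name is an addition here, not mentioned in the thread); the function name first_cpu_per_package is hypothetical.

```python
import os
import re


def first_cpu_per_package(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Map each physical package id to its lowest-numbered logical CPU.

    The returned CPU numbers are candidates for the cpu= qualifier, one
    per package, on the assumption that all CPUs of a package alias to
    the same uncore counters.
    """
    pattern = re.compile(r"^cpu(\d+)$")
    packages = {}
    for entry in os.listdir(sysfs_cpu_dir):
        match = pattern.match(entry)
        if not match:
            continue
        pkg_file = os.path.join(
            sysfs_cpu_dir, entry, "topology", "physical_package_id"
        )
        try:
            with open(pkg_file) as handle:
                pkg = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # skip CPUs without readable topology information
        cpu = int(match.group(1))
        packages[pkg] = min(cpu, packages.get(pkg, cpu))
    return packages


if __name__ == "__main__":
    default_dir = "/sys/devices/system/cpu"
    if os.path.isdir(default_dir):
        for pkg, cpu in sorted(first_cpu_per_package(default_dir).items()):
            print(f"package {pkg}: use cpu={cpu}")
```

On the two-package example given above, this would be expected to yield one CPU number per package, so that one event per uncore is enough.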