Re: PAPI

Heike Jagode

Jun 26, 2017, 2:58:21 PM
to Collins, Gary Lynn, ptools-perfapi
Gary,

Glad it was helpful.

In the future, please send questions to the ptools-...@icl.utk.edu mailing list so that other members can chime in and help.

Thanks,
Heike


On Mon, Jun 26, 2017 at 2:51 PM, Collins, Gary Lynn <gary_c...@lanl.gov> wrote:

Ahh, I see. Thank you, Heike! Could I ask you in the future if I have any more questions about PAPI?


Thanks again!

Gary


From: Heike Jagode <jag...@icl.utk.edu>
Sent: Friday, June 23, 2017 2:23 PM
To: Collins, Gary Lynn
Subject: Re: PAPI
 
Gary,

A description of PAPI_get_virt_cyc can be found here:
http://icl.cs.utk.edu/projects/papi/wiki/PAPI3:PAPI_get_virt_cyc.3
Which virtual timer is used depends on the platform you are running on. Also, the resolution of this timer is determined by the OS.
Similarly, we have another call, PAPI_get_real_cyc, which is equivalent to wall clock time.

That said, if you are interested in clock cycles, then I would use real hardware counters, PAPI_TOT_CYC and PAPI_REF_CYC, if they are defined on the platform you are using. If you run the PAPI utility papi_avail, you can check the availability of these two events.

PAPI_TOT_CYC measures the number of cycles required to do a fixed amount of work. It should be roughly constant for constant work, regardless of the speed state a core is in.

PAPI_REF_CYC measures the number of cycles at a constant reference clock rate, independent of the actual clock rate of the core.

If the core is running at nominal clock rate, PAPI_TOT_CYC and PAPI_REF_CYC should match and the ratio should be approximately 1. If the core is in an idle state (such as at startup), the ratio of TOT / REF should be less than 1. If the core is accelerated above nominal, such as TurboBoost when only one core is active, the ratio of TOT / REF should be greater than 1.
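The three cases above can be sketched as a small C helper. This is only an illustration and not part of PAPI: the names tot_ref_ratio, classify_ratio, and core_state are hypothetical, the tolerance used for "approximately 1" is an assumption, and the two counts are assumed to have been read from PAPI_TOT_CYC and PAPI_REF_CYC over the same region of code beforehand.

```c
#include <assert.h>

/* Hypothetical labels for the three cases described above. */
typedef enum {
    CORE_BELOW_NOMINAL,  /* TOT / REF noticeably less than 1    */
    CORE_NOMINAL,        /* TOT / REF approximately 1           */
    CORE_ABOVE_NOMINAL   /* TOT / REF noticeably greater than 1 */
} core_state;

/* Ratio TOT / REF; both counts must cover the same code region. */
double tot_ref_ratio(long long tot_cyc, long long ref_cyc)
{
    assert(ref_cyc > 0);  /* a zero reference count gives no usable ratio */
    return (double)tot_cyc / (double)ref_cyc;
}

/* tol is an assumed tolerance for "approximately 1", e.g. 0.05. */
core_state classify_ratio(double ratio, double tol)
{
    if (ratio < 1.0 - tol) return CORE_BELOW_NOMINAL;
    if (ratio > 1.0 + tol) return CORE_ABOVE_NOMINAL;
    return CORE_NOMINAL;
}
```

For example, with counts of 1300 and 1000 and a tolerance of 0.05, classify_ratio(tot_ref_ratio(1300, 1000), 0.05) falls into the above-nominal case.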

In <papi-dir>/src/ctests, we have a test cycle_ratio.c that measures the ratio first from a roughly idle state. It then does floating point intensive work to push this core into a fully active or accelerated state, and then it measures the ratio again. Using this technique allows you to measure the effective clock rate of the processor over a specific region of code, allowing you to infer the state of acceleration.
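As a rough sketch of how the ratio translates into an effective clock rate: if the reference clock behind PAPI_REF_CYC is assumed to tick at a known frequency (the ref_hz parameter below is an assumption you would have to supply for your platform), the effective rate over the measured region is that frequency scaled by TOT / REF. The function name effective_hz is hypothetical, and this is not taken from cycle_ratio.c.

```c
#include <assert.h>

/* Effective core frequency in Hz over a measured region.
 * Assumption: the reference clock counted by PAPI_REF_CYC runs at ref_hz.
 * tot_cyc and ref_cyc are counts collected over the same region. */
double effective_hz(long long tot_cyc, long long ref_cyc, double ref_hz)
{
    assert(ref_cyc > 0);  /* avoid dividing by zero */
    return ref_hz * ((double)tot_cyc / (double)ref_cyc);
}
```

With 3000 total cycles against 2000 reference cycles and an assumed 2.0 GHz reference clock, this gives about 3.0 GHz, i.e. a core running above nominal.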

Hope this helps.
Heike


On Thu, Jun 22, 2017 at 5:29 PM, Collins, Gary Lynn <gary_c...@lanl.gov> wrote:
Dr. Jagode,


I took COCS 594 last semester (I was Hartwig's Asian friend, if you remember), and I want to use PAPI to time some algorithms that I am working on.


We are interested in the number of clock cycles. Previously we timed with chrono, multiplied the elapsed time by the clock speed, and then scaled the result by 20~30 percent in order to remove the effects of CPU boosting.


I did not think that was the most correct route and wanted to use PAPI's "PAPI_get_virt_cyc()" function instead.


If I recall correctly, PAPI uses hardware counters, so I should expect these values to reflect the total number of cycles for the algorithms (with a bit of error, of course, which we measured).


However, the plotted results from PAPI and from chrono without the 30% scaling were almost aligned. This confused me, as I would expect the results from PAPI and the unscaled chrono to be different.


So my question, and my adviser's, is: how exactly does PAPI measure clock cycles with the hardware counters?


Thank you!

Best,

Gary






--
___________________________________
Heike Jagode
Innovative Computing Laboratory (ICL)
University of Tennessee, Knoxville (UTK)
http://icl.utk.edu/~jagode/