On Sun, 29 Nov 2020, Khaled Z Mahmoud wrote:
> I am reading performance counters in a multithreaded application every 1ms.
> I realized using pure Perf, there is a 4x overhead. With PAPI, there is almost no overhead.
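You're probably seeing the effect of PAPI using the rdpmc instruction,
which reads the counter directly from user space instead of making a
read() system call for every sample.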
You can read more about that here:
http://web.eece.maine.edu/~vweaver/projects/papi-rdpmc/
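
For illustration, here is a rough sketch of what a user-space counter read
looks like on Linux/x86. This is not PAPI's actual code; it follows the
self-monitoring pattern described in the perf_event_open(2) man page. The
function names (open_instructions_counter, map_counter_page,
read_counter_user, and so on) are made up for this example, and the choice
of the instructions event is arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Hardware counters are narrower than 64 bits (often 48); sign-extend the
 * raw value. Relies on arithmetic right shift of signed values, which gcc
 * and clang provide. */
int64_t sign_extend_pmc(uint64_t raw, unsigned width)
{
    if (width == 0 || width >= 64)
        return (int64_t)raw;
    unsigned shift = 64 - width;
    return (int64_t)(raw << shift) >> shift;
}

/* Open a user-space-only "instructions retired" counter on the calling
 * thread. Returns an fd, or -1 on error (e.g. perf_event_paranoid). */
int open_instructions_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.exclude_kernel = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Map the event's metadata page; needed before rdpmc can be used.
 * Returns NULL on failure. */
struct perf_event_mmap_page *map_counter_page(int fd)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ,
                   MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (struct perf_event_mmap_page *)p;
}

/* Slow path: one system call per read. */
int read_counter_syscall(int fd, uint64_t *out)
{
    return read(fd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}

#if defined(__x86_64__) || defined(__i386__)
static uint64_t read_pmc_raw(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Fast path: no system call. Returns 0 on success, -1 if rdpmc is not
 * usable right now (not x86, not permitted, or event not scheduled). */
int read_counter_user(volatile struct perf_event_mmap_page *pc, uint64_t *out)
{
    assert(pc != NULL && out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    uint32_t seq, idx;
    uint64_t count;
    do {
        seq = pc->lock;                       /* seqlock: retry if it changes */
        __asm__ volatile("" ::: "memory");
        idx = pc->index;                      /* 0 means "not on a counter" */
        if (!pc->cap_user_rdpmc || idx == 0)
            return -1;
        count = (uint64_t)pc->offset;
        count += (uint64_t)sign_extend_pmc(read_pmc_raw(idx - 1),
                                           pc->pmc_width);
        __asm__ volatile("" ::: "memory");
    } while (pc->lock != seq);
    *out = count;
    return 0;
#else
    (void)pc; (void)out;
    return -1;
#endif
}
```

A caller would open the event once, map the page once, and then call
read_counter_user() at each sample point, falling back to
read_counter_syscall() when it returns -1. The fast path only works for
events attached to the calling thread, and only if the kernel permits
user-space rdpmc.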