Problem sampling performance counters of a running process?


Kamil Iskra

Jun 22, 2023, 11:39:47 AM
to ptools-...@icl.utk.edu
Good morning,

I'm having strange problems doing periodic sampling of even the most basic
performance counters when it comes to attached processes, to the point that
I can't help wondering if I'm not using the correct API? Or is this a use
case that PAPI simply wasn't designed for (because it seems to be too
simple a problem to be a bug that wasn't caught).

Anyway, I won't bore you with our actual source code, but I modified one of
the PAPI testcases (ctests/attach2.c) to demonstrate the same issue. The
complete source code is attached to this email but here's the diff:

--- attach2.c.orig 2023-06-22 09:59:05.220181155 -0500
+++ attach2.c 2023-06-22 09:59:05.206847812 -0500
@@ -48,7 +48,7 @@ wait_for_attach_and_loop( void )
putenv(newpath);

if (ptrace(PTRACE_TRACEME, 0, 0, 0) == 0) {
- execlp("attach_target","attach_target","100000000",NULL);
+ execlp("attach_target","attach_target","1000000000",NULL);
perror("execl(attach_target) failed");
}
perror("PTRACE_TRACEME");
@@ -176,6 +176,16 @@ main( int argc, char **argv )
return 1;
}

+ sleep(1);
+ retval = PAPI_read(EventSet1, values[0]);
+// retval = PAPI_stop(EventSet1, values[0]);
+ if ( retval != PAPI_OK )
+ test_fail( __FILE__, __LINE__, "PAPI_read", retval );
+ printf( TAB1, "PAPI_TOT_CYC : \t", ( values[0] )[0] );
+ printf( "%s : \t %12lld\n",event_name, ( values[0] )[1]);
+// retval = PAPI_start(EventSet1);
+// if ( retval != PAPI_OK )
+// test_fail( __FILE__, __LINE__, "PAPI_read", retval );

do {
child = wait( &status );

Basically, I'm invoking PAPI_read() while the attached process is still
running (I increased its runtime to be a couple of seconds on my system so
that the sleep(1) makes sense). And I'm getting garbage results:

must_ptrace is 1
Debugger exited wait() with 27085
Child has stopped due to signal 5 (Trace/breakpoint trap)
After 0
Continuing
PAPI_TOT_CYC : 140737488355327
PAPI_TOT_INS : 140737488355327
Debugger exited wait() with 27085
Child exited with value 0
Test case: 3rd party attach start, stop.
-----------------------------------------------
Default domain is: 1 (PAPI_DOM_USER)
Default granularity is: 1 (PAPI_GRN_THR)
Using 20000000 iterations of c += a*b
-------------------------------------------------------------------------
Test type : 1
PAPI_TOT_CYC : 9145700936
PAPI_TOT_INS : 10000141883
Real usec : 2659234
Real cycles : 7722401191
Virt usec : 242
Virt cycles : 836500
-------------------------------------------------------------------------
Verification: none
PASSED

As you can see from the two added lines immediately following "Continuing",
instead of valid counts I get 2^47-1. I tried doing
PAPI_stop()/PAPI_start() instead of PAPI_read() (see the commented out
code) but that doesn't seem to help.
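
For reference, the value printed for both counters above is exactly 2^47 - 1 (140737488355327), so a sampling tool can test for it cheaply. A minimal sketch in C: the helper name `looks_like_bogus_count` is hypothetical (it is not a PAPI function), and treating this one value as a sentinel is an assumption based only on the output shown in this message.

```c
#include <stdbool.h>

/* Value reported for both counters in the failing run above: 2^47 - 1. */
#define SUSPECT_COUNT ((1LL << 47) - 1)

/* Hypothetical helper (not part of PAPI): flags a reading that matches the
 * garbage value seen in this thread so a sampler can discard or log it. */
static bool looks_like_bogus_count(long long value)
{
    return value == SUSPECT_COUNT;
}
```

A sampler could call this on each element returned by PAPI_read() and skip the sample when it returns true; it is only a diagnostic aid, not a fix for the underlying problem.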

This is on my old x86 laptop (Intel Core i7-7500U, Gentoo stable, Linux
kernel version 6.3.3) with the latest release of PAPI (7.0.1) but to check
that I'm not crazy I also compiled it on a login node of the ALCF sunspot
(Intel Xeon Gold 5320, SLES 15-SP3, Linux kernel
5.3.18-150300.59.115-default), with the same outcome.

Could you shed some light on what might be going on here and how to fix it?

Thank you,

Kamil

--
Kamil Iskra, PhD
Argonne National Laboratory, Mathematics and Computer Science Division
9700 South Cass Avenue, Building 240, Lemont, IL 60439, USA
phone: +1-630-252-7197 fax: +1-630-252-5986
attach2.c

Giuseppe Congiu

Jun 22, 2023, 2:49:20 PM
to Kamil Iskra, ptools-...@icl.utk.edu
Hi Kamil,

I have created an issue to track this https://github.com/icl-utk-edu/papi/issues/29

Thank you,
Giuseppe

--
You received this message because you are subscribed to the Google Groups "ptools-perfapi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ptools-perfap...@icl.utk.edu.
To view this discussion on the web visit https://groups.google.com/a/icl.utk.edu/d/msgid/ptools-perfapi/ZJRoqythDVFhhGwj%40mcs.anl.gov.
<attach2.c>

Vince Weaver

Jun 22, 2023, 3:27:16 PM
to Kamil Iskra, ptools-...@icl.utk.edu
On Thu, 22 Jun 2023, 'Kamil Iskra' via ptools-perfapi wrote:

> Good morning,
>
> I'm having strange problems doing periodic sampling of even the most basic
> performance counters when it comes to attached processes, to the point that
> I can't help wondering if I'm not using the correct API? Or is this a use
> case that PAPI simply wasn't designed for (because it seems to be too
> simple a problem to be a bug that wasn't caught).

try compiling PAPI with
./configure --disable-perfevent-rdpmc

There is a bug in the Linux kernel that lets you use rdpmc user-space reads
when attaching without reporting an error, even though this won't work
(the program being attached to is likely running on a different core than
the one doing the measurements).

I had tried at one point to get an answer from the kernel devs on whether
this was the expected behavior (silent failure) but never managed to get a
good answer from them.

It looks like the proper fix is to have PAPI refuse to use rdpmc in the
event attach case.

Vince Weaver
vincent...@maine.edu

Kamil Iskra

Jun 22, 2023, 5:24:00 PM
to Vince Weaver, ptools-...@icl.utk.edu
On Thu, Jun 22, 2023 at 15:27:09 -0400, Vince Weaver wrote:

> try compiling PAPI with
> ./configure --disable-perfevent-rdpmc

Indeed, this seems to do the trick! The testcase works as expected now.

Thank you,

Kamil

Phil Mucci

Jun 26, 2023, 1:05:57 PM
to ptools-...@icl.utk.edu, Vince Weaver
Hi Vince,

Hope you are well… is there any reason we couldn’t disable this functionality at run-time instead of compile time?

I.e., if it’s compiled in, we disable the feature if the eventset is attached (or if some global is set).

Phil


Vince Weaver

Jun 27, 2023, 10:31:09 AM
to Phil Mucci, ptools-...@icl.utk.edu

Yes, in theory it should be possible to just not use rdpmc when attaching.

I was working on that the last time this came up, but then the scope got
bigger: I was trying to chase down the root cause of the issue with the
linux-kernel people and also trying to work out some issues with ARM rdpmc
support. That dragged on long enough that other things came up, and I
never finished the work.

I'll see if I can work out some sort of patch in the next day or two.

Vince
Vince Weaver
vincent...@maine.edu
Associate Professor, Electrical and Computer Engineering
http://web.eece.maine.edu/~vweaver/

Anthony Danalis

Jun 27, 2023, 10:35:23 AM
to woo...@redhat.com, Phil Mucci, ptools-...@icl.utk.edu, Vince Weaver
I'm adding Ben Woodard to this thread, as he can probably get an
answer from kernel developers.

Anthony

Vince Weaver

Jun 27, 2023, 11:28:53 AM
to Vince Weaver, Phil Mucci, ptools-...@icl.utk.edu

OK, the patch might be as simple as this. Someone should look over it
though as the perf_event code can be hard to follow at times.

On my one test machine this seems to pass the test suite except for
"attach_cpu_sys_validate" which if I recall might have other issues and I
should look into it.



Don't use fast rdpmc reads when reading an attached event. This is not
supported on Linux (rdpmc can only really work in self-monitoring cases
where the read is happening on the same core and in the same process
context) but unfortunately Linux doesn't report an error if you try to do
this.

diff --git a/src/components/perf_event/perf_event.c b/src/components/perf_event/perf_event.c
index 331288c55..0ab42f3b8 100644
--- a/src/components/perf_event/perf_event.c
+++ b/src/components/perf_event/perf_event.c
@@ -1309,7 +1309,9 @@ _pe_read( hwd_context_t *ctx, hwd_control_state_t *ctl,
/* FIXME: we fallback to slow reads if *any* event in eventset fails */
/* in theory we could only fall back for the one event */
/* but that makes the code more complicated. */
- if ((_perf_event_vector.cmp_info.fast_counter_read) && (!pe_ctl->inherit)) {
+ if ((_perf_event_vector.cmp_info.fast_counter_read) &&
+ (!pe_ctl->inherit) &&
+ (!pe_ctl->attached)) {
result=_pe_rdpmc_read( ctx, ctl, events, flags);
/* if successful we are done, otherwise fall back to read */
if (result==PAPI_OK) return PAPI_OK;

Vince Weaver

Jun 27, 2023, 11:48:52 AM
to Vince Weaver, Phil Mucci, ptools-...@icl.utk.edu

> On my one test machine this seems to pass the test suite except for
> "attach_cpu_sys_validate" which if I recall might have other issues and I
> should look into it.

and the cpu sys validate issue is similar. We might want this overall
patch which fixes both issues.

Long story: the rdpmc code reads a mmap()ed page before doing things, and
the Linux kernel can indicate if there's an issue and that you should fall
back to using the slower read() system call. Linux knows that certain
things won't work and *could* set the flag for fallback, and in that case
all users of the stock rdpmc code would just work. For various reasons
the kernel devs didn't want to do that, so instead all users of the rdpmc
interface (like PAPI) have to independently maintain this list of which
features the kernel will silently fail on and work around it ourselves.
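
The read protocol described above can be sketched roughly as follows. This is a simplified illustration, not PAPI's or the kernel's code: `struct mock_mmap_page` is a hypothetical stand-in for the few fields of the real `struct perf_event_mmap_page` that matter here, the rdpmc instruction is replaced by a callback (`fake_rdpmc` is a stub for demonstration), and the sign-extension by counter width that real readers perform is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct perf_event_mmap_page. */
struct mock_mmap_page {
    uint32_t lock;            /* sequence counter; changes while the page is updated */
    uint32_t index;           /* hardware counter index + 1, or 0 if rdpmc is unusable */
    int64_t  offset;          /* value to add to the raw hardware count */
    bool     cap_user_rdpmc;  /* kernel permits user-space rdpmc for this event */
};

/* Callback standing in for the rdpmc instruction so the sketch stays portable. */
typedef uint64_t (*rdpmc_fn)(uint32_t counter);

/* Demonstration stub: pretends every hardware counter currently holds 100. */
static uint64_t fake_rdpmc(uint32_t counter)
{
    (void)counter;
    return 100;
}

/* Returns true and stores the count if the fast path is usable; returns
 * false when the caller should fall back to the read() system call. */
static bool try_fast_read(const volatile struct mock_mmap_page *pc,
                          rdpmc_fn rdpmc, int64_t *count)
{
    uint32_t seq, index;
    int64_t value;

    do {
        seq = pc->lock;
        atomic_thread_fence(memory_order_acquire);

        index = pc->index;
        if (!pc->cap_user_rdpmc || index == 0)
            return false;              /* the page says: use read() instead */

        value = pc->offset + (int64_t)rdpmc(index - 1);

        atomic_thread_fence(memory_order_acquire);
    } while (pc->lock != seq);         /* retry if the page changed mid-read */

    *count = value;
    return true;
}
```

The point of the complaint above is that in the attach case the kernel does not steer the reader onto the `return false` path, so the caller has to know on its own not to take the fast path.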

This patch works around it but I don't know the performance impact of
doing a full check of all the features each read. I don't know if we
should just set a single "rdpmc allowed" flag at open() time and check
that instead.



diff --git a/src/components/perf_event/perf_event.c b/src/components/perf_event/perf_event.c
index 331288c55..7c16b6d0c 100644
--- a/src/components/perf_event/perf_event.c
+++ b/src/components/perf_event/perf_event.c
@@ -1309,7 +1309,10 @@ _pe_read( hwd_context_t *ctx, hwd_control_state_t *ctl,
/* FIXME: we fallback to slow reads if *any* event in eventset fails */
/* in theory we could only fall back for the one event */
/* but that makes the code more complicated. */
- if ((_perf_event_vector.cmp_info.fast_counter_read) && (!pe_ctl->inherit)) {
+ if ((_perf_event_vector.cmp_info.fast_counter_read) &&
+ (!pe_ctl->inherit) &&
+ (!pe_ctl->attached) &&
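+ (pe_ctl->granularity==PAPI_GRN_THR)) {
result=_pe_rdpmc_read( ctx, ctl, events, flags);
/* if successful we are done, otherwise fall back to read */
if (result==PAPI_OK) return PAPI_OK;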

Giuseppe Congiu

Jun 27, 2023, 12:06:32 PM
to Vince Weaver, Phil Mucci, ptools-...@icl.utk.edu
Hi Vince,

On 27 Jun 2023, at 17:48, Vince Weaver <vincent...@maine.edu> wrote:


> > On my one test machine this seems to pass the test suite except for
> > "attach_cpu_sys_validate" which if I recall might have other issues and I
> > should look into it.
>
> and the cpu sys validate issue is similar. We might want this overall
> patch which fixes both issues.
>
> Long story: the rdpmc code reads a mmap()ed page before doing things, and
> the Linux kernel can indicate if there's an issue and that you should fall
> back to using the slower read() system call. Linux knows that certain
> things won't work and *could* set the flag for fallback, and in that case
> all users of the stock rdpmc code would just work. For various reasons
> the kernel devs didn't want to do that, so instead all users of the rdpmc
> interface (like PAPI) have to independently maintain this list of which
> features the kernel will silently fail on and work around it ourselves.
>
> This patch works around it but I don't know the performance impact of
> doing a full check of all the features each read. I don't know if we
> should just set a single "rdpmc allowed" flag at open() time and check
> that instead.

The rdpmc flag sounds reasonable. I guess the overhead would be negligible (to none) if the flag is set at PAPI_attach() time.
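
A rough sketch of what such a cached flag could look like, assuming the decision is recomputed whenever the event set is attached. All names here (`struct sketch_control`, `sketch_attach`, the `SKETCH_GRN_*` constants) are hypothetical and only mirror the conditions in the patch; this is not the actual PAPI control-state layout.

```c
#include <stdbool.h>

/* Hypothetical granularity values; PAPI defines its own PAPI_GRN_* constants. */
enum sketch_granularity { SKETCH_GRN_THR, SKETCH_GRN_SYS };

/* Hypothetical, trimmed-down control state for one event set. */
struct sketch_control {
    bool inherit;
    bool attached;
    enum sketch_granularity granularity;
    bool rdpmc_allowed;   /* cached decision, checked on every read */
};

/* Recompute the cached flag from the same conditions the patch tests on
 * each read: fast reads enabled, no inherit, not attached, thread granularity. */
static void sketch_update_rdpmc_allowed(struct sketch_control *ctl,
                                        bool fast_counter_read)
{
    ctl->rdpmc_allowed = fast_counter_read &&
                         !ctl->inherit &&
                         !ctl->attached &&
                         ctl->granularity == SKETCH_GRN_THR;
}

/* Attaching changes one of the inputs, so refresh the flag here rather
 * than re-evaluating every condition in the read path. */
static void sketch_attach(struct sketch_control *ctl, bool fast_counter_read)
{
    ctl->attached = true;
    sketch_update_rdpmc_allowed(ctl, fast_counter_read);
}
```

With this shape the read path tests a single boolean, at the cost of having to refresh the flag at every place that changes inherit, attach state, or granularity.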

If you want to create a PR with the patch you attached I can look at it. I have a bit of familiarity with the cpu component. I guess I can ping you if I need clarifications.

For the PR could you please use the new PAPI GitHub repo: https://github.com/icl-utk-edu/papi?

Thanks,
Giuseppe


> diff --git a/src/components/perf_event/perf_event.c b/src/components/perf_event/perf_event.c
> index 331288c55..7c16b6d0c 100644
> --- a/src/components/perf_event/perf_event.c
> +++ b/src/components/perf_event/perf_event.c
> @@ -1309,7 +1309,10 @@ _pe_read( hwd_context_t *ctx, hwd_control_state_t *ctl,
> /* FIXME: we fallback to slow reads if *any* event in eventset fails */
> /*        in theory we could only fall back for the one event        */
> /*        but that makes the code more complicated.                  */
> - if ((_perf_event_vector.cmp_info.fast_counter_read) && (!pe_ctl->inherit)) {
> + if ((_perf_event_vector.cmp_info.fast_counter_read) &&
> + (!pe_ctl->inherit) &&
> + (!pe_ctl->attached) &&
> + (pe_ctl->granularity==PAPI_GRN_THR)) {
> result=_pe_rdpmc_read( ctx, ctl, events, flags);
> /* if successful we are done, otherwise fall back to read */
> if (result==PAPI_OK) return PAPI_OK;
