
[PATCH] perf_events: fix X86 bogus counts when multiplexing


era...@google.com

Mar 11, 2010, 1:20:01 AM
This patch fixes a bug in 2.6.33 X86 event scheduling whereby
all counts are bogus as soon as events need to be multiplexed
because the PMU is overcommitted.

The code in hw_perf_enable() was causing multiplexed events
to accumulate their collected counts twice, producing bogus results.

This is demonstrated on AMD Barcelona with the example
below. First run, no conflict, you obtain the actual counts.
Second run, PMU overcommitted, multiplexing, all events are
over-counted. Third run, patch applied, you obtain the correct
count through scaling.

Intel processors would be affected in the same way.

# perf stat -e instructions,cycles ./noploop 10
noploop for 10 seconds

Performance counter stats for './noploop 10':

10884992991 instructions # 0.495 IPC
21976457932 cycles

10.000906311 seconds time elapsed

# perf stat -e instructions,instructions,instructions,instructions,cycles ./noploop 10
noploop for 10 seconds

Performance counter stats for './noploop 10':

16342703033 instructions # 1.000 IPC (scaled from 80.00%)
16337667144 instructions # 0.999 IPC (scaled from 80.00%)
16342494809 instructions # 1.000 IPC (scaled from 80.00%)
16344432632 instructions # 1.000 IPC (scaled from 80.00%)
16346620711 cycles (scaled from 80.00%)

10.015941304 seconds time elapsed

# perf stat -e instructions,instructions,instructions,instructions,cycles ./noploop 10
noploop for 10 seconds

Performance counter stats for './noploop 10':

10865832804 instructions # 0.495 IPC (scaled from 80.00%)
10866436957 instructions # 0.495 IPC (scaled from 80.00%)
10866172153 instructions # 0.495 IPC (scaled from 80.00%)
10866276672 instructions # 0.495 IPC (scaled from 80.00%)
21944300714 cycles (scaled from 80.00%)

10.000686860 seconds time elapsed
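For reference, the "(scaled from 80.00%)" figures above follow perf's usual extrapolation: the raw count is scaled up by the ratio of time the event was enabled to time it actually ran on a counter. A minimal standalone sketch of that arithmetic (illustrative code, not the kernel's or the perf tool's implementation):

```c
#include <stdint.h>

/* Illustrative only: extrapolate a multiplexed count by the ratio of
 * time the event was enabled to time it actually ran on the PMU. */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                            uint64_t time_running)
{
	if (time_running == 0)
		return 0;	/* event never got onto a counter */
	return (uint64_t)((double)raw * time_enabled / time_running);
}
```

With time_enabled/time_running = 10/8 (the 80% case above), a raw count of roughly 8.7e9 scales back up to the ~10.9e9 instruction count seen in the corrected third run.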

Signed-off-by: Stephane Eranian <era...@google.com>
---
perf_event.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 97cddbf..ef5d63f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -818,8 +818,6 @@ void hw_perf_enable(void)
 		    match_prev_assignment(hwc, cpuc, i))
 			continue;
 
-		x86_pmu_stop(event);
-
 		hwc->idx = -1;
 	}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Peter Zijlstra

Mar 11, 2010, 3:40:02 AM
On Wed, 2010-03-10 at 22:17 -0800, era...@google.com wrote:
> This patch fixes a bug in 2.6.33 X86 event scheduling whereby
> all counts are bogus as soon as events need to be multiplexed
> because the PMU is overcommitted.
>
> The code in hw_perf_enable() was causing multiplexed events
> to accumulate collected counts twice causing bogus results.
>
> This is demonstrated on AMD Barcelona with the example
> below. First run, no conflict, you obtain the actual counts.
> Second run, PMU overcommitted, multiplexing, all events are
> over-counted. Third run, patch applied, you obtain the correct
> count through scaling.
>

I'm a bit puzzled by this one: if we, during scheduling, move an event
from idx 1 to idx 2, we need to stop it on 1 and start it on 2,
otherwise we do not properly transfer its count, right?
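The count-transfer concern can be sketched with a toy model (illustrative structures only, not the kernel's):

```c
#include <stdint.h>

/* Toy model: moving an event between counter indices needs a stop/start
 * pair, because stop folds the hardware delta into the event's
 * accumulated total, and start re-baselines on the new counter. */

#define NUM_COUNTERS 4

static uint64_t hw_counter[NUM_COUNTERS];	/* per-index hardware value */

struct toy_event {
	int idx;		/* assigned counter index */
	uint64_t prev;		/* hardware value at last start */
	uint64_t count;		/* accumulated event total */
};

static void toy_stop(struct toy_event *e)
{
	/* fold the delta since the last start into the total */
	e->count += hw_counter[e->idx] - e->prev;
}

static void toy_start(struct toy_event *e, int new_idx)
{
	e->idx = new_idx;
	e->prev = hw_counter[new_idx];	/* baseline; don't inherit history */
}
```

In this model, skipping toy_stop() before starting on a new index loses the delta accumulated on the old counter, while folding the same delta twice over-counts, which matches the double-accumulation symptom in the original report.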

With the below patch it does no such thing.

I did fix some funnies I observed with hw_perf_enable() while doing the
PEBS stuff, and -tip gets it wrong differently from what you illustrate,
so while there definitely is something to fix, I doubt the below is
correct.

Peter Zijlstra

Mar 11, 2010, 7:40:02 AM
On Thu, 2010-03-11 at 09:32 +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-10 at 22:17 -0800, era...@google.com wrote:
> > This patch fixes a bug in 2.6.33 X86 event scheduling whereby
> > all counts are bogus as soon as events need to be multiplexed
> > because the PMU is overcommitted.
> >
> > The code in hw_perf_enable() was causing multiplexed events
> > to accumulate collected counts twice causing bogus results.
> >
> > This is demonstrated on AMD Barcelona with the example
> > below. First run, no conflict, you obtain the actual counts.
> > Second run, PMU overcommitted, multiplexing, all events are
> > over-counted. Third run, patch applied, you obtain the correct
> > count through scaling.
> >
>
> I'm a bit puzzled by this one: if we, during scheduling, move an event
> from idx 1 to idx 2, we need to stop it on 1 and start it on 2,
> otherwise we do not properly transfer its count, right?
>
> With the below patch it does no such thing.
>
> I did fix some funnies I observed with hw_perf_enable() while doing the
> PEBS stuff, and -tip gets it wrong differently from what you illustrate,
> so while there definitely is something to fix, I doubt the below is
> correct.

OK, so what happens is that we schedule badly like:

<...>-1987 [019] 280.252808: x86_pmu_start: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252811: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252812: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252813: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252814: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252834: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252834: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252835: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252836: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*

This happens because we only iterate the n_running events in the first
pass, and reset their index to -1 if they don't match to force a
re-assignment.

Now, in our RR example, n_running == 0 because we fully unscheduled, so
event-50 will retain its idx==32, even though in scheduling it will have
gotten idx=0, and we don't trigger the re-assign path.
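The stale-index condition can be modeled in isolation (hypothetical helper names; assign[] stands for the freshly computed schedule, hwc_idx for the event's leftover hardware index):

```c
#include <stdbool.h>

/* Standalone model of the second scheduling pass. */
static bool match_prev(int hwc_idx, const int *assign, int i)
{
	return hwc_idx == assign[i];
}

/* Old logic: re-assign only if the first pass reset the index to -1.
 * With n_running == 0 the first pass never runs, so a stale index
 * leaks through unchanged. */
static int old_pass(int hwc_idx, const int *assign, int i, int n_running)
{
	if (i < n_running && match_prev(hwc_idx, assign, i))
		return hwc_idx;
	if (hwc_idx == -1)
		return assign[i];
	return hwc_idx;	/* stale: the *FAIL* case in the trace */
}

/* Fixed logic: validate every event against the new assignment. */
static int new_pass(int hwc_idx, const int *assign, int i)
{
	if (!match_prev(hwc_idx, assign, i))
		return assign[i];
	return hwc_idx;
}
```

With a stale hwc_idx of 32 and a new assignment of 0, the old logic keeps 32 while the fixed logic re-assigns to 0, matching the event-51 failure in the trace above.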

The easiest way to fix this is the below patch, which simply validates
the full assignment in the second pass.

---
arch/x86/kernel/cpu/perf_event.c | 12 +++---------
1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4a0514d..a3aff76 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -787,7 +787,6 @@ void hw_perf_enable(void)
 	 * step2: reprogram moved events into new counters
 	 */
 	for (i = 0; i < n_running; i++) {
-
 		event = cpuc->event_list[i];
 		hwc = &event->hw;
 
@@ -802,21 +801,16 @@ void hw_perf_enable(void)
 			continue;
 
 		x86_pmu_stop(event);
-
-		hwc->idx = -1;
 	}
 
 	for (i = 0; i < cpuc->n_events; i++) {
-
 		event = cpuc->event_list[i];
 		hwc = &event->hw;
 
-		if (i < n_running &&
-		    match_prev_assignment(hwc, cpuc, i))
-			continue;
-
-		if (hwc->idx == -1)
+		if (!match_prev_assignment(hwc, cpuc, i))
 			x86_assign_hw_event(event, cpuc, i);
+		else if (i < n_running)
+			continue;
 
 		x86_pmu_start(event);

tip-bot for Peter Zijlstra

Mar 11, 2010, 9:50:01 AM
Commit-ID: 45e16a6834b6af098702e5ea6c9a40de42ff77d8
Gitweb: http://git.kernel.org/tip/45e16a6834b6af098702e5ea6c9a40de42ff77d8
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Thu, 11 Mar 2010 13:40:30 +0100
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Thu, 11 Mar 2010 15:21:28 +0100

perf, x86: Fix hw_perf_enable() event assignment

What happens is that we schedule badly like:

<...>-1987 [019] 280.252808: x86_pmu_start: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252811: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252812: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252813: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252814: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
<...>-1987 [019] 280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252834: x86_pmu_start: event-47/1300c0: idx: 1
<...>-1987 [019] 280.252834: x86_pmu_start: event-48/1300c0: idx: 2
<...>-1987 [019] 280.252835: x86_pmu_start: event-49/1300c0: idx: 3
<...>-1987 [019] 280.252836: x86_pmu_start: event-50/1300c0: idx: 32
<...>-1987 [019] 280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*

This happens because we only iterate the n_running events in the first
pass, and reset their index to -1 if they don't match to force a
re-assignment.

Now, in our RR example, n_running == 0 because we fully unscheduled, so
event-50 will retain its idx==32, even though in scheduling it will have
gotten idx=0, and we don't trigger the re-assign path.

The easiest way to fix this is the below patch, which simply validates
the full assignment in the second pass.

Reported-by: Stephane Eranian <era...@google.com>
Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
LKML-Reference: <1268311069.5037.31.camel@laptop>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_event.c | 12 +++---------
1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c6bde7d..5fb490c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -811,7 +811,6 @@ void hw_perf_enable(void)
 	 * step2: reprogram moved events into new counters
 	 */
 	for (i = 0; i < n_running; i++) {
-
 		event = cpuc->event_list[i];
 		hwc = &event->hw;
 
@@ -826,21 +825,16 @@ void hw_perf_enable(void)
