
2.6.38-rc2: Uhhuh. NMI received for unknown reason 2d on CPU 0.


George Spelvin

Feb 1, 2011, 11:27:15 AM
to linux-...@vger.kernel.org, li...@horizon.com
Since upgrading to -rc2 (-rc3 is compiling right now), I've been getting
complaints at irregular intervals. This didn't happen with 2.6.37.

It's an old crappy 1.6 GHz P4 (HP Pavilion) with an ASUS P4B266LA
motherboard and a 2001 Award BIOS.

00:00.0 Host bridge [0600]: Intel Corporation 82845 845 [Brookdale] Chipset Host Bridge [8086:1a30] (rev 04)
00:01.0 PCI bridge [0604]: Intel Corporation 82845 845 [Brookdale] Chipset AGP Bridge [8086:1a31] (rev 04)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801BA ISA Bridge (LPC) [8086:2440] (rev 05)
00:1f.1 IDE interface [0101]: Intel Corporation 82801BA IDE U100 Controller [8086:244b] (rev 05)
00:1f.2 USB Controller [0c03]: Intel Corporation 82801BA/BAM USB Controller #1 [8086:2442] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 82801BA/BAM SMBus Controller [8086:2443] (rev 05)
00:1f.4 USB Controller [0c03]: Intel Corporation 82801BA/BAM USB Controller #1 [8086:2444] (rev 05)
00:1f.5 Multimedia audio controller [0401]: Intel Corporation 82801BA/BAM AC'97 Audio Controller [8086:2445] (rev 05)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon RV250 If [Radeon 9000] [1002:4966] (rev 01)
01:00.1 Display controller [0380]: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) [1002:496e] (rev 01)
02:08.0 Ethernet controller [0200]: Intel Corporation 82801BA/BAM/CA/CAM Ethernet Controller [8086:2449] (rev 03)
02:09.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB12LV26 IEEE-1394 Controller (Link) [104c:8020]


Should I bisect this, or does someone know what might be happening?

Thank you!


Jan 30 13:13:25 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 13:13:25 kernel: Do you have a strange power saving mode enabled?
Jan 30 13:13:25 kernel: Dazed and confused, but trying to continue
Jan 30 17:51:10 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 17:51:10 kernel: Do you have a strange power saving mode enabled?
Jan 30 17:51:10 kernel: Dazed and confused, but trying to continue
Jan 30 18:05:11 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 18:05:11 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:05:11 kernel: Dazed and confused, but trying to continue
Jan 30 18:19:16 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 18:19:16 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:19:16 kernel: Dazed and confused, but trying to continue
Jan 30 18:33:33 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 18:33:33 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:33:33 kernel: Dazed and confused, but trying to continue
Jan 30 18:48:23 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 18:48:23 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:48:23 kernel: Dazed and confused, but trying to continue
Jan 30 21:39:58 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 21:39:58 kernel: Do you have a strange power saving mode enabled?
Jan 30 21:39:58 kernel: Dazed and confused, but trying to continue
Jan 30 22:01:46 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:01:46 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:01:46 kernel: Dazed and confused, but trying to continue
Jan 30 22:03:13 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:03:13 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:03:13 kernel: Dazed and confused, but trying to continue
Jan 30 22:04:38 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 22:04:38 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:04:38 kernel: Dazed and confused, but trying to continue
Jan 30 22:06:03 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:06:03 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:06:03 kernel: Dazed and confused, but trying to continue
Jan 30 22:07:23 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 22:07:23 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:07:23 kernel: Dazed and confused, but trying to continue
Jan 31 01:00:28 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 01:00:28 kernel: Do you have a strange power saving mode enabled?
Jan 31 01:00:28 kernel: Dazed and confused, but trying to continue
Jan 31 03:00:02 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 03:00:02 kernel: Do you have a strange power saving mode enabled?
Jan 31 03:00:02 kernel: Dazed and confused, but trying to continue
Jan 31 06:27:52 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 06:27:52 kernel: Do you have a strange power saving mode enabled?
Jan 31 06:27:52 kernel: Dazed and confused, but trying to continue
Jan 31 07:36:54 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 07:36:54 kernel: Do you have a strange power saving mode enabled?
Jan 31 07:36:54 kernel: Dazed and confused, but trying to continue
Jan 31 10:08:08 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 10:08:08 kernel: Do you have a strange power saving mode enabled?
Jan 31 10:08:08 kernel: Dazed and confused, but trying to continue
Jan 31 16:42:02 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 16:42:02 kernel: Do you have a strange power saving mode enabled?
Jan 31 16:42:02 kernel: Dazed and confused, but trying to continue
Jan 31 20:05:21 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 20:05:21 kernel: Do you have a strange power saving mode enabled?
Jan 31 20:05:21 kernel: Dazed and confused, but trying to continue
Feb 1 01:00:19 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 01:00:19 kernel: Do you have a strange power saving mode enabled?
Feb 1 01:00:19 kernel: Dazed and confused, but trying to continue
Feb 1 01:36:42 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 01:36:42 kernel: Do you have a strange power saving mode enabled?
Feb 1 01:36:42 kernel: Dazed and confused, but trying to continue
Feb 1 02:01:04 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 02:01:04 kernel: Do you have a strange power saving mode enabled?
Feb 1 02:01:04 kernel: Dazed and confused, but trying to continue
Feb 1 05:58:05 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 05:58:05 kernel: Do you have a strange power saving mode enabled?
Feb 1 05:58:05 kernel: Dazed and confused, but trying to continue
Feb 1 06:28:18 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 06:28:18 kernel: Do you have a strange power saving mode enabled?
Feb 1 06:28:18 kernel: Dazed and confused, but trying to continue
Feb 1 08:59:18 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 08:59:18 kernel: Do you have a strange power saving mode enabled?
Feb 1 08:59:18 kernel: Dazed and confused, but trying to continue
Feb 1 11:04:43 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:04:43 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:04:43 kernel: Dazed and confused, but trying to continue
Feb 1 11:05:47 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:05:47 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:05:47 kernel: Dazed and confused, but trying to continue
Feb 1 11:06:48 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:06:48 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:06:48 kernel: Dazed and confused, but trying to continue
Feb 1 11:07:50 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:07:50 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:07:50 kernel: Dazed and confused, but trying to continue
Feb 1 11:08:52 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:08:52 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:08:52 kernel: Dazed and confused, but trying to continue
Feb 1 11:09:54 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:09:54 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:09:54 kernel: Dazed and confused, but trying to continue
Feb 1 11:10:56 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:10:56 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:10:56 kernel: Dazed and confused, but trying to continue
Feb 1 11:11:58 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:11:58 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:11:58 kernel: Dazed and confused, but trying to continue
Feb 1 11:13:00 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:13:00 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:13:00 kernel: Dazed and confused, but trying to continue
Feb 1 11:14:01 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:14:01 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:14:01 kernel: Dazed and confused, but trying to continue
Feb 1 11:15:04 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:15:04 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:15:04 kernel: Dazed and confused, but trying to continue
Feb 1 11:16:05 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:16:05 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:16:05 kernel: Dazed and confused, but trying to continue
Feb 1 11:17:07 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:17:07 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:17:07 kernel: Dazed and confused, but trying to continue
Feb 1 11:18:33 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:18:33 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:18:33 kernel: Dazed and confused, but trying to continue
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Cyrill Gorcunov

Feb 1, 2011, 12:52:35 PM
to George Spelvin, linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Don Zickus, Lin Ming, Stephane Eranian
On 02/01/2011 07:27 PM, George Spelvin wrote:
> Since upgrading to -rc2 (-rc3 is compiling right now), I've been getting
> complaints at irregular intervals. This didn't happen with 2.6.37.
>
..

> Should I bisect this, or does someone know what might be happening?
>
> Thank you!
>

I fear it's a known issue at the moment; we're trying to resolve it. One
option is to disable the NMI watchdog (nmi_watchdog=0 boot option).

But if you'd like to help debug the problem, would you mind trying the
patch below? Note that the patch is ugly at the moment and must *not* be
run on a non-P4 system (I've only compile-tested it, so no guarantees at
all; I've also CC'ed a couple of people).

Cyrill

---
arch/x86/kernel/cpu/perf_event.c | 12 +++++++++++-
arch/x86/kernel/cpu/perf_event_p4.c | 8 +++++++-
2 files changed, 18 insertions(+), 2 deletions(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1075,7 +1075,17 @@ static void x86_pmu_start(struct perf_ev

cpuc->events[idx] = event;
__set_bit(idx, cpuc->active_mask);
- __set_bit(idx, cpuc->running);
+ if (1) {
+ /* running mask is shared across a core */
+ int leader_cpu;
+ struct cpu_hw_events *leader_cpuc;
+
+ leader_cpu = cpumask_first(__get_cpu_var(cpu_sibling_map));
+ leader_cpuc = &per_cpu(cpu_hw_events, leader_cpu);
+
+ __set_bit(idx, leader_cpuc->running);
+ } else
+ __set_bit(idx, cpuc->running);
x86_pmu.enable(event);
perf_event_update_userpage(event);
}
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -907,8 +907,14 @@ static int p4_pmu_handle_irq(struct pt_r
int overflow;

if (!test_bit(idx, cpuc->active_mask)) {
+ int leader_cpu;
+ struct cpu_hw_events *leader_cpuc;
+
+ leader_cpu = cpumask_first(__get_cpu_var(cpu_sibling_map));
+ leader_cpuc = &per_cpu(cpu_hw_events, leader_cpu);
+
/* catch in-flight IRQs */
- if (__test_and_clear_bit(idx, cpuc->running))
+ if (__test_and_clear_bit(idx, leader_cpuc->running))
handled++;
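The bookkeeping this patch changes can be modelled in user space. The sketch below is a hypothetical model, not kernel code: the `model_*` names, the two CPUs and the eight counters are assumptions for illustration. It mirrors the idea visible in the handler above that stopping a counter clears its `active_mask` bit but leaves its `running` bit set, so a late ("in-flight") NMI can still be claimed. With a per-CPU `running` mask, an NMI taken on the HT sibling finds nothing to claim; with the mask kept on the sibling-group leader, as the patch does via `cpumask_first(cpu_sibling_map)`, it does.

```c
#include <assert.h>
#include <string.h>

#define MODEL_CPUS     2 /* hypothetical: one P4 core with two HT threads */
#define MODEL_COUNTERS 8 /* hypothetical counter count; only sets bitmap width */

struct model_hw_events {
    unsigned long active_mask; /* counters this CPU currently has enabled */
    unsigned long running;     /* counters started whose late NMI may still arrive */
};

static struct model_hw_events model_cpus[MODEL_CPUS];
static int model_shared_running; /* 0: per-CPU running bit, 1: leader-shared */

/* Leader of a CPU's sibling group. With a single core this is always CPU 0,
 * standing in for cpumask_first(cpu_sibling_map) in the patch. */
static int model_sibling_leader(int cpu)
{
    (void)cpu;
    return 0;
}

/* Which CPU's structure holds the running bit on behalf of `cpu`. */
static struct model_hw_events *model_running_owner(int cpu)
{
    return model_shared_running ? &model_cpus[model_sibling_leader(cpu)]
                                : &model_cpus[cpu];
}

void model_reset(int shared_running)
{
    memset(model_cpus, 0, sizeof(model_cpus));
    model_shared_running = shared_running;
}

void model_start(int cpu, int idx)
{
    assert(cpu >= 0 && cpu < MODEL_CPUS && idx >= 0 && idx < MODEL_COUNTERS);
    model_cpus[cpu].active_mask |= 1UL << idx;
    model_running_owner(cpu)->running |= 1UL << idx;
}

/* Stopping clears only the active bit; the running bit stays set so that
 * an NMI already in flight can still be counted as handled. */
void model_stop(int cpu, int idx)
{
    assert(cpu >= 0 && cpu < MODEL_CPUS && idx >= 0 && idx < MODEL_COUNTERS);
    model_cpus[cpu].active_mask &= ~(1UL << idx);
}

/* Returns 1 if an NMI for counter `idx` taken on `cpu` is claimed, 0 if it
 * would fall through to the "unknown reason" message. Real overflow checks
 * for active counters are omitted: an active counter is simply claimed. */
int model_handle_nmi(int cpu, int idx)
{
    struct model_hw_events *owner;

    assert(cpu >= 0 && cpu < MODEL_CPUS && idx >= 0 && idx < MODEL_COUNTERS);
    if (model_cpus[cpu].active_mask & (1UL << idx))
        return 1;

    owner = model_running_owner(cpu);
    if (owner->running & (1UL << idx)) {
        owner->running &= ~(1UL << idx); /* test-and-clear, as in the patch */
        return 1;
    }
    return 0;
}
```

This only models the bookkeeping under the assumption that the late NMI is taken on the other hardware thread; as Don reports in the next message, the patch did not help on his system.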

Don Zickus

Feb 1, 2011, 1:41:44 PM
to Cyrill Gorcunov, George Spelvin, linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Lin Ming, Stephane Eranian
On Tue, Feb 01, 2011 at 08:52:19PM +0300, Cyrill Gorcunov wrote:
> On 02/01/2011 07:27 PM, George Spelvin wrote:
> > Since upgrading to -rc2 (-rc3 is compiling right now), I've been getting
> > complaints at irregular intervals. This didn't happen with 2.6.37.
> >
> ...

> > Should I bisect this, or does someone know what might be happening?
> >
> > Thank you!
> >
>
> I fear it's a known issue at the moment; we're trying to resolve it. One
> option is to disable the NMI watchdog (nmi_watchdog=0 boot option).
>
> But if you'd like to help debug the problem, would you mind trying the
> patch below? Note that the patch is ugly at the moment and must *not* be
> run on a non-P4 system (I've only compile-tested it, so no guarantees at
> all; I've also CC'ed a couple of people).

Unfortunately, I have not had success with the patch below on my system. :-(

Cheers,
Don

Cyrill Gorcunov

Feb 1, 2011, 1:44:27 PM
to Don Zickus, George Spelvin, linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Lin Ming, Stephane Eranian
On 02/01/2011 09:41 PM, Don Zickus wrote:
..

>
> Unfortunately, I have not had success with the patch below on my system. :-(
>
> Cheers,
> Don

You mean it didn't help?

--
Cyrill

Don Zickus

Feb 1, 2011, 1:51:47 PM
to Cyrill Gorcunov, George Spelvin, linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Lin Ming, Stephane Eranian
On Tue, Feb 01, 2011 at 09:44:15PM +0300, Cyrill Gorcunov wrote:
> On 02/01/2011 09:41 PM, Don Zickus wrote:
> ...

> >
> > Unfortunately, I have not had success with the patch below on my system. :-(
> >
> > Cheers,
> > Don
>
> You mean it didn't help?

Not that I noticed, no.

Cheers,
Don

Cyrill Gorcunov

Feb 1, 2011, 3:01:06 PM
to Don Zickus, George Spelvin, linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Lin Ming, Stephane Eranian
On 02/01/2011 09:51 PM, Don Zickus wrote:
..
>>
>> You mean it didn't help?
>
> Not that I noticed, no.
>
> Cheers,
> Don

Thanks a lot for testing, Don! I'll check what else I can do.

--
Cyrill

George Spelvin

Feb 1, 2011, 9:36:17 PM
to gorc...@gmail.com, li...@horizon.com, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com, mi...@elte.hu
> But if you'd like to help debug the problem, would you mind trying the
> patch below? Note that the patch is ugly at the moment and must *not* be
> run on a non-P4 system (I've only compile-tested it, so no guarantees at
> all; I've also CC'ed a couple of people).

Promising... After 32 minutes of uptime, no NMI complaints so far.

I'll let it run overnight and see what happens.

Thank you very much!

Cyrill Gorcunov

Feb 1, 2011, 11:18:29 PM
to George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com, mi...@elte.hu
On 2/2/11, George Spelvin <li...@horizon.com> wrote:
>> But if you'd like to help debug the problem, would you mind trying the
>> patch below? Note that the patch is ugly at the moment and must *not* be
>> run on a non-P4 system (I've only compile-tested it, so no guarantees at
>> all; I've also CC'ed a couple of people).
>
> Promising... After 32 minutes of uptime, no NMI complaints so far.
>
> I'll let it run overnight and see what happens.
>

Great, thanks. Note, though, that the patch didn't help Don, i.e. there is
still an issue which needs to be resolved as well.

Dave Airlie

Feb 15, 2011, 8:58:15 PM
to Cyrill Gorcunov, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com, mi...@elte.hu

Ping on this problem, still seeing

Uhhuh. NMI received for unknown reason 3c on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue

on my Pentium-D system here with latest Linus head.

It's sometimes 3c, sometimes 3d. I'm going to bisect and push for
reverts if nobody has any clue yet about how to fix this.

Dave.
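The two-digit code in these messages is a status byte that the x86 NMI handler reads from the legacy system control port at I/O address 0x61. Roughly, the kernel prints the "unknown reason" message when neither bit 7 (PCI SERR#) nor bit 6 (I/O channel check) is set and no registered handler, such as the perf PMU handler, claims the NMI. The sketch below decodes the byte assuming the Intel ICH-style layout of that port; the macro and bit names are made up for this illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for the two NMI source bits of the port 0x61 byte. */
#define P61_SERR_STATUS  0x80u /* bit 7: PCI SERR# NMI source */
#define P61_IOCHK_STATUS 0x40u /* bit 6: I/O channel check NMI source */

/* Nonzero when neither NMI source bit is set: the case that ends in
 * "NMI received for unknown reason" if nothing else claims the NMI. */
int nmi_reason_is_unknown(unsigned int reason)
{
    assert(reason <= 0xffu); /* the reason is a single byte */
    return (reason & (P61_SERR_STATUS | P61_IOCHK_STATUS)) == 0;
}

/* Prints the name of every set bit, most significant first, assuming the
 * ICH-style layout of port 0x61. */
void print_nmi_reason(unsigned int reason)
{
    static const char *const names[8] = {
        "TMR2_GATE",          /* bit 0: timer 2 gate enable */
        "SPKR_DATA",          /* bit 1: speaker data enable */
        "SERR_NMI_DISABLED",  /* bit 2: 1 = SERR# NMIs masked */
        "IOCHK_NMI_DISABLED", /* bit 3: 1 = IOCHK NMIs masked */
        "REFRESH_TOGGLE",     /* bit 4: flips with memory refresh */
        "TMR2_OUT",           /* bit 5: timer 2 output level */
        "IOCHK_STATUS",       /* bit 6 */
        "SERR_STATUS",        /* bit 7 */
    };

    printf("reason %02x:", reason & 0xffu);
    for (int bit = 7; bit >= 0; bit--)
        if (reason & (1u << bit))
            printf(" %s", names[bit]);
    printf(" -> %s\n", nmi_reason_is_unknown(reason) ? "unknown" : "SERR/IOCHK");
}
```

Calling `print_nmi_reason(0x2d)` prints `reason 2d: TMR2_OUT IOCHK_NMI_DISABLED SERR_NMI_DISABLED TMR2_GATE -> unknown`. Under the assumed layout, 2d, 3d and 3c differ only in bit 4 (refresh toggle) and bit 0 (timer 2 gate), so the changing codes on their own do not point at different NMI sources.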

Cyrill Gorcunov

Feb 15, 2011, 11:19:11 PM
to Dave Airlie, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com, mi...@elte.hu

We're still trying to resolve it, but without success so far. There is no
easy way to revert it. One option might be to disable perf on P4 for a
while. If this is acceptable, I'll cook up such a patch and send it to
Ingo. Hm?

Ingo Molnar

Feb 16, 2011, 3:38:08 AM
to Cyrill Gorcunov, Dave Airlie, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com

That's not really acceptable - need to fix it or revert it to the last working
state. Which commit broke it?

Thanks,

Ingo

Ingo Molnar

Feb 16, 2011, 3:56:28 AM
to Cyrill Gorcunov, Dave Airlie, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com

* Cyrill Gorcunov <gorc...@gmail.com> wrote:

> On Wed, Feb 16, 2011 at 11:37 AM, Ingo Molnar <mi...@elte.hu> wrote:
> ...


> >> >>
> >> >
> >> > Ping on this problem, still seeing
> >> >
> >> > Uhhuh. NMI received for unknown reason 3c on CPU 0.
> >> > Do you have a strange power saving mode enabled?
> >> > Dazed and confused, but trying to continue
> >> >
> >> > on my Pentium-D system here with latest Linus head.
> >> >
> >> > It's sometimes 3c, sometimes 3d. I'm going to bisect and push for
> >> > reverts if nobody has any clue yet about how to fix this.
> >> >
> >> > Dave.
> >> >
> >>
> >> We're still trying to resolve it, but without success so far. There is no
> >> easy way to revert it. One option might be to disable perf on P4 for a
> >> while. If this is acceptable, I'll cook up such a patch and send it to
> >> Ingo. Hm?
> >
> > That's not really acceptable - need to fix it or revert it to the last working
> > state. Which commit broke it?
> >
> > Thanks,
> >
> > Ingo
> >
>

> I can't tell you the commit ID after which the unknown NMIs started
> happening (I'm away from my git tree at the moment), but even so that
> commit should not be reverted, since the problem is in the P4 code,
> not in the rest of the perf system.
>
> I have two patches here (attached) and would really appreciate having
> them tested on an HT machine with the kgdb boot-up tests enabled.
> Dave, could you please?

Could these patches fix Dave's non-kgdb problem? Dave isn't using kgdb but is
probably using perf, which triggers NMIs? Dave, can you confirm that?

And it's a spurious NMI message, not an actual lockup or other misbehavior, right?

Cyrill Gorcunov

Feb 16, 2011, 4:33:43 AM
to Ingo Molnar, Dave Airlie, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com
For the non-kgdb case, the 'unflagged NMI fix' patch should be enough. I've
tested it on a non-HT machine myself. Without it there is no lockup, only
a message about an unknown NMI.

For an HT machine with kgdb, things get harder: Don reported a lockup on
boot. The second patch might help, but I can't test it (here I need help
with testing).

Ingo Molnar

Feb 16, 2011, 5:10:18 AM
to Cyrill Gorcunov, Dave Airlie, George Spelvin, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com

Ok, please submit it ASAP then - that ought to address the regression. Please Cc:
Dave on the patch.

Thanks,

Ingo

George Spelvin

Feb 16, 2011, 6:57:17 AM
to air...@gmail.com, gorc...@gmail.com, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, li...@horizon.com, ming....@intel.com, mi...@elte.hu
> Ping on this problem, still seeing
>
> Uhhuh. NMI received for unknown reason 3c on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> on my Pentium-D system here with latest Linus head.
>
> It's sometimes 3c, sometimes 3d. I'm going to bisect and push for
> reverts if nobody has any clue yet about how to fix this.

The second patch (not the one you quote) fixed it for me. Almost 8 days
of uptime and no log spam.

It's appended below for your convenience. Are you using this
unsuccessfully?


From: Cyrill Gorcunov <gorc...@openvz.org>
Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

A couple of people have reported an unknown NMI issue on p4 pmu.
This patch should fix it.

Reported-by: George Spelvin <li...@horizon.com>
Reported-by: Meelis Roos <mr...@linux.ee>
Reported-by: Don Zickus <dzi...@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorc...@openvz.org>
CC: Ingo Molnar <mi...@elte.hu>
CC: Lin Ming <ming....@intel.com>
CC: Don Zickus <dzi...@redhat.com>
CC: Peter Zijlstra <a.p.zi...@chello.nl>
---
arch/x86/include/asm/perf_event_p4.h | 1 +
arch/x86/kernel/cpu/perf_event_p4.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -22,6 +22,7 @@

#define ARCH_P4_CNTRVAL_BITS (40)
#define ARCH_P4_CNTRVAL_MASK ((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+#define ARCH_P4_UNFLAGGED_BIT ((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

#define P4_ESCR_EVENT_MASK 0x7e000000U
#define P4_ESCR_EVENT_SHIFT 25
Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
@@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
return 1;
}

- /* it might be unflagged overflow */
- rdmsrl(hwc->event_base + hwc->idx, v);
- if (!(v & ARCH_P4_CNTRVAL_MASK))
+ /*
+ * at some circumstances the overflow might issue NMI but did
+ * not set P4_CCCR_OVF bit so since a counter holds a negative value
+ * we simply check for high bit being set, if it's cleared it means
+ * the counter has reached zero value and continued counting before
+ * real NMI signal was received
+ */
+ if (!(v & ARCH_P4_UNFLAGGED_BIT))
return 1;

return 0;
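The fix replaces the "counter is exactly zero" test with a test of bit 39, the top bit of the 40-bit counter. The sketch below reproduces the three macros from the header hunk and compares the old and new tests on a simulated counter; `p4_arm_counter()`, `p4_advance()` and the two `unflagged_overflow_*()` helpers are hypothetical names for this illustration. It assumes, as the patch comment says, that the counter is armed with a negative value (here `-period` truncated to 40 bits) and counts up toward zero. As posted, the hunk also drops the `rdmsrl()` of the counter register, so in the patch itself `v` holds whatever it was last assigned earlier in `p4_pmu_clear_cccr_ovf()`; the sketch passes the counter value in explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the header hunk above: P4 counters are 40 bits wide. */
#define ARCH_P4_CNTRVAL_BITS (40)
#define ARCH_P4_CNTRVAL_MASK ((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
#define ARCH_P4_UNFLAGGED_BIT ((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

/* Counter value after arming for `period` events: -period truncated to
 * 40 bits, so the counter passes zero after `period` increments. */
uint64_t p4_arm_counter(uint64_t period)
{
    /* Bit 39 must start out set for the new test to be meaningful. */
    assert(period > 0 && period < ARCH_P4_UNFLAGGED_BIT);
    return (0 - period) & ARCH_P4_CNTRVAL_MASK;
}

/* Simulate `events` increments of the 40-bit counter (wrapping at 2^40). */
uint64_t p4_advance(uint64_t counter, uint64_t events)
{
    return (counter + events) & ARCH_P4_CNTRVAL_MASK;
}

/* Old test: only a counter that reads exactly zero counts as overflowed. */
int unflagged_overflow_old(uint64_t counter)
{
    return !(counter & ARCH_P4_CNTRVAL_MASK);
}

/* New test: a cleared bit 39 means the counter wrapped past zero, even if
 * it kept counting before the NMI was serviced. */
int unflagged_overflow_new(uint64_t counter)
{
    return !(counter & ARCH_P4_UNFLAGGED_BIT);
}
```

With a period of 100000, the old test only fires if the counter is read at exactly 0; read five events later (value 5) it misses the overflow and the NMI goes unclaimed, while the bit-39 test still reports it. The bit-39 test relies on the period being smaller than 2^39, which the sketch asserts.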

Dave Airlie

Feb 16, 2011, 9:56:12 PM
to George Spelvin, gorc...@gmail.com, a.p.zi...@chello.nl, dzi...@redhat.com, era...@google.com, linux-...@vger.kernel.org, ming....@intel.com, mi...@elte.hu
>
> It's appended below for your convenience.  Are you using this
> unsuccessfully?

This patch quoted below fixes it for me.

No more spurious NMIs on my P4.

Tested-by: Dave Airlie <air...@redhat.com>
