It's an old crappy 1.6 GHz P4 (HP Pavilion) with an ASUS P4B266LA
motherboard and a 2001 Award BIOS.
00:00.0 Host bridge [0600]: Intel Corporation 82845 845 [Brookdale] Chipset Host Bridge [8086:1a30] (rev 04)
00:01.0 PCI bridge [0604]: Intel Corporation 82845 845 [Brookdale] Chipset AGP Bridge [8086:1a31] (rev 04)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801BA ISA Bridge (LPC) [8086:2440] (rev 05)
00:1f.1 IDE interface [0101]: Intel Corporation 82801BA IDE U100 Controller [8086:244b] (rev 05)
00:1f.2 USB Controller [0c03]: Intel Corporation 82801BA/BAM USB Controller #1 [8086:2442] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 82801BA/BAM SMBus Controller [8086:2443] (rev 05)
00:1f.4 USB Controller [0c03]: Intel Corporation 82801BA/BAM USB Controller #1 [8086:2444] (rev 05)
00:1f.5 Multimedia audio controller [0401]: Intel Corporation 82801BA/BAM AC'97 Audio Controller [8086:2445] (rev 05)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon RV250 If [Radeon 9000] [1002:4966] (rev 01)
01:00.1 Display controller [0380]: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) [1002:496e] (rev 01)
02:08.0 Ethernet controller [0200]: Intel Corporation 82801BA/BAM/CA/CAM Ethernet Controller [8086:2449] (rev 03)
02:09.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB12LV26 IEEE-1394 Controller (Link) [104c:8020]
Should I bisect this, or does someone know what might be happening?
Thank you!
Jan 30 13:13:25 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 13:13:25 kernel: Do you have a strange power saving mode enabled?
Jan 30 13:13:25 kernel: Dazed and confused, but trying to continue
Jan 30 17:51:10 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 17:51:10 kernel: Do you have a strange power saving mode enabled?
Jan 30 17:51:10 kernel: Dazed and confused, but trying to continue
Jan 30 18:05:11 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 18:05:11 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:05:11 kernel: Dazed and confused, but trying to continue
Jan 30 18:19:16 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 18:19:16 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:19:16 kernel: Dazed and confused, but trying to continue
Jan 30 18:33:33 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 18:33:33 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:33:33 kernel: Dazed and confused, but trying to continue
Jan 30 18:48:23 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 18:48:23 kernel: Do you have a strange power saving mode enabled?
Jan 30 18:48:23 kernel: Dazed and confused, but trying to continue
Jan 30 21:39:58 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 21:39:58 kernel: Do you have a strange power saving mode enabled?
Jan 30 21:39:58 kernel: Dazed and confused, but trying to continue
Jan 30 22:01:46 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:01:46 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:01:46 kernel: Dazed and confused, but trying to continue
Jan 30 22:03:13 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:03:13 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:03:13 kernel: Dazed and confused, but trying to continue
Jan 30 22:04:38 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 22:04:38 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:04:38 kernel: Dazed and confused, but trying to continue
Jan 30 22:06:03 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 30 22:06:03 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:06:03 kernel: Dazed and confused, but trying to continue
Jan 30 22:07:23 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 30 22:07:23 kernel: Do you have a strange power saving mode enabled?
Jan 30 22:07:23 kernel: Dazed and confused, but trying to continue
Jan 31 01:00:28 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 01:00:28 kernel: Do you have a strange power saving mode enabled?
Jan 31 01:00:28 kernel: Dazed and confused, but trying to continue
Jan 31 03:00:02 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 03:00:02 kernel: Do you have a strange power saving mode enabled?
Jan 31 03:00:02 kernel: Dazed and confused, but trying to continue
Jan 31 06:27:52 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 06:27:52 kernel: Do you have a strange power saving mode enabled?
Jan 31 06:27:52 kernel: Dazed and confused, but trying to continue
Jan 31 07:36:54 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 07:36:54 kernel: Do you have a strange power saving mode enabled?
Jan 31 07:36:54 kernel: Dazed and confused, but trying to continue
Jan 31 10:08:08 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 10:08:08 kernel: Do you have a strange power saving mode enabled?
Jan 31 10:08:08 kernel: Dazed and confused, but trying to continue
Jan 31 16:42:02 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Jan 31 16:42:02 kernel: Do you have a strange power saving mode enabled?
Jan 31 16:42:02 kernel: Dazed and confused, but trying to continue
Jan 31 20:05:21 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Jan 31 20:05:21 kernel: Do you have a strange power saving mode enabled?
Jan 31 20:05:21 kernel: Dazed and confused, but trying to continue
Feb 1 01:00:19 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 01:00:19 kernel: Do you have a strange power saving mode enabled?
Feb 1 01:00:19 kernel: Dazed and confused, but trying to continue
Feb 1 01:36:42 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 01:36:42 kernel: Do you have a strange power saving mode enabled?
Feb 1 01:36:42 kernel: Dazed and confused, but trying to continue
Feb 1 02:01:04 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 02:01:04 kernel: Do you have a strange power saving mode enabled?
Feb 1 02:01:04 kernel: Dazed and confused, but trying to continue
Feb 1 05:58:05 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 05:58:05 kernel: Do you have a strange power saving mode enabled?
Feb 1 05:58:05 kernel: Dazed and confused, but trying to continue
Feb 1 06:28:18 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 06:28:18 kernel: Do you have a strange power saving mode enabled?
Feb 1 06:28:18 kernel: Dazed and confused, but trying to continue
Feb 1 08:59:18 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 08:59:18 kernel: Do you have a strange power saving mode enabled?
Feb 1 08:59:18 kernel: Dazed and confused, but trying to continue
Feb 1 11:04:43 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:04:43 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:04:43 kernel: Dazed and confused, but trying to continue
Feb 1 11:05:47 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:05:47 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:05:47 kernel: Dazed and confused, but trying to continue
Feb 1 11:06:48 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:06:48 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:06:48 kernel: Dazed and confused, but trying to continue
Feb 1 11:07:50 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:07:50 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:07:50 kernel: Dazed and confused, but trying to continue
Feb 1 11:08:52 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:08:52 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:08:52 kernel: Dazed and confused, but trying to continue
Feb 1 11:09:54 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:09:54 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:09:54 kernel: Dazed and confused, but trying to continue
Feb 1 11:10:56 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:10:56 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:10:56 kernel: Dazed and confused, but trying to continue
Feb 1 11:11:58 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:11:58 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:11:58 kernel: Dazed and confused, but trying to continue
Feb 1 11:13:00 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:13:00 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:13:00 kernel: Dazed and confused, but trying to continue
Feb 1 11:14:01 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:14:01 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:14:01 kernel: Dazed and confused, but trying to continue
Feb 1 11:15:04 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:15:04 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:15:04 kernel: Dazed and confused, but trying to continue
Feb 1 11:16:05 kernel: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 1 11:16:05 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:16:05 kernel: Dazed and confused, but trying to continue
Feb 1 11:17:07 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:17:07 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:17:07 kernel: Dazed and confused, but trying to continue
Feb 1 11:18:33 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 1 11:18:33 kernel: Do you have a strange power saving mode enabled?
Feb 1 11:18:33 kernel: Dazed and confused, but trying to continue
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
I fear it's a known issue at the moment; we're trying to resolve it. One
workaround is to disable the NMI watchdog with the nmi_watchdog=0 boot
option. But if you're willing to help debug the problem, would you mind
trying the patch below? Note that the patch is ugly at the moment and
must *not* be run on non-P4 systems (I've only compile-tested it, so
there are no guarantees at all; I've also CC'ed a couple of people).
Cyrill
---
arch/x86/kernel/cpu/perf_event.c | 12 +++++++++++-
arch/x86/kernel/cpu/perf_event_p4.c | 8 +++++++-
2 files changed, 18 insertions(+), 2 deletions(-)
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -1075,7 +1075,17 @@ static void x86_pmu_start(struct perf_ev
 
 	cpuc->events[idx] = event;
 	__set_bit(idx, cpuc->active_mask);
-	__set_bit(idx, cpuc->running);
+	if (1) {
+		/* running mask is shared across a core */
+		int leader_cpu;
+		struct cpu_hw_events *leader_cpuc;
+
+		leader_cpu = cpumask_first(__get_cpu_var(cpu_sibling_map));
+		leader_cpuc = &per_cpu(cpu_hw_events, leader_cpu);
+
+		__set_bit(idx, leader_cpuc->running);
+	} else
+		__set_bit(idx, cpuc->running);
 	x86_pmu.enable(event);
 	perf_event_update_userpage(event);
 }
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -907,8 +907,14 @@ static int p4_pmu_handle_irq(struct pt_r
 		int overflow;
 
 		if (!test_bit(idx, cpuc->active_mask)) {
+			int leader_cpu;
+			struct cpu_hw_events *leader_cpuc;
+
+			leader_cpu = cpumask_first(__get_cpu_var(cpu_sibling_map));
+			leader_cpuc = &per_cpu(cpu_hw_events, leader_cpu);
+
 			/* catch in-flight IRQs */
-			if (__test_and_clear_bit(idx, cpuc->running))
+			if (__test_and_clear_bit(idx, leader_cpuc->running))
 				handled++;
Unfortunately, I have not had success with the patch below on my system. :-(
Cheers,
Don
You mean it didn't help?
--
Cyrill
Not that I noticed, no.
Cheers,
Don
Thanks a lot for testing, Don! I'll check what else I can do.
--
Cyrill
Promising... After 32 minutes of uptime, no NMI complaints so far.
I'll let it run overnight and see what happens.
Thank you very much!
Great, thanks. The patch didn't help Don, though, so there is still
an issue that needs to be resolved as well.
Ping on this problem, still seeing
Uhhuh. NMI received for unknown reason 3c on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
on my Pentium-D system here with the latest Linus head.
It's sometimes 3c, sometimes 3d. I'm going to bisect and push for
reverts if nobody has any clue yet about how to fix this.
Dave.
We're still trying to resolve it, but without success yet. There is no
easy way to revert it. One option might be to disable perf on P4 for a
while. If this is acceptable, I'll cook up such a patch and send it to
Ingo. Hm?
That's not really acceptable; we need to fix it or revert to the last
working state. Which commit broke it?
Thanks,
Ingo
> On Wed, Feb 16, 2011 at 11:37 AM, Ingo Molnar <mi...@elte.hu> wrote:
> ...
>
> I can't tell you the commit ID after which the unknown NMIs started
> happening (I'm away from my git tree at the moment), but even so that
> commit should not be reverted, since the problem is in the P4 code, not
> in the rest of the perf system.
>
> I have two patches here (attached) and would really appreciate it if
> they were tested on an HT machine together with the kgdb boot-up tests
> enabled. Dave, could you please?
Could these patches fix Dave's non-kgdb problem? Dave isn't using kgdb, but
is probably using perf, which triggers NMIs? Dave, can you confirm that?
And it's a spurious NMI message, not an actual lockup or other misbehavior, right?
For HT machines with kgdb things are harder: Don reported a lockup on
boot. The second patch might help, but I can't test it (here I need help
with testing).
OK, please submit it ASAP then; that ought to address the regression.
Please Cc: Dave on the patch.
Thanks,
Ingo
The second patch (not the one you quote) fixed it for me. Almost 8 days
of uptime and no log spam.
It's appended below for your convenience. Are you using this
unsuccessfully?
From: Cyrill Gorcunov <gorc...@openvz.org>
Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
A couple of people have reported an unknown NMI issue on the P4 PMU.
This patch should fix it.
Reported-by: George Spelvin <li...@horizon.com>
Reported-by: Meelis Roos <mr...@linux.ee>
Reported-by: Don Zickus <dzi...@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorc...@openvz.org>
CC: Ingo Molnar <mi...@elte.hu>
CC: Lin Ming <ming....@intel.com>
CC: Don Zickus <dzi...@redhat.com>
CC: Peter Zijlstra <a.p.zi...@chello.nl>
---
arch/x86/include/asm/perf_event_p4.h | 1 +
arch/x86/kernel/cpu/perf_event_p4.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)
Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -22,6 +22,7 @@
#define ARCH_P4_CNTRVAL_BITS (40)
#define ARCH_P4_CNTRVAL_MASK ((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+#define ARCH_P4_UNFLAGGED_BIT ((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))
#define P4_ESCR_EVENT_MASK 0x7e000000U
#define P4_ESCR_EVENT_SHIFT 25
Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
@@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
 		return 1;
 	}
 
-	/* it might be unflagged overflow */
-	rdmsrl(hwc->event_base + hwc->idx, v);
-	if (!(v & ARCH_P4_CNTRVAL_MASK))
+	/*
+	 * In some circumstances the overflow might issue an NMI but not
+	 * set the P4_CCCR_OVF bit. Since a counter holds a negative value,
+	 * we simply check for the high bit being set; if it's cleared, the
+	 * counter has reached zero and continued counting before the real
+	 * NMI signal was received.
+	 */
+	if (!(v & ARCH_P4_UNFLAGGED_BIT))
 		return 1;
 
 	return 0;
This patch fixes it for me.
No more spurious NMIs on my P4.
Tested-by: Dave Airlie <air...@redhat.com>