WARNING: at arch/x86/kernel/smp.c:119 native_smp_send

Sasha Levin

Feb 8, 2012, 6:40:02 PM

Hi all,

I got the following warning when shutting down a KVM guest with a whole bunch of cores (254 in this case).

It's actually pretty easy to reproduce: it happens once every 2-3 shutdowns.

[ 32.448626] ------------[ cut here ]------------
[ 32.449160] WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()
[ 32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14
[ 32.449621] Call Trace:
[ 32.449621] <IRQ> [<ffffffff81041a44>] ? native_smp_send_reschedule+0x25/0x43
[ 32.449621] [<ffffffff810735b2>] warn_slowpath_common+0x7b/0x93
[ 32.449621] [<ffffffff810962cc>] ? tick_nohz_handler+0xc9/0xc9
[ 32.449621] [<ffffffff81073675>] warn_slowpath_null+0x15/0x18
[ 32.449621] [<ffffffff81041a44>] native_smp_send_reschedule+0x25/0x43
[ 32.449621] [<ffffffff81067a00>] smp_send_reschedule+0xa/0xc
[ 32.449621] [<ffffffff8106f25e>] scheduler_tick+0x21a/0x242
[ 32.449621] [<ffffffff8107da10>] update_process_times+0x62/0x73
[ 32.449621] [<ffffffff81096336>] tick_sched_timer+0x6a/0x8a
[ 32.449621] [<ffffffff8108c5eb>] __run_hrtimer.clone.26+0x55/0xcb
[ 32.449621] [<ffffffff8108cd77>] hrtimer_interrupt+0xcb/0x19b
[ 32.449621] [<ffffffff810428a8>] smp_apic_timer_interrupt+0x72/0x85
[ 32.449621] [<ffffffff8165a8de>] apic_timer_interrupt+0x6e/0x80
[ 32.449621] <EOI> [<ffffffff8165928e>] ? _raw_spin_unlock_irqrestore+0x3a/0x3e
[ 32.449621] [<ffffffff81042f4e>] ? arch_local_irq_restore+0x6/0xd
[ 32.449621] [<ffffffff810430c4>] default_send_IPI_mask_allbutself_phys+0x78/0x88
[ 32.449621] [<ffffffff8106c3c4>] ? __migrate_task+0xf1/0xf1
[ 32.449621] [<ffffffff81045445>] physflat_send_IPI_allbutself+0x12/0x14
[ 32.449621] [<ffffffff81041aaf>] native_stop_other_cpus+0x4d/0xa8
[ 32.449621] [<ffffffff810411c6>] native_machine_shutdown+0x56/0x6d
[ 32.449621] [<ffffffff81048499>] kvm_shutdown+0x1a/0x1c
[ 32.449621] [<ffffffff810411f9>] machine_shutdown+0xa/0xc
[ 32.449621] [<ffffffff81041265>] native_machine_restart+0x20/0x32
[ 32.449621] [<ffffffff81041297>] machine_restart+0xa/0xc
[ 32.449621] [<ffffffff81081d53>] kernel_restart+0x49/0x4d
[ 32.449621] [<ffffffff81081f26>] sys_reboot+0x14b/0x18a
[ 32.449621] [<ffffffff81089937>] ? remove_wait_queue+0x4c/0x51
[ 32.449621] [<ffffffff8107637f>] ? do_wait+0x1a4/0x1e7
[ 32.449621] [<ffffffff8107735a>] ? sys_wait4+0xa8/0xbc
[ 32.449621] [<ffffffff8107522b>] ? clear_tsk_thread_flag+0xf/0xf
[ 32.449621] [<ffffffff81659a25>] ? async_page_fault+0x25/0x30
[ 32.449621] [<ffffffff81659e92>] system_call_fastpath+0x16/0x1b
[ 32.449621] ---[ end trace d0f03651493fd3d6 ]--

--

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Josh Boyer

Feb 8, 2012, 8:00:03 PM

You don't really point out exactly which kernel this is, but we saw this in
3.3 git and it was fixed by commit 71325960d16cd68ea0e22a8da15b2495b0f363f7.
Or at least something very like it was.

josh

Sasha Levin

Feb 9, 2012, 2:50:02 PM

The kernel there was vanilla 3.2 (as stated in the warning header).

I've tried it again with linux-next from today which includes the
commit you mentioned, and still get the same error.

Srivatsa S. Bhat

Feb 10, 2012, 5:10:02 AM

Adding Suresh and Peter to Cc.

Peter Zijlstra

Feb 10, 2012, 2:00:03 PM

OK, so a 'modern' kernel does it slightly differently and I've no idea
what exactly goes wrong in your vintage version. But I can see the
current stuff going about it all wrong.

What seems to happen is that native_nmi_stop_other_cpus() broadcasts an NMI
that runs smp_stop_nmi_callback()->stop_this_cpu(), which, without any
serialization whatsoever, marks each remote CPU offline and halts it
with IRQs disabled -> dead.

While we're waiting for all this to complete, the scheduler tries to do a
no_hz load balance and kick a CPU it thinks is still around, and we get
the above splat because the NMI just marked it offline without telling
anybody about it.

Now, arguably you don't want to go through the whole hotplug crap to
shut down your machine, especially not on panic, but clearing the online
state without telling anybody about it is bound to lead to these things.

No immediate solution comes to mind...

Peter Zijlstra

Feb 10, 2012, 2:10:03 PM

On Fri, 2012-02-10 at 19:58 +0100, Peter Zijlstra wrote:
> No immediate solution comes to mind...

Don, any reason you wait for the NMI broadcast to complete with IRQs
enabled? If you disable IRQs before the broadcast the interrupt can't
happen and should side-step this particular problem.

It's not like we have 'latency' issues on this path :-)

Don Zickus

Feb 10, 2012, 3:10:01 PM

On Fri, Feb 10, 2012 at 08:03:53PM +0100, Peter Zijlstra wrote:
> Don, any reason you wait for the NMI broadcast to complete with IRQs
> enabled? If you disable IRQs before the broadcast the interrupt can't
> happen and should side-step this particular problem.

Well, I believe the old way had the same problem using the REBOOT_IRQ
instead of an NMI. I also don't know how to shut down interrupts system-wide
without just broadcasting an IRQ to disable interrupts locally on each CPU.

>
> It's not like we have 'latency' issues on this path :-)

Heh. Oddly, I was just writing the changelog for a patch that kinda changes
this path to sorta revert to the old way of using a REBOOT_IRQ, with an
NMI as a follow-up when the IRQ fails.

Originally, I wanted to make sure the CPUs were shut down immediately so we
could serialize the panic path, hence the original change.

I also ran into the same problem you did and hacked up another patch that
checked a global atomic variable that let the system know we were shutting
down and not to do the WARN_ON (the global is already created for the NMI
case now).

I'll try to post that soon, once I finish my long-winded changelog.

Though it kinda addresses your issue, I'm not sure it does it in a way
that will satisfy you. But I look forward to the discussion. :-)

Cheers,
Don

Peter Zijlstra

Feb 10, 2012, 3:20:02 PM

On Fri, 2012-02-10 at 15:02 -0500, Don Zickus wrote:
> I also ran into the same problem you did and hacked up another patch that
> checked a global atomic variable that let the system know we were shutting
> down and not to do the WARN_ON (the global is already created for the NMI
> case now).

system_state seems like that thing..

Don Zickus

Feb 10, 2012, 3:40:01 PM

On Fri, Feb 10, 2012 at 09:18:41PM +0100, Peter Zijlstra wrote:
> system_state seems like that thing..

Except it doesn't seem to have a PANIC state, though we could add one, I
suppose.

The thing is, even if you reverted my changes:

e58d429 x86, reboot: Fix typo in nmi reboot path
bda6263 x86, NMI: Add knob to disable using NMI IPIs to stop cpus
3603a25 x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus

I think you would still run into the same problem, because the reschedule
code has changed.

So my second patch, which I will eventually post, will just skip the WARN_ON
if the system is going down. I'm not sure whether that is the proper way to
address this problem, or whether we should change all of the stop_this_cpu
code to use a different bitmask than the cpu_online bitmask (but then I guess
you run the risk of a stuck IPI if the CPU is halted without notifying anyone).

Cheers,
Don

Peter Zijlstra

Feb 10, 2012, 3:40:03 PM

On Fri, 2012-02-10 at 15:31 -0500, Don Zickus wrote:
> So my second patch, which I will eventually post, will just skip the WARN_ON
> if the system is going down. I'm not sure whether that is the proper way to
> address this problem, or whether we should change all of the stop_this_cpu
> code to use a different bitmask than the cpu_online bitmask (but then I guess
> you run the risk of a stuck IPI if the CPU is halted without notifying anyone).

Yeah, the async hard kill of all CPUs is bound to cause problems. What
I'm wondering is: why is this in the normal shutdown path and not
specific to a hard panic?

Trying to make this work is just not going to be pretty, and in the
panic case we really don't care much.

Don Zickus

Feb 10, 2012, 4:10:03 PM

On Fri, Feb 10, 2012 at 09:36:03PM +0100, Peter Zijlstra wrote:
> Yeah, the async hard kill of all CPUs is bound to cause problems. What
> I'm wondering is: why is this in the normal shutdown path and not
> specific to a hard panic?

I didn't write the original code; I just changed it from REBOOT_IRQ to
NMI and left all the stop_this_cpu stuff alone.

>
> Trying to make this work is just not going to be pretty, and in the
> panic case we really don't care much.

Sure.

Cheers,
Don

Sasha Levin

Mar 23, 2012, 6:50:02 AM

I'm just wondering about the status of the patches to fix this issue;
it is still happening on linux-next.

Don Zickus

Mar 23, 2012, 9:30:02 AM

On Fri, Mar 23, 2012 at 12:47:38PM +0200, Sasha Levin wrote:
> I'm just wondering about the status of the patches to fix this issue;
> it is still happening on linux-next.

I got distracted with other stuff. I have been running code that does the
following in the shutdown path:

for_each_online_cpu(cpu)
	cpu_down(cpu);

but I get occasional hangs on reboot that I haven't gotten around to
debugging yet. I assumed this was the approach Peter was suggesting, though
I don't think he was sure it was going to be reliable.

Cheers,
Don

Tony Luck

Apr 5, 2012, 4:40:02 PM

A plain v3.3 kernel hits this when I just type "reboot" on a 32 cpu (2
socket * 8 core * 2 HT) system:

sd 0:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:120 native_smp_send_reschedule+0x5c/0x60()
Hardware name: S2600CP
Modules linked in:
Pid: 10068, comm: reboot Not tainted 3.3.0- #1
Call Trace:
<IRQ> [<ffffffff8104c37f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8104c3da>] warn_slowpath_null+0x1a/0x20
[<ffffffff810314bc>] native_smp_send_reschedule+0x5c/0x60
[<ffffffff81084824>] trigger_load_balance+0x244/0x2f0
[<ffffffff8107bf71>] scheduler_tick+0x101/0x160
[<ffffffff8105b1de>] update_process_times+0x6e/0x90
[<ffffffff8109e6a6>] tick_sched_timer+0x66/0xc0
[<ffffffff81072d63>] __run_hrtimer+0x83/0x1d0
[<ffffffff8109e640>] ? tick_nohz_handler+0xf0/0xf0
[<ffffffff81073146>] hrtimer_interrupt+0x106/0x240
[<ffffffff8157bf69>] smp_apic_timer_interrupt+0x69/0x99
[<ffffffff8157ac1e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff81033276>] ? default_send_IPI_mask_allbutself_phys+0xd6/0x110
[<ffffffff81036397>] physflat_send_IPI_allbutself+0x17/0x20
[<ffffffff81031849>] native_nmi_stop_other_cpus+0xa9/0x110
[<ffffffff81030e24>] native_machine_shutdown+0x64/0x90
[<ffffffff81030a97>] native_machine_restart+0x27/0x40
[<ffffffff810309cf>] machine_restart+0xf/0x20
[<ffffffff810637de>] kernel_restart+0x3e/0x60
[<ffffffff810639e0>] sys_reboot+0x1c0/0x240
[<ffffffff811734af>] ? __d_free+0x4f/0x70
[<ffffffff8117353c>] ? d_free+0x6c/0x80
[<ffffffff81174dbd>] ? d_kill+0xad/0x110
[<ffffffff8117b603>] ? mntput+0x23/0x40
[<ffffffff8115f9b7>] ? fput+0x197/0x260
[<ffffffff8115ba63>] ? filp_close+0x63/0x90
[<ffffffff8157a169>] system_call_fastpath+0x16/0x1b
---[ end trace 5e0dddabdbb21c7e ]---

Borislav Petkov

Jun 1, 2012, 9:40:02 AM

On Thu, Apr 05, 2012 at 01:38:41PM -0700, Tony Luck wrote:
> A plain v3.3 kernel hits this when I just type "reboot" on a 32 cpu (2
> socket * 8 core * 2 HT) system:

Same here with the latest Linus tree on a 24-CPU box, right before the box reboots:

[ 6851.207504] ------------[ cut here ]------------
[ 6851.212340] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x2a/0x56()
[ 6851.220727] Hardware name: Dinar
[ 6851.224096] Modules linked in: kvm_amd kvm radeon microcode ttm drm_kms_helper hwmon amd64_edac_mod e1000e ohci_hcd backlight cfbcopyarea cfbimgblt cfbfillrect ehci_hcd edac_core
[ 6851.240585] Pid: 23976, comm: reboot Tainted: G W 3.4.0+ #15
[ 6851.247351] Call Trace:
[ 6851.249895] <IRQ> [<ffffffff8102f114>] warn_slowpath_common+0x85/0x9d
[ 6851.256753] [<ffffffff8102f146>] warn_slowpath_null+0x1a/0x1c
[ 6851.262801] [<ffffffff81019bbd>] native_smp_send_reschedule+0x2a/0x56
[ 6851.269573] [<ffffffff8105e625>] trigger_load_balance+0x1ed/0x21a
[ 6851.275983] [<ffffffff810582ac>] scheduler_tick+0xe9/0xf2
[ 6851.281671] [<ffffffff8103cb91>] update_process_times+0x67/0x77
[ 6851.287900] [<ffffffff81071323>] tick_sched_timer+0x72/0x91
[ 6851.293769] [<ffffffff8104e5e5>] __run_hrtimer+0xc3/0x17f
[ 6851.299456] [<ffffffff810712b1>] ? tick_nohz_handler+0xd1/0xd1
[ 6851.305593] [<ffffffff8104eee1>] hrtimer_interrupt+0xd4/0x197
[ 6851.311642] [<ffffffff8145e36a>] smp_apic_timer_interrupt+0x86/0x99
[ 6851.325636] [<ffffffff8145d35c>] apic_timer_interrupt+0x6c/0x80
[ 6851.339424] <EOI> [<ffffffff811d68ea>] ? delay_tsc+0x23/0x50
[ 6851.353034] [<ffffffff811d6849>] __delay+0xf/0x11
[ 6851.365561] [<ffffffff811d6874>] __const_udelay+0x29/0x2b
[ 6851.378817] [<ffffffff81019c8a>] native_stop_other_cpus+0x78/0x13d
[ 6851.392924] [<ffffffff81019305>] native_machine_shutdown+0x53/0x6a
[ 6851.406992] [<ffffffff81019347>] machine_shutdown+0xf/0x11
[ 6851.420438] [<ffffffff810193ae>] native_machine_restart+0x25/0x37
[ 6851.434580] [<ffffffff810193ea>] machine_restart+0xf/0x11
[ 6851.448039] [<ffffffff81041b62>] kernel_restart+0x4e/0x52
[ 6851.461517] [<ffffffff81041cc9>] sys_reboot+0x151/0x187
[ 6851.474818] [<ffffffff81114ba6>] ? mntput_no_expire+0x31/0x105
[ 6851.488748] [<ffffffff81114ca4>] ? mntput+0x2a/0x2c
[ 6851.501659] [<ffffffff810fe66e>] ? fput+0x1e0/0x1ef
[ 6851.514551] [<ffffffff811d7a0e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[ 6851.529021] [<ffffffff8145c952>] system_call_fastpath+0x16/0x1b
[ 6851.543148] ---[ end trace 4eaa2a86a8e2da24 ]---

--
Regards/Gruss,
Boris.