Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] x86: make reboot task only run on the appropriate processor

8 views
Skip to first unread message

Baoquan He

unread,
Nov 5, 2013, 4:20:01 AM11/5/13
to
Currently system always reboot after below message when execute "kexec -e".

[ 0.572119] smpboot: Booting Node 0, Processors # 1 OK

In commit 1b3a5d02ee070c8f9943333b9b6370f486601e0f, reboot= handling was
moved to kerne/reboot.c. However, the code to migrate current thread to
reboot cpu was removed. That cause this incorrect kexec behavior.

Now add that code block back.

Reported-by: Matthew Whitehead <mwhi...@redhat.com>
Reported-by: Dave Young <dyo...@redhat.com>
Tested-by: WANG Chao <chao...@redhat.com>
Signed-off-by: Baoquan He <b...@redhat.com>
---
arch/x86/kernel/reboot.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 7e920bf..3049de9 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -551,6 +551,21 @@ void native_machine_shutdown(void)
{
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ /* See if there has been given a command line override */
+ if ((reboot_cpu != -1) && (reboot_cpu < nr_cpu_ids) &&
+ cpu_online(reboot_cpu))
+ reboot_cpu_id = reboot_cpu;
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = smp_processor_id();
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+
/*
* Stop all of the others. Also disable the local irq to
* not receive the per-cpu timer interrupt which may trigger
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Vivek Goyal

unread,
Nov 5, 2013, 3:30:02 PM11/5/13
to
On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
> Currently system always reboot after below message when execute "kexec -e".
>
> [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
>

So is it same problem were we reboot on non-boot cpu and sending INIT
to boot cpu in second kernel brings down the machine?

I think for x86, it makes sense to reboot on boot cpu.

Thanks
Vivek

H. Peter Anvin

unread,
Nov 5, 2013, 4:50:02 PM11/5/13
to
We should, yes, if cpu 0 is available.
Sent from my mobile phone. Please pardon brevity and lack of formatting.

Baoquan He

unread,
Nov 6, 2013, 5:00:04 AM11/6/13
to
On 11/05/13 at 03:28pm, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
> > Currently system always reboot after below message when execute "kexec -e".
> >
> > [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
> >
>
> So is it same problem were we reboot on non-boot cpu and sending INIT
> to boot cpu in second kernel brings down the machine?

They are different. Kdump hang when crash didn't happen on bsp, then
multiple CPUs are brought up, the bsp will go through the boot strap
code after INIT IPI message received. However in this case, executing
"kexec -e" will make system reboot and bring up only 1 CPU, then go back
to BIOS and reboot.

In kdump, nmi is sent to shutdown other CPUs. In this shutdown, cleanup
is done and finally processor is halt. And if crash didn't happen on
bsp, during system startup INIT sent to bsp will cause system hang.

However, in kexec reboot, a REBOOT_VECTOR interrupt is sent to all other
CPUs. In commit 4ef702c10b5df18ab04921fc252c26421d4d6c75, Andi imported
this IPI interrupt vector. Andi said it has been given the highest
priority.

It's still not very clear why this happened without that code block.
Seems interrupt disable is very doubtful.

>
> I think for x86, it makes sense to reboot on boot cpu.
>
> Thanks
> Vivek
>

> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Baoquan He

unread,
Nov 6, 2013, 9:30:02 PM11/6/13
to
On 11/05/13 at 03:28pm, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
> > Currently system always reboot after below message when execute "kexec -e".
> >
> > [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
> >
>
> So is it same problem were we reboot on non-boot cpu and sending INIT
> to boot cpu in second kernel brings down the machine?

I was wrong, they are the same problem. With multiple CPUs, by setting
affinity to execute crash or kexec on CPUn(n!==0), they all reboot after
the same message as below and then reboot through BIOS. It should be the
BSP got a INIT IPI message and make system hang.

[ 0.572119] smpboot: Booting Node 0, Processors # 1 OK

>
> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Dave Young

unread,
Nov 7, 2013, 8:40:01 PM11/7/13
to
On 11/05/13 at 05:16pm, Baoquan He wrote:
> Currently system always reboot after below message when execute "kexec -e".
>
> [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
>
> In commit 1b3a5d02ee070c8f9943333b9b6370f486601e0f, reboot= handling was
> moved to kerne/reboot.c. However, the code to migrate current thread to
> reboot cpu was removed. That cause this incorrect kexec behavior.
>
> Now add that code block back.

quote a mail from Thomas Renninger <tr...@suse.de>:
Answer his questions here.

> > For the smp boot issue I think there's nothing to do with dracut.
> >
> > Can you try below patch?
> > https://lkml.org/lkml/2013/11/5/88
>
> That seem to have helped, thanks!
>
> Feel free to add:
>
> Tested-by: tr...@suse.de
>
> The problem existed in 3.11 already?
> I had the problem with a 3.11 kernel and tried a
> 3.12 kernel with this patch now.

It's a regression between 3.10 ~ 3.11

>
> It would be great if:
> Cc: sta...@vger.kernel.org
> is added as well then.
>

I also like it to be in stable.

Thanks
Dave

Vivek Goyal

unread,
Nov 8, 2013, 10:20:02 AM11/8/13
to
On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
> Currently system always reboot after below message when execute "kexec -e".
>
> [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
>
> In commit 1b3a5d02ee070c8f9943333b9b6370f486601e0f, reboot= handling was
> moved to kerne/reboot.c. However, the code to migrate current thread to
> reboot cpu was removed. That cause this incorrect kexec behavior.
>
> Now add that code block back.
>
> Reported-by: Matthew Whitehead <mwhi...@redhat.com>
> Reported-by: Dave Young <dyo...@redhat.com>
> Tested-by: WANG Chao <chao...@redhat.com>
> Signed-off-by: Baoquan He <b...@redhat.com>

Hi Bao,

This patch fixes the issue for me too. I noticed that we have generic
function migrate_to_reboot_cpu() to achieve what we want and rest of
the reboot paths are using it. So how about using that function. I
wrote the new patch below. It works for me. Can you please give it
a try.

Thanks
Vivek

kexec: migrate to reboot cpu

Commit 1b3a5d02ee070c8f9943333b9b6370f486601e0f moved reboot= handling to
generic code. In the process it also removed the code in
native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0.

I guess that thought must have been that all reboot paths are calling
migrate_to_reboot_cpu(), so we don't need this special handling. But
kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu()
so above change broke kexec. Now reboot can happen on non-boot cpu and when
INIT is sent in second kerneo to bring up BP, it brings down the machine.

So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid
this problem.

Reported-by: Matthew Whitehead <mwhi...@redhat.com>
Reported-by: Dave Young <dyo...@redhat.com>
Signed-off-by: Vivek Goyal <vgo...@redhat.com>
---
include/linux/reboot.h | 1 +
kernel/kexec.c | 1 +
kernel/reboot.c | 2 +-
3 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/reboot.c
===================================================================
--- linux-2.6.orig/kernel/reboot.c 2013-10-16 00:30:50.000000000 -0400
+++ linux-2.6/kernel/reboot.c 2013-11-08 21:31:03.379064848 -0500
@@ -104,7 +104,7 @@ int unregister_reboot_notifier(struct no
}
EXPORT_SYMBOL(unregister_reboot_notifier);

-static void migrate_to_reboot_cpu(void)
+void migrate_to_reboot_cpu(void)
{
/* The boot cpu is always logical cpu 0 */
int cpu = reboot_cpu;
Index: linux-2.6/include/linux/reboot.h
===================================================================
--- linux-2.6.orig/include/linux/reboot.h 2013-11-08 21:33:27.000000000 -0500
+++ linux-2.6/include/linux/reboot.h 2013-11-08 21:34:29.778073522 -0500
@@ -43,6 +43,7 @@ extern int unregister_reboot_notifier(st
* Architecture-specific implementations of sys_reboot commands.
*/

+extern void migrate_to_reboot_cpu(void);
extern void machine_restart(char *cmd);
extern void machine_halt(void);
extern void machine_power_off(void);
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c 2013-10-16 00:30:50.000000000 -0400
+++ linux-2.6/kernel/kexec.c 2013-11-08 21:34:02.492072375 -0500
@@ -1676,6 +1676,7 @@ int kernel_kexec(void)
#endif
{
kernel_restart_prepare(NULL);
+ migrate_to_reboot_cpu();
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();

H. Peter Anvin

unread,
Nov 8, 2013, 11:20:02 AM11/8/13
to
On 11/08/2013 07:14 AM, Vivek Goyal wrote:
>
> Hi Bao,
>
> This patch fixes the issue for me too. I noticed that we have generic
> function migrate_to_reboot_cpu() to achieve what we want and rest of
> the reboot paths are using it. So how about using that function. I
> wrote the new patch below. It works for me. Can you please give it
> a try.
>
> Thanks
> Vivek
>

Is this path exercised for the kdump flow? migrate_to_reboot_cpu() is
unsafe in that case.

Vivek Goyal

unread,
Nov 8, 2013, 11:30:02 AM11/8/13
to
On Fri, Nov 08, 2013 at 08:12:00AM -0800, H. Peter Anvin wrote:
> On 11/08/2013 07:14 AM, Vivek Goyal wrote:
> >
> > Hi Bao,
> >
> > This patch fixes the issue for me too. I noticed that we have generic
> > function migrate_to_reboot_cpu() to achieve what we want and rest of
> > the reboot paths are using it. So how about using that function. I
> > wrote the new patch below. It works for me. Can you please give it
> > a try.
> >
> > Thanks
> > Vivek
> >
>
> Is this path exercised for the kdump flow? migrate_to_reboot_cpu() is
> unsafe in that case.

kdump path should not be affected by this change as it uses crash_kexec()
instead of kerenl_kexec() for its entry. And crash_kexec() path does not
call migrate_to_reboot_cpu().

Thanks
Vivek

Baoquan He

unread,
Nov 10, 2013, 4:50:01 AM11/10/13
to
On 11/08/13 at 10:14am, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:

>
> Hi Bao,
>
> This patch fixes the issue for me too. I noticed that we have generic
> function migrate_to_reboot_cpu() to achieve what we want and rest of
> the reboot paths are using it. So how about using that function. I
> wrote the new patch below. It works for me. Can you please give it
> a try.

Seems there's something wrong with this function, that error happened
when I executed the 2nd "kexec -e". That means executing "kexec -e" in
kexec-ed kernel.

Tomorrow I will ask people around take more tests. We need set affinity
to test this. E.g always execute :

taskset -c 1 -sh "kexec -e"

Baoquan He

unread,
Nov 11, 2013, 2:00:02 AM11/11/13
to
On 11/10/13 at 05:44pm, Baoquan He wrote:
> On 11/08/13 at 10:14am, Vivek Goyal wrote:
> > On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
>
> >
> > Hi Bao,
> >
> > This patch fixes the issue for me too. I noticed that we have generic
> > function migrate_to_reboot_cpu() to achieve what we want and rest of
> > the reboot paths are using it. So how about using that function. I
> > wrote the new patch below. It works for me. Can you please give it
> > a try.

Hi Vivek,

Your patch works, my test script has wrong typo.

Baoquan
Thanks a lot

WANG Chao

unread,
Nov 11, 2013, 2:00:02 AM11/11/13
to
On 11/08/13 at 10:14am, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 05:16:07PM +0800, Baoquan He wrote:
> > Currently system always reboot after below message when execute "kexec -e".
> >
> > [ 0.572119] smpboot: Booting Node 0, Processors # 1 OK
> >
> > In commit 1b3a5d02ee070c8f9943333b9b6370f486601e0f, reboot= handling was
> > moved to kerne/reboot.c. However, the code to migrate current thread to
> > reboot cpu was removed. That cause this incorrect kexec behavior.
> >
> > Now add that code block back.
> >
> > Reported-by: Matthew Whitehead <mwhi...@redhat.com>
> > Reported-by: Dave Young <dyo...@redhat.com>
> > Tested-by: WANG Chao <chao...@redhat.com>
> > Signed-off-by: Baoquan He <b...@redhat.com>
>
> Hi Bao,
>
> This patch fixes the issue for me too. I noticed that we have generic
> function migrate_to_reboot_cpu() to achieve what we want and rest of
> the reboot paths are using it. So how about using that function. I
> wrote the new patch below. It works for me. Can you please give it
> a try.

Hi, Vivek

I confirm this patch works for me, even in a nested kexec scenario. Feel
free to add these:

Bisected-by: WANG Chao <chao...@redhat.com>
Tested-by: WANG Chao <chao...@redhat.com>

Thanks
WANG Chao

Vivek Goyal

unread,
Nov 11, 2013, 10:40:02 AM11/11/13
to
On Fri, Nov 08, 2013 at 08:12:00AM -0800, H. Peter Anvin wrote:
> On 11/08/2013 07:14 AM, Vivek Goyal wrote:
> >
> > Hi Bao,
> >
> > This patch fixes the issue for me too. I noticed that we have generic
> > function migrate_to_reboot_cpu() to achieve what we want and rest of
> > the reboot paths are using it. So how about using that function. I
> > wrote the new patch below. It works for me. Can you please give it
> > a try.
> >
> > Thanks
> > Vivek
> >
>
> Is this path exercised for the kdump flow? migrate_to_reboot_cpu() is
> unsafe in that case.

Hi Peter,

Can you please consider queuing up this patch for next release.

Thanks
Vivek

H. Peter Anvin

unread,
Nov 11, 2013, 10:50:01 AM11/11/13
to
Yes, unless there is a better path for it since it is not x86-specific. I am fine taking it, though.
Sent from my mobile phone. Please pardon brevity and lack of formatting.

Vivek Goyal

unread,
Nov 11, 2013, 11:00:01 AM11/11/13
to
On Mon, Nov 11, 2013 at 07:39:18AM -0800, H. Peter Anvin wrote:
> Yes, unless there is a better path for it since it is not x86-specific. I am fine taking it, though.

Generally Andrew Morton routes kexec/kdump patches through his tree. Andrew,
do you think it should go through your tree?

Thanks
Vivek

H. Peter Anvin

unread,
Nov 11, 2013, 11:10:01 AM11/11/13
to
On 11/11/2013 07:57 AM, Vivek Goyal wrote:
> On Mon, Nov 11, 2013 at 07:39:18AM -0800, H. Peter Anvin wrote:
>> Yes, unless there is a better path for it since it is not x86-specific. I am fine taking it, though.
>
> Generally Andrew Morton routes kexec/kdump patches through his tree. Andrew,
> do you think it should go through your tree?
>
> Thanks
> Vivek

If so, you can have my:

Acked-by: H. Peter Anvin <h...@linux.intel.com>

-hpa
0 new messages