Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH v4 2/3] x86, apic: Add disable_cpu_apicid kernel parameter

16 views
Skip to first unread message

HATAYAMA Daisuke

unread,
Oct 22, 2013, 11:10:01 AM10/22/13
to
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.

This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.

Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.

This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.

This is designed based on the assumption that users get initial APIC
ID of the BSP in still healthy state and so BSP is uniquely kept in
CPU0; so through this kernel parameter, only one initial APIC ID can
be specified.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a7eb82d..8cc4180 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -74,6 +74,13 @@ unsigned int max_physical_apicid;
physid_mask_t phys_cpu_present_map;

/*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+unsigned int disabled_cpu_apicid = BAD_APICID;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -2113,6 +2120,19 @@ void generic_processor_info(int apicid, int version)
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);

+ if (disabled_cpu_apicid != BAD_APICID &&
+ disabled_cpu_apicid != boot_cpu_physical_apicid &&
+ disabled_cpu_apicid == apicid) {
+ int thiscpu = num_processors + disabled_cpus;
+
+ pr_warning("ACPI: Disable specified CPU."
+ " Processor %d/0x%x ignored.\n",
+ thiscpu, apicid);
+
+ disabled_cpus++;
+ return;
+ }
+
/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
@@ -2589,3 +2609,12 @@ static int __init lapic_insert_resource(void)
* that is using request_resource
*/
late_initcall(lapic_insert_resource);
+
+static int __init apic_set_disabled_cpu_apicid(char *arg)
+{
+ if (!arg || !get_option(&arg, &disabled_cpu_apicid))
+ return -EINVAL;
+
+ return 0;
+}
+early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

HATAYAMA Daisuke

unread,
Oct 22, 2013, 11:10:02 AM10/22/13
to
This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
even if 1st kernel crashs on some AP, a continueing work from:

[PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
https://lkml.org/lkml/2013/10/16/300.

In this version, basic design has changed. Now users need to figure
out initial APIC ID of BSP in the 1st kernel and configures kernel
parameter for the 2nd kernel manually using disable_cpu_apic kernel
parameter to be newly introduced in this patch set. This design is
more flexible than the previous version in that we no longer have to
rely on ACPI/MP table to get initial APIC ID of BSP.

Sorry, this patch set have not include in-source documentation
requested by Borislav Petkov yet, but I'll post it later separately,
which would be better to focus on documentation reviewing.

ChangeLog

v3 => v4)

- Rebased on top of v3.12-rc6

- Basic design has been changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter to
be newly introduced in this patch set. This design is more flexible
than the previous version in that we no longer have to rely on
ACPI/MP table to get initial APIC ID of BSP.

v2 => v3)

- Change default value of boot_cpu_is_bsp to true.

- Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
processor family is larger than or equal to 6 in order to avoid
invalid opcode exception on processors where MSR_IA32_APICBASE is
not supported.

v1 => v2)

- Rebased on top of v3.12-rc5.

- Fix linking time error of boot_cpu_is_bsp_init() in case of
CONFIG_LOCAL_APIC disabled by adding empty static inline function
instead.

- Fix missing feature check by means of cpu_has_apic macro in
boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).

NOTE: I've checked local apic-present case only; I don't have any
x86 processor without local apic.

- Add __init annotation to boot_cpu_is_bsp_init().

Test

- built with and without CONFIG_LOCAL_APIC
- tested x86_64 in case of acpi and MP table

---

HATAYAMA Daisuke (3):
x86, apic: Don't count the CPU with BP flag from MP table as booting-up CPU
x86, apic: Add disable_cpu_apicid kernel parameter
Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter


Documentation/kernel-parameters.txt | 9 +++++++++
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
arch/x86/kernel/mpparse.c | 1 -
3 files changed, 38 insertions(+), 1 deletion(-)

--

Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Oct 22, 2013, 11:10:01 AM10/22/13
to
If crash occurs on some AP, then kdump 2nd kernel is booted up on the
AP. Therefore, it is not always correct that the CPU that is currently
booting up the kernel is BSP. It's wrong to reflect BSP information in
MP table as for the current booting up CPU.

Also, boot_cpu_physical_apicid has already been initialized before
reaching here, for example, in register_lapic_address().

This is a preparation for next patch that will introduce a new kernel
parameter to disabls specified CPU where boot_cpu_physical_apicid
needs to have apicid for the currently booting up CPU to identify it
to avoid falsely disabling it.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/kernel/mpparse.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d2b5648..969bb9f 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -64,7 +64,6 @@ static void __init MP_processor_info(struct mpc_cpu *m)

if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
- boot_cpu_physical_apicid = m->apicid;
}

printk(KERN_INFO "Processor #%d%s\n", m->apicid, bootup_cpu);

HATAYAMA Daisuke

unread,
Oct 22, 2013, 11:10:02 AM10/22/13
to
Add disable_cpu_apicid kernel parameter to disable the CPU with the
specified number of initial APIC ID, mostly used for the kdump 2nd
kernel to disable BSP to wake up multiple CPUs without causing system
reset or hang due to sending INIT from AP to BSP.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
Documentation/kernel-parameters.txt | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736..0ca0902 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -774,6 +774,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
disable= [IPV6]
See Documentation/networking/ipv6.txt.

+ disable_cpu_apicid= [X86,APIC,KEXEC,SMP]
+ Format: <int>
+ The number of initial APIC ID for the
+ corresponding CPU to be disabled at boot,
+ mostly used for the kdump 2nd kernel to
+ disable BSP to wake up multiple CPUs without
+ causing system reset or hang due to sending
+ INIT from AP to BSP.
+
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.

jerry....@hp.com

unread,
Oct 22, 2013, 6:10:02 PM10/22/13
to
On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
> even if 1st kernel crashs on some AP, a continueing work from:
>
> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
> https://lkml.org/lkml/2013/10/16/300.
>
> In this version, basic design has changed. Now users need to figure
> out initial APIC ID of BSP in the 1st kernel and configures kernel
> parameter for the 2nd kernel manually using disable_cpu_apic kernel
> parameter to be newly introduced in this patch set. This design is
> more flexible than the previous version in that we no longer have to
> rely on ACPI/MP table to get initial APIC ID of BSP.
>
> Sorry, this patch set have not include in-source documentation
> requested by Borislav Petkov yet, but I'll post it later separately,
> which would be better to focus on documentation reviewing.
>
> ChangeLog
>
> v3 => v4)
>
> - Rebased on top of v3.12-rc6
>
> - Basic design has been changed. Now users need to figure out initial
> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> be newly introduced in this patch set. This design is more flexible
> than the previous version in that we no longer have to rely on
> ACPI/MP table to get initial APIC ID of BSP.


Do you literally mean a human at each boot will have to configure
the kdump configuration files for passing disable_cpu_apic?
Or do you envision the setting of disable_cpu_apic being put into
the kdump initialization scripts?

thanks

Jerry
> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: jerry....@hp.com
----------------------------------------------------------------------------

HATAYAMA Daisuke

unread,
Oct 22, 2013, 8:10:01 PM10/22/13
to
Nearer to the former case, but this is not what a human should do. It's
a cumbersome task. I think, on fedora/RHEL system for example, kdump
service should check at each boot automatically.

--
Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Oct 23, 2013, 12:00:02 PM10/23/13
to
On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:

[..]
> >Do you literally mean a human at each boot will have to configure
> >the kdump configuration files for passing disable_cpu_apic?
> >Or do you envision the setting of disable_cpu_apic being put into
> >the kdump initialization scripts?
> >
> >thanks
> >
> >Jerry
>
> Nearer to the former case, but this is not what a human should do. It's
> a cumbersome task. I think, on fedora/RHEL system for example, kdump
> service should check at each boot automatically.

Hi Hatayama,

So what information should I look for to prepare disable_cpu_apic=X in
kdump script?

Is BSP processor info exported to user space somewhere? Or assuming that
processor 0 is BSP and corresponding apicid should be disabled in kdump
kernel is good enough?

I am looking at /proc/cpuinfo and following 3 fields seem interesting.

processor: 0
apicid : 0
initial apicid : 0

What's the difference between apicid and "initial apicid". I guess
initial apicid reflects the apicid number as set by firmware and then
kernel can overwrite it and new number would be reflected in "apicid"?

If that's the case, then I guess we should be looking at "apicid" of
processor "0" and set that in disable_cpu_apic? Because that's the
number kdump kernel boot should see in apic upon boot.

Thanks
Vivek

HATAYAMA Daisuke

unread,
Oct 23, 2013, 9:50:01 PM10/23/13
to
(2013/10/24 0:51), Vivek Goyal wrote:
> On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>>> Do you literally mean a human at each boot will have to configure
>>> the kdump configuration files for passing disable_cpu_apic?
>>> Or do you envision the setting of disable_cpu_apic being put into
>>> the kdump initialization scripts?
>>>
>>> thanks
>>>
>>> Jerry
>>
>> Nearer to the former case, but this is not what a human should do. It's
>> a cumbersome task. I think, on fedora/RHEL system for example, kdump
>> service should check at each boot automatically.
>
> Hi Hatayama,
>
> So what information should I look for to prepare disable_cpu_apic=X in
> kdump script?
>
> Is BSP processor info exported to user space somewhere? Or assuming that
> processor 0 is BSP and corresponding apicid should be disabled in kdump
> kernel is good enough?
>

Yes, this patch set assumes that the processor 0 is BSP and there's no
other BSP. Because this patch cares about only one BSP processor,
the disabled_cpu_apicid variable has unsigned int, not mask.

I think this assumption is reasonable since doing it rigorously requires
additional processing between 1st and 2nd kernels just as I explained in
previous mail.

> I am looking at /proc/cpuinfo and following 3 fields seem interesting.
>
> processor: 0
> apicid : 0
> initial apicid : 0
>
> What's the difference between apicid and "initial apicid". I guess
> initial apicid reflects the apicid number as set by firmware and then
> kernel can overwrite it and new number would be reflected in "apicid"?
>
> If that's the case, then I guess we should be looking at "apicid" of
> processor "0" and set that in disable_cpu_apic? Because that's the
> number kdump kernel boot should see in apic upon boot.
>

Yes, that's fully correct, and please see 10.4.6 Local APIC ID in Intel SPG
for details.

BTW, we can use cpuid instruction in user-space, too. It might be more
flexible to use cpuid than looking up /proc/cpuinfo.

Also, there's one corner case that if we hot-remove cpu0, we cannot
look up /proc/cpuinfo to get cpu0 information since /proc/cpunifo displays
*online* cpus only. We cannot use even cpuid instruction for offline cpu.
So, to address this corner case, we need to prepare new interface to see
cpu0 initial apicid which is always available.

My idea is for example to introduce the following file in sysfs:

/sys/devices/system/cpu/cpu0/initial_apicid

Under the current implementation, cpu0 hot-remove is software one and kernel
must start with cpu0 in boot time. It's enough to assign the value of initial
APIC ID in the boot time. The one in boot_cpu_data?

--
Thanks.
HATAYAMA, Daisuke

Eric W. Biederman

unread,
Oct 24, 2013, 2:00:01 AM10/24/13
to
Vivek Goyal <vgo...@redhat.com> writes:

> Hi Hatayama,
>
> So what information should I look for to prepare disable_cpu_apic=X in
> kdump script?
>
> Is BSP processor info exported to user space somewhere? Or assuming that
> processor 0 is BSP and corresponding apicid should be disabled in kdump
> kernel is good enough?

On x86 assuming that processor 0 is BSP should be good enough. Unless
we starting getting SMP BIOSen playing games with us.

> I am looking at /proc/cpuinfo and following 3 fields seem interesting.
>
> processor: 0
> apicid : 0
> initial apicid : 0
>
> What's the difference between apicid and "initial apicid". I guess
> initial apicid reflects the apicid number as set by firmware and then
> kernel can overwrite it and new number would be reflected in "apicid"?

Last I was current initial apicid is the apic id the hardware chooses
and it tells you something about the topology of the processors in the
system.

The apicid is programmed by the BIOS so that the BSP can have apicid 0,
and apicid's can be contiguous etc. In principle any piece of software
can program apicids but there is no advantage.

> If that's the case, then I guess we should be looking at "apicid" of
> processor "0" and set that in disable_cpu_apic? Because that's the
> number kdump kernel boot should see in apic upon boot.

Restricting this to x86 and not Voyager that is correct. Linux cpu 0
is the processor we booted up on as is apparent lots of things special
case the bootstrap processor so using a cpu hotplug remove to make it go
away is silly. Still to handle cazy cpu hotplug I would recomend to
simply force a single cpu if you can't find cpu 0.

Eric

Baoquan He

unread,
Oct 29, 2013, 10:30:03 AM10/29/13
to
Hi,

I am reviewing this patchset, and found there's a cpu0 hotplug feature
posted by intel which we can borrow an idea from. In that implementation,
CPU0 is waken up by nmi not INIT to avoid the realmode bootstrap code
execution. I tried it by below patch which includes one line of change.

By console printing, I got the boot cpu is always 0(namely cpu=0),
however the apicid related to each processor keeps the same as in 1st
kernel. In my HP Z420 machine, the apicid for BSP is 0, so I just make a
test patch which depends on the fact that apicid for BSP is 0. Maybe
generally the apicid for BSP can't be guaranteed, then passing it from
1st kernel to 2nd kernel in cmdline is very helpful, just as you have
done for disable_cpu_apic.

On my HP z420, I add nr_cpus=4 in /etc/sysconfig/kdump, and then execute
below command, then 3 APs (1 boot cpu and 2 AP) can be waken up
correctly, but BSP failed because NMI received for unknown reason 21 on
CPU0. I think I need further check why BSP failed to wake up by nmi. But
3 processors are brought up successfully and kdump is successful too.

sudo taskset -c 1 sh -c "echo c >/proc/sysrq-trigger"

[ 0.296831] smpboot: Booting Node 0, Processors # 1
[ 0.302095]
*****************************************************cpu=1, apicid=0, wakeup_cpu_via_init_nmi
[ 0.311942] cpu=1, apicid=0, register_nmi_handlercpu=1, apicid=0, wakeup_secondary_cpu_via_nmi
[ 0.320826] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 0.327129] Do you have a strange power saving mode enabled?
[ 0.333858] Dazed and confused, but trying to continue
[ 0.339290] cpu=1, apicid=0, wakeup_cpu_via_init_nmi
[ 2.409099] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 2.415393] Do you have a strange power saving mode enabled?
[ 2.421142] Dazed and confused, but trying to continue
[ 5.379519] smpboot: CPU1: Not responding
[ 5.383692] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6cacab6..e45fe5b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -702,7 +702,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long
start_ip, int apicid,
/*
* Wake up AP by INIT, INIT, STARTUP sequence.
*/
- if (cpu)
+ if (cpu && apicid)
return wakeup_secondary_cpu_via_init(apicid, start_ip);

/*
> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Baoquan He

unread,
Oct 29, 2013, 11:30:02 AM10/29/13
to
On 10/29/13 at 10:21pm, Baoquan He wrote:
> Hi,
>
> I am reviewing this patchset, and found there's a cpu0 hotplug feature

Forget to attach the link of patch for cpu0 hotplug.
http://lwn.net/Articles/475018/

In this patchset, BSP is also called CPU0.

HATAYAMA Daisuke

unread,
Oct 29, 2013, 8:50:02 PM10/29/13
to
We've already discussed this approach and concluded this is not applicable
to our issue.
Follow http://lists.infradead.org/pipermail/kexec/2012-October/006905.html.

The reason is:

- The cpu0-hotplugging approach assumes BSP to be halting before initiating
NMI to it while in our case, BSP is halting in the 1st kernel or is
running in arbitrary position of the 1st kernel in catastrophic state.

- In general, NMI modifies stack, which means if throwing NMI to the BSP
in the 1st kernel, stack on the 1st kernel is modified. It's unpermissible
from kdump's perspective.

- On x86_64, there are two cases where stack is changed to another one
when receiving interrupts. One is when receiving interrupt in user mode.
The other is when using Interrupt Stack Table (IST), which is already
used in the current x86_64 implementation.

By using either, it would be possible to wake up BSP in the 1st kernel
by modifying the contexts on the 2nd kernel's NMI stack pushed on when NMI
to the 1st kernel is initiated.

However, this approach depends on the logic in the 1st kernel, there's
no guarantee that it works well. Consider severely buggy situation again.

- To do this approach rigorously, we need to check if states of BSP and APs
are kept in just what we assume in the place where logic is guaranteed to be
sane, i.e., at least after purgatory. However, adding new logic in the
purgatory means we are forced to introduce additional dependency between
kernel and kexec. The process performed in purgatory itself is not so
simple.I don't like this complication.

To sum up, I think the current idea is simple enough approach.

--
Thanks.
HATAYAMA, Daisuke

Baoquan He

unread,
Oct 30, 2013, 2:10:01 AM10/30/13
to
Hi HATAYAMA,

All right. I didn't think of the stack issues NMI will bring. In fact
without this NMI stack problem, using CPU0 Hotplug as a base and sending
nmi to bsp will be good, because BSP failure can be acceptable, then
(N-1)cpus can be used. Later on if possible the contexts modifying can
be changed to let BSP wake up correctly. After all, from the user's
point of view, multiple cpus can be waken up, why not waking up all cpus
including BSP.

Anyway, this issue has been discussed so long time, and it will be great
to run multiple cpus in 2nd kernel. This evolution may be like CPU0 Hotplug,
they let cpus except of BSP hot plug available, then hanle the last cpu -
the BSP finally. From this perspective, I like your patch and hope it
can be merged asap.

Baoquan
Thanks a lot

>
> - On x86_64, there are two cases where stack is changed to another one
> when receiving interrupts. One is when receiving interrupt in user mode.
> The other is when using Interrupt Stack Table (IST), which is already
> used in the current x86_64 implementation.
>
> By using either, it would be possible to wake up BSP in the 1st kernel
> by modifying the contexts on the 2nd kernel's NMI stack pushed on when NMI
> to the 1st kernel is initiated.
>
> However, this approach depends on the logic in the 1st kernel, there's
> no guarantee that it works well. Consider severely buggy situation again.
>
> - To do this approach rigorously, we need to check if states of BSP and APs
> are kept in just what we assume in the place where logic is guaranteed to be
> sane, i.e., at least after purgatory. However, adding new logic in the
> purgatory means we are forced to introduce additional dependency between
> kernel and kexec. The process performed in purgatory itself is not so
> simple.I don't like this complication.
>
> To sum up, I think the current idea is simple enough approach.
>
> --
> Thanks.
> HATAYAMA, Daisuke
>
>
> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

HATAYAMA Daisuke

unread,
Oct 30, 2013, 6:00:03 AM10/30/13
to
Considering again, I'm now beginning with thinking that making CPU halting
in the 1st kernel to execute the 2nd kernel's NMI handler is impossible.

The address of IDT is saved in IDTR and this is a per-cpu register, and
value of IDTR in the 2nd kernel and the one in the 1st kernel are different.
In other words, to wake up BSP from 2nd kernel using NMI, it's necessary to
tell the address of IDTR in the 2nd kernel to the BSP halting in the 1st
kernel.

--
Thanks.
HATAYAMA, Daisuke

jerry....@hp.com

unread,
Oct 30, 2013, 9:00:02 PM10/30/13
to
On Wed, Oct 23, 2013 at 09:05:06AM +0900, HATAYAMA Daisuke wrote:
> >>
> >>- Rebased on top of v3.12-rc6
> >>
> >>- Basic design has been changed. Now users need to figure out initial
> >> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> >> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> >> be newly introduced in this patch set. This design is more flexible
> >> than the previous version in that we no longer have to rely on
> >> ACPI/MP table to get initial APIC ID of BSP.
> >
> >
> >Do you literally mean a human at each boot will have to configure
> >the kdump configuration files for passing disable_cpu_apic?
> >Or do you envision the setting of disable_cpu_apic being put into
> >the kdump initialization scripts?
> >
> >thanks
> >
> >Jerry
>
> Nearer to the former case, but this is not what a human should do. It's
> a cumbersome task. I think, on fedora/RHEL system for example, kdump
> service should check at each boot automatically.
>
> HATAYAMA, Daisuke


Daisuke,

Are you planning on making changes to the kexec tools to automate
the setting of disable_cpu_apic to the capture kernel? Or do you
know someone who is planning this?

I back ported the kernel side changes to a 4.2.32 based kernel.
I tested the patch on a prototype system which exhibits a stable
initial_apic_id for CPU 0. While not something that would be suitable
for customers, it does allow me to test the kernel portion of the patch.

I will report the results of the testing later this week.

Thanks

Jerry


--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: jerry....@hp.com
----------------------------------------------------------------------------

HATAYAMA Daisuke

unread,
Oct 31, 2013, 12:50:01 AM10/31/13
to
I'll do that after this patch is merged in kernel. But it is still under
reviewing.

--
Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Oct 31, 2013, 9:30:01 AM10/31/13
to
On Wed, Oct 30, 2013 at 06:58:13PM -0600, jerry....@hp.com wrote:

[..]
> Daisuke,
>
> Are you planning on making changes to the kexec tools to automate
> the setting of disable_cpu_apic to the capture kernel? Or do you
> know someone who is planning this?

I think we should not make this change in kexec-tools and should leave
it to distro scripts to append disable_cpu_apic.

Who knows in future this restriction is not there at all and kexec-tools
will be stuck with always passing disable_cpu_apic. Getting rid of
this parameter in distro scripts will be much easier.

Thanks
Vivek

Simon Horman

unread,
Oct 31, 2013, 8:40:02 PM10/31/13
to
On Thu, Oct 31, 2013 at 09:27:45AM -0400, Vivek Goyal wrote:
> On Wed, Oct 30, 2013 at 06:58:13PM -0600, jerry....@hp.com wrote:
>
> [..]
> > Daisuke,
> >
> > Are you planning on making changes to the kexec tools to automate
> > the setting of disable_cpu_apic to the capture kernel? Or do you
> > know someone who is planning this?
>
> I think we should not make this change in kexec-tools and should leave
> it to distro scripts to append disable_cpu_apic.
>
> Who knows in future this restriction is not there at all and kexec-tools
> will be stuck with always passing disable_cpu_apic. Getting rid of
> this parameter in distro scripts will be much easier.

Hi Vivek, Hi Daisuke,

That approach sounds reasonable to me.

jerry....@hp.com

unread,
Nov 1, 2013, 4:00:01 AM11/1/13
to
On Thu, Oct 31, 2013 at 09:27:45AM -0400, Vivek Goyal wrote:
> On Wed, Oct 30, 2013 at 06:58:13PM -0600, jerry....@hp.com wrote:
>
> [..]
> > Daisuke,
> >
> > Are you planning on making changes to the kexec tools to automate
> > the setting of disable_cpu_apic to the capture kernel? Or do you
> > know someone who is planning this?
>
> I think we should not make this change in kexec-tools and should leave
> it to distro scripts to append disable_cpu_apic.
>
> Who knows in future this restriction is not there at all and kexec-tools
> will be stuck with always passing disable_cpu_apic. Getting rid of
> this parameter in distro scripts will be much easier.
>
> Thanks
> Vivek


I'm fine either way as long as there is a reasonable automated way to
pass this information to the capture kernel.

It is possible for the bsp to change from boot to boot. So checking
the initial apic id of the bsp each boot will be necessary. If it
changes the kdump initrd would need to change, correct?
If yes, it would be good if the scripts can cache prior boot value
of bsp and avoid rebuilding the kdump initrd if the bsp doesn't change.


thanks

Jerry
--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: jerry....@hp.com
----------------------------------------------------------------------------

HATAYAMA Daisuke

unread,
Nov 4, 2013, 2:10:01 AM11/4/13
to
(2013/10/31 22:27), Vivek Goyal wrote:
> On Wed, Oct 30, 2013 at 06:58:13PM -0600, jerry....@hp.com wrote:
>
> [..]
>> Daisuke,
>>
>> Are you planning on making changes to the kexec tools to automate
>> the setting of disable_cpu_apic to the capture kernel? Or do you
>> know someone who is planning this?
>
> I think we should not make this change in kexec-tools and should leave
> it to distro scripts to append disable_cpu_apic.
>
> Who knows in future this restriction is not there at all and kexec-tools
> will be stuck with always passing disable_cpu_apic. Getting rid of
> this parameter in distro scripts will be much easier.
>
> Thanks
> Vivek
>

Yes, I'll post the patch to distribution's kexec-tools. ``kexec-tools''
seems a little ambiguous; the users scripts on fedora are contained
in the same package.

I don't know things on other distributions mostly at all; I've read
Ubuntu's scripts really a little bit only. I'd like anyone to post patches
if necessary...

BTW, what's the status of this patch set? I've redesigned this so as to
be done with kernel parameter in order not to depend on platform specific
BIOS table. Due to the redesign, I think now ACK is needed again.

--
Thanks.
HATAYAMA, Daisuke

jerry....@hp.com

unread,
Nov 6, 2013, 2:10:02 PM11/6/13
to
On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
> even if 1st kernel crashs on some AP, a continueing work from:
>
> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
> https://lkml.org/lkml/2013/10/16/300.
>
> In this version, basic design has changed. Now users need to figure
> out initial APIC ID of BSP in the 1st kernel and configures kernel
> parameter for the 2nd kernel manually using disable_cpu_apic kernel
> parameter to be newly introduced in this patch set. This design is
> more flexible than the previous version in that we no longer have to
> rely on ACPI/MP table to get initial APIC ID of BSP.
>
> Sorry, this patch set have not include in-source documentation
> requested by Borislav Petkov yet, but I'll post it later separately,
> which would be better to focus on documentation reviewing.
>
> ChangeLog
>
> v3 => v4)
>
> - Rebased on top of v3.12-rc6
>
> - Basic design has been changed. Now users need to figure out initial
> APIC ID of BSP in the 1st kernel and configures kernel parameter for
> the 2nd kernel manually using disable_cpu_apic kernel parameter to
> be newly introduced in this patch set. This design is more flexible
> than the previous version in that we no longer have to rely on
> ACPI/MP table to get initial APIC ID of BSP.
>


Daisuke,

I have back ported version 4 of this patch to both a 2.6.32 and 3.0.80
based kernels and distros and tested on a prototype system. I have
previously test version 1 & 3 as well.)

The systems are configured to boot the capture kernel 8-way parallel.
However, I am running makedumpfile single threaded.

Panic is induced via "echo c > /proc/sysrq-trigger". This is done
under various system loads and on random cpus. I have done over a
thousand dumps total during this testing.

I have seen no issues w/ the 3.0.80 dump testing on our proto.

On the 2.6.32 testing on our proto, i have hit a low probability (< 5%)
chance of the capture suffering a soft lockup hang during
"Switching to clocksource hpet." I have not RCA'd this yet.
Note, I have seen this issue on earlier version of the patch, so
it is not specific to this version.

I then tested the 2.6.32 port on a dl380. This worked without issue.

Note, I have seen no issues related to this patch on our proto when
booting the capture with a single processor.

While I am still pursuing the issue of the 2.6.32 kernel on our proto,
I believe this patch is good and should be accepted.




thanks

Jerry

--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: jerry....@hp.com
----------------------------------------------------------------------------

Baoquan He

unread,
Nov 7, 2013, 10:40:01 PM11/7/13
to
Hi,

Reccently people reported kexec didn't work correctly. After check, it's
a regression. Since a code block which migrate current thread to cpu0
when executing "kexec -e", this can be reproduced by setting affinity to
CPUn(n!=0). You can find this patch in this link:
https://lkml.org/lkml/2013/11/5/88

Then I thought why we don't do this in kdump. I tried migrating current
thread to cpu0 when crash happened, it works very well. Set affinity to
make crash happened on CPUn(n!=0), then all cpus can be brought up and
dump is successful. I pasted the patch as below.

Only one thing worried me, whether the context related to crash cpu will
be different, and do we care which cpu crashed. If it need be cared, or
it doesn't involve difference, That will be great. Multiple CPUs can be
supported easily in this simpler way. Meanwhile, this patch just try to
migrate, if it's failed, we can avoid to bring up bsp.

Watch do you think about it?

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e0e0841..9e6cf4b 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -102,6 +102,22 @@ static void kdump_nmi_shootdown_cpus(void)

void native_machine_crash_shutdown(struct pt_regs *regs)
{
+#ifdef CONFIG_SMP
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ /* See if there has been given a command line override */
+ if ((reboot_cpu != -1) && (reboot_cpu < nr_cpu_ids) &&
+ cpu_online(reboot_cpu))
+ reboot_cpu_id = reboot_cpu;
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = smp_processor_id();
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+
/* This function is only called after the system
* has panicked or is otherwise in a critical state.
* The minimum amount of code to allow a kexec'd kernel
@@ -114,6 +130,7 @@ void native_machine_crash_shutdown(struct pt_regs
*regs)
local_irq_disable();

kdump_nmi_shootdown_cpus();
+#endif




On 10/23/13 at 12:01am, HATAYAMA Daisuke wrote:
> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

HATAYAMA Daisuke

unread,
Nov 7, 2013, 11:20:02 PM11/7/13
to
(2013/11/08 12:30), Baoquan He wrote:
> Hi,
>
> Reccently people reported kexec didn't work correctly. After check, it's
> a regression. Since a code block which migrate current thread to cpu0
> when executing "kexec -e", this can be reproduced by setting affinity to
> CPUn(n!=0). You can find this patch in this link:
> https://lkml.org/lkml/2013/11/5/88
>
> Then I thought why we don't do this in kdump. I tried migrating current
> thread to cpu0 when crash happened, it works very well. Set affinity to
> make crash happened on CPUn(n!=0), then all cpus can be brought up and
> dump is successful. I pasted the patch as below.
>
> Only one thing worried me, whether the context related to crash cpu will
> be different, and do we care which cpu crashed. If it need be cared, or
> it doesn't involve difference, That will be great. Multiple CPUs can be
> supported easily in this simpler way. Meanwhile, this patch just try to
> migrate, if it's failed, we can avoid to bring up bsp.
>
> Watch do you think about it?
>

We have already discussed this idea. It's the idea of my first patch and
it was nacked. See the following url. (Sorry, I removed explanation of
development history from patch description at v4 patch, but I've planned
to write what ideas doesn't work well in documentation of this work.)

https://lkml.org/lkml/2012/4/15/181

The key reason why we cannot do that is the environment we are running
must be considered broken. Either interrupts or scheduler could no longer
work. Tables for interrupts can be broken. The other cpus except for the
crashing cpu are no longer guaranteed to be running sanely. Migrating cpu
from the crashing cpu to cpu0 reduces reliability of kdump.

--
Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Nov 8, 2013, 11:10:01 AM11/8/13
to
On Wed, Oct 23, 2013 at 12:01:24AM +0900, HATAYAMA Daisuke wrote:
> If crash occurs on some AP, then kdump 2nd kernel is booted up on the
> AP. Therefore, it is not always correct that the CPU that is currently
> booting up the kernel is BSP. It's wrong to reflect BSP information in
> MP table as for the current booting up CPU.
>
> Also, boot_cpu_physical_apicid has already been initialized before
> reaching here, for example, in register_lapic_address().
>
> This is a preparation for next patch that will introduce a new kernel
> parameter to disabls specified CPU where boot_cpu_physical_apicid
> needs to have apicid for the currently booting up CPU to identify it
> to avoid falsely disabling it.
>
> Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
> ---
> arch/x86/kernel/mpparse.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
> index d2b5648..969bb9f 100644
> --- a/arch/x86/kernel/mpparse.c
> +++ b/arch/x86/kernel/mpparse.c
> @@ -64,7 +64,6 @@ static void __init MP_processor_info(struct mpc_cpu *m)
>
> if (m->cpuflag & CPU_BOOTPROCESSOR) {
> bootup_cpu = " (Bootup-CPU)";
> - boot_cpu_physical_apicid = m->apicid;
> }
>
> printk(KERN_INFO "Processor #%d%s\n", m->apicid, bootup_cpu);

Hi Hatayama,

Looks like different pieces of code are assuming different meaning of
boot_cpu_physical_apicid.

MP table parsing code seems to assume that this is boot cpu as reported
by MP tables.

if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
boot_cpu_physical_apicid = m->apicid;
}

And based on that it also tries to determine whether boot cpu has been
detected yet or not. If it was always the cpu we are booting on, then
MP table parsing code did not have to worry about whether boot cpu
has been detected yet or not.

void generic_processor_info(int apicid, int version)
{
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);

/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
*/
if (!boot_cpu_detected && num_processors >= nr_cpu_ids - 1 &&
apicid != boot_cpu_physical_apicid) {
int thiscpu = max + disabled_cpus - 1;

pr_warning(
"ACPI: NR_CPUS/possible_cpus limit of %i almost"
" reached. Keeping one slot for boot cpu."
" Processor %d/0x%x ignored.\n", max, thiscpu,
apicid);

disabled_cpus++;
return;
}

I am not the code expert here but looks like there is some confusion
here w.r.t what's the meaning of boot_cpu_physical_apicid and we might
have to fix it.

Thanks
Vivek

HATAYAMA Daisuke

unread,
Nov 10, 2013, 10:00:02 PM11/10/13
to
Looking at my past investigation, kernel/mpparse.c, mm/amdtopology.c and
platform/visws/visws_quirks.c assumes that boot_cpu_physical_apicid
has initial apicid of the BSP, not the current actual booting-up cpu.

These three are called in get_smp_config() below. If either of them is
called actually, boot_cpu_physical_apicid has the apicid different from
the current actual booting-up cpu temporarily. But init_apic_mappings()
soon modifies back the value to the one obtained by read_apic_id().

/*
* Read APIC and some other early information from ACPI tables.
*/
acpi_boot_init();
sfi_init();
x86_dtb_init();

/*
* get boot-time SMP configuration:
*/
if (smp_found_config)
get_smp_config();

prefill_possible_map();

init_cpu_to_node();

init_apic_mappings();

So, thanks to init_apic_mappings(), the patch set would work without the
first patch... This is a careless point in this patch set.

Also, in case of UP kernel, there is the following code in
APIC_init_uniprocessor():

/*
* Hack: In case of kdump, after a crash, kernel might be booting
* on a cpu with non-zero lapic id. But boot_cpu_physical_apicid
* might be zero if read from MP tables. Get it from LAPIC.
*/
# ifdef CONFIG_CRASH_DUMP
boot_cpu_physical_apicid = read_apic_id();
# endif

So, it seems reasonable for boot_cpu_physical_apicid to have the apicid for
the actually booting-up cpu.

Next, let's consider whether or not to fix here. To be honest, the above
lastly called init_apic_mappings() part looks to me a kind of workaround
and should be cleaned up, by introducing bsp_apicid variable separately
to boot_cpu_physical_apicid.

However, I don't know mm/amdtopology.c and platform/visws/visws_quirks.c very
well, in particular for the former. I would think it really needs the real BSP's
apicid in the next patch, but more reviewing by each maintainers might be needed
here.

BTW, there are other confusions except for boot_cpu_physical_apicid. For example,
there's currently the assumption that cpu0 is always the one with BSP flag, for
example, in hibernation, suspend, reboot and cpu0 hot-plugging code. The current
version of this patch set doesn't deal with any of them because the first two
are never used in the kdump 2nd kernel, reboot has so far worked well even if
cpu0 is AP. Lastly, cpu0 hot-plugging code is never used in the 2nd kernel; even
if it is used, NMI logic would be applicable to AP without special handling.

So, I'll post a patch like this. Do you agree?

- introduce bsp_apicid variable in apic.c and use it to have the initial apicid
of the real BSP.
- replace boot_cpu_physical_apicid in mm/amdtopology.c, mpparse.c and
platform/visws/visws_quirks.c by newly introduced bsp_apicid. The change needs
to be reviewed by each maintainers.

Also, by the way, currently read_apic_id() is used to get the apicid of the
current actually booting-up cpu. However, this is compared with the initial apicid
exported from MP table or MADT. So, rigorously, read_apic_id() is wrong, this
returns the apicid possibly different from initial apicid. Instead, cpuid value
should be used. However, there's no bug report about this and if fixing this,
patch set would become bigger, which I want to avoid. So, I don't do this.

--
Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Nov 11, 2013, 12:00:02 AM11/11/13
to
Thanks for your testing.

> I have seen no issues w/ the 3.0.80 dump testing on our proto.
>
> On the 2.6.32 testing on our proto, i have hit a low probability (< 5%)
> chance of the capture suffering a soft lockup hang during
> "Switching to clocksource hpet." I have not RCA'd this yet.
> Note, I have seen this issue on earlier version of the patch, so
> it is not specific to this version.
>
> I then tested the 2.6.32 port on a dl380. This worked without issue.
>
> Note, I have seen no issues related to this patch on our proto when
> booting the capture with a single processor.
>
> While I am still pursuing the issue of the 2.6.32 kernel on our proto,
> I believe this patch is good and should be accepted.
>

This seems there's something that depends on the system you used. But I
have never verified my patch set on 2.6.32-based kernel. I'll try to
do a similar test on some FJ systems.

The 2.6.32-based kernel you mean is one of the Longterm release kernels,
right? So, you used on the test the 2.6.32-based Longterm release kernel
with my v4 patch, right?

The root cause seems to have already been fixed on recent kernel since
you didn't see the bug on 3.0.80-based kernel, so I think binary search
would be useful.

--
Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Nov 11, 2013, 12:00:02 PM11/11/13
to
On Mon, Nov 11, 2013 at 11:52:30AM +0900, HATAYAMA Daisuke wrote:

[..]
> Looking at my past investigation, kernel/mpparse.c, mm/amdtopology.c and
> platform/visws/visws_quirks.c assumes that boot_cpu_physical_apicid
> has initial apicid of the BSP, not the current actual booting-up cpu.
>
> These three are called in get_smp_config() below. If either of them is
> called actually, boot_cpu_physical_apicid has the apicid different from
> the current actual booting-up cpu temporarily. But init_apic_mappings()
> soon modifies back the value to the one obtained by read_apic_id().
>
> /*
> * Read APIC and some other early information from ACPI tables.
> */
> acpi_boot_init();
> sfi_init();
> x86_dtb_init();
>
> /*
> * get boot-time SMP configuration:
> */
> if (smp_found_config)
> get_smp_config();
>
> prefill_possible_map();
>
> init_cpu_to_node();
>
> init_apic_mappings();
>
> So, thanks to init_apic_mappings(), the patch set would work without the
> first patch... This is a careless point in this patch set.
>

If init_apic_mappings(), is making sure that boot_cpu_physical_apicid is
apic id of booting processor, and you don't need first patch of your
series, then I think atleast re-post your patch series without first
patch.

And then there can be another series which looks into whether we need
two different variables or not and if we do, then a separate variable
bsp_physical_apicid will track the bsp id as reported by BIOS and
boot_cpu_physical_apicid will track apic id of booting cpu. This might
a very big and slow cleanup. So I think blocking the first patch series
behind it might not make much sense.

Thanks
Vivek

HATAYAMA Daisuke

unread,
Nov 11, 2013, 7:50:02 PM11/11/13
to
Yes, I'll repost soon.

> And then there can be another series which looks into whether we need
> two different variables or not and if we do, then a separate variable
> bsp_physical_apicid will track the bsp id as reported by BIOS and
> boot_cpu_physical_apicid will track apic id of booting cpu. This might
> a very big and slow cleanup. So I think blocking the first patch series
> behind it might not make much sense.
>

Yes, the current handling of boot_cpu_physical_apicid looks strange and
should be cleaned up, but the cleaning up needs reviewing together with
the maintainers for the corresponding part; in particular, it can be
lengthy for the reviewing on amdtopology.c. I leave this as another
work for the time being.

--
Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Nov 12, 2013, 5:00:02 AM11/12/13
to
This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
even if 1st kernel crashs on some AP, a continueing work from:

[PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
https://lkml.org/lkml/2013/10/16/300.

At v4, basic design has changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter to be
newly introduced in this patch set. This design is more flexible than
the previous version in that we no longer have to rely on ACPI/MP
table to get initial APIC ID of BSP.

Sorry, this patch set have not include in-source documentation
requested by Borislav Petkov yet, but I'll post it later separately,
which would be better to focus on documentation reviewing.

ChangeLog

v4 => v5)

- Rebased on top of v3.12

- Introduce bsp_physical_apicid that has the initial APIC ID for the
processor with BSP flag on IA32_APIC_BASE MSR. Without this,
boot_cpu_physical_apicid has temporarilly the value around MP table
related codes, although it's designed to have the initial APIC ID
for the processor that is doing the boot up. Use the
bsp_physical_apicid in MP table related codes; no impact on
semantics at runtime there.

v3 => v4)

- Rebased on top of v3.12-rc6

- Basic design has been changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter to
be newly introduced in this patch set. This design is more flexible
than the previous version in that we no longer have to rely on
ACPI/MP table to get initial APIC ID of BSP.

v2 => v3)

- Change default value of boot_cpu_is_bsp to true.

- Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
processor family is larger than or equal to 6 in order to avoid
invalid opcode exception on processors where MSR_IA32_APICBASE is
not supported.

v1 => v2)

- Rebased on top of v3.12-rc5.

- Fix linking time error of boot_cpu_is_bsp_init() in case of
CONFIG_LOCAL_APIC disabled by adding empty static inline function
instead.

- Fix missing feature check by means of cpu_has_apic macro in
boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).

NOTE: I've checked local apic-present case only; I don't have any
x86 processor without local apic.

- Add __init annotation to boot_cpu_is_bsp_init().

Test

- built with and without CONFIG_LOCAL_APIC, CONFIG_SMP and CONFIG_X86_UP_APIC.
- tested x86_64 in case of acpi and MP table
- tested disable_cpu_apicid=<n> to disable both AP and BSP

---

HATAYAMA Daisuke (3):
x86, apic: add bsp_physical_apicid
x86, apic: Add disable_cpu_apicid kernel parameter
Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter


Documentation/kernel-parameters.txt | 9 +++++++
arch/x86/include/asm/mpspec.h | 7 +++++
arch/x86/kernel/apic/apic.c | 44 ++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 5 +++-
arch/x86/kernel/mpparse.c | 3 +-
arch/x86/kernel/setup.c | 2 +
arch/x86/mm/amdtopology.c | 6 ++--
arch/x86/platform/visws/visws_quirks.c | 6 +++-
8 files changed, 75 insertions(+), 7 deletions(-)

HATAYAMA Daisuke

unread,
Nov 12, 2013, 5:00:02 AM11/12/13
to
Add bsp_physical_apicid variable that has the initial APIC ID for the
processor with BSP flag on IA32_APIC_BASE MSR. Without this,
boot_cpu_physical_apicid temporarily has the value around MP table
related codes such as kernel/mpparse.c, mm/amdtopology.c and
platform/visws/visws_quirks.c until it is rewritten back in
init_apic_mappings() to the initial APIC ID for the processor that is
doing boot up.

This change is also required by the next commit introducing new
disable_cpu_apicid kernel parameter, where the value specified via
disable_cpu_apicid is compared with boot_cpu_physical_apicid in
generic_processor_info(), but in case of using MP table, the
boot_cpu_physical_apicid is rewiriten by MP_processor_info() to the
initial APIC ID of the BSP reported by BIOS, which is problematic if
some AP is specified in disable_cpu_apic.

Then, replace boot_cpu_physical_apicid in the MP table related codes
such as kernel/mpparse.c, mm/amdtopology.c and
platform/visws/visws_quirks.c by newly introduced bsp_physical_apicid,
where no functional change is intended.

If the cpu that is doing boot-up process is BSP, use cpuid to get the
initial APIC ID. If AP, assign the initial APIC ID obtained from MP
table or MADT. This covers the above MP table related codes with no
functional change.

Initialize boot_cpu_data.initial_apicid in early_identify_cpu() since
currently it is initialized after logical cpu number assignment is
performed.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/include/asm/mpspec.h | 7 +++++++
arch/x86/kernel/apic/apic.c | 15 +++++++++++++++
arch/x86/kernel/cpu/common.c | 5 ++++-
arch/x86/kernel/mpparse.c | 3 ++-
arch/x86/kernel/setup.c | 2 ++
arch/x86/mm/amdtopology.c | 6 +++---
arch/x86/platform/visws/visws_quirks.c | 6 ++++--
7 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 626cf70..894d6df 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -47,11 +47,18 @@ extern int mp_bus_id_to_type[MAX_MP_BUSSES];
extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);

extern unsigned int boot_cpu_physical_apicid;
+extern unsigned int bsp_physical_apicid;
extern unsigned int max_physical_apicid;
extern int mpc_default_type;
extern unsigned long mp_lapic_addr;

#ifdef CONFIG_X86_LOCAL_APIC
+extern void bsp_physical_apicid_init(void);
+#else
+static inline void bsp_physical_apicid_init(void) { };
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
extern int smp_found_config;
#else
# define smp_found_config 0
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a7eb82d..b60ad92 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -63,6 +63,10 @@ unsigned disabled_cpus;
/* Processor that is doing the boot up */
unsigned int boot_cpu_physical_apicid = -1U;

+/* Processor with BP flag on IA32_APIC_BASE MSR. In case of kexec,
+ * this can differ from the processor that is doing the boot up. */
+unsigned int bsp_physical_apicid = BAD_APICID;
+
/*
* The highest APIC ID seen during enumeration.
*/
@@ -2589,3 +2593,14 @@ static int __init lapic_insert_resource(void)
* that is using request_resource
*/
late_initcall(lapic_insert_resource);
+
+void __init bsp_physical_apicid_init(void)
+{
+ u32 l, h;
+
+ if (boot_cpu_data.x86 >= 6 && cpu_has_apic) {
+ rdmsr(MSR_IA32_APICBASE, l, h);
+ if (!(l & MSR_IA32_APICBASE_BSP))
+ bsp_physical_apicid = boot_cpu_data.initial_apicid;
+ }
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2793d1f..28bea2c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -681,7 +681,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
/*
* Do minimum CPU detection early.
* Fields really needed: vendor, cpuid_level, family, model, mask,
- * cache alignment.
+ * cache alignment, initial_apicid.
* The others are not touched to avoid unwanted side effects.
*
* WARNING: this function is only called on the BP. Don't add code here
@@ -725,6 +725,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
this_cpu->c_bsp_init(c);

setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+
+ if (c->cpuid_level >= 0x00000001)
+ c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
}

void __init early_cpu_init(void)
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d2b5648..209fbf6 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -64,7 +64,8 @@ static void __init MP_processor_info(struct mpc_cpu *m)

if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
- boot_cpu_physical_apicid = m->apicid;
+ if (bsp_physical_apicid == BAD_APICID)
+ bsp_physical_apicid = m->apicid;
}

printk(KERN_INFO "Processor #%d%s\n", m->apicid, bootup_cpu);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de629..8745b24 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)

early_quirks();

+ bsp_physical_apicid_init();
+
/*
* Read APIC and some other early information from ACPI tables.
*/
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 2ca15b5..bec16ad 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -183,9 +183,9 @@ int __init amd_numa_init(void)

/* get the APIC ID of the BSP early for systems with apicid lifting */
early_get_boot_cpu_id();
- if (boot_cpu_physical_apicid > 0) {
- pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
- apicid_base = boot_cpu_physical_apicid;
+ if (bsp_physical_apicid > 0) {
+ pr_info("BSP APIC ID: %02x\n", bsp_physical_apicid);
+ apicid_base = bsp_physical_apicid;
}

for_each_node_mask(i, numa_nodes_parsed)
diff --git a/arch/x86/platform/visws/visws_quirks.c b/arch/x86/platform/visws/visws_quirks.c
index 94d8a39..aebe821 100644
--- a/arch/x86/platform/visws/visws_quirks.c
+++ b/arch/x86/platform/visws/visws_quirks.c
@@ -165,8 +165,10 @@ static void __init MP_processor_info(struct mpc_cpu *m)
m->apicid, (m->cpufeature & CPU_FAMILY_MASK) >> 8,
(m->cpufeature & CPU_MODEL_MASK) >> 4, m->apicver);

- if (m->cpuflag & CPU_BOOTPROCESSOR)
- boot_cpu_physical_apicid = m->apicid;
+ if (m->cpuflag & CPU_BOOTPROCESSOR) {
+ if (bsp_physical_apicid == BAD_APICID)
+ bsp_physical_apicid = m->apicid;
+ }

ver = m->apicver;
if ((ver >= 0x14 && m->apicid >= 0xff) || m->apicid >= 0xf) {

HATAYAMA Daisuke

unread,
Nov 12, 2013, 5:00:02 AM11/12/13
to
Sorry for my confusion. It's necessary to introduce a new variable to keep
the initial APIC ID for the processor with BSP flag on IA32_APIC_BASE MSR,
which is needed in case of AP is specified by disable_cpu_apicid and using
MP table.

I've posted new v5 patch set a little ago. Could you please review it?

HATAYAMA Daisuke

unread,
Nov 12, 2013, 5:00:03 AM11/12/13
to
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.

This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.

Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.

However, doing this procedure at each boot time manually is awkward,
which should be automatically done by user-land service scripts, for
example, kexec-tools on fedora/RHEL distributions.

This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.

One assumption behind this design is that users get initial APIC ID of
the BSP in still healthy state and so BSP is uniquely kept in
CPU0. Thus, through the kernel parameter, only one initial APIC ID can
be specified.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b60ad92..075bf23 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -78,6 +78,13 @@ unsigned int max_physical_apicid;
physid_mask_t phys_cpu_present_map;

/*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+unsigned int disabled_cpu_apicid = BAD_APICID;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -2117,6 +2124,19 @@ void generic_processor_info(int apicid, int version)
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);

+ if (disabled_cpu_apicid != BAD_APICID &&
+ disabled_cpu_apicid != boot_cpu_physical_apicid &&
+ disabled_cpu_apicid == apicid) {
+ int thiscpu = num_processors + disabled_cpus;
+
+ pr_warning("ACPI: Disable specified CPU."
+ " Processor %d/0x%x ignored.\n",
+ thiscpu, apicid);
+
+ disabled_cpus++;
+ return;
+ }
+
/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
@@ -2604,3 +2624,12 @@ void __init bsp_physical_apicid_init(void)
bsp_physical_apicid = boot_cpu_data.initial_apicid;
}
}
+
+static int __init apic_set_disabled_cpu_apicid(char *arg)
+{
+ if (!arg || !get_option(&arg, &disabled_cpu_apicid))
+ return -EINVAL;
+
+ return 0;
+}
+early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);

Borislav Petkov

unread,
Nov 12, 2013, 5:50:01 AM11/12/13
to
How am I to parse this message - that 'thiscpu' is being disabled
currently? What does "Processor ... ignored" mean?

Why not just write:

ACPI: Disabling requested CPU %d (APIC ID: 0x%x)

and everyone knows what's happening?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

jerry....@hp.com

unread,
Nov 13, 2013, 1:30:02 PM11/13/13
to
On Mon, Nov 11, 2013 at 01:49:41PM +0900, HATAYAMA Daisuke wrote:
>
> Thanks for your testing.
>
> >I have seen no issues w/ the 3.0.80 dump testing on our proto.
> >
> >On the 2.6.32 testing on our proto, i have hit a low probability (< 5%)
> >chance of the capture suffering a soft lockup hang during
> >"Switching to clocksource hpet." I have not RCA'd this yet.
> >Note, I have seen this issue on earlier version of the patch, so
> >it is not specific to this version.
> >
> >I then tested the 2.6.32 port on a dl380. This worked without issue.
> >
> >Note, I have seen no issues related to this patch on our proto when
> >booting the capture with a single processor.
> >
> >While I am still pursuing the issue of the 2.6.32 kernel on our proto,
> >I believe this patch is good and should be accepted.
> >
>
> This seems there's something that depends on the system you used. But I
> have never verified my patch set on 2.6.32-based kernel. I'll try to
> do a similar test on some FJ systems.
>
> The 2.6.32-based kernel you mean is one of the Longterm release kernels,
> right? So, you used on the test the 2.6.32-based Longterm release kernel
> with my v4 patch, right?


I've been using 2.6.32 and 3.0.80 based kernels from different distros.
These are not the same as long term kernels on kernel.org.


I see you've posted an updated version of the patch. i've picked it
up and will test.


--

----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57 phone: (970) 898-1022
Ft. Collins, CO 80528 FAX: (970) 898-XXXX
email: jerry....@hp.com
----------------------------------------------------------------------------

HATAYAMA Daisuke

unread,
Nov 14, 2013, 5:50:01 AM11/14/13
to
It seems "Disabling requested CPU %d" part is by far better than mine.
I'll apply this in the next patch.

For the latter part "(APIC ID: 0x%x)", there are other two messages about
disabling processors and I tried to use a message consistent with these.
So, I think "Processor %d/0x%x ignored.\n" should be used.

/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
*/
if (!boot_cpu_detected && num_processors >= nr_cpu_ids - 1 &&
apicid != boot_cpu_physical_apicid) {
int thiscpu = max + disabled_cpus - 1;

pr_warning(
"ACPI: NR_CPUS/possible_cpus limit of %i almost"
" reached. Keeping one slot for boot cpu."
" Processor %d/0x%x ignored.\n", max, thiscpu, apicid);

disabled_cpus++;
return;
}

if (num_processors >= nr_cpu_ids) {
int thiscpu = max + disabled_cpus;

pr_warning(
"ACPI: NR_CPUS/possible_cpus limit of %i reached."
" Processor %d/0x%x ignored.\n", max, thiscpu, apicid);

disabled_cpus++;
return;
}

--
Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Nov 14, 2013, 6:20:01 AM11/14/13
to
Hello HPA,

I have another question relevant to http://lkml.org/lkml/2013/11/12/311.

(2013/11/12 18:51), HATAYAMA Daisuke wrote:
<cut>
> @@ -2589,3 +2593,14 @@ static int __init lapic_insert_resource(void)
> * that is using request_resource
> */
> late_initcall(lapic_insert_resource);
> +
> +void __init bsp_physical_apicid_init(void)
> +{
> + u32 l, h;
> +
> + if (boot_cpu_data.x86 >= 6 && cpu_has_apic) {
> + rdmsr(MSR_IA32_APICBASE, l, h);
> + if (!(l & MSR_IA32_APICBASE_BSP))
> + bsp_physical_apicid = boot_cpu_data.initial_apicid;
> + }
> +}

So, rigrously, we should not use rdmsr for MSR_IA32_APICBASE here since this can
return wrong value on some clustered system?

At least all the mpparse.c, amdtopology.c and visws_quirks.c has referred to
MP table only, so there's no issue by simply dropping this code.

> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2793d1f..28bea2c 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -681,7 +681,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
> /*
> * Do minimum CPU detection early.
> * Fields really needed: vendor, cpuid_level, family, model, mask,
> - * cache alignment.
> + * cache alignment, initial_apicid.
> * The others are not touched to avoid unwanted side effects.
> *
> * WARNING: this function is only called on the BP. Don't add code here
> @@ -725,6 +725,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
> this_cpu->c_bsp_init(c);
>
> setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> +
> + if (c->cpuid_level >= 0x00000001)
> + c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
> }

Can the initial APID IDs obtained in this way using cpuid duplicate on some
clustered system?

Can we consider each apic->get_apic_id() method return *initial* APIC ID?
According to 8.4 Multiple-Processor (MP) Initialization, Intel System Programming
Guide, the APID IDs written into MADT and MP table are the initial ones.
So, APID IDs to be compared with these must be the initial ones.

--
Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Nov 20, 2013, 9:10:01 PM11/20/13
to
This patch set is to allow kdump 2nd kernel to wake up multiple CPUs,
a continueing work from:

[PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
https://lkml.org/lkml/2013/10/16/300.

At v4, basic design has changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter, which
is newly introduced by this patch set. This design is more flexible
than the previous version in that we no longer have to rely on ACPI/MP
table to get initial APIC ID of BSP.

Sorry, this patch set have not include in-source documentation
requested by Borislav Petkov yet, but I'll post it later separately,
which would be better to focus on documentation reviewing.

ChangeLog

v5 => v6)

- Remove use of rdmsr(IA32_APIC_BASE) to initialize
bsp_physical_apicid since the MSR doesn't work well on some cluster
systems, suggested by HPA. Also, current users of the variable
expects the initial apicid reported through MP table only; removing
the use of the MSR is not problematic.

- Rename bsp_physical_apicid as bios_bsp_physical_apicid to make it
clear that the apidid contained there is the one reported from some
BIOS table. Also, initialize it not only by MP table but also by
ACPI MADT.

- Change message displayed when specified cpu is disabled from:

ACPI: Disable specified CPU. Processor 0/0x0 ignored.

to:

ACPI: Disabling requested cpu. Processor 0/0x0 ignored.

v4 => v5)

- Rebased on top of v3.12

- Introduce bsp_physical_apicid that has the initial APIC ID for the
processor with BSP flag on IA32_APIC_BASE MSR. Without this,
boot_cpu_physical_apicid has temporarilly the value around MP table
related codes, although it's designed to have the initial APIC ID
for the processor that is doing the boot up. Use the
bsp_physical_apicid in MP table related codes; no impact on
semantics at runtime there.

v3 => v4)

- Rebased on top of v3.12-rc6

- Basic design has been changed. Now users need to figure out initial
APIC ID of BSP in the 1st kernel and configures kernel parameter for
the 2nd kernel manually using disable_cpu_apic kernel parameter to
be newly introduced in this patch set. This design is more flexible
than the previous version in that we no longer have to rely on
ACPI/MP table to get initial APIC ID of BSP.

v2 => v3)

- Change default value of boot_cpu_is_bsp to true.

- Before executing rdmsr(MSR_IA32_APICBASE), check if the number of
processor family is larger than or equal to 6 in order to avoid
invalid opcode exception on processors where MSR_IA32_APICBASE is
not supported.

v1 => v2)

- Rebased on top of v3.12-rc5.

- Fix linking time error of boot_cpu_is_bsp_init() in case of
CONFIG_LOCAL_APIC disabled by adding empty static inline function
instead.

- Fix missing feature check by means of cpu_has_apic macro in
boot_cpu_is_bsp_init() before calling rdmsr_safe(MSR_IA32_APICBASE).

NOTE: I've checked local apic-present case only; I don't have any
x86 processor without local apic.

- Add __init annotation to boot_cpu_is_bsp_init().

Test

- built with and without CONFIG_LOCAL_APIC
- tested x86_64 in case of acpi and MP table
- tested disable_cpu_apicid=<n> to disable both AP and BSP

---

HATAYAMA Daisuke (3):
x86, apic: add bios_bsp_physical_apicid
x86, apic: Add disable_cpu_apicid kernel parameter
Documentation, x86, apic, kexec: Add disable_cpu_apicid kernel parameter


Documentation/kernel-parameters.txt | 9 ++++++++
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/acpi/boot.c | 8 +++++++
arch/x86/kernel/apic/apic.c | 38 ++++++++++++++++++++++++++++++++
arch/x86/kernel/mpparse.c | 2 +-
arch/x86/mm/amdtopology.c | 6 +++--
arch/x86/platform/visws/visws_quirks.c | 2 +-
7 files changed, 61 insertions(+), 5 deletions(-)

HATAYAMA Daisuke

unread,
Nov 20, 2013, 9:10:01 PM11/20/13
to
Add bios_bsp_physical_apicid variable. This variable is initialized
with the value reported by BIOS tables such as ACPI MADT or MP table.

Without this variable, boot_cpu_physical_apicid temporarilly has the
value around MP table related codes such as kernel/mpparse.c,
mm/amdtopology.c and platform/visws/visws_quirks.c until it is
rewritten back in init_apic_mappings() to the initial APIC ID for the
processor that is doing boot up.

This change is also required by the next commit introducing new
disable_cpu_apicid kernel parameter, where the value specified via
disable_cpu_apicid is compared with boot_cpu_physical_apicid in
generic_processor_info(), but in case of using MP table, the
boot_cpu_physical_apicid is rewiriten by MP_processor_info() to the
initial APIC ID of the BSP reported by BIOS, which is problematic if
some AP is specified in disable_cpu_apic.

There's no functional change intended on kernel/mpparse.c,
mm/amdtopology.c and platform/visws/visws_quirks.c by the introduction
of bios_bsp_physical_apicid.

The reason why we don't use cpuid is that we cannot use IA32_APIC_BASE
MSR to determine if the booting up cpu is BSP since rdmsr() to the MSR
could not work well even if it exists on some system, e.g. some
cluster system.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/acpi/boot.c | 8 ++++++++
arch/x86/kernel/apic/apic.c | 9 +++++++++
arch/x86/kernel/mpparse.c | 2 +-
arch/x86/mm/amdtopology.c | 6 +++---
arch/x86/platform/visws/visws_quirks.c | 2 +-
6 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 626cf70..3f9f94b 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -47,6 +47,7 @@ extern int mp_bus_id_to_type[MAX_MP_BUSSES];
extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);

extern unsigned int boot_cpu_physical_apicid;
+extern unsigned int bios_bsp_physical_apicid;
extern unsigned int max_physical_apicid;
extern int mpc_default_type;
extern unsigned long mp_lapic_addr;
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 40c7660..f62226a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -206,6 +206,14 @@ static void acpi_register_lapic(int id, u8 enabled)
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];

+ /*
+ * ACPI specification describes that platform firmware should
+ * list the BSP processor as the first LAPIC entry in the
+ * MADT.
+ */
+ if (!num_processors && !disabled_cpus)
+ bios_bsp_physical_apicid = id;
+
generic_processor_info(id, ver);
}

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a7eb82d..cff8d71 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -63,6 +63,15 @@ unsigned disabled_cpus;
/* Processor that is doing the boot up */
unsigned int boot_cpu_physical_apicid = -1U;

+/* Processor with BP flag on IA32_APIC_BASE MSR to be initialized with
+ * the value reported by BIOS tables such as ACPI MADT or MP table,
+ * not with the value obtained by rdmsr(IA32_APIC_MSR) since it
+ * possibly doesn't work well on some clustered system.
+ *
+ * In case of kexec, this can differ from the processor that is doing
+ * the boot up. */
+unsigned int bios_bsp_physical_apicid = BAD_APICID;
+
/*
* The highest APIC ID seen during enumeration.
*/
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d2b5648..5bb5220 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -64,7 +64,7 @@ static void __init MP_processor_info(struct mpc_cpu *m)

if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
- boot_cpu_physical_apicid = m->apicid;
+ bios_bsp_physical_apicid = m->apicid;
}

printk(KERN_INFO "Processor #%d%s\n", m->apicid, bootup_cpu);
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 2ca15b5..788012e 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -183,9 +183,9 @@ int __init amd_numa_init(void)

/* get the APIC ID of the BSP early for systems with apicid lifting */
early_get_boot_cpu_id();
- if (boot_cpu_physical_apicid > 0) {
- pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
- apicid_base = boot_cpu_physical_apicid;
+ if (bios_bsp_physical_apicid > 0) {
+ pr_info("BSP APIC ID: %02x\n", bios_bsp_physical_apicid);
+ apicid_base = bios_bsp_physical_apicid;
}

for_each_node_mask(i, numa_nodes_parsed)
diff --git a/arch/x86/platform/visws/visws_quirks.c b/arch/x86/platform/visws/visws_quirks.c
index 94d8a39..5a91e50 100644
--- a/arch/x86/platform/visws/visws_quirks.c
+++ b/arch/x86/platform/visws/visws_quirks.c
@@ -166,7 +166,7 @@ static void __init MP_processor_info(struct mpc_cpu *m)
(m->cpufeature & CPU_MODEL_MASK) >> 4, m->apicver);

if (m->cpuflag & CPU_BOOTPROCESSOR)
- boot_cpu_physical_apicid = m->apicid;
+ bios_bsp_physical_apicid = m->apicid;

ver = m->apicver;
if ((ver >= 0x14 && m->apicid >= 0xff) || m->apicid >= 0xf) {

HATAYAMA Daisuke

unread,
Nov 20, 2013, 9:10:02 PM11/20/13
to
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.

This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.

Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.

However, doing this procedure at each boot time manually is awkward,
which should be automatically done by user-land service scripts, for
example, kexec-tools on fedora/RHEL distributions.

This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.

One assumption behind this design is that users get initial APIC ID of
the BSP in still healthy state and so BSP is uniquely kept in
CPU0. Thus, through the kernel parameter, only one initial APIC ID can
be specified.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
arch/x86/kernel/apic/apic.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index cff8d71..77fd68d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -83,6 +83,13 @@ unsigned int max_physical_apicid;
physid_mask_t phys_cpu_present_map;

/*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+unsigned int disabled_cpu_apicid = BAD_APICID;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -2122,6 +2129,19 @@ void generic_processor_info(int apicid, int version)
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
phys_cpu_present_map);

+ if (disabled_cpu_apicid != BAD_APICID &&
+ disabled_cpu_apicid != boot_cpu_physical_apicid &&
+ disabled_cpu_apicid == apicid) {
+ int thiscpu = num_processors + disabled_cpus;
+
+ pr_warning("ACPI: Disabling requested cpu."
+ " Processor %d/0x%x ignored.\n",
+ thiscpu, apicid);
+
+ disabled_cpus++;
+ return;
+ }
+
/*
* If boot cpu has not been detected yet, then only allow upto
* nr_cpu_ids - 1 processors and keep one slot free for boot cpu
@@ -2598,3 +2618,12 @@ static int __init lapic_insert_resource(void)
* that is using request_resource
*/
late_initcall(lapic_insert_resource);
+
+static int __init apic_set_disabled_cpu_apicid(char *arg)
+{
+ if (!arg || !get_option(&arg, &disabled_cpu_apicid))
+ return -EINVAL;
+
+ return 0;
+}
+early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);

HATAYAMA Daisuke

unread,
Nov 20, 2013, 9:10:02 PM11/20/13
to
Add disable_cpu_apicid kernel parameter to disable the CPU with the
specified number of initial APIC ID, mostly used for the kdump 2nd
kernel to disable BSP to wake up multiple CPUs without causing system
reset or hang due to sending INIT from AP to BSP.

Signed-off-by: HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
---
Documentation/kernel-parameters.txt | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736..0ca0902 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -774,6 +774,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
disable= [IPV6]
See Documentation/networking/ipv6.txt.

+ disable_cpu_apicid= [X86,APIC,KEXEC,SMP]
+ Format: <int>
+ The number of initial APIC ID for the
+ corresponding CPU to be disabled at boot,
+ mostly used for the kdump 2nd kernel to
+ disable BSP to wake up multiple CPUs without
+ causing system reset or hang due to sending
+ INIT from AP to BSP.
+
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.

Vivek Goyal

unread,
Nov 21, 2013, 4:40:02 PM11/21/13
to
On Thu, Nov 21, 2013 at 11:00:44AM +0900, HATAYAMA Daisuke wrote:

[..]
> @@ -2122,6 +2129,19 @@ void generic_processor_info(int apicid, int version)
> bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
> phys_cpu_present_map);
>
> + if (disabled_cpu_apicid != BAD_APICID &&
> + disabled_cpu_apicid != boot_cpu_physical_apicid &&

Hi Hatayama,

So we are comparing disabled_cpu_apicid with boot_cpu_physical_apicid
to make sure that one can not disable the cpu we are booting on. Can
we just read the apic id of booting cpu in local variable and compare
against that?

Something like as follows.

if (disabled_cpu_apicid != BAD_APICID &&
disabled_cpu_apicid == apicid &&
disabled_cpu_apicid != read_apic_id()) {
/* Disable cpu */
}

If above works, you will not need first patch in the series?

Thanks
Vivek

HATAYAMA Daisuke

unread,
Nov 21, 2013, 7:50:02 PM11/21/13
to
(2013/11/22 6:33), Vivek Goyal wrote:
> On Thu, Nov 21, 2013 at 11:00:44AM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>> @@ -2122,6 +2129,19 @@ void generic_processor_info(int apicid, int version)
>> bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>> phys_cpu_present_map);
>>
>> + if (disabled_cpu_apicid != BAD_APICID &&
>> + disabled_cpu_apicid != boot_cpu_physical_apicid &&
>
> Hi Hatayama,
>
> So we are comparing disabled_cpu_apicid with boot_cpu_physical_apicid
> to make sure that one can not disable the cpu we are booting on. Can
> we just read the apic id of booting cpu in local variable and compare
> against that?
>
> Something like as follows.
>
> if (disabled_cpu_apicid != BAD_APICID &&
> disabled_cpu_apicid == apicid &&
> disabled_cpu_apicid != read_apic_id()) {
> /* Disable cpu */
> }
>
> If above works, you will not need first patch in the series?
>

Yes, I came up with the idea, too. But doing this means we leave two different
ways boot_cpu_physical_apicid is used at boot, which seemed incomplete as a
patch. Also, then we could even lost the reason why boot_cpu_physical_apicid
exists.

But, it's true that the 1st patch causes one more reviewing point. I'll
remove it and fix the 2nd patch just as you suggested here.

--
Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Nov 21, 2013, 9:10:02 PM11/21/13
to
I would say that do the boot_cpu_physical_id related cleanup in a separate
series. If we keep this patch really simple without fear of breaking
anything else, acking and committing the patch becomes easier.

Feel free to post boot cpu related cleanup in a separate series.

Thanks
Vivek
0 new messages