Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] x86/apic: check FADT settings after enable x2apic

86 views
Skip to first unread message

Stoney Wang

unread,
Jan 14, 2013, 8:51:45 PM1/14/13
to suresh....@intel.com, linbao...@hp.com, greg.p...@hp.com, linux-...@vger.kernel.org, Stoney Wang
OS will enable x2apic mode even BIOS default in xapic mode.

FADT settings check (commit ea0dcf903e7d76aa5d483d876215fedcfdfe140f)
should be applied after detect default mode and change the mode (enable_IR_x2apic called)

Signed-off-by: Stoney Wang <song-b...@hp.com>
---
arch/x86/kernel/apic/x2apic_phys.c | 25 ++++++++++++++++---------
1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e03a1e1..76ea60d 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -20,18 +20,22 @@ static int set_x2apic_phys_mode(char *arg)
}
early_param("x2apic_phys", set_x2apic_phys_mode);

-static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+static int x2apic_fadt_phys(void)
{
- if (x2apic_phys)
- return x2apic_enabled();
- else if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
- (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) &&
- x2apic_enabled()) {
+ if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
+ (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
printk(KERN_DEBUG "System requires x2apic physical mode\n");
return 1;
}
- else
- return 0;
+ return 0;
+}
+
+static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+ if (x2apic_enabled())
+ return x2apic_phys || x2apic_fadt_phys();
+
+ return 0;
}

static void
@@ -85,7 +89,10 @@ static int x2apic_phys_probe(void)
if (x2apic_mode && x2apic_phys)
return 1;

- return apic == &apic_x2apic_phys;
+ if (apic == &apic_x2apic_phys)
+ return 1;
+
+ return x2apic_enabled() && x2apic_fadt_phys();
}

static struct apic apic_x2apic_phys = {
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Wang, Song-Bo (Stoney)

unread,
Jan 28, 2013, 12:05:54 AM1/28/13
to yin...@kernel.org, h...@zytor.com, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com
Hi Yinghai, hpa and others,

Would you please review the patch on detecting x2apic FADT settings?

We meet a BIOS system which works on x2apic physical mode by setting the bit ACPI_FADT_APIC_PHYSICAL in FADT table.
And for those systems with all cpuid < 255, the spec requires BIOS's default mode in xapic.
The kernel detects the default mode and do some initializations and will call enable_IR_x2apic and change the mode to x2apic successfully.

So it is necessary to check ACPI_FADT_APIC_PHYSICAL bit after the kernel change the mode from xapic to x2apic.

(*drv)->acpi_madt_oem_check is called on detect default BIOS mode,
(*drv)->probe is called after enable_IR_x2apic,

The previous FADT check (commit ea0dcf90) should be applied to x2apic_phys_probe too.

Thanks,
Stoney

Ingo Molnar

unread,
Jan 28, 2013, 5:11:54 AM1/28/13
to Yinghai Lu, Wang, Song-Bo (Stoney), H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com

* Yinghai Lu <yin...@kernel.org> wrote:

> On Sun, Jan 27, 2013 at 9:05 PM, Wang, Song-Bo (Stoney)
> <song-b...@hp.com> wrote:
> > Hi Yinghai, hpa and others,
> >
> > Would you please review the patch on detecting x2apic FADT settings?
> >
> > We meet a BIOS system which works on x2apic physical mode by
> > setting the bit ACPI_FADT_APIC_PHYSICAL in FADT table. And
> > for those systems with all cpuid < 255, the spec requires
> > BIOS's default mode in xapic. The kernel detects the default
> > mode and do some initializations and will call
> > enable_IR_x2apic and change the mode to x2apic successfully.
>
> Hi, Peter and Ingo,
>
> I checked the patch, and looks right.
>
> I updated the changelog and simplify the code a little bit.
>
> Please check if you can put it into tip:x86/apic

The code looks good to me, but the changelog is still lacking:

> HP has systems that work with x2apic phys mode and BIOS set
> ACPI_FADT_APIC_PHYSICAL in FADT table, and all cpuid < 255,
> the spec requires BIOS only put system on xapic mode. Kernel

Which spec?

> will set to x2apic logical mode instead of x2apic phys mode.

Which has exactly what bad effect on users of these systems?

You left out the most important information from the changelog:
why do users care, what good does the patch do?

Thanks,

Ingo

Yinghai Lu

unread,
Jan 28, 2013, 1:10:22 PM1/28/13
to Ingo Molnar, Wang, Song-Bo (Stoney), H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com
On Mon, Jan 28, 2013 at 2:11 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
>> HP has systems that work with x2apic phys mode and BIOS set
>> ACPI_FADT_APIC_PHYSICAL in FADT table, and all cpuid < 255,
>> the spec requires BIOS only put system on xapic mode. Kernel
>
> Which spec?
>
>> will set to x2apic logical mode instead of x2apic phys mode.
>
> Which has exactly what bad effect on users of these systems?
>
> You left out the most important information from the changelog:
> why do users care, what good does the patch do?

please check you are happy with this:

---
From: Stoney Wang <song-b...@hp.com>
Subject: [PATCH] x86, apic: Check fadt x2apic phys in x2apic_phys_probe()

HP has systems that only work with x2apic phys mode and BIOS set
ACPI_FADT_APIC_PHYSICAL in FADT table. But all apicid < 255,
according to x2apic-spec, chapter 2.9, BIOS need to pass the control
to the OS with xapic mode.
Kernel will set apic driver wrong to x2apic cluster instead of x2apic phys.

The user will have to append nox2apic in boot command line to stay xapic mode,
or append x2apic_phys to switch to x2apic phys mode.

That is kernel bug about handling fadt phys bit.

Current code handle x2apic as:
1. When BIOS set x2apic mode:
When user specify x2apic_phys, or FADT indicates PHYSICAL.
During madt oem check, apic driver is set correctly to x2apic phys driver.

2. When BIOS does NOT set x2apic mode:
When user specify x2apic_phys, or FADT indicates PHYSICAL.
During madt oem check, apic driver is set with xapic logical or
xapic phys driver at first.
Later enable_IR_x2apic() will enable x2apic_mode.
After that, x2apic_phys_probe() will install right x2apic phys driver
when user specify x2apic_phys,
but will skip and let x2apic_cluster_probe to take over to install
wrong apic driver (x2apic cluster) when FADT indicates
PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL.

Fix that by adding check x2apic_fadt_phys in x2apic_phys_probe().

[ changelog, and simplify code - Yinghai Lu ]
Signed-off-by: Stoney Wang <song-b...@hp.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>
---

Ingo Molnar

unread,
Jan 29, 2013, 2:48:03 AM1/29/13
to Yinghai Lu, Wang, Song-Bo (Stoney), H. Peter Anvin, Ingo Molnar, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com

* Yinghai Lu <yin...@kernel.org> wrote:

> On Mon, Jan 28, 2013 at 2:11 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> >> HP has systems that work with x2apic phys mode and BIOS set
> >> ACPI_FADT_APIC_PHYSICAL in FADT table, and all cpuid < 255,
> >> the spec requires BIOS only put system on xapic mode. Kernel
> >
> > Which spec?
> >
> >> will set to x2apic logical mode instead of x2apic phys mode.
> >
> > Which has exactly what bad effect on users of these systems?
> >
> > You left out the most important information from the changelog:
> > why do users care, what good does the patch do?
>
> please check you are happy with this:
>
> ---
> From: Stoney Wang <song-b...@hp.com>
> Subject: [PATCH] x86, apic: Check fadt x2apic phys in x2apic_phys_probe()
>
> HP has systems that only work with x2apic phys mode and BIOS set
> ACPI_FADT_APIC_PHYSICAL in FADT table. But all apicid < 255,
> according to x2apic-spec, chapter 2.9, BIOS need to pass the control
> to the OS with xapic mode.
> Kernel will set apic driver wrong to x2apic cluster instead of x2apic phys.
>
> The user will have to append nox2apic in boot command line to stay xapic mode,
> or append x2apic_phys to switch to x2apic phys mode.

This still does not explain what happens if none of this user
action is taken. I.e. what exact _user visible problem_ does the
patch fix?

Is this really so unimportant to you? Almost everyone will start
a changelog with explaining what badness happens. Not you - you
explain everything from how the fix works to how to work around
the bug - except describing the most important thing: theuser
visible problem itself ... Weird.

Thanks,

Ingo

Wang, Song-Bo (Stoney)

unread,
Jan 29, 2013, 3:43:54 AM1/29/13
to mingo.ke...@gmail.com, Yinghai Lu, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar
> From: Ingo Molnar [mailto:mingo.ke...@gmail.com] On Behalf Of Ingo
> Molnar
> Sent: Tuesday, January 29, 2013 3:48 PM
> To: Yinghai Lu
> Cc: Wang, Song-Bo (Stoney); H. Peter Anvin; Ingo Molnar; Thomas
> Gleixner; Zhang, Lin-Bao (Linux Kernel R&D); Pearson, Greg; linux-
> ker...@vger.kernel.org; suresh....@intel.com
> Subject: Re: [PATCH] x86/apic: check FADT settings after enable x2apic
Hi Ingo,

The HW requires x2apic physical mode, and if we do not apply this patch, the OS will follow into x2apic cluster mode by default.
Due to this mode mismatch, with specific HW configurations, there will be intermittent lost interrupts, which could result in a hang or data loss.

Logically, there is a mismatch because the kernel missed detection on HW configurations, and we should fix it, right?
Do we need more detailed information?

Thanks,

Stoney

Ingo Molnar

unread,
Jan 29, 2013, 3:49:24 AM1/29/13
to Wang, Song-Bo (Stoney), mingo.ke...@gmail.com, Yinghai Lu, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar

* Wang, Song-Bo (Stoney) <song-b...@hp.com> wrote:

> [...] Due to this mode mismatch, with specific HW
> configurations, there will be intermittent lost interrupts,
> which could result in a hang or data loss.

That's the key piece of information that was missing - which
should be put into the changelog into a very prominent place.

> Logically, there is a mismatch because the kernel missed
> detection on HW configurations, and we should fix it, right?
> Do we need more detailed information?

No, the above sentence is more than enough and the fix looks
nice. Note that this sentence is more useful to users than the
rest of the changelog combined!

Thanks,

Ingo

Yinghai Lu

unread,
Jan 29, 2013, 4:30:47 PM1/29/13
to Ingo Molnar, Wang, Song-Bo (Stoney), mingo.ke...@gmail.com, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar
On Tue, Jan 29, 2013 at 12:49 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Wang, Song-Bo (Stoney) <song-b...@hp.com> wrote:
>
>> [...] Due to this mode mismatch, with specific HW
>> configurations, there will be intermittent lost interrupts,
>> which could result in a hang or data loss.

Wang,

Maybe off topic, You should never ship those kind of system before you
fix that in BIOS, as user will have problem with current distribution.

You need to ask your BIOS guys (or OEM's BIOS guys) to check with Intel
to set one bit in IIO configure space to enable x2apic cluster.

Thanks

Yinghai

Ingo Molnar

unread,
Jan 31, 2013, 6:32:42 AM1/31/13
to Yinghai Lu, Wang, Song-Bo (Stoney), mingo.ke...@gmail.com, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar

* Yinghai Lu <yin...@kernel.org> wrote:

> Please check attached.

Almost good.

This:

> When some HP sytems boot without x2apic_phys, there will be

Should mention the approximate models of the systems affected -
is it just a specific line of systems, or a wider range of
systems affected?

This will inform users and will help maintainers like me to
prioritize the merging and backporting of patches.

Yinghai Lu

unread,
Jan 31, 2013, 10:56:43 AM1/31/13
to Ingo Molnar, Wang, Song-Bo (Stoney), mingo.ke...@gmail.com, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar
On Thu, Jan 31, 2013 at 3:32 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> This:
>
> > When some HP sytems boot without x2apic_phys, there will be
>
> Should mention the approximate models of the systems affected -
> is it just a specific line of systems, or a wider range of
> systems affected?
>
> This will inform users and will help maintainers like me to
> prioritize the merging and backporting of patches.

Yes.

Wang,

Can you give the model No. if it is already shipped?

Thanks

Yinghai

Wang, Song-Bo (Stoney)

unread,
Feb 4, 2013, 1:52:22 AM2/4/13
to Ingo Molnar, Yinghai Lu, mingo.ke...@gmail.com, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar
* Ingo Molnar <mingo.ke...@gmail.com> wrote:
>
> * Yinghai Lu <yin...@kernel.org> wrote:
>
> > Please check attached.
>
> Almost good.
>
> This:
>
> > When some HP sytems boot without x2apic_phys, there will be
>
> Should mention the approximate models of the systems affected - is it
> just a specific line of systems, or a wider range of systems affected?
>
> This will inform users and will help maintainers like me to prioritize
> the merging and backporting of patches.
>
> Thanks,
>
> Ingo

Hi Ingo,

Due to some HW limitation, HP ProLiant DL980 G7 Server has the BIT ACPI_FADT_APIC_PHYSICAL set in BIOS.

This model of systems already shipped. It is great if some backporting could be done.

Thanks,

Stoney

Ingo Molnar

unread,
Feb 4, 2013, 6:03:18 AM2/4/13
to Wang, Song-Bo (Stoney), Yinghai Lu, mingo.ke...@gmail.com, H. Peter Anvin, Thomas Gleixner, Zhang, Lin-Bao (Linux Kernel R&D), Pearson, Greg, linux-...@vger.kernel.org, suresh....@intel.com, Ingo Molnar

* Wang, Song-Bo (Stoney) <song-b...@hp.com> wrote:

> * Ingo Molnar <mingo.ke...@gmail.com> wrote:
> >
> > * Yinghai Lu <yin...@kernel.org> wrote:
> >
> > > Please check attached.
> >
> > Almost good.
> >
> > This:
> >
> > > When some HP sytems boot without x2apic_phys, there will be
> >
> > Should mention the approximate models of the systems affected - is it
> > just a specific line of systems, or a wider range of systems affected?
> >
> > This will inform users and will help maintainers like me to prioritize
> > the merging and backporting of patches.
> >
> > Thanks,
> >
> > Ingo
>
> Hi Ingo,
>
> Due to some HW limitation, HP ProLiant DL980 G7 Server has the
> BIT ACPI_FADT_APIC_PHYSICAL set in BIOS.
>
> This model of systems already shipped. It is great if some
> backporting could be done.

Ok, please re-send the last version of the patch with the model
information included in the changelog and also include a Cc:
<sta...@kernel.org> tag before your signoff line.

Thanks,

Ingo

Yinghai Lu

unread,
Feb 4, 2013, 3:30:07 PM2/4/13
to Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, Yinghai Lu, sta...@kernel.org
From: Stoney Wang <song-b...@hp.com>

When HP ProLiant DL980 G7 Server boot without x2apic_phys, there will be
intermittent lost interrupts and could result in a hang or data loss.

Those systems only work with x2apic phys mode so BIOS set
ACPI_FADT_APIC_PHYSICAL in FADT table.
Also because all apicids are less then 255, BIOS need to pass the control
to the OS with xapic mode, according to x2apic-spec, chapter 2.9.

Current code handle x2apic when BIOS pass with xapic mode:

When user specify x2apic_phys, or FADT indicates PHYSICAL.
During madt oem check, apic driver is set with xapic logical or
xapic phys driver at first.
Later enable_IR_x2apic() will enable x2apic_mode.
After that, x2apic_phys_probe() will install right x2apic phys driver
if user specify x2apic_phys,
but will skip and let x2apic_cluster_probe to take over to install
wrong apic driver (x2apic cluster) when FADT indicates
PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL.

Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the problem.

-v3: update the change according to Ingo.

[ changelog, and simplify code - Yinghai Lu ]
Signed-off-by: Stoney Wang <song-b...@hp.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>
Cc: sta...@kernel.org

---
arch/x86/kernel/apic/x2apic_phys.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/apic/x2apic_phys.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/x2apic_phys.c
+++ linux-2.6/arch/x86/kernel/apic/x2apic_phys.c
@@ -20,18 +20,19 @@ static int set_x2apic_phys_mode(char *ar
}
early_param("x2apic_phys", set_x2apic_phys_mode);

-static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+static bool x2apic_fadt_phys(void)
{
- if (x2apic_phys)
- return x2apic_enabled();
- else if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
- (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) &&
- x2apic_enabled()) {
+ if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
+ (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
printk(KERN_DEBUG "System requires x2apic physical mode\n");
- return 1;
+ return true;
}
- else
- return 0;
+ return false;
+}
+
+static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+ return x2apic_enabled() && (x2apic_phys || x2apic_fadt_phys());
}

static void
@@ -82,7 +83,7 @@ static void init_x2apic_ldr(void)

static int x2apic_phys_probe(void)
{
- if (x2apic_mode && x2apic_phys)
+ if (x2apic_mode && (x2apic_phys || x2apic_fadt_phys()))
return 1;

return apic == &apic_x2apic_phys;

Ingo Molnar

unread,
Feb 5, 2013, 7:24:43 AM2/5/13
to Yinghai Lu, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org

* Yinghai Lu <yin...@kernel.org> wrote:

> From: Stoney Wang <song-b...@hp.com>
>
> When HP ProLiant DL980 G7 Server boot without x2apic_phys, there will be
> intermittent lost interrupts and could result in a hang or data loss.

What does 'boot without x2apic_phys' mean?

Does it mean x2apic_phys=0 boot command line? Or, because
x2apic_phys is off by default, does it simply mean that if it's
booted with a default kernel, without any workaround specified
on the boot command line?

Thanks,

Ingo

Yinghai Lu

unread,
Feb 5, 2013, 11:53:16 AM2/5/13
to Ingo Molnar, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org
On Tue, Feb 5, 2013 at 4:24 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Yinghai Lu <yin...@kernel.org> wrote:
>
>> From: Stoney Wang <song-b...@hp.com>
>>
>> When HP ProLiant DL980 G7 Server boot without x2apic_phys, there will be
>> intermittent lost interrupts and could result in a hang or data loss.
>
> What does 'boot without x2apic_phys' mean?
>
> Does it mean x2apic_phys=0 boot command line? Or, because
> x2apic_phys is off by default, does it simply mean that if it's
> booted with a default kernel, without any workaround specified
> on the boot command line?

means that user does not append "x2apic_phys" in boot command line.

current we have:

x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.

int x2apic_phys;

static int set_x2apic_phys_mode(char *arg)
{
x2apic_phys = 1;
return 0;
}
early_param("x2apic_phys", set_x2apic_phys_mode);

if the user specify "x2apic_phys=0", kernel still enable x2apic_phys.

Now because system could enable x2apic phys mode via fadt, do we
need to make "x2apic_phys=0" works as disabling x2apic_phys and
overriding FADT ?

Thanks

Yinghai

Ingo Molnar

unread,
Feb 6, 2013, 8:17:09 AM2/6/13
to Yinghai Lu, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org

* Yinghai Lu <yin...@kernel.org> wrote:

> On Tue, Feb 5, 2013 at 4:24 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Yinghai Lu <yin...@kernel.org> wrote:
> >
> >> From: Stoney Wang <song-b...@hp.com>
> >>
> >> When HP ProLiant DL980 G7 Server boot without x2apic_phys, there will be
> >> intermittent lost interrupts and could result in a hang or data loss.
> >
> > What does 'boot without x2apic_phys' mean?
> >
> > Does it mean x2apic_phys=0 boot command line? Or, because
> > x2apic_phys is off by default, does it simply mean that if it's
> > booted with a default kernel, without any workaround specified
> > on the boot command line?
>
> means that user does not append "x2apic_phys" in boot command line.

The user does not append a whole lot of other
behavior-modification command line options either! There's no
apic=0 line either. Nor smp=0.

Adding this essentially irrelevant piece of information to the
*FIRST*, most important sentence of the changelog is thus not
just confusing but utterly misleading. Communications 101.

Instead it should say something like:

When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the x2apic_phys=1
boot option, but we want to handle this sytem without
requiring manual workarounds.

Right? Writing a clean changelog is like writing clean code -
you have to learn it if you want to contribute to the kernel
smoothly. Think of it as an engineering task: the other required
half of modifying kernel code.

Thanks,

Ingo

Yinghai Lu

unread,
Feb 6, 2013, 12:10:49 PM2/6/13
to Ingo Molnar, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org
On Wed, Feb 6, 2013 at 5:16 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Yinghai Lu <yin...@kernel.org> wrote:
>
>> On Tue, Feb 5, 2013 at 4:24 AM, Ingo Molnar <mi...@kernel.org> wrote:
>> >
>> > * Yinghai Lu <yin...@kernel.org> wrote:
>> >
>> >> From: Stoney Wang <song-b...@hp.com>
>> >>
>> >> When HP ProLiant DL980 G7 Server boot without x2apic_phys, there will be
>> >> intermittent lost interrupts and could result in a hang or data loss.
>> >
>> > What does 'boot without x2apic_phys' mean?
>> >
>> > Does it mean x2apic_phys=0 boot command line? Or, because
>> > x2apic_phys is off by default, does it simply mean that if it's
>> > booted with a default kernel, without any workaround specified
>> > on the boot command line?
>>
>> means that user does not append "x2apic_phys" in boot command line.
>
> The user does not append a whole lot of other
> behavior-modification command line options either! There's no
> apic=0 line either. Nor smp=0.
>
> Adding this essentially irrelevant piece of information to the
> *FIRST*, most important sentence of the changelog is thus not
> just confusing but utterly misleading. Communications 101.

Those system should not be out of the door with buggy bios.
Assume that HP would have release notes that require user to append
"x2apic_phys"

>
> Instead it should say something like:
>
> When a HP ProLiant DL980 G7 Server boots a regular kernel,
> there will be intermittent lost interrupts which could
> result in a hang or (in extreme cases) data loss.
>
> The reason is that this system only supports x2apic physical
> mode, while the kernel boots with a logical-cluster default
> setting.
>
> This bug can be worked around by specifying the x2apic_phys=1
> boot option, but we want to handle this sytem without
> requiring manual workarounds.
>
> Right? Writing a clean changelog is like writing clean code -
> you have to learn it if you want to contribute to the kernel
> smoothly. Think of it as an engineering task: the other required
> half of modifying kernel code.

Yes. that is clean. just want to update the workaround words to
"x2apic_phys" or "nox2apic"

This bug can be worked around by specifying "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system without
requiring manual workarounds.

Will continue to work on writing clean changelog.

Thanks

Yinghai

Yinghai Lu

unread,
Feb 7, 2013, 1:53:36 PM2/7/13
to Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, Yinghai Lu, sta...@kernel.org
From: Stoney Wang <song-b...@hp.com>

When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system without
requiring manual workarounds.

BIOS set ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are less than 255, BIOS need to pass the control
to the OS with xapic mode, according to x2apic-spec, chapter 2.9.

Current code handle x2apic when BIOS pass with xapic mode:
When user specify x2apic_phys, or FADT indicates PHYSICAL.
1. During madt oem check, apic driver is set with xapic logical or
xapic phys driver at first.
2. enable_IR_x2apic() will enable x2apic_mode.
3. if user specify x2apic_phys, x2apic_phys_probe() will install
right x2apic phys driver and use x2apic phys mode.
otherwise will skip and let x2apic_cluster_probe to take over
to install x2apic cluster driver (wrong one) even FADT indicates
PHYSICAL, because x2apic_phys_probe does not check FADT PHYSICAL.

Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the problem.

-v3: update the change according to Ingo.
-v4: merge changelog from Ingo.

tip-bot for Stoney Wang

unread,
Feb 11, 2013, 7:37:35 AM2/11/13
to linux-ti...@vger.kernel.org, linux-...@vger.kernel.org, h...@zytor.com, mi...@kernel.org, yin...@kernel.org, song-b...@hp.com, tg...@linutronix.de
Commit-ID: cb214ede7657db458fd0b2a25ea0b28dbf900ebc
Gitweb: http://git.kernel.org/tip/cb214ede7657db458fd0b2a25ea0b28dbf900ebc
Author: Stoney Wang <song-b...@hp.com>
AuthorDate: Thu, 7 Feb 2013 10:53:02 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Mon, 11 Feb 2013 11:13:00 +0100

x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems

When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.

The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.

Current code handle x2apic when BIOS pass with xapic mode
enabled:

When user specifies x2apic_phys, or FADT indicates PHYSICAL:

1. During madt oem check, apic driver is set with xapic logical
or xapic phys driver at first.

2. enable_IR_x2apic() will enable x2apic_mode.

3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
will install the correct x2apic phys driver and use x2apic phys mode.
Otherwise it will skip the driver will let x2apic_cluster_probe to
take over to install x2apic cluster driver (wrong one) even though FADT
indicates PHYSICAL, because x2apic_phys_probe does not check
FADT PHYSICAL.

Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.

Signed-off-by: Stoney Wang <song-b...@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yin...@kernel.org>
Cc: sta...@kernel.org
Link: http://lkml.kernel.org/r/1360263182-16226-1-g...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/kernel/apic/x2apic_phys.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e03a1e1..562a76d 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -20,18 +20,19 @@ static int set_x2apic_phys_mode(char *arg)

Greg Kroah-Hartman

unread,
Feb 15, 2013, 5:56:31 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Stoney Wang, Yinghai Lu, Ingo Molnar
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stoney Wang <song-b...@hp.com>

commit cb214ede7657db458fd0b2a25ea0b28dbf900ebc upstream.
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/kernel/apic/x2apic_phys.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -20,18 +20,19 @@ static int set_x2apic_phys_mode(char *ar
}
early_param("x2apic_phys", set_x2apic_phys_mode);

-static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+static bool x2apic_fadt_phys(void)
{
- if (x2apic_phys)
- return x2apic_enabled();
- else if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
- (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) &&
- x2apic_enabled()) {
+ if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
+ (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
printk(KERN_DEBUG "System requires x2apic physical mode\n");
- return 1;
+ return true;
}
- else
- return 0;
+ return false;
+}
+
+static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+ return x2apic_enabled() && (x2apic_phys || x2apic_fadt_phys());
}

static void
@@ -114,7 +115,7 @@ static void init_x2apic_ldr(void)

Greg Kroah-Hartman

unread,
Feb 15, 2013, 5:56:43 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Satoru Takeuchi, Matt Fleming, H. Peter Anvin
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Satoru Takeuchi <takeuch...@jp.fujitsu.com>

commit 1de63d60cd5b0d33a812efa455d5933bf1564a51 upstream.

There was a serious problem in samsung-laptop that its platform driver is
designed to run under BIOS and running under EFI can cause the machine to
become bricked or can cause Machine Check Exceptions.

Discussion about this problem:
https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
https://bugzilla.kernel.org/show_bug.cgi?id=47121

The patches to fix this problem:
efi: Make 'efi_enabled' a function to query EFI facilities
83e68189745ad931c2afd45d8ee3303929233e7f

samsung-laptop: Disable on EFI hardware
e0094244e41c4d0c7ad69920681972fc45d8ce34

Unfortunately this problem comes back again if users specify "noefi" option.
This parameter clears EFI_BOOT and that driver continues to run even if running
under EFI. Refer to the document, this parameter should clear
EFI_RUNTIME_SERVICES instead.

Documentation/kernel-parameters.txt:
===============================================================================
..
noefi [X86] Disable EFI runtime services support.
..
===============================================================================

Documentation/x86/x86_64/uefi.txt:
===============================================================================
..
- If some or all EFI runtime services don't work, you can try following
kernel command line parameters to turn off some or all EFI runtime
services.
noefi turn off all EFI runtime services
..
===============================================================================

Signed-off-by: Satoru Takeuchi <takeuch...@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/511C2C04...@jp.fujitsu.com
Cc: Matt Fleming <matt.f...@intel.com>
Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/platform/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -86,7 +86,7 @@ EXPORT_SYMBOL(efi_enabled);

static int __init setup_noefi(char *arg)
{
- clear_bit(EFI_BOOT, &x86_efi_facility);
+ clear_bit(EFI_RUNTIME_SERVICES, &x86_efi_facility);
return 0;
}
early_param("noefi", setup_noefi);

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:00:59 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Alexander Duyck, Jeff Pieper, Jeff Kirsher, Vinson Lee
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Duyck <alexande...@intel.com>

commit ae1c07a6b7ced6c0c94c99e3b53f4e7856fa8bff upstream.

For some reason the reading of the RQDPC register was being artificially
limited to 4K. Instead of limiting the value we should read the value and
add the full amount. Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.

Signed-off-by: Alexander Duyck <alexande...@intel.com>
Tested-by: Jeff Pieper <jeffrey....@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey....@intel.com>
Cc: Vinson Lee <vl...@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4649,11 +4649,13 @@ void igb_update_stats(struct igb_adapter
bytes = 0;
packets = 0;
for (i = 0; i < adapter->num_rx_queues; i++) {
- u32 rqdpc_tmp = rd32(E1000_RQDPC(i)) & 0x0FFF;
+ u32 rqdpc = rd32(E1000_RQDPC(i));
struct igb_ring *ring = adapter->rx_ring[i];

- ring->rx_stats.drops += rqdpc_tmp;
- net_stats->rx_fifo_errors += rqdpc_tmp;
+ if (rqdpc) {
+ ring->rx_stats.drops += rqdpc;
+ net_stats->rx_fifo_errors += rqdpc;
+ }

do {
start = u64_stats_fetch_begin_bh(&ring->rx_syncp);

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:01:23 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Dan Rosenberg, Brad Spengler, Kees Cook, H. Peter Anvin, Paul E. McKenney, Frederic Weisbecker, Eric W. Biederman, Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <kees...@chromium.org>

commit e575a86fdc50d013bf3ad3aa81d9100e8e6cc60d upstream.

Without this patch, it is trivial to determine kernel page
mappings by examining the error code reported to dmesg[1].
Instead, declare the entire kernel memory space as a violation
of a present page.

Additionally, since show_unhandled_signals is enabled by
default, switch branch hinting to the more realistic
expectation, and unobfuscate the setting of the PF_PROT bit to
improve readability.

[1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/

Reported-by: Dan Rosenberg <dan.j.r...@gmail.com>
Suggested-by: Brad Spengler <spe...@grsecurity.net>
Signed-off-by: Kees Cook <kees...@chromium.org>
Acked-by: H. Peter Anvin <h...@zytor.com>
Cc: Paul E. McKenney <pau...@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Eric W. Biederman <ebie...@xmission.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Link: http://lkml.kernel.org/r/20130207174...@www.outflux.net
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/mm/fault.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -747,13 +747,15 @@ __bad_area_nosemaphore(struct pt_regs *r
return;
}
#endif
+ /* Kernel addresses are always protection faults: */
+ if (address >= TASK_SIZE)
+ error_code |= PF_PROT;

- if (unlikely(show_unhandled_signals))
+ if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);

- /* Kernel addresses are always protection faults: */
tsk->thread.cr2 = address;
- tsk->thread.error_code = error_code | (address >= TASK_SIZE);
+ tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_PF;

force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0);

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:01:35 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, torv...@linux-foundation.org, ak...@linux-foundation.org, sta...@vger.kernel.org
This is the start of the stable review cycle for the 3.4.32 release.
There are 8 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Feb 17 22:53:47 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.32-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gre...@linuxfoundation.org>
Linux 3.4.32-rc1

Alexander Duyck <alexande...@intel.com>
igb: Remove artificial restriction on RQDPC stat reading

Satoru Takeuchi <takeuch...@jp.fujitsu.com>
efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter

Rafael J. Wysocki <r...@sisk.pl>
PCI/PM: Clean up PME state when removing a device

Jan Beulich <JBeu...@suse.com>
x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.

Mel Gorman <mgo...@suse.de>
x86/mm: Check if PUD is large when validating a kernel address

Stoney Wang <song-b...@hp.com>
x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems

Kees Cook <kees...@chromium.org>
x86: Do not leak kernel page mapping locations

Heiko Carstens <heiko.c...@de.ibm.com>
s390/timer: avoid overflow when programming clock comparator


-------------

Diffstat:

Makefile | 4 ++--
arch/s390/kernel/time.c | 3 +++
arch/x86/include/asm/pgtable.h | 5 +++++
arch/x86/kernel/apic/x2apic_phys.c | 21 +++++++++++----------
arch/x86/mm/fault.c | 8 +++++---
arch/x86/mm/init_64.c | 3 +++
arch/x86/platform/efi/efi.c | 2 +-
arch/x86/xen/xen-asm_32.S | 14 +++++++-------
drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
drivers/pci/remove.c | 2 ++
10 files changed, 44 insertions(+), 26 deletions(-)

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:01:47 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Petr Matousek, Andrew Cooper, Jan Beulich, Konrad Rzeszutek Wilk
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Beulich <JBeu...@suse.com>

commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.

This fixes CVE-2013-0228 / XSA-42

Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:

-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]

Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G D ---------------
2.6.32-356.el6.i686 #1
Call Trace:
[<c08476df>] ? panic+0x6e/0x122
[<c084b63c>] ? oops_end+0xbc/0xd0
[<c084b260>] ? do_general_protection+0x0/0x210
[<c084a9b7>] ? error_code+0x73/
-------------

Petr says: "
I've analysed the bug and I think that xen_iret() cannot cope with
mangled DS, in this case zeroed out (null selector/descriptor) by either
xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
entry was invalidated by the reproducer. "

Jan took a look at the preliminary patch and came up a fix that solves
this problem:

"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."

The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.

Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment. In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.

Reviewed-by: Petr Matousek <pmat...@redhat.com>
Reported-by: Petr Matousek <pmat...@redhat.com>
Reviewed-by: Andrew Cooper <andrew....@citrix.com>
Signed-off-by: Jan Beulich <jbeu...@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/xen/xen-asm_32.S | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -88,11 +88,11 @@ ENTRY(xen_iret)
*/
#ifdef CONFIG_SMP
GET_THREAD_INFO(%eax)
- movl TI_cpu(%eax), %eax
- movl __per_cpu_offset(,%eax,4), %eax
- mov xen_vcpu(%eax), %eax
+ movl %ss:TI_cpu(%eax), %eax
+ movl %ss:__per_cpu_offset(,%eax,4), %eax
+ mov %ss:xen_vcpu(%eax), %eax
#else
- movl xen_vcpu, %eax
+ movl %ss:xen_vcpu, %eax
#endif

/* check IF state we're restoring */
@@ -105,11 +105,11 @@ ENTRY(xen_iret)
* resuming the code, so we don't have to be worried about
* being preempted to another CPU.
*/
- setz XEN_vcpu_info_mask(%eax)
+ setz %ss:XEN_vcpu_info_mask(%eax)
xen_iret_start_crit:

/* check for unmasked and pending */
- cmpw $0x0001, XEN_vcpu_info_pending(%eax)
+ cmpw $0x0001, %ss:XEN_vcpu_info_pending(%eax)

/*
* If there's something pending, mask events again so we can
@@ -117,7 +117,7 @@ xen_iret_start_crit:
* touch XEN_vcpu_info_mask.
*/
jne 1f
- movb $1, XEN_vcpu_info_mask(%eax)
+ movb $1, %ss:XEN_vcpu_info_mask(%eax)

1: popl %eax

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:01:58 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Christian Borntraeger, Heiko Carstens
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <heiko.c...@de.ibm.com>

commit d911e03d097bdc01363df5d81c43f69432eb785c upstream.

Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function
is used to avoid overflows when converting TOD format values to nanosecond
values.
The kvm interrupt code formerly however only worked by accident because of
an overflow. It tried to program a timer that would expire in more than ~29
years. Because of the old TOD-to-nanoseconds overflow bug the real expiry
value however was much smaller, but now it isn't anymore.
This however triggers yet another bug in the function that programs the clock
comparator s390_next_ktime(): if the absolute "expires" value is after 2042
this will result in an overflow and the programmed value is lower than the
current TOD value which immediatly triggers a clock comparator (= timer)
interrupt.
Since the timer isn't expired it will be programmed immediately again and so
on... the result is a dead system.
To fix this simply program the maximum possible value if an overflow is
detected.

Reported-by: Christian Borntraeger <bornt...@de.ibm.com>
Tested-by: Christian Borntraeger <bornt...@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.c...@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/s390/kernel/time.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -121,6 +121,9 @@ static int s390_next_ktime(ktime_t expir
nsecs = ktime_to_ns(ktime_add(timespec_to_ktime(ts), expires));
do_div(nsecs, 125);
S390_lowcore.clock_comparator = sched_clock_base_cc + (nsecs << 9);
+ /* Program the maximum value if we have an overflow (== year 2042) */
+ if (unlikely(S390_lowcore.clock_comparator < sched_clock_base_cc))
+ S390_lowcore.clock_comparator = -1ULL;
set_clock_comparator(S390_lowcore.clock_comparator);
return 0;

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:02:17 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Rafael J. Wysocki, Bjorn Helgaas
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Rafael J. Wysocki" <r...@sisk.pl>

commit 249bfb83cf8ba658955f0245ac3981d941f746ee upstream.

Devices are added to pci_pme_list when drivers use pci_enable_wake()
or pci_wake_from_d3(), but they aren't removed from the list unless
the driver explicitly disables wakeup. Many drivers never disable
wakeup, so their devices remain on the list even after they are
removed, e.g., via hotplug. A subsequent PME poll will oops when
it tries to touch the device.

This patch disables PME# on a device before removing it, which removes
the device from pci_pme_list. This is safe even if the device never
had PME# enabled.

This oops can be triggered by unplugging a Thunderbolt ethernet adapter
on a Macbook Pro, as reported by Daniel below.

[bhelgaas: changelog]
Reference: http://lkml.kernel.org/r/CAMVG2svG21yiM1wkH4_2pen2...@mail.gmail.com
Reported-and-tested-by: Daniel J Blueman <dan...@quora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>
Signed-off-by: Bjorn Helgaas <bhel...@google.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/pci/remove.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -19,6 +19,8 @@ static void pci_free_resources(struct pc

static void pci_stop_dev(struct pci_dev *dev)
{
+ pci_pme_active(dev, false);
+
if (dev->is_added) {
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);

Greg Kroah-Hartman

unread,
Feb 15, 2013, 6:02:34 PM2/15/13
to linux-...@vger.kernel.org, Greg Kroah-Hartman, sta...@vger.kernel.org, Mel Gorman, Rik van Riel, Michal Hocko, Johannes Weiner, Ingo Molnar, linu...@kvack.org
3.4-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgo...@suse.de>

commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream.

A user reported the following oops when a backup process reads
/proc/kcore:

BUG: unable to handle kernel paging request at ffffbb00ff33b000
IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
[...]

Call Trace:
[<ffffffff811b8aaa>] read_kcore+0x17a/0x370
[<ffffffff811ad847>] proc_reg_read+0x77/0xc0
[<ffffffff81151687>] vfs_read+0xc7/0x130
[<ffffffff811517f3>] sys_read+0x53/0xa0
[<ffffffff81449692>] system_call_fastpath+0x16/0x1b

Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.

The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD. If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.

This patch adds the necessary pud_large() check.

Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.

Signed-off-by: Mel Gorman <mgo...@suse.de>
Reviewed-by: Rik van Riel <ri...@redhat.coM>
Reviewed-by: Michal Hocko <mho...@suse.cz>
Acked-by: Johannes Weiner <han...@cmpxchg.org>
Cc: linu...@kvack.org
Link: http://lkml.kernel.org/r/20130211145...@suse.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/include/asm/pgtable.h | 5 +++++
arch/x86/mm/init_64.c | 3 +++
2 files changed, 8 insertions(+)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -142,6 +142,11 @@ static inline unsigned long pmd_pfn(pmd_
return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
}

+static inline unsigned long pud_pfn(pud_t pud)
+{
+ return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
#define pte_page(pte) pfn_to_page(pte_pfn(pte))

static inline int pmd_large(pmd_t pte)
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -821,6 +821,9 @@ int kern_addr_valid(unsigned long addr)
if (pud_none(*pud))
return 0;

+ if (pud_large(*pud))
+ return pfn_valid(pud_pfn(*pud));
+
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
return 0;

Shuah Khan

unread,
Feb 16, 2013, 5:09:48 PM2/16/13
to Greg Kroah-Hartman, linux-...@vger.kernel.org, torv...@linux-foundation.org, ak...@linux-foundation.org, sta...@vger.kernel.org
On Fri, Feb 15, 2013 at 3:56 PM, Greg Kroah-Hartman
<gre...@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 3.4.32 release.
> There are 8 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Feb 17 22:53:47 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.32-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
> -------------

Patches applied cleanly to 3.0.64, 3.4.31, and 3.7.8.
Compiled and booted on the following systems:
HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics

Reviewed patches

Cross-compile tests results:
alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.7.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.7.y.
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah

Lin-Bao Zhang

unread,
Feb 18, 2013, 3:52:24 AM2/18/13
to Yinghai Lu, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org, linbao...@hp.com
Hi Yinghai ,
this patch has been committed into some git repository ? if yes ,
which repository ?
I just " git log arch/x86/kernel/apic/x2apic_phys.c " on
"linux-stable" , but I could not find it.
i guess maybe you will put it into another repository ,thanks very much!

Yinghai Lu

unread,
Feb 18, 2013, 12:13:57 PM2/18/13
to Lin-Bao Zhang, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-...@vger.kernel.org, Stoney Wang, sta...@kernel.org, linbao...@hp.com
On Mon, Feb 18, 2013 at 12:52 AM, Lin-Bao Zhang <2004....@gmail.com> wrote:
> Hi Yinghai ,
> this patch has been committed into some git repository ? if yes ,
> which repository ?
> I just " git log arch/x86/kernel/apic/x2apic_phys.c " on
> "linux-stable" , but I could not find it.
> i guess maybe you will put it into another repository ,thanks very much!

It is already in linus's tree.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=cb214ede7657db458fd0b2a25ea0b28dbf900ebc

Stable tree should have that soon.

Thanks

Yinghai
0 new messages