Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[patch] Ignore MCFG if the mmconfig area isn't reserved in the e820 table

341 views
Skip to first unread message

Arjan van de Ven

unread,
Mar 23, 2006, 1:30:15 PM3/23/06
to
Hi,

There have been several machines that don't have a working MMCONFIG,
often because of a buggy MCFG table in the ACPI bios. This patch adds a
simple sanity check that detects a whole bunch of these cases, and when
it detects it, linux now boots rather than crash-and-burns.

Signed-off-by: Arjan van de Ven <ar...@linux.intel.com>

diff -purN linux-2.6.16/arch/i386/kernel/setup.c linux-2.6.16-mmconfig/arch/i386/kernel/setup.c
--- linux-2.6.16/arch/i386/kernel/setup.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/arch/i386/kernel/setup.c 2006-03-23 17:26:13.000000000 +0100
@@ -1377,6 +1377,31 @@ static void __init register_memory(void)
pci_mem_start, gapstart, gapsize);
}

+
+
+/*
+ * Check if an address is reserved in the e820 map
+ */
+int is_e820_reserved(u64 address)
+{
+ int i;
+ i = e820.nr_map;
+ while (--i >= 0) {
+ unsigned long long start = e820.map[i].addr;
+ unsigned long long end = start + e820.map[i].size;
+
+ if (address <=end && address >= start) {
+ if (e820.map[i].type == E820_RESERVED)
+ return 1;
+ else
+ return 0;
+ }
+ }
+ return 0;
+}
+
+
+
/* Use inline assembly to define this because the nops are defined
as inline assembly strings in the include files and we cannot
get them easily into strings. */
diff -purN linux-2.6.16/arch/i386/pci/mmconfig.c linux-2.6.16-mmconfig/arch/i386/pci/mmconfig.c
--- linux-2.6.16/arch/i386/pci/mmconfig.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/arch/i386/pci/mmconfig.c 2006-03-23 17:26:13.000000000 +0100
@@ -12,6 +12,7 @@
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/acpi.h>
+#include <asm/e820.h>
#include "pci.h"

#define mmcfg_virt_addr ((void __iomem *) fix_to_virt(FIX_PCIE_MCFG))
@@ -183,6 +184,17 @@ static int __init pci_mmcfg_init(void)
(pci_mmcfg_config[0].base_address == 0))
goto out;

+ /*
+ * several bioses have a buggy MCFG table. While this is hard
+ * to test for conclusively, we know the value is defective
+ * if the memory isn't marked reserved in the e820 table
+ */
+ if (!is_e820_reserved(pci_mmcfg_config[0].base_address)) {
+ printk(KERN_INFO "PCI: BIOS Bug: MCFG area is not reserved\n");
+ printk(KERN_INFO "PCI: Not using MMCONFIG.\n");
+ goto out;
+ }
+
printk(KERN_INFO "PCI: Using MMCONFIG\n");
raw_pci_ops = &pci_mmcfg;
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff -purN linux-2.6.16/arch/x86_64/kernel/e820.c linux-2.6.16-mmconfig/arch/x86_64/kernel/e820.c
--- linux-2.6.16/arch/x86_64/kernel/e820.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/arch/x86_64/kernel/e820.c 2006-03-23 17:27:04.000000000 +0100
@@ -639,3 +639,25 @@ __init void e820_setup_gap(void)
printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
pci_mem_start, gapstart, gapsize);
}
+
+/*
+ * Check if an address is reserved in the e820 map
+ */
+int is_e820_reserved(u64 address)
+{
+ int i;
+ i = e820.nr_map;
+ while (--i >= 0) {
+ unsigned long long start = e820.map[i].addr;
+ unsigned long long end = start + e820.map[i].size;
+
+ if (address <=end && address >= start) {
+ if (e820.map[i].type == E820_RESERVED)
+ return 1;
+ else
+ return 0;
+ }
+ }
+ return 0;
+}
+
diff -purN linux-2.6.16/arch/x86_64/pci/mmconfig.c linux-2.6.16-mmconfig/arch/x86_64/pci/mmconfig.c
--- linux-2.6.16/arch/x86_64/pci/mmconfig.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/arch/x86_64/pci/mmconfig.c 2006-03-23 17:34:06.000000000 +0100
@@ -9,6 +9,7 @@
#include <linux/init.h>
#include <linux/acpi.h>
#include <linux/bitmap.h>
+#include <asm/e820.h>
#include "pci.h"

#define MMCONFIG_APER_SIZE (256*1024*1024)
@@ -161,6 +162,19 @@ static int __init pci_mmcfg_init(void)
(pci_mmcfg_config[0].base_address == 0))
return 0;

+
+ /*
+ * several bioses have a buggy MCFG table. While this is hard
+ * to test for conclusively, we know the value is defective
+ * if the memory isn't marked reserved in the e820 table
+ */
+ if (!is_e820_reserved(pci_mmcfg_config[0].base_address)) {
+ printk(KERN_INFO "PCI: BIOS Bug: MCFG area is not reserved\n");
+ printk(KERN_INFO "PCI: Not using MMCONFIG.\n");
+ return 0;
+ }
+
+
/* RED-PEN i386 doesn't do _nocache right now */
pci_mmcfg_virt = kmalloc(sizeof(*pci_mmcfg_virt) * pci_mmcfg_config_num, GFP_KERNEL);
if (pci_mmcfg_virt == NULL) {
diff -purN linux-2.6.16/include/asm-i386/e820.h linux-2.6.16-mmconfig/include/asm-i386/e820.h
--- linux-2.6.16/include/asm-i386/e820.h 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/include/asm-i386/e820.h 2006-03-23 17:26:13.000000000 +0100
@@ -35,6 +35,9 @@ struct e820map {
};

extern struct e820map e820;
+
+extern int is_e820_reserved(u64 address);
+
#endif/*!__ASSEMBLY__*/

#endif/*__E820_HEADER*/
diff -purN linux-2.6.16/include/asm-x86_64/e820.h linux-2.6.16-mmconfig/include/asm-x86_64/e820.h
--- linux-2.6.16/include/asm-x86_64/e820.h 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.16-mmconfig/include/asm-x86_64/e820.h 2006-03-23 17:27:41.000000000 +0100
@@ -47,6 +47,7 @@ extern void contig_e820_setup(void);
extern unsigned long e820_end_of_ram(void);
extern void e820_reserve_resources(void);
extern void e820_print_map(char *who);
+extern int is_e820_reserved(u64 address);
extern int e820_mapped(unsigned long start, unsigned long end, unsigned type);

extern void e820_bootmem_free(pg_data_t *pgdat, unsigned long start,unsigned long end);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andi Kleen

unread,
Mar 23, 2006, 1:40:19 PM3/23/06
to
On Thursday 23 March 2006 19:22, Arjan van de Ven wrote:
> Hi,
>
> There have been several machines that don't have a working MMCONFIG,
> often because of a buggy MCFG table in the ACPI bios. This patch adds a
> simple sanity check that detects a whole bunch of these cases, and when
> it detects it, linux now boots rather than crash-and-burns.

Yes, MCFG is a pain recently. Looks like we did the grave mistake
of using something in the BIOS before Windows again.

> +
> +/*
> + * Check if an address is reserved in the e820 map
> + */
> +int is_e820_reserved(u64 address)
> +{
> + int i;
> + i = e820.nr_map;
> + while (--i >= 0) {
> + unsigned long long start = e820.map[i].addr;
> + unsigned long long end = start + e820.map[i].size;
> +
> + if (address <=end && address >= start) {
> + if (e820.map[i].type == E820_RESERVED)
> + return 1;
> + else
> + return 0;
> + }
> + }
> + return 0;
> +}

That is e820_mapped(address, address+size, E820_RESERVED)

And not having a size is definitely wrong on i386 too.

-Andi

Arjan van de Ven

unread,
Mar 23, 2006, 2:10:12 PM3/23/06
to

> That is e820_mapped(address, address+size, E820_RESERVED)
>
> And not having a size is definitely wrong on i386 too.

s/wrong/not selective enough/

and e820_mapped doesn't check this either anyway, at least not the way
you imply it does.

I'll do a new patch using this for x86_64 though, no need to make a
second function like this.

Arjan van de Ven

unread,
Mar 23, 2006, 2:20:16 PM3/23/06
to
On Thu, 2006-03-23 at 20:02 +0100, Arjan van de Ven wrote:
> > That is e820_mapped(address, address+size, E820_RESERVED)
> >
> > And not having a size is definitely wrong on i386 too.
>
> s/wrong/not selective enough/
>
> and e820_mapped doesn't check this either anyway, at least not the way
> you imply it does.
>
> I'll do a new patch using this for x86_64 though, no need to make a
> second function like this.

There have been several machines that don't have a working MMCONFIG,
often because of a buggy MCFG table in the ACPI bios. This patch adds a
simple sanity check that detects a whole bunch of these cases, and when

it detects it, linux now boots rather than crash-and-burns. The accuracy
of this detection can in principle be improved if there was a "is this
entire range in e820 with THIS attribute", but no such function exist
and the complexity needed for this is not really worth it; this simple
check already catches most cases anyway.

Signed-off-by: Arjan van de Ven <ar...@linux.intel.com>

diff -purN linux-2.6.16/arch/i386/kernel/setup.c linux-2.6.16-mmconfig/arch/i386/kernel/setup.c
--- linux-2.6.16/arch/i386/kernel/setup.c 2006-03-20 06:53:29.000000000 +0100

+++ linux-2.6.16-mmconfig/arch/i386/kernel/setup.c 2006-03-23 20:06:22.000000000 +0100
@@ -1377,6 +1377,27 @@ static void __init register_memory(void)
pci_mem_start, gapstart, gapsize);
}


+/*
+ * Check if an address is reserved in the e820 map
+ */
+int is_e820_reserved(u64 address)
+{
+ int i;
+ i = e820.nr_map;
+ while (--i >= 0) {
+ unsigned long long start = e820.map[i].addr;
+ unsigned long long end = start + e820.map[i].size;
+
+ if (address <=end && address >= start) {
+ if (e820.map[i].type == E820_RESERVED)
+ return 1;
+ else
+ return 0;
+ }
+ }
+ return 0;
+}

+
/* Use inline assembly to define this because the nops are defined
as inline assembly strings in the include files and we cannot
get them easily into strings. */
diff -purN linux-2.6.16/arch/i386/pci/mmconfig.c linux-2.6.16-mmconfig/arch/i386/pci/mmconfig.c
--- linux-2.6.16/arch/i386/pci/mmconfig.c 2006-03-20 06:53:29.000000000 +0100

+++ linux-2.6.16-mmconfig/arch/i386/pci/mmconfig.c 2006-03-23 20:06:22.000000000 +0100

+++ linux-2.6.16-mmconfig/arch/x86_64/kernel/e820.c 2006-03-23 20:05:33.000000000 +0100
@@ -80,6 +80,11 @@ static inline int bad_addr(unsigned long
return 0;
}

+/*
+ * This function returns 1 if any part of the <start, end> range is in the
+ * E820 map having "type". There may be parts in this range that are not in
+ * E820 at all and/or parts with different types in addition.
+ */
int __init e820_mapped(unsigned long start, unsigned long end, unsigned type)
{
int i;


diff -purN linux-2.6.16/arch/x86_64/pci/mmconfig.c linux-2.6.16-mmconfig/arch/x86_64/pci/mmconfig.c
--- linux-2.6.16/arch/x86_64/pci/mmconfig.c 2006-03-20 06:53:29.000000000 +0100

+++ linux-2.6.16-mmconfig/arch/x86_64/pci/mmconfig.c 2006-03-23 20:08:21.000000000 +0100


@@ -9,6 +9,7 @@
#include <linux/init.h>
#include <linux/acpi.h>
#include <linux/bitmap.h>
+#include <asm/e820.h>
#include "pci.h"

#define MMCONFIG_APER_SIZE (256*1024*1024)
@@ -161,6 +162,19 @@ static int __init pci_mmcfg_init(void)
(pci_mmcfg_config[0].base_address == 0))
return 0;

+ /*
+ * several bioses have a buggy MCFG table. While this is hard
+ * to test for conclusively, we know the value is defective
+ * if the memory isn't marked reserved in the e820 table
+ */

+ if (!e820_mapped(pci_mmcfg_config[0].base_address,
+ pci_mmcfg_config[0].base_address + MMCONFIG_APER_SIZE,
+ E820_RESERVED)) {


+ printk(KERN_INFO "PCI: BIOS Bug: MCFG area is not reserved\n");
+ printk(KERN_INFO "PCI: Not using MMCONFIG.\n");

+ return 0;
+ }
+

/* RED-PEN i386 doesn't do _nocache right now */
pci_mmcfg_virt = kmalloc(sizeof(*pci_mmcfg_virt) * pci_mmcfg_config_num, GFP_KERNEL);
if (pci_mmcfg_virt == NULL) {
diff -purN linux-2.6.16/include/asm-i386/e820.h linux-2.6.16-mmconfig/include/asm-i386/e820.h
--- linux-2.6.16/include/asm-i386/e820.h 2006-03-20 06:53:29.000000000 +0100

+++ linux-2.6.16-mmconfig/include/asm-i386/e820.h 2006-03-23 20:06:22.000000000 +0100


@@ -35,6 +35,9 @@ struct e820map {
};

extern struct e820map e820;
+
+extern int is_e820_reserved(u64 address);
+
#endif/*!__ASSEMBLY__*/

#endif/*__E820_HEADER*/

-

Andi Kleen

unread,
Mar 24, 2006, 7:20:12 AM3/24/06
to
Arjan van de Ven <ar...@linux.intel.com> writes:

> On Thu, 2006-03-23 at 20:02 +0100, Arjan van de Ven wrote:
> > > That is e820_mapped(address, address+size, E820_RESERVED)
> > >
> > > And not having a size is definitely wrong on i386 too.
> >
> > s/wrong/not selective enough/
> >
> > and e820_mapped doesn't check this either anyway, at least not the way
> > you imply it does.
> >
> > I'll do a new patch using this for x86_64 though, no need to make a
> > second function like this.
>
>
> There have been several machines that don't have a working MMCONFIG,
> often because of a buggy MCFG table in the ACPI bios. This patch adds a
> simple sanity check that detects a whole bunch of these cases, and when
> it detects it, linux now boots rather than crash-and-burns. The accuracy
> of this detection can in principle be improved if there was a "is this
> entire range in e820 with THIS attribute", but no such function exist
> and the complexity needed for this is not really worth it; this simple
> check already catches most cases anyway.

I added the patch to my patchkit now. I also have an older patch (needs a bit
more cleanup) that checks for all busses if they are reachable using MCFG
Still needs some more work and interaction check with PCI hotplug though.


-Andi

Arjan van de Ven

unread,
Mar 24, 2006, 8:40:20 AM3/24/06
to
Andi Kleen wrote:
> I added the patch to my patchkit now.

thanks

> I also have an older patch (needs a bit
> more cleanup) that checks for all busses if they are reachable using MCFG
> Still needs some more work and interaction check with PCI hotplug though.
>

yes more advanced tests are going to be useful; at least this one was simple and catches
a large portion of the problem cases.

Ashok Raj

unread,
Mar 24, 2006, 10:30:20 AM3/24/06
to
On Thu, Mar 23, 2006 at 11:15:19AM -0800, Arjan van de Ven wrote:
>
> >
> > I'll do a new patch using this for x86_64 though, no need to make a
> > second function like this.
>
> int __init e820_mapped(unsigned long start, unsigned long end,
> unsigned type)


Why not use the same type of function like x86_64 as well instead of the newly
added is_820_mapped()? If the purpose of both functions is the same, i386 could benefit
with same style code instead of a slight variant.

--
Cheers,
Ashok Raj
- Open Source Technology Center

Arjan van de Ven

unread,
Mar 24, 2006, 10:30:21 AM3/24/06
to
Ashok Raj wrote:
> On Thu, Mar 23, 2006 at 11:15:19AM -0800, Arjan van de Ven wrote:
>> >
>> > I'll do a new patch using this for x86_64 though, no need to make a
>> > second function like this.
>>
>> int __init e820_mapped(unsigned long start, unsigned long end,
>> unsigned type)
>
>
> Why not use the same type of function like x86_64 as well instead of the newly
> added is_820_mapped()? If the purpose of both functions is the same, i386 could benefit
> with same style code instead of a slight variant.

the purpose is not the same. the e820_mapped function is far less strict in its check
(I'm still afraid it is too weak for this purpose actually)

and it's not is_e820_mapped but is_e820_reserved()

Arjan van de Ven

unread,
Mar 24, 2006, 10:50:09 AM3/24/06
to
Andi Kleen wrote:
> In theory they should be the same. What do you think is different?

in practice the x86-64 version returns "success" if there is one byte in the entire
memory range that complies with the requested type, even if the rest of the range is
of another type. What the ideal is for the purpose here is "is the entire range reserved",
but for now I'll settle for "is the start address reserved".

(and yes you can express the "is the start address reserved" as a question to the current function for
a 1 byte range, I probably should do that I suppose)

Ashok Raj

unread,
Mar 24, 2006, 10:50:09 AM3/24/06
to
On Fri, Mar 24, 2006 at 04:42:17PM +0100, Arjan van de Ven wrote:
> Andi Kleen wrote:
> > In theory they should be the same. What do you think is different?
>
> in practice the x86-64 version returns "success" if there is one byte in the entire
> memory range that complies with the requested type, even if the rest of the range is
> of another type. What the ideal is for the purpose here is "is the entire range reserved",
> but for now I'll settle for "is the start address reserved".
>
> (and yes you can express the "is the start address reserved" as a question to the current function for
> a 1 byte range, I probably should do that I suppose)

or why not check

if (type == ei->type && start >= ei->addr && end <= (ei->addr + ei->size))
return 1;

will this make the range check check stricter?


--
Cheers,
Ashok Raj
- Open Source Technology Center

Andi Kleen

unread,
Mar 24, 2006, 10:50:10 AM3/24/06
to
On Friday 24 March 2006 16:42, Arjan van de Ven wrote:
> Andi Kleen wrote:
> > In theory they should be the same. What do you think is different?
>
> in practice the x86-64 version returns "success" if there is one byte in the entire
> memory range that complies with the requested type, even if the rest of the range is
> of another type.

I would consider that a bug. Please send fix.

-Andi

Andi Kleen

unread,
Mar 24, 2006, 10:50:10 AM3/24/06
to
On Friday 24 March 2006 16:24, Arjan van de Ven wrote:
> Ashok Raj wrote:
> > On Thu, Mar 23, 2006 at 11:15:19AM -0800, Arjan van de Ven wrote:
> >> >
> >> > I'll do a new patch using this for x86_64 though, no need to make a
> >> > second function like this.
> >>
> >> int __init e820_mapped(unsigned long start, unsigned long end,
> >> unsigned type)
> >
> >
> > Why not use the same type of function like x86_64 as well instead of the newly
> > added is_820_mapped()? If the purpose of both functions is the same, i386 could benefit
> > with same style code instead of a slight variant.
>
> the purpose is not the same. the e820_mapped function is far less strict in its check
> (I'm still afraid it is too weak for this purpose actually)

In theory they should be the same. What do you think is different?

>

> and it's not is_e820_mapped but is_e820_reserved()

That's just a special case.

-Andi

Arjan van de Ven

unread,
Mar 24, 2006, 11:00:17 AM3/24/06
to
Ashok Raj wrote:
> On Fri, Mar 24, 2006 at 04:42:17PM +0100, Arjan van de Ven wrote:
>> Andi Kleen wrote:
>>> In theory they should be the same. What do you think is different?
>> in practice the x86-64 version returns "success" if there is one byte in the entire
>> memory range that complies with the requested type, even if the rest of the range is
>> of another type. What the ideal is for the purpose here is "is the entire range reserved",
>> but for now I'll settle for "is the start address reserved".
>>
>> (and yes you can express the "is the start address reserved" as a question to the current function for
>> a 1 byte range, I probably should do that I suppose)
>
> or why not check
>
> if (type == ei->type && start >= ei->addr && end <= (ei->addr + ei->size))
> return 1;
>
> will this make the range check check stricter?

that's not going to cut it; you can have 2 e820 entries that together span the range

Arjan van de Ven

unread,
Mar 24, 2006, 11:00:20 AM3/24/06
to
Andi Kleen wrote:
> On Friday 24 March 2006 16:42, Arjan van de Ven wrote:
>> Andi Kleen wrote:
>>> In theory they should be the same. What do you think is different?
>> in practice the x86-64 version returns "success" if there is one byte in the entire
>> memory range that complies with the requested type, even if the rest of the range is
>> of another type.
>
> I would consider that a bug. Please send fix.

I'm less sure. It's what the function does, and I can see very valid usage models for it;
to detect that a certain type is NOT present. And the code is clearly written with that goal
in mind at least. I'm tempted to write a real range function but I also was hoping to avoid
doing that, since for the MCFG test it really is a bit overkill.

Greg KH

unread,
Mar 24, 2006, 4:30:27 PM3/24/06
to
On Fri, Mar 24, 2006 at 01:19:02PM +0100, Andi Kleen wrote:
> Arjan van de Ven <ar...@linux.intel.com> writes:
>
> > On Thu, 2006-03-23 at 20:02 +0100, Arjan van de Ven wrote:
> > > > That is e820_mapped(address, address+size, E820_RESERVED)
> > > >
> > > > And not having a size is definitely wrong on i386 too.
> > >
> > > s/wrong/not selective enough/
> > >
> > > and e820_mapped doesn't check this either anyway, at least not the way
> > > you imply it does.
> > >
> > > I'll do a new patch using this for x86_64 though, no need to make a
> > > second function like this.
> >
> >
> > There have been several machines that don't have a working MMCONFIG,
> > often because of a buggy MCFG table in the ACPI bios. This patch adds a
> > simple sanity check that detects a whole bunch of these cases, and when
> > it detects it, linux now boots rather than crash-and-burns. The accuracy
> > of this detection can in principle be improved if there was a "is this
> > entire range in e820 with THIS attribute", but no such function exist
> > and the complexity needed for this is not really worth it; this simple
> > check already catches most cases anyway.
>
> I added the patch to my patchkit now. I also have an older patch (needs a bit
> more cleanup) that checks for all busses if they are reachable using MCFG
> Still needs some more work and interaction check with PCI hotplug though.

If you need help with that, please let me know.

Otherwise I'll let you push this to Linus when you feel it's ready :)

thanks,

greg k-h

0 new messages