Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] Fix boot-time hang on G31/G33 PC

10 views
Skip to first unread message

Matthew Wilcox

unread,
Aug 25, 2007, 10:00:14 PM8/25/07
to

This patch, loosely based on a patch from Robert Hancock, which was in
turn based on a patch from Jesse Barnes, fixes a boot-time hang on my
shiny new PC. The 'conflict' mentioned in the patch in my case happens
to be between mmconfig and the graphics card, but it could easily be
between any pair of devices if they are left enabled by the BIOS and
mappen in the wrong place.

Signed-off-by: Matthew Wilcox <mat...@wil.cx>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 34b8dae..51ef450 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -180,11 +180,26 @@ static inline int is_64bit_memory(u32 mask)
return 0;
}

+/*
+ * Sizing PCI BARs requires us to disable decoding, otherwise we may run
+ * into conflicts with other devices while trying to size the BAR. Normally
+ * this isn't a problem, but it happens on some machines normally, and can
+ * happen on others during PCI device hotplug. Don't disable BARs for host
+ * bridges, though. Some of them do silly things like disable accesses to
+ * RAM from the CPU
+ */
static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
{
unsigned int pos, reg, next;
u32 l, sz;
struct resource *res;
+ u16 orig_cmd;
+
+ if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST) {
+ pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
+ pci_write_config_word(dev, PCI_COMMAND,
+ orig_cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));
+ }

for(pos=0; pos<howmany; pos = next) {
u64 l64;
@@ -283,6 +298,9 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
}
}
}
+
+ if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST)
+ pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
}

void __devinit pci_read_bridge_bases(struct pci_bus *child)

--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Robert Hancock

unread,
Aug 26, 2007, 12:30:13 AM8/26/07
to
Matthew Wilcox wrote:
> This patch, loosely based on a patch from Robert Hancock, which was in
> turn based on a patch from Jesse Barnes, fixes a boot-time hang on my
> shiny new PC. The 'conflict' mentioned in the patch in my case happens
> to be between mmconfig and the graphics card, but it could easily be
> between any pair of devices if they are left enabled by the BIOS and
> mappen in the wrong place.
>
> Signed-off-by: Matthew Wilcox <mat...@wil.cx>
>

We've already got a patch for this in Greg's PCI tree, hopefully it
should go in for 2.6.24.

Are you getting MMCONFIG enabled on your system with 2.6.23? If not this
problem shouldn't matter. In the cases I've seen that have caused
problems in the past (Intel boards mainly), where the MMCONFIG area
overlaps with where the graphics card BAR ends up during BAR sizing, the
BIOS happened to not reserve the MMCONFIG table in the E820 memory map,
so current mainline will turn off MMCONFIG. However, it's quite possible
that some systems will pass the old E820 validation check and turn on
MMCONFIG where the overlap happens..

There's a patch in Andi's tree (also hopefully for 2.6.24) to loosen the
MMCONFIG validation to check against ACPI reservations instead of the
E820 map (which isn't required to have a reservation for MMCONFIG). This
makes the disable-decode change more critical.

Matthew Wilcox

unread,
Aug 26, 2007, 9:00:14 AM8/26/07
to
On Sat, Aug 25, 2007 at 10:24:57PM -0600, Robert Hancock wrote:
> We've already got a patch for this in Greg's PCI tree, hopefully it
> should go in for 2.6.24.

I haven't seen it. I guess it wasn't sent to the PCI mailing list.

Your patch had two or three problems with it; assuming we're talking
about the same patch.

> Are you getting MMCONFIG enabled on your system with 2.6.23?

Yes, I have to pass pci=nommconf to make it boot.

--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

Matthew Wilcox

unread,
Aug 26, 2007, 10:10:06 AM8/26/07
to
On Sun, Aug 26, 2007 at 06:55:42AM -0600, Matthew Wilcox wrote:
> On Sat, Aug 25, 2007 at 10:24:57PM -0600, Robert Hancock wrote:
> > We've already got a patch for this in Greg's PCI tree, hopefully it
> > should go in for 2.6.24.
>
> I haven't seen it. I guess it wasn't sent to the PCI mailing list.
>
> Your patch had two or three problems with it; assuming we're talking
> about the same patch.

I found
http://git.kernel.org/?p=linux/kernel/git/gregkh/patches.git;a=blob;f=pci/pci-disable-decode-of-io-memory-during-bar-sizing.patch;h=958ef4e837733206a80f058aee236847eec5fbd8;hb=HEAD

which has two problems:

- With a 64-bit BAR, it checks to see if the upper 32 bits represent IO
or memory. You can't do that to the upper 32 bits.
- It does a lot of additional writes to the cmd register; my patch does two
writes per device; yours does two per BAR

It's also a lot more complex than my patch, IMO.

Greg, please drop Robert's patch and put mine in instead.

Robert Hancock

unread,
Aug 26, 2007, 2:10:06 PM8/26/07
to
Matthew Wilcox wrote:
> On Sun, Aug 26, 2007 at 06:55:42AM -0600, Matthew Wilcox wrote:
>> On Sat, Aug 25, 2007 at 10:24:57PM -0600, Robert Hancock wrote:
>>> We've already got a patch for this in Greg's PCI tree, hopefully it
>>> should go in for 2.6.24.
>> I haven't seen it. I guess it wasn't sent to the PCI mailing list.
>>
>> Your patch had two or three problems with it; assuming we're talking
>> about the same patch.
>
> I found
> http://git.kernel.org/?p=linux/kernel/git/gregkh/patches.git;a=blob;f=pci/pci-disable-decode-of-io-memory-during-bar-sizing.patch;h=958ef4e837733206a80f058aee236847eec5fbd8;hb=HEAD
>
> which has two problems:
>
> - With a 64-bit BAR, it checks to see if the upper 32 bits represent IO
> or memory. You can't do that to the upper 32 bits.
> - It does a lot of additional writes to the cmd register; my patch does two
> writes per device; yours does two per BAR
>
> It's also a lot more complex than my patch, IMO.
>
> Greg, please drop Robert's patch and put mine in instead.

Based on looking at your patch it seems OK. The first problem you
mentioned with what Jesse and I had there is definitely a valid one.

You can add my:

Acked-by: Robert Hancock <hanc...@shaw.ca>

Jesse Barnes

unread,
Aug 28, 2007, 1:30:14 PM8/28/07
to
> > Greg, please drop Robert's patch and put mine in instead.
>
> Based on looking at your patch it seems OK. The first problem you
> mentioned with what Jesse and I had there is definitely a valid one.
>
> You can add my:
>
> Acked-by: Robert Hancock <hanc...@shaw.ca>

Yeah, thanks Matthew.

Acked-by: Jesse Barnes <jesse....@intel.com>

Message has been deleted

Grant Grundler

unread,
Aug 28, 2007, 2:30:10 PM8/28/07
to
On Tue, Aug 28, 2007 at 11:59:08AM -0600, Grant Grundler wrote:

> On Sat, Aug 25, 2007 at 07:55:56PM -0600, Matthew Wilcox wrote:
> >
> > This patch, loosely based on a patch from Robert Hancock, which was in
> > turn based on a patch from Jesse Barnes, fixes a boot-time hang on my
> > shiny new PC. The 'conflict' mentioned in the patch in my case happens
> > to be between mmconfig and the graphics card, but it could easily be
> > between any pair of devices if they are left enabled by the BIOS and
> > mappen in the wrong place.
>
> This issue has come up before:
> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0212.2/0232.html

Ok...finally found the thread I was looking for:
http://www.ussg.iu.edu/hypermail/linux/kernel/0212.2/0978.html

or look at the "by Thread" page and search for "disable BAR":
http://www.ussg.iu.edu/hypermail/linux/kernel/0212.2/index.html

Main difference now is not disabling anything on any sort of Bridge.

Summary: Sizing BARs has never been a very safe operation. We have to
mitigate as best we can and then live with the remaining risks.

cheers,
grant

Greg KH

unread,
Sep 26, 2007, 5:30:10 PM9/26/07
to
Due to the issues surrounding this patch, I'm dropping it from my repo.

thanks,

greg k-h


On Sat, Aug 25, 2007 at 07:55:56PM -0600, Matthew Wilcox wrote:
>
>

> This patch, loosely based on a patch from Robert Hancock, which was in
> turn based on a patch from Jesse Barnes, fixes a boot-time hang on my
> shiny new PC. The 'conflict' mentioned in the patch in my case happens
> to be between mmconfig and the graphics card, but it could easily be
> between any pair of devices if they are left enabled by the BIOS and
> mappen in the wrong place.
>
> Signed-off-by: Matthew Wilcox <mat...@wil.cx>

> Acked-by: Robert Hancock <hanc...@shaw.ca>
> Acked-by: Jesse Barnes <jesse....@intel.com>
> Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
>
> ---
> drivers/pci/probe.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)


>
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -180,11 +180,26 @@ static inline int is_64bit_memory(u32 ma

> return 0;
> }
>
> +/*
> + * Sizing PCI BARs requires us to disable decoding, otherwise we may run
> + * into conflicts with other devices while trying to size the BAR. Normally
> + * this isn't a problem, but it happens on some machines normally, and can
> + * happen on others during PCI device hotplug. Don't disable BARs for host
> + * bridges, though. Some of them do silly things like disable accesses to
> + * RAM from the CPU
> + */
> static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
> {
> unsigned int pos, reg, next;
> u32 l, sz;
> struct resource *res;
> + u16 orig_cmd;
> +
> + if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST) {
> + pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
> + pci_write_config_word(dev, PCI_COMMAND,
> + orig_cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));
> + }
>
> for(pos=0; pos<howmany; pos = next) {
> u64 l64;
> @@ -283,6 +298,9 @@ static void pci_read_bases(struct pci_de
> }
> }
> }

> +
> + if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST)
> + pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
> }
>

> void pci_read_bridge_bases(struct pci_bus *child)

Jesse Barnes

unread,
Sep 26, 2007, 6:00:23 PM9/26/07
to
On Wednesday, September 26, 2007 2:18 pm Greg KH wrote:
> Due to the issues surrounding this patch, I'm dropping it from my
> repo.

What issues? Is it causing problems for people?

Jesse

Greg KH

unread,
Sep 26, 2007, 6:10:14 PM9/26/07
to
On Wed, Sep 26, 2007 at 02:55:55PM -0700, Jesse Barnes wrote:
> On Wednesday, September 26, 2007 2:18 pm Greg KH wrote:
> > Due to the issues surrounding this patch, I'm dropping it from my
> > repo.
>
> What issues? Is it causing problems for people?

I thought this was the patch that Ivan objected to.

thanks,

greg k-h

Jesse Barnes

unread,
Sep 26, 2007, 6:30:18 PM9/26/07
to
On Wednesday, September 26, 2007 2:56 pm Greg KH wrote:
> On Wed, Sep 26, 2007 at 02:55:55PM -0700, Jesse Barnes wrote:
> > On Wednesday, September 26, 2007 2:18 pm Greg KH wrote:
> > > Due to the issues surrounding this patch, I'm dropping it from my
> > > repo.
> >
> > What issues? Is it causing problems for people?
>
> I thought this was the patch that Ivan objected to.

Yeah, Ivan objected to this, but incorrectly I think.

Ivan, your concern is about disabling things like interrupt controllers
and power management chips during probe right? You're right that doing
that could cause problems if we get and interrupt or PMU event at just
the wrong time, but that could just as easily happen if decode was
still enabled but the BAR had a bogus address programmed (as it would
during probing).

Ultimately, I don't care much one way or another as long as we can get
the desktop platforms fixed somehow. I think disabling decode is the
most correct way of doing this, but I'm open to other solutions (this
is the only patch I've seen though that's been tested to solve the
problem).

Jesse

Robert Hancock

unread,
Sep 26, 2007, 7:10:09 PM9/26/07
to
Jesse Barnes wrote:
> On Wednesday, September 26, 2007 2:56 pm Greg KH wrote:
>> On Wed, Sep 26, 2007 at 02:55:55PM -0700, Jesse Barnes wrote:
>>> On Wednesday, September 26, 2007 2:18 pm Greg KH wrote:
>>>> Due to the issues surrounding this patch, I'm dropping it from my
>>>> repo.
>>> What issues? Is it causing problems for people?
>> I thought this was the patch that Ivan objected to.
>
> Yeah, Ivan objected to this, but incorrectly I think.
>
> Ivan, your concern is about disabling things like interrupt controllers
> and power management chips during probe right? You're right that doing
> that could cause problems if we get and interrupt or PMU event at just
> the wrong time, but that could just as easily happen if decode was
> still enabled but the BAR had a bogus address programmed (as it would
> during probing).
>
> Ultimately, I don't care much one way or another as long as we can get
> the desktop platforms fixed somehow. I think disabling decode is the
> most correct way of doing this, but I'm open to other solutions (this
> is the only patch I've seen though that's been tested to solve the
> problem).

I agree, I've yet to see a single report of an actual problem that
disabling the decode causes, nor even a theoretical problem which
couldn't already happen due to the possibility of the device being
accessed during BAR sizing, which just shouldn't be allowed since it
can't work with or without this patch.

Until and unless we have something better that fixes the real and
serious boot-time problems that we need this patch to fix, I would say
it should stay in.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hanc...@nospamshaw.ca
Home Page: http://www.roberthancock.com/

Ivan Kokshaysky

unread,
Sep 27, 2007, 10:40:14 AM9/27/07
to
On Wed, Sep 26, 2007 at 03:20:40PM -0700, Jesse Barnes wrote:
> Ivan, your concern is about disabling things like interrupt controllers
> and power management chips during probe right? You're right that doing
> that could cause problems if we get and interrupt or PMU event at just
> the wrong time, but that could just as easily happen if decode was
> still enabled but the BAR had a bogus address programmed (as it would
> during probing).

Yes, nobody is arguing that moving the BAR around is unsafe, but generally
it's the less of two evils.

The major problem here is that with IO and MEM bits cleared in the command
register you disable *all* address decoders on the device, not just ranges
that have respective BARs. At least this behaviour is required by PCI spec.
Examples:
- legacy VGA IO and memory (no corresponding BARs);
- base/limit registers of P2P bridge;
- PMU and SMBus registers (sort of normal BARs, but hidden elsewhere
in the config space);
- IDE legacy mode registers;
- IO-APIC registers (typically sort of read-only BAR).

For all of these address ranges our current BAR probe is effectively
a no-op, but disable/re-enable clearly isn't.

> Ultimately, I don't care much one way or another as long as we can get
> the desktop platforms fixed somehow. I think disabling decode is the
> most correct way of doing this, but I'm open to other solutions (this
> is the only patch I've seen though that's been tested to solve the
> problem).

There are two other solutions: one is to disable decode selectively,
only on devices or systems where it's necessary and known to be safe.
I've posted a patch which introduces "disable_while_probe" pci_dev field
for that purpose.
Another one is to delay mmconfig probe until after the PCI probe is done,
as Matthew suggested, and Robert confirmed that it's feasible.

Ivan.

Kok, Auke

unread,
Sep 27, 2007, 2:40:14 PM9/27/07
to


for everyone who's using this quirk or has the same boot issue: I just confirmed
that the new dg33tl bios update v0287 (released 9/20) fixes the boot issue for my
systems. I encourage everyone to update their BIOS image and see if this works.

Cheers,

Auke

Greg KH

unread,
Sep 28, 2007, 1:20:21 PM9/28/07
to

Thanks for letting us know. So, another reason to drop this patch :)

thanks,

greg k-h

Vitaliy Gusev

unread,
Oct 12, 2007, 10:30:12 AM10/12/07
to

No, it still doesn't work even with a latest BIOS verstion. I have a computer
with Intel DQ35MP motherboard. I upgraded BIOS to rev. 0696, released
Oct 1. But kernel (2.6.22.5) still hangs up. Kernel with this fix patch boots
and works fine.

> Thanks for letting us know. So, another reason to drop this patch :)
>
> thanks,
>
> greg k-h
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Thanks,
Vitaliy Gusev

Kok, Auke

unread,
Oct 12, 2007, 1:20:19 PM10/12/07
to

please note that your BIOS is completely different than the one posted above. This
may be a clue.

interestingly enough I can attest (somewhat) to this. After the BIOS upgrade
2.6.22 booted just fine without pci=nommconf, but I've had several boot lockups
with 2.6.23-rc8.

So, the issue is still open and Matthew/Jesse's suggested fix becomes much more
attractive to me now.

Greg, I suggest putting this patch back in the "grey" zone

Auke

0 new messages