Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] proper bios handoff in ehci-hcd

4 views
Skip to first unread message

Gary_L...@dell.com

unread,
Jun 14, 2004, 4:40:16 PM6/14/04
to
Stuart Hayes here at Dell has identified this or/and mix-up in the ehci-hcd driver. Because of this, ehci-hcd is not properly released by BIOSes supporting full 2.0 and port behavior can then become erratic.

This is broken in latest 2.4 and 2.6.

Gary Lerhaupt
Dell Linux Development
http://linux.dell.com

--- linux/drivers/usb/host/ehci-hcd.c.orig 2004-06-05 03:12:18.000000000 -0500
+++ linux/drivers/usb/host/ehci-hcd.c 2004-06-05 01:18:51.000000000 -0500
@@ -290,7 +290,7 @@ static int bios_handoff (struct ehci_hcd
int msec = 500;

/* request handoff to OS */
- cap &= 1 << 24;
+ cap |= 1 << 24;
pci_write_config_dword (ehci->hcd.pdev, where, cap);

/* and wait a while for it to happen */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

David Brownell

unread,
Jun 15, 2004, 10:10:12 AM6/15/04
to
Gary_L...@Dell.com wrote:
> Stuart Hayes here at Dell has identified this or/and mix-up in the
> ehci-hcd driver. Because of this, ehci-hcd is not properly released by
> BIOSes supporting full 2.0 and port behavior can then become erratic.

Good patch, it should be merged. That handoff code actually
predates general availability of BIOSes handling _any_ EHCI
controllers, and your patch resolves a problem I'd seen on a
newish board but hadn't yet had time to track down (beyond
knowing that broken BIOS handoff was the issue).

Thanks to you and Stuart!

- Dave

Olaf Hering

unread,
Jul 13, 2004, 2:10:10 PM7/13/04
to
On Tue, Jun 15, David Brownell wrote:

> Gary_L...@Dell.com wrote:
> >Stuart Hayes here at Dell has identified this or/and mix-up in the
> >ehci-hcd driver. Because of this, ehci-hcd is not properly released by
> >BIOSes supporting full 2.0 and port behavior can then become erratic.
>
> Good patch, it should be merged. That handoff code actually
> predates general availability of BIOSes handling _any_ EHCI
> controllers, and your patch resolves a problem I'd seen on a
> newish board but hadn't yet had time to track down (beyond
> knowing that broken BIOS handoff was the issue).

David,

there are 2 reports about breakage by this patch. One is on lkml, and
another one is in my bugzilla inbox. How can we fix that one? I assume
that handoff patch is correct.

<6>NET: Registered protocol family 17
<3>ehci_hcd 0000:00:1d.7: BIOS handoff failed (104, 1010001)
<3>ehci_hcd 0000:00:1d.7: can't reset
<3>ehci_hcd 0000:00:1d.7: init 0000:00:1d.7 fail, -95
<4>ehci_hcd: probe of 0000:00:1d.7 failed with error -95

this is a FSC Amilo D7830 notebook, the guy on lkml has a Asus P4P800 board.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

David Brownell

unread,
Jul 13, 2004, 4:40:04 PM7/13/04
to
Hi Olaf,

Olaf Hering wrote:
>
> there are 2 reports about breakage by this patch. One is on lkml, and
> another one is in my bugzilla inbox. How can we fix that one? I assume
> that handoff patch is correct.

The only question I have about it right now is whether
it might not be more correct to use a _byte_ access to
set the "Host OS wants controller" flag. It looks to me
like it does the right thing, per EHCI 1.0 section 5.1,
though maybe 500 msec is too short a period to wait.
See if 5000 msec helps.

The 0x01010001 flag is pretty clearly trouble, and
says that the BIOS hasn't reacted to the request.
Maybe it's polling at some rate slower than 2x/second
on those machines ... or maybe this is just a bios bug.

In this case, one could just look at byte 106 (104 + 2)
of pci config space later to see if it changed after
the 500 msec passed.

- Dave


> <6>NET: Registered protocol family 17
> <3>ehci_hcd 0000:00:1d.7: BIOS handoff failed (104, 1010001)
> <3>ehci_hcd 0000:00:1d.7: can't reset
> <3>ehci_hcd 0000:00:1d.7: init 0000:00:1d.7 fail, -95
> <4>ehci_hcd: probe of 0000:00:1d.7 failed with error -95
>
> this is a FSC Amilo D7830 notebook, the guy on lkml has a Asus P4P800 board.
>

Stuart...@dell.com

unread,
Jul 13, 2004, 5:00:13 PM7/13/04
to
Will Beers wrote:
> > though maybe 500 msec is too short a period to wait.
> > See if 5000 msec helps.
>
> I went all the way up to 20000 msec and it still didn't help. I'm
> sure it's a bad idea, but removing that whole if-block below it makes
> it work (which is effectively what switching the and/or did). I
> don't know enough about it to judge whether it's correct, but what
> exactly is it checking for there?
>
> -Will

Without the patch, Linux would just ignore the BIOS handoff--Linux was
writing "0" to the bit that it was supposed to wait for the BIOS to
clear, so it never waited for the BIOS to let go of the controller.

I bet you have a bad BIOS that won't hand off, but I would try the other
thing David suggested--change the write to a byte write. It seems
unlikely, but, since Linux is writing a "1" to the "BIOS owns the
controller" bit right now, you might be hitting something like this, if
the system is breaking up the write into multiple smaller writes:

the "OS wants the controller" bit is getting written to 1 (first part of
the Linux write, which the system broke into pieces)
the system BIOS (SMI handler) sees that bit set to 1, and clears the
"BIOS owns" bit
the "BIOS owns" bit is getting written back to a 1 (the second part of
the Linux write)
Linux waits in vain for BIOS to clear the "BIOS owns" bit\

Again, seems unlikely, but worth a try if you're recompiling and
testing.

David Brownell

unread,
Jul 13, 2004, 5:20:12 PM7/13/04
to
Will Beers wrote:
> > though maybe 500 msec is too short a period to wait.
> > See if 5000 msec helps.
>
> I went all the way up to 20000 msec and it still didn't help. I'm sure
> it's a bad idea, but removing that whole if-block below it makes it work
> (which is effectively what switching the and/or did). I don't know
> enough about it to judge whether it's correct, but what exactly is it
> checking for there?

There are two flags in adjacent bytes of pci config space.
State transitions are shown in the spec [1] (simple state
diagrams), but basically your hardware started out in a
"BIOS owned" mode, and we want Linux to run it instead.
So we change (0,1) to (1,1), then BIOS should get an IRQ
before changing it to (1,0) and ignoring EHCI ... it's not.

Sounds to me like your BIOS may be broken. But if you're
up for it, you could try using byte access to write that one
flag byte; I could also believe some hardware won't issue the
SMI interrupt without that. There are also a lot of bits in
the next word, which might let you stomp on on the BIOS in
constructive useful ways.

- Dave

[1] http://www.usb.org/developers/docs/ at the very
bottom of the page, for that part you won't need
to know anything else about USB.

David Brownell

unread,
Jul 13, 2004, 5:40:10 PM7/13/04
to
Will Beers wrote:
> > Sounds to me like your BIOS may be broken. But if you're
> > up for it, you could try using byte access to write that one
>
> Changing the pci_read_config to a byte access fixes it, thanks!

You're reading byte 0 not byte 2 of that field ... I meant
more like the attached patch to _write_ the flag (untested).


> - pci_read_config_dword(pdev, where, &cap);
> + pci_read_config_byte(pdev, where, &cap);

Diff

Pete Zaitcev

unread,
Jul 13, 2004, 6:10:09 PM7/13/04
to
On Tue, 13 Jul 2004 15:52:43 -0500
<Stuart...@Dell.com> wrote:

> the "OS wants the controller" bit is getting written to 1 (first part of
> the Linux write, which the system broke into pieces)

If something breaks word writes into pieces, all hell breaks lose.
I don't believe it can happen.

I hit regressions when we implemented the proper handoff as requested
by Stuart @Dell, so I think for the moment the right thing would be this:

--- linux-2.4.21-15.18.EL/drivers/usb/host/ehci-hcd.c 2004-07-01
08:07:56.000000000 -0700
+++ linux-2.4.21-15.18-usb/drivers/usb/host/ehci-hcd.c 2004-07-08
15:15:05.944863675 -0700
@@ -302,7 +302,8 @@
if (cap & (1 << 16)) {
ehci_err (ehci, "BIOS handoff failed (%d, %04x)\n",
where, cap);
- return 1;
+ pci_write_config_dword (ehci->hcd.pdev, where, 0);
+ return 0;
}
ehci_dbg (ehci, "BIOS handoff succeeded\n");
}

Essentially, here I insist on doing the right thing with cap|=(1<<24),
which fixes Dell boxes which implement proper handoff, but then if we
time out as on Thinkpads, write zero as the old code did (probably
pointless, but just to be safe) and continue.

David, any comment?

-- Pete

David Brownell

unread,
Jul 13, 2004, 7:10:12 PM7/13/04
to
Pete Zaitcev wrote:

> I hit regressions when we implemented the proper handoff as requested
> by Stuart @Dell, so I think for the moment the right thing would be this:
>
> --- linux-2.4.21-15.18.EL/drivers/usb/host/ehci-hcd.c 2004-07-01
> 08:07:56.000000000 -0700
> +++ linux-2.4.21-15.18-usb/drivers/usb/host/ehci-hcd.c 2004-07-08
> 15:15:05.944863675 -0700
> @@ -302,7 +302,8 @@
> if (cap & (1 << 16)) {
> ehci_err (ehci, "BIOS handoff failed (%d, %04x)\n",
> where, cap);
> - return 1;
> + pci_write_config_dword (ehci->hcd.pdev, where, 0);
> + return 0;
> }
> ehci_dbg (ehci, "BIOS handoff succeeded\n");
> }
>
> Essentially, here I insist on doing the right thing with cap|=(1<<24),
> which fixes Dell boxes which implement proper handoff, but then if we
> time out as on Thinkpads, write zero as the old code did (probably
> pointless, but just to be safe) and continue.

I'd rather not change the config space again ... that's clearly wrong.
Or is there some policy about what sorts of BIOS bugs we should assume?

Instead, how about: (a) longer timeout, 5 seconds to match OHCI's
absurdly long default there; (b) change that "handoff failed" message
to add "continuing anyway"; and (c) return 0 as you do, which I'm
expecting is the key part of that patch.

That'll evidently work for Will, as well as correctly functioning hardware
with EHCI-aware BIOS (the Dell boxes and the AMI BIOS box I tested) also
the classic EHCI-unaware BIOS setups.

- Dave

Olaf Hering

unread,
Jul 15, 2004, 5:50:04 AM7/15/04
to
On Tue, Jul 13, David Brownell wrote:

> Instead, how about: (a) longer timeout, 5 seconds to match OHCI's
> absurdly long default there; (b) change that "handoff failed" message
> to add "continuing anyway"; and (c) return 0 as you do, which I'm
> expecting is the key part of that patch.

This patch works for me:

diff -purN linux-2.6.8-rc1-bk3.bios-handoff/drivers/usb/host/ehci-hcd.c linux-2.6.8-rc1-bk3/drivers/usb/host/ehci-hcd.c
--- linux-2.6.8-rc1-bk3.bios-handoff/drivers/usb/host/ehci-hcd.c 2004-07-15 11:24:14.000000000 +0200
+++ linux-2.6.8-rc1-bk3/drivers/usb/host/ehci-hcd.c 2004-07-15 11:32:28.463930957 +0200
@@ -303,9 +303,11 @@ static int bios_handoff (struct ehci_hcd
pci_read_config_dword(pdev, where, &cap);
} while ((cap & (1 << 16)) && msec);


if (cap & (1 << 16)) {

- ehci_err (ehci, "BIOS handoff failed (%d, %04x)\n",
+ ehci_err (ehci, "BIOS handoff failed (%d, %04x)\n"
+ " Devices connected to this controller will not work correctly.\n"
+ " Complain to your BIOS vendor.\n", /* Really! */


where, cap);
- return 1;

+ return 0;
}
ehci_dbg (ehci, "BIOS handoff succeeded\n");
}

--

USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

Olaf Hering

unread,
Aug 5, 2004, 9:50:09 AM8/5/04
to
On Thu, Jul 15, Olaf Hering wrote:

> On Tue, Jul 13, David Brownell wrote:
>
> > Instead, how about: (a) longer timeout, 5 seconds to match OHCI's
> > absurdly long default there; (b) change that "handoff failed" message
> > to add "continuing anyway"; and (c) return 0 as you do, which I'm
> > expecting is the key part of that patch.

David, what is the status with this bios problem?
Can a patch like this patch go in?
What could we lose if the error is ignored?

David Brownell

unread,
Aug 5, 2004, 12:10:06 PM8/5/04
to
On Thursday 05 August 2004 06:39, Olaf Hering wrote:
> On Thu, Jul 15, Olaf Hering wrote:
>
> > On Tue, Jul 13, David Brownell wrote:
> >
> > > Instead, how about: (a) longer timeout, 5 seconds to match OHCI's
> > > absurdly long default there; (b) change that "handoff failed" message
> > > to add "continuing anyway"; and (c) return 0 as you do, which I'm
> > > expecting is the key part of that patch.
>
> David, what is the status with this bios problem?
> Can a patch like this patch go in?
> What could we lose if the error is ignored?

I submitted a very similar patch yesterday, not yet merged but
closer to what I described:

http://www.mail-archive.com/linux-usb-devel%40lists.sourceforge.net/msg26725.html

0 new messages