Patch to fix/shorten "wi" freezes

ied...@maths.tcd.ie

unread,

Oct 27, 2002, 1:00:55 PM10/27/02

to

The wi driver causes quite long system freezes both when the pccard
is removed, and also if the hardware becomes confused. I've found
on -current that sometimes the whole machine can become unresponsive
for a period of minutes with messages such as:

wi0: timeout in wi_cmd 0x0002; event status 0x8080
wi0: timeout in wi_cmd 0x0000; event status 0x8080
wi0: wi_cmd: busy bit won't clear.
wi0: wi_cmd: busy bit won't clear.
wi0: init failed
wi0: wi_cmd: busy bit won't clear.
... <repeated 20 times>
wi0: wi_cmd: busy bit won't clear.
wi0: failed to allocate 1594 bytes on NIC
wi0: tx buffer allocation failed
wi0: wi_cmd: busy bit won't clear.
wi0: failed to allocate 1594 bytes on NIC

The "wi_cmd 0x0002" is WI_CMD_DISABLE from wi_stop(). Each of the
"busy bit won't clear" messages comes after a 5-second busy-loop
delay in wi_cmd(), so the above takes 2-3 minutes to complete.
This, BTW, is a Lucent silver card, probed as:

wi0: <WaveLAN/IEEE> at port 0x100-0x13f irq 9 function 0
config 1 on pccard0
wi0: 802.11 address: 00:02:2d:21:57:d3
wi0: using Lucent Technologies, WaveLAN/IEEE
wi0: Lucent Firmware: Station 6.16.01

The patch below does a few things:
- It adds a 20ms delay at the end of wi_init(), which seems to fix
the above timeouts in wi_stop(), as it seems that calling wi_stop()
too soon after wi_init() can cause these.
- The busy-bit loop timeout is reduced from 5 seconds to 500ms.
- When a status of 0xffff is returned or the "busy bit won't clear"
error occurs, sc->wi_gone is set to 1, so that other operations
will fail immediately instead of going back into the long busy
loops. Since sc->wi_gone had been used as a sanity test in
wi_generic_detach() to make sure devices are not detached twice,
this has been changed to use the previously unused WI_FLAGS_ATTACHED
flag. We also need to remove the wi_gone test in wi_stop(), since
otherwise the untimeout() calls will be missed if wi_gone is set
by something other than wi_generic_detach().
- The functions wi_cmd() and wi_seek() now test wi_gone, and return
immediately if it is set.

For me this makes the card work much more reliably and it reduces
the length of any hangs to less than 1 second. I guess it will take
testing on other cards and configurations to see if this improves
things in general or causes problems with some combinations.

Ian

Index: if_wi.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/dev/wi/if_wi.c,v
retrieving revision 1.117
diff -u -r1.117 if_wi.c
--- if_wi.c 14 Oct 2002 01:59:57 -0000 1.117
+++ if_wi.c 27 Oct 2002 16:57:04 -0000
@@ -200,7 +200,7 @@
WI_LOCK(sc, s);
ifp = &sc->arpcom.ac_if;

- if (sc->wi_gone) {
+ if ((sc->wi_flags & WI_FLAGS_ATTACHED) == 0) {
device_printf(dev, "already unloaded\n");
WI_UNLOCK(sc, s);
return(ENODEV);
@@ -214,6 +214,7 @@
ether_ifdetach(ifp, ETHER_BPF_SUPPORTED);
bus_teardown_intr(dev, sc->irq, sc->wi_intrhand);
wi_free(dev);
+ sc->wi_flags &= ~WI_FLAGS_ATTACHED;
sc->wi_gone = 1;

WI_UNLOCK(sc, s);
@@ -471,6 +472,7 @@
*/
ether_ifattach(ifp, ETHER_BPF_SUPPORTED);
callout_handle_init(&sc->wi_stat_ch);
+ sc->wi_flags |= WI_FLAGS_ATTACHED;
WI_UNLOCK(sc, s);

return(0);
@@ -1002,20 +1004,24 @@
{
int i, s = 0;
static volatile int count = 0;
+
+ if (sc->wi_gone)
+ return (ENODEV);

if (count > 1)
panic("Hey partner, hold on there!");
count++;

/* wait for the busy bit to clear */
- for (i = 500; i > 0; i--) { /* 5s */
+ for (i = 500; i > 0; i--) { /* 500ms */
if (!(CSR_READ_2(sc, WI_COMMAND) & WI_CMD_BUSY)) {
break;
}
- DELAY(10*1000); /* 10 m sec */
+ DELAY(1000); /* 1ms */
}
if (i == 0) {
device_printf(sc->dev, "wi_cmd: busy bit won't clear.\n" );
+ sc->wi_gone = 1;
count--;
return(ETIMEDOUT);
}
@@ -1052,6 +1058,8 @@
if (i == WI_TIMEOUT) {
device_printf(sc->dev,
"timeout in wi_cmd 0x%04x; event status 0x%04x\n", cmd, s);
+ if (s == 0xffff)
+ sc->wi_gone = 1;
return(ETIMEDOUT);
}
return(0);
@@ -1364,6 +1372,9 @@
int selreg, offreg;
int status;

+ if (sc->wi_gone)
+ return (ENODEV);
+
switch (chan) {
case WI_BAP0:
selreg = WI_SEL0;
@@ -1391,6 +1402,8 @@
if (i == WI_TIMEOUT) {
device_printf(sc->dev, "timeout in wi_seek to %x/%x; last status %x\n",
id, off, status);
+ if (status == 0xffff)
+ sc->wi_gone = 1;
return(ETIMEDOUT);
}

@@ -2196,6 +2209,13 @@
sc->wi_stat_ch = timeout(wi_inquire, sc, hz * 60);
WI_UNLOCK(sc, s);

+ /*
+ * A 10ms or greater delay here seems to avoid a problem that
+ * causes some Lucent orinoco cards to time out in wi_stop()
+ * if called immediately after wi_init(). Use 20ms to be safe.
+ */
+ DELAY(20000);
+
return;
}

@@ -2471,11 +2491,11 @@
int s;

WI_LOCK(sc, s);
-
- if (sc->wi_gone) {
- WI_UNLOCK(sc, s);
- return;
- }
+ /*
+ * Ignore wi_gone here, as we still need to do the untimeout calls.
+ * Currently everything here should be safe to do even if the
+ * hardware is gone.
+ */

wihap_shutdown(sc);

To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-mobile" in the body of the message

i...@bsdimp.com

unread,

Oct 27, 2002, 7:07:52 PM10/27/02

to

In message: <2002102718...@salmon.maths.tcd.ie>
Ian Dowse <ied...@maths.tcd.ie> writes:
: The wi driver causes quite long system freezes both when the pccard

: is removed, and also if the hardware becomes confused. I've found
: on -current that sometimes the whole machine can become unresponsive
: for a period of minutes with messages such as:

For the removal, it would be better to use the bus_child_present() api
for the eject case. For the hardware becomes confused case, setting
gone means that nothing further will happen with the card. However,
the other parts of this patch seem relatively reasonable to me.

Warner

ied...@maths.tcd.ie

unread,

Oct 27, 2002, 8:11:24 PM10/27/02

to

In message <20021027.17064...@bsdimp.com>, "M. Warner Losh" writes:
>
>For the removal, it would be better to use the bus_child_present() api
>for the eject case.

Does the pccard system implement bus_child_present()? I just did a
quick grep for *_child_present there without finding anything, but
maybe I'm looking for the wrong thing. Is the idea to use something
like

if (sc->wi_gone)
return;
<hardware operation>
if (timeout) {
if (!bus_child_present(sc->dev)) {
sc->wi_gone = 1;
return;
}
device_printf(sc->dev, "device timeout\n");
}

so that further slow timeouts can be avoided if the device has
really been removed? I presume it is too much overhead to just call
bus_child_present() everywhere instead of testing `gone'.

>For the hardware becomes confused case, setting
>gone means that nothing further will happen with the card. However,
>the other parts of this patch seem relatively reasonable to me.

Yes, this is a problem if some devices report errors and then come
back to life. The problems I have seen don't recover, but I guess
that will not always be the case. In the patch the `gone' flag is
set when status is 0xffff after a timeout and when the "busy bit
won't clear" error occurs. Hopefully these are both usually cases
where the hardware does not recover itself.

Ian

iwa...@jp.freebsd.org

unread,

Oct 28, 2002, 10:30:57 PM10/28/02

to

> wi0: timeout in wi_cmd 0x0002; event status 0x8080

Here are 2 patches against similar problems related wi.
Please review them. TIA.

1. The suspend methods problem with NEWARD cardbus and pccard.
cardbus_suspend() forced cbb into power off via
cbb_power(CARD_VCC_0V) before pccard_suspend() detach wi properly.

----
Index: cardbus.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/cardbus/cardbus.c,v
retrieving revision 1.24
diff -u -r1.24 cardbus.c
--- cardbus.c 7 Oct 2002 23:00:51 -0000 1.24
+++ cardbus.c 11 Oct 2002 02:22:28 -0000
@@ -148,6 +148,8 @@
static int cardbus_write_ivar(device_t cbdev, device_t child, int which,
uintptr_t value);

+#define DETACH_NOWARN 0x800
+
/************************************************************************/
/* Probe/Attach */
/************************************************************************/
@@ -175,7 +177,7 @@
static int
cardbus_suspend(device_t self)
{
- cardbus_detach_card(self, DETACH_FORCE);
+ cardbus_detach_card(self, DETACH_NOWARN); /* detach existing cards */
return (0);
}

@@ -208,8 +210,6 @@
PCIB_WRITE_CONFIG(brdev, b, s, f, PCIR_MAXLAT, 0x14, 1);
cfg->maxlat = PCIB_READ_CONFIG(brdev, b, s, f, PCIR_MAXLAT, 1);
}
-
-#define DETACH_NOWARN 0x800

static int
cardbus_attach_card(device_t cbdev)

----

2. Even after device had removed and wi_stop() called untimeout(wi_inquire),
sometimes wi_inquire() is still called and we got kernel panic.
I think there are some bugs in timeout/untimeout system under heavy
stresses. Here's a workaround for sanity checking.

----
Index: if_wi.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/wi/if_wi.c,v

retrieving revision 1.117
diff -u -r1.117 if_wi.c
--- if_wi.c 14 Oct 2002 01:59:57 -0000 1.117

+++ if_wi.c 15 Oct 2002 19:24:13 -0000
@@ -853,6 +853,17 @@
int s;

sc = xsc;
+
+ if (sc == NULL) {
+ printf("%s: wrong argument\n", __func__);
+ return;
+ }

+
+ if (sc->wi_gone) {

+ printf("%s: already detached\n", __func__);
+ return;
+ }
+
ifp = &sc->arpcom.ac_if;

sc->wi_stat_ch = timeout(wi_inquire, sc, hz * 60);

----

i...@bsdimp.com

unread,

Oct 29, 2002, 2:44:19 AM10/29/02

to

In message: <20021029.122934....@jp.FreeBSD.org>
Mitsuru IWASAKI <iwa...@jp.FreeBSD.org> writes:
: Here are 2 patches against similar problems related wi.

: Please review them. TIA.
:
: 1. The suspend methods problem with NEWARD cardbus and pccard.
: cardbus_suspend() forced cbb into power off via
: cbb_power(CARD_VCC_0V) before pccard_suspend() detach wi properly.

Good call. I agree that this is the bug, but I'm not sure I like the
solution.

: ----

This is both right and wrong a the same time. It is right in that it
forces cardbus_detach_card to not power down the card. It is wrong in
that it fixes the problem one layer too high. cardbus_detach_card has
no business turning the power off if there are no children. Also, the
warning is bogus, imho. So, here's my patch that corrects these
problems. It eliminates NOWARN, eliminates the bogus printf and power
cycle, and adds a set of nice parens :-)

Comments?

Warner

P.S. I'm still trying to understand the wi patch and the need for it.
Timeout leaks seem like they would show up all over the place...

--- //depot/user/imp/newcard/dev/cardbus/cardbus.c 2002/10/10 08:32:14
+++ //depot/user/imp/newcard/dev/cardbus/cardbus.c 2002/10/28 23:36:38
@@ -209,8 +209,6 @@

cfg->maxlat = PCIB_READ_CONFIG(brdev, b, s, f, PCIR_MAXLAT, 1);
}

-#define DETACH_NOWARN 0x800
-
static int
cardbus_attach_card(device_t cbdev)
{
@@ -219,7 +217,7 @@
static int curr_bus_number = 2; /* XXX EVILE BAD (see below) */
int bus, slot, func;

- cardbus_detach_card(cbdev, DETACH_NOWARN); /* detach existing cards */
+ cardbus_detach_card(cbdev, 0); /* detach existing cards */

POWER_ENABLE_SOCKET(brdev, cbdev);
bus = pcib_get_bus(cbdev);
@@ -283,10 +281,6 @@
device_get_children(cbdev, &devlist, &numdevs);

if (numdevs == 0) {
- if (!(flags & DETACH_NOWARN)) {
- DEVPRINTF((cbdev, "detach_card: no card to detach!\n"));
- POWER_DISABLE_SOCKET(device_get_parent(cbdev), cbdev);
- }
free(devlist, M_TEMP);
return ENOENT;
}
@@ -297,7 +291,7 @@

if (status == DS_ATTACHED || status == DS_BUSY) {
if (device_detach(dinfo->pci.cfg.dev) == 0 ||
- flags & DETACH_FORCE) {
+ (flags & DETACH_FORCE)) {
cardbus_release_all_resources(cbdev, dinfo);
device_delete_child(cbdev, devlist[tmp]);
} else {