
tNetTask suspends


tom.mcc...@boeing.com

Mar 18, 2008, 4:06:48 PM
Hi,

I'm getting the following message on a DY4 card:

"sn: Fatal error. Receive structure invalid."

Once it appears, the netTask is suspended and the only way out is a
reboot.

Can anyone help remedy this?

Thanks,

-TomM

nois...@gmail.com

Mar 18, 2008, 5:13:45 PM

Well, uhm, no, because you provided almost no details. For example,
what version of VxWorks are you using? What is a DY4 card? What kind
of processor does it have on it? (ARM? PPC? Coldfire? MIPS? x86?) What
driver are you using? ("sn" is not much to go on.) Is it one that's
shipped by Wind River? Is it one that you wrote? Is it from a 3rd
party? Is it ethernet? Some kind of serial line interface? Shared
memory?

One assumes the problem is that the driver (whatever it is) is
encountering some sort of error condition while receiving data, and
the error handling is inadequate. That is, rather than just discarding
the bad data and continuing, it just crashes or hangs. And since the
driver's receive handler runs in the context of tNetTask, that means
it crashes or hangs too.

As to how to remedy it, there's no way to know without more
information.

-Bill

> Thanks,
>
> -TomM

VKG Ritsoft Technologies

Mar 19, 2008, 5:56:34 AM
Hi Tom,

I believe this error is from the SONIC driver, and the problem could
be a malformed packet header.

Best Regards
VKG | Ritsoft Technologies

tom.mcc...@boeing.com

Mar 19, 2008, 10:33:24 AM
Hi Bill,

Sorry for the abbreviated information; here is information that will
hopefully help:

The card is a SVME/DMV-177 single board computer manufactured by DY 4
Systems in Ontario, Canada (acquired by Curtiss-Wright Controls in
2004). The processor is a PPC 603e, running at 80 MHz. The error
appears with increased Ethernet activity (TCP/IP). The Ethernet
interface is controlled by a National Semiconductor DP83932B SONIC
chip.

The error text was found in libPPC603gnuvx.a. sysLib.c was developed
by DY 4.

VxWorks (for DY 4 VME-176/177) version 5.3.1; Kernel: WIND version
2.5.

Thanks,

-TomM


nois...@gmail.com

Mar 19, 2008, 4:23:02 PM


Ah, ok.

Something told me I should have recognized the "sn" driver name, but
now I realize why I didn't: VxWorks 5.3.1 is pretty old, and the if_sn
driver is one of the BSD-style netif drivers, which have been
deprecated since 5.5. There doesn't appear to be an END driver to
replace it.

Anyway, I dug around a bit, and it looks like I was right: the problem
is in fact that the driver's error handling is very poor, and it calls
taskSuspend(0) when it encounters a problem receiving a packet. (Oddly
enough, it also tries to do a 'return' immediately following the
taskSuspend().) It looks like the error can be triggered by a number
of things, including the frame being too small or too large, or if the
PRX bit is not set in the RX DMA descriptor (which indicates there was
an error on receive, such as a CRC error or frame alignment error).
What the driver really should be doing is discarding the bad frame and
moving on to the next one (incrementing the RX error count along the
way), not suspending tNetTask.
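
To make the "discard and move on" idea concrete, here is a minimal
sketch in C. It is not the if_sn source: the descriptor layout, the
RX_STATUS_PRX bit position, the size limits and all of the names
(rx_check_frame, rx_desc_t, rx_stats_t) are assumptions made up for
illustration. The only point is that a bad frame bumps an error counter
and gets dropped, and the calling task keeps running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are hypothetical, not taken from if_sn. */
#define ETH_MIN_FRAME  60u      /* assumed minimum length, FCS excluded */
#define ETH_MAX_FRAME  1514u    /* assumed maximum length, FCS excluded */
#define RX_STATUS_PRX  0x0001u  /* "received OK" flag; bit position assumed */

typedef struct {
    uint16_t status;   /* receive status word written by the controller */
    uint16_t length;   /* received byte count */
} rx_desc_t;

typedef struct {
    unsigned long rx_packets;  /* frames accepted */
    unsigned long rx_errors;   /* frames discarded */
} rx_stats_t;

typedef enum { RX_FRAME_OK, RX_FRAME_DROPPED } rx_result_t;

/*
 * Decide whether a completed descriptor holds a usable frame.
 * A bad frame is counted and dropped; nothing is suspended.
 */
rx_result_t rx_check_frame(const rx_desc_t *desc, rx_stats_t *stats)
{
    assert(desc != NULL && stats != NULL);

    if ((desc->status & RX_STATUS_PRX) == 0 ||  /* CRC/alignment error etc. */
        desc->length < ETH_MIN_FRAME ||         /* runt */
        desc->length > ETH_MAX_FRAME) {         /* oversized */
        stats->rx_errors++;
        return RX_FRAME_DROPPED;
    }

    stats->rx_packets++;
    return RX_FRAME_OK;
}
```

A receive loop would call rx_check_frame() for each completed
descriptor, pass good frames up the stack, and recycle the buffer
either way.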

The short answer here is that this driver looks buggy, probably in a
number of ways. It was probably not tested under heavy load (which is
a chronic problem). During periods of heavy network activity, a couple
of things can happen: you run out of RX descriptors (or overrun the RX
FIFO), or you end up with bad packets (runts, CRC errors, etc...). The
driver should be written to handle these conditions and recover from
them, but it isn't.

There's another problem though, which is that judging by the
documentation for the SONIC controller (which is still available on
National Semiconductor's site), you have to be very careful how you
handle the RX descriptor buffer management. The SONIC looks like it
uses what I call the 'single synchronization point' model, where the
RX DMA engine expects a linked list of descriptors, where the last one
has an 'end of list' bit set to mark where the list terminates. The
end of list bit is used by the chip to figure out when it's reached
the end of the list: when it reaches the end of list, the RX DMA
engine pauses. The driver is supposed to process pending RX
descriptors and when -- and only when -- it has also reached the end
of list, it can resume the DMA channel. The problem with this of
course is that you don't normally want the DMA channel to pause if you
can avoid it. In this case though, you really can't avoid it, but some
driver developers think they can, so they try to cheat their way
around the problem: each time they receive a packet, they move the EOL
bit to the next descriptor. (This is a little like tying a stick with
a carrot on the end of it to the head of a donkey.) You can't do this
though, because it creates a race condition: there's no way to be
certain that the ethernet chip won't attempt to consume the next
descriptor while you're trying to update it (unless of course you stall
the RX DMA channel yourself first).
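
The single synchronization point model described above can be sketched
as a small software simulation. This is a toy model under stated
assumptions, not the SONIC's actual descriptor format or register
interface: ring_desc_t, rx_ring_t, ring_init(), chip_receive() and
driver_service() are hypothetical names, and chip_receive() merely
stands in for the hardware. What it tries to show is that the EOL flag
never moves while DMA is running, and that the driver resumes DMA only
after it has consumed the EOL descriptor itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_LIST_SIZE 4   /* arbitrary size for the illustration */

typedef struct {
    bool owned_by_chip;  /* true: free for the chip; false: holds a frame */
    bool eol;            /* end-of-list marker, set once at init time */
} ring_desc_t;

typedef struct {
    ring_desc_t desc[RX_LIST_SIZE];
    size_t chip_next;    /* next descriptor the (simulated) chip fills */
    size_t drv_next;     /* next descriptor the driver examines */
    bool dma_paused;     /* chip has hit EOL and stopped receiving */
} rx_ring_t;

void ring_init(rx_ring_t *r)
{
    assert(r != NULL);
    for (size_t i = 0; i < RX_LIST_SIZE; i++) {
        r->desc[i].owned_by_chip = true;
        r->desc[i].eol = (i == RX_LIST_SIZE - 1);
    }
    r->chip_next = 0;
    r->drv_next = 0;
    r->dma_paused = false;
}

/* Stand-in for the hardware: returns false if the frame is lost. */
bool chip_receive(rx_ring_t *r)
{
    assert(r != NULL);
    if (r->dma_paused)
        return false;

    ring_desc_t *d = &r->desc[r->chip_next];
    d->owned_by_chip = false;      /* frame stored, hand to driver */
    if (d->eol)
        r->dma_paused = true;      /* end of list reached: pause */
    else
        r->chip_next++;
    return true;
}

/* Driver side: returns the number of descriptors processed. */
size_t driver_service(rx_ring_t *r)
{
    assert(r != NULL);
    size_t handled = 0;

    while (!r->desc[r->drv_next].owned_by_chip) {
        ring_desc_t *d = &r->desc[r->drv_next];
        d->owned_by_chip = true;   /* frame consumed, give buffer back */
        handled++;
        if (d->eol) {
            /* Both sides are at end of list: the one safe point to
             * rewind the list and resume the DMA channel. */
            r->drv_next = 0;
            r->chip_next = 0;
            r->dma_paused = false;
            break;
        }
        r->drv_next++;
    }
    return handled;
}
```

Note that driver_service() never touches the eol flags. The "cheat"
would be to move EOL forward on every received packet, and that is
exactly the update that can race with the chip.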

Anyway, I don't think the if_sn driver implements the right logic, so
it's very possible it's susceptible to the same race condition.

As to how to fix this... well, that's hard to say, mainly because of
how old VxWorks 5.3.1 is. The problem is definitely a bug in the if_sn
driver, which is Wind River code. If you have a valid support contract,
then you can open up a support request for this issue. Given how old
VxWorks 5.3.1 is though, I somehow doubt you can still get support
for it. At the very least, you might be able to search for possible
patches for the if_sn driver.

Another alternative is to just write your own replacement for the
SONIC driver. The datasheet for the device is still available
(http://www.national.com/ds/DP/DP83932C.pdf) so this is not entirely
out of the question, but it's probably more work than you care to do.

-Bill

tom.mcc...@boeing.com

Mar 20, 2008, 10:07:50 AM
> SONIC driver. The datasheet for the device is still available
> (http://www.national.com/ds/DP/DP83932C.pdf) so this is not entirely
> out of the question, but it's probably more work than you care to do.
>
> -Bill

Many thanks, Bill

-TomM
