Breaking news on: USB disconnects problem


nuccio raciti

Jan 31, 2010, 11:46:44 AM
to sim1
The EP9307 errata file:
http://groups.google.it/group/sim1/web/ER667E1REV2.pdf?hl=it

on page 15 says:

"
USB
Description 1
Outgoing USB DMA transfers may be corrupted if the data buffer is not
quad word aligned or if the data buffer length is not an integer
number of quad words. USB quad word and single word transfers are not
affected.
Workaround
Make the transmit buffers used by the USB device aligned on a quad
word boundary, i.e. the start address ends in a 0x0 nibble, and make
the buffer length an integer number of quad words, i.e. the length in
bytes ends in a 0x0 nibble. This can be achieved by padding the
transmit buffer structure by the appropriate number of bytes.
Incoming USB data is not affected.
A fix for this bug has been implemented for silicon revision E2.
"

So, Sim1 uses the E1 revision of the EP9307 chip,
and the EDB9307A uses the E2 .... :-/
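
The "pad the transmit buffer structure" advice from the erratum could look roughly like this in C11. It is only a sketch: `struct tx_buffer`, `PAD16`, `PAYLOAD_LEN` and `tx_buffer_ok` are made-up names, and the 31-byte payload is just an example size, not something the erratum specifies.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define QUAD_WORD 16u            /* a quad word is four 32-bit words */
#define PAYLOAD_LEN 31u          /* hypothetical payload size */

/* Round n up to the next multiple of 16. */
#define PAD16(n) (((n) + (QUAD_WORD - 1u)) & ~(QUAD_WORD - 1u))

/* Hypothetical transmit buffer: 16-byte aligned, padded from 31 to 32 bytes. */
struct tx_buffer {
    alignas(16) uint8_t data[PAD16(PAYLOAD_LEN)];
};

/* Compile-time checks of the two "0x0 nibble" conditions. */
static_assert(sizeof(struct tx_buffer) % QUAD_WORD == 0,
              "length must be a whole number of quad words");
static_assert(alignof(struct tx_buffer) == QUAD_WORD,
              "start address must be quad word aligned");

/* Run-time check: returns 1 if both address and length end in a 0x0 nibble. */
static int tx_buffer_ok(const void *buf, size_t len)
{
    return ((uintptr_t)buf % QUAD_WORD == 0) && (len % QUAD_WORD == 0);
}
```

A check like `tx_buffer_ok()` can also be dropped into a driver as a debugging aid to see which buffers break the rule.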

Nuccio

nuccio raciti

Jan 31, 2010, 12:00:06 PM
to sim1
Also, E1 revision contains audio problems:

AC’97
Description
Disabling audio transmit by clearing the TEN bit in one of the
AC97TXCRx registers will not clear out any remaining bytes in the TX
FIFO. If the number of bytes left in the FIFO is not equal to a whole
sample or samples, this will throw off subsequent audio playback
causing distortion or channel swapping.
Workaround
To stop audio playback, do the following:
1) Pause DMA
2) Poll the AC97SRx register until either TXUE or TXFE is set.
3) Clear the TEN bit.
This ensures that the TX FIFO is empty before the transmit channel is
disabled.
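
The three steps above could be sketched in C as follows. Everything about the register layout here is an assumption for illustration: the bit positions, the `DMA_PAUSE` flag, the `struct ac97_chan` pointers and the poll limit are placeholders, not values from the EP93xx documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions, NOT taken from the EP93xx manual. */
#define AC97TXCR_TEN (1u << 0)   /* transmit enable */
#define AC97SR_TXFE  (1u << 1)   /* TX FIFO empty */
#define AC97SR_TXUE  (1u << 2)   /* TX underrun */
#define DMA_PAUSE    (1u << 0)   /* hypothetical "pause DMA" control bit */

/* Hypothetical handles to the registers of one transmit channel. */
struct ac97_chan {
    volatile uint32_t *dma_ctrl;
    volatile uint32_t *sr;       /* AC97SRx */
    volatile uint32_t *txcr;     /* AC97TXCRx */
};

/*
 * Stop playback in the order given by the erratum.
 * Returns 0 on success, or -1 if the FIFO has not drained after
 * max_polls unsuccessful reads (TEN is left set and DMA stays paused).
 */
static int ac97_stop_playback(const struct ac97_chan *ch, unsigned max_polls)
{
    assert(ch != NULL);

    *ch->dma_ctrl |= DMA_PAUSE;                          /* 1) pause DMA */

    while (!(*ch->sr & (AC97SR_TXUE | AC97SR_TXFE))) {   /* 2) poll status */
        if (max_polls-- == 0)
            return -1;
    }

    *ch->txcr &= ~AC97TXCR_TEN;                          /* 3) clear TEN */
    return 0;
}
```

The poll limit is an addition of this sketch so that a stuck FIFO cannot hang the caller; the erratum itself only says to poll.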

N

Martin Guy

Feb 1, 2010, 7:27:11 AM
to si...@googlegroups.com
On 1/31/10, nuccio raciti <raciti...@gmail.com> wrote:
> The EP9307 errata file
> "
> USB
> Description 1
> Outgoing USB DMA transfers may be corrupted if the data buffer is
> not quad word aligned or if the data
> buffer length is not an integer number of quad words. USB quad
> word and single word transfers are not
> affected.
> Workaround
> Make the transmit buffers used by the USB device aligned on a quad
> word boundary, i.e. the start address
> ends in a 0x0 nibble, and make the buffer length an integer number
> of quad words, i.e. the length in bytes
> ends in a 0x0 nibble. This can be achieved by padding the transmit
> buffer structure by the appropriate
> number of bytes.
> Incoming USB data is not affected.
> A fix for this bug has been implemented for silicon revision E2.
> "
>
> So, Sim1 uses the E1 revision of the EP9307 chip,
> and the EDB9307A uses the E2 .... :-/

Good find! Unfortunately we get read errors and device disconnects
when only reading (mounted readonly, for example), while other boards
using E0 and E1 (the Armadillo and TS7250 here) never suffer this
problem.

However, it's true that an extra undocumented error in the FPU was
discovered by testing on the sim1 that did not happen on the TS7250
because the sim1 is faster (has 32-bit RAM, not 16-bit) so I will see
if I can find the USB-DMA setup code and put a test in.

M

Martin Guy

Feb 5, 2010, 3:40:02 AM
to si...@googlegroups.com
On 2/1/10, Martin Guy <marti...@gmail.com> wrote:
> On 1/31/10, nuccio raciti <raciti...@gmail.com> wrote:
> > Outgoing USB DMA transfers may be corrupted if the data buffer is
> > not quad word aligned or if the data
> > buffer length is not an integer number of quad words. USB quad
> > word and single word transfers are not
> > affected.
> > Workaround
> > Make the transmit buffers used by the USB device aligned on a quad
> > word boundary, i.e. the start address
> > ends in a 0x0 nibble, and make the buffer length an integer number
> > of quad words, i.e. the length in bytes
> > ends in a 0x0 nibble. This can be achieved by padding the transmit
> > buffer structure by the appropriate
> > number of bytes.

Ok, I've run some tests with 2.6.32. The key to getting verbose output
is to edit drivers/usb/host/ohci-hcd.c and define OHCI_VERBOSE_DEBUG
at line 53. I also added some printk()s wherever DMA buffers were
allocated.

It turns out that all DMA buffers I could see are aligned to a
multiple of 16 bytes when they are allocated. However, the length is
not always a multiple of 16. A dump of /var/log/kern.log, taken just
before a disconnect happens, is attached. Interesting lines are:

drivers/usb/host/ohci-dbg.c: SUB c5940340 dev=5 ep=2out-bulk flags=c len=0/31 stat=-115
drivers/usb/host/ohci-dbg.c: RET c5940340 dev=5 ep=2out-bulk flags=c len=31/31 stat=0
...
ep93xx-ohci ep93xx-ohci: rhsc
hub 1-0:1.0: state 7 ports 3 chg 0000 evt 0004
ep93xx-ohci ep93xx-ohci: GetStatus roothub.portstatus [1] = 0x00020101 PESC PPS CCS
hub 1-0:1.0: port 2 enable change, status 00000101
hub 1-0:1.0: port 2 disabled by hub (EMI?), re-enabling...

The first sign that something has gone wrong is "rhsc", Root Hub
Status Change, which then says that a device has got disabled.

This is after a lot of other "out-bulk" transfers of which many are
not a multiple of 16.
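
To count such transfers in a long log, a small filter along these lines might help. It is a sketch that assumes each record sits on a single line, as it does in kern.log, and that the format is the "len=done/requested" form shown above; the function name is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Inspect one log line. Returns 1 if it describes an out-bulk transfer
 * whose requested length is not a multiple of 16, and 0 otherwise
 * (including lines that are not out-bulk or carry no "len=" field).
 */
static int is_unpadded_out_bulk(const char *line)
{
    int done, requested;
    const char *len;

    assert(line != NULL);

    if (strstr(line, "out-bulk") == NULL)
        return 0;

    len = strstr(line, "len=");
    if (len == NULL || sscanf(len, "len=%d/%d", &done, &requested) != 2)
        return 0;

    return (requested % 16) != 0;
}
```

Feeding it every line of the attached log, for example from an fgets() loop, would show how many out-bulk transfers violate the length rule before the first "rhsc".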

I've tried hacking the code as they suggest to pad the short transfers
with extra bytes, but that makes the device reset continually,
disconnecting as soon as the kernel starts talking to it.
I went from drivers/usb/host/ohci-ep93xx.c:
static struct hc_driver ohci_ep93xx_hc_driver = {
.urb_enqueue = ohci_urb_enqueue,

to drivers/usb/host/ohci-hcd.c where I added:

static int ohci_urb_enqueue (
	struct usb_hcd	*hcd,
	struct urb	*urb,
	gfp_t		mem_flags
) {
	struct ohci_hcd	*ohci = hcd_to_ohci (hcd);
	struct ed	*ed;
	urb_priv_t	*urb_priv;
	unsigned int	pipe = urb->pipe;
	int		i, size = 0;
	unsigned long	flags;
	int		retval = 0;

#ifdef OHCI_VERBOSE_DEBUG
	urb_print(urb, "SUB", usb_pipein(pipe), -EINPROGRESS);
#endif

	/* every endpoint has a ed, locate and maybe (re)initialize it */
	if (! (ed = ed_get (ohci, urb->ep, urb->dev, pipe, urb->interval)))
		return -ENOMEM;

#ifdef CONFIG_ARCH_EP93XX
	/*
	 * A hardware bug in EP93xx silicon revision E1 corrupts outgoing data
	 * if the data buffer length is not a whole number of quad words.
	 * The workaround is to pad all outgoing transfers up to a multiple of
	 * 16 bytes. As a first test, pad only the 31-byte transfers.
	 */
	if (usb_urb_dir_out(urb) && ed->type == PIPE_BULK
	    && urb->transfer_buffer_length == 31) {
		urb->transfer_buffer_length = 32;
		/* NB: transfer_dma is a bus address, not a CPU pointer */
		((char *)(urb->transfer_dma))[31] = '\0';
	}
#endif

usbdebug-kern.log.txt

Martin Guy

Feb 5, 2010, 3:46:44 AM
to si...@googlegroups.com
Sorry, the thing sent that before it was ready

	if (usb_urb_dir_out(urb) && ed->type == PIPE_BULK) {
		if ((urb->transfer_buffer_length & 15) != 0)
			urb->transfer_buffer_length =
				(urb->transfer_buffer_length + 15) & ~15;
	}
#endif

but that just stops it ever getting connected, resetting continually.
I also tried only checking for the "31" length and modifying it to 32,
as well as setting the pad byte to '\0' - same result.
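
One detail worth checking in a length test like the one above: in C, `!=` binds more tightly than `&`, so the mask needs its own parentheses. The sketch below isolates the rounding arithmetic in stand-alone helpers (the function names are made up) so the two forms can be compared outside the kernel.

```c
#include <assert.h>
#include <limits.h>

/* Round len up to the next multiple of 16; multiples of 16 are unchanged. */
static unsigned int pad_to_quad_word(unsigned int len)
{
    assert(len <= UINT_MAX - 15u);   /* avoid wrap-around in len + 15 */
    return (len + 15u) & ~15u;
}

/* Parenthesized test: non-zero when len is not a multiple of 16. */
static int needs_padding(unsigned int len)
{
    return (len & 15u) != 0;
}

/*
 * What `len & 15 != 0` means without parentheses: it parses as
 * `len & (15 != 0)`, which is `len & 1`, so only odd lengths match.
 */
static int needs_padding_unparenthesized(unsigned int len)
{
    return len & (15 != 0);
}
```

Since the rounding expression already leaves multiples of 16 untouched, the test can also be dropped entirely. For the 31-byte transfers seen in the log both forms agree, so this alone would not explain the continual resets.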

So I'm not sure how we are supposed to increase the size of the data
packets without upsetting the host controller or device.

My unexplored paths are:
- this code says it deals with all transfers except the ones to the
root hub itself.
- How come the USB storage on the armadillo (revision E0) and TS7250
here (rev E1) work reliably? The TS is using mainline kernel code with
no modification to the USB part (though the TS has 16-bit RAM, which
is slower to access than the 32-bit stuff on the sim.one)

?

M

Martin Guy

Feb 5, 2010, 3:50:01 AM
to si...@googlegroups.com
Oh, some extra stuff in decoding the kernel logs:

SUB is when the data is submitted to be sent or received
RET is when the data has been received or transmitted
and the status codes are negative versions of the values in
/usr/include/asm-generic/errno.h
In various contexts I've seen:
-62 Timer expired
-104 Connection reset by peer
-115 Operation now in progress (not an error)
-121 Remote I/O error
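
For annotating logs, the codes listed above fit in a small lookup. This sketch hard-codes the four values from this list plus 0 for success, using the Linux asm-generic numbering; the function name is made up.

```c
#include <assert.h>

/* Map a "stat=" value from the OHCI debug output to a short description. */
static const char *urb_status_text(int stat)
{
    switch (stat) {
    case 0:    return "Success";
    case -62:  return "Timer expired";                 /* -ETIME */
    case -104: return "Connection reset by peer";      /* -ECONNRESET */
    case -115: return "Operation now in progress";     /* -EINPROGRESS */
    case -121: return "Remote I/O error";              /* -EREMOTEIO */
    default:   return "unknown, see asm-generic/errno.h";
    }
}
```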

Cheers

M

Martin Guy

Feb 8, 2010, 5:09:43 AM
to si...@googlegroups.com
Hi again. I was looking at the sim.one at FOSDEM (and met Marcus, hi!).

It seems the sim1 is sensitive to EMI. In the "Hackers' Room",
surrounded by laptops, it worked fine and I even managed to write an
entire filesystem to a USB drive once. At the OE stand, with other
bare boards running around it, it wouldn't even boot: either no lights
at all (not even network), or complete garbage out of the serial port,
or coming on with just the red light on all the time.

I've noticed the same effects here at home. In the machine room, I
sometimes get the red light at boot (and, incidentally, the AN258
workaround (reboot on constant red light) doesn't work, even though
jumper JP3 is in place). That's also where I got the effect when
u-boot couldn't see the ethernet that time - since I moved it to
another, quieter room, that symptom went away.

Some ideas that didn't solve it:
- make ohci-hcd a module and give it the "distrust_firmware"
parameter, which works round some USB hosts reporting "power overload"
when they shouldn't, keeping power on all the time.
- connect the USB casing to DGND. This is easy on the back of the
board, but doesn't help.

At this point I'm having to go back to the clock instability theory,
due to electromagnetic interference, and suggest either a metal box or
reworking the crystal circuitry.

Can anyone check that any of the clock outputs are definitely
producing a continuous regular signal of constant frequency, even
while serial garbage, USB disconnect, SD card errors or audio reset
are observed?

M
