reading errors on JMicron JM20337 USB-SATA

Lev A. Melnikovsky

Aug 1, 2009, 6:40:07 PM

Hello,

I have read through a year-old thread on "JMicron JM20337 USB-SATA data
corruption bugfix", and this looks like another aspect of the same problem.
The SATA disk has genuine errors (bad sectors; I only intend to recover
some data from it, not to keep using it). Unfortunately, when a bad block
is read, no error is returned; instead the caller is blocked indefinitely
(until the USB cable is removed). The system log fills with repeated

sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0

Is this difficult/possible to fix?

The kernel is 2.6.27.28. Here is what I get with CONFIG_USB_STORAGE_DEBUG
set:

usb-storage: Status code -121; transferred 0/4096
usb-storage: -- short read transfer
usb-storage: Bulk data transfer result 0x1
usb-storage: Attempting to get CSW...
usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
usb-storage: Status code 0; transferred 13/13
usb-storage: -- transfer complete
usb-storage: Bulk status result = 0
usb-storage: Bulk Status S 0x53425355 T 0x39 R 4096 Stat 0x1
usb-storage: -- transport indicates command failure
usb-storage: -- unexpectedly short transfer
usb-storage: Issuing auto-REQUEST_SENSE
usb-storage: Bulk Command S 0x43425355 T 0x3a L 18 F 128 Trg 0 LUN 0 CL 6
usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
usb-storage: Status code 0; transferred 31/31
usb-storage: -- transfer complete
usb-storage: Bulk command transfer result=0
usb-storage: usb_stor_bulk_transfer_sglist: xfer 18 bytes, 1 entries
usb-storage: Status code 0; transferred 18/18
usb-storage: -- transfer complete
usb-storage: Bulk data transfer result 0x0
usb-storage: Attempting to get CSW...
usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
usb-storage: Status code 0; transferred 13/13
usb-storage: -- transfer complete
usb-storage: Bulk status result = 0
usb-storage: Bulk Status S 0x53425355 T 0x3a R 0 Stat 0x0
usb-storage: -- Result from auto-sense is 0
usb-storage: -- code: 0x70, key: 0x0, ASC: 0x0, ASCQ: 0x0
usb-storage: (Unknown Key): (unknown ASC/ASCQ)
usb-storage: scsi cmd done, result=0x2
sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0
usb-storage: queuecommand called
usb-storage: *** thread sleeping.
usb-storage: *** thread awakened.
usb-storage: Command READ_10 (10 bytes)
usb-storage: 28 00 03 a8 d0 b0 00 00 08 00
usb-storage: Bulk Command S 0x43425355 T 0x3b L 4096 F 128 Trg 0 LUN 0 CL 10
usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
usb-storage: Status code 0; transferred 31/31
usb-storage: -- transfer complete
usb-storage: Bulk command transfer result=0
usb-storage: usb_stor_bulk_transfer_sglist: xfer 4096 bytes, 1 entries

I'd be glad to supply additional information if needed.
Thanks
-L
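
For readers following the debug log above, the READ_10 command bytes and the
Bulk Command/Status signatures can be decoded by hand. The sketch below is an
illustration in Python, not part of the original report. The function names
are hypothetical, and the 512-byte block size is an assumption (it is
consistent with the 8-block request being logged as "L 4096").

```python
import struct

def decode_read10(cdb_hex, block_size=512):
    """Decode a 10-byte READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Byte 0: opcode, byte 1: flags, bytes 2-5: LBA (big-endian),
    # byte 6: group number, bytes 7-8: transfer length in blocks,
    # byte 9: control.
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    # block_size is an assumption; the CDB itself only counts blocks.
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}

def signature_text(sig):
    """Render a 32-bit CBW/CSW signature as the ASCII it spells on the wire."""
    return struct.pack("<I", sig).decode("ascii")

info = decode_read10("28 00 03 a8 d0 b0 00 00 08 00")
print(info)
print(signature_text(0x43425355), signature_text(0x53425355))
```

Under these assumptions the failing request decodes to LBA 61395120, 8 blocks,
4096 bytes, and the two signatures spell "USBC" (command wrapper) and "USBS"
(status wrapper).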

Artur Skawina

Aug 2, 2009, 10:10:06 AM

Lev A. Melnikovsky wrote:
> I have read through a year-old thread on "JMicron JM20337 USB-SATA data
> corruption bugfix", and this looks like another aspect of the same
> problem. The SATA disk has genuine errors (bad sectors; I only intend to
> recover some data from it, not to keep using it). Unfortunately, when a
> bad block is read, no error is returned; instead the caller is blocked
> indefinitely (until the USB cable is removed). The system log fills with
> repeated
>
> sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
> sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0

yes, jmicron bridges do not report errors properly and just stall pretty
much indefinitely; found out the hard way, when a disk started to develop
bad blocks. took a bit of time to figure out as there were no i/o errors
reported at all. At least all the patches from back then have been merged
and the kernel can better cope w/ the situation (it used to be a lot worse);
plus modern smartctl will let you see the smart attributes (-d usbjmicron),
making it easier to check if the disk really is failing.

What did work for my case was to copy the data from the disk and every
time the process stalled turn off power to the sata drive for a few seconds
(leaving the bridge connected). The bridge in most cases recovered and a bit
more data got off the drive.

This was what saved that controller, because by the time i realized the
disk went bad, it was not possible to even mount the fs using another sata
controller due to all the i/o errors. With the above process i was able to
recover ~95% of the data.
Summary: Wouldn't want to use the bridge for any kind of unattended data
transfer, it's more of a data recovery device...

artur
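
The copy-and-resume procedure described above can be sketched roughly as
follows. This is a minimal illustration in Python, not the tool that was
actually used: `copy_resumable` and its parameters are hypothetical names,
and the sketch does not detect a stall by itself. It only remembers how far
the copy got, so that after the drive has been power-cycled by hand the copy
can continue from the same offset instead of starting over.

```python
import io

def copy_resumable(src, dst, start=0, block_size=4096):
    """Copy src to dst from byte offset `start`; return the offset reached.

    Stops at the first read error (OSError) or at end of input, so the
    caller can intervene and call again with the returned offset.
    """
    src.seek(start)
    dst.seek(start)
    offset = start
    while True:
        try:
            chunk = src.read(block_size)
        except OSError:
            break  # stop at the failing block and report how far we got
        if not chunk:
            break  # end of input
        dst.write(chunk)
        offset += len(chunk)
    return offset

# Usage with in-memory stand-ins for the disk and the image file.
source = io.BytesIO(b"x" * 10000)
image = io.BytesIO()
reached = copy_resumable(source, image)
print(reached)
```

With a real device the two arguments would be files opened in binary mode,
and the returned offset would be passed back in as `start` on the next
attempt.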

Lev A. Melnikovsky

Aug 3, 2009, 3:20:05 AM

On Sun, 2 Aug 2009 at 6:03pm, Artur Skawina wrote:

AS> Lev A. Melnikovsky wrote:
AS> > I have read through a year-old thread on "JMicron JM20337 USB-SATA data
AS> > corruption bugfix", and this looks like another aspect of the same
AS> > problem. The SATA disk has genuine errors (bad sectors; I only intend to
AS> > recover some data from it, not to keep using it). Unfortunately, when a
AS> > bad block is read, no error is returned; instead the caller is blocked
AS> > indefinitely (until the USB cable is removed). The system log fills with
AS> > repeated
AS> >
AS> > sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
AS> > sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0
AS>
AS> yes, jmicron bridges do not report errors properly and just stall pretty
AS> much indefinitely; found out the hard way, when a disk started to develop
My interpretation was different - the bridge firmware does not crash but
remains alive (it does not report the error properly but "zis iz probably
perfectly normal behaviour for a Vogon"). It is the Linux kernel that
keeps trying to re-read indefinitely. Am I wrong?

AS> What did work for my case was to copy the data from the disk and every
AS> time the process stalled turn off power to the sata drive for a few
AS> seconds (leaving the bridge connected). The bridge in most cases
AS> recovered and a bit more data got off the drive.
My nerves are too weak to touch ground/power while the data line is still
connected. Running an -rc1 kernel seems less dangerous...

-L

Alan Stern

Aug 3, 2009, 10:30:22 AM

On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:

> AS> yes, jmicron bridges do not report errors properly and just stall pretty
> AS> much indefinitely; found out the hard way, when a disk started to develop
> My interpretation was different - the bridge firmware does not crash but
> remains alive (it does not report the error properly but "zis iz probably
> perfectly normal behaviour for a Vogon"). It is the Linux kernel that
> keeps trying to re-read indefinitely. Am I wrong?

You are correct except for the term "indefinitely". The retries _will_
stop if you wait long enough. Unfortunately, because of all the nested
retry loops in the SCSI drivers and at the application level, you may
have to wait as long as half an hour.

I agree that this should be fixed. But it is a SCSI issue, not a USB
issue. You could try bringing it up on the linux-scsi mailing list.

Alan Stern

Artur Skawina

Aug 3, 2009, 11:40:11 AM

Alan Stern wrote:
> On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:
>> AS> yes, jmicron bridges do not report errors properly and just stall pretty
>> AS> much indefinitely; found out the hard way, when a disk started to develop
>> My interpretation was different - the bridge firmware does not crash but
>> remains alive (it does not report the error properly but "zis iz probably
>> perfectly normal behaviour for a Vogon"). It is the Linux kernel that
>> keeps trying to re-read indefinitely. Am I wrong?

No, but that's arguably the right thing to do -- the device didn't
report an error, so why should the kernel fail?..

> You are correct except for the term "indefinitely". The retries _will_
> stop if you wait long enough. Unfortunately, because of all the nested
> retry loops in the SCSI drivers and at the application level, you may
> have to wait as long as half an hour.

iirc, i had stalls _way_ longer than that, probably because the reads
eventually succeeded, only to stall on the next ones.

> I agree that this should be fixed. But it is a SCSI issue, not a USB
> issue. You could try bringing it up on the linux-scsi mailing list.

actually, the number of retries should probably be configurable, but i
wouldn't lower them by default; losing data because of recoverable errors
is bad. In this case the bridge may be at fault (by not passing along the
error), but to make a significant difference you'd have to reduce the number
of retries to something like zero, maybe one at most, and that's just too
low for a default.

artur

Alan Stern

Aug 3, 2009, 11:50:06 AM

On Mon, 3 Aug 2009, Artur Skawina wrote:

> > You are correct except for the term "indefinitely". The retries _will_
> > stop if you wait long enough. Unfortunately, because of all the nested
> > retry loops in the SCSI drivers and at the application level, you may
> > have to wait as long as half an hour.
>
> iirc, i had stalls _way_ longer than that, probably because the reads
> eventually succeeded, only to stall on the next ones.
>
> > I agree that this should be fixed. But it is a SCSI issue, not a USB
> > issue. You could try bringing it up on the linux-scsi mailing list.
>
> actually, the number of retries should probably be configurable, but i
> wouldn't lower them by default; losing data because of recoverable errors
> is bad. In this case the bridge may be at fault (by not passing along the
> error), but to make a significant difference you'd have to reduce the number
> of retries to something like zero, maybe one at most, and that's just too
> low for a default.

As I understand it, the SCSI and block layers conspire to keep retrying
each command until a timeout expires, not until the number of retries
reaches a limit.

But the situation is complicated, because some kinds of retries reset
the timer. And if the application repeats the I/O request then of
course everything starts over again.

Alan Stern

Lev A. Melnikovsky

Aug 3, 2009, 3:30:13 PM

On Mon, 3 Aug 2009 at 6:25pm, Alan Stern wrote:

AS> You are correct except for the term "indefinitely". The retries _will_
AS> stop if you wait long enough. Unfortunately, because of all the nested
AS> retry loops in the SCSI drivers and at the application level, you may
AS> have to wait as long as half an hour.
It was a simple test: I unplugged the USB cable after two hours, so that
is apparently not long enough:

[root ~]# time dd if=/dev/sdf of=/dev/null skip=61395120 count=1 bs=512
dd: reading `/dev/sdf': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 7550.12 s, 0.0 kB/s
dd: closing input file `/dev/sdf': Bad file descriptor

real 125m50.119s
user 0m0.000s
sys 0m0.000s

-L

Alan Stern

Aug 3, 2009, 4:00:25 PM

On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:

> On Mon, 3 Aug 2009 at 6:25pm, Alan Stern wrote:
>
> AS> You are correct except for the term "indefinitely". The retries _will_
> AS> stop if you wait long enough. Unfortunately, because of all the nested
> AS> retry loops in the SCSI drivers and at the application level, you may
> AS> have to wait as long as half an hour.
> It was a simple test: I unplugged the USB cable after two hours, so that
> is apparently not long enough:
>
> [root ~]# time dd if=/dev/sdf of=/dev/null skip=61395120 count=1 bs=512
> dd: reading `/dev/sdf': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 7550.12 s, 0.0 kB/s
> dd: closing input file `/dev/sdf': Bad file descriptor
>
> real 125m50.119s
> user 0m0.000s
> sys 0m0.000s

Okay, it looks like I was wrong and this particular kind of error will
indeed cause unending retries.

Either way, like I said before, you should complain about this to the
SCSI people. They are the ones who can fix it. (You can CC: linux-usb
too, just to keep us in the loop.)

Tell them that scsi_end_request() mustn't call scsi_requeue_command()
if bytes == 0.

Alan Stern
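
The suggestion above (do not requeue when bytes == 0) can be illustrated with
a toy model. The Python below is not kernel code and does not reproduce
scsi_end_request(); `run_request`, its parameters, and the `max_rounds`
cut-off are all hypothetical and exist only to show why requeueing after a
completion that made no progress never terminates, while stopping on zero
progress fails the request at once.

```python
import itertools

def run_request(total, completions, stop_on_zero_progress, max_rounds=1000):
    """Toy model of a completion handler that requeues unfinished requests.

    `completions` yields the bytes finished in each round. Returns
    (outcome, rounds), where outcome is "done", "failed" or "still retrying".
    """
    remaining = total
    rounds = 0
    for done in completions:
        rounds += 1
        remaining -= min(done, remaining)
        if remaining == 0:
            return "done", rounds
        if done == 0 and stop_on_zero_progress:
            return "failed", rounds  # give up: this round made no progress
        if rounds >= max_rounds:
            return "still retrying", rounds  # model cut-off, not a kernel limit
        # otherwise the remainder is requeued and tried again
    return "still retrying", rounds

# A bad sector modelled as a read that always completes 0 of 4096 bytes.
print(run_request(4096, itertools.repeat(0), stop_on_zero_progress=False))
print(run_request(4096, itertools.repeat(0), stop_on_zero_progress=True))
# A healthy request that completes in two partial transfers.
print(run_request(4096, [2048, 2048], stop_on_zero_progress=True))
```

In the model, the first call only stops because of the artificial cut-off,
the second fails after a single round, and partial progress is still allowed
to continue to completion.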