I have a buggy USB card reader which responds with "Unrecoverable read
error" for particular reads (and probably writes). The error is
triggered immediately when I insert the device and udev runs "vol_id" on
it. The usb-storage people have been quick to figure out a workaround,
but there is also a more general problem.
The kernel's response to this unrecoverable read error is to hang the
vol_id process. "strace" shows that vol_id is hung on sys_read() for at
least 10 minutes. It continues to hang even when I unplug the card
reader. This ties up the device node (/dev/sdb) so that if I re-insert
the card reader, it uses a different device node. It also stops me
hibernating the computer because the vol_id process is "refusing to
freeze". Nor can the hung vol_id be killed; it's stuck in D state. I
have reproduced all of this on Linux 2.6.25.3.
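
For anyone who wants to check for the same symptom, here is a minimal
sketch of how to list processes stuck in D state (uninterruptible
sleep) by reading /proc/<pid>/stat. It is not something I ran as part
of this report; the function names parse_stat and find_uninterruptible
are made up for illustration. The only thing it relies on is the
documented stat layout, where the state letter follows the
parenthesised command name.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in D state."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the file
            # could not be parsed; skip it.
            continue
        if state == "D":
            hung.append((pid, comm))
    return hung


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

A process that shows up here on every run over several minutes (as
vol_id does in my case) is most likely blocked inside the kernel and
will not respond to signals, including SIGKILL.
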
I sent Alan Stern a copy of the kernel messages with
CONFIG_USB_STORAGE_DEBUG. His conclusion was that the hang was not in
usb-storage but elsewhere. I had also sent him the stack traces that
were printed automatically after a failed hibernation, which implicated
a filesystem in the hang (!).
There's a lot going on here. Can anyone help pin down specific
problems? I think the error handling is primarily the responsibility of
the SCSI generic layer, but I don't have any insight into what it is
supposed to do. My instinct says that SCSI error handling should be mature
and the issue is in some way specific to usb-storage, but that's just a
feeling.
As a starting point, I've attached dmesg output showing two sets of
stack traces (from Alt-SysRq-T) taken before and after the device was
removed.
Thanks!
Alan