No retries after periph invalidation?

Alexander Motin

Jul 23, 2011, 6:08:52 PM
Hi.

I've simulated a real-world device failure condition in which a SATA disk
still reports its presence but doesn't respond to any command. I've
found that, because of multiple command retries, each of which causes a 30s
timeout, a bus reset and another retry/requeue, it may take ages to
eventually drop the failed device. The odd thing is that those retries
continue even after XPT has considered the device lost and invalidated it.

I've made a patch (http://people.freebsd.org/~mav/periph_noretry.patch)
for cam_periph_error() to block any retries after the periph has been
marked as invalid. With that patch, all activity completes in 1-2 minutes,
just after the several timeouts required to consider the device lost.
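
The idea of refusing retries once a periph is invalid can be sketched roughly
as below. This is only an illustration of the decision, not the contents of
the patch: the names sketch_periph, sketch_request, SKETCH_PERIPH_INVALID and
sketch_should_retry() are hypothetical stand-ins and are not the real CAM
structures or the real cam_periph_error() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real FreeBSD CAM types. */
#define SKETCH_PERIPH_INVALID 0x01u /* models "periph was marked as invalid" */

struct sketch_periph {
	unsigned int flags;
};

struct sketch_request {
	int retries_left; /* remaining retry budget for this command */
};

/*
 * Decide whether a failed (e.g. timed-out) command should be requeued.
 * Returns true to retry, false to fail the command back to its caller.
 */
bool
sketch_should_retry(const struct sketch_periph *periph,
    struct sketch_request *req)
{
	assert(periph != NULL && req != NULL);

	/* Device already invalidated: a retry would only add another timeout. */
	if ((periph->flags & SKETCH_PERIPH_INVALID) != 0)
		return (false);

	/* Retry budget exhausted. */
	if (req->retries_left <= 0)
		return (false);

	req->retries_left--;
	return (true);
}
```

The invalid check comes first, so the retry budget is left untouched once the
device is gone and every outstanding command fails on its next error instead
of waiting out its remaining retries.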

Can this approach be considered correct?

--
Alexander Motin
_______________________________________________
freebs...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi...@freebsd.org"

Matthew Jacob

Jul 23, 2011, 8:58:22 PM
On 7/23/2011 3:08 PM, Alexander Motin wrote:
> Hi.
>
> I've simulated a real-world device failure condition in which a SATA disk
> still reports its presence but doesn't respond to any command. I've
> found that, because of multiple command retries, each of which causes a 30s
> timeout, a bus reset and another retry/requeue, it may take ages to
> eventually drop the failed device. The odd thing is that those retries
> continue even after XPT has considered the device lost and invalidated it.
>
> I've made a patch (http://people.freebsd.org/~mav/periph_noretry.patch)
> for cam_periph_error() to block any retries after the periph has been
> marked as invalid. With that patch, all activity completes in 1-2 minutes,
> just after the several timeouts required to consider the device lost.
>
> Can this approach be considered correct?
>

Yes, I like this.