Re: amr driver issues in 7.1-RELEASE

Scott Long

Jan 22, 2009, 3:29:46 PM
to Steve Polyack, sco...@freebsd.org, Mike Tancsa, freebsd-...@freebsd.org
I might have a fix for this, let me check and get back to you.

Scott

Sent from my iPhone

On Jan 22, 2009, at 12:49 PM, Steve Polyack <kor...@comcast.net> wrote:

> We have multiple systems using LSILogic PERC4 cards (PERC4e/Si,
> PERC4/DC, amr driver). After recently upgrading two of them from
> 6.3 to 7.1, we have begun to see the following errors in our logs
> during heavy use in both systems:
> amr0: Too many retries on command 0xffffffff80a4da58. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a497a0. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead
>
> However, the system continues working and the volumes remain
> accessible. This happens on the PERC4/DC controllers in both
> systems. Firmware version is 352D. MegaCLI reports no problems.
>
> Has anyone else seen these messages? A Google search turns up
> nothing but results in the driver code. Is this something we should
> be worried about? We won't be moving any other systems to 7.1 until
> we can clear this up.
>
> Thanks!
>
> -Steve Polyack
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Scott Long

Jan 22, 2009, 4:16:43 PM
to Steve Polyack, Mike Tancsa, freebsd-...@freebsd.org
Steve Polyack wrote:
> We have multiple systems using LSILogic PERC4 cards (PERC4e/Si,
> PERC4/DC, amr driver). After recently upgrading two of them from 6.3 to
> 7.1, we have begun to see the following errors in our logs during heavy
> use in both systems:
> amr0: Too many retries on command 0xffffffff80a4da58. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a497a0. Controller is likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead
>
> However, the system continues working and the volumes remain
> accessible. This happens on the PERC4/DC controllers in both systems.
> Firmware version is 352D. MegaCLI reports no problems.
>
> Has anyone else seen these messages? A Google search turns up nothing
> but results in the driver code. Is this something we should be worried
> about? We won't be moving any other systems to 7.1 until we can clear
> this up.
>

The fix for this that I was thinking of is already in 7.1. There might
still be a driver bug, but I'm leaning more towards the controller
simply being busy. Do you have a reproducible test case that I could
try?

Scott

Steve Polyack

Jan 22, 2009, 4:29:30 PM
to Scott Long, Mike Tancsa, freebsd-...@freebsd.org
Scott Long wrote:
> The fix for this that I was thinking of is already in 7.1. There
> might still be a driver bug, but I'm leaning more towards the
> controller simply being busy. Do you have a reproducible test case
> that I could try?
>
> Scott
>
We saw this one while backups wrote from an array on the PERC4/DC to a
tape drive (on a separate controller).
amr1: Too many retries on command 0xffffffff80a6d060. Controller is likely dead

The other four which I noted came during writes to the array attached to
the PERC4/DC (external Dell PowerVault). I want to say they showed up
while writing a 30G junkfile (/dev/random) to the array which we were
using to test the tape access; either that, or while we wrote that file
out to the tape drive.

If it matters, we also use ports/sysutils/linux-megacli2 to periodically
check the status of our arrays. It's possible that this happened during
one of these long writes/reads. I'm not having any luck reproducing at
the moment, but if I come across a reproducible test, I will let you know.

Thanks!
Steve Polyack

Scott Long

Jan 22, 2009, 4:50:35 PM
to Steve Polyack, Mike Tancsa, freebsd-...@freebsd.org
Steve Polyack wrote:
> Scott Long wrote:
>> The fix for this that I was thinking of is already in 7.1. There
>> might still be a driver bug, but I'm leaning more towards the
>> controller simply being busy. Do you have a reproducible test case
>> that I could try?
>>
>> Scott
>>
> We saw this one while backups wrote from an array on the PERC4/DC to a
> tape drive (on a separate controller).
> amr1: Too many retries on command 0xffffffff80a6d060. Controller is likely dead
>
> The other four which I noted came during writes to the array attached to
> the PERC4/DC (external Dell PowerVault). I want to say they showed up
> while writing a 30G junkfile (/dev/random) to the array which we were
> using to test the tape access; either that, or while we wrote that file
> out to the tape drive.
>
> If it matters, we also use ports/sysutils/linux-megacli2 to periodically
> check the status of our arrays. It's possible that this happened during
> one of these long writes/reads. I'm not having any luck reproducing at
> the moment, but if I come across a reproducible test, I will let you know.
>

I don't know too much about the internals of the AMR firmware, but I
imagine that it could be possible that a management command from megacli
could stall the firmware and make this warning pop up. I'll see if I
can reproduce it. The warning is harmless, though, even if it is
strongly worded.

Scott

Steve Polyack

Jan 26, 2009, 4:05:15 PM
to Scott Long, Mike Tancsa, freebsd-...@freebsd.org
Steve Polyack wrote:
> Scott Long wrote:
>> The fix for this that I was thinking of is already in 7.1. There
>> might still be a driver bug, but I'm leaning more towards the
>> controller simply being busy. Do you have a reproducible test case
>> that I could try?
>>
>> Scott
>>
So far, I have not been able to reliably reproduce this. It pops up
every now and then during our backups, which at the moment aren't that
disk intensive. I'll let you know if I come across anything else.
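
Since the warning only shows up intermittently, it may help to tally how often it appears and on which controller before trying to tie it to backup or megacli runs. The following is a minimal sketch, not part of the amr driver or any FreeBSD tool: the function name count_amr_retries is hypothetical, and the pattern is taken only from the message text quoted in this thread, on the assumption that the driver logs each warning on a single line.

```python
import re
from collections import Counter

# Pattern based solely on the warning text quoted in this thread.
AMR_RETRY = re.compile(
    r"(amr\d+): Too many retries on command (0x[0-9a-fA-F]+)"
)


def count_amr_retries(lines):
    """Count retry warnings per controller and per command address.

    `lines` is any iterable of log lines (a list, or an open file).
    Hypothetical helper for illustration only.
    """
    by_controller = Counter()
    by_command = Counter()
    for line in lines:
        match = AMR_RETRY.search(line)
        if match:
            controller, command = match.groups()
            by_controller[controller] += 1
            by_command[command] += 1
    return by_controller, by_command


# Sample input: the five warnings reported in this thread.
sample_log = [
    "amr0: Too many retries on command 0xffffffff80a4da58. Controller is likely dead",
    "amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead",
    "amr0: Too many retries on command 0xffffffff80a497a0. Controller is likely dead",
    "amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is likely dead",
    "amr1: Too many retries on command 0xffffffff80a6d060. Controller is likely dead",
]

controllers, commands = count_amr_retries(sample_log)
for name, total in sorted(controllers.items()):
    print(f"{name}: {total} retry warnings")
```

The same function accepts an open file handle, so it could be pointed at whichever file syslog writes kernel messages to on the affected hosts (the exact path depends on the local syslog configuration).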