Scott
On Jan 22, 2009, at 12:49 PM, Steve Polyack <kor...@comcast.net> wrote:
> We have multiple systems using LSILogic PERC4 cards (PERC4e/Si,
> PERC4/DC, amr driver). After recently upgrading two of them from
> 6.3 to 7.1, we have begun to see the following errors in our logs
> during heavy use in both systems:
> amr0: Too many retries on command 0xffffffff80a4da58. Controller is
> likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is
> likely dead
> amr0: Too many retries on command 0xffffffff80a497a0. Controller is
> likely dead
> amr0: Too many retries on command 0xffffffff80a4eaa8. Controller is
> likely dead
>
> However, the system continues working and the volumes remain
> accessible. This happens on the PERC4/DC controllers in both
> systems. Firmware version is 352D. MegaCLI reports no problems.
>
> Has anyone else seen these messages? A google search turns up
> nothing but results in the driver code. Is this something we should
> be worried about? We won't be moving any other systems to 7.1 until
> we can clear this up.
>
> Thanks!
>
> -Steve Polyack
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"
The fix for this that I was thinking of is already in 7.1. There might
still be a driver bug, but I'm leaning more towards the controller
simply being busy. Do you have a reproducible test case that I could
try?
Scott
The other four which I noted came during writes to the array attached to
the PERC4/DC (external Dell PowerVault). I believe they showed up either
while writing a 30G junk file (filled from /dev/random) to the array,
which we were using to test tape access, or while we wrote that file
out to the tape drive.
If it matters, we also use ports/sysutils/linux-megacli2 to periodically
check the status of our arrays. It's possible that one of these checks
ran during one of the long writes/reads. I'm not having any luck
reproducing it at the moment, but if I come across a reproducible test,
I will let you know.
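
For anyone who wants to generate a similar sustained write load, here is a minimal Python sketch. It is not the exact procedure used above: the function name write_junk_file, the chunk size, and the target path are all hypothetical, and os.urandom stands in for reading /dev/random directly.

```python
import os


def write_junk_file(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of random data to path, chunk_bytes at a time.

    Hypothetical helper: the name and defaults are illustrative only.
    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        while written < total_bytes:
            # Never write past the requested size on the final chunk.
            n = min(chunk_bytes, total_bytes - written)
            out.write(os.urandom(n))
            written += n
        # Push the data out of the OS cache so the controller sees the I/O.
        out.flush()
        os.fsync(out.fileno())
    return written
```

Calling write_junk_file with a path on the array and 30 * 1024**3 bytes would approximate the 30G write; the path itself is a placeholder to fill in.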
Thanks!
Steve Polyack
I don't know too much about the internals of the AMR firmware, but I
imagine that a management command from megacli could stall the
firmware and make this warning pop up. I'll see if I
can reproduce it. The warning is harmless, though, even if it is
strongly worded.
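
To see how often the warning fires and whether the same command address repeats, something like the following Python sketch could tally the messages. It assumes each warning sits on a single log line in the form quoted earlier in this thread; the function name tally_retry_warnings is hypothetical.

```python
import re
from collections import Counter

# Matches the warning text quoted in this thread, e.g.
# "amr0: Too many retries on command 0xffffffff80a4da58. Controller is likely dead"
RETRY_RE = re.compile(r"(amr\d+): Too many retries on command (0x[0-9a-fA-F]+)")


def tally_retry_warnings(lines):
    """Count retry warnings per (controller, command address) pair.

    Hypothetical helper: lines is any iterable of log lines.
    """
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding it the lines of a log file (for example, an open file object) returns a Counter keyed by controller name and command address.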
Scott