Hi!
I've been running 3.0.12 for a while now. While doing some maintenance I saw that version 4.0.7 was out, so I installed it on the server using the in-place upgrade. The script reported that some signature didn't match, though, so it had to do a clean install instead.
At first everything seemed to work correctly, but after about 3-4 minutes I got an error through the iLO (it is an HP server, an HP ProLiant DL380e Gen 8 to be exact) where the storage controller reported a critical failure. My first thought was that it was just an unlucky coincidence that it broke down right then. But I had already started the update on another server, and I got the same error when I rebooted that one. The chance of both RAID cards breaking at almost exactly the same moment is so small that I don't believe it is a coincidence.
That made me curious, so I investigated a little and came to the conclusion that something changed between version 3.0.14 and 4.0.1 that introduces this error.
I should probably also mention that the error disappears when you reboot the server: not directly when the server is shut down, but during POST. The error is also reported during POST. I'm attaching a picture so you can see for yourself! :) But just in case somebody searches for this error in the future, here is the message:
1719-Slot 1 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x14)
I've searched for this error on the web and found that there was a firmware update that should fix it. The firmware mentioned was version 3-something, if I remember correctly, but as you can see in the screenshot, the cards are already running version 8.32, which is the latest version available on HP's support site.
Another thing worth mentioning is that none of the drives connected through the RAID card are listed by the lsblk command after the error shows up. (I haven't checked whether they show up before the error, though.)
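A rough sketch of how that missing before/after check could be scripted, assuming Python 3 and a util-linux lsblk that supports JSON output (-J). The function names disk_names and current_disks are made up for illustration, and nothing here is specific to the P822:

```python
import json
import subprocess


def disk_names(lsblk_json):
    """Return the names of all top-level devices of type 'disk'."""
    data = json.loads(lsblk_json)
    return [
        dev["name"]
        for dev in data.get("blockdevices", [])
        if dev.get("type") == "disk"
    ]


def current_disks():
    """Ask lsblk for the disks that are visible right now."""
    # -J: JSON output, -d: skip partitions/children, -o: columns to print
    result = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,TYPE,SIZE"],
        capture_output=True,
        text=True,
        check=True,
    )
    return disk_names(result.stdout)
```

Calling current_disks() right after boot and again once the iLO reports the failure would show whether the drives behind the controller are visible at first and then vanish.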
I assume that something runs at startup, or shortly after, and sends a command to the RAID card that it can't handle, which in turn makes it crash or lock up as the error indicates.
Is there anything that I can try to find the problem?
Some info about my setup (so that it is easier to find than scrolling through the text):
Servers (2x): HP ProLiant DL380e Gen 8
RAID card (Same card in both servers): HP Smart Array P822
Best regards
Oskar