An update, long overdue. After the complete inaction of IBM’s so-called “support”, who blamed us for using non-official operating systems, we complained
quite loudly (and harshly), and they agreed to “replace a backplane for mere reasons of customer satisfaction”, despite my insisting that they also bring
an HBA, because we really didn’t know what was wrong.
So they sent a technician with one of the three almost passive boards that make up the backplane, even though I had told them that the issue was spread across all 24 disks, not
just a group of 8. He replaced one of them at random (I was on vacation when he came) and, as I expected, the issue wasn’t solved at all.
Tired of dealing with them, I pulled the SAS3 HBA and installed a classic LSI2008 card. That was a nightmare in itself, because the IBM’s stupid firmware hangs during
boot (“connecting RAID adapters and boot devices” or something like that; I left it in that state for 24 hours just to see whether it would eventually exit the loop). I had to erase the
boot services flash on the HBA, even though I had already disabled BIOS and UEFI services for the PCI riser card. Anyway, I digress.
Repeating all of our tests, everything works like a charm with the LSI2008 card, although I’ve seen some surprising behavior. I spent a lot of time running
benchmarks: with the LSI3008 card I could reproduce the error condition fairly reliably in less than an hour, whereas with the LSI2008 I was unable to reproduce it at all.
Of course, that is as sure as you can be these days, unless someone provides you with a proper oscilloscope and SAS pod. I even suggested that to IBM,
offering to do a serious diagnosis of the problem for them ;)
The odd behavior, on which LSI’s spiritual advice would be welcome, is this: 6 minutes after booting the system, while running a scrub to generate
I/O load and before starting the error-triggering benchmarks, I saw some surprising messages in /var/log/messages:
———
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da15,pass16: Element descriptor: 'SLOT 000'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da15,pass16: SAS Device Slot Element: 1 Phys at Slot 0, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd99
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da16,pass17: Element descriptor: 'SLOT 001'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da16,pass17: SAS Device Slot Element: 1 Phys at Slot 1, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9a
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da17,pass18: Element descriptor: 'SLOT 002'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da17,pass18: SAS Device Slot Element: 1 Phys at Slot 2, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9b
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da18,pass19: Element descriptor: 'SLOT 003'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da18,pass19: SAS Device Slot Element: 1 Phys at Slot 3, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9c
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da19,pass20: Element descriptor: 'SLOT 004'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da19,pass20: SAS Device Slot Element: 1 Phys at Slot 4, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9d
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da20,pass21: Element descriptor: 'SLOT 005'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da20,pass21: SAS Device Slot Element: 1 Phys at Slot 5, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9e
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da21,pass22: Element descriptor: 'SLOT 006'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da21,pass22: SAS Device Slot Element: 1 Phys at Slot 6, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd9f
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da22,pass23: Element descriptor: 'SLOT 007'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da22,pass23: SAS Device Slot Element: 1 Phys at Slot 7, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fda0
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da7,pass8: Element descriptor: 'SLOT 008'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da7,pass8: SAS Device Slot Element: 1 Phys at Slot 8, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd91
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da8,pass9: Element descriptor: 'SLOT 009'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da8,pass9: SAS Device Slot Element: 1 Phys at Slot 9, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd92
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da9,pass10: Element descriptor: 'SLOT 010'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da9,pass10: SAS Device Slot Element: 1 Phys at Slot 10, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd93
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da10,pass11: Element descriptor: 'SLOT 011'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da10,pass11: SAS Device Slot Element: 1 Phys at Slot 11, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd94
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da11,pass12: Element descriptor: 'SLOT 012'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da11,pass12: SAS Device Slot Element: 1 Phys at Slot 12, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd95
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da12,pass13: Element descriptor: 'SLOT 013'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da12,pass13: SAS Device Slot Element: 1 Phys at Slot 13, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd96
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da13,pass14: Element descriptor: 'SLOT 014'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da13,pass14: SAS Device Slot Element: 1 Phys at Slot 14, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd97
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da14,pass15: Element descriptor: 'SLOT 015'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da14,pass15: SAS Device Slot Element: 1 Phys at Slot 15, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd98
——————
And at 17:41, something similar:
——————
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da0,pass0: Element descriptor: 'SLOT 016'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da0,pass0: SAS Device Slot Element: 1 Phys at Slot 16, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d721
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da1,pass1: Element descriptor: 'SLOT 017'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da1,pass1: SAS Device Slot Element: 1 Phys at Slot 17, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d722
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da2,pass2: Element descriptor: 'SLOT 018'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da2,pass2: SAS Device Slot Element: 1 Phys at Slot 18, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d723
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da3,pass3: Element descriptor: 'SLOT 019'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da3,pass3: SAS Device Slot Element: 1 Phys at Slot 19, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d724
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da4,pass4: Element descriptor: 'SLOT 020'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da4,pass4: SAS Device Slot Element: 1 Phys at Slot 20, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d725
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da5,pass5: Element descriptor: 'SLOT 021'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da5,pass5: SAS Device Slot Element: 1 Phys at Slot 21, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d726
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da6,pass6: Element descriptor: 'SLOT 022'
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: da6,pass6: SAS Device Slot Element: 1 Phys at Slot 22, Not All Phys
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: SATA device
Apr 20 17:41:41 clientes-ssd8 kernel: ses0: phy 0: parent 500507603ea6d720 addr 500507603ea6d727
———————
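To see at a glance which slots, devices and SAS addresses each enclosure services device re-reported, the messages can be tabulated. The following is a minimal Python sketch that assumes only the message format visible in the excerpts above; the names `parse_ses_log` and `SlotInfo` are hypothetical and not part of any FreeBSD tool.

```python
import re
from typing import NamedTuple

# Patterns for the ses(4) lines quoted above; assumed from the excerpt only.
SLOT_RE = re.compile(
    r"(?P<ses>ses\d+): (?P<dev>da\d+),pass\d+: "
    r"SAS Device Slot Element: \d+ Phys at Slot (?P<slot>\d+)"
)
ADDR_RE = re.compile(
    r"(?P<ses>ses\d+): phy \d+: parent (?P<parent>[0-9a-f]+) addr (?P<addr>[0-9a-f]+)"
)


class SlotInfo(NamedTuple):
    ses: str
    slot: int
    dev: str
    parent: str
    addr: str


def parse_ses_log(lines):
    """Pair each 'Slot Element' line with the next 'parent ... addr' line of the same ses device."""
    pending = {}  # ses name -> (slot, dev) still waiting for its address line
    result = []
    for line in lines:
        m = SLOT_RE.search(line)
        if m:
            pending[m["ses"]] = (int(m["slot"]), m["dev"])
            continue
        m = ADDR_RE.search(line)
        if m and m["ses"] in pending:
            slot, dev = pending.pop(m["ses"])
            result.append(SlotInfo(m["ses"], slot, dev, m["parent"], m["addr"]))
    return result


sample = """\
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da15,pass16: Element descriptor: 'SLOT 000'
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: da15,pass16: SAS Device Slot Element: 1 Phys at Slot 0, Not All Phys
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: SATA device
Apr 20 11:06:38 clientes-ssd8 kernel: ses1: phy 0: parent 500507603ea6fd90 addr 500507603ea6fd99
""".splitlines()

for info in parse_ses_log(sample):
    print(f"{info.ses} slot {info.slot:2d} {info.dev:5} parent {info.parent} addr {info.addr}")
```

Feeding it the full /var/log/messages instead of the sample would list every slot that was re-announced, grouped by enclosure, which makes it easier to spot whether the events ever touch a different set of slots.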
After those events I ran a scrub just in case, and no errors were found. Could it be some expander oddity that
confused the LSI3008 but not the LSI2008?
The system is working like a charm anyway, but I wonder whether there’s some non-obvious problem waiting to become a time bomb.
As for IBM, well, unless we can fix this, the expensive piece of hardware will be scrapped. And I really doubt
any piece of kit from IBM/Lenovo (it seems Lenovo is now in charge of support for these servers) will ever be purchased here on
my watch.
Thanks,
Borja.
_______________________________________________
freebs...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi...@freebsd.org"
The 2008 is a 6Gbps component and the 3008 is a 12Gbps one, so if you have 12Gbps-capable
devices it’s quite possible that where the 2008 works fine, negotiating at 6Gbps,
the 3008 could fail at 12Gbps due to the tighter tolerances required of
all components.
We had similar issues when chassis first started moving from 3Gbps to
6Gbps; in fact, we found that Dell shipped drives with amended firmware
that limited their negotiation speed to 3Gbps specifically to
work around signalling issues in their chassis, even though they
advertised them as 6Gbps compatible.
Regards
Steve