Now that all disks have been taken out and put back in one by one, I do:
# mfiutil show volumes
mfi0 Volumes:
Id Size Level Stripe State Cache Name
Whoops, no volumes? That can't be good. I check up on the disks, they're all there:
# mfiutil show drives
mfi0 Physical Drives:
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721953> SATA enclosure 1, slot 2
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722143> SATA enclosure 1, slot 3
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722152> SATA enclosure 1, slot 4
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722133> SATA enclosure 1, slot 14
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722139> SATA enclosure 1, slot 15
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721963> SATA enclosure 1, slot 17
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722160> SATA enclosure 1, slot 23
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722127> SATA enclosure 2, slot 1
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722155> SATA enclosure 2, slot 4
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721968> SATA enclosure 2, slot 5
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721956> SATA enclosure 2, slot 6
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722137> SATA enclosure 2, slot 7
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721969> SATA enclosure 2, slot 8
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722122> SATA enclosure 2, slot 10
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722149> SATA enclosure 2, slot 11
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721951> SATA enclosure 1, slot 0
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722128> SATA enclosure 1, slot 5
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721966> SATA enclosure 1, slot 6
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721952> SATA enclosure 1, slot 7
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722145> SATA enclosure 1, slot 8
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722131> SATA enclosure 1, slot 9
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721961> SATA enclosure 1, slot 10
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721954> SATA enclosure 1, slot 11
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722158> SATA enclosure 1, slot 12
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721967> SATA enclosure 1, slot 13
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721971> SATA enclosure 1, slot 16
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722157> SATA enclosure 1, slot 19
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721970> SATA enclosure 1, slot 21
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722140> SATA enclosure 1, slot 22
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722125> SATA enclosure 2, slot 2
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722146> SATA enclosure 2, slot 3
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722134> SATA enclosure 2, slot 9
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721965> SATA enclosure 1, slot 1
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ722151> SATA enclosure 1, slot 20
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721964> SATA enclosure 2, slot 0
( 1863G) UNCONFIGURED GOOD <SAMSUNG HD203WI 0003 serial=S1UYJDWZ721955> SATA enclosure 1, slot 18
Well, that's an excellent way of adding the volumes in the sequence they appear physically, so I start with the first one from top left:
# mfiutil create raid0 -v E01:S05
Adding drive 26 to array 0
Adding array 0 to volume 0
mfiutil: Command failed: Wrong firmware or drive state
mfiutil: Failed to add volume: Input/output error
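For creating the volumes in physical order, a small helper can turn the "mfiutil show drives" listing into enclosure/slot identifiers sorted by position. This is only a sketch: the function name drives_in_slot_order is made up here, it assumes every drive line ends in "enclosure N, slot M" as in the listing above, and it prints unpadded E<n>:S<n> identifiers because how mfiutil treats leading zeros is not confirmed anywhere in this thread.

```sh
#!/bin/sh
# Hypothetical helper (not part of mfiutil): read `mfiutil show drives`
# output on stdin and print one E<encl>:S<slot> identifier per drive,
# sorted by enclosure number, then by slot number.
drives_in_slot_order() {
    awk 'match($0, /enclosure [0-9]+, slot [0-9]+/) {
            s = substr($0, RSTART, RLENGTH)   # e.g. "enclosure 1, slot 14"
            gsub(/[^0-9 ]/, "", s)            # keep only digits and spaces
            split(s, f, " ")                  # f[1] = enclosure, f[2] = slot
            print f[1], f[2]
        }' |
    sort -k1,1n -k2,2n |
    awk '{ printf "E%d:S%d\n", $1, $2 }'
}
```

It would be used as "mfiutil show drives | drives_in_slot_order"; lines without an enclosure/slot pair, such as the "mfi0 Physical Drives:" header, are skipped.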
Firmware error? Invalid drive state?
# mfiutil good E01:S05
mfiutil: Drive 26 is already in the desired state
Seems to be good... do I have a firmware issue? Check dmesg:
mfi0: <LSI MegaSAS Gen2> port 0xc000-0xc0ff mem 0xfad7c000-0xfad7ffff,0xfadc0000-0xfadfffff irq 16 at device 0.0 on pci5
mfi0: Megaraid SAS driver Ver 3.00
mfi0: 1966 (338565595s/0x0020/info) - Shutdown command received from host
mfi0: 1967 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/9261/1000)
mfi0: 1968 (boot + 3s/0x0020/info) - Firmware version 2.0.03-0673
mfi0: 1969 (boot + 4s/0x0020/info) - Board Revision
mfi0: 1970 (boot + 24s/0x0004/info) - Enclosure (SES) discovered on PD 08(c Port 0 - 3/p1)
mfi0: 1971 (boot + 24s/0x0004/info) - Enclosure (SES) discovered on PD 09(c Port 0 - 3/p2)
mfi0: 1972 (boot + 24s/0x0004/info) - Enclosure PD 08(c Port 0 - 3/p1) communication restored
mfi0: 1973 (boot + 24s/0x0004/info) - Enclosure PD 08(c Port 0 - 3/p1) fan 1 speed changed
mfi0: 1974 (boot + 24s/0x0004/info) - Enclosure PD 08(c Port 0 - 3/p1) fan 2 speed changed
mfi0: 1975 (boot + 24s/0x0004/info) - Enclosure PD 08(c Port 0 - 3/p1) fan 3 speed changed
mfi0: 1976 (boot + 24s/0x0004/info) - Enclosure PD 09(c Port 0 - 3/p2) communication restored
mfi0: 1977 (boot + 24s/0x0004/info) - Enclosure PD 09(c Port 0 - 3/p2) fan 1 speed changed
mfi0: 1978 (boot + 24s/0x0004/info) - Enclosure PD 09(c Port 0 - 3/p2) fan 2 speed changed
mfi0: 1979 (boot + 24s/0x0004/info) - Enclosure PD 09(c Port 0 - 3/p2) fan 3 speed changed
mfi0: 1980 (boot + 24s/0x0002/info) - Inserted: Encl PD 08
mfi0: 1981 (boot + 24s/0x0002/info) - Inserted: PD 08(c Port 0 - 3/p1) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=50030480008fb0fd,0000000000000000
mfi0: 1982 (boot + 24s/0x0002/info) - Inserted: Encl PD 09
mfi0: 1983 (boot + 24s/0x0002/info) - Inserted: PD 09(c Port 0 - 3/p2) Info: enclPd=09, scsiType=d, portMap=00, sasAddr=50030480008e7b7d,0000000000000000
mfi0: 1984 (boot + 24s/0x0002/info) - Inserted: PD 0a(e0x08/s2)
and then lots of disks.... looks fine, right?
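To pull just the firmware version out of that log, something like the following would do. It is a sketch: the function name is made up, and it assumes the "Firmware version" line keeps the format shown above.

```sh
#!/bin/sh
# Hypothetical helper: read dmesg output on stdin and print the version
# string from every line containing "Firmware version <string>".
mfi_firmware_version() {
    sed -n 's/.*Firmware version \([^ ]*\).*/\1/p'
}
```

Usage would be "dmesg | mfi_firmware_version".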
What am I missing? I badly want mfid0-mfid35 back so that I can recreate my ZFS and get to work :-)
On a side note, the ZFS I'll make is this, any comments on the configuration?
zpool create tank \
raidz2 /dev/mfid0 /dev/mfid1 /dev/mfid2 /dev/mfid3 /dev/mfid4 /dev/mfid5 /dev/mfid30 \
raidz2 /dev/mfid6 /dev/mfid7 /dev/mfid8 /dev/mfid9 /dev/mfid10 /dev/mfid11 /dev/mfid31 \
raidz2 /dev/mfid12 /dev/mfid13 /dev/mfid14 /dev/mfid15 /dev/mfid16 /dev/mfid17 /dev/mfid32 \
raidz2 /dev/mfid18 /dev/mfid19 /dev/mfid20 /dev/mfid21 /dev/mfid22 /dev/mfid23 /dev/mfid33 \
raidz2 /dev/mfid24 /dev/mfid25 /dev/mfid26 /dev/mfid27 /dev/mfid28 /dev/mfid29 /dev/mfid34 \
spare /dev/mfid35
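As a rough sanity check on this layout: five raidz2 vdevs of seven disks each plus one spare account for all 36 disks, and each raidz2 vdev gives up two disks' worth of space to parity. The sketch below does the capacity arithmetic. The function name is made up, the 1863G figure is the per-disk size from the drive listing, and ZFS metadata and padding overhead are ignored, so the real usable space will be somewhat lower.

```sh
#!/bin/sh
# Hypothetical helper: raw data capacity of a pool built from identical
# raidz vdevs, in whatever unit DISK_SIZE is given in.
# usage: raidz_data_capacity VDEVS DISKS_PER_VDEV PARITY DISK_SIZE
raidz_data_capacity() {
    vdevs=$1 disks=$2 parity=$3 size=$4
    echo $(( vdevs * (disks - parity) * size ))
}
```

For the pool above, "raidz_data_capacity 5 7 2 1863" prints 46575, i.e. roughly 45 TB of data space before overhead.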
Cheers
Nik
_______________________________________________
freebs...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi...@freebsd.org"
I have already tinkered with that kind of controller too (it seems we faced
the same problems in the same order too...). For this kind of purpose,
you really want a controller that does real JBOD and tells you about
what is going on with the disk. :/
I ran into exactly the same problem as you, meaning that removing the
drive will destroy the RAID-0 volume, and frankly, even if the tools
were working for recreating the volumes, having to do that to see the
disk again is a "dirty hack"(tm) that is bound to blow up in your face
at some point, or to be really hard to maintain for any upcoming upgrade.
Sorry for not providing any additional help with your problem. :/
--
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo
You can wire down SCSI buses and disks in /boot/device.hints so each
disk always gets the same device number regardless of the order in which
the disks spin up. The syntax is documented in /sys/conf/NOTES (search
for "SCSI DEVICE CONFIGURATION"). It's a CAM feature, and mfi uses CAM,
so I *think* it should work for mfi as well, but what you'll actually be
wiring down are mfi volumes, not individual disks, so it's up to you to
assign the right disk to the right volume.
I am very close to suggesting that you just let the controller handle
the RAID part of things and just build your zfs pool on top of that,
i.e. use mfiutil to divide your disks into 5 x 6+1, 5 x 5+2 or 7 x 4+1
volumes plus one spare, and use each volume as a separate vdev in your
zfs pool. You may lose a small amount of performance, and rebuilds will
be slower, but the main argument in favor of zfs, avoiding the RAID write
hole, is moot if your controller has battery-backed cache.
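All three layouts mentioned above use up the 36 disks; they differ in how many disks end up holding data. A quick sketch of the arithmetic (the function name is made up for illustration):

```sh
#!/bin/sh
# Hypothetical helper: for N volumes of DATA+PARITY disks plus SPARES,
# print "<total disks> <data disks>".
# usage: layout_summary N DATA PARITY SPARES
layout_summary() {
    n=$1 data=$2 parity=$3 spares=$4
    echo "$(( n * (data + parity) + spares )) $(( n * data ))"
}

layout_summary 5 6 1 1   # 5 x 6+1 plus one spare
layout_summary 5 5 2 1   # 5 x 5+2 plus one spare
layout_summary 7 4 1 1   # 7 x 4+1 plus one spare
```

This prints "36 30", "36 25" and "36 28", so the 5 x 6+1 layout keeps the most data disks and 5 x 5+2 trades five of them for a second parity disk per volume.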
DES
--
Dag-Erling Smørgrav - d...@des.no
MFI only uses CAM for passthrough access to component drives, not for normal I/O. Setting CAM wiring hints will not solve the problem at hand. And the problem at hand isn't really even numbering, it's that the MFI firmware freaked out and marked the disks inaccessible. I think that this happened to us at Yahoo once, and we eventually gave up and replaced the disks. Putting the disks on a non-LSI, non-RAID controller and writing 0's to the last 10MB worth of sectors (or just writing 0's to the entire drive) will likely solve the problem, but YMMV.
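A sketch of the zeroing step, assuming the disk has been moved to a plain controller and shows up there as some other device. The function only prints a dd command for review instead of running it, since pointing it at the wrong device destroys data. The function name, the example device name and the reading of "10MB" as 10 MiB of 512-byte sectors are all assumptions; the media size in bytes would have to come from something like diskinfo(8).

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) a dd command that would zero the
# last 10 MiB of a disk, counted in 512-byte sectors.
# usage: zero_tail_cmd DEVICE MEDIASIZE_IN_BYTES
zero_tail_cmd() {
    dev=$1 bytes=$2
    sector=512
    tail_sectors=$(( 10 * 1024 * 1024 / sector ))   # 20480 sectors = 10 MiB
    total_sectors=$(( bytes / sector ))
    if [ "$total_sectors" -le "$tail_sectors" ]; then
        echo "disk is not larger than 10 MiB" >&2
        return 1
    fi
    # seek= skips everything before the tail; count= limits the write to it.
    echo "dd if=/dev/zero of=$dev bs=$sector seek=$(( total_sectors - tail_sectors )) count=$tail_sectors"
}
```

For example, "zero_tail_cmd /dev/da9 104857600" (a 100 MiB test device) prints "dd if=/dev/zero of=/dev/da9 bs=512 seek=184320 count=20480". Working in sectors rather than megabytes means a disk whose size is not a whole number of MiB still gets its very last sectors covered.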
Scott
On Sep 26, 2010, at 1:25 AM, Scott Long wrote:
> MFI only uses CAM for passthrough access to component drives, not for normal I/O. Setting CAM wiring hints will not solve the problem at hand. And the problem at hand isn't really even numbering, it's that the MFI firmware freaked out and marked the disks inaccessible. I think that this happened to us at Yahoo once, and we eventually gave up and replaced the disks. Putting the disks on a non-LSI, non-RAID controller and writing 0's to the last 10MB worth of sectors (or just writing 0's to the entire drive) will likely solve the problem, but YMMV.
Indeed, "mfiutil locate" gave us all we needed for identifying the disk, so numbering is no longer an issue. Thanks for the tip on how to get the disk back up, I'll make sure we try that this week.
I tried the let-the-RAID-controller-make-RAIDs-and-join-them-via-ZFS model on the mpt-based controller, and that made performance drop from ~250 MB/sec to ~4 MB/sec. With mfi (the system is otherwise unchanged), the average speed is ~200 MB/sec, down ~50 MB/sec.
Cheers
Nik