Sami Ketola wrote:
> Doug Freyburger <dfre...@yahoo.com> wrote:
>> config:
>>
>> devstuff UNAVAIL insufficient replicas
>> emcpower0c UNAVAIL cannot open
>> emcpower1c UNAVAIL cannot open
>> emcpower2c UNAVAIL cannot open
>> emcpower3c UNAVAIL cannot open
>
> what does zdb -lv /dev/dsk/emcpower0c say? Can it find all 4 zfs labels?
> and on the other device files?
Thanks for this!
When I try any of the 4 PowerPath devices they all see the first 2
labels, and those are correct copies of each other with all the same
data. For the 3rd and 4th labels they say:
--------------------------------------------
LABEL 2
--------------------------------------------
failed to read label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to read label 3
Now I'm off to FTFM (Find) and RTFM (Read) to figure out what it means
to have 2 of 4 labels available. It sounds bad - Whatever overwrote
some of the labels likely also overwrote some of the content data.
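
As a rough aid to reading that output, here is a small sketch of where the four labels sit on a device. As far as I know, ZFS keeps two 256 KiB labels at the front of each vdev and two at the end, so labels 2 and 3 are the pair at the tail. The function name and the 10 GiB size are made up for illustration and are not taken from this host.

```python
from typing import List

LABEL_SIZE = 256 * 1024   # each vdev label is 256 KiB
LABEL_COUNT = 4           # labels 0 and 1 at the front, 2 and 3 at the end


def label_offsets(device_size: int) -> List[int]:
    """Return the byte offsets of labels 0..3 for a device of device_size bytes."""
    # The usable size is aligned down to a multiple of the label size.
    psize = device_size - (device_size % LABEL_SIZE)
    if psize < LABEL_COUNT * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    half = LABEL_COUNT // 2
    front = [i * LABEL_SIZE for i in range(half)]
    back = [psize - (LABEL_COUNT - i) * LABEL_SIZE for i in range(half, LABEL_COUNT)]
    return front + back


if __name__ == "__main__":
    # Hypothetical 10 GiB LUN, not the real size of the devices in this thread.
    for number, offset in enumerate(label_offsets(10 * 1024**3)):
        print(f"label {number}: byte offset {offset}")
```

So whatever is wrong is at the tail end of each device as this host sees it, while the first 512 KiB still reads back fine.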
> can you find any valid labels on any devices in /dev/dsk?
Trying the devices with the WWN in them that are listed by "powermt
display", half give a slightly different result:
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
The redundant paths are supposed to be to the exact same devices and
"powermt config" confirms that's true. Creepy that a different path to
the same LUN produces different output.
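
To compare paths without eyeballing every dump, something like the following could tally the failures per device. It is only a sketch: it assumes the failure lines look exactly like the ones pasted above ("failed to read label N" / "failed to unpack label N"), and the helper names are my own. The zdb call is the same "zdb -lv" suggested earlier in the thread.

```python
import re
import subprocess
from typing import Dict

# Matches the two failure messages seen in the zdb output above.
FAILURE = re.compile(r"^failed to (read|unpack) label (\d+)\s*$", re.MULTILINE)


def failed_labels(zdb_output: str) -> Dict[int, str]:
    """Map label number -> failure kind ("read" or "unpack") for each failed label."""
    return {int(num): kind for kind, num in FAILURE.findall(zdb_output)}


def scan_device(path: str) -> Dict[int, str]:
    """Run 'zdb -lv' on one device path and summarize which labels failed."""
    result = subprocess.run(
        ["zdb", "-lv", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return failed_labels(result.stdout)
```

Calling scan_device() on each path to the same LUN and printing the results side by side would show at a glance which paths report "read" and which report "unpack".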
Since they are on a Clariion, one device goes through the primary path
and the other through the trespass path, so maybe it's not completely
creepy. I suspect that just told me that all 4 LUNs to this host are
owned by SP A1. This being a development node I'm not sure how bad that
is. On my list of stuff to check is ownership of the LUNs on all of the
production hosts. Good round robin assignment of SP ownership by LUN is
something I'll pay close attention to on production hosts, not on
test/dev hosts.
Going through /dev/*s0 I see the two internal drives that are mirrored
with the metastat class of commands, the DVD-ROM that is unplugged,
six devices with WWNs in their names that are not currently mapped to
the host (that would be 3 former LUNs dual pathed) and the 8 paths to
the 4 PowerPath LUNs. That confirms there are no ZFS configured LUNs
visible to this host other than the ones in the lost zpool.
>> Any suggestions? I've seen a zpool go bad at boot time before, maybe a
>> timing problem with Powerpath or DMP. I could "zpool export devstuff"
>> and "zpool import devstuff" and it worked. But that was on a different
>> host in a different data center. This time it did not work.
>
> you can always try to rename /etc/zfs/zpool.cache reboot and import.
#last reboot | head -6
reboot system boot Tue Jan 3 17:37
reboot system down Tue Jan 3 17:33
reboot system boot Mon Jan 2 10:14
reboot system down Mon Jan 2 10:10
reboot system boot Sun Jan 1 16:29
reboot system down Sun Jan 1 16:25
#ls -laF /etc/zfs
total 14
drwxr-xr-x 2 root sys 512 Jan 2 10:04 ./
drwxr-xr-x 84 root sys 5632 Jan 3 17:40 ../
#
The modify time on the directory is 6 minutes before one of the reboots.
That makes sense - the reboot on the 1st is when the zpool was lost,
but since this system is not production it raises tickets but does not
page. I saw the tickets on the 2nd and tried "zpool export devstuff" and
"zpool import devstuff" because this host has lost its zpool before and
that recovered the zpool at that time. When I did the "zpool export
devstuff" it removed zpool.cache. Sigh. So I've done that and it did
not help. Thanks for the suggestion!
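
The 6 minute figure is easy to sanity check from the two listings above. A minimal sketch follows; the year is an assumption, since neither ls nor last prints one here (2012 happens to match the weekdays shown, but any year gives the same gap).

```python
from datetime import datetime

ASSUMED_YEAR = 2012  # assumption: the listings above carry no year


def parse_stamp(text: str) -> datetime:
    """Parse a 'Jan 2 10:04' style stamp as printed by ls and last."""
    return datetime.strptime(f"{ASSUMED_YEAR} {text}", "%Y %b %d %H:%M")


dir_mtime = parse_stamp("Jan 2 10:04")   # /etc/zfs mtime from "ls -laF"
shutdown = parse_stamp("Jan 2 10:10")    # "system down" from "last reboot"
gap_minutes = (shutdown - dir_mtime).total_seconds() / 60
print(f"directory modified {gap_minutes:.0f} minutes before the shutdown")
```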
Had I ever lost a zpool on any other host I'd conclude that ZFS is not
ready for prime time. But since this host is the only one with the
problem I conclude this host is no longer even appropriate for a
test/dev environment. A replacement host is on the way, but it won't
arrive for a month. Guess it's time to expedite.
The reboot yesterday was for another theory. This host has lost its
zpool before while up and another time failed to bring it up at
reboot. I wondered if it was a race condition bringing up the PowerPath
devices too late in the boot process, so they are not ready when ZFS is
started. If so, then booting with an exported zpool would handle that.
It did not work.