
Replaced FC drive in zpool, now can't get rid of old drive


Scott

Jul 28, 2016, 9:46:32 AM
On a Solaris 10 x64 server attached to an HP SmartArray (with 32 LUNs) I have two zpools, each using 16 LUNs. The LUNs are presented via FC and map 1:1 to HDDs in the HP box. The idea was to let ZFS handle redundancy (raidz2 in each of the two pools) rather than use the hardware RAID5 the HP box prefers.

Multipathing is also enabled for the LUNs, so the device paths I deal with start with /scsi_vhci/... (though /dev/rdsk and /dev/dsk entries exist for them at a lower layer).

I've had this running for a few years, then one of the HDDs died.
Googling (e.g. https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbvf.html) says to take the zpool offline in order to do the HDD replacement, which seems absurd, so I didn't do that. I deleted the old LUN on the HP and created a new one on it, and when I did, the new LUN got a new WWPN.

Next I had Solaris scan for and create new links to the LUN.
Then I did a zpool replace <pool> <old LUN> <new LUN>.
The pool is healthy, after about 24 hours of resilvering.
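
In case it matters, the sequence was roughly the following (pool and
device names here are placeholders, not the real ones):

    # rescan and rebuild the device links for the new LUN
    devfsadm -c disk
    # swap the failed LUN for the new one in the pool
    zpool replace <pool> c5t<oldWWN>d0 c5t<newWWN>d0
    # watch the resilver
    zpool status -v <pool>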

How do I get rid of the old LUN though?
format and luxadm probe show it there, but it's not in mpathadm list LU.
If I run luxadm probe, take the WWN of the B/O FC HDD, and then run
luxadm display <WWN> 2>&1 | less
I see: ERROR: I/O failure communicating with /dev/rdsk/c5t<longnum>d0s2

cfgadm -al doesn't show controllers higher than c2.
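
To be concrete, the state I'm looking at is this (WWN elided as above):

    format                  # stale LUN still listed
    luxadm probe            # stale LUN still listed; this gives its WWN
    mpathadm list lu        # stale LUN does NOT appear here
    luxadm display <WWN>    # ERROR: I/O failure communicating with
                            #   /dev/rdsk/c5t<longnum>d0s2
    cfgadm -al              # shows nothing above controller c2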

Regards, Scott

DoN. Nichols

Jul 29, 2016, 12:26:23 AM
On 2016-07-28, Scott <spac...@gmail.com> wrote:
> On a Solaris 10 x64 server attached to an HP SmartArray (with 32 LUNs) I have two zpools, each using 16 LUNs. The LUNs are presented via FC and map 1:1 to HDDs in the HP box. The idea was to let ZFS handle redundancy (raidz2 in each of the two pools) rather than use the hardware RAID5 the HP box prefers.

> Multipathing is also enabled for the LUNs, so the device paths I deal
> with start with /scsi_vhci/... (though /dev/rdsk and /dev/dsk entries
> exist for them at a lower layer).

O.K. No experience with the multipathing, so that might make a
difference.

> I've had this running for a few years, then one of the HDDs died.

Started with new drives, or used ones? I've done both, FWIW.
But I'm using Eurologic drive trays with fiber optics connecting to the
server.

> Googling (e.g.
> https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbvf.html) says to
> take the zpool offline in order to do the HDD replacement, which seems
> absurd, so I didn't do that. I deleted the old LUN on the HP and
> created a new one on it, and when I did, the new LUN got a new WWPN.

I've replaced drives without having to take the zpool offline.

Of course it has a new WWN -- unique ones are built into each
drive.

> Next I had Solaris scan for and create new links to the LUN.
> Then I did a zpool replace <pool> <old LUN> <new LUN>.
> The pool is healthy, after about 24 hours of resilvering.

O.K. Quite a long resilvering -- but maybe you are using larger
drives. I'm using 500 GB FC drives, in some Eurologic trays -- JBODs,
connected to the server via fiber optics. And I also never put that
many drives in an array (in part because of the number of drives which
the Eurologic tray holds). I typically put five drives in a raidz2,
have one hot spare per array (cross-linked so they can serve in any
array that needs one of that size), and one drive which I can
experiment with.

I've even moved from 146 GB drives to 500 GB drives, by
replacing them one at a time -- and when the last drive was replaced,
the size of the pool jumped to match the capacity of the new
drives.

> How do I get rid of the old LUN though?
> format and luxadm probe show it there, but it's not in mpathadm list LU.
> If I run luxadm probe, take the WWN of the B/O FC HDD, and then run
> luxadm display <WWN> 2>&1 | less
> I see: ERROR: I/O failure communicating with /dev/rdsk/c5t<longnum>d0s2

Did you use "devfsadm" to get the system to see the new drive
and to say goodbye to the old one? Typically, I use:

devfsadm -c disk -C

-c so it doesn't bother doing anything with the other things
(tape drives and whatever),

and the -C to clean out dev entries for things which are no
longer there.
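
So, roughly like this -- a sketch only, since I've never run it against
a multipathed SmartArray, and the checks afterwards are just my habit:

    # rebuild disk links, pruning entries for devices that are gone
    devfsadm -c disk -C
    # then see whether the stale LUN is still reported
    luxadm probe
    format </dev/null       # lists the disks and exits at the prompt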

But -- your hardware RAID controller may be telling the system
that it still is present somewhere -- it just can't talk to it now. :-)
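
If it is, the fp plugin for cfgadm is supposed to be able to drop FC
devices a controller can no longer reach, and expert-mode luxadm can
offline a dead path. Something along these lines -- untested here, and
the controller number and WWN are placeholders, so check the listing
before unconfiguring anything:

    # list FC devices per controller, flagging any marked unusable
    cfgadm -al -o show_FCP_dev
    # if the dead LUN shows up there, drop it
    cfgadm -c unconfigure -o unusable_FCP_dev c5::<WWN>
    # otherwise, try taking the dead path offline directly
    luxadm -e offline /dev/rdsk/c5t<longnum>d0s2
    # and rebuild /dev either way
    devfsadm -c disk -C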

The only thing that I have with a hardware RAID controller is a
Sun Fire X4150 which is powered down at the moment because of a
thunderstorm -- and because it is an experimental machine for me, not a
server which I am depending on so far.

Note that some things which were automatic in older ZFS
versions are no longer so in the current ones. Try "zpool get all" to
get a list like this:

======================================================================
usage:
get <"all" | property[,...]> <pool> ...

the following properties are supported:

PROPERTY EDIT VALUES

allocated NO <size>
capacity NO <size>
free NO <size>
guid NO <guid>
health NO <state>
size NO <size>
altroot YES <path>
autoexpand YES on | off
autoreplace YES on | off
bootfs YES <filesystem>
cachefile YES <file> | none
delegation YES on | off
failmode YES wait | continue | panic
listsnapshots YES on | off
readonly YES on | off
version YES <version>
======================================================================

autoreplace is one of those.

======================================================================
autoreplace=on | off

Controls automatic device replacement. If set to "off",
device replacement must be initiated by the administra-
tor by using the "zpool replace" command. If set to
"on", any new device, found in the same physical
location as a device that previously belonged to the
pool, is automatically formatted and replaced. The
default behavior is "off". This property can also be
referred to by its shortened column name, "replace".
======================================================================

And autoexpand is another. That is what allows the pool size to
expand when all drives in the pool have been replaced with larger ones.

======================================================================
autoexpand=on | off

Controls automatic pool expansion when a larger device
is added or attached to the pool or when a larger device
replaces a smaller device in the pool. If set to on, the
pool will be resized according to the size of the
expanded device. If the device is part of a mirror or
raidz then all devices within that mirror/raidz group
must be expanded before the new space is made available
to the pool. The default behavior is off. This property
can also be referred to by its shortened column name,
expand.
======================================================================
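
So it is probably worth checking both on your pools, and turning them
on if you want the old behaviour back (the pool name is a placeholder):

    zpool get autoreplace,autoexpand <pool>
    zpool set autoreplace=on <pool>
    zpool set autoexpand=on <pool>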

> cfgadm -al doesn't show controllers higher than c2.

Hmm ... what server? I am running a T5220 as my primary server,
and that one goes up to c5 -- with some fiber optic cards in it to talk
to two of the Eurologic trays, and spare fiber optic ports (which are c2
and c3, I think). The trays were previously running connected to a Sun
Fire 280R (rack-mount server version of the Sun Blade 1000/2000), and
this was via copper connection with both trays chained together.

> Regards, Scott

Hopefully, if what I have mentioned does not apply, something
else may do so.

Good Luck,
DoN.

--
Remove oil spill source from e-mail
Email: <BPdnic...@d-and-d.com> | (KV4PH) Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---