ZFS Disk Replacement Notes

Nick Andre

Feb 28, 2021, 3:24:15 PM
to wmfo-ops
Thanks so much to Bailey who swapped the failed drive today in Ringo! He informs me the new drive is 8TB and we have two more spares on hand after this one.

I'm going to attempt to fix the zpool and tell it to use the new disk, leaving the research here for posterity. I can see the status by running:

wmfo-admin@ringo:~$ sudo zpool status
  pool: zvol0
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 31h12m with 0 errors on Mon Oct 12 07:36:57 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        zvol0                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            wwn-0x50014ee26120df1b  ONLINE       0     0     0
            wwn-0x50014ee2b673aa8f  ONLINE       0     0     0
            2556652543945511217     UNAVAIL      0     0     0  was /dev/disk/by-id/wwn-0x50014ee20bc8a3f7-part1
            wwn-0x50014ee2b675e679  ONLINE       0     0     0
            wwn-0x50014ee2b673b323  ONLINE       0     0     0
            wwn-0x50014ee20bc8a028  ONLINE       0     0     0
            wwn-0x50014ee20bc8a9cc  ONLINE       0     0     0
            wwn-0x50014ee20bca37bd  ONLINE       0     0     0


Handily, it tells me to use zpool replace, so naturally I googled the answer and this doc page came up. I ran:
wmfo-admin@ringo:~$ sudo zpool offline zvol0 2556652543945511217

After which the status on that disk changed from UNAVAIL to OFFLINE.

The doc page wasn't super useful, as it didn't seem to be written for this config style (disk IDs instead of device names). ZFS really did not like my using sudo zpool replace zvol0 2556652543945511217 for some reason, so I went to /dev/disk/by-id and found the list of IDs for the raw disks:

wmfo-admin@ringo:~$ ls -l /dev/disk/by-id/ | grep -v part | grep sd
lrwxrwxrwx 1 root root  9 Feb 28 14:44 wwn-0x5000cca0bbd0ff7a -> ../../sdc
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bc8a028 -> ../../sdf
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bc8a9cc -> ../../sdg
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bca37bd -> ../../sdh
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee26120df1b -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b673aa8f -> ../../sdb
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b673b323 -> ../../sde
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b675e679 -> ../../sdd
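
For future swaps, the "spot the new disk" step can be mechanized by diffing the two ID lists. A minimal sketch using the IDs pasted above (on a live box you'd capture the lists from ls /dev/disk/by-id and zpool status directly; grep -vxF treats each pattern line as an exact fixed string):

```shell
# IDs the pool knows about (from zpool status above):
zpool_ids='wwn-0x50014ee20bc8a028
wwn-0x50014ee20bc8a9cc
wwn-0x50014ee20bca37bd
wwn-0x50014ee26120df1b
wwn-0x50014ee2b673aa8f
wwn-0x50014ee2b673b323
wwn-0x50014ee2b675e679'
# IDs present on the system (from /dev/disk/by-id above):
system_ids="wwn-0x5000cca0bbd0ff7a
$zpool_ids"
# Whatever the pool doesn't know about is the freshly inserted disk.
new_disk=$(echo "$system_ids" | grep -vxF "$zpool_ids")
echo "$new_disk"   # → wwn-0x5000cca0bbd0ff7a
```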


I see that sdc is the one that's new and doesn't match any ID in the status list. However, running:

wmfo-admin@ringo:~$ sudo zpool replace zvol0 wwn-0x5000cca0bbd0ff7a
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/wwn-0x5000cca0bbd0ff7a does not contain an EFI label but it may contain partition
information in the MBR.


fails, leading me to this Stack Overflow answer. So I used good ol' parted:

wmfo-admin@ringo:~$ sudo parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
(parted) q
Information: You may need to update /etc/fstab.


With the label written, this replaced the old ID with the new one (old ID from the status output, new ID from the new disk above):

sudo zpool replace zvol0 wwn-0x50014ee20bc8a3f7 wwn-0x5000cca0bbd0ff7a

And now we wait; the status says it's resilvering:

wmfo-admin@ringo:~$ sudo zpool status
  pool: zvol0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Feb 28 15:17:41 2021
    730M scanned out of 39.3T at 14.0M/s, (scan is slow, no estimated time)
    91.1M resilvered, 0.00% done
config:

        NAME                          STATE     READ WRITE CKSUM
        zvol0                         DEGRADED     0     0     0
          raidz2-0                    DEGRADED     0     0     0
            wwn-0x50014ee26120df1b    ONLINE       0     0     0
            wwn-0x50014ee2b673aa8f    ONLINE       0     0     0
            replacing-2               OFFLINE      0     0     0
              2556652543945511217     OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x50014ee20bc8a3f7-part1
              wwn-0x5000cca0bbd0ff7a  ONLINE       0     0     0  (resilvering)
            wwn-0x50014ee2b675e679    ONLINE       0     0     0
            wwn-0x50014ee2b673b323    ONLINE       0     0     0
            wwn-0x50014ee20bc8a028    ONLINE       0     0     0
            wwn-0x50014ee20bc8a9cc    ONLINE       0     0     0
            wwn-0x50014ee20bca37bd    ONLINE       0     0     0

errors: No known data errors


Nick Andre

Feb 28, 2021, 3:53:13 PM
to wmfo-ops
I should add that, with the array doing all this work, we may notice hiccups over the next two days or so due to the resilvering overhead.

Current estimate from ZFS is 483G scanned out of 39.3T at 239M/s, 47h18m to go
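
That estimate is self-consistent: remaining data over throughput lands in the same ballpark. A quick integer-arithmetic check, assuming those figures are binary units (GiB/TiB/MiB):

```shell
# Remaining data in MiB (39.3 TiB total minus 483 GiB already scanned),
# divided by 239 MiB/s, then converted to hours.
remaining_mib=$(( 393 * 1024 * 1024 / 10 - 483 * 1024 ))
seconds=$(( remaining_mib / 239 ))
echo "about $(( seconds / 3600 )) hours to go"   # → about 47 hours to go
```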

Frank Rossano

Mar 2, 2021, 8:00:56 AM
to Nick Andre, wmfo-ops
Thanks for doing that, Bailey. Nick, how's it going with the resilvering job? Did it finish, or is it still in progress?

Frank

Nick Andre

Mar 2, 2021, 11:04:12 AM
to Frank Rossano, wmfo-ops
Seems like it's all done; the array is healthy again :)