ZFS Disk Replacement Notes

Nick Andre

Feb 28, 2021, 3:24:15 PM
to wmfo-ops
Thanks so much to Bailey who swapped the failed drive today in Ringo! He informs me the new drive is 8TB and we have two more spares on hand after this one.

I'm going to attempt to fix the zpool and tell it to use the new disk, leaving the research here for posterity. I can see the status by running:

wmfo-admin@ringo:~$ sudo zpool status
  pool: zvol0
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 31h12m with 0 errors on Mon Oct 12 07:36:57 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        zvol0                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            wwn-0x50014ee26120df1b  ONLINE       0     0     0
            wwn-0x50014ee2b673aa8f  ONLINE       0     0     0
            2556652543945511217     UNAVAIL      0     0     0  was /dev/disk/by-id/wwn-0x50014ee20bc8a3f7-part1
            wwn-0x50014ee2b675e679  ONLINE       0     0     0
            wwn-0x50014ee2b673b323  ONLINE       0     0     0
            wwn-0x50014ee20bc8a028  ONLINE       0     0     0
            wwn-0x50014ee20bc8a9cc  ONLINE       0     0     0
            wwn-0x50014ee20bca37bd  ONLINE       0     0     0


Handily, it tells me to use zpool replace, so naturally I googled the answer and this doc page came up. I ran:
wmfo-admin@ringo:~$ sudo zpool offline zvol0 2556652543945511217

After which the status on that disk changed from UNAVAIL to OFFLINE.

The doc page wasn't super useful, as it didn't seem to be written for this config style (disk IDs instead of device names). ZFS really did not like my using sudo zpool replace zvol0 2556652543945511217 for some reason, so I went to /dev/disk/by-id and found the list of IDs for the raw disks:

wmfo-admin@ringo:~$ ls -l /dev/disk/by-id/ | grep -v part | grep sd
lrwxrwxrwx 1 root root  9 Feb 28 14:44 wwn-0x5000cca0bbd0ff7a -> ../../sdc
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bc8a028 -> ../../sdf
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bc8a9cc -> ../../sdg
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee20bca37bd -> ../../sdh
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee26120df1b -> ../../sda
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b673aa8f -> ../../sdb
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b673b323 -> ../../sde
lrwxrwxrwx 1 root root  9 Oct 23 14:31 wwn-0x50014ee2b675e679 -> ../../sdd
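
For future swaps, the "spot the new disk" step can be mechanized by diffing the two ID lists. A minimal sketch using the IDs pasted above (on a live box you'd capture the lists from ls /dev/disk/by-id and zpool status directly; grep -vxF treats each pattern line as an exact fixed string):

```shell
# IDs the pool knows about (from zpool status above):
zpool_ids='wwn-0x50014ee20bc8a028
wwn-0x50014ee20bc8a9cc
wwn-0x50014ee20bca37bd
wwn-0x50014ee26120df1b
wwn-0x50014ee2b673aa8f
wwn-0x50014ee2b673b323
wwn-0x50014ee2b675e679'
# IDs present on the system (from /dev/disk/by-id above):
system_ids="wwn-0x5000cca0bbd0ff7a
$zpool_ids"
# Whatever the pool doesn't know about is the freshly inserted disk.
new_disk=$(echo "$system_ids" | grep -vxF "$zpool_ids")
echo "$new_disk"   # → wwn-0x5000cca0bbd0ff7a
```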


I see that sdc is the one that's new and doesn't match any ID in the status list. However, running:

wmfo-admin@ringo:~$ sudo zpool replace zvol0 wwn-0x5000cca0bbd0ff7a
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/wwn-0x5000cca0bbd0ff7a does not contain an EFI label but it may contain partition
information in the MBR.


fails, leading me to this Stack Overflow answer. So I used good ol' parted:

wmfo-admin@ringo:~$ sudo parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
(parted) q
Information: You may need to update /etc/fstab.


With the label written, this replaced the old ID with the new one (old ID from the status output, new ID from the new disk above):

sudo zpool replace zvol0 wwn-0x50014ee20bc8a3f7 wwn-0x5000cca0bbd0ff7a

And now we wait; the status says it's resilvering:

wmfo-admin@ringo:~$ sudo zpool status
  pool: zvol0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Feb 28 15:17:41 2021
    730M scanned out of 39.3T at 14.0M/s, (scan is slow, no estimated time)
    91.1M resilvered, 0.00% done
config:

        NAME                          STATE     READ WRITE CKSUM
        zvol0                         DEGRADED     0     0     0
          raidz2-0                    DEGRADED     0     0     0
            wwn-0x50014ee26120df1b    ONLINE       0     0     0
            wwn-0x50014ee2b673aa8f    ONLINE       0     0     0
            replacing-2               OFFLINE      0     0     0
              2556652543945511217     OFFLINE      0     0     0  was /dev/disk/by-id/wwn-0x50014ee20bc8a3f7-part1
              wwn-0x5000cca0bbd0ff7a  ONLINE       0     0     0  (resilvering)
            wwn-0x50014ee2b675e679    ONLINE       0     0     0
            wwn-0x50014ee2b673b323    ONLINE       0     0     0
            wwn-0x50014ee20bc8a028    ONLINE       0     0     0
            wwn-0x50014ee20bc8a9cc    ONLINE       0     0     0
            wwn-0x50014ee20bca37bd    ONLINE       0     0     0

errors: No known data errors


Nick Andre

Feb 28, 2021, 3:53:13 PM
to wmfo-ops
I should add that, with the array doing all this work, we may notice hiccups over the next two days or so due to the resilvering overhead.

Current estimate from ZFS is 483G scanned out of 39.3T at 239M/s, 47h18m to go
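
That estimate is self-consistent: remaining data over throughput lands in the same ballpark. A quick integer-arithmetic check, assuming those figures are binary units (GiB/TiB/MiB):

```shell
# Remaining data in MiB (39.3 TiB total minus 483 GiB already scanned),
# divided by 239 MiB/s, then converted to hours.
remaining_mib=$(( 393 * 1024 * 1024 / 10 - 483 * 1024 ))
seconds=$(( remaining_mib / 239 ))
echo "about $(( seconds / 3600 )) hours to go"   # → about 47 hours to go
```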

Frank Rossano

Mar 2, 2021, 8:00:56 AM
to Nick Andre, wmfo-ops
Thanks for doing that, Bailey. Nick, how's it going with the resilvering job? Did it finish, or is it still in progress?

Frank

Nick Andre

Mar 2, 2021, 11:04:12 AM
to Frank Rossano, wmfo-ops
Seems like it's all done; the array is healthy again :)