Replacing a REMOVED drive in DEGRADED zpool


Robert

Aug 20, 2025, 8:56:40 PM
to ques...@freebsd.org
I have my first degraded zpool on a FreeBSD 13.5 server and am looking
for advice on the steps I'll be taking to successfully replace the REMOVED
drive in a 4-disk, 2-mirror zpool. It is scrubbed monthly, with the last
scrub on August 3rd...

root@db1:~ # zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdb1   262G   102G   160G        -         -    53%    38%  1.00x  DEGRADED  -
root@db1:~ # zpool status
  pool: zdb1
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:27:57 with 0 errors on Sun Aug  3 04:43:48 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zdb1        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            ada0p3  REMOVED      0     0     0
            ada1p3  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

I have backups of any important data. I also use zfs-autobackup from
an old remote FreeNAS server to take hourly snapshots ...

zfs-autobackup -v --keep-source 72 --keep-target 168 --ssh-source db1
offsite1 DATA/backups/db1

So, I have the last 72 hours of snapshots on the local server, with 7
days' worth on the remote NAS. The disk entered the REMOVED state at 6am
this morning, a little over 14 hours ago, and I plan to replace it on
Friday night to give myself some time in case a restore needs to happen.
Perhaps I should bump the local snapshot retention up to 168 hours (1
week) as well at this point, or hold what is there -- can I hold all
snapshots with one command? Maybe something like the sketch below.

root@db1:~ # camcontrol devlist
<WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)
root@db1:~ # gpart show ada1
=>       40  293046688  ada1  GPT  (140G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048   16777216     2  freebsd-swap  (8.0G)
   16779264  276267008     3  freebsd-zfs  (132G)
  293046272        456        - free -  (228K)

All the drives report layouts identical to ada1's. I've used camcontrol
with identify to get all the serial numbers of these drives, so I plan
to shut the server down, pull the bad drive and insert the replacement,
boot up and replace. Would these be the steps I need to take, assuming
the replacement drive shows up as the same ada0 device?

1. Run `zpool offline zdb1 ada0p3`
2. Shut down and pull/insert replacement
3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart restore
ada0 < gpart.ada1`
4. Run `zpool replace zdb1 ada0p3 ada0p3`

I'm just not sure if this is all that is needed with a ROOT zpool, or if
it's all correct. I appreciate any guidance. Here is the full zfs list...

root@db1:~ # zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zdb1                                          103G   152G  96K  /zdb1
zdb1/ROOT                                     101G   151G  96K  none
zdb1/ROOT/13.1-RELEASE-p7_2023-05-04_200035     8K   151G  15.8G  /
zdb1/ROOT/13.2-RELEASE-p1_2023-08-07_124053     8K   151G  25.4G  /
zdb1/ROOT/13.2-RELEASE-p2_2023-09-09_111305     8K   151G  29.0G  /
zdb1/ROOT/13.2-RELEASE-p3_2023-12-31_111612     8K   151G  33.1G  /
zdb1/ROOT/13.2-RELEASE-p9_2024-04-14_121449     8K   151G  34.9G  /
zdb1/ROOT/13.2-RELEASE_2023-05-04_200614        8K   151G  15.9G  /
zdb1/ROOT/13.2-RELEASE_2023-08-01_151806        8K   151G  25.1G  /
zdb1/ROOT/13.3-RELEASE-p1_2024-04-14_121907     8K   151G  34.9G  /
zdb1/ROOT/13.3-RELEASE-p1_2024-08-04_122937     8K   151G  36.2G  /
zdb1/ROOT/13.3-RELEASE-p4_2025-01-04_162341     8K   151G  36.3G  /
zdb1/ROOT/13.3-RELEASE-p8_2025-01-04_164203     8K   151G  36.9G  /
zdb1/ROOT/13.4-RELEASE-p1_2025-01-04_164619     8K   151G  37.0G  /
zdb1/ROOT/13.4-RELEASE-p2_2025-05-10_133828     8K   151G  39.4G  /
zdb1/ROOT/13.5-RELEASE-p1_2025-07-04_113332     8K   151G  39.6G  /
zdb1/ROOT/13.5-RELEASE_2025-05-10_134206        8K   151G  39.4G  /
zdb1/ROOT/default                             101G   151G  39.5G  /
zdb1/tmp                                     1.12M   151G  200K  /tmp
zdb1/usr                                     1.17G   151G  96K  /usr
zdb1/usr/home                                1.31M   151G  1.30M  /usr/home
zdb1/usr/ports                               1.17G   151G  1.17G  /usr/ports
zdb1/usr/src                                   96K   151G  96K  /usr/src
zdb1/var                                     5.35M   151G  96K  /var
zdb1/var/audit                                 96K   151G  96K  /var/audit
zdb1/var/crash                                 96K   151G  96K  /var/crash
zdb1/var/log                                 4.78M   151G  660K  /var/log
zdb1/var/mail                                 200K   151G  144K  /var/mail
zdb1/var/tmp                                   96K   151G  96K  /var/tmp

Thank you.

--
Robert


Matthias Fechner

Aug 20, 2025, 11:23:45 PM
to Robert, ques...@freebsd.org
Hi Robert,

Am 21.08.2025 um 03:55 schrieb Robert:
> 1. Run `zpool offline zdb1 ada0p3`
> 2. Shut down and pull/insert replacement
> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
> restore ada0 < gpart.ada1`
> 4. Run `zpool replace zdb1 ada0p3 ada0p3`

I have written some notes on this for myself:
https://wiki.idefix.fechner.net/freebsd/zfs/#replace-a-failed-disk

I do not think you must offline it; you should be able to use the replace
command or attach (I would check the man page to verify that for the
version you use).
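
For example, something like this for the replace path (untested; pool and
partition names taken from the zpool status earlier in the thread):

zpool replace zdb1 ada0p3 ada0p3

The two-argument form covers the case where the new disk shows up under
the same ada0 name as the old one.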

I hope that helps.

Matthias


David Christensen

Aug 21, 2025, 2:15:13 AM
to ques...@freebsd.org
I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
columns for a bad disk. Did you do something to reset the numbers?


>
> I have backups of any important data. I also use zfs-autobackup from
> an old remote FreeNAS server to take hourly snapshots ...
>
> zfs-autobackup -v --keep-source 72 --keep-target 168 --ssh-source db1
> offsite1 DATA/backups/db1
>
> So, I have the last 72 hours of snapshots on the local server with 7
> days worth on the remote NAS.


Good.


> The disk entered the REMOVED state at 6am
> this morning, a little over 14 hours ago


`zpool status`, above, said "One or more devices has been removed by the
administrator". Did a human remove the disk or did ZFS? If a human,
what command did they use?


> and I plan to replace on Friday
> night to give myself some time in case a restore needs to happen.
> Perhaps I should bump the local snapshot storage up to 168 hours (1
> week) as well at this point or hold what is there, can I hold all
> snapshots with one command? Here is the disk info for the 3 drives
> remaining in the zpool ...
>
> root@db1:~ # camcontrol devlist
> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
> <WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)


ZFS RAID10 with Raptors and VelociRaptor -- a blast from the past! :-)


The problem with vintage systems is "do not throw good money after bad".
If you already have a spare Raptor or VelociRaptor and all four disks
test and report good with smartctl(8), then perhaps replacing the failed
disk with another disk is a good idea. Otherwise, I would consider
other options (a pair of SSD's).


> root@db1:~ # gpart show ada1
> =>       40  293046688  ada1  GPT  (140G)
>          40       1024     1  freebsd-boot  (512K)
>        1064        984        - free -  (492K)
>        2048   16777216     2  freebsd-swap  (8.0G)
>    16779264  276267008     3  freebsd-zfs  (132G)
>   293046272        456        - free -  (228K)


What partition scheme is on the disks?


I do not see an EFI system partition. Is the motherboard firmware
BIOS/Legacy or UEFI?


How is ada0p1 freebsd-boot configured into the system? ZFS
stripe-of-mirrors? UFS gmirror/gstripe RAID10?


How is ada0p2 freebsd-swap configured into the system? One of four swap
devices?


>
> All the drive report identical layouts as ada1. I've used camcontrol
> with identify to get all the serial numbers of these drives, so I plan
> to shut the server down, pull the bad drive and insert the replacement,
> boot up and replace. Would these be the steps I need to take assuming
> the replacement drive shows up as the same ada0 device?
>
> 1. Run `zpool offline zdb1 ada0p3`


I would use zpool-detach(8) to remove the failed disk from the pool.
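
A sketch with the names from your post, assuming the pool layout shown:

# zpool detach zdb1 ada0p3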


You will need to disconnect ada0p1 freebsd-boot and ada0p2 freebsd-swap
according to how they are configured into your system.


> 2. Shut down and pull/insert replacement
> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart restore
> ada0 < gpart.ada1`
> 4. Run `zpool replace zdb1 ada0p3 ada0p3`


Cloning ada1's GPT's (primary and secondary) to ada0 will result in
duplicate identifiers on two disks -- UUID's, labels, etc.. Two disks
with matching identifiers in the same computer is asking for trouble. I
would not do that. If anything, clone the failed disk GPT's to the
replacement disk GPT's -- but only if the failed disk GPT's are good.


If the failed disk is still mostly operational with bad blocks all
within the middle data portion of ada0p3 (NOT in metadata), cloning the
failed disk to the replacement disk could save effort. ddrescue(1) may
be required to get past bad blocks.


Otherwise, I would zero the replacement disk and build it manually.


I would use zpool-attach(8) to add the replacement ada0p3 as a mirror of
ada1p3.
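
For example, assuming the replacement comes up as ada0 again:

# zpool attach zdb1 ada1p3 ada0p3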


You will need to build and connect the replacement ada0p1 freebsd-boot
and replacement ada0p2 freebsd-swap according to how they are to be
configured into your system.


Finally, ZFS, ZFS stripe-of-mirrors, root-on-ZFS, and gmirror/gstripe
RAID10 are all non-trivial. Replacing such a disk correctly is going to
require a lot of knowledge. If you like learning adventures, go for it.


But if you want 24x7 operations, I do better with
backup/wipe/install/restore. It is simpler, I can estimate how long it
will take, I can roll it back, and I have confidence in the results. If
you go this route, I would put FreeBSD on UFS on a single small SSD and
put the data on ZFS with redundant disks. Back up the OS disk with
rsync(1) and take images regularly. Restoring an OS disk from an image
is the fastest way to recover from an OS disk disaster.


David


Dag-Erling Smørgrav

Aug 21, 2025, 2:56:17 AM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> Robert <rob...@webtent.org> writes:
> > 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
> > restore ada0 < gpart.ada1`
> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
> duplicate identifiers on two disks -- UUID's, labels, etc..

`gpart backup | gpart restore` does not copy UUIDs or labels.
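
For example (disk names as used in this thread; -F first destroys any
partition table already present on the target):

gpart backup ada1 | gpart restore -F ada0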

> [...] cloning the failed disk to the replacement disk could save
> effort. ddrescue(1) may be required to get past bad blocks.

You mean recoverdisk(1).

DES
--
Dag-Erling Smørgrav - d...@FreeBSD.org

Robert

Aug 21, 2025, 9:14:06 AM
to Matthias Fechner, ques...@freebsd.org
Thanks, there is some great information in your document. And detach
would not be necessary either? It can be replaced in its current
REMOVED state?


Robert

Aug 21, 2025, 9:47:12 AM
to ques...@freebsd.org
On 8/21/2025 2:14 AM, David Christensen wrote:
> On 8/20/25 17:55, Robert wrote:
>
> I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
> columns for a bad disk.  Did you do something to reset the numbers?

I know what you mean; I have seen REMOVED with SSDs on IBM x3550
servers using mdadm as well -- does the SATA connector hiccup? I haven't
done a thing, no reset or clear. So I know I could perhaps reuse the same
drive, but I'll need to SMART-check it after pulling it to be sure.

>> The disk entered the REMOVED state at 6am this morning, a little over
>> 14 hours ago
>
>
> `zpool status`, above, said "One or more devices has been removed by
> the administrator".  Did a human remove the disk or did ZFS? If a
> human, what command did they use?

Nope, nobody, this server is in a data center and I'm quite sure no one
touched it.
>> and I plan to replace on Friday night to give myself some time in
>> case a restore needs to happen. Perhaps I should bump the local
>> snapshot storage up to 168 hours (1 week) as well at this point or
>> hold what is there, can I hold all snapshots with one command? Here
>> is the disk info for the 3 drives remaining in the zpool ...
>>
>> root@db1:~ # camcontrol devlist
>> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
>> <WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
>> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)
>
>
> ZFS RAID10 with Raptors and VelociRaptor -- a blast from the past!  :-)
>
>
> The problem with vintage systems is "do not throw good money after
> bad".  If you already have a spare Raptor or VelociRaptor and all four
> disks test and report good with smartctl(8), then perhaps replacing
> the failed disk with another disk is a good idea. Otherwise, I would
> consider other options (a pair of SSD's).

Lol, yep, I believe I have a matching drive sitting in waiting but will
confirm when I get to the data center later today. It's an old
Supermicro that has been like a rock. I don't throw away things unless
they fail or lose support somehow. I use redundant hardware and/or
services. I've had servers reach 15 years, but this is almost
unachievable these days with technology advancing faster and faster. The
redundant db server for this one does use SSD drives.

>> root@db1:~ # gpart show ada1
>> =>       40  293046688  ada1  GPT  (140G)
>>           40       1024     1  freebsd-boot  (512K)
>>         1064        984        - free -  (492K)
>>         2048   16777216     2  freebsd-swap  (8.0G)
>>     16779264  276267008     3  freebsd-zfs  (132G)
>>    293046272        456        - free -  (228K)
>
>
> What partition scheme is on the disks?
>
>
> I do not see an EFI system partition.  Is the motherboard firmware
> BIOS/Legacy or UEFI?
Partition scheme? As I showed with gpart, these drives have a GUID
Partition Table with partitions as shown. I believe this server is
legacy and does not support UEFI.
>
>
> How is ada0p1 freebsd-boot configured into the system?  ZFS
> stripe-of-mirrors?  UFS gmirror/gstripe RAID10?
>
>
> How is ada0p2 freebsd-swap configured into the system?  One of four
> swap devices?

Yes, I am concerned about the boot. I guess I need to run `gpart bootcode -b
/boot/pmbr -p /boot/gptzfsboot -i 1 ada?` on the new drive after I replace?

>> All the drive report identical layouts as ada1. I've used camcontrol
>> with identify to get all the serial numbers of these drives, so I
>> plan to shut the server down, pull the bad drive and insert the
>> replacement, boot up and replace. Would these be the steps I need to
>> take assuming the replacement drive shows up as the same ada0 device?
>>
>> 1. Run `zpool offline zdb1 ada0p3`
>
>
> I would use zpool-detach(8) to remove the failed disk from the pool.

Yes, I was perhaps expecting an error when I tried to offline; I will
instead try to detach.

> You will need to disconnect ada0p1 freebsd-boot and  ada0p2
> freebsd-swap according to how they are configured into your system.

This is where I'm not sure. I used the default FreeBSD installation with
ZFS and selected the drives for redundancy, I believe. I had never done
this before this server and had always used UFS with gmirror. But after
seeing it as the default in FreeBSD for a long time now, I decided to use
it. I am willing to take more risk when I have redundant services, as in
this case.

> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
> duplicate identifiers on two disks -- UUID's, labels, etc..  Two disks
> with matching identifiers in the same computer is asking for trouble. 
> I would not do that.  If anything, clone the failed disk GPT's to the
> replacement disk GPT's -- but only if the failed disk GPT's are good.
>
>
> If the failed disk is still mostly operational with bad blocks all
> within the middle data portion of ada0p3 (NOT in metadata), cloning
> the failed disk to the replacement disk could save effort. 
> ddrescue(1) may be required to get past bad blocks.
>
>
> Otherwise, I would zero the replacement disk and build it manually.

Ok, I have a matching drive, so I don't plan on reusing the REMOVED
drive. I usually take the REMOVED one and examine it to see if it can be
re-inserted safely as needed. I was hoping the gpart backup/restore
would be the equivalent of sgdisk in Linux that I've used many times to
duplicate a disk used for replacement in software RAID.

> I would use zpool-attach(8) to add the replacement ada0p3 as a mirror
> of ada1p3.

Attach in place of zpool-replace?

> You will need to build and connect the replacement ada0p1 freebsd-boot
> and replacement ada0p2 freebsd-swap according to how they are to be
> configured into your system.

Again, I used the default FreeBSD ZFS install. But this is where I
believe I need to tweak my recovery if gpart backup/restore does not
prepare the disk as expected.

> Finally, ZFS, ZFS stripe-of-mirrors, root-on-ZFS, and gmirror/gstripe
> RAID10 are all non-trivial. Replacing such a disk correctly is going
> to require a lot of knowledge.  If you like learning adventures, go
> for it.
>
>
> But if you want 24x7 operations, I do better with
> backup/wipe/install/restore.  It is simpler, I can estimate how long
> it will take, I can roll it back, and I have confidence in the
> results.  If you go this route, I would put FreeBSD on UFS on a single
> small SSD and put the data on ZFS with redundant disks. Backup the OS
> disk with rsync(1) and take images regularly. Restoring an OS disk
> from an image is the fastest way to recover from a OS disk disaster.

This is my first time recovering a ROOT zpool; as you said, I've used ZFS
for many years with UFS and gmirror on two drives for the OS. I would
perhaps like to experiment with restoring from snapshot if someone could
outline the differences in handling a ROOT zpool using zfs send/receive --
I assume it is just like what I've done with ZFS for data only. But for
this case, I'd like to replace as designed and get the server back in
operation.

Thanks for all the pointers!




Robert

Aug 21, 2025, 10:19:52 AM
to ques...@freebsd.org
On 8/21/2025 2:56 AM, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> Robert <rob...@webtent.org> writes:
>>> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
>>> restore ada0 < gpart.ada1`
>> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
>> duplicate identifiers on two disks -- UUID's, labels, etc..
> `gpart backup | gpart restore` does not copy UUIDs or labels.

Good to know, this would be the best option then for preparing the drive
for zpool-replace?

>> [...] cloning the failed disk to the replacement disk could save
>> effort. ddrescue(1) may be required to get past bad blocks.
> You mean recoverdisk(1).
Would recovering the disk be beneficial versus replacing? As far as
faster recovery goes -- not needing to resilver, or not as much. These
are not big drives, as you can see, and it is a RAID10 zpool.

Thanks for the pointers!


David Christensen

Aug 21, 2025, 1:04:30 PM
to ques...@freebsd.org
On 8/20/25 23:56, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> Robert <rob...@webtent.org> writes:
>>> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
>>> restore ada0 < gpart.ada1`
>> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
>> duplicate identifiers on two disks -- UUID's, labels, etc..
>
> `gpart backup | gpart restore` does not copy UUIDs or labels.


Thank you for the clarification regarding gpart(8). Testing backup:

2025-08-21 09:05:30 toor@f5 ~
# gpart backup ada0
GPT 128
1 freebsd-boot 40 1024
2 freebsd-ufs 1064 29359104
3 freebsd-swap 29360168 1564672


You are correct -- gpart(8) backups are incomplete (!). Disk GUID
(UUID) is missing and partition GUID's (UUID's) and names (labels) are
missing. Where is this documented?


RTFM glabel(8), I believe I see how to set labels after a gpart(8)
restore onto a wiped disk.


RTFM glabel(8) says it supports UUID's, but I do not see how. How do I
set the disk GUID and partition GUID's after a gpart(8) restore?


(`gpart list ada0` shows much more information, but not the disk GUID.)


>
>> [...] cloning the failed disk to the replacement disk could save
>> effort. ddrescue(1) may be required to get past bad blocks.
>
> You mean recoverdisk(1).
>
> DES


recoverdisk(1) looks useful -- thank you for the pointer. :-)


David



Dag-Erling Smørgrav

Aug 21, 2025, 3:03:48 PM
to Robert, ques...@freebsd.org
Robert <rob...@webtent.org> writes:
> Dag-Erling Smørgrav <d...@FreeBSD.org> writes:
> > `gpart backup | gpart restore` does not copy UUIDs or labels.
> Good to know, this would be the best option then for preparing the
> drive for zpool-replace?

Yes. You should take a look in /var/backups, you may find a backup of
the partition table from the failed drive. Assuming you remove the
failed drive first, you can safely `gpart restore -l` this backup onto
the replacement drive, which will recreate the labels (but not UUIDs).
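
For example (file name hypothetical until you check /var/backups, and
assuming the replacement also appears as ada0):

gpart restore -l ada0 < /var/backups/gpart.ada0.bak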

> Would recovering the disk be beneficial versus replace? As far as
> faster recovery, not needing to resilver or as much. These are not big
> drives as you can see and RAID10 zpool.

You can try to use recoverdisk to copy undamaged portions of the failed
drive onto the replacement, but it's likely to take longer than
resilvering.
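
For example (device names purely illustrative -- failed disk attached as
ada0, replacement as ada4):

recoverdisk /dev/ada0 /dev/ada4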

David Christensen

Aug 21, 2025, 3:05:47 PM
to ques...@freebsd.org
On 8/21/25 06:46, Robert wrote:
> On 8/21/2025 2:14 AM, David Christensen wrote:
>> On 8/20/25 17:55, Robert wrote:
>>
>> I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
>> columns for a bad disk.  Did you do something to reset the numbers?
>
> I know what you mean, I have seen REMOVED using SSD on an IBM x3550
> servers using mdadm as well, does the SATA connector hiccup?


The wrong SATA HBA, cable, backplane, rack, etc., and/or a poor
connection anywhere can wreak havoc. A few years ago, I bought all new
SATA III cables and SATA III mobile racks, and replaced the previous
mixture of SATA I, II, III stuff in my various computers. Disk
reliability improved dramatically.


Re-seating all of the SATA cables, re-seating all of the drive power
cables, and re-seating the HBA, followed by `zpool online zdb1 ada0p3`
and `zpool scrub zdb1` could fix the problem.


> Partition scheme? As I showed with gpart, these drives have a GUID
> Partition Table with partitions as shown. I believe this server is
> legacy and does not supports UEFI.


Okay -- BIOS/Legacy and GPT. Those are key parameters (among others)
that the FreeBSD installer detects and uses to choose what actions to
take. The goal is to reproduce what the installer did.


RTFM gpart(8) has some good information about bootloader stages and what
pieces are needed for the various combinations of BIOS/UEFI, MBR/GPT,
UFS/ZFS/gmirror/gstripe, etc..


Much of the FreeBSD installer is a suite of shell scripts. They are
very well written and reasonably easy to read. Crawling the installer
code is another possible source of information.


> Yes, I am concerned about the boot,


I tried a manual install of FreeBSD not too long ago by following Lucas
([1] and [2]?). Trying to remember those, looking at your previous
`gpart show ada1` console session, knowing the computer is BIOS/Legacy,
and knowing the disks are GPT, I now believe ada[0-3]p1 are bootloader
stages that run after the MBR bootloader stage. They should all be the
same; gmirror/gstripe should not be involved.


Please run and post the following commands to check if ada[0-3]p1
bootloader stages are the same:

# cmp /dev/ada1p1 /dev/ada0p1

# cmp /dev/ada1p1 /dev/ada2p1

# cmp /dev/ada1p1 /dev/ada3p1


Please run and post the following commands to check if ada[0-3] MBR's
are the same:

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1 status=none)

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1 status=none)

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1 status=none)


Please run and post the following commands to confirm that gmirror and
gstripe are not in use. Do not bother to load if prompted:

# gmirror status

# gstripe status


Please run and post the following command to confirm swap:

# swapinfo


Please post (redact as necessary) /boot/loader.conf,
/boot/loader.conf.d/*, /etc/rc.conf, and /etc/fstab in case there is
anything else we need to consider.


> I guess I need to `gpart bootcode -
> b /boot/pmbr -p /boot/gptzfsboot -i 1 ada?` the new drive after I replace?

>> You will need to disconnect ada0p1 freebsd-boot and  ada0p2 freebsd-
>> swap according to how they are configured into your system.
>
> This is where I'm not sure. I used the default FreeBSD installation with
> ZFS and selected the drives for redundancy I believe. I never did this
> before this server and always used UFS with gmirror before. But after
> seeing the default in FreeBSD for a long time now, I decided to use. I
> get more risky when I have a redundant services as in this case.


Running and posting the commands above should help determine what to do
with ada0p1 and ada0p2.


> Ok, I have a matching drive, so I don't plan on reusing the REMOVED
> drive. I usually take the REMOVED one and examine it to see if it can be
> re-inserted safely as needed. I was hoping the gpart backup/restore
> would be the equivalent of sgdisk in Linux that I've used many times to
> duplicate a disk used for replacement in software RAID.


I also use Linux, and FreeBSD gpart(8) backup and restore surprised me
in a bad way. Please see the thread branch starting here:

https://lists.freebsd.org/archives/freebsd-questions/2025-August/006883.html


>> I would use zpool-attach(8) to add the replacement ada0p3 as a mirror
>> of ada1p3.
>
> Attach in place of zpool-replace?


AIUI RTFM zpool(8) if you detach one of two drives in a mirror, the
detached drive is forgotten, the mirror goes away, and the remaining
disk becomes striped at the top level. When you install a new drive and
want to create a mirror with a singular striped drive, use zpool(8) attach.


> Again, I used the default FreeBSD ZFS install. But this is where I
> believe I need to tweak my recovery if gpart backup/restore does not
> prepare the disk as expected.


Yes.


> Thanks for all the pointers!


YW. There is still more to figure out, but you are getting closer.


David


[1] https://mwl.io/nonfiction/os#af3e

[2] https://mwl.io/nonfiction/os#fmse


Dag-Erling Smørgrav

Aug 21, 2025, 3:06:57 PM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> You are correct -- gpart(8) backups are incomplete (!). Disk GUID
> (UUID) is missing and partition GUID's (UUID's) and names (labels) are
> missing. Where is this documented?

Labels are included, they're just not restored unless you specify -l.
If your backup doesn't include labels, it means your partition table
didn't have any. UUIDs are intentionally neither included nor restored.

> RTFM glabel(8), I believe I see how to set labels after a gpart(8)
> restore onto a wiped disk.

glabel labels are different and separate from gpart labels.

David Christensen

Aug 21, 2025, 3:28:59 PM
to ques...@freebsd.org
On 8/21/25 12:06, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> You are correct -- gpart(8) backups are incomplete (!). Disk GUID
>> (UUID) is missing and partition GUID's (UUID's) and names (labels) are
>> missing. Where is this documented?
>
> Labels are included, they're just not restored unless you specify -l.
> If your backup doesn't include labels, it means your partition table
> didn't have any.


Thank you for the clarification -- the disk I tested does not have GPT
partition labels. Testing on a disk with GPT partition labels, gpart(8)
action "backup" does print the labels.


> UUIDs are intentionally neither included nor restored.
>
>> RTFM glabel(8), I believe I see how to set labels after a gpart(8)
>> restore onto a wiped disk.
>
> glabel labels are different and separate from gpart labels.
>
> DES


Thank you for the clarification regarding GPT partition labels, GEOM
provider labels, and glabel(8).


So if I want a GPT partition label, I will use gpart(8) action "add"
with option "-l label" or action "modify" with option "-l label".


David


Robert

Aug 22, 2025, 12:03:24 AM
to Dag-Erling Smørgrav, ques...@freebsd.org
On 8/21/2025 3:03 PM, Dag-Erling Smørgrav wrote:
> Robert <rob...@webtent.org> writes:
>> Dag-Erling Smørgrav <d...@FreeBSD.org> writes:
>>> `gpart backup | gpart restore` does not copy UUIDs or labels.
>> Good to know, this would be the best option then for preparing the
>> drive for zpool-replace?
> Yes. You should take a look in /var/backups, you may find a backup of
> the partition table from the failed drive. Assuming you remove the
> failed drive first, you can safely `gpart restore -l` this backup onto
> the replacement drive, which will recreate the labels (but not UUIDs).

Great, had no idea -- yes, I see the gpart.ada0.bak in /var/backups...

root@db1:~ # cat /var/backups/gpart.ada0.bak <<-- REMOVED disk
GPT 128
1   freebsd-boot        40      1024 gptboot0
2   freebsd-swap      2048  16777216 swap0
3    freebsd-zfs  16779264 276267008 zfs0
root@db1:~ # cat /var/backups/gpart.ada1.bak
GPT 128
1   freebsd-boot        40      1024 gptboot1
2   freebsd-swap      2048  16777216 swap1
3    freebsd-zfs  16779264 276267008 zfs1
root@db1:~ # cat /var/backups/gpart.ada2.bak
GPT 128
1   freebsd-boot        40      1024 gptboot2
2   freebsd-swap      2048  16777216 swap2
3    freebsd-zfs  16779264 276267008 zfs2
root@db1:~ # cat /var/backups/gpart.ada3.bak
GPT 128
1   freebsd-boot        40      1024 gptboot3
2   freebsd-swap      2048  16777216 swap3
3    freebsd-zfs  16779264 276267008 zfs3
root@db1:~ # cat /var/backups/gpart.ada4.bak

>> Would recovering the disk be beneficial versus replace? As far as
>> faster recovery, not needing to resilver or as much. These are not big
>> drives as you can see and RAID10 zpool.
> You can try to use recoverdisk to copy undamaged portions of the failed
> drive onto the replacement, but it's likely to take longer than
> resilvering.
Then I'll stick to the original plan but with attach instead of replace
using `zpool attach ada0p3 ada0p3`.

Robert

Aug 22, 2025, 12:44:27 AM
to ques...@freebsd.org
On 8/21/2025 3:04 PM, David Christensen wrote:
> The wrong SATA HBA, cable, backplane, rack, etc., and/or a poor
> connection anywhere can wreak havoc.  A few years ago, I bought all
> new SATA III cables and SATA III mobile racks, and replaced the
> previous mixture of SATA I, II, III stuff in my various computers. 
> Disk reliability improved dramatically.
>
>
> Re-seating all of the SATA cables, re-seating all of the drive power
> cables, and re-seating the HBA, followed by `zpool online zdb1 ada0p3`
> and `zpool scrub zdb1` could fix the problem.

Will try first, thanks!

> Okay -- BIOS/Legacy and GPT.  Those are key parameters (among others)
> that the FreeBSD installer detects and uses to choose what actions to
> take.  The goal is to reproduce what the installer did.

Which is...

root@db1:~ # cat /var/backups/gpart.ada0.bak
GPT 128
1   freebsd-boot        40      1024 gptboot0
2   freebsd-swap      2048  16777216 swap0
3    freebsd-zfs  16779264 276267008 zfs0

> Please run and post the following commands to check if ada[0-3]p1
> bootloader stages are the same:
>
> # cmp /dev/ada1p1 /dev/ada0p1
Not until I resolve ada0.
> # cmp /dev/ada1p1 /dev/ada2p1
root@db1:~ # cmp /dev/ada1p1 /dev/ada2p1
/dev/ada1p1 /dev/ada2p1 differ: char 159233, line 417
> # cmp /dev/ada1p1 /dev/ada3p1

Interesting?

root@db1:~ # cmp /dev/ada1p1 /dev/ada3p1
root@db1:~ #

> Please run and post the following commands to check if ada[0-3] MBR's
> are the same:
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1
> status=none)
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1
> status=none)
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1
> status=none)
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1 status=none)
Missing name for redirect.
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1 status=none)
Missing name for redirect.
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1 status=none)
Missing name for redirect.
> Please run and post the following commands to confirm that gmirror and
> gstripe are not in use.  Do not bother to load if prompted:
>
> # gmirror status
>
> # gstripe status

Yes, gmirror is in use, I'm guessing a `gmirror insert ada0p2` is needed?

root@db1:~ # gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  ada2p2 (ACTIVE)
                       ada3p2 (ACTIVE)
                       ada1p2 (ACTIVE)
root@db1:~ # gstripe status
gstripe: Command 'status' not available; try 'load' first.

> Please run and post the following command to confirm swap:
>
> # swapinfo
root@db1:~ # swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/mirror/swap   8388604    30216  8358388     0%

> Please post (redact as necessary) /boot/loader.conf,
> /boot/loader.conf.d/*, /etc/rc.conf, and /etc/fstab  in case there is
> anything else we need to consider.
root@db1:~ # cat /boot/loader.conf
geom_mirror_load="YES"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
root@db1:~ # ls -lah /boot/loader.conf.d/
total 9
drwxr-xr-x   2 root  wheel     2B May 12  2022 .
drwxr-xr-x  15 root  wheel    71B Jul  4 11:33 ..
root@db1:~ # cat /etc/rc.conf
hostname="db1.REDACTED"
ifconfig_em0="inet REDACTED netmask 255.255.255.192"
defaultrouter="REDACTED.1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
nrpe3_enable="YES"
sshd_enable="YES"
ntpdate_enable="YES"
ntpd_enable="YES"
named_enable="YES"
postgresql_enable="YES"
nfs_client_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 4"
rpcbind_enable="YES"
mountd_flags="-r"
mountd_enable="YES"
apache24_enable="YES"
postfix_enable="YES"
sendmail_enable="NONE"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
slapd_enable="YES"
slapd_flags='-h "ldapi://%2fvar%2frun%2fopenldap%2fldapi/
ldap://0.0.0.0:389/ ldaps://0.0.0.0:636/"'
slapd_sockets="/var/run/openldap/ldapi"
mrtg_daemon_enable="YES"
saslauthd_enable="YES"
saslauthd_flags="-a ldap"
root@db1:~ # cat /etc/fstab
# Device                Mountpoint      FStype  Options  Dump    Pass#
/dev/mirror/swap                none    swap    sw              0      0
REDACTED:/mnt/REDACTED /nfs/backup nfs rw 0 0
> Running and posting the commands above should help determine what to
> do with ada0p1 and adap2.
Thanks for the help.
>> Ok, I have a matching drive, so I don't plan on reusing the REMOVED
>> drive. I usually take the REMOVED one and examine it to see if it can
>> be re-inserted safely as needed. I was hoping the gpart
>> backup/restore would be the equivalent of sgdisk in Linux that I've
>> used many times to duplicate a disk used for replacement in software
>> RAID.
>
>
> I also use Linux, and FreeBSD gpart(8) backup and restore surprised me
> in a bad way.  Please see the thread branch starting here:
>
> https://lists.freebsd.org/archives/freebsd-questions/2025-August/006883.html
>
Yeah, I read. So, gpart backup/restore can be used as long as I throw
the -l switch?
>>> I would use zpool-attach(8) to add the replacement ada0p3 as a
>>> mirror of ada1p3.
>>
>> Attach in place of zpool-replace?
>
>
> AIUI RTFM zpool(8) if you detach one of two drives in a mirror, the
> detached drive is forgotten, the mirror goes away, and the remaining
> disk becomes striped at the top level.  When you install a new drive
> and want to create a mirror with a singular striped drive, use
> zpool(8) attach.

This is the plan now thanks to all y'all's help, unless I get away with
re-seating everything and zpool online'ing the disk.

--
Robert


David Christensen

Aug 22, 2025, 1:12:34 AM
to ques...@freebsd.org
On 8/21/25 21:02, Robert wrote:
> On 8/21/2025 3:03 PM, Dag-Erling Smørgrav wrote:
>> You should take a look in /var/backups, you may find a backup of
>> the partition table from the failed drive.  Assuming you remove the
>> failed drive first, you can safely `gpart restore -l` this backup onto
>> the replacement drive, which will recreate the labels (but not UUIDs).
>
> Great, had no idea, yes, I see the gpartada0.backup in /var/backups...
>
> root@db1:~ # cat /var/backups/gpart.ada0.bak <<-- REMOVED disk
> GPT 128
> 1   freebsd-boot        40      1024 gptboot0
> 2   freebsd-swap      2048  16777216 swap0
> 3    freebsd-zfs  16779264 276267008 zfs0
> root@db1:~ # cat /var/backups/gpart.ada1.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot1
> 2   freebsd-swap      2048  16777216 swap1
> 3    freebsd-zfs  16779264 276267008 zfs1
> root@db1:~ # cat /var/backups/gpart.ada2.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot2
> 2   freebsd-swap      2048  16777216 swap2
> 3    freebsd-zfs  16779264 276267008 zfs2
> root@db1:~ # cat /var/backups/gpart.ada3.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot3
> 2   freebsd-swap      2048  16777216 swap3
> 3    freebsd-zfs  16779264 276267008 zfs3
> root@db1:~ # cat /var/backups/gpart.ada4.bak
>


Good. So long as nothing uses GUID/UUID, gpart(8) restore with labels
should work.


This is my server system disk (BIOS, MBR):

2025-08-21 21:13:19 toor@f5 ~
# gpart show ada0
=>       40  117231328  ada0  GPT  (56G)
         40       1024     1  freebsd-boot  (512K)
       1064   29359104     2  freebsd-ufs  (14G)
   29360168    1564672     3  freebsd-swap  (764M)
   30924840   86306528        - free -  (41G)


I have a backup of the freebsd boot partition:

2025-08-21 21:55:05 toor@f5 ~
# ll /var/backups/boot.ada0p1.bak
-rw-r--r--  1 root  wheel  524288 2024/03/04 03:01:00 /var/backups/boot.ada0p1.bak


And the backup still matches ada0p1:

2025-08-21 21:13:44 toor@f5 ~
# cmp /dev/ada0p1 /var/backups/boot.ada0p1.bak

2025-08-21 21:14:00 toor@f5 ~
# echo $?
0


The last piece of the puzzle is the MBR. I see some possibilities in /boot:

2025-08-21 21:20:36 toor@f5 ~
# ll -S /boot | grep ' 512 ' | grep -v drwx
-r--r--r-- 1 root wheel 512 2025/05/24 14:51:34 boot0
-r--r--r-- 1 root wheel 512 2025/05/24 14:51:34 boot0sio
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 boot1
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 mbr
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 pmbr


Referring to WikiPedia "Master boot record" table "Structure of a
classical generic MBR":

https://en.wikipedia.org/wiki/Master_boot_record


The bootstrap code area is the first 446 bytes. Look for a match:

2025-08-21 21:24:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0
/dev/ada0 /boot/boot0 differ: char 12, line 1

2025-08-21 21:25:00 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0sio
/dev/ada0 /boot/boot0sio differ: char 12, line 1

2025-08-21 21:25:05 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot1
/dev/ada0 /boot/boot1 differ: char 1, line 1

2025-08-21 21:25:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/mbr
/dev/ada0 /boot/mbr differ: char 12, line 1

2025-08-21 21:25:12 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/pmbr


So, the FreeBSD installer put /boot/pmbr into the MBR of my system disk.


Checking the partition table entries and boot signature:

2025-08-21 21:28:19 toor@f5 ~
# cmp -i 446 -n 16 /dev/ada0 /boot/pmbr
/dev/ada0 /boot/pmbr differ: char 3, line 1

2025-08-21 21:28:50 toor@f5 ~
# cmp -i 462 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:28:58 toor@f5 ~
# cmp -i 478 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:09 toor@f5 ~
# cmp -i 494 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:17 toor@f5 ~
# cmp -i 510 -n 2 /dev/ada0 /boot/pmbr


So, everything matches except partition entry number 1:

2025-08-21 21:31:33 toor@f5 ~
# dd if=/dev/ada0 count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 02 00 ee ff ff ff 01 00 00 00 2f cf fc 06  |............/...|
000001ce

2025-08-21 21:32:27 toor@f5 ~
# dd if=/boot/pmbr count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
000001ce


So, the installer must have populated the first partition entry.


Referring to the WikiPedia page table "Layout of one 16-byte partition
entry", decoding my MBR first partition entry:

Status or physical drive
inactive

CHS address of first absolute sector in partition
cylinder = 0
head = 0
sector = 2

Partition type
ee = GPT protective MBR

CHS address of last absolute sector in partition
cylinder = 1023
head = 255
sector = 63

LBA of first absolute sector in the partition
0x00000001 = sector 1

Number of sectors in partition
0x06fccf2f = 117231407 sectors


Convert the number of sectors in partition field value to decimal:

2025-08-21 21:32:37 toor@f5 ~
# perl -e 'printf "%i\n", 0x06fccf2f'
117231407


This matches the disk size minus 1 (for the MBR):

2025-08-21 21:54:57 toor@f5 ~
# diskinfo -v ada0 | grep 'mediasize in sectors'
117231408 # mediasize in sectors


Again, I would check if the failed disk and the other disks all have the
same MBR. If so, you could clone one of them into the MBR of the
replacement disk.


>>> Would recovering the disk be beneficial versus replace? As far as
>>> faster recovery, not needing to resilver or as much. These are not big
>>> drives as you can see and RAID10 zpool.
>> You can try to use recoverdisk to copy undamaged portions of the failed
>> drive onto the replacement, but it's likely to take longer than
>> resilvering.
> Then I'll stick to the original plan but with attach instead of replace
> using `zpool attach ada0p3 ada0p3`.
>

I think you have a typo -- the replacement ada0p3 should attach to ada1p3.


David


David Christensen

Aug 22, 2025, 2:05:12 AM
to ques...@freebsd.org
On 8/21/25 21:43, Robert wrote:
> On 8/21/2025 3:04 PM, David Christensen wrote:
>> Please run and post the following commands to check if ada[0-3]p1
>> bootloader stages are the same:
>>
>> # cmp /dev/ada1p1 /dev/ada0p1
> Not until I resolve ada0.
>> # cmp /dev/ada1p1 /dev/ada2p1
> root@db1:~ # cmp /dev/ada1p1 /dev/ada2p1
> /dev/ada1p1 /dev/ada2p1 differ: char 159233, line 417
>> # cmp /dev/ada1p1 /dev/ada3p1
>
> Interesting?
>
> root@db1:~ # cmp /dev/ada1p1 /dev/ada3p1
> root@db1:~ #
>


Hmmm... freebsd-boot matches on ada1 and ada3, but not ada2 (?). That
is strange. I wonder if ada2p1 is corrupt? I would try to find the
matching bootloader file in /boot and compare it against all of
ada[0-3]p1. See my post that crossed yours for commands.


>> Please run and post the following commands to check if ada[0-3] MBR's
>> are the same:
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1
>> status=none)
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1
>> status=none)
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1
>> status=none)
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada0 count=1 status=none)
> Missing name for redirect.
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada2 count=1 status=none)
> Missing name for redirect.
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada3 count=1 status=none)
> Missing name for redirect.


I think we are using different shells, and yours does not support
process substitution (?):

2025-08-21 22:58:13 dpchrist@laalaa ~
$ ssh root@f5
root@f5:~ # echo $SHELL
/bin/csh
root@f5:~ # exit
logout
Connection to f5 closed.

2025-08-21 22:58:37 dpchrist@laalaa ~
$ ssh toor@f5

2025-08-21 22:58:41 toor@f5 ~
# echo $SHELL
/usr/local/bin/bash

2025-08-21 22:58:43 toor@f5 ~
# exit
logout
Connection to f5 closed.


>> Please run and post the following commands to confirm that gmirror and
>> gstripe are not in use.  Do not bother to load if prompted:
>>
>> # gmirror status
>>
>> # gstripe status
>
> Yes, gmirror is in use, I'm guessing a `gmirror insert ada0p2` is neeed?
>
> root@db1:~ # gmirror status
>        Name    Status  Components
> mirror/swap  DEGRADED  ada2p2 (ACTIVE)
>                        ada3p2 (ACTIVE)
>                        ada1p2 (ACTIVE)
> root@db1:~ # gstripe status
> gstripe: Command 'status' not available; try 'load' first.
>


So, the freebsd-swap partitions are mirrored and the freebsd-boot
partitions are not. That is key information. You will need to
make sure that you match the installed configuration when you configure
the replacement disk.
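
For the swap side, a sketch (untested; gmirror(8) insert takes the mirror
name and the new provider, once the replacement's p2 partition exists):

# gmirror insert swap ada0p2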


>> Please run and post the following command to confirm swap:
>>
>> # swapinfo
> root@db1:~ # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/mirror/swap   8388604    30216  8358388     0%
>
>> Please post (redact as necessary) /boot/loader.conf, /boot/
>> loader.conf.d/*, /etc/rc.conf, and /etc/fstab  in case there is
>> anything else we need to consider.
> root@db1:~ # cat /boot/loader.conf
> geom_mirror_load="YES"
> kern.geom.label.disk_ident.enable="0"
> kern.geom.label.gptid.enable="0"
> cryptodev_load="YES"
> zfs_load="YES"


Looks reasonable.


> root@db1:~ # ls -lah /boot/loader.conf.d/
> total 9
> drwxr-xr-x   2 root  wheel     2B May 12  2022 .
> drwxr-xr-x  15 root  wheel    71B Jul  4 11:33 ..


No configuration add-on files.


That server has some complexity.


> root@db1:~ # cat /etc/fstab
> # Device                Mountpoint      FStype  Options  Dump    Pass#
> /dev/mirror/swap                none    swap    sw              0      0
> REDACTED:/mnt/REDACTED /nfs/backup nfs rw 0 0


Regarding read-write /nfs/backup, I will assume that you have taken
security precautions to protect backups and the backup server.


You are making good progress! :-)


David


Dag-Erling Smørgrav

Aug 22, 2025, 3:44:45 AM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> Again, I would check if the failed disk and the other disck all have
> the same MBR. If so, you could clone one of them into the MBR of
> replacement disk.

No. Use `gpart bootcode -p /boot/pmbr adaN`.

Robert

Aug 22, 2025, 10:03:40 AM
to ques...@freebsd.org
On 8/22/2025 3:44 AM, Dag-Erling Smørgrav wrote:
> No. Use `gpart bootcode -p /boot/pmbr adaN`.

Yes, this is what I've used in the past. So, I would not need to examine
the disks for the same MBR, and could just use the gpart bootcode above
on the new disk?

Thanks to all for the pointers!


Dag-Erling Smørgrav

Aug 22, 2025, 11:22:13 AM
to Robert, ques...@freebsd.org
Robert <rob...@webtent.org> writes:
> So, I would not need to examine the disks for the same MBR and just
> use the gpart bootcode above on the new disk?

Yes. And if you haven't done so in a while (i.e. not since your last
upgrade), I would run `gpart bootcode` on all disks. You can combine -b
and -p in a single command, e.g.

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
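
or, as a sketch for all four disks at once (sh syntax, device names assumed):

for d in ada0 ada1 ada2 ada3; do
        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d
done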

Robert

Aug 22, 2025, 12:54:00 PM
to ques...@freebsd.org

Perhaps. As the host name nomenclature would suggest, it is our db server; it serves lightweight bind9 DNS and LDAP for mail routing to our MX servers. The PostgreSQL instance houses the spam-filtering data for the same MX servers, which run Amavisd with SpamAssassin, plus some other small utility databases. That's it in a nutshell.

I have two of these for redundancy: primary/secondary DNS, openldap and PostgreSQL replication. As we know, these servers are set up with root on ZFS and gmirror swap. I've done zfs send/recv with data snapshots, but never for this root-on-ZFS setup. I'd also like to put a plan together for cloning to a new server. Could I get some pointers to good posts or docs on doing this, and possibly start another thread to discuss questions? Perhaps this involves dump and beadm.

--
Robert

David Christensen

Aug 22, 2025, 4:12:41 PM
to ques...@freebsd.org
On 8/22/25 08:21, Dag-Erling Smørgrav wrote:
> Robert <rob...@webtent.org> writes:
>> So, I would not need to examine the disks for the same MBR and just
>> use the gpart bootcode above on the new disk?
>
> Yes. And if you haven't done so in a while (i.e. not since your last
> upgrade), I would run `gpart bootcode` on all disks. You can combine -b
> and -p in a single command, e.g.
>
> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
>
> DES


Thank you for clarifying the straight-forward method for installing the
MBR bootstrap code and the FreeBSD boot partition bootloader code.


In the general case, the FreeBSD installer or sysadmin needs to figure
out which bootcode to use, which partcode to use, and which geom to use.
I have RTFM gpart(8) several times now, but it is taking me a while to
get a mental picture of the FreeBSD bootstrap decision tree formed by
firmware (BIOS vs. UEFI) and sysadmin choices (partition/slice scheme,
volume management, filesystem, etc.). Is there a single document that
clearly explains all of this stuff?


David


David Christensen

Aug 22, 2025, 9:14:14 PM
to ques...@freebsd.org
Some thoughts:

1. Put your OS on a disk-on-module, single SSD, or gmirror/ZFS mirror
of two such devices. Keep the OS instance small and self-contained. In
addition to backups, take images regularly.

2. Put your data on a ZFS pool with suitable redundancy and performance.

3. ZFS mirrors and stripes-of-mirrors perform better and resilver
faster than ZFS RAID-Z or stripes-of-RAID-Z's, but they are not as space
efficient. Mirrors should not be a problem for a 300 GB pool.

4. AIUI some databases can manage disks directly. I suggest
researching databases on raw disks vs. databases on ZFS volumes vs.
databases on ZFS file systems. If the first is compelling, include
dedicated SSD's for the database.

5. Root-on-ZFS offers many features (including replication), most
notably boot environments. I started with this, but now use UFS.

6. It is easier to manage multiple computers with similar hardware than
to manage multiple computers with differing hardware. Your organization
might benefit from standardizing on one or a few server platforms. You
would then pick a suitable platform, and add disks and options as needed
for the next pair of database servers.

7. Virtualization and containers are possibilities, especially if you
want isolation for security and/or damage control.


David


Matthias Fechner

Aug 23, 2025, 12:22:04 AM
to Robert, ques...@freebsd.org
If you have never done such a restore, I suggest you set up a virtual
machine in a similar way to your setup (I think a default install plus
two additional discs to build an additional zpool is enough).
You can use much smaller discs; it is just to simulate the commands.

If the setup is done, make a snapshot.

Then just remove/detach a disc and see what happens.
Add a new disc and try to replace it in the zpool.

This will give you some practice.
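
Even without a VM you can practice on file-backed md(4) devices -- a
rough, untested sketch:

truncate -s 1g /tmp/d0 /tmp/d1 /tmp/d2
mdconfig -a -t vnode -f /tmp/d0    # becomes md0
mdconfig -a -t vnode -f /tmp/d1    # becomes md1
zpool create testpool mirror md0 md1
zpool offline testpool md1         # simulate the failed disc
mdconfig -a -t vnode -f /tmp/d2    # becomes md2
zpool replace testpool md1 md2     # practice the replacement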

You can ask and read as much as you want; you will only get your answers
if you try it, and a virtual machine is perfect for testing this.

Matthias



Frank Leonhardt

Aug 29, 2025, 8:23:57 AM
to ques...@freebsd.org
On 21/08/2025 01:55, Robert wrote:
> I have my first zpool degraded on a FreeBSD 13.5 server and looking
> for advice on the steps I'll be taking to successfully replace the
> REMOVED drive in a 4 disk 2 mirror zpool. It is scrubbed monthly with
> last scrub August 3rd...
>
Funnily enough I wrote about this exact thing in Feb:

https://blog.frankleonhardt.com/2025/freebsd-zfs-raidz-failed-disk-replacement/

Okay, your configuration is mirrors, but I think most of the same
applies. Note what I say about making the replacement disk bootable,
which doesn't seem to have been mentioned in this thread.

Also, I posted a script on this list a few months later that installed
ZFS manually, among other stuff. Guess what got me interested :-)

Since then I've had a spectacular failure of a ZFS mirror that had been
happy for years. The problem with mirrored drives is that when you
replace one it thrashes the other while it's resilvering, so if the old
one is becoming dodgy it could push it over the edge. I hope you have
more luck than I did.



