Replacing a REMOVED drive in DEGRADED zpool


Robert

Aug 20, 2025, 8:56:40 PM
to ques...@freebsd.org
I have my first degraded zpool on a FreeBSD 13.5 server and am looking
for advice on the steps I'll be taking to successfully replace the REMOVED
drive in a 4-disk, 2-mirror zpool. It is scrubbed monthly, with the last
scrub on August 3rd...

root@db1:~ # zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdb1   262G   102G   160G        -         -    53%    38%  1.00x  DEGRADED  -
root@db1:~ # zpool status
  pool: zdb1
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:27:57 with 0 errors on Sun Aug  3 04:43:48 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zdb1        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            ada0p3  REMOVED      0     0     0
            ada1p3  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

I have backups of any important data. I also use zfs-autobackup from
an old remote FreeNAS server to take hourly snapshots ...

zfs-autobackup -v --keep-source 72 --keep-target 168 --ssh-source db1
offsite1 DATA/backups/db1

So, I have the last 72 hours of snapshots on the local server, with 7
days' worth on the remote NAS. The disk entered the REMOVED state at 6am
this morning, a little over 14 hours ago, and I plan to replace it on
Friday night to give myself some time in case a restore needs to happen.
Perhaps I should bump the local snapshot retention up to 168 hours (1
week) as well at this point, or hold what is there -- can I hold all
snapshots with one command? Maybe something like the sketch below.

root@db1:~ # camcontrol devlist
<WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)
root@db1:~ # gpart show ada1
=>       40  293046688  ada1  GPT  (140G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048   16777216     2  freebsd-swap  (8.0G)
   16779264  276267008     3  freebsd-zfs  (132G)
  293046272        456        - free -  (228K)

All the drives report layouts identical to ada1's. I've used camcontrol
with identify to get all the serial numbers of these drives, so I plan
to shut the server down, pull the bad drive and insert the replacement,
boot up and replace. Would these be the steps I need to take, assuming
the replacement drive shows up as the same ada0 device?

1. Run `zpool offline zdb1 ada0p3`
2. Shut down and pull/insert replacement
3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart restore
ada0 < gpart.ada1`
4. Run `zpool replace zdb1 ada0p3 ada0p3`

I'm just not sure if this is all that is needed with a ROOT zpool, or if
it's all correct. I appreciate any guidance. Here is the full zfs list...

root@db1:~ # zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zdb1                                          103G   152G  96K  /zdb1
zdb1/ROOT                                     101G   151G  96K  none
zdb1/ROOT/13.1-RELEASE-p7_2023-05-04_200035     8K   151G  15.8G  /
zdb1/ROOT/13.2-RELEASE-p1_2023-08-07_124053     8K   151G  25.4G  /
zdb1/ROOT/13.2-RELEASE-p2_2023-09-09_111305     8K   151G  29.0G  /
zdb1/ROOT/13.2-RELEASE-p3_2023-12-31_111612     8K   151G  33.1G  /
zdb1/ROOT/13.2-RELEASE-p9_2024-04-14_121449     8K   151G  34.9G  /
zdb1/ROOT/13.2-RELEASE_2023-05-04_200614        8K   151G  15.9G  /
zdb1/ROOT/13.2-RELEASE_2023-08-01_151806        8K   151G  25.1G  /
zdb1/ROOT/13.3-RELEASE-p1_2024-04-14_121907     8K   151G  34.9G  /
zdb1/ROOT/13.3-RELEASE-p1_2024-08-04_122937     8K   151G  36.2G  /
zdb1/ROOT/13.3-RELEASE-p4_2025-01-04_162341     8K   151G  36.3G  /
zdb1/ROOT/13.3-RELEASE-p8_2025-01-04_164203     8K   151G  36.9G  /
zdb1/ROOT/13.4-RELEASE-p1_2025-01-04_164619     8K   151G  37.0G  /
zdb1/ROOT/13.4-RELEASE-p2_2025-05-10_133828     8K   151G  39.4G  /
zdb1/ROOT/13.5-RELEASE-p1_2025-07-04_113332     8K   151G  39.6G  /
zdb1/ROOT/13.5-RELEASE_2025-05-10_134206        8K   151G  39.4G  /
zdb1/ROOT/default                             101G   151G  39.5G  /
zdb1/tmp                                     1.12M   151G  200K  /tmp
zdb1/usr                                     1.17G   151G  96K  /usr
zdb1/usr/home                                1.31M   151G  1.30M  /usr/home
zdb1/usr/ports                               1.17G   151G  1.17G  /usr/ports
zdb1/usr/src                                   96K   151G  96K  /usr/src
zdb1/var                                     5.35M   151G  96K  /var
zdb1/var/audit                                 96K   151G  96K  /var/audit
zdb1/var/crash                                 96K   151G  96K  /var/crash
zdb1/var/log                                 4.78M   151G  660K  /var/log
zdb1/var/mail                                 200K   151G  144K  /var/mail
zdb1/var/tmp                                   96K   151G  96K  /var/tmp

Thank you.

--
Robert


Matthias Fechner

Aug 20, 2025, 11:23:45 PM
to Robert, ques...@freebsd.org
Hi Robert,

Am 21.08.2025 um 03:55 schrieb Robert:
> 1. Run `zpool offline zdb1 ada0p3`
> 2. Shut down and pull/insert replacement
> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
> restore ada0 < gpart.ada1`
> 4. Run `zpool replace zdb1 ada0p3 ada0p3`

I have written some notes on this for myself:
https://wiki.idefix.fechner.net/freebsd/zfs/#replace-a-failed-disk

I do not think you must offline it; you should be able to use the replace
command or attach (I would check the man page to verify that for the
version you use).
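
For example, something like this for the replace path (untested; pool and
partition names taken from the zpool status earlier in the thread):

zpool replace zdb1 ada0p3 ada0p3

The two-argument form covers the case where the new disk shows up under
the same ada0 name as the old one.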

I hope that helps.

Matthias


David Christensen

Aug 21, 2025, 2:15:13 AM
to ques...@freebsd.org
I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
columns for a bad disk. Did you do something to reset the numbers?


>
> I have backups of any important data. I also use zfs-autobackup from
> an old remote FreeNAS server to take hourly snapshots ...
>
> zfs-autobackup -v --keep-source 72 --keep-target 168 --ssh-source db1
> offsite1 DATA/backups/db1
>
> So, I have the last 72 hours of snapshots on the local server with 7
> days worth on the remote NAS.


Good.


> The disk entered the REMOVED state at 6am
> this morning, a little over 14 hours ago


`zpool status`, above, said "One or more devices has been removed by the
administrator". Did a human remove the disk or did ZFS? If a human,
what command did they use?


> and I plan to replace on Friday
> night to give myself some time in case a restore needs to happen.
> Perhaps I should bump the local snapshot storage up to 168 hours (1
> week) as well at this point or hold what is there, can I hold all
> snapshots with one command? Here is the disk info for the 3 drives
> remaining in the zpool ...
>
> root@db1:~ # camcontrol devlist
> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
> <WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)


ZFS RAID10 with Raptors and VelociRaptor -- a blast from the past! :-)


The problem with vintage systems is "do not throw good money after bad".
If you already have a spare Raptor or VelociRaptor and all four disks
test and report good with smartctl(8), then perhaps replacing the failed
disk with another disk is a good idea. Otherwise, I would consider
other options (a pair of SSD's).


> root@db1:~ # gpart show ada1
> =>       40  293046688  ada1  GPT  (140G)
>          40       1024     1  freebsd-boot  (512K)
>        1064        984        - free -  (492K)
>        2048   16777216     2  freebsd-swap  (8.0G)
>    16779264  276267008     3  freebsd-zfs  (132G)
>   293046272        456        - free -  (228K)


What partition scheme is on the disks?


I do not see an EFI system partition. Is the motherboard firmware
BIOS/Legacy or UEFI?


How is ada0p1 freebsd-boot configured into the system? ZFS
stripe-of-mirrors? UFS gmirror/gstripe RAID10?


How is ada0p2 freebsd-swap configured into the system? One of four swap
devices?


>
> All the drive report identical layouts as ada1. I've used camcontrol
> with identify to get all the serial numbers of these drives, so I plan
> to shut the server down, pull the bad drive and insert the replacement,
> boot up and replace. Would these be the steps I need to take assuming
> the replacement drive shows up as the same ada0 device?
>
> 1. Run `zpool offline zdb1 ada0p3`


I would use zpool-detach(8) to remove the failed disk from the pool.
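
A sketch with the names from your post, assuming the pool layout shown:

# zpool detach zdb1 ada0p3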


You will need to disconnect ada0p1 freebsd-boot and ada0p2 freebsd-swap
according to how they are configured into your system.


> 2. Shut down and pull/insert replacement
> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart restore
> ada0 < gpart.ada1`
> 4. Run `zpool replace zdb1 ada0p3 ada0p3`


Cloning ada1's GPT's (primary and secondary) to ada0 will result in
duplicate identifiers on two disks -- UUID's, labels, etc.. Two disks
with matching identifiers in the same computer is asking for trouble. I
would not do that. If anything, clone the failed disk GPT's to the
replacement disk GPT's -- but only if the failed disk GPT's are good.


If the failed disk is still mostly operational with bad blocks all
within the middle data portion of ada0p3 (NOT in metadata), cloning the
failed disk to the replacement disk could save effort. ddrescue(1) may
be required to get past bad blocks.


Otherwise, I would zero the replacement disk and build it manually.


I would use zpool-attach(8) to add the replacement ada0p3 as a mirror of
ada1p3.
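
For example, assuming the replacement comes up as ada0 again:

# zpool attach zdb1 ada1p3 ada0p3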


You will need to build and connect the replacement ada0p1 freebsd-boot
and replacement ada0p2 freebsd-swap according to how they are to be
configured into your system.


Finally, ZFS, ZFS stripe-of-mirrors, root-on-ZFS, and gmirror/gstripe
RAID10 are all non-trivial. Replacing such a disk correctly is going to
require a lot of knowledge. If you like learning adventures, go for it.


But if you want 24x7 operations, I do better with
backup/wipe/install/restore. It is simpler, I can estimate how long it
will take, I can roll it back, and I have confidence in the results. If
you go this route, I would put FreeBSD on UFS on a single small SSD and
put the data on ZFS with redundant disks. Back up the OS disk with
rsync(1) and take images regularly. Restoring an OS disk from an image
is the fastest way to recover from an OS disk disaster.


David


Dag-Erling Smørgrav

Aug 21, 2025, 2:56:17 AM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> Robert <rob...@webtent.org> writes:
> > 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
> > restore ada0 < gpart.ada1`
> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
> duplicate identifiers on two disks -- UUID's, labels, etc..

`gpart backup | gpart restore` does not copy UUIDs or labels.
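
For example (disk names as used in this thread; -F first destroys any
partition table already present on the target):

gpart backup ada1 | gpart restore -F ada0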

> [...] cloning the failed disk to the replacement disk could save
> effort. ddrescue(1) may be required to get past bad blocks.

You mean recoverdisk(1).

DES
--
Dag-Erling Smørgrav - d...@FreeBSD.org

Robert

Aug 21, 2025, 9:14:06 AM
to Matthias Fechner, ques...@freebsd.org
Thanks, there is some great information in your document. And detach
would not be necessary either? It can be replaced in its current
REMOVED state?


Robert

Aug 21, 2025, 9:47:12 AM
to ques...@freebsd.org
On 8/21/2025 2:14 AM, David Christensen wrote:
> On 8/20/25 17:55, Robert wrote:
>
> I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
> columns for a bad disk.  Did you do something to reset the numbers?

I know what you mean; I have seen REMOVED with SSDs on IBM x3550
servers using mdadm as well -- does the SATA connector hiccup? I haven't
done a thing, no reset or clear. So I know I could perhaps reuse the same
drive, but I'll need to SMART-check it after pulling it to be sure.

>> The disk entered the REMOVED state at 6am this morning, a little over
>> 14 hours ago
>
>
> `zpool status`, above, said "One or more devices has been removed by
> the administrator".  Did a human remove the disk or did ZFS? If a
> human, what command did they use?

Nope, nobody, this server is in a data center and I'm quite sure no one
touched it.
>> and I plan to replace on Friday night to give myself some time in
>> case a restore needs to happen. Perhaps I should bump the local
>> snapshot storage up to 168 hours (1 week) as well at this point or
>> hold what is there, can I hold all snapshots with one command? Here
>> is the disk info for the 3 drives remaining in the zpool ...
>>
>> root@db1:~ # camcontrol devlist
>> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus1 target 0 lun 0 (ada1,pass1)
>> <WDC WD1500HLFS-01G6U3 04.04V05>   at scbus2 target 0 lun 0 (ada2,pass2)
>> <WDC WD1500ADFD-00NLR5 21.07QR5>   at scbus3 target 0 lun 0 (ada3,pass3)
>
>
> ZFS RAID10 with Raptors and VelociRaptor -- a blast from the past!  :-)
>
>
> The problem with vintage systems is "do not throw good money after
> bad".  If you already have a spare Raptor or VelociRaptor and all four
> disks test and report good with smartctl(8), then perhaps replacing
> the failed disk with another disk is a good idea. Otherwise, I would
> consider other options (a pair of SSD's).

Lol, yep, I believe I have a matching drive sitting in waiting but will
confirm when I get to the data center later today. It's an old
Supermicro that has been like a rock. I don't throw away things unless
they fail or lose support somehow. I use redundant hardware and/or
services. I've had servers reach 15 years, but this is almost
unachievable these days with technology advancing faster and faster. The
redundant db server for this one does use SSD drives.

>> root@db1:~ # gpart show ada1
>> =>       40  293046688  ada1  GPT  (140G)
>>           40       1024     1  freebsd-boot  (512K)
>>         1064        984        - free -  (492K)
>>         2048   16777216     2  freebsd-swap  (8.0G)
>>     16779264  276267008     3  freebsd-zfs  (132G)
>>    293046272        456        - free -  (228K)
>
>
> What partition scheme is on the disks?
>
>
> I do not see an EFI system partition.  Is the motherboard firmware
> BIOS/Legacy or UEFI?
Partition scheme? As I showed with gpart, these drives have a GUID
Partition Table with partitions as shown. I believe this server is
legacy and does not support UEFI.
>
>
> How is ada0p1 freebsd-boot configured into the system?  ZFS
> stripe-of-mirrors?  UFS gmirror/gstripe RAID10?
>
>
> How is ada0p2 freebsd-swap configured into the system?  One of four
> swap devices?

Yes, I am concerned about the boot. I guess I need to run `gpart bootcode -b
/boot/pmbr -p /boot/gptzfsboot -i 1 ada?` on the new drive after I replace?

>> All the drive report identical layouts as ada1. I've used camcontrol
>> with identify to get all the serial numbers of these drives, so I
>> plan to shut the server down, pull the bad drive and insert the
>> replacement, boot up and replace. Would these be the steps I need to
>> take assuming the replacement drive shows up as the same ada0 device?
>>
>> 1. Run `zpool offline zdb1 ada0p3`
>
>
> I would use zpool-detach(8) to remove the failed disk from the pool.

Yes, I was perhaps expecting an error when I tried to offline; I will
instead try to detach.

> You will need to disconnect ada0p1 freebsd-boot and  ada0p2
> freebsd-swap according to how they are configured into your system.

This is where I'm not sure. I used the default FreeBSD installation with
ZFS and selected the drives for redundancy, I believe. I had never done
this before this server and had always used UFS with gmirror. But after
seeing it as the default in FreeBSD for a long time now, I decided to use
it. I am willing to take more risk when I have redundant services, as in
this case.

> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
> duplicate identifiers on two disks -- UUID's, labels, etc..  Two disks
> with matching identifiers in the same computer is asking for trouble. 
> I would not do that.  If anything, clone the failed disk GPT's to the
> replacement disk GPT's -- but only if the failed disk GPT's are good.
>
>
> If the failed disk is still mostly operational with bad blocks all
> within the middle data portion of ada0p3 (NOT in metadata), cloning
> the failed disk to the replacement disk could save effort. 
> ddrescue(1) may be required to get past bad blocks.
>
>
> Otherwise, I would zero the replacement disk and build it manually.

Ok, I have a matching drive, so I don't plan on reusing the REMOVED
drive. I usually take the REMOVED one and examine it to see if it can be
re-inserted safely as needed. I was hoping the gpart backup/restore
would be the equivalent of sgdisk in Linux that I've used many times to
duplicate a disk used for replacement in software RAID.

> I would use zpool-attach(8) to add the replacement ada0p3 as a mirror
> of ada1p3.

Attach in place of zpool-replace?

> You will need to build and connect the replacement ada0p1 freebsd-boot
> and replacement ada0p2 freebsd-swap according to how they are to be
> configured into your system.

Again, I used the default FreeBSD ZFS install. But this is where I
believe I need to tweak my recovery if gpart backup/restore does not
prepare the disk as expected.

> Finally, ZFS, ZFS stripe-of-mirrors, root-on-ZFS, and gmirror/gstripe
> RAID10 are all non-trivial. Replacing such a disk correctly is going
> to require a lot of knowledge.  If you like learning adventures, go
> for it.
>
>
> But if you want 24x7 operations, I do better with
> backup/wipe/install/restore.  It is simpler, I can estimate how long
> it will take, I can roll it back, and I have confidence in the
> results.  If you go this route, I would put FreeBSD on UFS on a single
> small SSD and put the data on ZFS with redundant disks. Backup the OS
> disk with rsync(1) and take images regularly. Restoring an OS disk
> from an image is the fastest way to recover from a OS disk disaster.

This is my first time recovering a ROOT zpool; as you said, I've used ZFS
for many years with UFS and gmirror on two drives for the OS. I would
perhaps like to experiment with restoring from snapshot if someone could
outline the differences in handling a ROOT zpool using zfs send/receive --
I assume it is just like what I've done with ZFS for data only. But for
this case, I'd like to replace as designed and get the server back in
operation.

Thanks for all the pointers!




Robert

Aug 21, 2025, 10:19:52 AM
to ques...@freebsd.org
On 8/21/2025 2:56 AM, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> Robert <rob...@webtent.org> writes:
>>> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
>>> restore ada0 < gpart.ada1`
>> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
>> duplicate identifiers on two disks -- UUID's, labels, etc..
> `gpart backup | gpart restore` does not copy UUIDs or labels.

Good to know, this would be the best option then for preparing the drive
for zpool-replace?

>> [...] cloning the failed disk to the replacement disk could save
>> effort. ddrescue(1) may be required to get past bad blocks.
> You mean recoverdisk(1).
Would recovering the disk be beneficial versus replacing? As far as
faster recovery goes -- not needing to resilver, or not as much. These
are not big drives, as you can see, and it is a RAID10 zpool.

Thanks for the pointers!


David Christensen

Aug 21, 2025, 1:04:30 PM
to ques...@freebsd.org
On 8/20/25 23:56, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> Robert <rob...@webtent.org> writes:
>>> 3. Boot up and run `gpart backup ada1 > gpart.ada1` then `gpart
>>> restore ada0 < gpart.ada1`
>> Cloning ada1's GPT's (primary and secondary) to ada0 will result in
>> duplicate identifiers on two disks -- UUID's, labels, etc..
>
> `gpart backup | gpart restore` does not copy UUIDs or labels.


Thank you for the clarification regarding gpart(8). Testing backup:

2025-08-21 09:05:30 toor@f5 ~
# gpart backup ada0
GPT 128
1 freebsd-boot 40 1024
2 freebsd-ufs 1064 29359104
3 freebsd-swap 29360168 1564672


You are correct -- gpart(8) backups are incomplete (!). Disk GUID
(UUID) is missing and partition GUID's (UUID's) and names (labels) are
missing. Where is this documented?


RTFM glabel(8), I believe I see how to set labels after a gpart(8)
restore onto a wiped disk.


RTFM glabel(8) says it supports UUID's, but I do not see how. How do I
set the disk GUID and partition GUID's after a gpart(8) restore?


(`gpart list ada0` shows much more information, but not the disk GUID.)


>
>> [...] cloning the failed disk to the replacement disk could save
>> effort. ddrescue(1) may be required to get past bad blocks.
>
> You mean recoverdisk(1).
>
> DES


recoverdisk(1) looks useful -- thank you for the pointer. :-)


David



Dag-Erling Smørgrav

Aug 21, 2025, 3:03:48 PM
to Robert, ques...@freebsd.org
Robert <rob...@webtent.org> writes:
> Dag-Erling Smørgrav <d...@FreeBSD.org> writes:
> > `gpart backup | gpart restore` does not copy UUIDs or labels.
> Good to know, this would be the best option then for preparing the
> drive for zpool-replace?

Yes. You should take a look in /var/backups, you may find a backup of
the partition table from the failed drive. Assuming you remove the
failed drive first, you can safely `gpart restore -l` this backup onto
the replacement drive, which will recreate the labels (but not UUIDs).
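
For example (file name hypothetical until you check /var/backups, and
assuming the replacement also appears as ada0):

gpart restore -l ada0 < /var/backups/gpart.ada0.bak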

> Would recovering the disk be beneficial versus replace? As far as
> faster recovery, not needing to resilver or as much. These are not big
> drives as you can see and RAID10 zpool.

You can try to use recoverdisk to copy undamaged portions of the failed
drive onto the replacement, but it's likely to take longer than
resilvering.
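
For example (device names purely illustrative -- failed disk attached as
ada0, replacement as ada4):

recoverdisk /dev/ada0 /dev/ada4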

David Christensen

Aug 21, 2025, 3:05:47 PM
to ques...@freebsd.org
On 8/21/25 06:46, Robert wrote:
> On 8/21/2025 2:14 AM, David Christensen wrote:
>> On 8/20/25 17:55, Robert wrote:
>>
>> I am used to seeing non-zero numbers in the READ, WRITE, and/or CKSUM
>> columns for a bad disk.  Did you do something to reset the numbers?
>
> I know what you mean, I have seen REMOVED using SSD on an IBM x3550
> servers using mdadm as well, does the SATA connector hiccup?


The wrong SATA HBA, cable, backplane, rack, etc., and/or a poor
connection anywhere can wreak havoc. A few years ago, I bought all new
SATA III cables and SATA III mobile racks, and replaced the previous
mixture of SATA I, II, III stuff in my various computers. Disk
reliability improved dramatically.


Re-seating all of the SATA cables, re-seating all of the drive power
cables, and re-seating the HBA, followed by `zpool online zdb1 ada0p3`
and `zpool scrub zdb1` could fix the problem.


> Partition scheme? As I showed with gpart, these drives have a GUID
> Partition Table with partitions as shown. I believe this server is
> legacy and does not supports UEFI.


Okay -- BIOS/Legacy and GPT. Those are key parameters (among others)
that the FreeBSD installer detects and uses to choose what actions to
take. The goal is to reproduce what the installer did.


RTFM gpart(8) has some good information about bootloader stages and what
pieces are needed for the various combinations of BIOS/UEFI, MBR/GPT,
UFS/ZFS/gmirror/gstripe, etc..


Much of the FreeBSD installer is a suite of shell scripts. They are
very well written and reasonably easy to read. Crawling the installer
code is another possible source of information.


> Yes, I am concerned about the boot,


I tried a manual install of FreeBSD not too long ago by following Lucas
([1] and [2]?). Trying to remember those, looking at your previous
`gpart show ada1` console session, knowing the computer is BIOS/Legacy,
and knowing the disks are GPT, I now believe ada[0-3]p1 are bootloader
stages that run after the MBR bootloader stage. They should all be the
same; gmirror/gstripe should not be involved.


Please run and post the following commands to check if ada[0-3]p1
bootloader stages are the same:

# cmp /dev/ada1p1 /dev/ada0p1

# cmp /dev/ada1p1 /dev/ada2p1

# cmp /dev/ada1p1 /dev/ada3p1


Please run and post the following commands to check if ada[0-3] MBR's
are the same:

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1 status=none)

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1 status=none)

# cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1 status=none)


Please run and post the following commands to confirm that gmirror and
gstripe are not in use. Do not bother to load if prompted:

# gmirror status

# gstripe status


Please run and post the following command to confirm swap:

# swapinfo


Please post (redact as necessary) /boot/loader.conf,
/boot/loader.conf.d/*, /etc/rc.conf, and /etc/fstab in case there is
anything else we need to consider.


> I guess I need to `gpart bootcode -
> b /boot/pmbr -p /boot/gptzfsboot -i 1 ada?` the new drive after I replace?

>> You will need to disconnect ada0p1 freebsd-boot and  ada0p2 freebsd-
>> swap according to how they are configured into your system.
>
> This is where I'm not sure. I used the default FreeBSD installation with
> ZFS and selected the drives for redundancy I believe. I never did this
> before this server and always used UFS with gmirror before. But after
> seeing the default in FreeBSD for a long time now, I decided to use. I
> get more risky when I have a redundant services as in this case.


Running and posting the commands above should help determine what to do
with ada0p1 and ada0p2.


> Ok, I have a matching drive, so I don't plan on reusing the REMOVED
> drive. I usually take the REMOVED one and examine it to see if it can be
> re-inserted safely as needed. I was hoping the gpart backup/restore
> would be the equivalent of sgdisk in Linux that I've used many times to
> duplicate a disk used for replacement in software RAID.


I also use Linux, and FreeBSD gpart(8) backup and restore surprised me
in a bad way. Please see the thread branch starting here:

https://lists.freebsd.org/archives/freebsd-questions/2025-August/006883.html


>> I would use zpool-attach(8) to add the replacement ada0p3 as a mirror
>> of ada1p3.
>
> Attach in place of zpool-replace?


AIUI RTFM zpool(8) if you detach one of two drives in a mirror, the
detached drive is forgotten, the mirror goes away, and the remaining
disk becomes striped at the top level. When you install a new drive and
want to create a mirror with a singular striped drive, use zpool(8) attach.


> Again, I used the default FreeBSD ZFS install. But this is where I
> believe I need to tweak my recovery if gpart backup/restore does not
> prepare the disk as expected.


Yes.


> Thanks for all the pointers!


YW. There is still more to figure out, but you are getting closer.


David


[1] https://mwl.io/nonfiction/os#af3e

[2] https://mwl.io/nonfiction/os#fmse


Dag-Erling Smørgrav

Aug 21, 2025, 3:06:57 PM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> You are correct -- gpart(8) backups are incomplete (!). Disk GUID
> (UUID) is missing and partition GUID's (UUID's) and names (labels) are
> missing. Where is this documented?

Labels are included, they're just not restored unless you specify -l.
If your backup doesn't include labels, it means your partition table
didn't have any. UUIDs are intentionally neither included nor restored.

> RTFM glabel(8), I believe I see how to set labels after a gpart(8)
> restore onto a wiped disk.

glabel labels are different and separate from gpart labels.

David Christensen

Aug 21, 2025, 3:28:59 PM
to ques...@freebsd.org
On 8/21/25 12:06, Dag-Erling Smørgrav wrote:
> David Christensen <dpch...@holgerdanske.com> writes:
>> You are correct -- gpart(8) backups are incomplete (!). Disk GUID
>> (UUID) is missing and partition GUID's (UUID's) and names (labels) are
>> missing. Where is this documented?
>
> Labels are included, they're just not restored unless you specify -l.
> If your backup doesn't include labels, it means your partition table
> didn't have any.


Thank you for the clarification -- the disk I tested does not have GPT
partition labels. Testing on a disk with GPT partition labels, gpart(8)
action "backup" does print the labels.


> UUIDs are intentionally neither included nor restored.
>
>> RTFM glabel(8), I believe I see how to set labels after a gpart(8)
>> restore onto a wiped disk.
>
> glabel labels are different and separate from gpart labels.
>
> DES


Thank you for the clarification regarding GPT partition labels, GEOM
provider labels, and glabel(8).


So if I want a GPT partition label, I will use gpart(8) action "add"
with option "-l label" or action "modify" with option "-l label".


David


Robert

Aug 22, 2025, 12:03:24 AM
to Dag-Erling Smørgrav, ques...@freebsd.org
On 8/21/2025 3:03 PM, Dag-Erling Smørgrav wrote:
> Robert <rob...@webtent.org> writes:
>> Dag-Erling Smørgrav <d...@FreeBSD.org> writes:
>>> `gpart backup | gpart restore` does not copy UUIDs or labels.
>> Good to know, this would be the best option then for preparing the
>> drive for zpool-replace?
> Yes. You should take a look in /var/backups, you may find a backup of
> the partition table from the failed drive. Assuming you remove the
> failed drive first, you can safely `gpart restore -l` this backup onto
> the replacement drive, which will recreate the labels (but not UUIDs).

Great, had no idea -- yes, I see the gpart.ada0.bak in /var/backups...

root@db1:~ # cat /var/backups/gpart.ada0.bak <<-- REMOVED disk
GPT 128
1   freebsd-boot        40      1024 gptboot0
2   freebsd-swap      2048  16777216 swap0
3    freebsd-zfs  16779264 276267008 zfs0
root@db1:~ # cat /var/backups/gpart.ada1.bak
GPT 128
1   freebsd-boot        40      1024 gptboot1
2   freebsd-swap      2048  16777216 swap1
3    freebsd-zfs  16779264 276267008 zfs1
root@db1:~ # cat /var/backups/gpart.ada2.bak
GPT 128
1   freebsd-boot        40      1024 gptboot2
2   freebsd-swap      2048  16777216 swap2
3    freebsd-zfs  16779264 276267008 zfs2
root@db1:~ # cat /var/backups/gpart.ada3.bak
GPT 128
1   freebsd-boot        40      1024 gptboot3
2   freebsd-swap      2048  16777216 swap3
3    freebsd-zfs  16779264 276267008 zfs3
root@db1:~ # cat /var/backups/gpart.ada4.bak

>> Would recovering the disk be beneficial versus replace? As far as
>> faster recovery, not needing to resilver or as much. These are not big
>> drives as you can see and RAID10 zpool.
> You can try to use recoverdisk to copy undamaged portions of the failed
> drive onto the replacement, but it's likely to take longer than
> resilvering.
Then I'll stick to the original plan but with attach instead of replace
using `zpool attach ada0p3 ada0p3`.

Robert

Aug 22, 2025, 12:44:27 AM
to ques...@freebsd.org
On 8/21/2025 3:04 PM, David Christensen wrote:
> The wrong SATA HBA, cable, backplane, rack, etc., and/or a poor
> connection anywhere can wreak havoc.  A few years ago, I bought all
> new SATA III cables and SATA III mobile racks, and replaced the
> previous mixture of SATA I, II, III stuff in my various computers. 
> Disk reliability improved dramatically.
>
>
> Re-seating all of the SATA cables, re-seating all of the drive power
> cables, and re-seating the HBA, followed by `zpool online zdb1 ada0p3`
> and `zpool scrub zdb1` could fix the problem.

Will try first, thanks!

> Okay -- BIOS/Legacy and GPT.  Those are key parameters (among others)
> that the FreeBSD installer detects and uses to choose what actions to
> take.  The goal is to reproduce what the installer did.

Which is...

root@db1:~ # cat /var/backups/gpart.ada0.bak
GPT 128
1   freebsd-boot        40      1024 gptboot0
2   freebsd-swap      2048  16777216 swap0
3    freebsd-zfs  16779264 276267008 zfs0

> Please run and post the following commands to check if ada[0-3]p1
> bootloader stages are the same:
>
> # cmp /dev/ada1p1 /dev/ada0p1
Not until I resolve ada0.
> # cmp /dev/ada1p1 /dev/ada2p1
root@db1:~ # cmp /dev/ada1p1 /dev/ada2p1
/dev/ada1p1 /dev/ada2p1 differ: char 159233, line 417
> # cmp /dev/ada1p1 /dev/ada3p1

Interesting?

root@db1:~ # cmp /dev/ada1p1 /dev/ada3p1
root@db1:~ #

> Please run and post the following commands to check if ada[0-3] MBR's
> are the same:
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1
> status=none)
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1
> status=none)
>
> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1
> status=none)
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1 status=none)
Missing name for redirect.
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1 status=none)
Missing name for redirect.
root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1 status=none)
Missing name for redirect.
> Please run and post the following commands to confirm that gmirror and
> gstripe are not in use.  Do not bother to load if prompted:
>
> # gmirror status
>
> # gstripe status

Yes, gmirror is in use, I'm guessing a `gmirror insert ada0p2` is needed?

root@db1:~ # gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  ada2p2 (ACTIVE)
                       ada3p2 (ACTIVE)
                       ada1p2 (ACTIVE)
root@db1:~ # gstripe status
gstripe: Command 'status' not available; try 'load' first.

> Please run and post the following command to confirm swap:
>
> # swapinfo
root@db1:~ # swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/mirror/swap   8388604    30216  8358388     0%

> Please post (redact as necessary) /boot/loader.conf,
> /boot/loader.conf.d/*, /etc/rc.conf, and /etc/fstab  in case there is
> anything else we need to consider.
root@db1:~ # cat /boot/loader.conf
geom_mirror_load="YES"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
root@db1:~ # ls -lah /boot/loader.conf.d/
total 9
drwxr-xr-x   2 root  wheel     2B May 12  2022 .
drwxr-xr-x  15 root  wheel    71B Jul  4 11:33 ..
root@db1:~ # cat /etc/rc.conf
hostname="db1.REDACTED"
ifconfig_em0="inet REDACTED netmask 255.255.255.192"
defaultrouter="REDACTED.1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
nrpe3_enable="YES"
sshd_enable="YES"
ntpdate_enable="YES"
ntpd_enable="YES"
named_enable="YES"
postgresql_enable="YES"
nfs_client_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 4"
rpcbind_enable="YES"
mountd_flags="-r"
mountd_enable="YES"
apache24_enable="YES"
postfix_enable="YES"
sendmail_enable="NONE"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
slapd_enable="YES"
slapd_flags='-h "ldapi://%2fvar%2frun%2fopenldap%2fldapi/
ldap://0.0.0.0:389/ ldaps://0.0.0.0:636/"'
slapd_sockets="/var/run/openldap/ldapi"
mrtg_daemon_enable="YES"
saslauthd_enable="YES"
saslauthd_flags="-a ldap"
root@db1:~ # cat /etc/fstab
# Device                Mountpoint      FStype  Options  Dump    Pass#
/dev/mirror/swap                none    swap    sw              0      0
REDACTED:/mnt/REDACTED /nfs/backup nfs rw 0 0
> Running and posting the commands above should help determine what to
> do with ada0p1 and adap2.
Thanks for the help.
>> Ok, I have a matching drive, so I don't plan on reusing the REMOVED
>> drive. I usually take the REMOVED one and examine it to see if it can
>> be re-inserted safely as needed. I was hoping the gpart
>> backup/restore would be the equivalent of sgdisk in Linux that I've
>> used many times to duplicate a disk used for replacement in software
>> RAID.
>
>
> I also use Linux, and FreeBSD gpart(8) backup and restore surprised me
> in a bad way.  Please see the thread branch starting here:
>
> https://lists.freebsd.org/archives/freebsd-questions/2025-August/006883.html
>
Yeah, I read. So, gpart backup/restore can be used as long as I throw
the -l switch?
>>> I would use zpool-attach(8) to add the replacement ada0p3 as a
>>> mirror of ada1p3.
>>
>> Attach in place of zpool-replace?
>
>
> AIUI RTFM zpool(8) if you detach one of two drives in a mirror, the
> detached drive is forgotten, the mirror goes away, and the remaining
> disk becomes striped at the top level.  When you install a new drive
> and want to create a mirror with a singular striped drive, use
> zpool(8) attach.

This is the plan now thanks to all y'all's help, unless I get away with
re-seating everything and zpool online'ing the disk.

--
Robert


David Christensen

Aug 22, 2025, 1:12:34 AM
to ques...@freebsd.org
On 8/21/25 21:02, Robert wrote:
> On 8/21/2025 3:03 PM, Dag-Erling Smørgrav wrote:
>> You should take a look in /var/backups, you may find a backup of
>> the partition table from the failed drive.  Assuming you remove the
>> failed drive first, you can safely `gpart restore -l` this backup onto
>> the replacement drive, which will recreate the labels (but not UUIDs).
>
> Great, had no idea, yes, I see the gpartada0.backup in /var/backups...
>
> root@db1:~ # cat /var/backups/gpart.ada0.bak <<-- REMOVED disk
> GPT 128
> 1   freebsd-boot        40      1024 gptboot0
> 2   freebsd-swap      2048  16777216 swap0
> 3    freebsd-zfs  16779264 276267008 zfs0
> root@db1:~ # cat /var/backups/gpart.ada1.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot1
> 2   freebsd-swap      2048  16777216 swap1
> 3    freebsd-zfs  16779264 276267008 zfs1
> root@db1:~ # cat /var/backups/gpart.ada2.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot2
> 2   freebsd-swap      2048  16777216 swap2
> 3    freebsd-zfs  16779264 276267008 zfs2
> root@db1:~ # cat /var/backups/gpart.ada3.bak
> GPT 128
> 1   freebsd-boot        40      1024 gptboot3
> 2   freebsd-swap      2048  16777216 swap3
> 3    freebsd-zfs  16779264 276267008 zfs3
> root@db1:~ # cat /var/backups/gpart.ada4.bak
>


Good. So long as nothing uses GUID/UUID, gpart(8) restore with labels
should work.


This is my server system disk (BIOS, MBR):

2025-08-21 21:13:19 toor@f5 ~
# gpart show ada0
=>       40  117231328  ada0  GPT  (56G)
         40       1024     1  freebsd-boot  (512K)
       1064   29359104     2  freebsd-ufs  (14G)
   29360168    1564672     3  freebsd-swap  (764M)
   30924840   86306528        - free -  (41G)


I have a backup of the freebsd boot partition:

2025-08-21 21:55:05 toor@f5 ~
# ll /var/backups/boot.ada0p1.bak
-rw-r--r--  1 root  wheel  524288 2024/03/04 03:01:00 /var/backups/boot.ada0p1.bak


And the backup still matches ada0p1:

2025-08-21 21:13:44 toor@f5 ~
# cmp /dev/ada0p1 /var/backups/boot.ada0p1.bak

2025-08-21 21:14:00 toor@f5 ~
# echo $?
0


The last piece of the puzzle is the MBR. I see some possibilities in /boot:

2025-08-21 21:20:36 toor@f5 ~
# ll -S /boot | grep ' 512 ' | grep -v drwx
-r--r--r-- 1 root wheel 512 2025/05/24 14:51:34 boot0
-r--r--r-- 1 root wheel 512 2025/05/24 14:51:34 boot0sio
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 boot1
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 mbr
-r--r--r-- 1 root wheel 512 2023/04/06 21:24:38 pmbr


Referring to WikiPedia "Master boot record" table "Structure of a
classical generic MBR":

https://en.wikipedia.org/wiki/Master_boot_record


The bootstrap code area is the first 446 bytes. Look for a match:

2025-08-21 21:24:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0
/dev/ada0 /boot/boot0 differ: char 12, line 1

2025-08-21 21:25:00 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot0sio
/dev/ada0 /boot/boot0sio differ: char 12, line 1

2025-08-21 21:25:05 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/boot1
/dev/ada0 /boot/boot1 differ: char 1, line 1

2025-08-21 21:25:08 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/mbr
/dev/ada0 /boot/mbr differ: char 12, line 1

2025-08-21 21:25:12 toor@f5 ~
# cmp -n 446 /dev/ada0 /boot/pmbr


So, the FreeBSD installer put /boot/pmbr into the MBR of my system disk.


Checking the partition table entries and boot signature:

2025-08-21 21:28:19 toor@f5 ~
# cmp -i 446 -n 16 /dev/ada0 /boot/pmbr
/dev/ada0 /boot/pmbr differ: char 3, line 1

2025-08-21 21:28:50 toor@f5 ~
# cmp -i 462 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:28:58 toor@f5 ~
# cmp -i 478 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:09 toor@f5 ~
# cmp -i 494 -n 16 /dev/ada0 /boot/pmbr

2025-08-21 21:29:17 toor@f5 ~
# cmp -i 510 -n 2 /dev/ada0 /boot/pmbr


So, everything matches except partition entry number 1:

2025-08-21 21:31:33 toor@f5 ~
# dd if=/dev/ada0 count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 02 00 ee ff ff ff 01 00 00 00 2f cf fc 06  |............/...|
000001ce

2025-08-21 21:32:27 toor@f5 ~
# dd if=/boot/pmbr count=1 status=none | hexdump -s 446 -n 16
000001be  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
000001ce


So, the installer must have populated the first partition entry.


Referring to the WikiPedia page table "Layout of one 16-byte partition
entry", decoding my MBR first partition entry:

Status or physical drive
inactive

CHS address of first absolute sector in partition
cylinder = 0
head = 0
sector = 2

Partition type
ee = GPT protective MBR

CHS address of last absolute sector in partition
cylinder = 1023
head = 255
sector = 63

LBA of first absolute sector in the partition
0x00000001 = sector 1

Number of sectors in partition
0x06fccf2f = 117231407 sectors


Convert the number of sectors in partition field value to decimal:

2025-08-21 21:32:37 toor@f5 ~
# perl -e 'printf "%i\n", 0x06fccf2f'
117231407


This matches the disk size minus 1 (for the MBR):

2025-08-21 21:54:57 toor@f5 ~
# diskinfo -v ada0 | grep 'mediasize in sectors'
117231408 # mediasize in sectors


Again, I would check if the failed disk and the other disks all have the
same MBR. If so, you could clone one of them into the MBR of the
replacement disk.


>>> Would recovering the disk be beneficial versus replace? As far as
>>> faster recovery, not needing to resilver or as much. These are not big
>>> drives as you can see and RAID10 zpool.
>> You can try to use recoverdisk to copy undamaged portions of the failed
>> drive onto the replacement, but it's likely to take longer than
>> resilvering.
> Then I'll stick to the original plan but with attach instead of replace
> using `zpool attach ada0p3 ada0p3`.
>

I think you have a typo -- the replacement ada0p3 should attach to ada1p3.


David


David Christensen

Aug 22, 2025, 2:05:12 AM
to ques...@freebsd.org
On 8/21/25 21:43, Robert wrote:
> On 8/21/2025 3:04 PM, David Christensen wrote:
>> Please run and post the following commands to check if ada[0-3]p1
>> bootloader stages are the same:
>>
>> # cmp /dev/ada1p1 /dev/ada0p1
> Not until I resolve ada0.
>> # cmp /dev/ada1p1 /dev/ada2p1
> root@db1:~ # cmp /dev/ada1p1 /dev/ada2p1
> /dev/ada1p1 /dev/ada2p1 differ: char 159233, line 417
>> # cmp /dev/ada1p1 /dev/ada3p1
>
> Interesting?
>
> root@db1:~ # cmp /dev/ada1p1 /dev/ada3p1
> root@db1:~ #
>


Hmmm... freebsd-boot matches on ada1 and ada3, but not ada2 (?). That
is strange. I wonder if ada2p1 is corrupt? I would try to find the
matching bootloader file in /boot and compare it against all of
ada[0-3]p1. See my post that crossed yours for commands.


>> Please run and post the following commands to check if ada[0-3] MBR's
>> are the same:
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada0 count=1
>> status=none)
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada2 count=1
>> status=none)
>>
>> # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/ada3 count=1
>> status=none)
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada0 count=1 status=none)
> Missing name for redirect.
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada2 count=1 status=none)
> Missing name for redirect.
> root@db1:~ # cmp <(dd if=/dev/ada1 count=1 status=none) <(dd if=/dev/
> ada3 count=1 status=none)
> Missing name for redirect.


I think we are using different shells, and yours does not support
process substitution (?):

2025-08-21 22:58:13 dpchrist@laalaa ~
$ ssh root@f5
root@f5:~ # echo $SHELL
/bin/csh
root@f5:~ # exit
logout
Connection to f5 closed.

2025-08-21 22:58:37 dpchrist@laalaa ~
$ ssh toor@f5

2025-08-21 22:58:41 toor@f5 ~
# echo $SHELL
/usr/local/bin/bash

2025-08-21 22:58:43 toor@f5 ~
# exit
logout
Connection to f5 closed.


>> Please run and post the following commands to confirm that gmirror and
>> gstripe are not in use.  Do not bother to load if prompted:
>>
>> # gmirror status
>>
>> # gstripe status
>
> Yes, gmirror is in use, I'm guessing a `gmirror insert ada0p2` is neeed?
>
> root@db1:~ # gmirror status
>        Name    Status  Components
> mirror/swap  DEGRADED  ada2p2 (ACTIVE)
>                        ada3p2 (ACTIVE)
>                        ada1p2 (ACTIVE)
> root@db1:~ # gstripe status
> gstripe: Command 'status' not available; try 'load' first.
>


So, the freebsd-swap partitions are mirrored and the freebsd-boot
partitions are not. That is key information. You will need to
make sure that you match the installed configuration when you configure
the replacement disk.
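
For the swap side, a sketch (untested; gmirror(8) insert takes the mirror
name and the new provider, once the replacement's p2 partition exists):

# gmirror insert swap ada0p2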


>> Please run and post the following command to confirm swap:
>>
>> # swapinfo
> root@db1:~ # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/mirror/swap   8388604    30216  8358388     0%
>
>> Please post (redact as necessary) /boot/loader.conf, /boot/
>> loader.conf.d/*, /etc/rc.conf, and /etc/fstab  in case there is
>> anything else we need to consider.
> root@db1:~ # cat /boot/loader.conf
> geom_mirror_load="YES"
> kern.geom.label.disk_ident.enable="0"
> kern.geom.label.gptid.enable="0"
> cryptodev_load="YES"
> zfs_load="YES"


Looks reasonable.


> root@db1:~ # ls -lah /boot/loader.conf.d/
> total 9
> drwxr-xr-x   2 root  wheel     2B May 12  2022 .
> drwxr-xr-x  15 root  wheel    71B Jul  4 11:33 ..


No configuration add-on files.


That server has some complexity.


> root@db1:~ # cat /etc/fstab
> # Device                Mountpoint      FStype  Options  Dump    Pass#
> /dev/mirror/swap                none    swap    sw              0      0
> REDACTED:/mnt/REDACTED /nfs/backup nfs rw 0 0


Regarding read-write /nfs/backup, I will assume that you have taken
security precautions to protect backups and the backup server.


You are making good progress! :-)


David


Dag-Erling Smørgrav

Aug 22, 2025, 3:44:45 AM
to David Christensen, ques...@freebsd.org
David Christensen <dpch...@holgerdanske.com> writes:
> Again, I would check if the failed disk and the other disck all have
> the same MBR. If so, you could clone one of them into the MBR of
> replacement disk.

No. Use `gpart bootcode -p /boot/pmbr adaN`.

Robert

Aug 22, 2025, 10:03:40 AM
to ques...@freebsd.org
On 8/22/2025 3:44 AM, Dag-Erling Smørgrav wrote:
> No. Use `gpart bootcode -p /boot/pmbr adaN`.

Yes, this is what I've used in the past. So, I would not need to examine
the disks for the same MBR, and could just use the gpart bootcode above
on the new disk?

Thanks to all for the pointers!


Dag-Erling Smørgrav

Aug 22, 2025, 11:22:13 AM
to Robert, ques...@freebsd.org
Robert <rob...@webtent.org> writes:
> So, I would not need to examine the disks for the same MBR and just
> use the gpart bootcode above on the new disk?

Yes. And if you haven't done so in a while (i.e. not since your last
upgrade), I would run `gpart bootcode` on all disks. You can combine -b
and -p in a single command, e.g.

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
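
or, as a sketch for all four disks at once (sh syntax, device names assumed):

for d in ada0 ada1 ada2 ada3; do
        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d
done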

Robert

Aug 22, 2025, 12:54:00 PM
to ques...@freebsd.org

Perhaps. As the host name nomenclature would suggest, it is our db server; it serves lightweight bind9 DNS and LDAP for mail routing to our MX servers. The PostgreSQL instance houses the spam-filtering data for the same MX servers, which run Amavisd with SpamAssassin, plus some other small utility databases. That's it in a nutshell.

I have two of these for redundancy: primary/secondary DNS, openldap and PostgreSQL replication. As we know, these servers are set up with root on ZFS and gmirror swap. I've done zfs send/recv with data snapshots, but never for this root-on-ZFS setup. I'd also like to put a plan together for cloning to a new server. Could I get some pointers to good posts or docs on doing this, and possibly start another thread to discuss questions? Perhaps this involves dump and beadm.

--
Robert

David Christensen

Aug 22, 2025, 4:12:41 PM
to ques...@freebsd.org
On 8/22/25 08:21, Dag-Erling Smørgrav wrote:
> Robert <rob...@webtent.org> writes:
>> So, I would not need to examine the disks for the same MBR and just
>> use the gpart bootcode above on the new disk?
>
> Yes. And if you haven't done so in a while (i.e. not since your last
> upgrade), I would run `gpart bootcode` on all disks. You can combine -b
> and -p in a single command, e.g.
>
> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
>
> DES


Thank you for clarifying the straight-forward method for installing the
MBR bootstrap code and the FreeBSD boot partition bootloader code.


In the general case, the FreeBSD installer or sysadmin needs to figure
out which bootcode to use, which partcode to use, and which geom to use.
I have RTFM gpart(8) several times now, but it is taking me a while to
get a mental picture of the FreeBSD bootstrap decision tree formed by
firmware (BIOS vs. UEFI) and sysadmin choices (partition/slice scheme,
volume management, filesystem, etc.). Is there a single document that
clearly explains all of this stuff?


David


David Christensen

Aug 22, 2025, 9:14:14 PM
to ques...@freebsd.org
Some thoughts:

1. Put your OS on a disk-on-module, single SSD, or gmirror/ZFS mirror
of two such devices. Keep the OS instance small and self-contained. In
addition to backups, take images regularly.

2. Put your data on a ZFS pool with suitable redundancy and performance.

3. ZFS mirrors and stripes-of-mirrors perform better and resilver
faster than ZFS RAID-Z or stripes-of-RAID-Z's, but they are not as space
efficient. Mirrors should not be a problem for a 300 GB pool.

4. AIUI some databases can manage disks directly. I suggest
researching databases on raw disks vs. databases on ZFS volumes vs.
databases on ZFS file systems. If the first is compelling, include
dedicated SSD's for the database.

5. Root-on-ZFS offers many features (including replication), most
notably boot environments. I started with this, but now use UFS.

6. It is easier to manage multiple computers with similar hardware than
to manage multiple computers with differing hardware. Your organization
might benefit from standardizing on one or a few server platforms. You
would then pick a suitable platform, and add disks and options as needed
for the next pair of database servers.

7. Virtualization and containers are possibilities, especially if you
want isolation for security and/or damage control.


David


Matthias Fechner

Aug 23, 2025, 12:22:04 AM
to Robert, ques...@freebsd.org
If you have never done such a restore, I suggest you set up a virtual
machine in a similar way to your setup (I think a default install plus
two additional discs to build an additional zpool is enough).
You can use much smaller discs; it is just to simulate the commands.

If the setup is done, make a snapshot.

Then just remove/detach a disc and see what happens.
Add a new disc and try to replace it in the zpool.

This will give you some practice.
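
Even without a VM you can practice on file-backed md(4) devices -- a
rough, untested sketch:

truncate -s 1g /tmp/d0 /tmp/d1 /tmp/d2
mdconfig -a -t vnode -f /tmp/d0    # becomes md0
mdconfig -a -t vnode -f /tmp/d1    # becomes md1
zpool create testpool mirror md0 md1
zpool offline testpool md1         # simulate the failed disc
mdconfig -a -t vnode -f /tmp/d2    # becomes md2
zpool replace testpool md1 md2     # practice the replacement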

You can ask and read as much as you want; you will only get your answers
if you try it, and a virtual machine is perfect for testing this.

Matthias



Frank Leonhardt

Aug 29, 2025, 8:23:57 AM
to ques...@freebsd.org
On 21/08/2025 01:55, Robert wrote:
> I have my first zpool degraded on a FreeBSD 13.5 server and looking
> for advice on the steps I'll be taking to successfully replace the
> REMOVED drive in a 4 disk 2 mirror zpool. It is scrubbed monthly with
> last scrub August 3rd...
>
Funnily enough I wrote about this exact thing in Feb:

https://blog.frankleonhardt.com/2025/freebsd-zfs-raidz-failed-disk-replacement/

Okay, your configuration is mirrors, but I think most of the same
applies. Note what I say about making the replacement disk bootable,
which doesn't seem to have been mentioned in this thread.

Also, I posted a script on this list a few months later that installed
ZFS manually, among other stuff. Guess what got me interested :-)

Since then I've had a spectacular failure of a ZFS mirror that had been
happy for years. The problem with mirrored drives is that when you
replace one it thrashes the other while it's resilvering, so if the old
one is becoming dodgy it could push it over the edge. I hope you have
more luck than I did.



