
ZFS trashed by bad import


James Phillips via freebsd-fs

Feb 17, 2018, 2:53:53 AM
I was considering posting this on the forum, but the rules on topic selection suggested that really specific issues belong on the mailing list.

Short version (reconstructed from notes):

On a fresh 11.1 install:

# zpool import -> shows list of available pools, including degraded striped mirror.
# zpool import -f 8255478166520290766 granny
# zpool status (any zpool command causes the same error):
internal error: failed to initialize ZFS library
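
Whatever the cause turns out to be, a more cautious import might look like the sketch below. It is not what was actually run: the alternate root /mnt/oldpool and the run/cautious_import helpers are hypothetical names for illustration, and only the pool id comes from the commands above. The idea is to import read-only, mount nothing automatically, and keep the pool's mountpoints under a separate directory instead of the live system's paths.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

POOL_ID=8255478166520290766   # numeric pool id taken from the post
ALTROOT=/mnt/oldpool          # hypothetical alternate root, not from the post

cautious_import() {
    run mkdir -p "$ALTROOT"
    # -f: force, as in the original command
    # -N: import without mounting any datasets
    # -o readonly=on: do not write to the pool
    # -R: place all mountpoints under the alternate root
    run zpool import -f -N -o readonly=on -R "$ALTROOT" "$POOL_ID"
    run zpool status
}

cautious_import
```

Run as-is, this only prints the three commands; setting DRY_RUN=0 would execute them. Individual datasets could then be mounted by hand once the pool looks sane.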

Upon reboot, I was not able to switch VT consoles or log in.

I tried telling the BIOS to boot from my old installation (granny), and it failed after kernel device detection.

*Background*:

Granny was originally a 160GB ZFS mirror under FreeBSD 10. I later expanded the pool with a mirrored pair of 80GB drives.

I had successfully tested booting with a simulated controller failure. (Each mirror was on a different disk controller, and all drives had a boot partition set.)

About a week ago, one of my drives appeared to fail:

(ada1:ata2:0:1:0): Error 5. retries exhausted
GEOM_ELI: g_eli_read_done() failed (error=5) label/granny3p1.eli[READ(offset=32083968, lengt...
swap_pager: I/O error - pagein failed; blkno 2367129, size 4096, error 5
vm_fault: pager read error, pid 91969 (xfdesktop)
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 42 9b 00 00 00 00 00 08
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
...

I was able to (temporarily) use my computer again by pulling one of the IDE cables. (By luck I guessed the right side the first time; I did not notice the label in the log above until I typed this.)
I was a little surprised that it was not the drive that had been re-certified by the manufacturer's software after throwing errors years ago.

I decided to resolve the problem by moving to a ZFS mirror on a pair of 2TB drives. Incidentally, I had accidentally deleted pkg while trying to update the ports collection, so I decided a fresh FreeBSD 11 install might be a good idea as well.

*Confounding variables*:

While pulling the defective half of the mirror, I tentatively attributed the failure to heat death caused by dust build-up on the air intake. However, I also noticed that the Northbridge heatsink was loose due to a broken clip.

Because my "real" machine (with ECC RAM even) is going to be delayed at least a week, I decided to do a temporary board swap with an older machine I had lying around. This machine was over-clocked by under-volting and pushing the thermal limits of the CPU (while under-clocking the RAM), then backing off a bit to tolerate summer heat.

I mention the over-clocking because the system failed to boot properly after installation. I bumped the voltage a little, but the failure may instead have been caused by the BIOS booting from an unexpected drive (the 2TB disks were seen as ad2 and ad3). The over-clock was stable when that machine went into storage around a year ago. However, it is now in a case with a different PSU (same wattage, more efficient) and more drives.

I tried all the ZFS options in the FreeBSD 11 install wizard:
2 disk mirror
4k sectors - GPT partition
Encrypted disks - 50GB swap (large for the memory: 3200MB)
Mirror swap - Encrypt swap
-> Note: granny only had encrypted (non-mirrored) swap; I could not get encrypted striping to work.

System hardening:
- clean /tmp on startup
- disable opening syslogd network socket

At the time of the failure, I was running mprime (prime95) in the background and periodically monitoring CPU temperature and fan speed. This implies that ZFS had only ~1600MB to work with (3200MB total minus ~1600MB used by mprime).

*Next Steps*:

1. Image all 4 drives (one at a time) onto a third 2TB drive using the System Rescue CD and dd-rescue.
2. Try to import the degraded mirror with a BSD live DVD (and re-export if successful, I guess)
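
For step 1, a rough sketch of the imaging pass, assuming "dd-rescue" here means GNU ddrescue as shipped on the System Rescue CD. The device name /dev/sda, the image name granny1, the destination /mnt/images, and the run/image_disk helpers are all hypothetical. The four drives total about 480GB (2 x 160GB + 2 x 80GB), so the images should fit on one 2TB disk.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

DEST=/mnt/images   # hypothetical mount point of the third 2TB drive

# Image one source disk into $DEST, keeping a mapfile so the copy can resume.
image_disk() {
    src="$1"
    name="$2"
    # Pass 1: copy everything that reads cleanly, skip the scraping phase (-n).
    run ddrescue -n "$src" "$DEST/$name.img" "$DEST/$name.map"
    # Pass 2: retry the bad areas up to 3 times (-r3) using direct disc access (-d).
    run ddrescue -d -r3 "$src" "$DEST/$name.img" "$DEST/$name.map"
}

# One drive at a time, as in step 1.
image_disk /dev/sda granny1
```

The mapfile is what makes the second pass cheap: it records which sectors were already recovered, so only the unread areas are retried.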

Depending on results of step 2:
- find machine with ECC RAM, put granny3 on a fresh drive, and tell ZFS to scrub?
- copy over boot partitions that may have been clobbered by BSD 11 install?
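
If step 2 succeeds, the two follow-up options above might be sketched roughly as below. Everything specific is an assumption for illustration: the device names (label/granny3p2, ada4p2, ada0), the partition index 1 for the freebsd-boot partition, and the run/replace_disk/scrub_pool/restore_bootcode helpers. The pool name granny is taken from the import command earlier in the post.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

POOL=granny
OLD_DEV=label/granny3p2   # hypothetical name of the failed mirror member
NEW_DEV=ada4p2            # hypothetical partition on the fresh drive

# Swap the failed member for the new device; ZFS then resilvers the mirror.
replace_disk() {
    run zpool replace "$POOL" "$OLD_DEV" "$NEW_DEV"
}

# Verify every block against its checksum. Start this only after the
# resilver has finished (check with "zpool status").
scrub_pool() {
    run zpool scrub "$POOL"
    run zpool status -v "$POOL"
}

# Rewrite the protective MBR and the ZFS boot code on one disk.
# "-i 1" assumes the freebsd-boot partition is index 1 on that disk.
restore_bootcode() {
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$1"
}

replace_disk
scrub_pool
restore_bootcode ada0
```

In dry-run mode the three steps are simply printed in order; on a real system the scrub would have to wait for the resilver to complete.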

If all else fails, I did do a full export in the last 90 days.

Regards,

James Phillips

Note: not subscribed to the list.
_______________________________________________
freeb...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"