
ZFS pool permanent error question -- errors: Permanent errors have been detected in the following files: storage: <0x0>


Anders Jensen-Waud

Jun 15, 2014, 1:04:16 AM

Hi all,

My main zfs storage pool (named ``storage'') has recently started
displaying a very odd error:

root@beastie> zpool status -v

  pool: backup
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          da1       ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sun Jun 15 14:18:45 2014
        34.3G scanned out of 839G at 19.3M/s, 11h50m to go
        72K repaired, 4.08% done
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          da0       ONLINE       0     0     0  (repairing)

errors: Permanent errors have been detected in the following files:

        storage:<0x0>

My dmesg:

Copyright (c) 1992-2014 The FreeBSD Project.

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

The Regents of the University of California. All rights reserved.

FreeBSD is a registered trademark of The FreeBSD Foundation.

FreeBSD 10.0-RELEASE-p1 #0: Tue Apr 8 06:45:06 UTC 2014

ro...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610

CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz (1995.04-MHz K8-class
CPU)

Origin = "GenuineIntel" Id = 0x6fa Family = 0x6 Model = 0xf Stepping
= 10


Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>

AMD Features=0x20100800<SYSCALL,NX,LM>

AMD Features2=0x1<LAHF>

TSC: P-state invariant, performance statistics

real memory = 3221225472 (3072 MB)

avail memory = 3074908160 (2932 MB)

Event timer "LAPIC" quality 400

ACPI APIC Table: <LENOVO TP-7L >

FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs

FreeBSD/SMP: 1 package(s) x 2 core(s)

cpu0 (BSP): APIC ID: 0

cpu1 (AP): APIC ID: 1

ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 0/32
(20130823/tbfadt-601)

ACPI BIOS Warning (bug): Optional FADT field Gpe1Block has zero address or
length: 0x000000000000102C/0x0 (20130823/tbfadt-630)

ioapic0: Changing APIC ID to 1

ioapic0 <Version 2.0> irqs 0-23 on motherboard

kbd1 at kbdmux0

random: <Software, Yarrow> initialized

acpi0: <LENOVO TP-7L> on motherboard

CPU0: local APIC error 0x40

acpi_ec0: <Embedded Controller: GPE 0x12, ECDT> port 0x62,0x66 on acpi0

acpi0: Power Button (fixed)

acpi0: reservation of 0, a0000 (3) failed

acpi0: reservation of 100000, bef00000 (3) failed

cpu0: <ACPI CPU> on acpi0

cpu1: <ACPI CPU> on acpi0

attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0

Timecounter "i8254" frequency 1193182 Hz quality 0

Event timer "i8254" frequency 1193182 Hz quality 100

hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0

Timecounter "HPET" frequency 14318180 Hz quality 950

Event timer "HPET" frequency 14318180 Hz quality 450

Event timer "HPET1" frequency 14318180 Hz quality 440

Event timer "HPET2" frequency 14318180 Hz quality 440

atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0

Event timer "RTC" frequency 32768 Hz quality 0

Timecounter "ACPI-fast" frequency 3579545 Hz quality 900

acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0

acpi_lid0: <Control Method Lid Switch> on acpi0

acpi_button0: <Sleep Button> on acpi0

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0

pci0: <ACPI PCI bus> on pcib0

pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0

pci1: <ACPI PCI bus> on pcib1

vgapci0: <VGA-compatible display> port 0x2000-0x207f mem
0xd6000000-0xd6ffffff,0xe0000000-0xefffffff,0xd4000000-0xd5ffffff irq 16 at
device 0.0 on pci1

vgapci0: Boot video device

em0: <Intel(R) PRO/1000 Network Connection 7.3.8> port 0x1840-0x185f mem
0xfe200000-0xfe21ffff,0xfe225000-0xfe225fff irq 20 at device 25.0 on pci0

em0: Using an MSI interrupt

em0: Ethernet address: 00:15:58:c6:c3:3f

uhci0: <Intel 82801H (ICH8) USB controller USB-D> port 0x1860-0x187f irq 20
at device 26.0 on pci0

usbus0 on uhci0

uhci1: <Intel 82801H (ICH8) USB controller USB-E> port 0x1880-0x189f irq 21
at device 26.1 on pci0

usbus1 on uhci1

ehci0: <Intel 82801H (ICH8) USB 2.0 controller USB2-B> mem
0xfe226c00-0xfe226fff irq 22 at device 26.7 on pci0

usbus2: EHCI version 1.0

usbus2 on ehci0

hdac0: <Intel 82801H HDA Controller> mem 0xfe220000-0xfe223fff irq 17 at
device 27.0 on pci0

pcib2: <ACPI PCI-PCI bridge> irq 20 at device 28.0 on pci0

pci2: <ACPI PCI bus> on pcib2

pcib3: <ACPI PCI-PCI bridge> irq 21 at device 28.1 on pci0

pci3: <ACPI PCI bus> on pcib3

iwn0: <Intel Wireless WiFi Link 4965> mem 0xdf2fe000-0xdf2fffff irq 17 at
device 0.0 on pci3

pcib4: <ACPI PCI-PCI bridge> irq 22 at device 28.2 on pci0

pci4: <ACPI PCI bus> on pcib4

pcib5: <ACPI PCI-PCI bridge> irq 23 at device 28.3 on pci0

pci5: <ACPI PCI bus> on pcib5

pcib6: <ACPI PCI-PCI bridge> irq 20 at device 28.4 on pci0

pci13: <ACPI PCI bus> on pcib6

uhci2: <Intel 82801H (ICH8) USB controller USB-A> port 0x18a0-0x18bf irq 16
at device 29.0 on pci0

usbus3 on uhci2

uhci3: <Intel 82801H (ICH8) USB controller USB-B> port 0x18c0-0x18df irq 17
at device 29.1 on pci0

usbus4 on uhci3

uhci4: <Intel 82801H (ICH8) USB controller USB-C> port 0x18e0-0x18ff irq 18
at device 29.2 on pci0

usbus5 on uhci4

ehci1: <Intel 82801H (ICH8) USB 2.0 controller USB2-A> mem
0xfe227000-0xfe2273ff irq 19 at device 29.7 on pci0

usbus6: EHCI version 1.0

usbus6 on ehci1

pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0

pci21: <ACPI PCI bus> on pcib7

cbb0: <RF5C476 PCI-CardBus Bridge> mem 0xf8100000-0xf8100fff irq 16 at
device 0.0 on pci21

cardbus0: <CardBus bus> on cbb0

pccard0: <16-bit PCCard bus> on cbb0

pci21: <serial bus, FireWire> at device 0.1 (no driver attached)

sdhci_pci0: <RICOH R5C822 SD> mem 0xf8101800-0xf81018ff irq 18 at device
0.2 on pci21

sdhci_pci0: 1 slot(s) allocated

pci21: <base peripheral> at device 0.3 (no driver attached)

pci21: <base peripheral> at device 0.4 (no driver attached)

pci21: <base peripheral> at device 0.5 (no driver attached)

isab0: <PCI-ISA bridge> at device 31.0 on pci0

isa0: <ISA bus> on isab0

atapci0: <Intel ICH8M UDMA100 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1830-0x183f at device 31.1 on pci0

ata0: <ATA channel> at channel 0 on atapci0

ahci0: <Intel ICH8M AHCI SATA controller> port
0x1c48-0x1c4f,0x1c1c-0x1c1f,0x1c40-0x1c47,0x1c18-0x1c1b,0x1c20-0x1c3f mem
0xfe226000-0xfe2267ff irq 16 at device 31.2 on pci0

ahci0: AHCI v1.10 with 3 1.5Gbps ports, Port Multiplier not supported

ahcich0: <AHCI channel> at channel 0 on ahci0

ahcich2: <AHCI channel> at channel 2 on ahci0

pci0: <serial bus, SMBus> at device 31.3 (no driver attached)

acpi_tz0: <Thermal Zone> on acpi0

acpi_tz1: <Thermal Zone> on acpi0

atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0

atkbd0: <AT Keyboard> irq 1 on atkbdc0

kbd0 at atkbd0

atkbd0: [GIANT-LOCKED]

psm0: <PS/2 Mouse> irq 12 on atkbdc0

psm0: [GIANT-LOCKED]

psm0: model Generic PS/2 mouse, device ID 0

battery0: <ACPI Control Method Battery> on acpi0

acpi_acad0: <AC Adapter> on acpi0

orm0: <ISA Option ROMs> at iomem
0xc0000-0xcefff,0xcf000-0xcffff,0xd0000-0xd0fff,0xe0000-0xeffff on isa0

sc0: <System console> at flags 0x100 on isa0

sc0: VGA <16 virtual consoles, flags=0x300>

vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0

ppc0: cannot reserve I/O port range

est0: <Enhanced SpeedStep Frequency Control> on cpu0

p4tcc0: <CPU Frequency Thermal Control> on cpu0

est1: <Enhanced SpeedStep Frequency Control> on cpu1

p4tcc1: <CPU Frequency Thermal Control> on cpu1

Timecounters tick every 1.000 msec

hdacc0: <Analog Devices AD1984 HDA CODEC> at cad 0 on hdac0

hdaa0: <Analog Devices AD1984 Audio Function Group> at nid 1 on hdacc0

pcm0: <Analog Devices AD1984 (Analog 2.0+HP/2.0)> at nid 18,17 and 28,20,21
on hdaa0

pcm1: <Analog Devices AD1984 (Ext-Rear Digital)> at nid 27 on hdaa0

hdacc1: <Conexant (0x2bfa) HDA CODEC> at cad 1 on hdac0

unknown: <Conexant (0x2bfa) HDA CODEC Modem Function Group> at nid 2 on
hdacc1 (no driver attached)

random: unblocking device.

usbus0: 12Mbps Full Speed USB v1.0

usbus1: 12Mbps Full Speed USB v1.0

usbus2: 480Mbps High Speed USB v2.0

usbus3: 12Mbps Full Speed USB v1.0

usbus4: 12Mbps Full Speed USB v1.0

usbus5: 12Mbps Full Speed USB v1.0

usbus6: 480Mbps High Speed USB v2.0

ugen0.1: <Intel> at usbus0

uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0

ugen2.1: <Intel> at usbus2

uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2

ugen1.1: <Intel> at usbus1

uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1

ugen5.1: <Intel> at usbus5

uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5

ugen4.1: <Intel> at usbus4

uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4

ugen3.1: <Intel> at usbus3

uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3

ugen6.1: <Intel> at usbus6

uhub6: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus6

ada0 at ahcich0 bus 0 scbus1 target 0 lun 0

ada0: <Hitachi HTS722020K9SA00 DC4OC76A> ATA-8 SATA 1.x device

ada0: Serial Number 071201DP0410DTG7P9AP

ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)

ada0: Command Queueing enabled

ada0: 190782MB (390721968 512 byte sectors: 16H 63S/T 16383C)

ada0: Previously was known as ad4

cd0 at ata0 bus 0 scbus0 target 0 lun 0

cd0: <MATSHITA DVD-RAM UJ-852 RB01> Removable CD-ROM SCSI-0 device

cd0: Serial Number HC43 045371

cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)

cd0: Attempt to query device size failed: NOT READY, Medium not present

Netvsc initializing... SMP: AP CPU #1 Launched!

Root mount waiting for: usbus6 usbus5 usbus4 usbus3 usbus2 usbus1 usbus0

uhub2: 2 ports with 2 removable, self powered

uhub3: 2 ports with 2 removable, self powered

uhub0: 2 ports with 2 removable, self powered

uhub5: 2 ports with 2 removable, self powered

uhub4: 2 ports with 2 removable, self powered

Root mount waiting for: usbus6 usbus2

uhub1: 4 ports with 4 removable, self powered

Root mount waiting for: usbus6 usbus2

uhub6: 6 ports with 6 removable, self powered


Root mount waiting for: usbus6

ugen0.2: <STMicroelectronics> at usbus0

Root mount waiting for: usbus6

ugen6.2: <Seagate> at usbus6

umass0: <Seagate Expansion, class 0/0, rev 2.10/1.00, addr 2> on usbus6

umass0: SCSI over Bulk-Only; quirks = 0x0100

umass0:3:0:-1: Attached to scbus3

da0 at umass-sim0 bus 0 scbus3 target 0 lun 0

da0: <Seagate Expansion 060E> Fixed Direct Access SCSI-6 device

da0: Serial Number NA46H44R

da0: 40.000MB/s transfers

da0: 953869MB (1953525167 512 byte sectors: 255H 63S/T 121601C)

da0: quirks=0x2<NO_6_BYTE>

GEOM: da0: the primary GPT table is corrupt or invalid.

GEOM: da0: using the secondary instead -- recovery strongly advised.

GEOM: diskid/DISK-NA46H44R: the primary GPT table is corrupt or invalid.

GEOM: diskid/DISK-NA46H44R: using the secondary instead -- recovery
strongly advised.

Root mount waiting for: usbus6

ugen6.3: <Seagate> at usbus6

umass1: <Interface0> on usbus6

umass1: SCSI over Bulk-Only; quirks = 0x0100

umass1:4:1:-1: Attached to scbus4

Trying to mount root from ufs:/dev/ada0p2 [rw]...

da1 at umass-sim1 bus 1 scbus4 target 0 lun 0

da1: <Seagate FreeAgent Go 102D> Fixed Direct Access SCSI-4 device

da1: Serial Number 2GE1GTVM

da1: 40.000MB/s transfers

da1: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)

da1: quirks=0x2<NO_6_BYTE>

GEOM: da1: the primary GPT table is corrupt or invalid.

GEOM: da1: using the secondary instead -- recovery strongly advised.

GEOM: diskid/DISK-2GE1GTVM: the primary GPT table is corrupt or invalid.

GEOM: diskid/DISK-2GE1GTVM: using the secondary instead -- recovery
strongly advised.

ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.

ZFS filesystem version: 5

ZFS storage pool version: features support (5000)


Cheers

Anders
_______________________________________________
freeb...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

kpn...@pobox.com

Jun 15, 2014, 5:10:52 PM

I'm not sure what causes ZFS to lose the filename like this. I'll let
someone else comment. I want to say you have a corrupt file in a snapshot,
but don't hold me to that.

It looks like you are running ZFS with pools consisting of a single disk.
In cases like this, if ZFS detects that a file has been corrupted, it has
no redundant copy to repair it from. Run with the option "copies=2" to have
two copies of every file if you want ZFS to be able to fix broken files.
Of course, this doubles the amount of space you will use, so you have to
think about how important your data is to you.
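As a sketch of that suggestion (pool name taken from the thread; whether you set the property pool-wide or per dataset is your call):

```shell
# Enable two copies of every block on the pool's root dataset.
# Note: this only affects data written afterwards; existing files
# keep a single copy until they are rewritten.
zfs set copies=2 storage

# Confirm the property took effect (child datasets inherit it):
zfs get copies storage
```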

I don't know what caused the corrupt file. It could be random chance, or
it could be that you accidentally did something to damage the pool. I say
that because:

> da1 at umass-sim1 bus 1 scbus4 target 0 lun 0
> da1: <Seagate FreeAgent Go 102D> Fixed Direct Access SCSI-4 device
> da1: Serial Number 2GE1GTVM
> da1: 40.000MB/s transfers
> da1: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
> da1: quirks=0x2<NO_6_BYTE>
> GEOM: da1: the primary GPT table is corrupt or invalid.
> GEOM: da1: using the secondary instead -- recovery strongly advised.
> GEOM: diskid/DISK-2GE1GTVM: the primary GPT table is corrupt or invalid.
> GEOM: diskid/DISK-2GE1GTVM: using the secondary instead -- recovery
> strongly advised.

You've got something going on here. Did you GPT partition the disk? The
zpool status you posted says you built your pools on the entire disk and
not inside a partition. But GEOM is saying the disk has been partitioned.
GPT stores data at both the beginning and end of the disk. ZFS may have
trashed the beginning of the disk but not gotten to the end yet.

Running ZFS in a partition or on the entire disk is fine either way. But
you have to be consistent. Partitioning a disk and then writing outside
of the partition creates errors like the above GEOM one.
--
Kevin P. Neal http://www.pobox.com/~kpn/
"Not even the dumbest terrorist would choose an encryption program that
allowed the U.S. government to hold the key." -- (Fortune magazine
is smarter than the US government, Oct 29 2001, page 196.)

Andrew Berg

Jun 15, 2014, 6:24:17 PM

On 2014.06.15 16:10, kpn...@pobox.com wrote:
> It looks like you are running ZFS with pools consisting of a single disk.
> In cases like this if ZFS detects that a file has been corrupted ZFS is
> unable to do anything to fix it. Run with the option "copies=2" to have
> two copies of every file if you want ZFS to be able to fix broken files.
> Of course, this doubles the amount of space you will use, so you have to
> think about how important your data is to you.
A proper mirror with another disk would protect against disk failure and
give better performance with the same space cost, so doing that is
recommended over using copies=2.
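For what it's worth, converting an existing single-disk pool into a mirror does not require recreating it; roughly (da2 is a hypothetical second disk of at least the same size):

```shell
# Attach a second disk to the existing vdev, turning it into a mirror:
zpool attach storage da0 da2

# ZFS resilvers the new disk in the background; watch progress with:
zpool status storage
```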

> Running ZFS in a partition or on the entire disk is fine either way. But
> you have to be consistent. Partitioning a disk and then writing outside
> of the partition creates errors like the above GEOM one.
I recommend using a partition solely to take advantage of GPT labels.
Identifying disks is much easier when you create a pool using devices
from labels (/dev/gpt/yourlabel). Even more so if you have a matching
physical label on the disk.
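A rough sketch of that label-based setup (disk and label names are placeholders; this assumes an empty disk):

```shell
gpart create -s gpt da1                    # fresh GPT on the disk
gpart add -t freebsd-zfs -l mydata da1     # one partition with a GPT label
zpool create backup /dev/gpt/mydata        # pool built on the label, not daN
```

The pool then survives device renumbering (da1 becoming da2 after a USB re-plug, for instance), since /dev/gpt/mydata follows the disk.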

Anders Jensen-Waud

Jun 15, 2014, 10:49:42 PM

> It looks like you are running ZFS with pools consisting of a single disk.
> In cases like this if ZFS detects that a file has been corrupted ZFS is
> unable to do anything to fix it. Run with the option "copies=2" to have
> two copies of every file if you want ZFS to be able to fix broken files.
> Of course, this doubles the amount of space you will use, so you have to
> think about how important your data is to you.

Thank you for the tip. I didn't know about copies=2, so I will
definitely consider that option.

I am running ZFS on a single disk -- a 1 TB USB drive -- attached to my
"server" at home. It is not exactly an enterprise server, but it suits my
home purposes well, namely backing up files from my different computers.
On a nightly basis I then copy and compress the datasets from storage to
another USB drive to have a second copy. In this instance, the nightly
backup script (zfs send/recv based) hadn't run properly, so I had no
backup to recover from.

Given that my machine only has 3 GB RAM, I was wondering if the issue
might be memory related, and if I am better off converting the volume
back to UFS. I am keen to stay on ZFS to benefit from snapshots,
compression, security, etc. Any thoughts?
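Not the poster's actual script, but a nightly zfs send/recv job along those lines might look roughly like this (snapshot naming and dataset layout are assumptions):

```shell
#!/bin/sh
# Sketch of an incremental nightly backup from "storage" to "backup".
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -v-1d +%Y%m%d)   # BSD date syntax for "one day ago"

zfs snapshot -r "storage@${TODAY}"

if zfs list -t snapshot "storage@${YESTERDAY}" >/dev/null 2>&1; then
    # Incremental stream since yesterday's snapshot:
    zfs send -R -i "storage@${YESTERDAY}" "storage@${TODAY}" | zfs receive -Fdu backup
else
    # No baseline snapshot yet: send everything.
    zfs send -R "storage@${TODAY}" | zfs receive -Fdu backup
fi
```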


>
> I don't know what caused the corrupt file. It could be random chance, or
> it could be that you accidentally did something to damage the pool. I say
> that because:
>
> > da1 at umass-sim1 bus 1 scbus4 target 0 lun 0
> > da1: <Seagate FreeAgent Go 102D> Fixed Direct Access SCSI-4 device
> > da1: Serial Number 2GE1GTVM
> > da1: 40.000MB/s transfers
> > da1: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
> > da1: quirks=0x2<NO_6_BYTE>
> > GEOM: da1: the primary GPT table is corrupt or invalid.
> > GEOM: da1: using the secondary instead -- recovery strongly advised.
> > GEOM: diskid/DISK-2GE1GTVM: the primary GPT table is corrupt or invalid.
> > GEOM: diskid/DISK-2GE1GTVM: using the secondary instead -- recovery
> > strongly advised.
>
> You've got something going on here. Did you GPT partition the disk? The
> zpool status you posted says you built your pools on the entire disk and
> not inside a partition. But GEOM is saying the disk has been partitioned.
> GPT stores data at both the beginning and end of the disk. ZFS may have
> trashed the beginning of the disk but not gotten to the end yet.

This disk is not the ``storage'' zpool -- it is my ``backup'' pool,
which is on a different drive:

NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup    464G   235G   229G  50%  1.00x  ONLINE  -
storage   928G   841G  87.1G  90%  1.00x  ONLINE  -

Running 'gpart recover /dev/da1' fixes the error above, but after a reboot
it reappears. Would it be better to completely wipe the disk and
reinitialise it with ZFS?

Miraculously, an overnight 'zpool scrub storage' has wiped out the errors
from yesterday, and I am puzzled why that is the case. As per the
original zpool status from yesterday, ZFS warned that I needed to
recover all the files from backup:

aj@beastie> zpool status

  pool: backup
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          da1       ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 984K in 11h37m with 0 errors on Mon Jun 16 01:55:48 2014
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          da0       ONLINE       0     0     0

errors: No known data errors

> Running ZFS in a partition or on the entire disk is fine either way. But
> you have to be consistent. Partitioning a disk and then writing outside
> of the partition creates errors like the above GEOM one.

Agree. In this instance it wasn't da0/storage, however.

> --
> Kevin P. Neal http://www.pobox.com/~kpn/
> "Not even the dumbest terrorist would choose an encryption program that
> allowed the U.S. government to hold the key." -- (Fortune magazine
> is smarter than the US government, Oct 29 2001, page 196.)

--
Anders Jensen-Waud
E: and...@jensenwaud.com

Fabian Keil

Jun 16, 2014, 3:39:28 AM

Anders Jensen-Waud <and...@jensenwaud.com> wrote:

> On Sun, Jun 15, 2014 at 05:10:52PM -0400, kpn...@pobox.com wrote:
> > On Sun, Jun 15, 2014 at 03:04:16PM +1000, Anders Jensen-Waud wrote:
> > > Hi all,
> > >
> > > My main zfs storage pool (named ``storage'') has recently started
> > > displaying a very odd error:
[...]
> > > errors: Permanent errors have been detected in the following files:
> > > storage:<0x0>
> >
> > I'm not sure what causes ZFS to lose the filename like this. I'll let
> > someone else comment. I want to say you have a corrupt file in a
> > snapshot, but don't hold me to that.
> >
> > It looks like you are running ZFS with pools consisting of a single
> > disk. In cases like this if ZFS detects that a file has been corrupted
> > ZFS is unable to do anything to fix it. Run with the option "copies=2"
> > to have two copies of every file if you want ZFS to be able to fix
> > broken files. Of course, this doubles the amount of space you will
> > use, so you have to think about how important your data is to you.
>
> Thank you for the tip. I didn't know about copies=2, so I will
> definitely consider that option.
>
> I am running ZFS on a single disk -- a 1 TB USB drive -- attached to my
> "server" at home. It is not exactly an enterprise server, but it fits
> well for my home purposes, namely file backup from my different
> computers. On a nightly basis I then copy and compress the data sets
> from storage to another USB drive to have a second copy. In this
> instance, the nightly backup script (zfs send/recv based) hadn't run
> properly so I had no backup to recover from.
>
> Given that my machine only has 3 GB RAM, I was wondering if the issue
> might be memory related and if I am better off converting the volume
> back to UFS. I am keen to stay on ZFS to benefit from snapshots,
> compression, security etc. Any thoughts?

I doubt that the issue is memory related. BTW, I use single-disk
pools for backups as well and one of my systems only has 2 GB RAM.

My impression is that ZFS's "permanent error" detection is flawed
and may also count (some) temporary errors as permanent.

If the "permanent errors" don't survive scrubbing, I wouldn't
worry about them, especially if no corrupt files are mentioned.
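In practice that check might look like this (pool name from the thread; the command order is my assumption):

```shell
zpool scrub storage        # re-read and verify every block in the pool
zpool status -v storage    # after the scrub completes, look for listed files
zpool clear storage        # once clean, drop the stale error counters
```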

> > You've got something going on here. Did you GPT partition the disk? The
> > zpool status you posted says you built your pools on the entire disk
> > and not inside a partition. But GEOM is saying the disk has been
> > partitioned. GPT stores data at both the beginning and end of the
> > disk. ZFS may have trashed the beginning of the disk but not gotten to
> > the end yet.
>
> This disk is not the ``storage'' zpool -- it is my ``backup'' pool,
> which is on a different drive:
>
> NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
> backup    464G   235G   229G  50%  1.00x  ONLINE  -
> storage   928G   841G  87.1G  90%  1.00x  ONLINE  -
>
> Running 'gpt recover /dev/da1' fixes the error above but after a reboot
> it reappears. Would it be better to completely wipe the disk and
> reinitialise it with zfs?

As you mentioned being keen on security above, I think it would
make sense to wipe the disk to add geli encryption to the mix [0],
but I doubt that the gpt complaints are related to the "problem".

Fabian

[0] I use zogftw for this: http://www.fabiankeil.de/gehacktes/zogftw/

Tom Evans

Jun 16, 2014, 4:40:17 AM

On Mon, Jun 16, 2014 at 3:49 AM, Anders Jensen-Waud
<and...@jensenwaud.com> wrote:
> Running 'gpt recover /dev/da1' fixes the error above but after a reboot
> it reappears. Would it be better to completely wipe the disk and
> reinitialise it with zfs?
>
You agree, but both of your pools are on the whole disk - da0 and da1
- and not on any partition on that disk. This means ZFS will
consistently trash the GPT tables/labels, because you have told it
that it can write there.

If you GPT partition your disk, your pool should consist of partitions
- da1p1, not da1. You do not need to GPT partition your disk unless
you want to.

Cheers

Tom

Warren Block

Jun 16, 2014, 10:11:42 AM

On Mon, 16 Jun 2014, Anders Jensen-Waud wrote:

> This disk is not the ``storage'' zpool -- it is my ``backup'' pool,
> which is on a different drive:
>
> NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
> backup    464G   235G   229G  50%  1.00x  ONLINE  -
> storage   928G   841G  87.1G  90%  1.00x  ONLINE  -

What does 'zpool status' say about the device names of that pool?

> Running 'gpt recover /dev/da1' fixes the error above but after a reboot
> it reappears. Would it be better to completely wipe the disk and
> reinitialise it with zfs?

Most likely the problem is that the disk was GPT partitioned, but when
the pool was created, ZFS was told to use the whole disk (ada0) rather
than just a partition (ada0p1). One of the partition tables was
overwritten by ZFS information. Possibly this space was mostly unused
by ZFS, because otherwise a 'gpart recover' would have damaged it. This
could also have happened if GPT partitioning was not cleared from the
disk before using it for ZFS. ZFS leaves some unused space at the end
of the disk, enough to not overwrite a backup GPT. That would be
detected by GEOM, and not match the primary, which was overwritten by
ZFS. The error would be spurious, but attempting a recovery could
overwrite actual ZFS data.

ZFS works fine on whole disks or in partitions. But yes, in this case,
I'd back up, destroy the pool, destroy partition information on the
drives, then recreate the pool.

A handy way to make sure a backup GPT table is not left on a disk is to
create and then destroy GPT partitioning:

gpart destroy -F adaN
gpart create -s gpt adaN
gpart destroy adaN