
Re: ZFS pool faulted (corrupt metadata) but the disk data appears


Michelle Sullivan

Feb 5, 2015, 5:32:27 PM
Michelle Sullivan wrote:
> Any clues on this?
>
> root@colossus:~ # zpool import
> pool: storage
> id: 10618504954404185222
> state: FAULTED
> status: The pool metadata is corrupted.
> action: The pool cannot be imported due to damaged devices or data.
> The pool may be active on another system, but can be imported using
> the '-f' flag.
> see: http://illumos.org/msg/ZFS-8000-72
> config:
>
> storage          FAULTED  corrupted data
>   raidz2-0       ONLINE
>     mfid0        ONLINE
>     mfid15       ONLINE
>     mfid1        ONLINE
>     mfid2        ONLINE
>     mfid3        ONLINE
>     mfid4        ONLINE
>     mfid5        ONLINE
>     replacing-7  ONLINE
>       mfid13     ONLINE
>       mfid14     ONLINE
>     mfid6        ONLINE
>     mfid7        ONLINE
>     mfid8        ONLINE
>     mfid9        ONLINE
>     mfid10       ONLINE
>     mfid11       ONLINE
>     mfid12       ONLINE
> root@colossus:~ # zpool import -Ff storage
> cannot import 'storage': I/O error
> Destroy and re-create the pool from
> a backup source.
> root@colossus:~ # zdb -l /dev/mfid0
> --------------------------------------------
> LABEL 0
> --------------------------------------------
> version: 5000
> name: 'storage'
> state: 0
> txg: 1928241
> pool_guid: 10618504954404185222
> hostid: 4203774842
> hostname: 'colossus'
> top_guid: 12489400212295803034
> guid: 3998695725653225547
> vdev_children: 1
> vdev_tree:
> type: 'raidz'
> id: 0
> guid: 12489400212295803034
> nparity: 2
> metaslab_array: 34
> metaslab_shift: 38
> ashift: 9
> asize: 45000449064960
> is_log: 0
> create_txg: 4
> children[0]:
> type: 'disk'
> id: 0
> guid: 3998695725653225547
> path: '/dev/mfid0'
> phys_path: '/dev/mfid0'
> whole_disk: 1
> DTL: 168
> create_txg: 4
> children[1]:
> type: 'disk'
> id: 1
> guid: 10795471632546545577
> path: '/dev/mfid1'
> phys_path: '/dev/mfid1'
> whole_disk: 1
> DTL: 167
> create_txg: 4
> children[2]:
> type: 'disk'
> id: 2
> guid: 15820272272734706674
> path: '/dev/mfid2'
> phys_path: '/dev/mfid2'
> whole_disk: 1
> DTL: 166
> create_txg: 4
> children[3]:
> type: 'disk'
> id: 3
> guid: 3928579496187019848
> path: '/dev/mfid3'
> phys_path: '/dev/mfid3'
> whole_disk: 1
> DTL: 165
> create_txg: 4
> children[4]:
> type: 'disk'
> id: 4
> guid: 7125052278051590304
> path: '/dev/mfid4'
> phys_path: '/dev/mfid4'
> whole_disk: 1
> DTL: 164
> create_txg: 4
> children[5]:
> type: 'disk'
> id: 5
> guid: 14370198745088794709
> path: '/dev/mfid5'
> phys_path: '/dev/mfid5'
> whole_disk: 1
> DTL: 163
> create_txg: 4
> children[6]:
> type: 'disk'
> id: 6
> guid: 1843597351388951655
> path: '/dev/mfid6'
> phys_path: '/dev/mfid6'
> whole_disk: 1
> DTL: 162
> create_txg: 4
> children[7]:
> type: 'replacing'
> id: 7
> guid: 2914889727426054645
> whole_disk: 0
> create_txg: 4
> children[0]:
> type: 'disk'
> id: 0
> guid: 10956220251832269421
> path: '/dev/mfid15'
> phys_path: '/dev/mfid15'
> whole_disk: 1
> DTL: 179
> create_txg: 4
> children[1]:
> type: 'disk'
> id: 1
> guid: 2463756237300743131
> path: '/dev/mfid13'
> phys_path: '/dev/mfid13'
> whole_disk: 1
> DTL: 181
> create_txg: 4
> resilvering: 1
> children[8]:
> type: 'disk'
> id: 8
> guid: 8864096842672670007
> path: '/dev/mfid7'
> phys_path: '/dev/mfid7'
> whole_disk: 1
> DTL: 160
> create_txg: 4
> children[9]:
> type: 'disk'
> id: 9
> guid: 4650681673751655245
> path: '/dev/mfid8'
> phys_path: '/dev/mfid8'
> whole_disk: 1
> DTL: 159
> create_txg: 4
> children[10]:
> type: 'disk'
> id: 10
> guid: 8432109430432996813
> path: '/dev/mfid9'
> phys_path: '/dev/mfid9'
> whole_disk: 1
> DTL: 158
> create_txg: 4
> children[11]:
> type: 'disk'
> id: 11
> guid: 414941847968750824
> path: '/dev/mfid10'
> phys_path: '/dev/mfid10'
> whole_disk: 1
> DTL: 157
> create_txg: 4
> children[12]:
> type: 'disk'
> id: 12
> guid: 7335375930620195352
> path: '/dev/mfid11'
> phys_path: '/dev/mfid11'
> whole_disk: 1
> DTL: 156
> create_txg: 4
> children[13]:
> type: 'disk'
> id: 13
> guid: 5100737174610362
> path: '/dev/mfid12'
> phys_path: '/dev/mfid12'
> whole_disk: 1
> DTL: 155
> create_txg: 4
> children[14]:
> type: 'disk'
> id: 14
> guid: 15695558693726858796
> path: '/dev/mfid14'
> phys_path: '/dev/mfid14'
> whole_disk: 1
> DTL: 174
> create_txg: 4
> features_for_read:
> --------------------------------------------
> [LABEL 1, LABEL 2 and LABEL 3 snipped - identical to LABEL 0]
> --------------------------------------------
>
> LSI 9260-16i RAID controller, all drives in single-disk RAID0. The
> system wasn't busy (busy meaning in use for writing; it was, however,
> resilvering a replacement drive after the hot spare had taken over due
> to a failed drive) and it got rebooted... it faulted on reboot.
>
> Recoverable?
>
> Regards,
>
>
FYI before the first reboot:

root@colossus:~ # zpool status -x
pool: storage
state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from
a backup source.
see: http://illumos.org/msg/ZFS-8000-72
scan: none requested
config:

NAME             STATE    READ WRITE CKSUM
storage          FAULTED     0     0     1
  raidz2-0       ONLINE      0     0     7
    mfid0        ONLINE      0     0     1
    mfid1        ONLINE      0     0     0
    mfid2        ONLINE      0     0     0
    mfid3        ONLINE      0     0     0
    mfid4        ONLINE      0     0     0
    mfid5        ONLINE      0     0     0
    mfid6        ONLINE      0     0     0
    replacing-7  ONLINE      0     0     0
      mfid14     ONLINE      0     0     0
      mfid15     ONLINE      0     0     0
    mfid7        ONLINE      0     0     0
    mfid8        ONLINE      0     0     0
    mfid9        ONLINE      0     0     0
    mfid10       ONLINE      0     0     0
    mfid11       ONLINE      0     0     0
    mfid12       ONLINE      0     0     1
    mfid13       ONLINE      0     0     0
root@colossus:~ # zpool clear -nF storage
internal error: out of memory
root@colossus:~ #

"zpool import -Fn storage" reports no errors.

I really don't care if a file or three is corrupted; I just want the
pool back, as my last backup is from Dec 3, 2014.

--
Michelle Sullivan
http://www.mhix.org/

_______________________________________________
freeb...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"
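
About the flags being tried above: -F asks zpool import to rewind to an
earlier transaction group if the most recent ones are damaged, and -n
turns that into a dry run. A minimal sketch of the non-destructive
sequence (pool name as in this thread; the read-only import is an extra
precaution, not something shown above):

# Dry run: report whether discarding the last few TXGs would make
# the pool importable, without modifying anything on disk.
zpool import -F -n storage

# If the dry run looks sane, attempt the rewind import read-only,
# so nothing is written back to the damaged pool.
zpool import -o readonly=on -F -f storage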

Xin Li

Feb 5, 2015, 8:17:07 PM

On 02/05/15 14:08, Michelle Sullivan wrote:
> Any clues on this?
>
> root@colossus:~ # zpool import pool: storage id:
> 10618504954404185222 state: FAULTED status: The pool metadata is
> corrupted. action: The pool cannot be imported due to damaged
> devices or data.

This is the standard FAULTED message.

> The pool may be active on another system, but can be imported
> using the '-f' flag.

This suggests the pool was connected to a different system, is that
the case?

> see: http://illumos.org/msg/ZFS-8000-72 config:
>
> storage FAULTED corrupted data raidz2-0 ONLINE
> mfid0 ONLINE mfid15 ONLINE mfid1 ONLINE mfid2
> ONLINE mfid3 ONLINE mfid4 ONLINE mfid5 ONLINE
> replacing-7 ONLINE mfid13 ONLINE mfid14 ONLINE mfid6
> ONLINE mfid7 ONLINE mfid8 ONLINE mfid9 ONLINE
> mfid10 ONLINE mfid11 ONLINE mfid12 ONLINE
> root@colossus:~ # zpool import -Ff storage cannot import 'storage':
> I/O error Destroy and re-create the pool from a backup source.

uname -a?

> Recoverable?

It's hard to tell right now, and we shall try all possible remedies
but be prepared for the worst.

Cheers,
--
Xin LI <del...@delphij.net> https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die

Michelle Sullivan

Feb 5, 2015, 8:36:54 PM
Xin Li wrote:
>
> On 02/05/15 14:08, Michelle Sullivan wrote:
>
>> Any clues on this?
>>
>> root@colossus:~ # zpool import pool: storage id:
>> 10618504954404185222 state: FAULTED status: The pool metadata is
>> corrupted. action: The pool cannot be imported due to damaged
>> devices or data.
>>
>
> This is standard FAULTED message.
>
>
>> The pool may be active on another system, but can be imported
>> using the '-f' flag.
>>
>
> This suggests the pool was connected to a different system, is that
> the case?
>

No.

>> see: http://illumos.org/msg/ZFS-8000-72 config:
>>
>> storage FAULTED corrupted data raidz2-0 ONLINE
>> mfid0 ONLINE mfid15 ONLINE mfid1 ONLINE mfid2
>> ONLINE mfid3 ONLINE mfid4 ONLINE mfid5 ONLINE
>> replacing-7 ONLINE mfid13 ONLINE mfid14 ONLINE mfid6
>> ONLINE mfid7 ONLINE mfid8 ONLINE mfid9 ONLINE
>> mfid10 ONLINE mfid11 ONLINE mfid12 ONLINE
>> root@colossus:~ # zpool import -Ff storage cannot import 'storage':
>> I/O error Destroy and re-create the pool from a backup source.
>>
>
> uname -a?
>
FreeBSD colossus 9.2-RELEASE-p15 FreeBSD 9.2-RELEASE-p15 #0: Mon Nov 3
20:31:29 UTC 2014
ro...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
>> Recoverable?
>>
>
> It's hard to tell right now, and we shall try all possible remedies
> but be prepared for the worst.
>

I am :(

--
Michelle Sullivan
http://www.mhix.org/

Xin Li

Feb 5, 2015, 8:43:53 PM

On 02/05/15 17:36, Michelle Sullivan wrote:
>> This suggests the pool was connected to a different system, is
>> that the case?
>>
>
> No.

Ok, that's good. Actually, if you have two heads writing to the same
pool at the same time, it can easily enter an unrecoverable state.

>> It's hard to tell right now, and we shall try all possible
>> remedies but be prepared for the worst.
>
> I am :(

The next thing I would try is to:

1. move /boot/zfs/zpool.cache to somewhere else;

2. zpool import -f -n -F -X storage and see if the system would give
you a proposal.

Cheers,
--
Xin LI <del...@delphij.net> https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
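
A minimal sketch of the two steps Xin Li suggests above, assuming the
paths from this thread (the .bak name is illustrative only):

# 1. Move the stale cache file aside so the import works from the
#    on-disk labels rather than cached state.
mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bak

# 2. Dry-run an extreme rewind: -X searches much further back through
#    older uberblocks than plain -F, and -n only prints the proposal.
zpool import -f -n -F -X storage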

Michelle Sullivan

Feb 5, 2015, 9:20:43 PM
Xin Li wrote:
>
> On 02/05/15 17:36, Michelle Sullivan wrote:
>
>>> This suggests the pool was connected to a different system, is
>>> that the case?
>>>
>>>
>> No.
>>
>
> Ok, that's good. Actually, if you have two heads writing to the same
> pool at the same time, it can easily enter an unrecoverable state.
>
>
>>> It's hard to tell right now, and we shall try all possible
>>> remedies but be prepared for the worst.
>>>
>> I am :(
>>
>
> The next thing I would try is to:
>
> 1. move /boot/zfs/zpool.cache to somewhere else;
>

There isn't one. However 'cat'ing the inode I can see there was one...

[raw UFS directory block dump: entries for '.', '..', 'zpool.cache.tmp'
and 'zpool.cache', followed by zeroed space]
> 2. zpool import -f -n -F -X storage and see if the system would give
> you a proposal.
>

Without -n, this crashes the machine out of memory... there's 32G of
RAM. /boot/loader.conf contains:

vfs.zfs.prefetch_disable=1
#vfs.zfs.arc_min="8G"
#vfs.zfs.arc_max="16G"
#vm.kmem_size_max="8"
#vm.kmem_size="6G"
vfs.zfs.txg.timeout="5"
kern.maxvnodes=250000
vfs.zfs.write_limit_override=1073741824
vboxdrv_load="YES"

Regards,

Michelle

PS: it's 16x3T Drives in RAIDZ2+HSP - 34T formatted.

> Cheers,
> --
> Xin LI <del...@delphij.net> https://www.delphij.net/
> FreeBSD - The Power to Serve! Live free or die


--
Michelle Sullivan
http://www.mhix.org/
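
When an extreme-rewind import runs the machine out of memory, one knob
sometimes tried before the next attempt is capping the ARC in
/boot/loader.conf so the import has more free kernel memory to work
with. A sketch only - the 8G value is an illustrative guess for a 32G
machine, not advice from this thread, and it does not always help,
since the rewind scan itself allocates heavily:

# /boot/loader.conf - cap the ARC before retrying the import
vfs.zfs.arc_max="8G"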


Michelle Sullivan

Feb 6, 2015, 6:21:13 AM
Xin Li wrote:
>
>
>
> On 2/5/15 18:20, Michelle Sullivan wrote:
>
>> Xin Li wrote: On 02/05/15 17:36, Michelle Sullivan wrote:
>>
>>
>>>>>> This suggests the pool was connected to a different system,
>>>>>> is that the case?
>>>>>>
>>>>>>
>>>>>>
>>>>> No.
>>>>>
>>>>>
>> Ok, that's good. Actually, if you have two heads writing to the
>> same pool at the same time, it can easily enter an
>> unrecoverable state.
>>
>>
>>
>>>>>> It's hard to tell right now, and we shall try all possible
>>>>>> remedies but be prepared for the worst.
>>>>>>
>>>>>>
>>>>> I am :(
>>>>>
>>>>>
>> The next thing I would try is to:
>>
>> 1. move /boot/zfs/zpool.cache to somewhere else;
>>
>>
>>
>>> There isn't one. However 'cat'ing the inode I can see there was
>>> one...
>>>
>>> [raw directory block dump snipped - see previous message]
>
>> 2. zpool import -f -n -F -X storage and see if the system would
>> give you a proposal.
>>
>>
>>
>>> This crashes (without -n) the machine out of memory.... there's
>>> 32G of RAM. /boot/loader.conf contains:
>>>
>>> vfs.zfs.prefetch_disable=1 #vfs.zfs.arc_min="8G"
>>> #vfs.zfs.arc_max="16G" #vm.kmem_size_max="8" #vm.kmem_size="6G"
>>> vfs.zfs.txg.timeout="5" kern.maxvnodes=250000
>>> vfs.zfs.write_limit_override=1073741824 vboxdrv_load="YES"
>>>
>
> Which release is this? write_limit_override was removed quite a
> while ago.
>

FreeBSD colossus 9.2-RELEASE-p15 FreeBSD 9.2-RELEASE-p15 #0: Mon Nov 3
20:31:29 UTC 2014
ro...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64


> I'd recommend using a fresh -CURRENT snapshot if possible (possibly
> with -NODEBUG kernel).
>

I'm sorta afraid to try and upgrade it at this point.

Michelle

Michelle Sullivan

Feb 6, 2015, 6:28:28 AM
Stefan Esser wrote:
> On 06.02.2015 03:20, Michelle Sullivan wrote:
>
>> 2. zpool import -f -n -F -X storage and see if the system would
>> give you a proposal.
>>
>>
>>> This crashes (without -n) the machine out of memory.... there's
>>> 32G of RAM. /boot/loader.conf contains:
>>>
>>> vfs.zfs.prefetch_disable=1 #vfs.zfs.arc_min="8G"
>>> #vfs.zfs.arc_max="16G" #vm.kmem_size_max="8" #vm.kmem_size="6G"
>>> vfs.zfs.txg.timeout="5" kern.maxvnodes=250000
>>> vfs.zfs.write_limit_override=1073741824 vboxdrv_load="YES"
>>>
>
> I've recovered two "lost" ZFS pools (one on my system, the other on
> someone else's) by identifying a TXG whose state at least allowed
> copying to a fresh pool.
>
> The main tool was zdb, which contains user-land implementations of
> the kernel code that leads to the panic on import. You can use zdb to
> test what is left of your pool (and then try to find a way to get
> most of it rescued), if you add options that make errors non-fatal
> and that skip some consistency checks, e.g.:
>
> # zdb -AAA -L -u %POOL%
>
> You may need to add -e and possibly also -p %PATH_TO_DEVS% before
> the pool name.
>
>
root@colossus:~ # zdb -AAA -L -e storage

Configuration for import:
vdev_children: 1
version: 5000
pool_guid: 10618504954404185222
name: 'storage'
state: 0
hostid: 4203774842
hostname: 'colossus'
vdev_tree:
type: 'root'
id: 0
guid: 10618504954404185222
children[0]:
type: 'raidz'
id: 0
guid: 12489400212295803034
nparity: 2
metaslab_array: 34
metaslab_shift: 38
ashift: 9
asize: 45000449064960
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 3998695725653225547
phys_path: '/dev/mfid0'
whole_disk: 1
DTL: 168
create_txg: 4
path: '/dev/mfid15'
children[1]:
type: 'disk'
id: 1
guid: 10795471632546545577
phys_path: '/dev/mfid1'
whole_disk: 1
DTL: 167
create_txg: 4
path: '/dev/mfid13'
children[2]:
type: 'disk'
id: 2
guid: 15820272272734706674
phys_path: '/dev/mfid2'
whole_disk: 1
DTL: 166
create_txg: 4
path: '/dev/mfid0'
children[3]:
type: 'disk'
id: 3
guid: 3928579496187019848
phys_path: '/dev/mfid3'
whole_disk: 1
DTL: 165
create_txg: 4
path: '/dev/mfid1'
children[4]:
type: 'disk'
id: 4
guid: 7125052278051590304
phys_path: '/dev/mfid4'
whole_disk: 1
DTL: 164
create_txg: 4
path: '/dev/mfid2'
children[5]:
type: 'disk'
id: 5
guid: 14370198745088794709
phys_path: '/dev/mfid5'
whole_disk: 1
DTL: 163
create_txg: 4
path: '/dev/mfid3'
children[6]:
type: 'disk'
id: 6
guid: 1843597351388951655
phys_path: '/dev/mfid6'
whole_disk: 1
DTL: 162
create_txg: 4
path: '/dev/mfid4'
children[7]:
type: 'replacing'
id: 7
guid: 2914889727426054645
whole_disk: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 10956220251832269421
phys_path: '/dev/mfid15'
whole_disk: 1
DTL: 179
create_txg: 4
path: '/dev/mfid11'
children[1]:
type: 'disk'
id: 1
guid: 2463756237300743131
phys_path: '/dev/mfid13'
whole_disk: 1
DTL: 181
create_txg: 4
resilvering: 1
path: '/dev/mfid12'
children[8]:
type: 'disk'
id: 8
guid: 8864096842672670007
phys_path: '/dev/mfid7'
whole_disk: 1
DTL: 160
create_txg: 4
path: '/dev/mfid5'
children[9]:
type: 'disk'
id: 9
guid: 4650681673751655245
phys_path: '/dev/mfid8'
whole_disk: 1
DTL: 159
create_txg: 4
path: '/dev/mfid14'
children[10]:
type: 'disk'
id: 10
guid: 8432109430432996813
phys_path: '/dev/mfid9'
whole_disk: 1
DTL: 158
create_txg: 4
path: '/dev/mfid6'
children[11]:
type: 'disk'
id: 11
guid: 414941847968750824
phys_path: '/dev/mfid10'
whole_disk: 1
DTL: 157
create_txg: 4
path: '/dev/mfid7'
children[12]:
type: 'disk'
id: 12
guid: 7335375930620195352
phys_path: '/dev/mfid11'
whole_disk: 1
DTL: 156
create_txg: 4
path: '/dev/mfid8'
children[13]:
type: 'disk'
id: 13
guid: 5100737174610362
phys_path: '/dev/mfid12'
whole_disk: 1
DTL: 155
create_txg: 4
path: '/dev/mfid9'
children[14]:
type: 'disk'
id: 14
guid: 15695558693726858796
phys_path: '/dev/mfid14'
whole_disk: 1
DTL: 174
create_txg: 4
path: '/dev/mfid10'
Segmentation fault (core dumped)
root@colossus:~ # zdb -AAA -L -u -e storage
Segmentation fault (core dumped)

> Other commands to try instead of -u are e.g. -d and -h.
>
root@colossus:~ # zdb -AAA -L -d -e storage
Segmentation fault (core dumped)
root@colossus:~ # zdb -AAA -L -h -e storage
Segmentation fault (core dumped)

> If you can get a history list, then you may want to add -T %TXG%
> for some txg number in the past, to see whether you get better
> results.
>
> You may want to set "vfs.zfs.debug=1" in loader.conf to prevent the
> kernel from panicking during import, BTW. But be careful, this can
> lead to undetected inconsistencies and is only a last resort for a
> read-only mounted pool that is to be copied out (once you are able
> to import it).
>
> Good luck, STefan

Robert David

Feb 6, 2015, 7:24:38 AM
I suggest booting into the 10.1 live CD.

Then check whether the partitions were created prior to ZFS:

$ gpart show mfid0

And then try to import the pool as suggested.

Robert.
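
A sketch of that sequence from the live CD (the read-only option and
the altroot are assumptions, not part of Robert's message):

# Does the device carry a partition table, or is it a raw ZFS disk?
gpart show mfid0

# Non-destructive import attempt: read-only, no datasets mounted,
# under an alternate root so nothing collides with the live system.
zpool import -o readonly=on -N -f -R /mnt storage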

Michael Ware

Feb 6, 2015, 11:17:54 AM
You can mount using a live CD if you haven't tried yet. Maybe try the
10.1 ISO and see if you have any luck.
Mike

Will Andrews

Feb 8, 2015, 3:20:36 PM
On Fri, Feb 06, 2015 at 12:28:18PM +0100, Michelle Sullivan wrote:
> root@colossus:~ # zdb -AAA -L -u -e storage
> Segmentation fault (core dumped)
[...]

Getting stack traces from these zdb crash dumps can give a strong clue
as to where the problem is. zdb turns on ASSERTs that are turned off by
default in non-ZFS-debug kernels.

--
wca
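
Pulling a backtrace out of one of those cores might look like the
following (a sketch; the core file name depends on the kern.corefile
setting, and the output is only really useful if zdb and libzfs were
built with debug symbols, as noted later in the thread):

# Open the core dump left behind by the segfaulting zdb
gdb /usr/sbin/zdb zdb.core

# At the gdb prompt, print the stack at the point of the crash
(gdb) bt full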

Michelle Sullivan

Feb 9, 2015, 8:19:37 AM
Stefan Esser wrote:
>
> The point where zdb seg faults hints at the data structure that is
> corrupt. You may get some output before the seg fault, if you add
> a number of -v options (they add up to higher verbosity).
>
> Else, you may be able to look at the core and identify the function
> that fails. You'll most probably need zdb and libzfs compiled with
> "-g" to get any useful information from the core, though.
>
> For my failed pool, I noticed that internal assumptions were
> violated, due to some free space occurring in more than one entry.
> I had to special case the test in some function to ignore this
> situation (I knew that I'd only ever wanted to mount that pool
> R/O to rescue my data). But skipping the test did not suffice,
> since another assert triggered (after skipping the NULL dereference,
> the calculated sum of free space did not match the recorded sum, I
> had to disable that assert, too). With these two patches I was able
> to recover the pool starting at a TXG less than 100 transactions back,
> which was sufficient for my purpose ...
>

The question is: will zdb 'fix' things, or is it just a debug utility
(for displaying)?

If it is just a debug tool and won't fix anything, I'm quite happy to
roll back transactions; the question is how (presumably after one finds
the corrupt point). I'm quite happy to just do it by hand until I get
success - it will save 2+ months of work - and I did get an output with
a date/time that indicates where the rollback would go to...

In the meantime this appears to be working without crashing - it's been
running for days now...

 PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
4332 root     209  22    0 23770M 23277M uwait  1 549:07 11.04% zdb -AAA -L -uhdi -FX -e storage

Michelle

Stefan Esser

Feb 9, 2015, 10:14:48 AM
On 09.02.2015 14:19, Michelle Sullivan wrote:
> Stefan Esser wrote:
>>
>> The point where zdb seg faults hints at the data structure that is
>> corrupt. You may get some output before the seg fault, if you add
>> a number of -v options (they add up to higher verbosity).
>>
>> Else, you may be able to look at the core and identify the function
>> that fails. You'll most probably need zdb and libzfs compiled with
>> "-g" to get any useful information from the core, though.
>>
>> For my failed pool, I noticed that internal assumptions were
>> violated, due to some free space occurring in more than one entry.
>> I had to special case the test in some function to ignore this
>> situation (I knew that I'd only ever wanted to mount that pool
>> R/O to rescue my data). But skipping the test did not suffice,
>> since another assert triggered (after skipping the NULL dereference,
>> the calculated sum of free space did not match the recorded sum, I
>> had to disable that assert, too). With these two patches I was able
>> to recover the pool starting at a TXG less than 100 transactions back,
>> which was sufficient for my purpose ...
>>
>
> Question is will zdb 'fix' things or is it just a debug utility (for
> displaying)?

The purpose of zdb is to access the pool without the need to import
it (which tends to crash the kernel) and to possibly identify a safe
TXG to go back to. Once you have found that zdb survives accesses to
critical data structures of your pool, you can then try to import the
pool to rescue your data.

> If it is just a debug and won't fix anything, I'm quite happy to roll
> back transactions, question is how (presumably after one finds the
> corrupt point - I'm quite happy to just do it by hand until I get
> success - it will save 2+months of work - I did get an output with a
> date/time that indicates where the rollback would go to...)
>
> In the mean time this appears to be working without crashing - it's been
> running days now...
>
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
> COMMAND
> 4332 root 209 22 0 23770M 23277M uwait 1 549:07 11.04%
> zdb -AAA -L -uhdi -FX -e storage

Options -u and -h do not take much time, -i depends on how much was
in the intent log (and recovery should be possible without, if your
kernel is not too old with regard to supported ZFS features).

zdb -d takes a long time, and if it succeeds, you should be able to
recover your data. But zdb -m should also run to completion (and ISTR,
in my case that was where my kernel blew up trying to import the pool).

Using the debugger to analyze the failed instruction let me work around
the inconsistency with two small patches (one skipped a consistency
check, the second fixed up the sum of free space which was miscalculated
due to the free block that led to the panic being omitted).

After I had these patches tested with zdb, I was able to import the pool
into a kernel that included these exact patches. You obviously do not
want to perform any other activities with the patched kernel, since it
lacks some internal checks - it is purely required for the one time
backup operation of the failed pool.


So, zdb and even the patches that make zdb dump your pool's internal
state will not directly lead to access to your data. But if you manage
to print all state with "zdb -dm", chances are very good that you'll
be able to import the pool - possibly with temporary hacks to libzfs
that skip corrupt data elements (if not strictly required for read
accesses to your data).

After that succeeded, you have a good chance to copy off your data
using a kernel that has the exact same patches in the ZFS driver ...
(if any are required, as in my case).

Regards, STefan
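
If zdb does eventually identify a usable older TXG, the rewind Stefan
describes might be attempted roughly as follows. This is a sketch under
heavy assumptions: 1928200 is a made-up TXG for illustration, -T is the
zdb spelling Stefan mentions earlier (some zdb versions spell it -t),
and -T for zpool import is an undocumented rewind option that is not
present in every ZFS build:

# Check that the pool looks consistent at the older TXG, from
# userland only, before touching the kernel import path.
zdb -AAA -L -u -e -T 1928200 storage

# If that holds up, try a read-only import rewound to that TXG.
zpool import -o readonly=on -f -F -T 1928200 storage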

Michelle Sullivan

Feb 9, 2015, 10:39:26 AM
Yeah -m crashes.

The kernel will not crash; it just refuses to import.

zdb crashes with some options (or without others)... The question is
how to get the import to succeed with a rollback of transactions to
clean up the transaction DB. Realistically there should be a maximum of
10 transactions that caused the issue, as that's the length of the
queue (kernel param), and data was lost up to that 10... What I need to
do is get it to recover to how it was before the metadb was corrupted -
and if I am right, it is an incomplete write to the metadb that is
blocking it (so the last transaction - or maybe the last up to 10 - is
corrupted)... Is it possible to discard 10 transactions and import -
and if so, how? Is it possible to rebuild the metadb? The only access
to the array/fs at the time was a network mount (NFS) where the
connected client was torrenting (at less than 10k/s, as I think there
was only one active torrent at the time)... so if the torrented file is
corrupt I can just discard it and start again, and since it's
copy-on-write (as I thought ZFS was), resetting it to the 'before
write' version will just discard a piece of the file, which will be
replaced/re-downloaded as soon as the torrent is restarted...

I have not yet tried a 10.1 kernel - I wrote the disc, but it seems I
can't use an Apple SuperDrive as a boot device on a PC...

Michelle

--
Michelle Sullivan
http://www.mhix.org/

Mike Tancsa

Feb 9, 2015, 10:45:00 AM
On 2/9/2015 10:39 AM, Michelle Sullivan wrote:
>
> I have not yet tried a 10.1 kernel - I wrote the disc, but it seems I
> can't use an Apple SuperDrive as a boot device on a PC...

USB stick? I find it a lot easier than CDs/DVDs.

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
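
Writing the memstick image is a single dd invocation (a sketch: the
image file name is illustrative, and /dev/da0 must be double-checked
against the actual USB device first, since dd will overwrite it):

# Write the FreeBSD installer image to the USB stick (destructive!)
dd if=FreeBSD-10.1-RELEASE-amd64-memstick.img of=/dev/da0 bs=1m conv=sync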

Michelle Sullivan

Feb 9, 2015, 10:58:44 AM
Mike Tancsa wrote:
> On 2/9/2015 10:39 AM, Michelle Sullivan wrote:
>>
>> I have not yet tried a 10.1 kernel - I wrote the disc, but it seems I
>> can't use an Apple SuperDrive as a boot device on a PC...
>
> USB stick? I find it a lot easier than CDs/DVDs.
>
> ---Mike
>
>
Will have to see if I have one large enough... most of mine are 128k ;-)

(yes I use them a lot! ;-) )


--
Michelle Sullivan
http://www.mhix.org/

Michelle Sullivan

Feb 9, 2015, 11:34:27 AM
Michelle Sullivan wrote:
> Mike Tancsa wrote:
>
>> On 2/9/2015 10:39 AM, Michelle Sullivan wrote:
>>
>>> I have not yet tried a 10.1 kernel - I wrote the disc, but it seems
>>> I can't use an Apple SuperDrive as a boot device on a PC...
>>>
>> USB stick? I find it a lot easier than CDs/DVDs.
>>
>> ---Mike
>>
>>
>>
> Will have to see if I have one large enough... most of mine are 128k ;-)
>
> (yes I use them a lot! ;-) )
>
>
>
Oh, I forgot to mention: when I use zpool import -fFX the kernel
crashes are the same every time - out of memory (i.e. it panics in an
alloc/trying to alloc). There's 32G of RAM on the box.

Found a 32G stick; downloading the latest 11.0 snapshot currently...
(12/2014 - on *.de servers)

Michelle