Broke my zfs pool :(


piku

Jan 19, 2013, 7:08:32 PM1/19/13
to zfs-...@googlegroups.com
I was hoping for any kind of help.  Hours of googling have failed me.

I had a Linux zfs-fuse system on Fedora.  I created a zfs pool called tank as a mirror of two partitions, each spanning an entire 1.5TB drive, so /dev/sda1 and /dev/sdb1 were the devices that made up the pool.  So far so good.  I then enabled compression and used this setup successfully for several years.

... fast forward

I got a SMART alert that one of the drives was failing.  I detached this drive from the pool; the pool was still functional.  I rebooted to run the Seagate SeaTools and get the RMA code, then rebooted again.  This is where I made a huge error.  I don't know what state the pool was in, whether the drive was still detached or what, but I ran shred on the failing disk (/dev/sda), which overwrote everything with random bytes and then zeroed the disk.  When this was done, tank was no longer accessible.

When I did a zpool status it said, IIRC, "unhealthy" and reported the pool as corrupt.  There was only one member of the pool, and it was the failing drive that I had detached previously.  I struggled with it for a while and couldn't get it to use the other good drive in any way, so I did a zpool destroy on it.  Now, with one zeroed drive (/dev/sda) and one copy of the mirror on /dev/sdb1, no permutation of zpool import will import the pool.  If I hexdump /dev/sdb I do find zfs-related information about the pool: the name, etc.

What do I do?  I can't believe I have a mirror that is, uhmm... not a mirror :(   Even if I zeroed the wrong drive, the failing drive still appeared to function fine.  No matter what, I should have a copy of my data (and I seem to), but I cannot access it.  I'm quite sure this is a normal mirror configuration.  Is there any way to tell zpool to start scanning a device for vdev headers?  Can zdb help me?  I'm OK with even just basic file recovery at this point.

Thanks in advance for any help,
Mark

Emmanuel Anne

Jan 19, 2013, 7:21:35 PM1/19/13
to zfs-...@googlegroups.com
Hum, not sure I got everything. So:
/dev/sda is the drive you zeroed, so this one is now useless.
Then you destroyed the pool, but /dev/sdb is still available with one side of the mirror on it.
Well, in this case zpool import -D should do the trick. Just make sure that /dev/sda is NOT connected, otherwise you'll get errors after starting to use the pool.
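
Something like this, I think (from memory, not tested here, so treat it as a sketch):

zpool import -D            # lists destroyed pools that can still be seen
zpool import -D -f tank    # then import yours by name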

Detaching/re-attaching parts of a mirror works really well on zfs (and zfs-fuse). I did it for a long time to maintain a backup, because when you re-attach the 2nd part only what has changed is sent to it.
So it's usually 100% reliable!
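
A minimal sketch of that backup cycle, using zpool offline/online (that is the variant where the incremental resilver applies; the device path is a placeholder):

zpool offline tank /dev/disk/by-id/BACKUP-DISK    # take the backup half out of service
# ... store the disk somewhere safe, bring it back later ...
zpool online tank /dev/disk/by-id/BACKUP-DISK     # re-attach; only changed blocks are resilvered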


2013/1/20 piku <evapo...@gmail.com>




--
my zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

piku

Jan 19, 2013, 8:07:20 PM1/19/13
to zfs-...@googlegroups.com
Well, /dev/sda IS still connected, but zpool import -D is NOT finding the pool.  I was wondering if it's because I used a partition (/dev/sdb1) rather than the whole disk (/dev/sdb).

Thanks,
Mark

Emmanuel Anne

Jan 20, 2013, 5:09:39 AM1/20/13
to zfs-...@googlegroups.com
I must say I have never used import -D, so I don't know how it works!
With both drives connected, either zpool import or zpool import -D should work; if not, it probably means too much has been overwritten already.
You can also try zdb -l /dev/sda1 and zdb -l /dev/sdb1 to see if something from zfs can still be found on at least one of the drives.


2013/1/20 piku <evapo...@gmail.com>

Björn Kahl

Jan 20, 2013, 6:01:55 AM1/20/13
to zfs-...@googlegroups.com
Am 20.01.13 02:07, schrieb piku:
> Well /dev/sda IS still connected but zpool import -D is NOT finding the
> pool. I was wondering if it's because I used a partition (/dev/sdb1)
> rather than the whole disk (/dev/sdb).

Any chance you accidentally shredded the wrong disk, for example
because the disk names were shuffled around after the reboot for
whatever reason?

If zdb -lu shows some usable ueberblocks, you may try to import an
older version of the pool; however, for that to work "zpool import"
needs to recognize the disk first.  (And I am not sure zfs-fuse has
the needed import options.  I think the last time I had to roll a pool
back by some transaction groups, I did it with ZoL in a VM.)
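
On ZoL the rewind looks roughly like this (a sketch from memory; import
read-only first to be safe, and take the txg number from the zdb -lu
output):

zpool import -o readonly=on -f -F tank        # let import rewind a few txgs on its own
zpool import -o readonly=on -f -T TXG tank    # jump to a specific txg, if the build supports -T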


Have you verified that the disk is actually recognized by the OS, i.e.
cat /proc/partitions lists the partition(s) with the right size?

And just to be on the safe side: there are no traces of the pool in
"zpool status" anymore, right?  "zpool import" won't import a pool it
thinks is already half-imported.

(more comments inlined below)


> On Saturday, January 19, 2013 7:21:35 PM UTC-5, Emmanuel Anne wrote:
>>
>> Hum, not sure I got everything. So :
>> /dev/sda is the drive you zeroed so this one is now useless.
>> Then you destroyed the pool, but /dev/sdb is still available with 1 side
>> of the mirror on it.
>> Well in this case zpool import -D should do the trick, just make sure that
>> /dev/sda is NOT connected, otherwise you'll get errors after starting to
>> use the pool.
>>
>> Detaching/re-attaching parts of a mirror works really well on zfs (and
>> zfs-fuse), I did it for long time to maintain a backup because when you
>> re-attach the 2nd part only what has changed is sent to it.
>> So it's 100% reliable usually !
>>
>>
>> 2013/1/20 piku <evapo...@gmail.com>
>>
>>> I was hoping for any kind of help. Hours of googling is failing me.
>>>
>>> I had a linux zfs-fuse system on fedora. I created a zfs pool called
>>> tank using two partitions which span the entire disk of 2 1.5TB drives, so
>>> a mirror. So /dev/sda1 and /dev/sdb1 were the devices that made up the
>>> pool. So far so good. Now I enable compression and use this setup
>>> successfully for several years.
>>>
>>> ... fast forward
>>>
>>> I get a SMART alert that one of the drives is failing. I detach this
>>> drive from the pool. The pool is still functional. I reboot to run the
>>> seagate seatools to get the rma code. I then reboot. This is where I made
>>> a huge error. I don't know what state the pool was in, if it was still
>>> detached or what but I ran shred on the failing disk (/dev/sda) which
>>> overwrote random bytes on everything and then zeroed the disk. Then later
>>> on when this was done tank was no longer accessible.
>>>
>>> When I did a zpool status it said IIRC "unhealthy" and said it was
>>> corrupt. There was only one member of the pool and it was the failing
>>> drive that I detached previously.

To me that sounds like "shredded the wrong drive" and/or "drive
renamed on boot, but zpool.cache still around with old information".

There is a reason why one should never use the /dev/sdX names.  They
are way too unstable these days and keep changing when drives are added
or removed, or are just slow to show up on startup.
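
In other words, create the pool with the stable names from the start,
something like this (the serial numbers here are made up):

zpool create tank mirror \
    /dev/disk/by-id/ata-ST31500341AS_AAAAAAAA-part1 \
    /dev/disk/by-id/ata-ST31500341AS_BBBBBBBB-part1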


>>> I struggled with it for a while and
>>> couldn't get it to use the other good drive in any way so I did a zpool
>>> destroy on it.

I would never have done that.  "zpool export -f"?  Maybe.  Locating and
deleting the zpool.cache?  Probably yes.  But zpool destroy on a pool I
want to rescue?  Never.


>>> Now with one zeroed drive, /dev/sda and one copy of the
>>> mirror on /dev/sdb1 no permutations of zpool import will import the pool.
>>> If I hexdump /dev/sdb I do find zfs related information listed about the
>>> drive, the name, etc.
>>>
>>> What do I do? I can't believe I have a mirror that is uhmm.. Not a
>>> mirror :( Even if I zeroed the wrong drive, the failing drive was still
>>> appearing to function fine. No matter what I should have a copy of my
>>> data, I seem to, but I cannot access it. I'm quite sure this is a normal
>>> mirror configuration. Is there any way to tell zpool to start scanning a
>>> device for vdev headers? Can zdb help me? I'm ok with even just basic
>>> file recovery at this point.

To the best of my knowledge, ZFS has neither a fsck (I am sure on this
one) nor any means to manually cat files as a rescue measure (I may be
wrong on this).

In theory, one can use zdb to walk the tree of blocks and extract data
blocks from files.  But it is a real pain: it requires additional tools
to decode some of the metadata blocks that zdb does not "pretty-print"
enough, and of course it requires intimate knowledge of the on-disk
structure.  I'd estimate a progress rate of one file per hour if
manually walking the block tree, decoding metadata blocks on paper and
extracting individual data blocks.
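
The raw read itself would be the easy part, something like this (a
sketch only; the vdev index, hex offset and size all come from the
block pointers you decoded by hand, and the values here are
placeholders):

zdb -R tank 0:OFFSET:SIZE    # dump one data block from vdev 0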


Best

Björn

--
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |


Mark Duckworth

Jan 20, 2013, 9:55:27 AM1/20/13
to zfs-...@googlegroups.com
Well, it's a mirror, so I can't have shredded the wrong disk; as long as I shredded only one of them, I should be fine.

Incidentally, another unrelated disk in the system just threw a SMART failure.  Another Seagate, too.  Pssh.  At least that one has about 7-8 years of life on it and isn't still under warranty like this one.

zdb -l shows stuff, so I am going to try to find a FreeBSD or Solaris live CD and see if I can import it there.  All I really need to do is get it working to some level and transfer only a small subset of files off of it.  Most of the disk contained rsnapshot backups, and I can safely lose those if necessary.

Thanks,
Mark

Mark Duckworth

Jan 20, 2013, 9:57:26 AM1/20/13
to zfs-...@googlegroups.com
Some hope!

[root@nas by-id]# zdb -l /dev/sdb1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
[root@nas by-id]# zdb -l /dev/sdb
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 23
    name: 'tank'
    state: 0
    txg: 0
    pool_guid: 9122112223662145768
    hostid: 8323328
    hostname: 'nas.atari-source.org'
    top_guid: 1985564633562652710
    guid: 104226394787364593
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 1985564633562652710
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 1500297035776
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 17269997286221292893
            path: '/dev/disk/by-id/ata-ST31500341AS_9VS0SHKA'
            whole_disk: 0
            DTL: 161
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 104226394787364593
            path: '/dev/disk/by-id/wwn-0x5000c50010b1613d'
            whole_disk: 0
            DTL: 160
            create_txg: 0
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 23
    name: 'tank'
    state: 0
    txg: 0
    pool_guid: 9122112223662145768
    hostid: 8323328
    hostname: 'nas.atari-source.org'
    top_guid: 1985564633562652710
    guid: 104226394787364593
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 1985564633562652710
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 1500297035776
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 17269997286221292893
            path: '/dev/disk/by-id/ata-ST31500341AS_9VS0SHKA'
            whole_disk: 0
            DTL: 161
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 104226394787364593
            path: '/dev/disk/by-id/wwn-0x5000c50010b1613d'
            whole_disk: 0
            DTL: 160
            create_txg: 0
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 23
    name: 'tank'
    state: 0
    txg: 0
    pool_guid: 9122112223662145768
    hostid: 8323328
    hostname: 'nas.atari-source.org'
    top_guid: 1985564633562652710
    guid: 104226394787364593
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 1985564633562652710
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 1500297035776
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 17269997286221292893
            path: '/dev/disk/by-id/ata-ST31500341AS_9VS0SHKA'
            whole_disk: 0
            DTL: 161
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 104226394787364593
            path: '/dev/disk/by-id/wwn-0x5000c50010b1613d'
            whole_disk: 0
            DTL: 160
            create_txg: 0
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 23
    name: 'tank'
    state: 0
    txg: 0
    pool_guid: 9122112223662145768
    hostid: 8323328
    hostname: 'nas.atari-source.org'
    top_guid: 1985564633562652710
    guid: 104226394787364593
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 1985564633562652710
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 1500297035776
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 17269997286221292893
            path: '/dev/disk/by-id/ata-ST31500341AS_9VS0SHKA'
            whole_disk: 0
            DTL: 161
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 104226394787364593
            path: '/dev/disk/by-id/wwn-0x5000c50010b1613d'
            whole_disk: 0
            DTL: 160
            create_txg: 0
[root@nas by-id]#

So it appears I made everything on sdb rather than sdb1.  I'm not sure why the disk even has a partition table, then.  I am going to remove the offending disk, reboot the system, and see if I can import it.  I am also going to hunt down a FreeBSD or Solaris live CD and see if zpool import recognizes it on those systems.

Thanks for all your help,
I'm so excited to see ANY kind of response from zfs tools.

Thanks,
Mark




On Jan 20, 2013, at 5:09 AM, Emmanuel Anne wrote:

> zdb -l /

Emmanuel Anne

Jan 20, 2013, 11:03:24 AM1/20/13
to zfs-...@googlegroups.com
With such output from zdb, it's certain you can import this pool with zfs-fuse anyway.


2013/1/20 Mark Duckworth <evapo...@gmail.com>

Mark Duckworth

Jan 20, 2013, 11:48:40 AM1/20/13
to zfs-...@googlegroups.com
Except I can't; it doesn't find it.  The zdb output is the same on FreeBSD 9.1, and zpool won't import it either.

I run zpool import -D, or really any permutation I have found so far, and get no output; it just waits half a second and then returns to the prompt.

Thanks,
Mark

Christ Schlacta

Jan 20, 2013, 12:16:30 PM1/20/13
to zfs-...@googlegroups.com

Try zpool import -D /dev/disk/by-id/ -f poolname and see if that works.

Mark Duckworth

Jan 20, 2013, 12:41:23 PM1/20/13
to zfs-...@googlegroups.com
Well, I see that you used -D instead of -d, but I ran it as-is just in case I was missing something:

[root@nas ~]# zpool import -D /dev/disk/by-id/ -f tank
cannot import '/dev/disk/by-id/': no such pool available
[root@nas ~]# 

So I ran it the corrected way:
[root@nas ~]# zpool import -D -d /dev/disk/by-id/ -f tank
cannot import 'tank': no such pool available
[root@nas ~]# 

Or:
[root@nas ~]# zpool import -D -d /dev/disk/by-id/
[root@nas ~]# 

And then:
[root@nas ~]# zpool import -d /dev -D
[root@nas ~]# 

Or: 
[root@nas ~]# cd /dev/disk/by-id/
[root@nas by-id]# zdb -lu wwn-0x5000c50010b1613d
Uberblock[0]
    magic = 0000000000bab10c
    version = 23
    txg = 0
    guid_sum = 10035156464523904348
    timestamp = 1358256490 UTC = Tue Jan 15 08:28:10 2013
Uberblock[0]
    magic = 0000000000bab10c
    version = 23
    txg = 0
    guid_sum = 10035156464523904348
    timestamp = 1358256490 UTC = Tue Jan 15 08:28:10 2013
Uberblock[0]
    magic = 0000000000bab10c
    version = 23
    txg = 0
    guid_sum = 10035156464523904348
    timestamp = 1358256490 UTC = Tue Jan 15 08:28:10 2013
Uberblock[0]
    magic = 0000000000bab10c
    version = 23
    txg = 0
    guid_sum = 10035156464523904348
    timestamp = 1358256490 UTC = Tue Jan 15 08:28:10 2013
[root@nas by-id]# 


I'm not sure what to do with it ;)  I downloaded the source and am probably going to put a bunch of printf debugging statements in the import process to see what's going on.  If someone could advise me how to use gdb or something else to follow what is going on (which devices get scanned, etc.) I would be most appreciative.

Thanks,
Mark

Björn Kahl

Jan 20, 2013, 12:44:11 PM1/20/13
to zfs-...@googlegroups.com
Am 20.01.13 18:16, schrieb Christ Schlacta:
> Try zpool import -D /dev/disk/by-id/ -f poolname and see if that works.

Isn't it "zpool import -d /dev/disk/by-id/ -D -f poolname"?
I.e., capital -D to look for destroyed pools and lowercase -d to
specify the search path for device files?


> On Jan 20, 2013 8:48 AM, "Mark Duckworth" <evapo...@gmail.com> wrote:
>
>> Except I can't, it doesn't find it. zdb output is the same on freebsd 9.1
>> and zpool won't import it either.
>>
>> I just run zpool import -D or really any permutations I have found so far
>> and no output, it just waits half a second and then returns to the prompt.

??? Did you just run "zpool import" to list available pools, or did
you ask for your specific pool with "zpool import poolname"?  In the
latter case, if it just returns silently, then it did its job and
imported the pool.

Under Linux, with zfs-fuse or ZoL, you can monitor which disks it
tries. Run as root:

strace -e open zpool import


>> On Jan 20, 2013, at 11:03 AM, Emmanuel Anne wrote:
>>
>> With such an output from zdb it's sure you can import this pool with
>> zfs-fuse anyway.

Not really.  You can still have a toasted MOS (or any other dataset),
corrupted space maps, and so on.

piku

Jan 20, 2013, 12:51:42 PM1/20/13
to zfs-...@googlegroups.com
[root@nas by-id]# strace -e open zpool import -D

shows that it is definitely hitting the disk in question:
open("/dev/disk/by-id/wwn-0x5000c50010b1613d", O_RDONLY|O_LARGEFILE) = 7

but passing right by it for some reason.  Interestingly, this is why the whole thing failed to begin with.  After I shredded the other disk, ONLY that one was part of the mirror.  The other one mysteriously vanished and didn't even appear to be part of the pool, even offline.  The pool could still have been accessible at that point, just degraded, since the crazy stuff happened on the other disk; but it wasn't.

zpool import just returns, with no errors or information of any sort shown, unless I specify the pool name, in which case it says it's not found.  zdb -l is the only thing that has given me anything at all so far.  Is there any more zdb checking and repair I can do on the disk itself, versus the pool, since I can't seem to access it at the pool level?

Thanks,
Mark

piku

Jan 20, 2013, 12:55:38 PM1/20/13
to zfs-...@googlegroups.com
Well, this is interesting: looking through the source, it is supposed to say "no pools available to import" and return an error code, but instead I am getting error code 1 with no message.  Definitely something interesting going on.

Thanks,
Mark

Björn Kahl

Jan 20, 2013, 1:10:29 PM1/20/13
to zfs-...@googlegroups.com
Am 20.01.13 18:51, schrieb piku:
> [root@nas by-id]# strace -e open zpool import -D
>
> shows that it is definitely hitting the disk in question
> open("/dev/disk/by-id/wwn-0x5000c50010b1613d", O_RDONLY|O_LARGEFILE) = 7
>
> but passing right by it for some reason.

Passing right by it isn't unusual. That strace just reports on the
open calls done by zpool import, and these are quick, including
reading a few kb of data to look for the vdev tree.

> Interestingly, this is why the
> whole thing failed to begin with. After I shredded the other disk, ONLY
> that one was part of the mirror. The other one mysteriously vanished and
> didnt' even appear to be part of the pool, even offline. The pool could
> have still been accessible at that point, just degraded since crazy stuff
> happened on the other disk, but it wasn't.
>
> zpool import just returns, no errors or information of any sort shown
> unless I specify the pool name and it says it's not found. zdb -l is the
> only thing that has given me anything at all so far. Is there any more zdb
> check and repair I can do on the disk itself versus the pool since I can't
> seem to access it at the pool level?


There are some more checks. As Emmanuel already suggested:

zdb -lu /dev/disk/by-id/wwn-0x5000c50010b1613d

That looks for the Ueberblock array. If it comes back empty handed,
you are out of luck.

If you find some Ueberblocks, try to get the MOS

zdb -e -p /dev/disk/by-id -AAA -dd tank 1

If it works, try some other dataset or try the pool history object

zdb -e -p /dev/disk/by-id -AAA -hh tank


If zdb -lu returns some results, but the other zdb calls complain that
the pool could not be found, then it's probably time to build a custom
debug version of zdb and trace it, to find out which check fails to
recognize the pool.  (You wrote that you already looked through the
code, so I assume you know how to compile and single-step zdb.)
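
One way to do the single-stepping (a sketch, assuming zdb was built
with debug symbols; the label-reading path is shared between zdb and
zpool):

gdb --args zdb -e -p /dev/disk/by-id -AAA -dd tank 1
(gdb) break zpool_read_label
(gdb) run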

Mark D

Jan 20, 2013, 3:34:01 PM1/20/13
to zfs-...@googlegroups.com
On 1/20/2013 1:10 PM, Björn Kahl wrote:


There are some more checks. As Emmanuel already suggested:

zdb -lu /dev/disk/by-id/wwn-0x5000c50010b1613d


This prints uberblock[0] after each of the 4 labels, as done previously.

> That looks for the Ueberblock array. If it comes back empty handed,
> you are out of luck.
>
> If you find some Ueberblocks, try to get the MOS
>
> zdb -e -p /dev/disk/by-id -AAA -dd tank 1

This fails
[root@nas tv]# zdb -e -p /dev/disk/by-id -AAA -dd tank 1
zdb: can't open 'tank': No such file or directory


> If it works, try some other dataset or try the pool history object
>
> zdb -e -p /dev/disk/by-id -AAA -hh tank
>
>
> If zdb -lu returns some results, by the other zdb calls complain that
> the pool could not be found, then it's probably time to build a custom
> debug version of zdb and trace it, to find out which check fails to
> recognize the pool. (You wrote you looked already through the code,
> so I assume you know how to compile and single-step zdb.)
Yep, I will probably do just that.

Thanks,
Mark

Björn Kahl

Jan 20, 2013, 4:00:40 PM1/20/13
to zfs-...@googlegroups.com
Am 20.01.13 21:34, schrieb Mark D:
> On 1/20/2013 1:10 PM, Björn Kahl wrote
>
>
> There are some more checks. As Emmanuel already suggested:
>
> zdb -lu /dev/disk/by-id/wwn-0x5000c50010b1613d
>
>
> This prints uberblock[0] after each of the 4 labels as done previously

Just the line "uberblock[0]", or does it actually print the array?
(sorry for asking again, but you aren't very precise in your posts.)

There is room for 128 Uberblocks in each label, if I remember
correctly.  So it should look like the following (example from an old
test pool of mine; your vdev tree will obviously look different, since
you have a mirror and my example is a single-disk pool running in a
partition):
> --------------------------------------------
> LABEL 0
> --------------------------------------------
>     version: 6
>     name: 'macpool1'
>     state: 1
>     txg: 338
>     pool_guid: 3111872378547917630
>     hostid: 1463809007
>     hostname: 'example.host'
>     top_guid: 11190581884910879566
>     guid: 11190581884910879566
>     vdev_children: 1
>     vdev_tree:
>         type: 'disk'
>         id: 0
>         guid: 11190581884910879566
>         path: '/dev/disk/by-id/usb-WD_10EADS_External_57442D574341563533353239353234-0:0-part3'
>         whole_disk: 0
>         metaslab_array: 14
>         metaslab_shift: 30
>         ashift: 9
>         asize: 150097625088
>         is_log: 0
> Uberblock[1]
>     magic = 0000000000bab10c
>     version = 6
>     txg = 257
>     guid_sum = 14302454263458797196
>     timestamp = 1264975544 UTC = Sun Jan 31 23:05:44 2010
>
> ... several Uberblocks, not necessarily 128, as some might be corrupted, unreadable, unused, or else.
>
> Uberblock[127]
>     magic = 0000000000bab10c
>     version = 6
>     txg = 255
>     guid_sum = 14302454263458797196
>     timestamp = 1264975494 UTC = Sun Jan 31 23:04:54 2010
> --------------------------------------------
> LABEL 1
> --------------------------------------------
>     version: 6
>
> ... next label content, followed by another run of essentially the same Uberblocks as before. Repeated for all labels.


>
>> That looks for the Ueberblock array. If it comes back empty handed,
>> you are out of luck.
>>
>> If you find some Ueberblocks, try to get the MOS
>>
>> zdb -e -p /dev/disk/by-id -AAA -dd tank 1
>
> This fails
> [root@nas tv]# zdb -e -p /dev/disk/by-id -AAA -dd tank 1
> zdb: can't open 'tank': No such file or directory
>
>
>> If it works, try some other dataset or try the pool history object
>>
>> zdb -e -p /dev/disk/by-id -AAA -hh tank
>>
>>
>> If zdb -lu returns some results, by the other zdb calls complain that
>> the pool could not be found, then it's probably time to build a custom
>> debug version of zdb and trace it, to find out which check fails to
>> recognize the pool. (You wrote you looked already through the code,
>> so I assume you know how to compile and single-step zdb.)
> Yep I wil probably do just that.

If you find out what is wrong, I'd like to know, just in case I ever
run into similar trouble.  Good luck!


Best

Björn

Mark D

Jan 20, 2013, 5:59:57 PM1/20/13
to zfs-...@googlegroups.com
It's dying in zpool_read_label().  It reads 3 labels, and then on the
4th label it stops at this code:

	/*
	 * Read the whole 256 KiB label l at its computed offset;
	 * on a short read, skip this label and try the next one.
	 */
	if (pread64(fd, label, sizeof (vdev_label_t),
	    label_offset(size, l)) != sizeof (vdev_label_t))
		continue;


I ran a few things:
(gdb) print l
$1 = <optimized out>
(gdb) print label
$2 = (vdev_label_t *) 0xb7da6008
(gdb) print sizeof(vdev_label_t)
$3 = 262144
(gdb)

But we are fast reaching the end of my skills.  Do you guys know what
it's doing here, and why it would fail this step on the 4th label?

Thanks,
Mark



On 1/20/2013 4:00 PM, Björn Kahl wrote:
> Am 20.01.13 21:34, schrieb Mark D:
>> On 1/20/2013 1:10 PM, Björn Kahl wrote
> Björn
>
>

piku

Jan 22, 2013, 1:35:23 AM1/22/13
to zfs-...@googlegroups.com
I may have my data back!  There is some C source code out there for something called labelfix, and there is an updated write function for it from a post in 2011; I changed my write function to that one.  I built this source into zpool_main.c in zpool and ran it.  It returned with no output.  After this, zpool import -FfVD was able to see my pool, and zpool import -f tank imported it!  I am now formatting my backup disk, and barring any serious issues I should be able to copy my data away from zfs...  perhaps forever :(  This is a scary thing; there are basically no support/repair tools!  I was almost unable to use my perfectly acceptable mirror.
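
For the record, the sequence looked roughly like this (a sketch; "labelfix" stands for whatever entry point you wire into zpool_main.c, and the device path is from my system):

zpool labelfix /dev/disk/by-id/wwn-0x5000c50010b1613d    # rewrite the damaged vdev labels in place
zpool import -FfVD                                       # the destroyed pool shows up again
zpool import -f tank                                     # and it imports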

Daniel Brooks

Jan 22, 2013, 2:29:31 AM1/22/13
to zfs-...@googlegroups.com
On Mon, Jan 21, 2013 at 10:35 PM, piku <evapo...@gmail.com> wrote:
I may have my data back!  There is some C source code out there for something called labelfix, and there is an updated write function for it from a post in 2011; I changed my write function to that one.  I built this source into zpool_main.c in zpool and ran it.  It returned with no output.  After this, zpool import -FfVD was able to see my pool, and zpool import -f tank imported it!  I am now formatting my backup disk, and barring any serious issues I should be able to copy my data away from zfs...  perhaps forever :(  This is a scary thing; there are basically no support/repair tools!  I was almost unable to use my perfectly acceptable mirror.

True enough, but it doesn't sound like your mirror was perfectly intact once you shredded one of the disks. A few misplaced bits in just the right place will kill any filesystem, and you were doing this because one of your drives was giving you enough trouble that you were going to RMA it. If you hadn't accidentally shredded the wrong disk then everything would have been fine, since the disk error wouldn't have mattered.

You should post your modified source somewhere so people can see what actually went wrong and what the fix did.  I'd make a git branch with the code; that's usually the best way to share this sort of thing.  Maybe there's a way to make the error reporting better when one of the transaction groups becomes corrupted, as happened in this case.