Metadata Corrupted


warren

Nov 15, 2008, 10:32:29 PM
to zfs-fuse
My system went down during a power outage a while ago, and I have been
unable to bring my ZFS pool back online since. I've upgraded to the most
recent version of zfs, but still haven't had any luck. I posted some
messages about this a while ago and had to sideline the recovery, but I
am now back at it to see if I can get some of the data back.

I am running Ubuntu Gutsy Gibbon and have six 500GB drives in a
RAIDZ1 pool.

If I try to import, I get the following error. Any thoughts? Thanks
in advance!

root@storage: # ./zpool import -f
  pool: tank1
    id: 2314521930077808218
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        tank1       FAULTED   corrupted data
          raidz1    ONLINE
            sdc     ONLINE
            sdd     ONLINE
            sde     ONLINE
            sdf     ONLINE
            sda     ONLINE
            sdb     ONLINE

  pool: tank
    id: 5523861625148184458
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank            FAULTED   corrupted data
          raidz1        DEGRADED
            sdd1        ONLINE
            sdf1        ONLINE
            dsk/c2d0s0  UNAVAIL   cannot open
            sde1        ONLINE
            sdb1        ONLINE

Ricardo M. Correia

Nov 15, 2008, 11:34:42 PM
to zfs-...@googlegroups.com
Hi warren,

On Sat, 2008-11-15 at 19:32 -0800, warren wrote:
> My system went down during a power outage a while ago, and I have been
> unable to bring my ZFS pool back online since. I've upgraded to the most
> recent version of zfs, but still haven't had any luck. I posted some
> messages about this a while ago and had to sideline the recovery, but I
> am now back at it to see if I can get some of the data back.

I just reread your old thread and saw that you were running zfs-fuse
0.4.0 beta 1.
That version is very old, and it had a serious bug that can cause
corruption during power outages, which is very likely the bug that you
hit.

> I am running Ubuntu Gutsy Gibbon and have six 500GB drives in a
> RAIDZ1 pool.
>
> If I try to import, I get the following error. Any thoughts? Thanks
> in advance!

I'm going to need you to do a few steps. Please make sure you capture
the output of the commands below, so that I can take a look in case
something fails.

1) Apply the patch that I've attached on top of the latest trunk
version:

$ cd zfs-fuse
$ patch -p1 < trunk-uberblock.patch

2) Recompile zfs-fuse (in debug mode):

$ cd src
$ scons debug=2

3) Find out the latest txg number of your pool. You can do this in 2
ways.
The preferred way is to use "zdb -u", like this:

(make sure zfs-fuse is running first)
$ zdb -u -e tank1

Or, if this fails, you can use "zdb -l":

$ zdb -l /dev/sda

4) The zdb commands above will report a "txg=<xxxxx>" number.
Now you'll need to try to see if you can read the pool using *the
previous* txg number, like this:

$ zdb -cv -t <yyyyy> -e tank1
(where yyyyy is xxxxx-1)

So, for example, if the txg number found in step 3 above is 1343356,
you would try txg number 1343355, like this:

$ zdb -cv -t 1343355 -e tank1

5) At this point, if the above command succeeds and checksums
everything, it means the pool is recoverable (but we haven't recovered
it yet).
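
Steps 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `latest_txg` and `rollback_check_command` are hypothetical, the input is assumed to be text in the `key=value` form that `zdb -l` prints, and the `-t` option is taken as-is from the commands above (it presumably requires the patched zdb).

```python
import re


def latest_txg(label_text):
    """Return the highest txg=<n> value found in `zdb -l` style output."""
    values = [int(m) for m in re.findall(r"^\s*txg=(\d+)\s*$", label_text, re.MULTILINE)]
    if not values:
        raise ValueError("no txg=<number> line found")
    return max(values)


def rollback_check_command(pool, txg):
    """Build the step 4 command line for the txg just before `txg`."""
    return ["zdb", "-cv", "-t", str(txg - 1), "-e", pool]


if __name__ == "__main__":
    # Sample text in the same shape as a zdb label dump.
    sample = "version=3\nname='tank1'\nstate=0\ntxg=1343356\n"
    txg = latest_txg(sample)
    print(" ".join(rollback_check_command("tank1", txg)))
    # prints: zdb -cv -t 1343355 -e tank1
```

The sketch only builds the command line; it does not run zdb or touch any device.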

Anyway, a lot of things could have failed by now, so please let me know
how far you got (and what the output of those commands was), so
that I can help you further.

Regards,
Ricardo

trunk-uberblock.patch

Ricardo M. Correia

Nov 15, 2008, 11:37:21 PM
to zfs-...@googlegroups.com
On Sun, 2008-11-16 at 05:34 +0100, Ricardo M. Correia wrote:
> 2) Recompile zfs-fuse (in debug mode):
>
> $ cd src
> $ scons debug=2

Sorry, I forgot to mention that you should also install the patched code
with "scons debug=2 install", otherwise one of the zdb commands in a
step below will fail.

- Ricardo

warren

Nov 16, 2008, 11:50:01 AM
to zfs-fuse
Hi Ricardo,

Thank you for helping me out with this issue again; it is greatly
appreciated!

I went through the steps you indicated, with the following results:

1) Applied patch successfully
2) Recompiled successfully
3) Found txg was '1343356'
4) Received error: 'zdb: can't open tank1: File exists'

I've included the full output below for reference. Any ideas?

Thanks again,
-Warren

===================================

1) Apply patch:

root@urquan:~/zfs/trunk# patch -p1 < trunk-uberblock.patch
patching file src/cmd/zdb/zdb.c
patching file src/lib/libzfscommon/include/sys/vdev_impl.h
patching file src/lib/libzpool/vdev_label.c

2) Recompile
[Compiled successfully, no errors reported]
scons: done building targets.

2a) Start zfs-fuse
[Started to support the zdb command in step 3]
hostname = urquan
hw_serial = 8323329
ncpus = 1
physmem = 388899 pages (1.48 GB)
pagesize = 4096, pageshift: 12
pwd_buflen = 1024, grp_buflen = 1024

3) Find out latest txg number of the pool

root@urquan:~/zfs/trunk/src# zdb -u -e tank1
zdb: can't open tank1: Input/output error

root@urquan:~/zfs/trunk/src# zdb -l /dev/sda
--------------------------------------------
LABEL 0
--------------------------------------------
version=3
name='tank1'
state=0
txg=1343356
pool_guid=2314521930077808218
top_guid=7554331498281353786
guid=5548940849682524809
vdev_tree
type='raidz'
id=0
guid=7554331498281353786
nparity=1
metaslab_array=14
metaslab_shift=31
ashift=9
asize=3000618713088
children[0]
type='disk'
id=0
guid=15613864033253702211
path='/dev/sda'
whole_disk=0
DTL=936
children[1]
type='disk'
id=1
guid=8347963192258921941
path='/dev/sdb'
whole_disk=0
DTL=935
children[2]
type='disk'
id=2
guid=5404007667257043155
path='/dev/sdc'
whole_disk=0
DTL=934
children[3]
type='disk'
id=3
guid=6853375767993526698
path='/dev/sdd'
whole_disk=0
DTL=933
children[4]
type='disk'
id=4
guid=5548940849682524809
path='/dev/sde'
whole_disk=0
DTL=937
children[5]
type='disk'
id=5
guid=17230078534950693853
path='/dev/sdf'
whole_disk=0
DTL=932
--------------------------------------------
LABEL 1
--------------------------------------------
version=3
name='tank1'
state=0
txg=1343356
pool_guid=2314521930077808218
top_guid=7554331498281353786
guid=5548940849682524809
vdev_tree
type='raidz'
id=0
guid=7554331498281353786
nparity=1
metaslab_array=14
metaslab_shift=31
ashift=9
asize=3000618713088
children[0]
type='disk'
id=0
guid=15613864033253702211
path='/dev/sda'
whole_disk=0
DTL=936
children[1]
type='disk'
id=1
guid=8347963192258921941
path='/dev/sdb'
whole_disk=0
DTL=935
children[2]
type='disk'
id=2
guid=5404007667257043155
path='/dev/sdc'
whole_disk=0
DTL=934
children[3]
type='disk'
id=3
guid=6853375767993526698
path='/dev/sdd'
whole_disk=0
DTL=933
children[4]
type='disk'
id=4
guid=5548940849682524809
path='/dev/sde'
whole_disk=0
DTL=937
children[5]
type='disk'
id=5
guid=17230078534950693853
path='/dev/sdf'
whole_disk=0
DTL=932
--------------------------------------------
LABEL 2
--------------------------------------------
version=3
name='tank1'
state=0
txg=1343356
pool_guid=2314521930077808218
top_guid=7554331498281353786
guid=5548940849682524809
vdev_tree
type='raidz'
id=0
guid=7554331498281353786
nparity=1
metaslab_array=14
metaslab_shift=31
ashift=9
asize=3000618713088
children[0]
type='disk'
id=0
guid=15613864033253702211
path='/dev/sda'
whole_disk=0
DTL=936
children[1]
type='disk'
id=1
guid=8347963192258921941
path='/dev/sdb'
whole_disk=0
DTL=935
children[2]
type='disk'
id=2
guid=5404007667257043155
path='/dev/sdc'
whole_disk=0
DTL=934
children[3]
type='disk'
id=3
guid=6853375767993526698
path='/dev/sdd'
whole_disk=0
DTL=933
children[4]
type='disk'
id=4
guid=5548940849682524809
path='/dev/sde'
whole_disk=0
DTL=937
children[5]
type='disk'
id=5
guid=17230078534950693853
path='/dev/sdf'
whole_disk=0
DTL=932
--------------------------------------------
LABEL 3
--------------------------------------------
version=3
name='tank1'
state=0
txg=1343356
pool_guid=2314521930077808218
top_guid=7554331498281353786
guid=5548940849682524809
vdev_tree
type='raidz'
id=0
guid=7554331498281353786
nparity=1
metaslab_array=14
metaslab_shift=31
ashift=9
asize=3000618713088
children[0]
type='disk'
id=0
guid=15613864033253702211
path='/dev/sda'
whole_disk=0
DTL=936
children[1]
type='disk'
id=1
guid=8347963192258921941
path='/dev/sdb'
whole_disk=0
DTL=935
children[2]
type='disk'
id=2
guid=5404007667257043155
path='/dev/sdc'
whole_disk=0
DTL=934
children[3]
type='disk'
id=3
guid=6853375767993526698
path='/dev/sdd'
whole_disk=0
DTL=933
children[4]
type='disk'
id=4
guid=5548940849682524809
path='/dev/sde'
whole_disk=0
DTL=937
children[5]
type='disk'
id=5
guid=17230078534950693853
path='/dev/sdf'
whole_disk=0
DTL=932
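
One detail in the label output above can be checked mechanically: the `guid=` line that appears before `vdev_tree` identifies the disk the label was read from, and each `children[N]` entry records the path that disk had when the label was written. Below is a minimal Python sketch, assuming a single label in the flattened `key=value` layout shown above; the function name `recorded_path_for_label` is hypothetical.

```python
import re


def recorded_path_for_label(label_text):
    """Return (guid, path) for the child whose guid matches the label's own guid.

    Assumes one label in the flattened key=value layout of `zdb -l` output.
    """
    own_guid = None
    children = []      # one dict per children[N] entry, in order of appearance
    in_tree = False
    for raw in label_text.splitlines():
        line = raw.strip()
        if line == "vdev_tree":
            in_tree = True
        elif re.fullmatch(r"children\[\d+\]", line):
            children.append({})
        elif "=" in line:
            key, value = line.split("=", 1)
            value = value.strip("'")
            if not in_tree and key == "guid":
                own_guid = value                 # guid of the disk holding this label
            elif children and key in ("guid", "path"):
                children[-1][key] = value        # belongs to the latest children[N]
    for child in children:
        if child.get("guid") == own_guid:
            return own_guid, child.get("path")
    return own_guid, None


if __name__ == "__main__":
    # Abbreviated copy of the label read from /dev/sda above.
    sample = (
        "guid=5548940849682524809\n"
        "vdev_tree\n"
        "type='raidz'\n"
        "guid=7554331498281353786\n"
        "children[0]\n"
        "guid=15613864033253702211\n"
        "path='/dev/sda'\n"
        "children[4]\n"
        "guid=5548940849682524809\n"
        "path='/dev/sde'\n"
    )
    print(recorded_path_for_label(sample))
```

Applied to the label read from /dev/sda above, the matching child is the one recorded as '/dev/sde', which suggests the device names were assigned differently when the label was last written. Whether that has any bearing on the failed import is not established here.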

4) Run zdb

root@urquan:~/zfs/trunk/src# zdb -cv -t 1343355 -e tank1
zdb: can't open tank1: File exists


Ricardo M. Correia

Nov 16, 2008, 1:38:03 PM
to zfs-...@googlegroups.com
On Sun, 2008-11-16 at 08:50 -0800, warren wrote:
> 4) Run zdb
>
> root@urquan:~/zfs/trunk/src# zdb -cv -t 1343355 -e tank1
> zdb: can't open tank1: File exists

Is the pool imported at this point? (you can check with
'zpool status').

If so, can you do 'zpool export tank1' and try running zdb
again?

- Ricardo


warren

Nov 16, 2008, 9:49:26 PM
to zfs-fuse
On Nov 16, 1:38 pm, "Ricardo M. Correia" <Ricardo.M.Corr...@Sun.COM>
wrote:

> Is the pool imported at this point? (you can check with
> 'zpool status').

> If so, can you do 'zpool export tank1' and try running zdb
> again?

Hmm, it looks like it was indeed imported. I exported it and ran zdb
again, but got a 'can't open tank1: No such device or address'
error:

root@urquan:~/zfs/trunk/src/cmd/zdb# zpool status
  pool: tank1
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-72
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       FAULTED      0     0     6  corrupted data
          raidz1    ONLINE       0     0     6
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
root@urquan:~/zfs/trunk/src/cmd/zdb# zpool export tank1
root@urquan:~/zfs/trunk/src/cmd/zdb# zdb -cv -t 1343355 -e tank1
zdb: can't open tank1: No such device or address


Ricardo M. Correia

Nov 18, 2008, 2:03:55 PM
to zfs-...@googlegroups.com
Hi warren,

On Sun, 2008-11-16 at 18:49 -0800, warren wrote:
> root@urquan:~/zfs/trunk/src/cmd/zdb# zdb -cv -t 1343355 -e tank1
> zdb: can't open tank1: No such device or address

I suspect this might take a few steps before we can recover the data.

I think exporting a pool *might* have caused some txgs to be written
(though I'm not sure about that), so we might have to check the txg
numbers again.

Can you provide me the output of these steps:

1) zdb -l /dev/sda

(This one will hopefully work now:)
2) zdb -u -e tank1 debug=on

3) zdb -u -t 1343355 -e tank1 debug=on

4) zdb -cv -t 1343355 -e tank1 debug=on

Thanks,
Ricardo


warren

Nov 18, 2008, 3:28:31 PM
to zfs-fuse
Hi Ricardo,

Thank you again for taking time to look at this. It is greatly
appreciated!

I ran the steps you indicated, but came across some errors in step
2. The raw output is included below:
root@urquan:~/zfs/trunk/src# zdb -u -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246413936 is at cv_timedwait at 234890.65 with
delta 1.00 secs
cv_timedwait: thread -1254806640 is at cv_timedwait at 234890.65 with
delta 1.00 secs
cv_timedwait: thread -1246413936 exited cv_timedwait at 234891.65 (rem
= 0.00)
cv_timedwait: thread -1246413936 is at cv_timedwait at 234891.65 with
delta 1.00 secs
cv_timedwait: thread -1254806640 exited cv_timedwait at 234891.65 (rem
= 0.00)
cv_timedwait: thread -1254806640 is at cv_timedwait at 234891.65 with
delta 1.00 secs
hdr_recl: hdr_recl called
cv_timedwait: thread -1246413936 exited cv_timedwait at 234891.74 (rem
= 0.91)
cv_timedwait: thread -1246413936 is at cv_timedwait at 234891.74 with
delta 1.00 secs
vdev_queue_io_to_issue: read T=436752 off= 29c00 agg= 2 old=
400 new= 800
vdev_queue_io_to_issue: read T=436752 off= 2b000 agg= 84 old=
400 new=15000
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off= 24000 agg= 13 old=
400 new= 3400
vdev_queue_io_to_issue: read T=436752 off= 28000 agg= 9 old=
400 new= 2400
vdev_queue_io_to_issue: read T=436752 off= 2ac00 agg= 5 old=
400 new= 1400
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off= 2c800 agg= 78 old=
400 new=13800
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off= 2d800 agg= 4 old=
400 new= 1000
vdev_queue_io_to_issue: read T=436752 off= 2f400 agg= 67 old=
400 new=10c00
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off= 28c00 agg= 19 old=
400 new= 4c00
vdev_queue_io_to_issue: read T=436752 off= 23400 agg= 4 old=
400 new= 1000
vdev_queue_io_to_issue: read T=436752 off= 2e800 agg= 70 old=
400 new=11800
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off= 25000 agg=108 old=
400 new=1b000
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg= 71
old= 400 new=11c00
vdev_queue_io_to_issue: read T=436752 off= 30c00 agg= 3 old=
400 new= c00
vdev_queue_io_to_issue: read T=436752 off= 32400 agg= 55 old=
400 new= dc00
vdev_queue_io_to_issue: read T=436752 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436752 off=7470bb1c00 agg= 57
old= 400 new= e400
vdev_queue_io_to_issue: read T=436752 off=7470be0000 agg=128
old= 400 new=20000
dmu_objset_open_impl: reading [L0 DMU objset] 400L/200P DVA[0]
=<0:1639a5b0c00:400> DVA[1]=<0:9822977c00:400> DVA[2]
=<0:2c0076c400:400> fletcher4 lzjb LE contiguous birth=3443226
fill=938 cksum=a863aa6e0:4497cadde62:e2f5f8df197b:1fc543dc9018be
dbuf_create: ds=mos obj=mdn lvl=1 blkid=0 db=0x8375ce78
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
arc_evict: only evicted 1024 bytes from 81ac500
arc_evict_ghost: only deleted 1024 bytes from 0x81ac560
arc_evict_ghost: only deleted 0 bytes from 0x81ac620
spa_load: spa_load(): error 5 in dsl_pool_open()
spa_load: spa_load(): error 5
vdev_queue_io_to_issue: read T=436753 off= 24800 agg= 2 old=
400 new= 800
vdev_queue_io_to_issue: read T=436753 off= 23800 agg= 22 old=
400 new= 5800
vdev_queue_io_to_issue: read T=436753 off= 26000 agg=104 old=
400 new=1a000
vdev_queue_io_to_issue: read T=436753 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off= 2a000 agg= 88 old=
400 new=16000
vdev_queue_io_to_issue: read T=436753 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg= 73
old= 400 new=12400
vdev_queue_io_to_issue: read T=436753 off= 21000 agg= 14 old=
400 new= 3800
vdev_queue_io_to_issue: read T=436753 off= 23000 agg= 19 old=
400 new= 4c00
vdev_queue_io_to_issue: read T=436753 off= 25800 agg=106 old=
400 new=1a800
vdev_queue_io_to_issue: read T=436753 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470bf2c00 agg= 53
old= 400 new= d400
vdev_queue_io_to_issue: read T=436753 off= 28c00 agg= 93 old=
400 new=17400
vdev_queue_io_to_issue: read T=436753 off= 60000 agg= 9 old=
400 new= 2400
vdev_queue_io_to_issue: read T=436753 off= 66800 agg= 4 old=
400 new= 1000
vdev_queue_io_to_issue: read T=436753 off= 21400 agg= 2 old=
400 new= 800
vdev_queue_io_to_issue: read T=436753 off= 68400 agg= 95 old=
400 new=17c00
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off= 22800 agg=118 old=
400 new=1d800
vdev_queue_io_to_issue: read T=436753 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off= 2ac00 agg= 5 old=
400 new= 1400
vdev_queue_io_to_issue: read T=436753 off= 2cc00 agg= 77 old=
400 new=13400
vdev_queue_io_to_issue: read T=436753 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470ba0000 agg=128
old= 400 new=20000
vdev_queue_io_to_issue: read T=436753 off=7470be0000 agg= 22
old= 400 new= 5800
cv_timedwait: thread -1254806640 exited cv_timedwait at 234892.66 (rem
= -0.01)
cv_timedwait: thread -1254806640 is at cv_timedwait at 234892.66 with
delta 1.00 secs
vdev_queue_io_to_issue: read T=436753 off=7470be5800 agg=106
old= 400 new=1a800
dmu_objset_open_impl: reading [L0 DMU objset] 400L/200P DVA[0]
=<0:1639a5b0c00:400> DVA[1]=<0:9822977c00:400> DVA[2]
=<0:2c0076c400:400> fletcher4 lzjb LE contiguous birth=3443226
fill=938 cksum=a863aa6e0:4497cadde62:e2f5f8df197b:1fc543dc9018be
dbuf_create: ds=mos obj=mdn lvl=1 blkid=0 db=0x8375ce78
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
cv_timedwait: thread -1246413936 exited cv_timedwait at 234892.74 (rem
= 0.00)
cv_timedwait: thread -1246413936 is at cv_timedwait at 234892.74 with
delta 1.00 secs
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
arc_evict: only evicted 1024 bytes from 81ac500
arc_evict_ghost: only deleted 1024 bytes from 0x81ac560
arc_evict_ghost: only deleted 0 bytes from 0x81ac620
spa_load: spa_load(): error 5 in dsl_pool_open()
spa_load: spa_load(): error 5
zdb: can't open tank1: Input/output error



root@urquan:~/zfs/trunk/src# zdb -u -t 1343355 -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246717040 is at cv_timedwait at 234937.67 with
delta 1.00 secs
cv_timedwait: thread -1255109744 is at cv_timedwait at 234937.67 with
delta 1.00 secs
zdb: can't open tank1: File exists



root@urquan:~/zfs/trunk/src# zdb -cv -t 1343355 -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246352496 is at cv_timedwait at 234948.98 with
delta 1.00 secs
cv_timedwait: thread -1254745200 is at cv_timedwait at 234948.98 with
delta 1.00 secs
zdb: can't open tank1: File exists

Ricardo M. Correia

Nov 18, 2008, 3:48:36 PM
to zfs-...@googlegroups.com
Hi warren,

Thanks for the information.

I just noticed that the zdb commands below failed with the error "File
exists" again, which isn't very useful.

This indicates the pool has been somehow imported again. Can you check
if that's the case? If so, can you export the pool again and rerun the
zdb commands?
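
For reference, the three messages zdb has printed so far in this thread read like the standard C library descriptions of errno values. The small Python sketch below looks them up on the local platform. It says nothing about why zdb returns them; reading "File exists" as "the pool is already imported" comes from the message above, not from the code. The names `SEEN` and `describe` are hypothetical.

```python
import errno
import os

# errno constants whose usual descriptions match the zdb messages in this thread
SEEN = {
    "File exists": errno.EEXIST,
    "Input/output error": errno.EIO,
    "No such device or address": errno.ENXIO,
}


def describe(code):
    """Return 'NAME (number): text' for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return "%s (%d): %s" % (name, code, os.strerror(code))


if __name__ == "__main__":
    for message, code in SEEN.items():
        print("%-28s -> %s" % (message, describe(code)))
```

On Linux, EIO has the value 5, which is consistent with the `spa_load(): error 5` lines in the debug output being followed by "Input/output error".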

Also, do you have any idea how the pool ended up imported? AFAIK, the
zdb commands shouldn't have imported the pool. Did you run "zpool
import" in the meantime?

Thanks,
Ricardo

warren

Nov 18, 2008, 4:16:38 PM
to zfs-fuse
Hi Ricardo,

It is possible I ran a zpool import in the meantime. I apologize if
this caused any confusion.

I've rebooted the machine and started from scratch to reduce any
unknown variables. It looks like I still get the 'File exists' error
on the second zdb command (the first one gets further, but does
eventually report an error). I ran 'zpool status' at that point and
it did not appear that any pools were imported.

I've included the console log below, captured immediately after
starting zfs-fuse.

Thanks again,
-Warren

root@urquan:~# zpool status
  pool: tank1
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-72
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       FAULTED      0     0     6  corrupted data
          raidz1    ONLINE       0     0     6
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0


root@urquan:~# zpool export tank1


root@urquan:~# zdb -l /dev/sda
root@urquan:~# zdb -u -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246315632 is at cv_timedwait at 300.32 with
delta 1.00 secs
cv_timedwait: thread -1254708336 is at cv_timedwait at 300.32 with
delta 1.00 secs
cv_timedwait: thread -1246315632 exited cv_timedwait at 301.32 (rem =
0.00)
cv_timedwait: thread -1246315632 is at cv_timedwait at 301.32 with
delta 1.00 secs
cv_timedwait: thread -1254708336 exited cv_timedwait at 301.32 (rem =
0.00)
cv_timedwait: thread -1254708336 is at cv_timedwait at 301.32 with
delta 1.00 secs
hdr_recl: hdr_recl called
cv_timedwait: thread -1246315632 exited cv_timedwait at 301.47 (rem =
0.85)
cv_timedwait: thread -1246315632 is at cv_timedwait at 301.47 with
delta 1.00 secs
vdev_queue_io_to_issue: read T=560 off= 22800 agg= 7 old=
400 new= 1c00
vdev_queue_io_to_issue: read T=560 off= 25400 agg=107 old=
400 new=1ac00
vdev_queue_io_to_issue: read T=560 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470be0000 agg= 67 old=
400 new=10c00
vdev_queue_io_to_issue: read T=560 off= 21000 agg= 29 old=
400 new= 7400
vdev_queue_io_to_issue: read T=560 off=7470bf1400 agg= 59 old=
400 new= ec00
vdev_queue_io_to_issue: read T=560 off= 3b800 agg= 3 old=
400 new= c00
vdev_queue_io_to_issue: read T=560 off= 24c00 agg= 15 old=
400 new= 3c00
vdev_queue_io_to_issue: read T=560 off= 3d000 agg= 12 old=
400 new= 3000
vdev_queue_io_to_issue: read T=560 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off= 21400 agg= 3 old=
400 new= c00
vdev_queue_io_to_issue: read T=560 off= 29800 agg= 90 old=
400 new=16800
vdev_queue_io_to_issue: read T=560 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off= 28800 agg= 39 old=
400 new= 9c00
vdev_queue_io_to_issue: read T=560 off= 21000 agg= 9 old=
400 new= 2400
vdev_queue_io_to_issue: read T=560 off= 33400 agg= 51 old=
400 new= cc00
vdev_queue_io_to_issue: read T=560 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=560 off= 24400 agg=104 old=
400 new=1a000
vdev_queue_io_to_issue: read T=561 off= 3f000 agg= 4 old=
400 new= 1000
vdev_queue_io_to_issue: read T=561 off= 60000 agg= 57 old=
400 new= e400
vdev_queue_io_to_issue: read T=561 off= 2a000 agg= 5 old=
400 new= 1400
vdev_queue_io_to_issue: read T=561 off= 6e800 agg= 70 old=
400 new=11800
vdev_queue_io_to_issue: read T=561 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=561 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=561 off= 2c000 agg= 80 old=
400 new=14000
vdev_queue_io_to_issue: read T=561 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=561 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=561 off=7470be0000 agg=128 old=
400 new=20000
dmu_objset_open_impl: reading [L0 DMU objset] 400L/200P DVA[0]
=<0:1639a5b0c00:400> DVA[1]=<0:9822977c00:400> DVA[2]
=<0:2c0076c400:400> fletcher4 lzjb LE contiguous birth=3443226
fill=938 cksum=a863aa6e0:4497cadde62:e2f5f8df197b:1fc543dc9018be
dbuf_create: ds=mos obj=mdn lvl=1 blkid=0 db=0x83850e78
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
arc_evict: only evicted 1024 bytes from 81ac500
arc_evict_ghost: only deleted 1024 bytes from 0x81ac560
arc_evict_ghost: only deleted 0 bytes from 0x81ac620
spa_load: spa_load(): error 5 in dsl_pool_open()
spa_load: spa_load(): error 5
vdev_queue_io_to_issue: read T=561 off= 25c00 agg= 3 old=
400 new= c00
vdev_queue_io_to_issue: read T=562 off= 26c00 agg= 38 old=
400 new= 9800
vdev_queue_io_to_issue: read T=561 off= 27400 agg= 99 old=
400 new=18c00
vdev_queue_io_to_issue: read T=561 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off= 31400 agg= 59 old=
400 new= ec00
vdev_queue_io_to_issue: read T=562 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off= 26c00 agg= 40 old=
400 new= a000
vdev_queue_io_to_issue: read T=562 off= 31c00 agg= 57 old=
400 new= e400
vdev_queue_io_to_issue: read T=562 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
cv_timedwait: thread -1254708336 exited cv_timedwait at 302.32 (rem =
0.00)
cv_timedwait: thread -1254708336 is at cv_timedwait at 302.32 with
delta 1.00 secs
vdev_queue_io_to_issue: read T=562 off= 2d000 agg= 5 old=
400 new= 1400
vdev_queue_io_to_issue: read T=562 off= 21400 agg= 2 old=
400 new= 800
vdev_queue_io_to_issue: read T=562 off= 2f000 agg= 68 old=
400 new=11000
vdev_queue_io_to_issue: read T=562 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off= 22000 agg= 12 old=
400 new= 3000
vdev_queue_io_to_issue: read T=562 off= 22c00 agg=117 old=
400 new=1d400
vdev_queue_io_to_issue: read T=562 off= 60000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off= 26000 agg=104 old=
400 new=1a000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
vdev_queue_io_to_issue: read T=562 off= 60000 agg= 68 old=
400 new=11000
vdev_queue_io_to_issue: read T=562 off= 72000 agg= 56 old=
400 new= e000
vdev_queue_io_to_issue: read T=562 off=7470ba0000 agg=108 old=
400 new=1b000
vdev_queue_io_to_issue: read T=562 off=7470bbb800 agg= 7 old=
400 new= 1c00
vdev_queue_io_to_issue: read T=562 off=7470bbd800 agg= 10 old=
400 new= 2800
vdev_queue_io_to_issue: read T=562 off=7470be0000 agg=128 old=
400 new=20000
dmu_objset_open_impl: reading [L0 DMU objset] 400L/200P DVA[0]
=<0:1639a5b0c00:400> DVA[1]=<0:9822977c00:400> DVA[2]
=<0:2c0076c400:400> fletcher4 lzjb LE contiguous birth=3443226
fill=938 cksum=a863aa6e0:4497cadde62:e2f5f8df197b:1fc543dc9018be
dbuf_create: ds=mos obj=mdn lvl=1 blkid=0 db=0x83850e78
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
cv_timedwait: thread -1246315632 exited cv_timedwait at 302.47 (rem =
0.00)
cv_timedwait: thread -1246315632 is at cv_timedwait at 302.47 with
delta 1.00 secs
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
vdev_mirror_io_done: retrying i/o (err=52) on child raidz
vdev_raidz_io_done: rereading
vdev_raidz_io_done: rereading
arc_evict: only evicted 1024 bytes from 81ac500
arc_evict_ghost: only deleted 1024 bytes from 0x81ac560
arc_evict_ghost: only deleted 0 bytes from 0x81ac620
spa_load: spa_load(): error 5 in dsl_pool_open()
spa_load: spa_load(): error 5
zdb: can't open tank1: Input/output error


root@urquan:~# zdb -u -t 1343355 -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246844016 is at cv_timedwait at 316.34 with
delta 1.00 secs
cv_timedwait: thread -1255236720 is at cv_timedwait at 316.34 with
delta 1.00 secs
zdb: can't open tank1: File exists



root@urquan:~# zpool status
no pools available



root@urquan:~# zdb -cv -t 1343355 -e tank1 debug=on
kernel_init: physmem = 388899 pages (1.48 GB)
cv_timedwait: thread -1246995568 is at cv_timedwait at 372.28 with
delta 1.00 secs
cv_timedwait: thread -1255388272 is at cv_timedwait at 372.28 with

Jonathan Schmidt

Nov 18, 2008, 4:26:59 PM
to zfs-...@googlegroups.com
Hi warren,

I know very little about the zdb commands but it looks like you are
getting I/O errors. Do you know for sure that these drives are
functional? Maybe they sustained some damage during the power outage?
Does 'dmesg' give you any hints (or smartctl)? Sorry if that's already
been asked.

Jonathan

warren

Nov 18, 2008, 4:40:52 PM
to zfs-fuse
On Nov 18, 4:26 pm, "Jonathan Schmidt" <j...@jschmidt.ca> wrote:
> Do you know for sure that these drives are
> functional?  Maybe they sustained some damage during the power outage?
> Does 'dmesg' give you any hints (or smartctl)?

Hi Jonathan,

I appreciate all of the help I can get.

I ran smartctl on all drives in the array, and they all came back as
PASSED.

I did see something odd in the logs however:

[ 25.314941] sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors
(500108 MB)
[ 25.314957] sd 1:0:0:0: [sdb] Write Protect is off
[ 25.314960] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 25.314983] sd 1:0:0:0: [sdb] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[ 25.315037] sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors
(500108 MB)
[ 25.315050] sd 1:0:0:0: [sdb] Write Protect is off
[ 25.315053] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 25.315074] sd 1:0:0:0: [sdb] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[ 25.315079] sdb:<4>GPT:Primary header thinks Alt. header is not at
the end of the disk.
[ 25.352630] GPT:976760063 != 976773167
[ 25.352632] GPT:Alternate GPT header not at the end of the disk.
[ 25.352635] GPT:976760063 != 976773167
[ 25.352636] GPT: Use GNU Parted to correct GPT errors.
[ 25.352639] sdb1 sdb9

It looks like the partition data could be suspect. I'm using the
drives as raw disks, and not using LVM or anything like that. From
what I read, ZFS doesn't need the drive to be formatted, but rather
handles that itself. I've not touched the drives with fdisk, gparted,
etc.

It looks like it may have occurred on 3 of the 6 drives (sdb, sdd, sde):

root@urquan:~# dmesg | grep GPT
[ 25.315079] sdb:<4>GPT:Primary header thinks Alt. header is not at
the end of the disk.
[ 25.352630] GPT:976760063 != 976773167
[ 25.352632] GPT:Alternate GPT header not at the end of the disk.
[ 25.352635] GPT:976760063 != 976773167
[ 25.352636] GPT: Use GNU Parted to correct GPT errors.
[ 25.376767] sdd:<4>GPT:Primary header thinks Alt. header is not at
the end of the disk.
[ 25.412162] GPT:976760063 != 976773167
[ 25.412164] GPT:Alternate GPT header not at the end of the disk.
[ 25.412166] GPT:976760063 != 976773167
[ 25.412168] GPT: Use GNU Parted to correct GPT errors.
[ 25.413080] sde:<4>GPT:Primary header thinks Alt. header is not at
the end of the disk.
[ 25.452709] GPT:976760063 != 976773167
[ 25.452711] GPT:Alternate GPT header not at the end of the disk.
[ 25.452713] GPT:976760063 != 976773167
[ 25.452715] GPT: Use GNU Parted to correct GPT errors.
[ 25.494585] GPT:Primary header thinks Alt. header is not at the end
of the disk.
[ 25.494590] GPT:976760063 != 976773167
[ 25.494592] GPT:Alternate GPT header not at the end of the disk.
[ 25.494594] GPT:976760063 != 976773167
[ 25.494595] GPT: Use GNU Parted to correct GPT errors.
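
For anyone reading along, the two sector numbers in those GPT messages can be compared with a little shell arithmetic. This is only a sketch: the LBAs and the 512-byte sector size are copied from the dmesg output above, the variable names are made up for illustration, and it says nothing about why the numbers differ.

```shell
#!/bin/bash
# Values copied from the dmesg output above.
alt_header_lba=976760063   # where the primary GPT header says the backup header is
last_lba=976773167         # last sector of the disk according to the kernel
sector_bytes=512           # "512-byte hardware sectors" from the sd lines

# Distance between the recorded backup header and the real end of the disk.
gap_sectors=$(( last_lba - alt_header_lba ))
gap_bytes=$(( gap_sectors * sector_bytes ))

echo "gap: $gap_sectors sectors ($gap_bytes bytes)"
```

That works out to 13104 sectors, or about 6.4 MiB, and it is the same figure on every affected drive.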

Ruben Wisniewski

Nov 18, 2008, 4:58:43 PM11/18/08
to zfs-...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

warren wrote:
> root@urquan:~/zfs/trunk/src# zdb -l /dev/sda
> LABEL 3
> version=3
> name='tank1'
> vdev_tree
> type='raidz'
> id=0
> children[0]
> path='/dev/sda'
> whole_disk=0
> children[1]
> path='/dev/sdb'
> whole_disk=0
> children[2]
> path='/dev/sdc'
> whole_disk=0
> children[3]
> path='/dev/sdd'
> whole_disk=0
> children[4]
> path='/dev/sde'
> whole_disk=0
> children[5]
> path='/dev/sdf'
> whole_disk=0

warren wrote:
> I'm using the drives as raw disks, and not using LVM or anything like
> that.

Shouldn't whole_disk be 1 if you use the whole drive for ZFS?


Greetings Ruben

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJIzqTA71SGzTeS3ARAlcbAKCEw4UahALyOF23424JuEGYZ4HcWwCeNAGb
PX+2f1i/BzfjGQkRb0YT320=
=PYnK
-----END PGP SIGNATURE-----

Jonathan Schmidt

Nov 18, 2008, 5:06:35 PM11/18/08
to zfs-...@googlegroups.com
> It looks like the partition data could be suspect. I'm using the
> drives as raw disks, and not using LVM or anything like that. From
> what I read, ZFS doesn't need the drive to be formatted, but rather
> handles that itself. I've not touched the drives with fdisk, gparted,
> etc.

Yes, ZFS does not need disks to be partitioned. I have 4 disks in my
zpool, two of them are partitioned and two are using the raw disk. I
don't have any GPT errors in my dmesg, but it wouldn't surprise me if they
occurred normally. That isn't an authoritative answer, by the way, since
I know nothing about ZFS's on-disk format, just that it *could* be
incompatible with what the kernel expects.

As a general note, I have decided to give ZFS partitions instead of the
full raw disk. It allows me some flexibility to do a few things:

- Linux software RAID1 across a ~100MB partition on each disk for /boot
- A 1GB swap partition on each disk, with equal priorities so the kernel
stripes across them
- (Note: some people have pointed out that the swap reliability is
actually quite LOW now, since it is basically an n-disk RAID0. Agreed,
but it's sure fast! Caveat emptor.)
- Partition table is complete and correct so as to not confuse the OS
- Doesn't waste much space

Jonathan Schmidt

Nov 18, 2008, 5:12:53 PM11/18/08
to zfs-...@googlegroups.com

From my working zpool: looks like whole_disk = 0. Maybe that flag means
something else? Good catch regardless :)

# zpool status

pool: tank
state: ONLINE
scrub: none requested
config:

NAME                                                          STATE   READ WRITE CKSUM
tank                                                          ONLINE     0     0     0
  disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855        ONLINE     0     0     0
  disk/by-id/scsi-SATA_WD1000FYPS-12ZKWCASJ0330645-part2      ONLINE     0     0     0
  disk/by-id/scsi-SATA_WD1000FYPS-12ZKWCASJ0330721-part2      ONLINE     0     0     0
  disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW4431098-part2  ONLINE     0     0     0

errors: No known data errors
# zdb -l /dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855
--------------------------------------------
LABEL 0
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 1
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 2
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 3
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
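
Since each device carries four copies of the label, a quick way to see whether they agree on a field such as txg or whole_disk is to count the distinct values. A minimal sketch, assuming the `key=value` layout shown in the output above; `label_field` is a made-up helper name and the device path in the usage comment is a placeholder.

```shell
#!/bin/bash
# Sketch: summarise one field of `zdb -l` output read from stdin.
# Prints each distinct "key=value" line together with how often it occurs.
label_field() {
	local key="$1"
	# Anchor on the start of the line so "guid" does not match "pool_guid".
	grep -E "^[[:space:]]*${key}=" | sed -E 's/^[[:space:]]*//' | sort | uniq -c
}

# Typical use (device path is a placeholder):
#   zdb -l /dev/sdX | label_field txg
#   zdb -l /dev/sdX | label_field whole_disk
```

On a device whose four labels agree, each field should come back as a single value counted four times. More than one distinct txg on the same device would suggest the labels were not all updated together.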

Omen Wild

Nov 19, 2008, 4:39:12 PM11/19/08
to zfs-...@googlegroups.com
Quoting Jonathan Schmidt <j...@jschmidt.ca> on Tue, Nov 18 14:06:
>
> Yes, ZFS does not need disks to be partitioned. I have 4 disks in my
> zpool, two of them are partitioned and two are using the raw disk.
[ snip ]

> As a general note, I have decided to give ZFS partitions instead of the
> full raw disk. It allows me some flexibility to do a few things:
>
> - Linux software RAID1 across a ~100MB partition on each disk for /boot

Does this mean you have an initrd that can do ZFS boot? I am looking to
set up a new server and was hoping to do this, but would like to hear
some success stories before tackling it. Can you share your technique?

Thanks,
Omen

--
A flashlight is a case for holding dead batteries.

signature.asc

Jonathan Schmidt

Nov 19, 2008, 5:13:33 PM11/19/08
to zfs-...@googlegroups.com

Sorry, no, I didn't mean to imply that. My /boot is ext2 and I have an
ext3 root filesystem with my OS installed. ZFS is used for bulk data
storage only.

Besides, zfs-fuse is a bit slow to run root off of. I'd love to do it
too, but I've resigned myself to just rsync'ing a copy of it from ext3
root to a zfs filesystem in a cron script (and snapshotting it that way).
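
For illustration, the rsync-then-snapshot approach could look roughly like the sketch below. It is not an existing script: the dataset name `tank/rootbackup`, its mountpoint, the date-stamped snapshot name and the `DRY_RUN` switch are all assumptions, and by default it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: dataset name and mountpoint are assumptions, adjust to taste.
DATASET="${DATASET:-tank/rootbackup}"
TARGET="${TARGET:-/tank/rootbackup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = just print the commands, 0 = really run them

# Either print a command or execute it, depending on DRY_RUN.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

backup_root() {
	local stamp
	stamp=$(date +%Y%m%d)
	# -a archive mode, -H keep hard links, -x stay on the root filesystem,
	# --delete so files removed from / also disappear from the copy.
	run rsync -aHx --delete / "$TARGET/" &&
	# Snapshot only if the rsync step succeeded.
	run zfs snapshot "$DATASET@$stamp"
}

backup_root
```

Run from cron with `DRY_RUN=0` once the printed commands look right. A year-month-day stamp keeps the snapshot names unique and sortable.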

Ruben Wisniewski

Nov 19, 2008, 6:05:30 PM11/19/08
to zfs-...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jonathan Schmidt wrote:
> Besides, zfs-fuse is a bit slow to run root off of. I'd love to do it
> too, but I've resigned myself to just rsync'ing a copy of it from ext3
> root to a zfs filesystem in a cron script (and snapshotting it that way).

You've written down my thoughts exactly (about speed). I have a similar
solution, but with XFS instead of ext2/3. Do you use your own script for
the backup, and does it support snapshots in ZFS?

If not, could you please provide a link =)


Greetings Ruben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJJJu5A71SGzTeS3ARAidqAJ9lEeb0+ZolastJ7ZBIM45TuPWD5QCfc6Gi
KZ8F2XH11vVM1eNSE/SfqiU=
=nnvg
-----END PGP SIGNATURE-----

Jonathan Schmidt

Nov 19, 2008, 6:37:28 PM11/19/08
to zfs-...@googlegroups.com
> Jonathan Schmidt wrote:
>> Besides, zfs-fuse is a bit slow to run root off of. I'd love to do it
>> too, but I've resigned myself to just rsync'ing a copy of it from ext3
>> root to a zfs filesystem in a cron script (and snapshotting it that
>> way).
> You've written down my thoughts exactly (about speed). I have a similar
> solution, but with XFS instead of ext2/3. Do you use your own script for
> the backup, and does it support snapshots in ZFS?
>
> If not, could you please provide a link =)

The script was a bit of a lie (sorry). I haven't actually written the
script yet, it's just a few manual commands I type. But I fully *plan* to
script it up, and I doubt it'd be more than 10 minutes of effort, but it's
been tough to find those 10 minutes for some reason. Once I get it
running I'll post it for everyone.

sghe...@hotmail.com

Nov 19, 2008, 6:59:33 PM11/19/08
to zfs-...@googlegroups.com
I have a very similar setup, and as it happens I recently upgraded my box to Intrepid, which requires me to rewrite the nightly backup at the moment.

I have decided this time to go with xfs on lvm for all 'live' file-systems. xfs_freeze+xfs_growfs+lvm2 are a really nice team.
As such, ZFS is now in the backseat (I used to run most including /usr directly on zfs-fuse) for backups. I am still working on this script. It is rather crude and verbose, not very performance-conscious (yet) and it focuses on syncing my off-site backup first.
Oh and PS. I use the 'detect-renamed' patch on rsync 3.0.4 for obvious reasons.
The astute reader will also note that I use .rsync-filter rules files to mark no-backup files, and that at the moment I do not sync my home dirs off-site.

In your mind replace all remote rsyncs with rsyncs to zfs and then issue
                zfs snapshot -r mypool@`date +%b%d`
or something similar to your taste and you'll get my intended setup!

One final note: I'm not happy with write performance on LVM2 w/snapshots... which is why I intend to keep the XFS snapshot short-lived. However, I'll need an atomic 'fast' XFS snapshot (using lvm) anyway, even when rsyncing to ZFS, because under certain circumstances the rsync XFS->ZFS might take longer than I'm willing to freeze the source file-system for.

Here goes without any further comment:

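#!/bin/bash
# Nightly backup: snapshot each LVM volume, mount the snapshots read-only,
# then rsync them to the off-site host.
set -o xtrace
BWLIMIT=0	# 0 means no bandwidth limit
RSYNCFLAGS="--bwlimit=$BWLIMIT --delete -zhxDPavilFHy --stats --detect-renamed"
SERVICES='dovecot postfix apache2 imapproxy'

LVS='home repositories root varmail varwww'

# Remove any snapshots left over from the previous run.
for LV in $LVS
do
	umount -f "/dev/kooluvg/$LV-nightly"
	lvremove -f "kooluvg/$LV-nightly"
done

# Stop the services only for as long as it takes to create the snapshots.
time {
	echo "Start snapshot at $(date)"
	for service in $SERVICES; do "/etc/init.d/$service" stop; done

	time for LV in $LVS
	do
		lvcreate -s -L 500m -n "$LV-nightly" "kooluvg/$LV"
		mkdir -pv "/media/nightly/$LV"
		# nouuid lets XFS mount a snapshot that shares its origin's UUID
		mount -o ro,nouuid "/dev/kooluvg/$LV-nightly" "/media/nightly/$LV"
	done

	for service in $SERVICES; do "/etc/init.d/$service" start; done
}

# $RSYNCFLAGS is deliberately unquoted so that it splits into separate options.
(cd /media/nightly/repositories/ && rsync $RSYNCFLAGS -R svn/ trac/ repo/ HIP/ vpsland:/)
rsync $RSYNCFLAGS /media/nightly/varmail/ vpsland:/var/mail/
rsync $RSYNCFLAGS /media/nightly/varwww/  vpsland:/var/www/

# This exit makes the cleanup loop below unreachable, so the snapshots stay
# mounted until the removal loop at the top of the next run.
exit 0

for LV in $LVS
do
	umount -f "/dev/kooluvg/$LV-nightly"
	lvremove -f "kooluvg/$LV-nightly"
done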

sghe...@hotmail.com

Nov 19, 2008, 7:03:47 PM11/19/08
to zfs-...@googlegroups.com
I'm sorry, I should have known not to send my nifty syntax-highlighted
version of this shell script to google-groups... It comes out
rather distorted. Here's a better version (I hope)

Chris Samuel

Nov 20, 2008, 1:13:29 PM11/20/08
to zfs-...@googlegroups.com

----- "Ruben Wisniewski" <cyr...@gmail.com> wrote:


> If not, could you please provide a link =)

I've got one scripted up at home (I'm at SC'08 in Austin at the
moment), I'll try and remember to post it when I get back!

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Rudd-O

Nov 21, 2008, 12:55:51 PM11/21/08
to zfs-fuse
Ha, detail by detail, exactly what I ended up doing myself!

warren

Nov 22, 2008, 9:15:47 AM11/22/08
to zfs-fuse
So, with regard to the issue of metadata corruption, does anyone else
have any ideas? I'm hoping to recover at least some of the data from
the array. Also, has anyone else seen those GPT errors in their
syslog on zfs disks? Thanks!

Chris Samuel

Nov 30, 2008, 6:09:07 AM11/30/08
to zfs-...@googlegroups.com
On Fri, 21 Nov 2008 5:13:29 am Chris Samuel wrote:

> I've got one scripted up at home (I'm at SC'08 in Austin at the
> moment), I'll try and remember to post it when I get back!

OK, here we go, this also relies on my ZFS upstart script for Ubuntu to
start/stop ZFS (and mount/unmount) automatically which I've also attached.

They will both need editing for your systems!

cheers,
Chris


--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

zfs
zsnapshot
signature.asc