On Sat, 2008-11-15 at 19:32 -0800, warren wrote:
> During a power outage a while ago my system went down. I was unable
> to bring my zfs pool back online. I've upgraded to the most recent
> version of zfs, but still haven't had any luck. I posted some
> messages about this a while ago, but had to sideline the recovery for
> a while, but am now back at it to see if I can get some of the data
> back.
I just reread your old thread and saw that you were running zfs-fuse
0.4.0 beta 1.
That version is very old, and it had a serious bug that can cause
corruption during power outages, which is very likely the bug that you
hit.
> I am running on Ubuntu Gutsy Gibbon, and have 6 500GB drives in a
> RAIDZ1 pool.
>
> If I try to import, I get the following error. Any thoughts? Thanks
> in advance!
I'm going to need you to do a few steps. Please make sure you capture
the output of the commands below, so that I can take a look in case
something fails.
1) Apply the patch that I've attached on top of the latest trunk
version:
$ cd zfs-fuse
$ patch -p1 < trunk-uberblock.patch
2) Recompile zfs-fuse (in debug mode):
$ cd src
$ scons debug=2
3) Find out the latest txg number of your pool. You can do this in 2
ways.
The preferred way is to use "zdb -u", like this:
(make sure zfs-fuse is running first)
$ zdb -u -e tank1
Or, if this fails, you can use "zdb -l":
$ zdb -l /dev/sda
4) The zdb commands above will report a "txg=<xxxxx>" number.
Now you'll need to try to see if you can read the pool using *the
previous* txg number, like this:
$ zdb -cv -t <yyyyy> -e tank1
(where yyyyy is xxxxx-1)
So, for example, if the txg number that we found out in the step 3)
above is 1343356, you would try txg number 1343355, like this:
$ zdb -cv -t 1343355 -e tank1
5) At this point, if the above command succeeds and checksums
everything, it means the pool is recoverable (but we haven't recovered
it yet).
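
The txg arithmetic in steps 3 and 4 can be sketched as a small shell helper. This is only an illustration: the function name prev_txg is hypothetical, and it assumes the zdb output carries the number as "txg=<number>" (optionally with spaces around the "="), as in the label output.

```shell
#!/bin/bash
# Hypothetical helper (not part of zfs-fuse): read zdb output on stdin,
# pick the highest "txg=<number>" value found, and print that value minus one.
prev_txg() {
    local latest
    # -w keeps fields such as create_txg from matching; -o prints only the match.
    latest=$(grep -Eow 'txg ?= ?[0-9]+' | grep -Eo '[0-9]+$' | sort -n | tail -n 1)
    # Fail if no txg number was present in the input.
    [ -n "$latest" ] || return 1
    echo $((latest - 1))
}
```

It would be used as "zdb -l /dev/sda | prev_txg", with the printed number then passed to "zdb -cv -t".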
Anyway, a lot of things could have failed by now, so please let me know
how far you've gotten (and what the output of those commands was), so
that I can help you further.
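
One possible way to capture the output of each command is a small wrapper that logs the command line, its output, and its exit status to a file. This is a sketch only; the function name run_logged and the log file name are made up for illustration.

```shell
#!/bin/bash
# Hypothetical log file name; override by setting LOG before use.
LOG=${LOG:-zdb-session.log}

# Run a command, showing its output on the terminal and appending
# the command line, the output (stdout and stderr), and the exit status to $LOG.
run_logged() {
    {
        printf '$ %s\n' "$*"
        "$@" 2>&1
        printf '[exit status: %s]\n' "$?"
    } | tee -a "$LOG"
}
```

For example, "run_logged zdb -l /dev/sda" followed by "run_logged zdb -u -e tank1" would leave both outputs in the log file, ready to be pasted into a reply.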
Regards,
Ricardo
Sorry, I forgot to mention that you should also install the patched code
with "scons debug=2 install", otherwise one of the zdb commands in a
step below will fail.
- Ricardo
Is the pool imported at this point? (you can check with
'zpool status').
If so, can you do 'zpool export tank1' and try running zdb
again?
- Ricardo
On Sun, 2008-11-16 at 18:49 -0800, warren wrote:
> root@urquan:~/zfs/trunk/src/cmd/zdb# zdb -cv -t 1343355 -e tank1
> zdb: can't open tank1: No such device or address
I suspect this might take a few steps before we can recover the data.
I think exporting a pool *might* have caused some txgs to be written
(though I'm not sure about that), so we might have to check the txg
numbers again.
Can you provide me the output of these steps:
1) zdb -l /dev/sda
(This one will hopefully work now:)
2) zdb -u -e tank1 debug=on
3) zdb -u -t 1343355 -e tank1 debug=on
4) zdb -cv -t 1343355 -e tank1 debug=on
Thanks,
Ricardo
warren wrote:
> root@urquan:~/zfs/trunk/src# zdb -l /dev/sda
> LABEL 3
> version=3
> name='tank1'
> vdev_tree
> type='raidz'
> id=0
> children[0]
> path='/dev/sda'
> whole_disk=0
> children[1]
> path='/dev/sdb'
> whole_disk=0
> children[2]
> path='/dev/sdc'
> whole_disk=0
> children[3]
> path='/dev/sdd'
> whole_disk=0
> children[4]
> path='/dev/sde'
> whole_disk=0
> children[5]
> path='/dev/sdf'
> whole_disk=0
warren wrote:
> I'm using the drives as raw disks, and not using LVM or anything like
> that.
Shouldn't whole_disk be 1 if you use the whole drive for ZFS?
Greetings Ruben
Yes, ZFS does not need disks to be partitioned. I have 4 disks in my
zpool, two of them are partitioned and two are using the raw disk. I
don't have any GPT errors in my dmesg, but it wouldn't surprise me if they
occurred normally. That isn't an authoritative answer, by the way, since
I know nothing about ZFS's on-disk format, just that it *could* be
incompatible with what the kernel expects.
As a general note, I have decided to give ZFS partitions instead of the
full raw disk. It allows me some flexibility to do a few things:
- Linux software RAID1 across a ~100MB partition on each disk for /boot
- A 1GB swap partition on each disk, with equal priorities so the kernel
stripes across them
- (Note some people have pointed out that the swap reliability is
actually quite LOW now, basically being in a n-disk RAID0. Agreed, but
it's sure fast! Caveat emptor.)
- Partition table is complete and correct so as to not confuse the OS
- Doesn't waste much space
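
The equal-priority swap idea from the list above could be written as /etc/fstab entries along these lines. This is a sketch under assumptions: the function name and the partition names are placeholders for illustration, and the only real mechanism relied on is the pri= swap option, which sets the priority of each swap area.

```shell
#!/bin/bash
# Hypothetical helper: print one fstab swap entry per device, all with the
# same priority, so the kernel allocates pages across them in turn.
# Usage: swap_fstab_lines <priority> <device>...
swap_fstab_lines() {
    local pri=$1
    shift
    local dev
    for dev in "$@"; do
        printf '%s none swap sw,pri=%s 0 0\n' "$dev" "$pri"
    done
}

# Example with placeholder partition names:
#   swap_fstab_lines 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```

The output is meant to be reviewed and then added to /etc/fstab by hand, not appended blindly.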
From my working zpool: looks like whole_disk = 0. Maybe that flag means
something else? Good catch regardless :)
# zpool status
pool: tank
state: ONLINE
scrub: none requested
config:
NAME                                                          STATE     READ WRITE CKSUM
tank                                                          ONLINE       0     0     0
  disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855        ONLINE       0     0     0
  disk/by-id/scsi-SATA_WD1000FYPS-12ZKWCASJ0330645-part2      ONLINE       0     0     0
  disk/by-id/scsi-SATA_WD1000FYPS-12ZKWCASJ0330721-part2      ONLINE       0     0     0
  disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW4431098-part2  ONLINE       0     0     0
errors: No known data errors
# zdb -l /dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855
--------------------------------------------
LABEL 0
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 1
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 2
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
--------------------------------------------
LABEL 3
--------------------------------------------
version=10
name='tank'
state=0
txg=2641731
pool_guid=10396150648929218294
hostid=1329989442
hostname=''
top_guid=372537301972830197
guid=372537301972830197
vdev_tree
type='disk'
id=0
guid=372537301972830197
path='/dev/disk/by-id/scsi-SATA_WDC_WD5000AAKS-_WD-WCAPW2878855'
whole_disk=0
metaslab_array=18
metaslab_shift=32
ashift=9
asize=500103118848
is_log=0
DTL=191
Does this mean you have an initrd that can do ZFS boot? I am looking to
set up a new server and was hoping to do this, but would like to hear
some success stories before tackling it. Can you share your technique?
Thanks,
Omen
--
A flashlight is a case for holding dead batteries.
Sorry, no, I didn't mean to imply that. My /boot is ext2 and I have an
ext3 root filesystem with my OS installed. ZFS is used for bulk data
storage only.
Besides, zfs-fuse is a bit slow to run root off of. I'd love to do it
too, but I've resigned myself to just rsync'ing a copy of it from ext3
root to a zfs filesystem in a cron script (and snapshotting it that way).
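
A rough sketch of what such a cron job might look like follows. It is not the actual script from this thread: the dataset name tank/rootbackup, its mountpoint, and the function names are assumptions, and by default it only prints the commands instead of running them.

```shell
#!/bin/bash
# Assumed names, for illustration only; adjust for the real pool layout.
SRC=${SRC:-/}
DATASET=${DATASET:-tank/rootbackup}
DEST=${DEST:-/tank/rootbackup}
# Dry run by default: commands are echoed. Set RUN to an empty string to execute.
RUN=${RUN-echo}

# Build a snapshot name of the form <dataset>@<label>.
snapshot_name() {
    printf '%s@%s\n' "$1" "$2"
}

# Copy the root filesystem (staying on one filesystem, preserving hard links),
# then take a dated ZFS snapshot of the destination dataset.
backup_root() {
    local stamp
    stamp=$(date +%Y-%m-%d)
    $RUN rsync -aHx --delete "$SRC" "$DEST/" &&
    $RUN zfs snapshot "$(snapshot_name "$DATASET" "$stamp")"
}

# Call backup_root from cron once the dry-run output looks right.
```

Keeping the dry run as the default makes it possible to check the exact rsync and zfs commands before letting cron run them for real.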
Jonathan Schmidt wrote:
> Besides, zfs-fuse is a bit slow to run root off of. I'd love to do it
> too, but I've resigned myself to just rsync'ing a copy of it from ext3
> root to a zfs filesystem in a cron script (and snapshotting it that way).
You've written down my thoughts exactly (about speed). I have a similar
solution, but with XFS instead of ext2/3. Do you use your own script for
the backup, and does it support snapshots in ZFS?
If not, could you please provide a link =)
Greetings Ruben
The script was a bit of a lie (sorry). I haven't actually written the
script yet, it's just a few manual commands I type. But I fully *plan* to
script it up, and I doubt it'd be more than 10 minutes of effort, but it's
been tough to find those 10 minutes for some reason. Once I get it
running I'll post it for everyone.
#!/bin/bash
set -o xtrace

BWLIMIT=0
RSYNCFLAGS="--bwlimit=$BWLIMIT --delete -zhxDPavilFHy --stats --detect-renamed"
SERVICES='dovecot postfix apache2 imapproxy'
LVS='home repositories root varmail varwww'

for LV in $LVS
do
    umount -f /dev/kooluvg/$LV-nightly
    lvremove -f kooluvg/$LV-nightly
done

time {
    echo Start snapshot at `date`
    for service in $SERVICES; do /etc/init.d/$service stop; done
    time for LV in $LVS
    do
        lvcreate -s -L 500m -n $LV-nightly kooluvg/$LV
        mkdir -pv /media/nightly/$LV
        mount -o ro,nouuid /dev/kooluvg/$LV-nightly /media/nightly/$LV
    done
    for service in $SERVICES; do /etc/init.d/$service start; done
}

(cd /media/nightly/repositories/ && rsync $RSYNCFLAGS -R svn/ trac/ repo/ HIP/ vpsland:/)
rsync $RSYNCFLAGS /media/nightly/varmail/ vpsland:/var/mail/
rsync $RSYNCFLAGS /media/nightly/varwww/ vpsland:/var/www/

exit 0;

for LV in $LVS
do
    umount -f /dev/kooluvg/$LV-nightly
    lvremove -f kooluvg/$LV-nightly
done
> If not, could you please provide a link =)
I've got one scripted up at home (at SC'08 in Austin at the
moment), I'll try and remember to post it when I get back!
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
> I've got one scripted up at home (at SC'08 in Austin at the
> moment), I'll try and remember to post it when I get back!
OK, here we go, this also relies on my ZFS upstart script for Ubuntu to
start/stop ZFS (and mount/unmount) automatically which I've also attached.
They will both need editing for your systems!
cheers,
Chris