Broken zfs setup - log device lost. How to import the pool?

Igor Hjelmstrom Vinhas Ribeiro

Jun 23, 2012, 1:04:29 PM
to zfs-...@googlegroups.com
Hi!

   I lost a log device and a cache device belonging to an exported (offline) pool, zpool version 23.

   I am unable to import it now:

root:~/ # zpool import
  pool: igorhvr-data
    id: 269256866566772131
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        igorhvr-data                           UNAVAIL  missing device
          mirror-0                             ONLINE
            disk/by-id/dm-name-igorhvr-data-0  ONLINE
            disk/by-id/dm-name-igorhvr-data-1  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.
root:~/ # zpool import -f igorhvr-data
cannot import 'igorhvr-data': one or more devices is currently unavailable
        Destroy and re-create the pool from
        a backup source.

   zpool import -F fails with the same error message. Both data devices are OK and were not touched (except for the failed import attempt above). From what I have read, the pool should supposedly be recoverable, since it is version 23.

   Is this true? Assuming it is possible to recover it, which method would you advise? The options I am currently considering are:

    - Retrying the import under a recent FreeBSD version (perhaps a version-28 toolset would have a better chance of importing the pool);
    - Some (rather painful) variation of the method described at the bottom of http://forums.freebsd.org/showthread.php?t=18221 to build a (fake) log device that can be used to import the pool;
    - Remove the GUID SUM verification code (basically comment out this section):
        /* 
         * If the vdev guid sum doesn't match the uberblock, we have an
         * incomplete configuration.
         */
        if (mosconfig && type != SPA_IMPORT_ASSEMBLE &&
            rvd->vdev_guid_sum != ub->ub_guid_sum)
                return (spa_vdev_err(rvd, VDEV_AUX_BAD_GUID_SUM, ENXIO));


      and then re-attempt the import, hoping the pool will load in a degraded state (my guess).
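For readers unfamiliar with the check quoted above: roughly, each vdev's vdev_guid_sum is its own GUID plus the sums of its children, and the uberblock records the root's sum as ub_guid_sum when it is written. A top-level vdev that cannot be found at import time therefore makes the assembled tree's sum come up short. The sketch below illustrates only that arithmetic. The toy_vdev_t struct, the toy_* helpers and the GUID values are all hypothetical simplifications (not the real vdev_t), and it assumes the lost log vdev simply contributes nothing to the assembled sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified vdev tree node. The names echo the real
 * vdev_guid / vdev_guid_sum fields, but this is NOT the ZFS vdev_t.
 */
typedef struct toy_vdev {
    uint64_t guid;
    size_t nchildren;
    const struct toy_vdev *const *children;
} toy_vdev_t;

/*
 * GUID sum of a subtree: this vdev's GUID plus the sums of all its
 * children. uint64_t arithmetic wraps modulo 2^64, like a checksum.
 */
static uint64_t toy_guid_sum(const toy_vdev_t *vd)
{
    assert(vd != NULL);
    uint64_t sum = vd->guid;
    for (size_t i = 0; i < vd->nchildren; i++)
        sum += toy_guid_sum(vd->children[i]);
    return sum;
}

/* Made-up GUIDs for a pool like the one above: a two-disk mirror plus a log. */
static const toy_vdev_t disk0 = { 0x1111, 0, NULL };
static const toy_vdev_t disk1 = { 0x2222, 0, NULL };
static const toy_vdev_t *const mirror_kids[] = { &disk0, &disk1 };
static const toy_vdev_t mirror0 = { 0x3333, 2, mirror_kids };
static const toy_vdev_t log0 = { 0x4444, 0, NULL };

/* Tree as it was when the uberblock was written (log device present). */
static const toy_vdev_t *const kids_full[] = { &mirror0, &log0 };
static const toy_vdev_t root_full = { 0x5555, 2, kids_full };

/* Tree as assembled at import time (log device gone). */
static const toy_vdev_t *const kids_missing[] = { &mirror0 };
static const toy_vdev_t root_missing = { 0x5555, 1, kids_missing };

/* Analogue of the quoted check: nonzero means "incomplete configuration". */
static int toy_config_incomplete(const toy_vdev_t *rvd, uint64_t ub_guid_sum)
{
    return toy_guid_sum(rvd) != ub_guid_sum;
}
```

With ub_guid_sum taken from toy_guid_sum(&root_full), toy_config_incomplete(&root_missing, ...) is nonzero: the assembled sum falls short by exactly the lost log vdev's GUID, which is the situation the quoted code turns into VDEV_AUX_BAD_GUID_SUM / ENXIO.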

   Any advice on what road to take and/or other ideas I could try?

Best Regards,
--
igorhvr

Igor Hjelmstrom Vinhas Ribeiro

Jun 23, 2012, 5:44:19 PM
to zfs-...@googlegroups.com
All,

   Problem solved. I am writing this in case someone else runs into the
same problem in the future when trying to zpool import a pool whose log
device is gone.

   After debugging zfs-fuse a bit (printf and zfs-fuse -n are my
friends, I guess) to better understand where the problem was, I ended
up making the following change (in libzpool/vdev.c):

        /*
         * If this is a top-level vdev, initialize its metaslabs.
         */
        if (vd == vd->vdev_top && !vd->vdev_ishole &&
            (vd->vdev_ashift == 0 || vd->vdev_asize == 0 ||
-           vdev_metaslab_init(vd, 0) != 0))
-               vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN,
-                   VDEV_AUX_CORRUPT_DATA);
+            vdev_metaslab_init(vd, 0) != 0)) {
+         printf("\nFound corrupted top level vdev.");
+         //vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN,
+         //        VDEV_AUX_CORRUPT_DATA);
+       }
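The local effect of the diff above can be modeled in isolation with the simplified sketch below. Everything in it is hypothetical: toy_top_vdev_t, toy_check_top_vdev() and the two-value toy_state_t are illustrative stand-ins, not ZFS code. It also assumes, for illustration only, that the placeholder for the lost log device has an ashift or asize of 0 (or a failing metaslab init). The original branch marks such a top-level vdev CANT_OPEN, while the patched branch only prints a message and leaves its state alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical two-state simplification of the vdev states used in the diff. */
typedef enum { TOY_STATE_HEALTHY, TOY_STATE_CANT_OPEN } toy_state_t;

/* Hypothetical stand-in for a top-level vdev; NOT the real ZFS vdev_t. */
typedef struct {
    const char *name;
    bool is_hole;             /* plays the role of vd->vdev_ishole */
    uint64_t ashift;          /* plays the role of vd->vdev_ashift */
    uint64_t asize;           /* plays the role of vd->vdev_asize */
    bool metaslab_init_fails; /* stands in for vdev_metaslab_init(vd, 0) != 0 */
    toy_state_t state;
} toy_top_vdev_t;

/*
 * Same condition as the diff, assuming vd is already known to be a
 * top-level vdev (the vd == vd->vdev_top test). With patched == false
 * the vdev is marked CANT_OPEN, as in the original code; with
 * patched == true it is only reported and its state is left alone.
 * Returns true if the vdev tripped the condition.
 */
static bool toy_check_top_vdev(toy_top_vdev_t *vd, bool patched)
{
    assert(vd != NULL);
    bool bad = !vd->is_hole &&
        (vd->ashift == 0 || vd->asize == 0 || vd->metaslab_init_fails);
    if (bad) {
        if (patched)
            printf("Found corrupted top level vdev: %s\n", vd->name);
        else
            vd->state = TOY_STATE_CANT_OPEN;
    }
    return bad;
}
```

For example, a toy vdev named "missing-1" with ashift 0 ends up TOY_STATE_CANT_OPEN when checked with patched == false, but stays TOY_STATE_HEALTHY (after the message is printed) with patched == true, while a healthy "mirror-0" with nonzero ashift and asize is untouched either way.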

Basically, I commented out the code that marked the state of the log
vdev as broken (VDEV_STATE_CANT_OPEN). With that change, zpool import -F
succeeds even without the log device, leaving the pool in a slightly odd
state ("missing" was not the name of the log device; it comes from
VDEV_TYPE_MISSING):

root:~/ # zpool status igorhvr-data
  pool: igorhvr-data
 state: UNAVAIL
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME                                   STATE     READ WRITE CKSUM
        igorhvr-data                           UNAVAIL      0     0     0  insufficient replicas
          mirror-0                             DEGRADED     0     0     0
            disk/by-id/dm-name-igorhvr-data-0  ONLINE       0     0     0
            disk/by-id/dm-name-igorhvr-data-1  OFFLINE      0     0     0
          missing-1                            ONLINE       0     0     0
        cache
          mapper/cache                         UNAVAIL      0     0     0  cannot open

errors: No known data errors
root:~/ # ls /igorhvr-data
aasylum dt fileList.txt floating-asylum ildata u wqasylum www.iasylum.net

Still, despite the UNAVAIL state, everything seems to be working
fine. In particular, zfs send works well, so I was able to zfs send
my data to a sane place...

Regards,
--
igorhvr

sgheeren

Jun 23, 2012, 6:18:36 PM
to zfs-...@googlegroups.com
Igor,

Thanks a bunch for sharing this information. Perhaps it would be good to
post it on the zfsonlinux list (zfs-d...@zfsonlinux.org) too, as it is
probably not specific to zfs-fuse. (I don't know whether you actually
ended up trying the OpenSolaris/BSD/zfsonlinux ports.)

This looks like a rather unintentional point of 'unrecoverable failure'
for ZFS, so it might need to be fixed. I'm not going to look into fixing
it myself (as far as I'm aware, using log/cache devices is rather
uncommon with fuse-based zfs). I have the impression that zfsonlinux is
actively being used at quite large hardware scales (in fact, LLNL
appears to have developed the Linux port for precisely that reason), so
this is more relevant to their port, in a way.

Looks like quite a nice feat of troubleshooting you have achieved there
anyways. Remember to scrub your data :)

Seth