Your scenario seems to work here.
I think your devices got renamed and zfs-fuse is getting confused
because of that.
In particular, my guess is that the
device /dev/disk/by-id/ata-WDC_WD10EACS-00ZJB0_WD-WCASJ1128671-part1
in fact refers to the same partition as one of the other devices
(/dev/sdc1, /dev/sdd1 or /dev/sde1), and therefore zfs-fuse may be
trying to open the same partition twice.
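(An aside on how to check this guess: the entries in /dev/disk/by-id are just symlinks into /dev, so resolving them shows whether two names point at one partition. The snippet below is a sketch using temporary files in place of real device nodes; the `ata-WDC-example-part1` name is a stand-in, not a real device.)

```shell
# /dev/disk/by-id entries are symlinks; if a by-id name and a /dev name
# resolve to the same target, opening both opens one partition twice.
# Demo with temp files standing in for the real device nodes:
demo=$(mktemp -d)
touch "$demo/sdc1"                               # stands in for /dev/sdc1
ln -s "$demo/sdc1" "$demo/ata-WDC-example-part1" # stands in for the by-id link
a=$(readlink -f "$demo/ata-WDC-example-part1")
b=$(readlink -f "$demo/sdc1")
echo "$a"    # both resolve to the same path
echo "$b"
# On a real system you would run, e.g.:
#   readlink -f /dev/disk/by-id/ata-WDC_WD10EACS-00ZJB0_WD-WCASJ1128671-part1
rm -r "$demo"
```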
Anyway, can you try doing "zpool import -d /dev/disk/by-id tank"?
If my guess is correct, that should work fine.
Failing that, can you give me the output of:
1) strace zpool import tank -d /dev/disk/by-id 2>&1 | grep open
2) strace zpool import tank -d /dev 2>&1 | grep open
3) strace zpool import tank 2>&1 | grep open
Thanks,
Ricardo
In fact, forget the above commands - this is going to be a bit more
complicated.
If the "zpool import -d /dev/disk/by-id tank" command didn't work,
you're going to need to do this as root:
$ cd /proc/`pgrep zfs-fuse`/task
$ for i in *; do sh -c "strace -p $i 2> /tmp/zfs_debug.$i &"; done
$ zpool import -d /dev/disk/by-id tank
$ zpool import -d /dev tank
$ zpool import tank
$ killall strace
$ (send me the output of this:) grep open /tmp/zfs_debug.*
$ rm /tmp/zfs_debug.*
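(For context, and not part of the commands to run: the loop above works because each directory under /proc/<pid>/task is one thread (TID) of the process, so it attaches one strace to every zfs-fuse thread. You can see the layout using the current shell's own PID:)

```shell
# Every thread of a process appears as a TID-named directory under
# /proc/<pid>/task; a single-threaded shell shows just its own PID.
ls /proc/$$/task
```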
Thanks,
Ricardo
I'm seeing some EBUSY's on sdc, but only when it's opening the device,
not the partition. Seems to love urandom too.
-Charles
On Fri, 2008-10-03 at 10:25 -0500, GenPFault wrote:
> I'm seeing some EBUSY's on sdc, but only when it's opening the device,
> not the partition. Seems to love urandom too.
The output looks normal, so the problem doesn't seem to be when opening
the devices; it looks like another system call is failing.
Can you send me all the /tmp/zfs_debug.* files so that I can analyze
them?
Thanks,
Ricardo
Also, can you do the following:
$ cd $ZFS_FUSE_SRC/src/cmd/zdb
$ scons -u debug=2
$ (as root) ./zdb -p /dev/disk/by-id -e tank debug=on
The output of the last command will greatly help to pinpoint the
problem.
Thanks,
Ricardo
The first zdb invocation complained about multiple matching pools, so
I tried the GUID(?), which generated more output.
I don't think I mentioned it before but I was using either 0.5.0 or
trunk before the zdb test; now I'm definitely using trunk.
-Charles
Thanks, that one was much more helpful.
> I don't think I mentioned it before but I was using either 0.5.0 or
> trunk before the zdb test; now I'm definitely using trunk.
You've run into a strange problem. It seems ZFS is getting an error
about a missing device when reading a log record.
AFAICT, this may be happening because you have a missing device and ZFS
is not able to reconstruct the block from parity.
This could also happen if you had a slog device in your pool and now
it's missing (although this doesn't seem likely from what I read in your
original post).
Since ZFS is unable to replay the log, it refuses to import the pool.
Unfortunately it's not very easy to determine if this is a bug or not
without a more detailed investigation.
Anyway, if you want to try to force ZFS into importing your pool, you
can try modifying the file $ZFS_FUSE_SRC/src/lib/libzpool/zil.c, inside
function zil_check_log_chain(), around line 568-570 you should see this
piece of code:
error = zil_read_log_block(zilog, &blk, &abuf);
if (error)
        break;
You can change that piece of code into this:
(void) zil_read_log_block(zilog, &blk, &abuf);
error = 0;
You can then recompile and reinstall with "scons" and "scons install",
and you should be able to import your pool.
However, I recommend that you change that piece of code back as it was
and recompile/reinstall after importing your pool, because ignoring
errors when replaying the log is generally not a good idea.
HTH,
Ricardo
On Fri, 2008-10-03 at 19:04 +0100, Ricardo M. Correia wrote:
You should change this:
> error = zil_read_log_block(zilog, &blk, &abuf);
> if (error)
>         break;
Into this:
> error = zil_read_log_block(zilog, &blk, &abuf);
> if (error) {
>         error = 0;
>         break;
> }
(Otherwise you'll probably get another error).
Cheers,
Ricardo
Also, while searching over lunch I came across this upstream bug:
http://bugs.opensolaris.org/view_bug.do?bug_id=6736213
The symptoms look similar to the problem I was experiencing, although
guid import did not work in my case.
-Charles
On Tue, 2008-11-11 at 01:13 -0800, Matt B wrote:
> I'm pretty sure this is the same situation: I've got a raidz with one
> disk missing, which should be importable, but zpool import refuses to
> do it. I've tried importing from /dev/disk/by-id and I've tried the
> patch posted, but neither of these works.
> Can someone give me some strace/zdb commands that could help diagnose
> the problem?
Ok, I think this will help:
1) Compile zdb in debug mode:
$ cd $ZFS_FUSE_SRC/src/cmd/zdb
$ scons -u debug=2
2) Try to import the pool with zdb and with debug enabled:
$ (as root) ./zdb -p /dev/disk/by-id -e tank debug=on
Can you tell me the output of that?
Thanks,
Ricardo
Thanks for that.
Unfortunately, it didn't help much :(
If you can do the following steps, it will help a lot:
$ cd $ZFS_FUSE_SRC/src/cmd/zdb
$ scons -u debug=3
$ touch TRACE
$ mkdir trace
$ ./zdb -p /dev/disk/by-id -e <pool-name>
$ ./trace-parse.py ./zdb
$ tar -jcvf trace-data.tar.bz2 trace/
$ rm -rf TRACE trace
Of course, I would need you to send me the trace-data.tar.bz2 file for
analysis.
Thanks,
Ricardo