[zfs-fuse] 'zfs get all' core dumps if /mypool/a is bind mounted

devsk

May 21, 2010, 11:49:41 PM
to zfs-fuse
Simple test case:

1. Create a pool called mypool. Let's assume it's mounted at /mypool
2. Create a filesystem mypool/a and mount it at /mypool/a
3. Bind mount /mypool/a at /mnt/temp with 'mkdir -p /mnt/temp &&
mount -o bind /mypool/a /mnt/temp'
4. 'zfs get all mypool/a' dumps core.
5. umount /mnt/temp.
6. 'zfs get all mypool/a' does not dump core.

--
To post to this group, send email to zfs-...@googlegroups.com
To visit our Web site, click on http://zfs-fuse.net/

sgheeren

May 22, 2010, 5:44:18 AM
to zfs-...@googlegroups.com
On 05/22/2010 05:49 AM, devsk wrote:
> Simple test case:
>
> 1. Create a pool called mypool. Let's assume it's mounted at /mypool
> 2. Create a filesystem mypool/a and mount it at /mypool/a
> 3. Bind mount /mypool/a at /mnt/temp with 'mkdir -p /mnt/temp &&
> mount -o bind /mypool/a /mnt/temp'
> 4. 'zfs get all mypool/a' dumps core.
> 5. umount /mnt/temp.
> 6. 'zfs get all mypool/a' does not dump core.
>
filed as http://zfs-fuse.net/issues/44

sgheeren

May 22, 2010, 6:13:51 AM
to zfs-...@googlegroups.com
On 05/22/2010 11:44 AM, sgheeren wrote:
> On 05/22/2010 05:49 AM, devsk wrote:
>> Simple test case:
>>
>> 1. Create a pool called mypool. Let's assume it's mounted at /mypool
>> 2. Create a filesystem mypool/a and mount it at /mypool/a
>> 3. Bind mount /mypool/a at /mnt/temp with 'mkdir -p /mnt/temp &&
>> mount -o bind /mypool/a /mnt/temp'
>> 4. 'zfs get all mypool/a' dumps core.
>> 5. umount /mnt/temp.
>> 6. 'zfs get all mypool/a' does not dump core.
>
> filed as http://zfs-fuse.net/issues/44

It turns out that this was fixed as issue #42. Closing as a duplicate.

Proof: reproducible only with _beta or _beta2. Symptoms disappear after a

git cherry-pick --no-commit cc745d5b8426b9cf5ff486fcfba6917e84269c58

What is most interesting to me is that the failure mode (debug=2) was the well-known

zfs: lib/libavl/avl.c:634: avl_add: Assertion `0' failed.

Could it be that earlier 'unresolved' issues (that tripped that assert) were caused by the same thing? I'll check for existing tickets and ask for a retest where appropriate.

Emmanuel Anne

May 22, 2010, 7:48:42 AM
to zfs-...@googlegroups.com
Don't look for anything that complicated here.
It's simply that when you mount a fs with -o bind, it appears with exactly the same fs type and options in /proc/mounts (while the mount command still shows it as a bind fs).
Since zfs-fuse uses /proc/mounts to find its mounts, it can't tell which one it owns.
Maybe it should use /etc/mtab instead? I thought these 2 files were supposed to be the same...
Anyway, for now I just committed a patch which ignores any other fs with the same type, options and mount point; it fixes the problem.
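
To illustrate the ambiguity described above, here is a minimal Python sketch, not the committed patch. It parses /proc/mounts-style text and reports every group of mount points that share the same source, fs type and options. The sample lines are hypothetical: the exact source name, fs type and options that zfs-fuse reports are assumptions made only for this example.

```python
from collections import defaultdict


def parse_mounts(text):
    """Parse /proc/mounts-style text into (source, mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fstype, options = fields[:4]
        entries.append((source, mount_point, fstype, options))
    return entries


def find_ambiguous_mounts(entries):
    """Group mount points that share source, fs type and options.

    A bind mount lands in the same group as the original mount,
    because /proc/mounts does not mark it as a bind.
    """
    groups = defaultdict(list)
    for source, mount_point, fstype, options in entries:
        groups[(source, fstype, options)].append(mount_point)
    return {key: points for key, points in groups.items() if len(points) > 1}


# Hypothetical sample data (assumed field values, for illustration only).
SAMPLE = """\
mypool /mypool fuse rw,allow_other 0 0
mypool/a /mypool/a fuse rw,allow_other 0 0
mypool/a /mnt/temp fuse rw,allow_other 0 0
"""

if __name__ == "__main__":
    ambiguous = find_ambiguous_mounts(parse_mounts(SAMPLE))
    for (source, fstype, options), points in ambiguous.items():
        print(f"{source} ({fstype}, {options}) is mounted at: {', '.join(points)}")
```

With the sample above, only mypool/a is reported, with both /mypool/a and /mnt/temp listed. Nothing in the fields themselves says which of the two is the original mount, which is the point made in the message.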

Now I have noticed a smaller issue with this behavior.
With an fs mounted with the -o bind option as described above, if you call zfs umount -a before killing zfs-fuse, everything works normally: the bound fs is unmounted too.
But if you kill zfs-fuse directly without calling zfs umount -a first, it unmounts only the fs it knows about internally, and the bound fs is left mounted. The consequence is that you get "transport endpoint is not connected" when trying to access the directory afterwards, and if you try to relaunch zfs-fuse it will be unable to mount this fs (because its directory is in use).

Anyway, since calling zfs umount -a fixes it, it's not really an issue, and it's quite normal since the zfs-fuse daemon can't know about the bound fs.
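
For anyone who wants to spot a leftover bind mount after the daemon has died, here is a small Python sketch. It relies on the fact that "Transport endpoint is not connected" is the Linux message for errno ENOTCONN. The function names are made up for this example, and the `stat` parameter exists only so the check can be tried without a real mount.

```python
import errno
import os


def is_stale_fuse_mount(path, stat=os.stat):
    """Return True if accessing `path` fails with ENOTCONN.

    ENOTCONN is the errno behind "Transport endpoint is not connected".
    Any other error (or no error) is treated as "not stale".
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


def stale_mount_points(mount_points, stat=os.stat):
    """Filter the given mount points down to the ones that look stale."""
    return [p for p in mount_points if is_stale_fuse_mount(p, stat)]


if __name__ == "__main__":
    # /mnt/temp is the bind mount point used in the test case of this thread.
    for path in stale_mount_points(["/mnt/temp"]):
        print(f"{path} looks stale; it probably needs a manual umount")
```

This only detects the situation. Cleaning it up still means unmounting the bound directory by hand before relaunching zfs-fuse.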


--
zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

sgheeren

May 22, 2010, 9:09:45 AM
to zfs-...@googlegroups.com
On 05/22/2010 01:48 PM, Emmanuel Anne wrote:
> Don't look for anything that complicated here.
> It's simply that when you mount a fs with -o bind, it appears with exactly
> the same fs type and options in /proc/mounts (while the command mount still
> shows it as a bind fs).
> Since zfs-fuse uses /proc/mounts to find its mounts, it can't say which one
> it owns.
> Maybe it should use /etc/mtab instead ? I thought these 2 files were
> supposed to be the same files...
> Anyway for now I just committed a patch which ignores any other fs with the
> same type options and mount point, it fixes the problem.
>
That might all be true (I understood that this kind of effect was going
on), but regardless the problem was fixable using the same mentioned
patch - which means that the eventual _cause_ of error wasn't the
duplicate mount point as such but the fact that two IOCTLS were being
issued in a race. With that effect removed, we need not change a thing.
I've learned over the years to fix the problem at hand, not any
side-catch - especially when aiming for stability.

So, unless you found out - that the wrong mountpoint is being reported?
In that case: file a bug! (feel free to fix it too, but from your
description I'm not too sure that simply ignoring the second/later
mountpoints selects the correct one by definition. I'd have to look at
that some more)
> Now I have noticed a smaller issue with this behavior :
> with an fs mounted with the -o bind option as described above
> if you call zfs umount -a before killing zfs-fuse everything works normally,
> the bound fs is umounted too.
> But if instead you kill directly zfs-fuse without calling zfs umount -a
> before, in this case it umounts only the fs it knows internally and the
> bound fs is left mounted. The consequence is that you get "transport
> endpoint is not connected" when trying to access the directory after that,
> and if you try to relaunch zfs-fuse it will be unable to mount this fs
> (because its directory is in use).
>
> Anyway since calling zfs umount -a fixes it, it's not really an issue, and
> it's very normal since the zfs-fuse daemon can't know about the bound fs.
>
Agreed by the same logic as my own above

Emmanuel Anne

May 22, 2010, 9:41:10 AM
to zfs-...@googlegroups.com
Yeah, saw that. Except that it does not make sense: the zfs command uses only 1 thread, so it can't create a race condition on threads alone.
More likely, it was a side effect because some assertions were disabled in the optimized build (in this case the error is not always reported at the right place).
Anyway, the failed assertion (0) was totally correct: it fired because it was trying to add a 2nd element which was already present in an avl tree. Nothing important here.
And you can get the exact same error message (assertion 0 failed) by using this git version:
git checkout 88105bb84206e257f5507ce96f4ce7c9aee30e71
which is the commit just before your fix for the threads.

But anyway, if you want to dig into this further, do so; the commit is on my branch, you are not obliged to merge it!
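
The "adding an element which is already there" failure can be pictured with a toy Python container. This is only a stand-in for the idea, not libavl's code: a sorted list replaces the real tree, and keying the entries by dataset name is an assumption made for the example.

```python
import bisect


class UniqueSortedSet:
    """Toy ordered container that refuses duplicate keys on add()."""

    def __init__(self):
        self._keys = []

    def find(self, key):
        """Return (found, index), where index is the position key has or would get."""
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return found, i

    def add(self, key):
        """Insert key, failing loudly if it is already present."""
        found, where = self.find(key)
        if found:
            # Raised explicitly so the check survives `python -O`,
            # which strips plain `assert` statements.
            raise AssertionError(f"duplicate key: {key!r}")
        self._keys.insert(where, key)


if __name__ == "__main__":
    tree = UniqueSortedSet()
    # Hypothetical: one entry per /proc/mounts line, keyed by dataset name.
    for dataset in ["mypool", "mypool/a", "mypool/a"]:
        try:
            tree.add(dataset)
        except AssertionError as exc:
            print(f"assertion failed: {exc}")
```

In this sketch the second "mypool/a", which stands for the bind-mounted copy, is what trips the check. That matches the explanation that the assertion itself was behaving correctly.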

sgheeren

May 22, 2010, 10:13:22 AM
to zfs-...@googlegroups.com
On 05/22/2010 03:41 PM, Emmanuel Anne wrote:
> Yeah saw that. Except that it does not make sense : the zfs command uses
> only 1 thread, so it can't create a race condition on threads alone.
>
Well mind the subtlety: I am not saying anything was _fixed_ in the
latest version. I'm saying: cannot reproduce with latest testing branch.
It's close, but not equivalent.

I found out about that particular patch (cc745d5b84) by systematically
bisecting the bug (the disappearing edge in the repo).
As shown, I cherry-picked just that commit and the problem vanishes. Hmmm
I'm pretty sure that if I bisected for the appearing edge I'd find 8b6b9
(introducing the ioctl threads). No time though.

I _would_ agree that it would be a good idea to try to analyse the root
cause for the assert a bit further instead of just stating "cannot be
reproduced with the latest version".

> More likely, it was a side effect because some assertions were disabled in
> the optimized build (in this case the error is not always reported at the
> right place).
>

Don't know what you are getting at. I use debug=2 exclusively when
reproducing bugs. debug=2 implies -O0, so no optimizations _and_ with
asserts.

> Anyway the failed assertion (0) was totally correct, it was because it was
> trying to add a 2nd element which was already here on an avl tree. Nothing
> important here.
> And you can get the exact same error message (assertion 0 failed) by using
> this git version :
> git checkout 88105bb84206e257f5507ce96f4ce7c9aee30e71
> which is the commit just before your fix for the threads.
>

I know that. See above
This simply means that hitting the avl assert is related to the
threading bug. I'm not suggesting _how_ it was related. It simply is
related.

> But anyway if you want to dig this further, do so, anyway the commit is on
> my branch, you are not obliged to merge it !
>

I will have a look. Currently trying to reproduce other failures from
the tracker. Reopened #44 awaiting better analysis, then

devsk

May 22, 2010, 2:35:04 PM
to zfs-fuse
Oh my! Lot of discussion around this one while I was asleep...:-)

So, what's the final take? How far apart are beta2 and Emmanuel's
branch? Is there a git command (I am a git newbie) that I can use to
diff the two?

Bound mount point not getting unmounted if zfs-fuse aborts, should be
addressed. I think Linux behavior is correct and Solaris is being anal
about it. It should just allow mounting over the non-empty directory
and then the original bound mount point should start working like
before without any umounting/mounting.

Are there other potential issues because of bind mounts which we
haven't found yet? I use bind mounts for sharing data between my root
and the livecd environment. So, I don't really want the whole thing
to blow up on me just because I used bind mounts.

-devsk

sgheeren

May 22, 2010, 3:06:25 PM
to zfs-...@googlegroups.com
On 05/22/2010 08:35 PM, devsk wrote:
> Oh my! Lot of discussion around this one while I was asleep...:-)
>
> So, what's the final take? How far are the beta2 and Emmanuel's
> branch?
The way I see it, it is still under investigation.
I personally could not reproduce with latest testing (will tag a _beta3
within the next half-hour). You could try your luck.

Emmanuel is right in pointing out some more peculiarities in the case of
bind mounts. I have no final take yet.
> Is there a git command (I am a git newbie) that I can use to
> diff the two?
>

I am no longer a git newbie, yet I find it very hard to get a diff
between upstream branches. Here's my trick:

$ git diff zfs-fuse.net/testing rainemu/master -- src

of course that is assuming the zfs-fuse.net and rainemu remotes are set up
and fetched:

$ git remote -v
rainemu http://rainemu.swishparty.co.uk/git/zfs (fetch)
zfs-fuse.net http://git.zfs-fuse.net/official (fetch)

Emmanuel Anne

May 23, 2010, 3:45:14 AM
to zfs-...@googlegroups.com
The part I don't understand is that I can't reproduce what you describe.
Trying:
git checkout 139f955b1aaffa6e677d35677fb0ba96369a8ba1
to return to before my fix
and then
git cherry-pick --no-commit cc745d5b8426b9cf5ff486fcfba6917e84269c58
does nothing at all; the bug still happens (but it's the zfs command which crashes on the assert, the zfs-fuse daemon is not affected).

Hmm... are you sure you have the same format for /proc/mounts?
Otherwise there is something I must have missed somewhere!

sgheeren

May 23, 2010, 5:29:55 AM
to zfs-...@googlegroups.com
On 05/23/2010 09:45 AM, Emmanuel Anne wrote:
> The part I don't understand is that I can't reproduce what you describe.
> Trying :
> git checkout 139f955b1aaffa6e677d35677fb0ba96369a8ba1
> to return before my fix
> and then
> git cherry-pick --no-commit cc745d5b8426b9cf5ff486fcfba6917e84269c58
> does nothing at all, the bug still happens (but it's the zfs command which
> crashes on the assert, the zfs-fuse daemon is not affected).
>
> Hmm... are you sure you have the same format for /proc/mounts ?
> Otherwise there is something I must have missed somewhere !
>
This is exactly why I re-opened the bug. This could somehow be
'accidental' on my system; the problem really might still be there _with
or without_ the cherry pick.