[RFC] shared subtrees

Al Viro

Jan 13, 2005, 5:33:12 PM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org
[apologies for delay - there'd been lots of unrelated crap lately]
======================================================================
NOTE: as far as I'm concerned, that's a beginning of VFS-2.7 branch.
All that work will stay in a separate tree, with gradual merge back
into 2.6 once the things start settling down.
======================================================================

OK, here comes the first draft of proposed semantics for subtree
sharing. What we want is being able to propagate events between
the parts of mount trees. Below is a description of what I think
might be a workable semantics; it does *NOT* describe the data
structures I would consider final and there are considerable
areas where we still need to figure out the right behaviour.

Let's start with introducing a notion of propagation node; I consider
it only as a convenient way to describe the desired behaviour - it
almost certainly won't be a data structure in the final variant.

1) each p-node corresponds to a group of 1 or more vfsmounts.
2) there is at most 1 p-node containing a given vfsmount.
3) each p-node owns a possibly empty set of p-nodes and vfsmounts
4) no p-node or vfsmount can be owned by more than one p-node
5) only vfsmounts that are not contained in any p-nodes might be owned.
6) no p-node can own (directly or via intermediates) itself (i.e. the
graph of p-node ownership is a forest).
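
A minimal Python sketch of rules 1-6, purely as a reading aid. The class and function names (Vfsmount, PNode, violations) are made up for illustration and say nothing about how the kernel would represent any of this; as noted above, p-nodes are not expected to be a real data structure.

```python
class Vfsmount:
    def __init__(self, name):
        self.name = name
        self.pnode = None   # p-node containing this vfsmount (rule 2: at most one)
        self.owner = None   # p-node owning this vfsmount (rule 4: at most one)


class PNode:
    def __init__(self, name):
        self.name = name
        self.members = set()        # vfsmounts contained in this p-node (rule 1)
        self.owned_mounts = set()   # vfsmounts owned by this p-node (rule 3)
        self.owned_pnodes = set()   # p-nodes owned by this p-node (rule 3)
        self.owner = None           # owning p-node, if any (rule 4)


def violations(pnodes, mounts):
    """Return the sorted list of rule numbers that the given setup breaks."""
    bad = set()
    for p in pnodes:
        if len(p.members) < 1:
            bad.add(1)
        if any(m.pnode is not p for m in p.members):
            bad.add(2)
        if any(m.owner is not p for m in p.owned_mounts):
            bad.add(4)
        if any(q.owner is not p for q in p.owned_pnodes):
            bad.add(4)
        # Rule 6: walking up the ownership chain must never come back to p.
        seen, q = set(), p.owner
        while q is not None and q not in seen:
            if q is p:
                bad.add(6)
                break
            seen.add(q)
            q = q.owner
    for m in mounts:
        # Rule 5: a vfsmount contained in a p-node cannot also be owned.
        if m.pnode is not None and m.owner is not None:
            bad.add(5)
    return sorted(bad)
```
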

These guys define propagation:
a) if vfsmounts A and B are contained in the same p-node, events
propagate from A to B
b) if vfsmount A is contained in p-node p, vfsmount B is contained
in p-node q and p owns q, events propagate from A to B
c) if vfsmount A is contained in p-node p and vfsmount B is owned
by p, events propagate from A to B
d) propagation is transitive: if events propagate from A to B and
from B to C, they propagate from A to C.

In other words, members of the same p-node are equivalent and events anywhere
in p-node are propagated to all its slaves. Note that not every transitive
relation can be represented that way; it has to satisfy the following
condition:
* A->C and B->C => A->B or B->A
All propagation setups we are going to deal with will satisfy that condition.
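
Rules (a)-(d) can be sketched as a walk down the ownership forest. This is only an illustration: the dictionaries `contains` (p-node name to the vfsmounts it contains) and `owns` (p-node name to the p-nodes and vfsmounts it owns) are an assumed encoding, p-node and vfsmount names are assumed to be distinct, and leaving the source out of its own result is a choice made here for convenience.

```python
def propagation_targets(src, contains, owns):
    """Return the vfsmounts that receive events from src under rules (a)-(d)."""
    start = next((p for p, members in contains.items() if src in members), None)
    if start is None:
        # Every rule needs the source to be contained in some p-node.
        return set()
    targets, todo, seen = set(), [start], set()
    while todo:
        p = todo.pop()
        if p in seen:
            continue
        seen.add(p)
        targets |= contains[p]        # rule (a) at the start, rule (b) below it
        for x in owns.get(p, ()):
            if x in contains:
                todo.append(x)        # owned p-node: rule (b), chained by rule (d)
            else:
                targets.add(x)        # owned vfsmount: rule (c)
    targets.discard(src)
    return targets


# p = {A, B}; p owns the p-node q = {D} and the vfsmount C.
contains = {"p": {"A", "B"}, "q": {"D"}}
owns = {"p": {"q", "C"}}
print(sorted(propagation_targets("A", contains, owns)))  # ['B', 'C', 'D']
```

In this setup C and D are pure receivers: nothing propagates from them, which matches the idea that a vfsmount outside any p-node is nobody's master.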


How do we set them up?

* we can mark a subtree sharable. Every vfsmount in the subtree
that is not already in some p-node gets a single-element p-node of its
own.
* we can mark a subtree slave. That removes all vfsmounts in
the subtree from their p-nodes and makes them owned by said p-nodes.
p-nodes that became empty will disappear and everything they used to
own will be repossessed by their owners (if any).
* we can mark a subtree private. Same as above, but followed
by taking all vfsmounts in our subtree and making them *not* owned
by anybody.


Of course, namespace operations (clone, mount, etc.) affect that structure
and are affected by it (that's what it's for, after all).

1. CLONE_NS

That one is simple - we copy vfsmounts as usual
* if vfsmount A is contained in p-node p, then copy of A goes into
the same p-node
* if A is owned by p, then copy of A is also owned by p
* no new p-nodes are created.
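
As a sketch of the CLONE_NS rule in the same toy style (names are hypothetical, and only ownership of vfsmounts is modelled): each copy simply joins, or stays owned by, the p-node its original already had.

```python
class PNode:
    def __init__(self, name):
        self.name = name
        self.members = set()   # vfsmounts contained in this p-node
        self.owned = set()     # vfsmounts owned by this p-node


class Mount:
    def __init__(self, name, pnode=None, owner=None):
        self.name, self.pnode, self.owner = name, pnode, owner
        if pnode is not None:
            pnode.members.add(self)
        if owner is not None:
            owner.owned.add(self)


def clone_ns(mounts):
    """Copy every vfsmount; no new p-nodes are created along the way."""
    return [Mount(m.name + "'", pnode=m.pnode, owner=m.owner) for m in mounts]
```

So a shared vfsmount and its copy in the new namespace are peers, and a slave's copy is a slave of the same master.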

2. mount

We have a new vfsmount A and want to attach it to mountpoint somewhere in
vfsmount B. If B does not belong to any p-node, everything is as usual; A
doesn't become a member or slave of any p-node and is simply attached to B.

If B belongs to a p-node p, consider all vfsmounts B1,...,Bn that get events
propagated from B and all p-nodes p1,...,pk that contain them.
* A gets cloned into n copies and these copies (A1,...,An) are attached
to corresponding points in B1,...,Bn.
* k new p-nodes (q1,...,qk) are created
* Ai is contained in qj <=> Bi is contained in qj
* qi owns qj <=> pi owns pj
* qi owns Aj <=> pi owns Bj

In other words, mount is propagated and propagation among the new vfsmounts
mirrors the propagation between mountpoints.
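
The mirroring can be sketched as follows, reading the third bullet as "Ai is contained in qj <=> Bi is contained in pj". The encoding is assumed for illustration: `contains` maps a p-node name to the vfsmounts it contains, `owns` maps a p-node name to what it owns, names of copies and new p-nodes are invented strings, and B itself is counted among B1,...,Bn.

```python
def propagate_mount(a, b, contains, owns):
    """Return (attach, new_contains, new_owns) for mounting a on vfsmount b.

    attach maps each Bi to the copy Ai attached to it; the other two
    dictionaries describe the new p-nodes q1..qk and what they own.
    """
    start = next((p for p, members in contains.items() if b in members), None)
    if start is None:
        return {b: a}, {}, {}          # B is in no p-node: plain mount
    pnodes, todo = [], [start]
    while todo:                        # ownership is a forest, so this terminates
        p = todo.pop()
        pnodes.append(p)
        todo.extend(x for x in owns.get(p, ()) if x in contains)

    def copy(bi):
        return a + "@" + bi            # hypothetical name for the copy Ai

    q = {p: "q(" + p + ")" for p in pnodes}
    attach, new_contains, new_owns = {}, {}, {}
    for p in pnodes:
        owned = set(owns.get(p, ()))
        new_contains[q[p]] = {copy(bi) for bi in contains[p]}
        new_owns[q[p]] = {q[x] if x in contains else copy(x) for x in owned}
        for bi in contains[p] | {x for x in owned if x not in contains}:
            attach[bi] = copy(bi)
    return attach, new_contains, new_owns
```

For p = {B, B2} owning the p-node r = {B3} and the vfsmount B4, mounting A on B gives four copies, with q(p) containing the copies on B and B2 and owning both q(r) and the copy on B4.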

3. bind

bind works almost identically to mount; new vfsmount is created for every
place that gets propagation from mountpoint and propagation is set up to
mirror that between the mountpoints. However, there is a difference: unlike
the case of mount, vfsmount we were going to attach (say, A) has some
history - it was created as a copy of some pre-existing vfsmount V. And
that's where the things get interesting:
* if V is contained in some p-node p, A is placed into the same
p-node. That may require merging one of the p-nodes we'd just created
with p (that will be the counterpart of the p-node containing the mountpoint).
* if V is owned by some p-node p, then A (or p-node containing A)
becomes owned by p.

4. rbind
rbind is recursive bind, so we just do binds for everything we had in
a subtree we are binding in obvious order; everything is described
by previous case.

5. umount
umount everything that gets propagation from victim.

6. mount --move
prohibited if what we are moving is in some p-node, otherwise we move
as usual to intended mountpoint and create copies for everything that
gets propagation from there (as we would do for rbind).

7. pivot_root
similar to --move


How to use all that stuff?

Example 1:
mount --bind /floppy /floppy
mount --make-shared /floppy
mount --rbind / /jail
<finish setting the jail up, umount whatever doesn't belong there,
etc.>
mount --make-slave /jail/floppy
and we get /floppy in chroot jail slave to /floppy outside - if somebody
(u)mounts stuff on it, that will get propagated to jail.

Example 2:
same, but with the namespaces instead of chroots.

Example 3:
same subtree visible (and kept in sync) in several places - just
mark it shared and rbind; it will stay in sync

Example 4:
have some daemon control the stuff in a subtree sharable with many
namespaces, chroots, etc. without any magic:
mark that subtree sharable
clone with CLONE_NS
parent marks that subtree slave
child keeps working on the tree in its private namespace.

There's a lot more applications of the same idea, of course - AFS and its
ilk, autofs-like stuff (with proper handling of MNT_EXPIRE and traps - see
below), etc., etc.

Areas where we still have to figure things out:

* MNT_EXPIRE handling done right; there are some fun ideas in that area,
but they still need to be worked out in more detail (basically, lazy expire -
mount in a slave expiring into a trap that would clone a copy from master
when stepped upon).

* traps and their sharing. What we want is an ability to use the master/slave
mechanisms for *all* cross-namespace/cross-chroot issues in autofs, so that
daemon would only need to work with the namespace of its own and know nothing
about other instances.

* implementation ;-) It certainly looks reasonably easy to do; memory
demands are linear in the number of vfsmounts involved and locking appears
to be solvable.

* whatever issues that might come up from MVFS demands (and AFS, and...)

Mike Waychison

Jan 13, 2005, 7:03:37 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
Just a few comments below. Some of this will take time to digest ;)

How is (c) different from (a)? Is there a distinction between
'containing' and 'owning' here?

> d) propagation is transitive: if events propagate from A to B and
> from B to C, they propagate from A to C.
>
> In other words, members of the same p-node are equivalent and events anywhere
> in p-node are propagated to all its slaves. Note that not any transitive
> relation can be represented that way; it has to satisfy the following
> condition:
> * A->C and B->C => A->B or B->A
> All propagation setups we are going to deal with will satisfy that condition.
>
>
> How do we set them up?
>
> * we can mark a subtree sharable. Every vfsmount in the subtree
> that is not already in some p-node gets a single-element p-node of its
> own.
> * we can mark a subtree slave. That removes all vfsmounts in
> the subtree from their p-nodes and makes them owned by said p-nodes.
> p-nodes that became empty will disappear and everything they used to
> own will be repossessed by their owners (if any).

Would this be better read as "That removes each vfsmount A in the
subtree from its respective p-node p and makes it contained by a new
p-node p' (containing only A), and p' becomes 'owned' by p." ?


> * we can mark a subtree private. Same as above, but followed
> by taking all vfsmounts in our subtree and making them *not* owned
> by anybody.
>
>
> Of course, namespace operations (clone, mount, etc.) affect that structure
> and are affected by it (that's what it's for, after all).
>
> 1. CLONE_NS
>
> That one is simple - we copy vfsmounts as usual
> * if vfsmount A is contained in p-node p, then copy of A goes into
> the same p-node
> * if A is owned by p, then copy of A is also owned by p
> * no new p-nodes are created.
>
> 2. mount
>
> We have a new vfsmount A and want to attach it to mountpoint somewhere in
> vfsmount B. If B does not belong to any p-node, everything is as usual; A
> doesn't become a member or slave of any p-node and is simply attached to B.
>
> If B belongs to a p-node p, consider all vfsmounts B1,...,Bn that get events
> propagated from B and all p-nodes p1,...,pk that contain them.

By p1,...,pk, I assume you mean all p-nodes in the effective propagation
tree? If so, the following looks okay.

> * A gets cloned into n copies and these copies (A1,...,An) are attached
> to corresponding points in B1,...,Bn.
> * k new p-nodes (q1,...,qk) are created
> * Ai is contained in qj <=> Bi is contained in qj
> * qi owns qj <=> pi owns pj
> * qi owns Aj <=> pi owns Bj
>
> In other words, mount is propagated and propagation among the new vfsmounts
> mirrors the propagation between mountpoints.
>
> 3. bind
>
> bind works almost identically to mount; new vfsmount is created for every
> place that gets propagation from mountpoint and propagation is set up to
> mirror that between the mountpoints. However, there is a difference: unlike
> the case of mount, vfsmount we were going to attach (say it, A) has some
> history - it was created as a copy of some pre-existing vfsmount V. And
> that's where the things get interesting:
> * if V is contained in some p-node p, A is placed into the same
> p-node. That may require merging one of the p-nodes we'd just created
> with p (that will be the counterpart of the p-node containing the mountpoint).
> * if V is owned by some p-node p, then A (or p-node containing A)
> becomes owned by p.

I don't follow this. I still don't see the distinction between being
owned and being contained. Also, for statements like 'A belongs to B',
which is it?


--
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Al Viro

Jan 13, 2005, 7:52:40 PM
to Mike Waychison, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Jan 13, 2005 at 06:30:50PM -0500, Mike Waychison wrote:
> > 1) each p-node corresponds to a group of 1 or more vfsmounts.
> > 2) there is at most 1 p-node containing a given vfsmount.
> > 3) each p-node owns a possibly empty set of p-nodes and vfsmounts
> > 4) no p-node or vfsmount can be owned by more than one p-node
> > 5) only vfsmounts that are not contained in any p-nodes might be owned.
> > 6) no p-node can own (directly or via intermediates) itself (i.e. the
> > graph of p-node ownership is a forest).
> >
> > These guys define propagation:
> > a) if vfsmounts A and B are contained in the same p-node, events
> > propagate from A to B
> > b) if vfsmount A is contained in p-node p, vfsmount B is contained
> > in p-node q and p owns q, events propagate from A to B
> > c) if vfsmount A is contained in p-node p and vfsmount B is owned
> > by p, events propagate from A to B
>
> How is (c) different from (a)? Is there a distinction between
> 'containing' and 'owning' here?

Yes. See (3) and (1) above. Consider the following:
p = {A, B}
p owns C

Then we have propagation between A and B _and_ from either to C.

> > * we can mark a subtree slave. That removes all vfsmounts in
> > the subtree from their p-nodes and makes them owned by said p-nodes.
> > p-nodes that became empty will disappear and everything they used to
> > own will be repossessed by their owners (if any).
>
> Would this be better read as "That removes each vfsmount A in the
> subtree from its respective p-node p and makes it contained by a new
> p-node p' (containing only A), and p' becomes 'owned' by p." ?

No. "Belongs to a single-element p-node" != "doesn't belong to any
p-node". The former means "share on copy" (and might have slaves).
The latter is no one's master. Again, see the propagation rules and
behaviour on clone/rbind.

> > * if V is contained in some p-node p, A is placed into the same
> > p-node. That may require merging one of the p-nodes we'd just created
> > with p (that will be the counterpart of the p-node containing the mountpoint).
> > * if V is owned by some p-node p, then A (or p-node containing A)
> > becomes owned by p.
>
> I don't follow this. I still don't see the distinction between being
> owned and being contained. Also, for statements like 'A belongs to B',
> which is it?

"V owned by p" == "V is a slave of (equivalent) members of p"
"p contains V" == "V is one of the members of p, whatever happens to it
will happen to all of them".

"element belongs to set" means what it usually means ;-) (again, p-nodes
are sets of vfsmounts).

Erez Zadok

Jan 13, 2005, 8:21:14 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
Al, how do shared subtrees relate to stacking? From your description, it
looks like event propagation is similar to what stacking does (pass an op
from one layer to another), only that subtree sharing is for "mount points"
and not for every VFS object. Am I right?

If shared subtrees have nothing to do with stacking, do you foresee them as
perhaps a first step toward full stacking support in the VFS? (I mean, if
we're going to have to hack the VFS heavily already...) Your "p-node"
sounds awfully similar to Rosenthal's and Skinner's "pvnode"s. :-)

Thanks,
Erez.

Al Viro

Jan 13, 2005, 8:49:17 PM
to Erez Zadok, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Jan 13, 2005 at 08:11:59PM -0500, Erez Zadok wrote:
> Al, how do shared subtrees relate to stacking? From your description, it
> looks like event propagation is similar to what stacking does (pass an op
> from one layer to another), only that subtree sharing is for "mount points"
> and not for every VFS object. Am I right?

Umm... Not really - that's propagation of operations on VFS-*only* data
structures from one part of tree to another; I don't see how that's
related to layering.



> If shared subtrees have nothing to do with stacking, do you foresee them as
> perhaps a first step toward full stacking support in the VFS? (I mean, if
> we're going to have to hack the VFS heavily already...)

I don't see how they are related, so anything towards stacking would be
a separate story, IMO... I'm not sure whether it makes sense to put that
into the same cycle - depends on how much will be affected by each set
of patches and how well it will split into trivial widespread modifications
vs. heavy localized work...

IOW, no idea right now.

> Your "p-node"
> sounds awfully similar to Rosenthal's and Skinner's "pvnode"s. :-)

Heh. "p-node" is a result of giving up on finding a better term than
"node in propagation graph" - no more, no less. I doubt that it'll
survive to final edition - both as term and as something that would
have a corresponding in-core object...

Al Viro

Jan 15, 2005, 7:53:28 PM
to J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Sat, Jan 15, 2005 at 07:46:59PM -0500, J. Bruce Fields wrote:

> On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> > 2. mount
> >
> > We have a new vfsmount A and want to attach it to mountpoint somewhere in
> > vfsmount B. If B does not belong to any p-node, everything is as usual; A
> > doesn't become a member or slave of any p-node and is simply attached to B.
> >
> > If B belongs to a p-node p, consider all vfsmounts B1,...,Bn that get events
> > propagated from B and all p-nodes p1,...,pk that contain them.
> > * A gets cloned into n copies and these copies (A1,...,An) are attached
> > to corresponding points in B1,...,Bn.
> > * k new p-nodes (q1,...,qk) are created
> > * Ai is contained in qj <=> Bi is contained in qj
>
> Minor typo: looks like that second qj should be pj.

ACK

J. Bruce Fields

Jan 16, 2005, 11:03:58 AM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> 6. mount --move
> prohibited if what we are moving is in some p-node, otherwise we move
> as usual to intended mountpoint and create copies for everything that
> gets propagation from there (as we would do for rbind).

Why this prohibition?

--Bruce Fields

J. Bruce Fields

Jan 16, 2005, 12:58:19 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> 2. mount
>
> We have a new vfsmount A and want to attach it to mountpoint somewhere in
> vfsmount B. If B does not belong to any p-node, everything is as usual; A
> doesn't become a member or slave of any p-node and is simply attached to B.
>
> If B belongs to a p-node p, consider all vfsmounts B1,...,Bn that get events
> propagated from B and all p-nodes p1,...,pk that contain them.
> * A gets cloned into n copies and these copies (A1,...,An) are attached
> to corresponding points in B1,...,Bn.
> * k new p-nodes (q1,...,qk) are created
> * Ai is contained in qj <=> Bi is contained in qj

Minor typo: looks like that second qj should be pj.

--b.

Al Viro

Jan 16, 2005, 1:09:17 PM
to J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Sun, Jan 16, 2005 at 11:02:13AM -0500, J. Bruce Fields wrote:
> On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> > 6. mount --move
> > prohibited if what we are moving is in some p-node, otherwise we move
> > as usual to intended mountpoint and create copies for everything that
> > gets propagation from there (as we would do for rbind).
>
> Why this prohibition?

How do you propagate that? We can weaken that to "in a p-node that
owns something or contains more than one vfsmount", but it's not
worth the trouble, AFAICS.

J. Bruce Fields

Jan 16, 2005, 1:43:47 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Sun, Jan 16, 2005 at 06:06:56PM +0000, Al Viro wrote:
> On Sun, Jan 16, 2005 at 11:02:13AM -0500, J. Bruce Fields wrote:
> > On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> > > 6. mount --move
> > > prohibited if what we are moving is in some p-node, otherwise we move
> > > as usual to intended mountpoint and create copies for everything that
> > > gets propagation from there (as we would do for rbind).
> >
> > Why this prohibition?
>
> How do you propagate that? We can weaken that to "in a p-node that
> owns something or contains more than one vfsmount", but it's not
> worth the trouble, AFAICS.

I guess I'm not seeing what there is to propagate. If the vfsmount we
are moving is mounted under a vfsmount that's in a p-node, then there'd
be something to propagate, but since the --move doesn't change the
structure of mounts underneath the moved mountpoint, I wouldn't expect
any changes to be propagated from it to other mountpoints.

I must be missing something fundamental....

--Bruce Fields

Al Viro

Jan 17, 2005, 1:15:26 AM
to J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Sun, Jan 16, 2005 at 01:42:09PM -0500, J. Bruce Fields wrote:
> On Sun, Jan 16, 2005 at 06:06:56PM +0000, Al Viro wrote:
> > On Sun, Jan 16, 2005 at 11:02:13AM -0500, J. Bruce Fields wrote:
> > > On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> > > > 6. mount --move
> > > > prohibited if what we are moving is in some p-node, otherwise we move
> > > > as usual to intended mountpoint and create copies for everything that
> > > > gets propagation from there (as we would do for rbind).
> > >
> > > Why this prohibition?
> >
> > How do you propagate that? We can weaken that to "in a p-node that
> > owns something or contains more than one vfsmount", but it's not
> > worth the trouble, AFAICS.
>
> I guess I'm not seeing what there is to propagate. If the vfsmount we
> are moving is mounted under a vfsmount that's in a p-node, then there'd
> be something to propagate, but since the --move doesn't change the
> structure of mounts underneath the moved mountpoint, I wouldn't expect
> any changes to be propagated from it to other mountpoints.
>
> I must be missing something fundamental....

No - I have been missing a typo. Make that "if mountpoint of what we
are moving...".

J. Bruce Fields

Jan 17, 2005, 12:34:11 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, Jan 17, 2005 at 06:11:50AM +0000, Al Viro wrote:
> No - I have been missing a typo. Make that "if mountpoint of what we
> are moving...".

OK, got it, so the point is that it's not clear how you'd propagate the
removal of the subtree from the vfsmount of the source mountpoint.

By the way, I wrote up some notes this weekend in an attempt to explain
the shared subtrees RFC to myself. They may or may not be helpful to
anyone else:

http://www.fieldses.org/~bfields/kernel/viro_mount_propagation.txt

--b.

Mike Waychison

Jan 17, 2005, 1:41:47 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
Al Viro wrote:
> 3. bind
>
> bind works almost identically to mount; new vfsmount is created for every
> place that gets propagation from mountpoint and propagation is set up to
> mirror that between the mountpoints. However, there is a difference: unlike
> the case of mount, vfsmount we were going to attach (say it, A) has some
> history - it was created as a copy of some pre-existing vfsmount V. And
> that's where the things get interesting:
> * if V is contained in some p-node p, A is placed into the same
> p-node. That may require merging one of the p-nodes we'd just created
> with p (that will be the counterpart of the p-node containing the mountpoint).
> * if V is owned by some p-node p, then A (or p-node containing A)
> becomes owned by p.
>

Corner case: how do we handle the case where:

mount --make-shared /foo
mount --bind /foo /foo/bar

A nested --bind without sharing makes sense, but doesn't when sharing is
enabled (infinite loop).

How about a rule that states that for all Ai,Aj in p-node p, Ai must not
parent Aj in the vfsmount tree. This can be enforced at graft time.


J. Bruce Fields

Jan 17, 2005, 2:02:07 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, Jan 17, 2005 at 01:31:02PM -0500, Mike Waychison wrote:
> Corner case: how do we handle the case where:
>
> mount --make-shared /foo
> mount --bind /foo /foo/bar
>
> A nested --bind without sharing makes sense, but doesn't when sharing is
> enabled (infinite loop).

How does this force an infinite loop? I don't see it.

--Bruce Fields

Mike Waychison

Jan 17, 2005, 2:34:38 PM
to J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
J. Bruce Fields wrote:
> On Mon, Jan 17, 2005 at 01:31:02PM -0500, Mike Waychison wrote:
>
>>Corner case: how do we handle the case where:
>>
>>mount --make-shared /foo
>>mount --bind /foo /foo/bar
>>
>>A nested --bind without sharing makes sense, but doesn't when sharing is
>>enabled (infinite loop).
>
>
> How does this force an infinite loop? I don't see it.
>

Well, if I understand it correctly:

(assuming /foo is vfsmount A)

$> mount --make-shared /foo

will make A->A

$> mount --bind /foo /foo/bar

will create a vfsmount B based off A, but because A is in a p-node,
A->B, B->A.

Then, we attach B to A in the vfsmount tree, but because A->B in the
propagation tree, B also gets a vfsmount C added on dentry 'bar'.
Recurse ad infinitum.

Make sense?


J. Bruce Fields

Jan 17, 2005, 2:37:08 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, Jan 17, 2005 at 02:30:27PM -0500, Mike Waychison wrote:
> Well, if I understand it correctly:
>
> (assuming /foo is vfsmount A)
>
> $> mount --make-shared /foo
>
> will make A->A
>
> $> mount --bind /foo /foo/bar
>
> will create a vfsmount B based off A, but because A is in a p-node,
> A->B, B->A.
>
> Then, we attach B to A in the vfsmount tree, but because A->B in the
> propagation tree, B also gets a vfsmount C added on dentry 'bar'.
> Recurse ad infinitum.
>
> Make sense?

Yes, but couldn't the whole thing be avoided if we just agreed that the
propagation wasn't set up till after B was attached to A?

--b.

Mike Waychison

Jan 17, 2005, 3:14:24 PM
to J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
J. Bruce Fields wrote:
> On Mon, Jan 17, 2005 at 02:30:27PM -0500, Mike Waychison wrote:
>
>>Well, if I understand it correctly:
>>
>>(assuming /foo is vfsmount A)
>>
>>$> mount --make-shared /foo
>>
>>will make A->A
>>
>>$> mount --bind /foo /foo/bar
>>
>>will create a vfsmount B based off A, but because A is in a p-node,
>>A->B, B->A.
>>
>>Then, we attach B to A in the vfsmount tree, but because A->B in the
>>propagation tree, B also gets a vfsmount C added on dentry 'bar'.
>>Recurse ad infinitum.
>>
>>Make sense?
>
>
> Yes, but couldn't the whole thing be avoided if we just agreed that the
> propagation wasn't set up till after B was attached to A?

I don't think that solves the problem. B should receive copies (with
shared semantics if called for) of all mountpoints C1,..,Cn that are
children of A if A->A. This is regardless of whether or not propagation
occurs before or after the attach.

Allowing this is like allowing directory aliasing in the sense that an
aliased directory that is nested within itself opens us to
badness/headaches 8)

I still think the only way to handle this is to disallow vfsmounts in a
p-node to have (grand)parent-child relationships. This may have to be
extended to the 'owned by' case as well.


Al Viro

Jan 17, 2005, 3:43:57 PM
to Mike Waychison, J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, Jan 17, 2005 at 03:11:18PM -0500, Mike Waychison wrote:

> I don't think that solves the problem. B should receive copies (with
> shared semantics if called for) of all mountpoints C1,..,Cn that are
> children of A if A->A. This is regardless of whether or not propagation
> occurs before or after the attach.

... when that makes sense. Do you see any real problems with the proposed
behaviour (i.e. propagation happens before attachment)?

BTW, you do realize that rbind also has "copy before attaching" semantics,
right?



> Allowing this is like allowing directory aliasing in the sense that an
> aliased directory that is nested within itself opens us to
> badness/headaches 8)
>
> I still think the only way to handle this is to disallow vfsmounts in a
> p-node to have (grand)parent-child relationships. This may have to be
> extended to the 'owned by' case as well.

Not feasible (and think what _that_ will do to --move, especially since
propagation can span namespace boundaries).

J. Bruce Fields

Jan 17, 2005, 4:24:13 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, Jan 17, 2005 at 03:11:18PM -0500, Mike Waychison wrote:
> I don't think that solves the problem. B should receive copies (with
> shared semantics if called for) of all mountpoints C1,..,Cn that are
> children of A if A->A. This is regardless of whether or not propagation
> occurs before or after the attach.

Consider this situation:
# #make new vfsmounts at /foo and /bar:
# mount --bind /foo /foo
# mount --bind /bar /bar
# # mount /bar under /foo, *then* put /foo and /mnt2 in the same p-node:
# mount --bind /bar /foo/mnt1
# mount --make-shared /foo
# mount --bind /foo /mnt2
# find # and I think this is what you'll get:
.
./foo
./bar
./bar/file_in_bar
./foo/mnt1
./foo/mnt1/file_in_bar
./mnt2/
./mnt2/mnt1/

Since /mnt2 and /foo are in the same p-node, any mounts we may make
under them later will be shared. But the mount under /foo/mnt1 is
*not* automatically propagated to /mnt2/mnt1, and /mnt1 is still in its
own little p-node (so mounts under it won't be replicated).

At least, I think I have that right.

In any case, setting up propagation between two vfsmounts needn't force
propagation of preexisting mounts, it need only affect mounts made later.

--b.

Mike Waychison

Jan 18, 2005, 2:55:22 PM
to Al Viro, J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
Al Viro wrote:
> On Mon, Jan 17, 2005 at 03:11:18PM -0500, Mike Waychison wrote:
>
>
>>I don't think that solves the problem. B should receive copies (with
>>shared semantics if called for) of all mountpoints C1,..,Cn that are
>>children of A if A->A. This is regardless of whether or not propagation
>>occurs before or after the attach.
>
>
> ... when that makes sense. Do you see any real problems with the proposed
> behaviour (i.e. propagation happens before attachment)?
>
> BTW, you do realize that rbind also has "copy before attaching" semantics,
> right?

Ya, okay, that semantic will work. Please add it to the RFC though :)

>
>
>>Allowing this is like allowing directory aliasing in the sense that an
>>aliased directory that is nested within itself opens us to
>>badness/headaches 8)
>>
>>I still think the only way to handle this is to disallow vfsmounts in a
>>p-node to have (grand)parent-child relationships. This may have to be
>>extended to the 'owned by' case as well.
>
>
> Not feasible (and think what _that_ will do to --move, especially since
> propagation can span namespace boundaries).

Fair enough.

Changing the topic slightly: How should we handle propagation events for
the detach_mnt() case? Is it fair to say: a detach_mnt of A mounted on
dentry d on parent B will 'umount -l Ai' all Ai where Ai is mounted on
dentry d in all peers and private derivatives of the p-node which B
belongs to?

Steps to above:
- Detaching A from parent B (mounted on dentry d)
  - Let S = set of all peer vfsmounts in B's p-node p (if any)
    unioned with all vfsmounts owned by p (expanding owned p-nodes
    recursively):
    - For each C in S
      - If (C has a child mountpoint D mounted on dentry d)
        && (D is equivalent to A)
        - umount -l D

Thoughts?
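
The detach steps above can be sketched as a small Python model. This is only an illustration, not kernel code: the PNode and Mount classes, the receivers() helper and the use of a `source` string to decide whether "D is equivalent to A" are all assumptions made for the sketch, and 'umount -l' is modelled as simply dropping the child from its parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class PNode:
    members: list = field(default_factory=list)       # peer vfsmounts
    owned_pnodes: list = field(default_factory=list)  # p-nodes owned by this one
    owned_mounts: list = field(default_factory=list)  # private derivatives


@dataclass(eq=False)
class Mount:
    name: str
    source: str                     # hypothetical stand-in for "equivalent to"
    pnode: Optional[PNode] = None
    children: dict = field(default_factory=dict)      # dentry -> Mount


def receivers(pnode):
    """Peers of pnode plus everything it owns, expanding owned p-nodes."""
    out = list(pnode.members) + list(pnode.owned_mounts)
    for q in pnode.owned_pnodes:
        out.extend(receivers(q))
    return out


def detach(parent, dentry):
    """Detach the child on `dentry` and its counterparts; return what went."""
    victim = parent.children.pop(dentry)
    detached = [victim]
    if parent.pnode is None:        # private parent: nothing to propagate
        return detached
    for c in receivers(parent.pnode):
        if c is parent:
            continue
        d = c.children.get(dentry)
        if d is not None and d.source == victim.source:
            del c.children[dentry]  # models 'umount -l D'
            detached.append(d)
    return detached
```

A mount on dentry d that is not "equivalent" to A (here: has a different source) is left alone, which is the one place where this model has to guess at the intended meaning.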

Also, brainstorming mountpoint expiry: How about something like this:

- Each p-node has a recently-touched flag, like how vfsmount currently
  has a mnt_expiry_mark.
- A call to umount with MNT_EXPIRE of vfsmount A which is in a non-empty
  p-node will:
  - Check whether *all* Ai in A's p-node (and derivatives) are not
    busy; if any of them is busy, return -EBUSY
  - Otherwise:
    - Clear the recently-touched flag of the p-node if it is set
    - Otherwise umount all Ai.

This only works for autofs, btw, iff we have VFS-native traps. Otherwise
we'll need to do recursive MNT_EXPIRE (overload MNT_EXPIRE | MNT_DETACH?)
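
A rough Python model of the expiry rule above, under stated assumptions: the p-node's peers and derivatives are flattened into one list, the flag is assumed to be set again by whatever touches the mounts, and returning -EAGAIN when the flag is merely cleared is borrowed from how MNT_EXPIRE behaves on a single mount today rather than from anything decided in this thread.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Mount:
    name: str
    busy: bool = False
    mounted: bool = True


@dataclass
class PNode:
    mounts: list = field(default_factory=list)  # all Ai: peers and derivatives
    recently_touched: bool = True


def expire(pnode):
    """One MNT_EXPIRE attempt against the p-node; returns 0 or -errno."""
    if any(m.busy for m in pnode.mounts):
        return -errno.EBUSY
    if pnode.recently_touched:
        # First quiet pass only clears the mark (assumed return value).
        pnode.recently_touched = False
        return -errno.EAGAIN
    for m in pnode.mounts:
        m.mounted = False           # second quiet pass: umount all Ai
    return 0
```

So two consecutive expiry attempts with no intervening use are needed before anything is unmounted, and a single busy Ai vetoes the whole group.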

--
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB7Wc6dQs4kOxk3/MRAo9/AJ415IkSmKqT7rpvo7Uwr8HZqI0okwCfXYs+
iuXoqlEyzGMCnPKwLlSfgvI=
=OAAC
-----END PGP SIGNATURE-----

Ram

Jan 25, 2005, 4:31:49 PM
to J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Mon, 2005-01-17 at 09:32, J. Bruce Fields wrote:
> On Mon, Jan 17, 2005 at 06:11:50AM +0000, Al Viro wrote:
> > No - I have been missing a typo. Make that "if mountpoint of what we
> > are moving...".
>
> OK, got it, so the point is that its not clear how you'd propagate the
> removal of the subtree from the vfsmount of the source mountpoint.
>
> By the way, I wrote up some notes this weekend in an attempt to explain
> the shared subtrees RFC to myself. They may or may not be helpful to
> anyone else:
>
> http://www.fieldses.org/~bfields/kernel/viro_mount_propagation.txt


Question 1:

If there exists a private subtree in a larger shared subtree, what
happens when the larger shared subtree is rbound to some other place?
Is a new private subtree created in the new larger shared subtree? or
will that be pruned out in the new larger subtree?

Concrete example:

mount <device1> /tmp/mnt1
mount <device2> /tmp/mnt1/mnt1.1
mount <device3> /tmp/mnt1/mnt1.1/mnt1.1.1
mount --make-shared /tmp/mnt1
mount --make-private /tmp/mnt1/mnt1.1
mount --rbind /tmp/mnt1 /tmp/mnt2

Question: will I see the mount at /tmp/mnt2/mnt1.1/mnt1.1.1 ?

My guess is that since /tmp/mnt1/mnt1.1 is private, that subtree
should not even be seen under /tmp/mnt2/mnt1.1. Is that
the case? Or does the subtree get mirrored in /tmp/mnt2/mnt1.1,
but with no propagation set up between the vfsmounts of
/tmp/mnt1/mnt1.1 and /tmp/mnt2/mnt1.1?

I believe it's the former case.


Question 2:

When a mount gets propagated to a slave, but the slave
has mounted something else at the same place, and hence
that mount point is masked, what will happen?

Concrete example:

mount <device1> /tmp/mnt1
mkdir -p /tmp/mnt1/a/b
mount --rbind /tmp/mnt1 /tmp/mnt2
mount --make-slave /tmp/mnt2
mount <device2> /tmp/mnt2/a
rm -f /tmp/mnt2/a/*

What happens when a mount is attempted on /tmp/mnt1/a/b?
Will that be reflected in /tmp/mnt2/a?

I believe the answer is 'no', because that part of the subtree
in /tmp/mnt2 no longer mirrors its parent subtree.

RP

Mike Waychison

Jan 25, 2005, 4:57:24 PM
to Ram, J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Ram,

I can't speak for Al, but the following is how I understand it:

Ram wrote:
> On Mon, 2005-01-17 at 09:32, J. Bruce Fields wrote:
>
>>On Mon, Jan 17, 2005 at 06:11:50AM +0000, Al Viro wrote:
>>
>>>No - I have been missing a typo. Make that "if mountpoint of what we
>>>are moving...".
>>
>>OK, got it, so the point is that its not clear how you'd propagate the
>>removal of the subtree from the vfsmount of the source mountpoint.
>>
>>By the way, I wrote up some notes this weekend in an attempt to explain
>>the shared subtrees RFC to myself. They may or may not be helpful to
>>anyone else:
>>
>>http://www.fieldses.org/~bfields/kernel/viro_mount_propagation.txt
>
>
>
> Question 1:
>
> If there exists a private subtree in a larger shared subtree, what
> happens when the larger shared subtree is rbound to some other place?
> Is a new private subtree created in the new larger shared subtree? or
> will that be pruned out in the new larger subtree?
>
> Concrete example:
>
> mount <device1> /tmp/mnt1
> mount <device2> /tmp/mnt1/mnt1.1
> mount <device3> /tmp/mnt1/mnt1.1/mnt1.1.1
> mount --make-shared /tmp/mnt1
> mount --make-private /tmp/mnt1/mnt1.1

Not needed, see below:

> mount --rbind /tmp/mnt1 /tmp/mnt2
>
> Question: will I see the mount at /tmp/mnt2/mnt1.1/mnt1.1.1 ?
>
> My guess is since /tmp/mnt1/mnt1.1 is private that subtree
> should not be even seen under /tmp/mnt2/mnt1.1 , Is that
> the case? Or does the subtree get mirrored in /tmp/mnt2/mnt1.1;
> however propogation is not set between the vfsstruct of
> /mnt/mnt1/mnt1.1 and /mnt/mnt2/mnt1.1 ?
>
> I believe its the former case.

Although Al hasn't explicitly defined the semantics for mount
--make-shared, I think the idea is that 'only' that mountpoint becomes
tagged as shared (becomes a member of a p-node of size 1). The
--make-shared / --make-private / --make-slave should probably all be
non-recursive actions.

/tmp/mnt1/mnt1.1 and /tmp/mnt1/mnt1.1/mnt1.1.1 will remain private.

The --rbind is described as simply walking the vfsmount tree rooted at
the argument and performing --bind.

So:

- /tmp/mnt2 becomes a peer of /tmp/mnt1, because /tmp/mnt1 was in a
  non-empty p-node.
- /tmp/mnt2/mnt1.1 becomes a copy of /tmp/mnt1/mnt1.1 because the latter
  was not in a p-node.
- /tmp/mnt2/mnt1.1/mnt1.1.1 becomes a copy of /tmp/mnt1/mnt1.1/mnt1.1.1
  because the latter was not in a p-node.

Only new mounts placed on top of /tmp/mnt1 and /tmp/mnt2 will get
propagated back and forth.
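
To make that reading concrete, here is a minimal Python sketch of --rbind as "walk the tree and --bind each vfsmount", assuming a non-recursive --make-shared as described above. The Mount class and the make_shared/bind/rbind helpers are names invented for this model; a p-node is represented as a plain set of peers, and slaves are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Mount:
    path: str
    pnode: Optional[set] = field(default=None, repr=False)  # peers, or None if private
    children: dict = field(default_factory=dict)            # relative mountpoint -> Mount


def make_shared(m):
    """Non-recursive: only this vfsmount gets a one-element p-node."""
    if m.pnode is None:
        m.pnode = {m}


def bind(src, dst_path):
    """The copy joins src's p-node if src has one, else it stays private."""
    copy = Mount(dst_path)
    if src.pnode is not None:
        src.pnode.add(copy)
        copy.pnode = src.pnode
    return copy


def rbind(src, dst_path):
    """Bind src, then recursively bind every vfsmount below it."""
    copy = bind(src, dst_path)
    for rel, child in src.children.items():
        copy.children[rel] = rbind(child, dst_path + "/" + rel)
    return copy


# Ram's example: only /tmp/mnt1 is shared, the two lower mounts are private.
mnt1 = Mount("/tmp/mnt1")
mnt11 = Mount("/tmp/mnt1/mnt1.1")
mnt111 = Mount("/tmp/mnt1/mnt1.1/mnt1.1.1")
mnt1.children["mnt1.1"] = mnt11
mnt11.children["mnt1.1.1"] = mnt111
make_shared(mnt1)
mnt2 = rbind(mnt1, "/tmp/mnt2")
```

In this model mnt2 ends up in the same p-node as mnt1, while the copies of the two lower mounts exist under /tmp/mnt2 but have no p-node at all.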

>
>
> Question 2:
>
> When a mount gets propagated to a slave, but the slave
> has mounted something else at the same place, and hence
> that mount point is masked, what will happen?
>
> Concrete example:
>
> mount <device1> /tmp/mnt1
> mkdir -p /tmp/mnt1/a/b
> mount --rbind /tmp/mnt1 /tmp/mnt2
> mount --make-slave /tmp/mnt2

EINVAL. You should only be able to demote a mountpoint to a slave if it
was part of a p-node (shared).

> mount <device2> /tmp/mnt2/a
> rm -f /tmp/mnt2/a/*
>
> what happens when a mount is attempted on /tmp/mnt1/a/b?
> will that be reflected in /tmp/mnt2/a ?
>
> I believe the answer is 'no', because that part of the subtree
> in /tmp/mnt2 no more mirrors its parent subtree.
>
> RP
>

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB9r5YdQs4kOxk3/MRApT3AJ9xxpdacU0mp8IvsY395MDtEktJ+wCeOvRT
/g7qXO9nGxMT/iFAZoUO8F4=
=9D2G
-----END PGP SIGNATURE-----

J. Bruce Fields

Jan 25, 2005, 5:05:08 PM
to Mike Waychison, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, Jan 25, 2005 at 04:47:04PM -0500, Mike Waychison wrote:
> Although Al hasn't explicitly defined the semantics for mount
> --make-shared, I think the idea is that 'only' that mountpoint becomes
> tagged as shared (becomes a member of a p-node of size 1).

On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> * we can mark a subtree sharable. Every vfsmount in the subtree
> that is not already in some p-node gets a single-element p-node of its
> own.

Also, note that mount automatically sets up propagation that mirrors
that of the mounted on vfsmount, so by default new mounts anywhere in
the subtree will also be tagged as shared.
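
As a sketch of the two points above (marking a whole subtree sharable, and new mounts mirroring the vfsmount they land on), here is a small Python model. The class and function names are made up for illustration, a p-node is just a list of peers, and ownership (slaves) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Mount:
    name: str
    pnode: Optional[list] = field(default=None, repr=False)  # peers, or None
    children: list = field(default_factory=list)


def make_shared_subtree(root):
    """Every vfsmount in the subtree not already in a p-node gets its own."""
    stack = [root]
    while stack:
        m = stack.pop()
        if m.pnode is None:
            m.pnode = [m]
        stack.extend(m.children)


def mount_on(parent, name):
    """Mount on parent; copies land on every peer and form one new p-node."""
    targets = parent.pnode if parent.pnode is not None else [parent]
    copies = [Mount(name) for _ in targets]
    if parent.pnode is not None:
        for c in copies:
            c.pnode = copies        # the copies are peers of one another
    for t, c in zip(targets, copies):
        t.children.append(c)
    return copies[targets.index(parent)]
```

With this model a mount made anywhere under a subtree that was marked sharable is itself shared, whereas a mount made on a private vfsmount stays private.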

--b.

Ram

Jan 25, 2005, 5:14:46 PM
to Mike Waychison, J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, 2005-01-25 at 13:47, Mike Waychison wrote:
...snip...

> >
> > Question 2:
> >
> > When a mount gets propagated to a slave, but the slave
> > has mounted something else at the same place, and hence
> > that mount point is masked, what will happen?
> >
> > Concrete example:
> >
> > mount <device1> /tmp/mnt1
> > mkdir -p /tmp/mnt1/a/b
> > mount --rbind /tmp/mnt1 /tmp/mnt2
> > mount --make-slave /tmp/mnt2
>
> EINVAL. You should only be able to demote a mountpoint to a slave if it
> was part of a p-node (shared).

oops. I had the following in mind.

mount <device1> /tmp/mnt1
** mount --make-shared /tmp/mnt1 **


mkdir -p /tmp/mnt1/a/b
mount --rbind /tmp/mnt1 /tmp/mnt2
mount --make-slave /tmp/mnt2

In this case it cannot be EINVAL, because /tmp/mnt1 and /tmp/mnt2 will
both be part of a pnode and hence /tmp/mnt2 can be demoted to be a
slave.

Mike Waychison

Jan 25, 2005, 7:06:31 PM
to J. Bruce Fields, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

J. Bruce Fields wrote:
> On Tue, Jan 25, 2005 at 04:47:04PM -0500, Mike Waychison wrote:
>
>>Although Al hasn't explicitly defined the semantics for mount
>>--make-shared, I think the idea is that 'only' that mountpoint becomes
>>tagged as shared (becomes a member of a p-node of size 1).
>
>
> On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
>
>> * we can mark a subtree sharable. Every vfsmount in the subtree
>>that is not already in some p-node gets a single-element p-node of its
>>own.
>
>
> Also, note that mount automatically sets up propagation that mirrors
> that of the mounted on vfsmount, so by default new mounts anywhere in
> the subtree will also be tagged as shared.
>

Why not simply call this --make-rshared and have --make-shared share
only a single mount, then?

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB9tzDdQs4kOxk3/MRAp3jAJ9CjPjEQs1jvcm92Q2jAizYvnBOSgCeJ9A0
Jt0d1v7iLB3EPbEWq9r6zik=
=3u5S
-----END PGP SIGNATURE-----

Mike Waychison

Jan 28, 2005, 5:35:24 PM
to Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Al Viro wrote:

> OK, here comes the first draft of proposed semantics for subtree
> sharing. What we want is being able to propagate events between
> the parts of mount trees. Below is a description of what I think
> might be a workable semantics; it does *NOT* describe the data
> structures I would consider final and there are considerable
> areas where we still need to figure out the right behaviour.
>

Okay, I'm not convinced that shared subtrees as proposed will work well
with autofs.

The idea discussed off-line was this:

When you install an autofs mountpoint, on say /home, a daemon is started
to service the requests. As far as the admin is concerned, an fs is
mounted in the current namespace, call it namespaceA. The daemon
actually runs in its own private namespace: call it namespaceB.
namespaceB receives a new autofs filesystem: call it autofsB. autofsB
is in its own p-node. namespaceA gets an autofsA on /home as well, and
autofsA is 'owned' by autofsB's p-node.

So:

autofsB -> autofsB
and
autofsB -> autofsA

Effectively, namespaceA has a private instance of autofsB in its tree.

The problem is this:

Assume /home/mikew is accessed in namespaceA. The daemon running in
namespaceB gets the event, and mounts an nfs vfsmount on autofsB. This
event is propagated back to autofsA.

(Problem 1: how do you block access to /home/mikew in namespaceA?)

Next, a CLONE_NS is done in namespaceA, creating namespaceA'. The
homedir on /home/mikew is also copied.

Now, in namespaceA', what happens when a user umounts /home/mikew? We
haven't yet determined how to handle umount event propagation, but it
appears likely that it will be *a hard thing to do*.

Assuming the nfs umount succeeds, /home/mikew is accessed again in
namespaceA'.

(Problem 2: The daemon in namespaceB will see the event, but it already
has something mounted on its version of /home/mikew. How does it
'send' a mountpoint to namespaceA'?)

-----------

Shared subtrees may help in some administrative situations, but don't
look like the right solution for autofs.

Autofs will work with namespaces if the following functionality is added
to the kernel: The ability to perform mount(2) operations on a
directory fd.

This has been discussed before and quickly vetoed, citing that it is a
security risk. I still fail to understand how allowing a mount to
happen cross-namespace given a dirfd target is any worse than what is
already possible given a dirfd. If you don't want someone to play with
your namespace, don't give them a dirfd.

Thoughts?

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB+r1OdQs4kOxk3/MRAmSpAJ96ix25fjze6o7viCq2DCET9J/AlQCfYlC1
CoLKusJXjL+fYxgwggOCW+w=
=8bTv
-----END PGP SIGNATURE-----

ra...@themaw.net

Jan 28, 2005, 11:55:37 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Fri, 28 Jan 2005, Mike Waychison wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Al Viro wrote:
>
>> OK, here comes the first draft of proposed semantics for subtree
>> sharing. What we want is being able to propagate events between
>> the parts of mount trees. Below is a description of what I think
>> might be a workable semantics; it does *NOT* describe the data
>> structures I would consider final and there are considerable
>> areas where we still need to figure out the right behaviour.
>>
>
> Okay, I'm not convinced that shared subtrees as proposed will work well
> with autofs.

OK. I've read the thread but haven't digested it so you'll have to put up
with some stupid questions.

>
> The idea discussed off-line was this:
>
> When you install an autofs mountpoint, on say /home, a daemon is started
> to service the requests. As far as the admin is concerned, an fs is
> mounted in the current namespace, call it namespaceA. The daemon
> actually runs in it's one private namespace: call it namespaceB.
> namespaceB receives a new autofs filesystem: call it autofsB. autofsB
> is in it's own p-node. namespaceA gets an autofsA on /home as well, and
> autofsA is 'owned' by autofsB's p-node.
>
> So:
>
> autofsB -> autofsB
> and
> autofsB -> autofsA
>
> Effectively, namespaceA has a private instance of autofsB in its tree.
>
> The problem is this:
>
> Assume /home/mikew is accessed in namespaceA. The daemon running in
> namespaceB gets the event, and mounts an nfs vfsmount on autofsB. This
> event is propagated back to autofsA.

Which condition (or action) in the definition implies

autofsB -> autofsA ?

>
> (Problem 1: how do you block access to /home/mikew in namespaceA?)
>
> Next, a CLONE_NS is done in namespaceA, creating namespaceA'. the
> homedir on /home/mikew is also copied.
>
> Now, in namespaceA', what happens when a user umount's /home/mikew? We
> haven't yet determined how to handle umount event propagation, but it
> appears likely that it will be *a hard thing to do*.

No, I haven't spent enough time on the RFC to buy into this one.
So I'll just say it looks like something is missing in this argument.

Perhaps the latter is namespaceC?

>
> Assuming the nfs umount succeeds, /home/mikew is accessed again in
> namespaceA'.

namespaceC?


Mike Waychison

Jan 31, 2005, 12:22:30 PM
to ra...@themaw.net, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry for the bad quoting below:

autofsB -> autofsA indicates that mount events are propagated from
autofsB to autofsA.

Eg: if you have two mounts (A, B) in the same p-node, then

A -> B
and
B -> A

By definition, a mountpoint (A) alone in a one element p-node has the
property:

A -> A

Which doesn't mean much, other than to show that A is in a p-node.

If you have a p-node p' owned by p-node p, then all mountpoints i' in p'
will have the following relationship with all mountpoints i in p:

i -> i'

but not the reverse (one-way relationship).
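
The "->" notation can be written down as a tiny Python predicate. This is only a model of the relation described above: PNode, propagates_to() and propagates() are illustrative names, and vfsmounts are represented by plain strings.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    name: str
    members: set = field(default_factory=set)         # vfsmounts in the p-node
    owned_pnodes: list = field(default_factory=list)  # p-nodes it owns
    owned_mounts: set = field(default_factory=set)    # vfsmounts it owns


def propagates_to(p):
    """Every vfsmount that receives events originating in p-node p."""
    out = set(p.members) | set(p.owned_mounts)
    for q in p.owned_pnodes:
        out |= propagates_to(q)
    return out


def propagates(a, b, pnodes):
    """True if a -> b, i.e. events on vfsmount a are propagated to b."""
    return any(a in p.members and b in propagates_to(p) for p in pnodes)


# The autofs setup from earlier in the thread: autofsA is owned by
# autofsB's one-element p-node.
p = PNode("p", members={"autofsB"}, owned_mounts={"autofsA"})
```

Here propagates("autofsB", "autofsA", [p]) and propagates("autofsB", "autofsB", [p]) both hold, while propagates("autofsA", "autofsB", [p]) does not, which is the one-way relationship.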

>
> (Problem 1: how do you block access to /home/mikew in namespaceA?)
>
> Next, a CLONE_NS is done in namespaceA, creating namespaceA'. the
> homedir on /home/mikew is also copied.
>
> Now, in namespaceA', what happens when a user umount's /home/mikew? We
> haven't yet determined how to handle umount event propagation, but it
> appears likely that it will be *a hard thing to do*.
>
>
>> No I haven't spent enough time on the RFC buy into this one.
>> So I'll just say it looks like something is missing in this argument.
>
>> Perhaps the later is namespaceC?
>

Sure, it doesn't matter, namespaceA' is an arbitrary name.

HTH,

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB/miHdQs4kOxk3/MRArCsAJ9PxyxE7crSRk4R0OMB4yppH10wpQCfeQO8
qk6kcExaN7rzJOi4KoRyXoY=
=VvFb
-----END PGP SIGNATURE-----

Ian Kent

Jan 31, 2005, 8:33:32 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org

Sorry guys.

I've gotta spend some time to get into this.
It is really important.

Ram

Jan 31, 2005, 9:30:03 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Fri, 2005-01-28 at 14:31, Mike Waychison wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Al Viro wrote:
>
> > OK, here comes the first draft of proposed semantics for subtree
> > sharing. What we want is being able to propagate events between
> > the parts of mount trees. Below is a description of what I think
> > might be a workable semantics; it does *NOT* describe the data
> > structures I would consider final and there are considerable
> > areas where we still need to figure out the right behaviour.
> >
>
> Okay, I'm not convinced that shared subtrees as proposed will work well
> with autofs.
>
> The idea discussed off-line was this:
>
> When you install an autofs mountpoint, on say /home, a daemon is started
> to service the requests. As far as the admin is concerned, an fs is
> mounted in the current namespace, call it namespaceA. The daemon
> actually runs in it's one private namespace: call it namespaceB.
> namespaceB receives a new autofs filesystem: call it autofsB. autofsB
> is in it's own p-node. namespaceA gets an autofsA on /home as well, and
> autofsA is 'owned' by autofsB's p-node.

Mike, multiple passes through the problem definition still did not
make the problem clear. What problem is autofs trying to solve using
namespaces?

My guess is you don't want to see an automount taking place in namespaceA
when an automount takes place in namespaceB, even though
the automount point is in a shared subtree?

Sorry, I don't understand automount's requirement in the first place,
RP

>
> So:
..snip...

Mike Waychison

Feb 1, 2005, 2:04:40 AM
to Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The major concern for automounting is that currently, if you start an
automount daemon in the primary namespace, and some process clones off
into a new namespace with clone(CLONE_NS), then there is no way for the
daemon running in the first namespace to automount (let alone expire)
any mounts in the second namespace. There doesn't exist a way for the
daemon to mount(2) nor umount(2) across namespaces.

The proposed solution for this is to use shared and private subtrees to
have the daemon run in its own namespace, with the primary and any
derivative namespaces inheriting the automounts. I'm not convinced that
it'd work though.

Does this clarify?

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB/ymFdQs4kOxk3/MRAjuWAKCJfX+jZMUlm9ncM199Q0nJpxwPKQCgjQFE
VTNmwXtmKOLVlrqBd2AzfYk=
=tESv
-----END PGP SIGNATURE-----

Ram

Feb 1, 2005, 2:31:04 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org

Yes it does clarify the problem and motivates the reason behind using
shared subtree.

However going back to your original problem 1:

You have a daemon running in namespaceB, and a process running in
namespaceA that accesses an auto-mountpoint /home.

The expected behavior in this case should be: the autofs daemon must
mount the corresponding device at that mount point '/home' in all
existing namespaces (provided that part of the subtree is shared). Right?
So in this case it should mount the device in both namespaces, i.e.
namespaceA and namespaceB. But you seem to be saying that you want to
block the auto-mount in namespaceA?

RP

Mike Waychison

Feb 1, 2005, 4:18:58 PM
to Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(Hmm.. something is up with my quoting again..)

Yes. Sharing allows this to happen in a 'safe' way. The daemon doesn't
have to know how many instances of '/home' exist.

>> But you seem to be saying that you want to
>> block the auto-mount in namespaceA?
>

No. I want to allow the mount. However, if there are several shared
'/home' (through CLONE_NS or mount --bind), there remains the following
two key problems:

- How do you expire the mounts and umount them? (undefined with shared
  subtrees thus far)
- How do you handle the case where '/home/mikew' is automounted in all
  instances of it, and then umounted in a single namespace? Walking back
  into '/home/mikew' in that namespace will trigger the daemon to mount
  again, but the filesystem is already mounted in its namespace.

I guess a solution to ponder is what if we included the following rule:

"An attempt to umount a vfsmount X will induce the umounting of all
vfsmounts in X's p-node as well as all vfsmounts/p-nodes 'owned' by said
p-node."

I'm not sure that is a desirable solution or even nice to implement.

--
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB//F3dQs4kOxk3/MRAtFwAJwJlbQiltnBFFzsZHNfYo4oRxXLtgCfZ6ny
AVcIOZ/BirLJtjK/CENMDxM=
=PS6I

J. Bruce Fields

Feb 1, 2005, 6:24:28 PM
to Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, Jan 25, 2005 at 01:07:12PM -0800, Ram wrote:
> If there exists a private subtree in a larger shared subtree, what
> happens when the larger shared subtree is rbound to some other place?
> Is a new private subtree created in the new larger shared subtree? or
> will that be pruned out in the new larger subtree?

"mount --rbind" will always do at least all the mounts that it did
before the introduction of shared subtrees--so certainly it will copy
private subtrees along with shared ones. (Since subtrees are private by
default, anything else would make --rbind do nothing by default.) My
understanding of Viro's RFC is that the new subtree will have no
connection with the preexisting private subtree (we want private
subtrees to stay private), but that the new copy will end up with
whatever propagation the target of the "mount --rbind" had. (So the
addition of the copy of the private subtree to the target vfsmount will
be replicated on any vfsmount that the target vfsmount propagates to,
and those copies will propagate among themselves in the same way that
the copies of the target vfsmount propagate to each other.)

--Bruce Fields

Ram

Feb 1, 2005, 6:40:36 PM
to Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org

The same way we currently expire mounts. In the case of namespaces you
will expire a bunch of mounts simultaneously and unmount them together.

> - How do you handle the case where '/home/mikew' is automounted in all
> instances of it, and then umounted in a single namespace. Walking back
> into '/home/mikew' in that namespace will trigger the daemon to mount
> again, but the filesystem is already mounted in it's namespace.

I think mount and unmount are two different types of events, and any
event propagates down the propagation tree. My understanding of the
expected behavior is that the unmount should take place in all the
namespaces (provided all the umount-points belong to vfsmounts that are
in the same p-node or are owned by that p-node). So in your example, I
imagine /home/mikew to be unmounted in all the namespaces.


>
> I guess a solution to ponder is what if we included the following rule:
>
> "An attempt to umount a vfsmount X will induce the umounting of all
> vfsmounts in X's p-node as well as all vfsmounts/p-nodes 'owned' by said
> p-node."
>

Exactly. And I think this is what Al imagines it to be too.

Well, all this is my interpretation of Al's proposal. I would like to
hear from him though.....

J. Bruce Fields

Feb 1, 2005, 6:42:05 PM
to Ram, Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, Jan 25, 2005 at 02:02:43PM -0800, Ram wrote:
> oops. I had the following in mind.
>
> mount <device1> /tmp/mnt1
> ** mount --make-shared /tmp/mnt1 **
> mkdir -p /tmp/mnt1/a/b
> mount --rbind /tmp/mnt1 /tmp/mnt2
> mount --make-slave /tmp/mnt2
>
> In this case it cannot be EINVAL, because /tmp/mnt1 and /tmp/mnt2 will
> both be part of a pnode and hence /tmp/mnt2 can be demoted to be a
> slave.
> >
> > > mount <device2> /tmp/mnt2/a
> > > rm -f /tmp/mnt2/a/*
> > >
> > > what happens when a mount is attempted on /tmp/mnt1/a/b?
> > > will that be reflected in /tmp/mnt2/a ?
> > >
> > > I believe the answer is 'no', because that part of the subtree
> > > in /tmp/mnt2 no more mirrors its parent subtree.

Shared subtrees aside, it is the nature of --bind mounts that they share
all the same dentries; so the "rm -f" above will immediately be
reflected in all copies (with or without subtree sharing) and no mounts
will be possible on the (now absent) path a/b.

I think the question you meant to ask was what would happen if you
mounted something on /tmp/mnt2/a/b (the slave copy) and then mounted
something else on /tmp/mnt1/a/b. In that case there are two places where
the propagated mount might go:
1. On top of the dentry a/b in /tmp/mnt2, underneath the
preexisting mount.
2. On top of the root dentry of the thing mounted in
/tmp/mnt2/a/b, thus covering the preexisting mount.

Wouldn't option 1 require changing the mnt_parent of the preexisting
mount on /tmp/mnt2/a/b? That seems like an odd thing to do, so I assume
option 2 is the only possible solution, but perhaps I'm missing
something.
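
The difference between the two options can be pictured with a toy Python model in which the mounts sharing one mountpoint in the slave are a list ordered bottom to top, and a lookup sees the topmost one. This is purely an illustration of visibility; the function names are invented, and nothing here reflects how mnt_parent would actually be updated.

```python
def visible(stack):
    """The mount a path lookup lands on: the topmost one, if any."""
    return stack[-1] if stack else None


def propagate_beneath(stack, new_mount):
    """Option 1: attach at the dentry itself, under what is already there."""
    return [new_mount] + stack


def propagate_on_top(stack, new_mount):
    """Option 2: attach on the existing mount's root, covering it."""
    return stack + [new_mount]


slave = ["local mount on /tmp/mnt2/a/b"]
opt1 = propagate_beneath(slave, "propagated mount")
opt2 = propagate_on_top(slave, "propagated mount")
```

Under option 1 the slave keeps seeing its own mount and the propagated one only shows up once the local mount is removed; under option 2 the propagated mount immediately hides the local one.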

--b.

J. Bruce Fields

Feb 1, 2005, 8:40:09 PM
to Ram, Mike Waychison, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, Feb 01, 2005 at 06:37:54PM -0500, J. Bruce Fields wrote:
> I think the question you meant to ask was what would happen if you
> mounted something on /tmp/mnt2/a/b (the slave copy) and then mounted
> something else on /tmp/mnt1/a/b. In that case there's two places where
> the propagated mount might go:
> 1. On top of the dentry a/b in /tmp/mnt2, underneath the
> preexisting mount.
> 2. On top of the root dentry of the thing mounted in
> /tmp/mnt2/a/b, thus covering the preexisting mount.
>
> Wouldn't option 1 require changing the mnt_parent of the preexisting
> mount on /tmp/mnt2/a/b? That seems like an odd thing to do, so I assume
> option 2 is the only possible solution, but perhaps I'm missing
> something.

Yes, I'm confused: --move, for example, changes the mnt_parent, and it's
only ever used under vfsmount_lock.

So #1, which adheres to the rule that all the clones are mounted at the
same dentry, probably makes more sense.

--b.

J. Bruce Fields

Feb 1, 2005, 9:12:17 PM
to Mike Waychison, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, Feb 01, 2005 at 04:15:36PM -0500, Mike Waychison wrote:
> No. I want to allow the mount. However, if there are several shared
> '/home' (through CLONE_NS or mount --bind), there remains the following
> two key problems:
>
> - How do you expire the mounts and umount them? (undefined with shared
>   subtrees thus far)
> - How do you handle the case where '/home/mikew' is automounted in all
>   instances of it, and then umounted in a single namespace. Walking back
>   into '/home/mikew' in that namespace will trigger the daemon to mount
>   again, but the filesystem is already mounted in it's namespace.
>
> I guess a solution to ponder is what if we included the following rule:
>
> "An attempt to umount a vfsmount X will induce the umounting of all
> vfsmounts in X's p-node as well as all vfsmounts/p-nodes 'owned' by said
> p-node."

From Viro's proposal:

> 5. umount
> umount everything that gets propagation from victim.

I think that agrees with your description.
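
For readers following along, here is a minimal sketch of that umount
rule, using the p-node rules (a) through (d) from the RFC. The PNode
class, receivers() and umount_victims() are hypothetical names made up
for illustration; the RFC itself says p-nodes are only a descriptive
device and almost certainly not a final data structure.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """Hypothetical model of a propagation node (not a kernel structure)."""
    members: set = field(default_factory=set)         # vfsmounts contained in this p-node
    owned_pnodes: list = field(default_factory=list)  # p-nodes owned by this p-node
    owned_mounts: set = field(default_factory=set)    # owned vfsmounts (in no p-node)


def receivers(pnode):
    """All vfsmounts that get propagation from a member of `pnode`."""
    out = set(pnode.members) | set(pnode.owned_mounts)  # rules (a) and (c)
    for q in pnode.owned_pnodes:                         # rule (b), closed
        out |= receivers(q)                              # under rule (d)
    return out  # recursion terminates: ownership is a forest (rule 6)


def umount_victims(victim, pnode_of):
    """Mounts removed when `victim` is unmounted under the quoted rule.

    `pnode_of` maps each vfsmount contained in a p-node to that p-node.
    """
    p = pnode_of.get(victim)
    if p is None:           # victim is contained in no p-node:
        return {victim}     # nothing gets propagation from it
    return receivers(p)


# Two shared copies of /home plus one owned (slave) copy.
p = PNode(members={"home@ns1", "home@ns2"}, owned_mounts={"home@ns3"})
pnode_of = {"home@ns1": p, "home@ns2": p}
print(sorted(umount_victims("home@ns1", pnode_of)))
print(sorted(umount_victims("home@ns3", pnode_of)))
```

Under this model, unmounting either shared copy takes all three mounts
with it, while unmounting the owned copy removes only that copy.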

What *should* be the behaviour when someone unmounts something that was
mounted by the automounter? That seems like a strange thing to do.

--b.

Ram

Feb 2, 2005, 3:03:44 PM
to J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Tue, 2005-02-01 at 15:21, J. Bruce Fields wrote:
> On Tue, Jan 25, 2005 at 01:07:12PM -0800, Ram wrote:
> > If there exists a private subtree in a larger shared subtree, what
> > happens when the larger shared subtree is rbound to some other place?
> > Is a new private subtree created in the new larger shared subtree? or
> > will that be pruned out in the new larger subtree?
>
> "mount --rbind" will always do at least all the mounts that it did
> before the introduction of shared subtrees--so certainly it will copy
> private subtrees along with shared ones. (Since subtrees are private by
> default, anything else would make --rbind do nothing by default.) My
> understanding of Viro's RFC is that the new subtree will have no
> connection with the preexisting private subtree (we want private
> subtrees to stay private), but that the new copy will end up with
> whatever propagation the target of the "mount --rbind" had. (So the
> addition of the copy of the private subtree to the target vfsmount will
> be replicated on any vfsmount that the target vfsmount propagates to,
> and those copies will propagate among themselves in the same way that
> the copies of the target vfsmount propagate to each other.)

OK, that makes sense. As you said, the private subtree gets copied to
the new location, but propagation won't be set up in either direction.
However, I have a rather unusual requirement which forces multiple
rbinds of a shared subtree within the same shared subtree.

I did the calculation and found that the tree simply explodes with
vfsstructs. If I mark a subtree within the larger shared tree as
private, then the number of vfsstructs grows linearly, O(n). However, if
there were a way of marking a subtree within the larger shared tree as
unclonable, then the increase in the number of vfsstructs would be
constant.

What I am essentially driving at is: can we add another feature which
allows me to mark a subtree as unclonable?


To see how the tree explodes, let me run you through an example:

(In case the tree pictures below gets garbled, it can also be seen at
http://www.sudhaa.com/~ram/readahead/sharedsubtree/subtree )

step 1:
let's say the root tree has just two directories with one vfsstruct.

        root
       /    \
     tmp    usr

All I want is for the entire root tree (but not anything under
/root/tmp) to be viewable under /root/tmp/m*

step 2:
mount --make-shared /root

mkdir -p /tmp/m1

mount --rbind /root /tmp/m1

the new tree now looks like this:

          root
         /    \
       tmp    usr
       /
      m1
     /  \
   tmp  usr
   /
  m1

it has two vfsstructs

step3:
mkdir -p /tmp/m2
mount --rbind /root /tmp/m2

the new tree now looks like this:

root
/ \
tmp usr
/ \
m1 m2
/ \ / \
tmp usr tmp usr
/ \ /
m1 m2 m1
/ \ / \
tmp usr tmp usr
/ / \
m1 m1 m2
/ \
tmp usr
/ \
m1 m2

it has 6 vfsstructs

step 4:
mkdir -p /tmp/m3
mount --rbind /root /tmp/m3

I won't draw the tree, but it will have 24 vfsstructs.


at step i the number of vfsstructs is V[i] = i*V[i-1], i.e. V[i] = i!
starting from V[1] = 1, which grows even faster than an exponential
function.
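
As a quick check of the numbers in the steps above, the recurrence can
be evaluated directly. This is only a sketch of the arithmetic, assuming
V[1] = 1 for the single vfsstruct of step 1; the function name
vfsmount_counts is made up for illustration and nothing here touches
real mounts.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[1] = 1 and V[i] = i * V[i-1]."""
    if steps < 1:
        return []
    counts = [1]                       # step 1: one vfsstruct for the root tree
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])  # each further rbind multiplies the count by i
    return counts


print(vfsmount_counts(6))  # [1, 2, 6, 24, 120, 720]
```

The first four values match the 1, 2, 6 and 24 vfsstructs counted in
steps 1 through 4, and the sequence is the factorial i!.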

This is an issue in general if somebody does a --rbind of a shared tree
within the same shared tree multiple times.

However this issue can be alleviated if we mark the subtree as private.
In the above example, if I mark the tree under /root/tmp as private the
number of vfsstructs will reduce drastically to O(n).

But if there is a way of marking a subtree unclonable, this entire issue
can be resolved.

RP

Ram

Feb 2, 2005, 3:49:12 PM
to Mike Waychison, J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Wed, 2005-02-02 at 11:45, Mike Waychison wrote:
> At this step, you probably shouldn't be using --rbind, but --bind
> instead to only bind a copy of the root vfsmount, so it now looks like:
>
> >             root
> >            /    \
> >          tmp    usr
> >         /   \
> >       m1     m2
> >      /  \   /  \
> >    tmp usr tmp usr
> >   /  \     /  \
> >  m1  m2   m1  m2

Well, I thought about this. Bruce Fields even suggested it in a private
thread. But this solution can be racy: you may have to do multiple
binds, one for each of the vfsstructs that reside in the subtree under /
(but not under /root/tmp), and doing that atomically without racing
with other simultaneous mounts would be tricky.

RP

Mike Waychison

Feb 2, 2005, 4:19:23 PM
to Ram, J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At this step, you probably shouldn't be using --rbind, but --bind
instead to only bind a copy of the root vfsmount, so it now looks like:

>             root
>            /    \
>          tmp    usr
>         /   \
>       m1     m2
>      /  \   /  \
>    tmp usr tmp usr
>   /  \     /  \
>  m1  m2   m1  m2

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCAS3ndQs4kOxk3/MRAm/qAJ0awCE49/g+HhMdX0MBZnFLSp2IjACgj5EQ
El+YLq25hQeDAt9Y92nqoAU=
=so+d
-----END PGP SIGNATURE-----

Mike Waychison

Feb 2, 2005, 4:25:46 PM
to Ram, J. Bruce Fields, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org

Well, fwiw, I have the same kind of race in autofsng. I counter it by
building up the vfsmount tree elsewhere and mount --move'ing it.

Unfortunately, the RFC states that moving a shared vfsmount is
prohibited (for which the reasoning slips my mind).


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

J. Bruce Fields

Feb 2, 2005, 4:34:26 PM
to Mike Waychison, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Wed, Feb 02, 2005 at 04:08:32PM -0500, Mike Waychison wrote:
> Well, fwiw, I have the same kind of race in autofsng. I counter it by
> building up the vfsmount tree elsewhere and mount --move'ing it.
>
> Unfortunately, the RFC states that moving a shared vfsmount is
> prohibited (for which the reasoning slips my mind).

See http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110594248826226&w=2

As I understand it, the problem isn't sharing of the vfsmount being
moved, but sharing of the vfsmount on which that vfsmount is
mounted.

--b.

Mike Waychison

Feb 2, 2005, 4:41:07 PM
to J. Bruce Fields, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

J. Bruce Fields wrote:
> On Wed, Feb 02, 2005 at 04:08:32PM -0500, Mike Waychison wrote:
>
>>Well, fwiw, I have the same kind of race in autofsng. I counter it by
>>building up the vfsmount tree elsewhere and mount --move'ing it.
>>
>>Unfortunately, the RFC states that moving a shared vfsmount is
>>prohibited (for which the reasoning slips my mind).
>
>
> See http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110594248826226&w=2
>
> As I understand it, the problem isn't sharing of the vfsmount being
> moved, but sharing of the vfsmount on which that vfsmount is
> mounted.
>
> --b.

Okay, thanks for the refresher.

That still keeps you from using the 'build tree elsewhere' and 'mount
- --move' approach though, as the parent mountpoint would likely be shared.

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCAUcUdQs4kOxk3/MRAubGAJ0fUrpVS9U5oQof5jv4JieVOo6JjwCgjHXa
oHcjXLEV5zj4OrB+TEipQdY=
=3hhk
-----END PGP SIGNATURE-----

J. Bruce Fields

Feb 2, 2005, 4:56:18 PM
to Mike Waychison, Ram, Al Viro, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Wed, Feb 02, 2005 at 04:33:08PM -0500, Mike Waychison wrote:
> That still keeps you from using the 'build tree elsewhere' and 'mount
> --move' approach though, as the parent mountpoint would likely be shared.

I believe it's also just the source mountpoint that's the problem, not
the destination; does that help?

--Bruce Fields

Ram

unread,
Apr 5, 2005, 5:54:08 AM4/5/05
to Al Viro, J. Bruce Fields, linux-...@vger.kernel.org, linux-...@vger.kernel.org
On Sun, 2005-01-16 at 22:11, Al Viro wrote:
> On Sun, Jan 16, 2005 at 01:42:09PM -0500, J. Bruce Fields wrote:
> > On Sun, Jan 16, 2005 at 06:06:56PM +0000, Al Viro wrote:

> > > On Sun, Jan 16, 2005 at 11:02:13AM -0500, J. Bruce Fields wrote:
> > > > On Thu, Jan 13, 2005 at 10:18:51PM +0000, Al Viro wrote:
> > > > > 6. mount --move
> > > > > prohibited if what we are moving is in some p-node, otherwise we move
> > > > > as usual to intended mountpoint and create copies for everything that
> > > > > gets propagation from there (as we would do for rbind).
> > > >
> > > > Why this prohibition?
> > >
> > > How do you propagate that? We can weaken that to "in a p-node that
> > > owns something or contains more than one vfsmount", but it's not
> > > worth the trouble, AFAICS.
> >
> > I guess I'm not seeing what there is to propagate. If the vfsmount we
> > are moving is mounted under a vfsmount that's in a p-node, then there'd
> > be something to propagate, but since the --move doesn't change the
> > structure of mounts underneath the moved mountpoint, I wouldn't expect
> > any changes to be propagated from it to other mountpoints.
> >
> > I must be missing something fundamental....

>
> No - I have been missing a typo. Make that "if mountpoint of what we
> are moving...".

OK. I have been spending time lately on implementing this RFC, so it's
time for some questions.

If the vfsmount that is being moved is mounted within a shared vfsmount
(i.e. one that is in a p-node), why should the move operation be
prohibited?

The way I look at it is: umount the vfsmount, propagate the unmount
event to all corresponding vfsmounts, then mount the vfsmount at its
destination and, if applicable, propagate the mount event.

An example:

If A is a vfsmount contained in p-node p and B is a vfsmount
mounted on A, and B is moved to a mountpoint on vfsmount C, the
operations involved are:

1. umount B from A and propagate the unmount to all vfsmounts contained
   in p, as well as recursively to all slave p-nodes and slave
   vfsstructs.
2. mount B on the mountpoint in C and, if C is in some p-node,
   propagate the mount to all vfsmounts in that p-node, as well as
   recursively to its slave p-nodes and slave vfsstructs.
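
The two steps above can be sketched as a toy model. Everything here is
an assumption made for illustration: the mount table is a plain dict
keyed by (parent vfsmount, mountpoint), the `receivers` map is assumed
to already hold the transitive set of vfsmounts that get propagation
from a given parent, and move_mount() is a made-up name, not a kernel
function.

```python
def move_mount(table, receivers, child, src, dst):
    """Model 'mount --move' as a propagated umount plus a propagated mount.

    table:     dict mapping (parent_vfsmount, mountpoint) -> mounted vfsmount
    receivers: dict mapping a vfsmount to the vfsmounts that get propagation
               from it (itself excluded); a missing key means "private"
    src, dst:  (parent_vfsmount, mountpoint) pairs
    """
    if table.get(src) != child:
        raise ValueError("child is not mounted at src")
    src_parent, src_point = src
    dst_parent, dst_point = dst

    # Step 1: umount from the source parent and from everything it propagates to.
    del table[src]
    for other in receivers.get(src_parent, ()):
        table.pop((other, src_point), None)

    # Step 2: mount at the destination; every receiver gets its own copy.
    table[dst] = child
    for n, other in enumerate(receivers.get(dst_parent, ()), start=1):
        table[(other, dst_point)] = f"{child}-copy{n}"
    return table


# B sits on A (which propagates to A2) and is moved onto C (which propagates to C2).
table = {("A", "/mnt"): "B", ("A2", "/mnt"): "B-copy"}
receivers = {"A": ["A2"], "C": ["C2"]}
move_mount(table, receivers, "B", ("A", "/mnt"), ("C", "/data"))
print(table)
```

After the move, the model holds B on C and one copy of B on C2, with
nothing left on A or A2. It deliberately ignores what happens to mounts
stacked on top of the propagated copies, which is one of the open
questions in this thread.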

RP

