FUSE merging?

Miklos Szeredi

Jun 30, 2005, 5:26:17 AM
to ak...@osdl.org, linux-...@vger.kernel.org
Hi Andrew!

What's up with FUSE merging? Is there anything pending that I should
do?

Ted Ts'o's ideas about selective access to mountpoints are
interesting, but I wouldn't consider them merge-critical, as they
solve a problem that hasn't yet come up in real life.

Thanks,
Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andrew Morton

Jun 30, 2005, 5:30:06 AM
to Miklos Szeredi, linux-...@vger.kernel.org
Miklos Szeredi <mik...@szeredi.hu> wrote:
>
> What's up with FUSE merging? Is there anything pending that I should
> do?

Where are we up to with the fuse_allow_task() bunfight?

Miklos Szeredi

Jun 30, 2005, 5:53:25 AM
to ak...@osdl.org, linux-...@vger.kernel.org
> > What's up with FUSE merging? Is there anything pending that I should
> > do?
>
> Where are we up to with the fuse_allow_task() bunfight?

I think we agreed that there seem to be no alternatives.

Tytso said that the fuse_allow_task() thing is basically OK, but that
there should be some method to make certain tasks exempt from this
limitation. I agree with this, but I think there should be at least
one (preferably more) user who actually needs this before I start
thinking about implementing it.

Making a mount exempt is already possible with the 'allow_other'
(privileged by default) mount option.

Miklos
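
For readers without the patch at hand, the policy can be sketched in a few lines of C. This is a minimal userspace sketch, not the kernel's fuse_allow_task(): the struct and function names are hypothetical, and it assumes the rule mirrors the ptrace-style id check referred to later in the thread (every real, effective and saved id of the task must equal the mount owner's), with the 'allow_other' mount option switching the check off.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for a task's credentials. */
struct task_ids {
    uid_t uid, euid, suid;
    gid_t gid, egid, sgid;
};

/* Hypothetical stand-in for per-mount connection state. */
struct fuse_conn_sketch {
    uid_t user_id;      /* uid of the user who mounted the filesystem */
    gid_t group_id;     /* gid of the user who mounted the filesystem */
    bool allow_other;   /* set by the 'allow_other' mount option */
};

/*
 * Sketch of the access policy: allow everything when allow_other is set;
 * otherwise allow only tasks whose ids all match the mount owner's, i.e.
 * tasks the owner could already ptrace, so the filesystem gains no new
 * power over them.
 */
static bool fuse_allow_task_sketch(const struct fuse_conn_sketch *fc,
                                   const struct task_ids *t)
{
    if (fc->allow_other)
        return true;
    return t->uid == fc->user_id && t->euid == fc->user_id &&
           t->suid == fc->user_id && t->gid == fc->group_id &&
           t->egid == fc->group_id && t->sgid == fc->group_id;
}
```

A setuid-root program started by the mount owner fails this check (its effective and saved uids are 0), which is the case the thread is worried about.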

Arjan van de Ven

Jun 30, 2005, 6:03:29 AM
to Miklos Szeredi, ak...@osdl.org, linux-...@vger.kernel.org
On Thu, 2005-06-30 at 11:51 +0200, Miklos Szeredi wrote:
> > > What's up with FUSE merging? Is there anything pending that I should
> > > do?
> >
> > Where are we up to with the fuse_allow_task() bunfight?
>
> I think we agreed that there seem to be no alternatives.
>
> Tytso said that the fuse_allow_task() thing is basically OK, but that
> there should be some method to make certain tasks exempt from this
> limitation. I agree with this, but I think there should be at least
> one (preferably more) user who actually needs this before I start
> thinking about implementing it.
>
> Making a mount exempt is already possible with the 'allow_other'
> (privileged by default) mount option.

if you are so interested in getting fuse merged... why not merge it
first with the security stuff removed entirely, and then start
discussing putting the security stuff back in?

Miklos Szeredi

Jun 30, 2005, 6:16:45 AM
to ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
> if you are so interested in getting fuse merged... why not merge it
> first with the security stuff removed entirely, and then start
> discussing putting the security stuff back in?

a) it's already been discussed to death (just search for 'fuse' on
lkml and fsdevel)

b) I don't consider it a good idea to ship a defunct version of it in
the mainline

Can you please accept my wish to have FUSE merged _with_ the
unprivileged mounts thing.

If anybody has anything to add to the discussion, please do it now,
and not later. Delaying this further won't get us any bonus IMO.

Miklos

Miklos Szeredi

Jun 30, 2005, 6:24:03 AM
to ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
> if you are so interested in getting fuse merged... why not merge it
> first with the security stuff removed entirely, and then start
> discussing putting the security stuff back in?

BTW, I can split out the security stuff into a separate patch from the
rest, if people feel more comfortable discussing a concrete patch
instead of a range of lines (actually a 15-line function) within the
whole.

Miklos

Arjan van de Ven

Jun 30, 2005, 6:25:57 AM
to Miklos Szeredi, ak...@osdl.org, linux-...@vger.kernel.org
On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > if you are so interested in getting fuse merged... why not merge it
> > first with the security stuff removed entirely, and then start
> > discussing putting the security stuff back in?
>
> a) it's already been discussed to death (just search for 'fuse' on
> lkml and fsdevel)
>
> b) I don't consider it a good idea to ship a defunct version of it in
> the mainline
>
> Can you please accept my wish to have FUSE merged _with_ the
> unprivileged mounts thing.

By the same argument:
Then can you please accept that FUSE will not get merged right now.

Miklos Szeredi

Jun 30, 2005, 6:32:34 AM
to ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
> By the same argument:
> Then can you please accept that FUSE will not get merged right now.

Yes.

My argument is: IF it's not going to get merged now, can we please
continue the discussion about why it's unacceptable, and what are the
alternatives.

Is that fair?

Miklos

Anton Altaparmakov

Jun 30, 2005, 7:16:08 AM
to Arjan van de Ven, Miklos Szeredi, ak...@osdl.org, linux-...@vger.kernel.org
On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote:
> On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > > if you are so interested in getting fuse merged... why not merge it
> > > first with the security stuff removed entirely, and then start
> > > discussing putting the security stuff back in?
> >
> > a) it's already been discussed to death (just search for 'fuse' on
> > lkml and fsdevel)
> >
> > b) I don't consider it a good idea to ship a defunct version of it in
> > the mainline
> >
> > Can you please accept my wish to have FUSE merged _with_ the
> > unprivileged mounts thing.
>
> By the same argument:
> Then can you please accept that FUSE will not get merged right now.

Why should he? IMNSHO it should be merged right now with the security
stuff. FUSE works as is. Without the security stuff FUSE is useless.

I have yet to read even a single constructive argument why it should not
be merged as is.

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Avuton Olrich

Jun 30, 2005, 3:45:16 PM
to Miklos Szeredi, ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
On 6/30/05, Miklos Szeredi <mik...@szeredi.hu> wrote:
> > Then can you please accept that FUSE will not get merged right now.
> My argument is: IF it's not going to get merged now, can we please
> continue the discussion about why it's unacceptable, and what are the
> alternatives.

Why has there not been more discussion about just making an option for
those 15 lines, just for merging's sake? Hopefully, after more
discussion, the option would go away one way or another. On the other
hand, everyone says security, security, security, and I don't remember
one person actually saying something negative about what it does to
security.

avuton

--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Andrew Morton

Jun 30, 2005, 5:36:24 PM
to Anton Altaparmakov, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, Frank van Maarseveen
Anton Altaparmakov <ai...@cam.ac.uk> wrote:
>
> On Thu, 2005-06-30 at 12:20 +0200, Arjan van de Ven wrote:
> > On Thu, 2005-06-30 at 12:12 +0200, Miklos Szeredi wrote:
> > > > if you are so interested in getting fuse merged... why not merge it
> > > > first with the security stuff removed entirely, and then start
> > > > discussing putting the security stuff back in?
> > >
> > > a) it's already been discussed to death (just search for 'fuse' on
> > > lkml and fsdevel)
> > >
> > > b) I don't consider it a good idea to ship a defunct version of it in
> > > the mainline
> > >
> > > Can you please accept my wish to have FUSE merged _with_ the
> > > unprivileged mounts thing.
> >
> > By the same argument:
> > Then can you please accept that FUSE will not get merged right now.
>
> Why should he? IMNSHO it should be merged right now with the security
> stuff. FUSE works as is. Without the security stuff FUSE is useless.
>
> I have yet to read even a single constructive argument why it should not
> be merged as is.

I believe that the requirement which fuse_allow_task() attempts to satisfy
is legitimate and is useful to FUSE users.

The fact that, AFAIK, nobody has found a way to implement it more nicely is
a Linux problem, not a FUSE problem.

Given that the actual amount of code involved is small, centralised and
well known about, we can easily fix it up later if/when new infrastructure
or new ideas become available.

So unless someone is able to come up with a better approach in the next few
days I'm inclined to say "we suck" and merge the thing as-is.

However, a few things:

- is there anything in the current implementation of the permission stuff
which might tie our hands if it is later reimplemented? IOW: does the
current FUSE user interface in any way lock us into the current FUSE
implementation (fuse_allow_task())?

- the fuse mount options don't seem to be documented

- aren't we going to remove the nfs semi-server feature?

- Frank points out that a user can send a sigstop to his own setuid(0)
task and he intimates that this could cause DoS problems with FUSE. More
details needed please?

- I don't recall seeing an exhaustive investigation of how an
unprivileged user could use a FUSE mount to implement DoS attacks against
other users or against root.

Andrew Morton

Jun 30, 2005, 5:40:48 PM
to ai...@cam.ac.uk, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, fra...@frankvm.com
Andrew Morton <ak...@osdl.org> wrote:
>
> However, a few things:
>
> - is there anything in the current implementation of the permission stuff
> which might tie our hands if it is later reimplemented? IOW: does the
> current FUSE user interface in any way lock us into the current FUSE
> implementation (fuse_allow_task())?
>
> - the fuse mount options don't seem to be documented
>
> - aren't we going to remove the nfs semi-server feature?
>
> - Frank points out that a user can send a sigstop to his own setuid(0)
> task and he intimates that this could cause DoS problems with FUSE. More
> details needed please?
>
> - I don't recall seeing an exhaustive investigation of how an
> unprivileged user could use a FUSE mount to implement DoS attacks against
> other users or against root.

You say

"If a sysadmin trusts the users enough, or can ensure through other
measures, that system processes will never enter non-privileged mounts,
it can relax the last limitation with a "user_allow_other" config
option. If this config option is set, the mounting user can add the
"allow_other" mount option which disables the check for other users'
processes."

What config option, where?

Frank van Maarseveen

Jun 30, 2005, 6:31:13 PM
to Andrew Morton, Anton Altaparmakov, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, Frank van Maarseveen
On Thu, Jun 30, 2005 at 12:46:22PM -0700, Andrew Morton wrote:
>
> - Frank points out that a user can send a sigstop to his own setuid(0)
> task and he intimates that this could cause DoS problems with FUSE. More
> details needed please?

It's the other way around:
Apparently it is not a security problem to SIGSTOP or even SIGKILL a
setuid program. So why is it a security problem when such a program is
delayed by a supposedly maliciously behaving FUSE mount?

I think that setuid programs take too many things for granted, especially
"time". I also think the ptrace equivalence principle (item C2 in the
FUSE doc) is too harsh for FUSE.

Suppose the process changes id to full root and we can no longer send
signals to it. Are there any other ways we could affect its scheduling
without FUSE? I think "yes": clearly not as easily as when it accesses a
FUSE mount, but "yes". Think about typing ^S (XOFF), letting it read
from a pipe or from a file on a very, very slow device, or renicing
the parent in advance. Regarding the pipe: yes, the setuid program could
check that with fstat(), but is such a check fundamentally the right
approach? I have my doubts, because unified I/O is a good thing and there is
no guarantee whatsoever that any FS operation completes within a
certain amount of time. Suppose another malicious process does a lookup
in a huge directory without hashed names? What about a process consuming
lots of memory, pushing everything else into swap? What about deleting
a _huge_ file or doing other things which might(?) take a considerable
amount of kernel time? [id]notify might even help in using this to delay
a root process at a crucial point to exploit a race. So, I think there
are many ways to affect the execution speed of [setuid] programs. I
have never heard of a setuid root program which renices itself so
that it successfully avoids a race or DoS exploit.

And then the DoS thing using simulated endless files within FUSE. It is
already possible to create terabyte-sized [sparse] files. Can the fstat()
size/blocks info be trusted from FUSE? No more than fstat() outside FUSE,
because the file may still be growing!
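
The sparse-file point is easy to reproduce without FUSE. The sketch below (POSIX C; the helper name make_sparse() and the /tmp path are illustrative assumptions) extends an empty temporary file with ftruncate() and compares the size fstat() reports with the space actually allocated. On Linux st_blocks counts 512-byte units; 1 GiB is used instead of a terabyte to stay within common filesystem limits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a sparse temporary file of `size` bytes and report its apparent
 * size and the bytes actually allocated for it. Returns 0 on success.
 */
static int make_sparse(off_t size, off_t *apparent, off_t *allocated)
{
    char path[] = "/tmp/sparse-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* storage is released when fd is closed */

    struct stat st;
    int ret = -1;
    if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0) {
        *apparent = st.st_size;
        *allocated = (off_t)st.st_blocks * 512;  /* 512-byte units on Linux */
        ret = 0;
    }
    close(fd);
    return ret;
}
```

On common Linux filesystems the allocated figure stays far below the apparent size, so a program that reads such a file to the end can be kept busy for a long time with no FUSE mount involved.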

> - I don't recall seeing an exhaustive investigation of how an
> unprivileged user could use a FUSE mount to implement DoS attacks against
> other users or against root.

In general I think it is _hard_ to protect against a local DoS for many
reasons and I don't see any new fundamental problem here with FUSE:
it is just making it more obvious that it's hard to write secure setuid
programs. Those programs should _know_ that input data and anything else
from the user is "tainted" and that they must be _very_ careful with it,
in every detail.

--
Frank

Miklos Szeredi

Jul 1, 2005, 2:25:03 AM
to avu...@gmail.com, ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
> > > Then can you please accept that FUSE will not get merged right now.
> > My argument is: IF it's not going to get merged now, can we please
> > continue the discussion about why it's unacceptable, and what are the
> > alternatives.
>
> Why has there not been more discussion about just making an option for
> those 15 lines, just for merging's sake? Hopefully, after more
> discussion, the option would go away one way or another. On the other
> hand, everyone says security, security, security, and I don't remember
> one person actually saying something negative about what it does to
> security.

There is a mount option: 'allow_other' which does just this. Or did
you mean a config option?

Thanks,
Miklos

Miklos Szeredi

Jul 1, 2005, 2:38:08 AM
to ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, fra...@frankvm.com
> However, a few things:
>
> - is there anything in the current implementation of the permission stuff
> which might tie our hands if it is later reimplemented? IOW: does the
> current FUSE user interface in any way lock us into the current FUSE
> implementation (fuse_allow_task())?

No. This thing is above the userspace interface and completely
independent of it. Either a task is allowed, and then the request goes
through to the interface, or it's not, and the request is stopped
right there and never reaches the userspace interface.
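
The ordering described here, permission check first and protocol second, can be sketched as follows. The queue type and send_request_sketch() are hypothetical stand-ins for FUSE's request path, and the -EACCES error code is an assumption; the point is only that a denied request returns before anything is queued, so what the daemon sees on the wire does not depend on how the check is implemented.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the channel that carries requests to the daemon. */
struct request_queue_sketch {
    int queued;   /* number of requests handed to the userspace filesystem */
};

/*
 * Permission check first, protocol second: a denied request returns an
 * error here and is never queued, so the daemon never sees it.
 */
static int send_request_sketch(struct request_queue_sketch *q, bool task_allowed)
{
    if (!task_allowed)
        return -EACCES;   /* assumed error code for a denied task */
    q->queued++;          /* only allowed requests reach the interface */
    return 0;
}
```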

> - the fuse mount options don't seem to be documented

True. I'll send a patch (they are documented in the README of the
fuse distribution).

> - aren't we going to remove the nfs semi-server feature?

I leave the decision to you ;) It's a separate independent patch
already (fuse-nfs-export.patch).

> - Frank points out that a user can send a sigstop to his own setuid(0)
> task and he intimates that this could cause DoS problems with FUSE. More
> details needed please?

Will follow up on Frank's answer.

> - I don't recall seeing an exhaustive investigation of how an
> unprivileged user could use a FUSE mount to implement DoS attacks against
> other users or against root.

Here's a description of a theoretical DoS scenario:

http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2

Miklos

Miklos Szeredi

Jul 1, 2005, 2:41:58 AM
to ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, fra...@frankvm.com
> > - I don't recall seeing an exhaustive investigation of how an
> > unprivileged user could use a FUSE mount to implement DoS attacks against
> > other users or against root.
>
> You say
>
> "If a sysadmin trusts the users enough, or can ensure through other
> measures, that system processes will never enter non-privileged mounts,
> it can relax the last limitation with a "user_allow_other" config
> option. If this config option is set, the mounting user can add the
> "allow_other" mount option which disables the check for other users'
> processes."
>
> What config option, where?

Currently that's a userspace issue. There's an /etc/fuse.conf file
with two options:

max_mounts=X
user_allow_other

The fusermount helper reads this file, and decides if passing the
'allow_other' mount option to the kernel is OK or not.

If we want unprivileged sys_mount(), these will have to be checked in
the kernel (set via sysfs, etc.).

Miklos
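
A minimal sketch of the fusermount-side decision, assuming fuse.conf holds one setting per line in exactly the two forms shown above. The parser, struct and function names are hypothetical rather than fusermount's actual code, and the sketch assumes root may always pass allow_other (the option is described earlier in the thread as privileged by default).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* The two settings /etc/fuse.conf is said to support. */
struct fuse_conf_sketch {
    int max_mounts;          /* -1 if not set */
    bool user_allow_other;
};

/* Parse one line: "max_mounts=N" or "user_allow_other"; others are ignored. */
static void parse_line(struct fuse_conf_sketch *c, const char *line, size_t len)
{
    static const char mm[] = "max_mounts=";
    if (len == strlen("user_allow_other") &&
        strncmp(line, "user_allow_other", len) == 0)
        c->user_allow_other = true;
    else if (len > sizeof(mm) - 1 && strncmp(line, mm, sizeof(mm) - 1) == 0)
        c->max_mounts = atoi(line + sizeof(mm) - 1);  /* stops at '\n' */
}

/* Parse the whole file contents, one setting per line. */
static struct fuse_conf_sketch parse_fuse_conf(const char *text)
{
    struct fuse_conf_sketch c = { -1, false };
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        parse_line(&c, text, len);
        text += len + (nl ? 1 : 0);
    }
    return c;
}

/* May the mounting user pass 'allow_other' through to the kernel? */
static bool may_pass_allow_other(const struct fuse_conf_sketch *c, bool is_root)
{
    return is_root || c->user_allow_other;
}
```

Moving this decision into the kernel, as suggested for unprivileged sys_mount(), would mean the same two values live in kernel state instead of a file read by a setuid helper.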

Andrew Morton

Jul 1, 2005, 2:53:18 AM
to Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, fra...@frankvm.com
Miklos Szeredi <mik...@szeredi.hu> wrote:
>
> > - aren't we going to remove the nfs semi-server feature?
>
> I leave the decision to you ;) It's a separate independent patch
> already (fuse-nfs-export.patch).

Let's leave it out - that'll stimulate some activity in the
userspace-nfs-server-for-FUSE area.

Speaking of which, dumb question: what does FUSE offer over simply using
NFS protocol to talk to the userspace filesystem driver?

Miklos Szeredi

Jul 1, 2005, 3:03:41 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, mik...@szeredi.hu, linux-...@vger.kernel.org, fra...@frankvm.com
> >
> > - Frank points out that a user can send a sigstop to his own setuid(0)
> > task and he intimates that this could cause DoS problems with FUSE. More
> > details needed please?
>
> It's the other way around:
> Apparently it is not a security problem to SIGSTOP or even SIGKILL a
> setuid program. So why is it a security problem when such a program is
> delayed by a supposedly maliciously behaving FUSE mount?

Perfectly valid argument. My question: is it not a security problem
to allow signals to reach a suid program?

> I think that setuid programs take too many things for granted, especially
> "time". I also think the ptrace equivalence principle (item C2 in the
> FUSE doc) is too harsh for FUSE.

It's obviously not equivalence. A FUSE filesystem gets a subset of
ptrace's capabilities (and rather a small one).

> Suppose the process changes id to full root and we can no longer send
> signals to it. Are there any other ways we could affect its scheduling
> without FUSE? I think "yes": clearly not as easily as when it accesses a
> FUSE mount, but "yes". Think about typing ^S (XOFF), letting it read
> from a pipe or from a file on a very, very slow device, or renicing
> the parent in advance. Regarding the pipe: yes, the setuid program could
> check that with fstat(), but is such a check fundamentally the right
> approach? I have my doubts, because unified I/O is a good thing and there is
> no guarantee whatsoever that any FS operation completes within a
> certain amount of time. Suppose another malicious process does a lookup
> in a huge directory without hashed names? What about a process consuming
> lots of memory, pushing everything else into swap? What about deleting
> a _huge_ file or doing other things which might(?) take a considerable
> amount of kernel time? [id]notify might even help in using this to delay
> a root process at a crucial point to exploit a race. So, I think there
> are many ways to affect the execution speed of [setuid] programs. I
> have never heard of a setuid root program which renices itself so
> that it successfully avoids a race or DoS exploit.

There's a huge difference between slowing down and stopping a
process. I wouldn't consider the first a true DoS.

> And then the DoS thing using simulated endless files within FUSE. It is
> already possible to create terabyte-sized [sparse] files. Can the fstat()
> size/blocks info be trusted from FUSE? No more than fstat() outside FUSE,
> because the file may still be growing!
>
> > - I don't recall seeing an exhaustive investigation of how an
> > unprivileged user could use a FUSE mount to implement DoS attacks against
> > other users or against root.
>
> In general I think it is _hard_ to protect against a local DoS for many
> reasons and I don't see any new fundamental problem here with FUSE:
> it is just making it more obvious that it's hard to write secure setuid
> programs. Those programs should _know_ that input data and anything else
> from the user is "tainted" and that they must be _very_ careful with it,
> in every detail.

Yes. The extra problem with FUSE is that they are not _able_ to be
careful. They can't even check if a file is in fact on a FUSE mount
or not without the FUSE daemon's intervention (a lookup on the file will
be passed to userspace).

Thanks,
Miklos

Miklos Szeredi

Jul 1, 2005, 3:10:08 AM
to ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> >
> > > - aren't we going to remove the nfs semi-server feature?
> >
> > I leave the decision to you ;) It's a separate independent patch
> > already (fuse-nfs-export.patch).
>
> Let's leave it out - that'll stimulate some activity in the
> userspace-nfs-server-for-FUSE area.
>
> Speaking of which, dumb question: what does FUSE offer over simply using
> NFS protocol to talk to the userspace filesystem driver?

Oh lots:

- no deadlocks (NFS mounted from localhost is riddled with them)

- efficient protocol, optimized for fewer context switches

- dcache invalidation policy

- probably more, but I can't remember

Miklos

Andrew Morton

Jul 1, 2005, 3:17:51 AM
to Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
Miklos Szeredi <mik...@szeredi.hu> wrote:
>
> > >
> > > > - aren't we going to remove the nfs semi-server feature?
> > >
> > > I leave the decision to you ;) It's a separate independent patch
> > > already (fuse-nfs-export.patch).
> >
> > Let's leave it out - that'll stimulate some activity in the
> > userspace-nfs-server-for-FUSE area.
> >
> > Speaking of which, dumb question: what does FUSE offer over simply using
> > NFS protocol to talk to the userspace filesystem driver?
>
> Oh lots:
>
> - no deadlocks (NFS mounted from localhost is riddled with them)

It is? We had some low-memory problems a while back, but they got fixed.
During that work I did some nfs-to-localhost testing and things seemed OK.

> - efficient protocol, optimized for fewer context switches

One wouldn't really expect a userspace filesystem to be particularly fast,
and the performance will be dominated by memory copies and IO wait anyway.

> - dcache invalidation policy

What's that?

> - probably more, but I can't remember

Please do..

Miles Bader

Jul 1, 2005, 3:31:23 AM
to Andrew Morton, Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
Andrew Morton <ak...@osdl.org> writes:
>> - efficient protocol, optimized for fewer context switches
>
> One wouldn't really expect a userspace filesystem to be particularly fast,
> and the performance will be dominated by memory copies and IO wait anyway.

Well, there's slow and then there's slow... Numbers are always nice, though.

-miles
--
[|nurgle|] ddt- demonic? so quake will have an evil kinda setting? one that
will make every christian in the world foamm at the mouth?
[iddt] nurg, that's the goal

Miklos Szeredi

Jul 1, 2005, 3:40:48 AM
to ak...@osdl.org, mik...@szeredi.hu, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> >
> > > >
> > > > > - aren't we going to remove the nfs semi-server feature?
> > > >
> > > > I leave the decision to you ;) It's a separate independent patch
> > > > already (fuse-nfs-export.patch).
> > >
> > > Let's leave it out - that'll stimulate some activity in the
> > > userspace-nfs-server-for-FUSE area.
> > >
> > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > NFS protocol to talk to the userspace filesystem driver?
> >
> > Oh lots:
> >
> > - no deadlocks (NFS mounted from localhost is riddled with them)
>
> It is? We had some low-memory problems a while back, but they got fixed.
> During that work I did some nfs-to-localhost testing and things seemed OK.

Well, there's the "unsolvable" writeback deadlock problem, which FUSE
works around by not buffering dirty pages (and not allowing writable
mmap). Does NFS solve that? I'm interested :)

Then there's the usual "filesystem recursing into itself" deadlock.
Mounting with 'intr' probably solves this for NFS, but that has
unwanted side effects. FUSE only allows KILL to interrupt a request.

> > - efficient protocol, optimized for fewer context switches
>
> One wouldn't really expect a userspace filesystem to be particularly fast,

FUSE is pretty fast. >100Mbytes/s transfer speeds on moderate
hardware are not unusual.

> and the performance will be dominated by memory copies and IO wait anyway.

Memory copies don't seem to be an issue (and FUSE does very little
copying). Performance is mostly dominated by context switch times (if the
underlying filesystem can keep up). Unfortunately unbuffered writes
mean a separate request for each written page, and thus a context
switch (on UP at least). This has a marked effect on write
performance.
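
The cost of unbuffered writes is simple arithmetic. Assuming 4 KiB pages and one request per touched page, as described above, a 1 MiB write needs 256 requests, each at least one round trip to the daemon; if a request could carry, say, 128 KiB (a hypothetical limit chosen only for illustration), the same write would need 8. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Requests for a write of `len` bytes at `offset` when each touched page
 * becomes its own request. */
static size_t requests_per_page(size_t offset, size_t len, size_t page_size)
{
    assert(page_size > 0);
    if (len == 0)
        return 0;
    size_t first = offset / page_size;
    size_t last = (offset + len - 1) / page_size;
    return last - first + 1;
}

/* Requests for the same write if each request could carry up to `max_req`
 * bytes (alignment ignored). */
static size_t requests_batched(size_t len, size_t max_req)
{
    assert(max_req > 0);
    return (len + max_req - 1) / max_req;
}
```

An unaligned write pays for every page it touches: 4096 bytes starting at offset 100 spans two pages and therefore two requests.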

> > - dcache invalidation policy
>
> What's that?

Userspace can tell the kernel how long a dentry should be valid. I
don't think the NFS protocol provides this. Same holds for the inode
attributes.
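
A minimal sketch of the validity-timeout idea, assuming the daemon returns a lifetime (seconds plus nanoseconds) with each lookup reply and the kernel reuses the cached entry until that lifetime expires. The struct and function names are hypothetical and do not reproduce FUSE's actual reply format.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical cached lookup result with a daemon-chosen lifetime. */
struct cached_entry {
    struct timespec valid_until;   /* absolute expiry time */
};

/* Record a lookup reply valid for valid_sec + valid_nsec (valid_nsec < 1e9). */
static void cache_entry(struct cached_entry *e, struct timespec now,
                        time_t valid_sec, long valid_nsec)
{
    e->valid_until.tv_sec = now.tv_sec + valid_sec;
    e->valid_until.tv_nsec = now.tv_nsec + valid_nsec;
    if (e->valid_until.tv_nsec >= 1000000000L) {   /* carry into seconds */
        e->valid_until.tv_sec++;
        e->valid_until.tv_nsec -= 1000000000L;
    }
}

/* True while the cached entry may be used without asking userspace again. */
static bool entry_still_valid(const struct cached_entry *e, struct timespec now)
{
    if (now.tv_sec != e->valid_until.tv_sec)
        return now.tv_sec < e->valid_until.tv_sec;
    return now.tv_nsec < e->valid_until.tv_nsec;
}
```

A lifetime of zero makes every access go back to the daemon, while a long lifetime suits a filesystem whose namespace rarely changes; the daemon, not the kernel, picks the trade-off.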

> > - probably more, but I can't remember
>
> Please do..

OK, I'll do a little research.

Miklos

Frederik Deweerdt

Jul 1, 2005, 3:49:46 AM
to Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On 01/07/05 08:36 +0200, Miklos Szeredi wrote:

> Here's a description of a theoretical DoS scenario:
>
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
>
> Miklos
Could this be solved by implementing some sort of (optional) timeout on
fuse syscalls (in request_send)?
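
To make the proposal concrete, here is a userspace sketch of an optional timeout on waiting for a reply, built on pthread_cond_timedwait(). The struct and wait_reply() are hypothetical; request_send is only named in the thread, and this is not its implementation. A negative timeout keeps the behaviour of waiting indefinitely, and the caller still has to choose timeout_ms, which is exactly what Miklos's reply below objects to.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical in-flight request: the daemon side would set `answered`
 * under `lock` and call pthread_cond_signal(&done). */
struct pending_req {
    pthread_mutex_t lock;
    pthread_cond_t done;
    bool answered;
};

/*
 * Wait for the reply. timeout_ms < 0 waits indefinitely; otherwise give up
 * after timeout_ms milliseconds. Returns 0 if answered, ETIMEDOUT otherwise.
 */
static int wait_reply(struct pending_req *r, long timeout_ms)
{
    struct timespec deadline = { 0 };
    if (timeout_ms >= 0) {
        /* Default condition variables measure time on CLOCK_REALTIME. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
    }

    int err = 0;
    pthread_mutex_lock(&r->lock);
    while (!r->answered && err == 0) {   /* loop also absorbs spurious wakeups */
        if (timeout_ms < 0)
            pthread_cond_wait(&r->done, &r->lock);
        else
            err = pthread_cond_timedwait(&r->done, &r->lock, &deadline);
    }
    int ret = r->answered ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&r->lock);
    return ret;
}
```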

Fred

--
o---------------------------------------------o
| http://open-news.net : l'info alternative |
| Tech - Sciences - Politique - International |
o---------------------------------------------o

Andrew Morton

Jul 1, 2005, 4:07:19 AM
to Miklos Szeredi, mik...@szeredi.hu, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
Miklos Szeredi <mik...@szeredi.hu> wrote:
>
> > >
> > > > >
> > > > > > - aren't we going to remove the nfs semi-server feature?
> > > > >
> > > > > I leave the decision to you ;) It's a separate independent patch
> > > > > already (fuse-nfs-export.patch).
> > > >
> > > > Let's leave it out - that'll stimulate some activity in the
> > > > userspace-nfs-server-for-FUSE area.
> > > >
> > > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > > NFS protocol to talk to the userspace filesystem driver?
> > >
> > > Oh lots:
> > >
> > > - no deadlocks (NFS mounted from localhost is riddled with them)
> >
> > It is? We had some low-memory problems a while back, but they got fixed.
> > During that work I did some nfs-to-localhost testing and things seemed OK.
>
> Well, there's the "unsolvable" writeback deadlock problem, which FUSE
> works around by not buffering dirty pages (and not allowing writable
> mmap). Does NFS solve that? I'm interested :)

I don't know - first you'd have to describe it.

> Then there's the usual "filesystem recursing into itself" deadlock.

Describe this completely as well, please.

> Mounting with 'intr' probably solves this for NFS, but that has
> unwanted side effects. FUSE only allows KILL to interrupt a request.

Maybe these things can be solved in NFS?

> > > - dcache invalidation policy
> >
> > What's that?
>
> Userspace can tell the kernel how long a dentry should be valid. I
> don't think the NFS protocol provides this. Same holds for the inode
> attributes.

Why is that needed?

> > > - probably more, but I can't remember
> >
> > Please do..
>
> OK, I'll do a little research.
>

v9fs has a user-level server too. Maybe it has been used in FUSE-like
scenarios more than NFS.

Plus NFS and v9fs work across the network...

Frank van Maarseveen

Jul 1, 2005, 5:31:30 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 08:58:05AM +0200, Miklos Szeredi wrote:
> > >
> > > - Frank points out that a user can send a sigstop to his own setuid(0)
> > > task and he intimates that this could cause DoS problems with FUSE. More
> > > details needed please?
> >
> > It's the other way around:
> > Apparently it is not a security problem to SIGSTOP or even SIGKILL a
> > setuid program. So why is it a security problem when such a program is
> > delayed by a supposedly malicious behaving FUSE mount?
>
> Perfectly valid argument. My question: is it not a security problem
> to allow signals to reach a suid program?

That's what I thought too, so I asked about it on the security mailing list first.
Apparently this signal behavior is normal.

> There's a huge difference between slowing down and stopping a
> process. I wouldn't consider the first a true DoS.

Stopping is a special case. But it is effectively the same as being
indefinitely slowed down by, say, 10000+ malicious processes, and from
that angle I don't see a fundamental difference w.r.t. security.

Killing the malicious processes should solve the problem. And killing
one FUSE process looks easier to me than killing 10000+ ones.

> Yes. The extra problem with FUSE is that they are not _able_ to be
> careful.

I think this is not true. Every pathname passed to a setuid program
by the user is basically "tainted". Standard I/O is tainted as well.

> They can't even check if a file is in fact on a FUSE mount

They shouldn't. The pathname is not to be trusted anyway.

I think FUSE has shown itself to be conservative enough w.r.t. security to be
merged. But it may be interesting to consider:

- replacing the ptraceability test with a kill()ability test.
- some sort of "intr" mount option, on by default, for most signals.
- forbidding hiding data by mounting a FUSE filesystem on top of it (does
FUSE check for this already?)
- /proc isn't a problem: most root processes tend to avoid it because
it is synthetic and thus uninteresting. Maybe we should extend
the idea of "synthetic file-systems being uninteresting" to any
process which cannot receive signals from the FUSE mount owner. If
one cannot hide data under a FUSE mount, and it's synthetic and so
uninteresting anyway, then just show the original empty mount point.
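
The difference between the two tests in the first item can be sketched with the usual Unix signal permission rule: a sender may signal a target if the sender's real or effective uid equals the target's real or saved uid. This is a simplified sketch: the creds struct and function names are hypothetical, root and capabilities are ignored, and the ptrace-style test is reduced to id equality (dumpability is not modelled).

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical credential snapshot of a process. */
struct creds {
    uid_t uid, euid, suid;
    gid_t gid, egid, sgid;
};

/* Ptrace-style test: the owner must match every id of the target. */
static bool ptraceable_by(uid_t owner_uid, gid_t owner_gid, const struct creds *t)
{
    return t->uid == owner_uid && t->euid == owner_uid && t->suid == owner_uid &&
           t->gid == owner_gid && t->egid == owner_gid && t->sgid == owner_gid;
}

/* kill()-style test: sender's real or effective uid equals the target's
 * real or saved uid. */
static bool killable_by(const struct creds *sender, const struct creds *t)
{
    return sender->euid == t->uid || sender->euid == t->suid ||
           sender->uid == t->uid || sender->uid == t->suid;
}
```

A setuid-root program started by the user keeps the user's real uid, so it fails the ptrace-style test but passes the kill() test; once it switches fully to root, it fails both, which matches the case Frank describes earlier where signals can no longer be sent.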

--
Frank

Frank van Maarseveen

Jul 1, 2005, 5:41:04 AM
to Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Fri, Jul 01, 2005 at 08:36:02AM +0200, Miklos Szeredi wrote:
>
> Here's a description of a theoretical DoS scenario:
>
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2

So the open() hangs indefinitely. But what if a blackhat tries to install
a package from a no-longer-existing server on /net or via NFS?

A user supplied pathname is not to be trusted by any setuid (or full
root) program.

Another example: I'm not sure if there are still /dev/tty devices which
may block indefinitely upon open(), but:

- I have yet to see a setuid program which always uses O_NONBLOCK
when opening user supplied pathnames.
- one cannot stat() and then open() because that gives a race.
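A minimal sketch of the open-then-fstat() pattern that avoids the stat()/open() race mentioned above. The function name is hypothetical. O_NONBLOCK keeps open() itself from blocking on FIFOs and similar special files, but as far as I know it would not stop a FUSE daemon (or a dead NFS server) from stalling the open request, which is the case this thread is about.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a user-supplied path for reading without a stat()-then-open()
 * race: open first (non-blocking, refusing a final symlink), then
 * check the object that was actually opened with fstat().
 * Returns an fd for a regular file, or -1 with errno set. */
int open_user_regular_file(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {      /* directory, FIFO, device, ... */
        close(fd);
        errno = EINVAL;
        return -1;
    }

    /* Regular file confirmed: switch back to blocking I/O. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```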

--
Frank

Miklos Szeredi

Jul 1, 2005, 5:50:12 AM
to frederik...@gmail.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> Could this be solved by implementing some sort of (optional) timeout on fuse
> syscalls (in request_send)?

Yes, but that would be a thousand times worse than the current solution.
You just can't know in advance what a "sane" timeout value is.

Miklos

Miklos Szeredi

Jul 1, 2005, 6:14:05 AM
to ak...@osdl.org, mik...@szeredi.hu, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > works around by not buffering dirty pages (and not allowing writable
> > mmap). Does NFS solve that? I'm interested :)
>
> I don't know - first you'd have to describe it.

A dirty page is being written back, but the userspace server needs to
allocate memory to complete the request. But the allocation will
block, since there's no more free memory.

> > Then there's the usual "filesystem recursing into itself" deadlock.
>
> Describe this completely as well, please.

User does unlink("/mnt/userfs/file"). Userspace server receives
request to unlink "/file". Then the daemon does
unlink("/mnt/userfs/file"). This will deadlock on i_sem.
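The lock ordering behind this deadlock can be modelled in userspace. In this sketch, which is only an analogy, an exclusive flock() on a scratch file stands in for the directory's i_sem, and both the "kernel" and "daemon" sides run in one process. That still shows the conflict, because flock() locks belong to open file descriptions, so a second open() of the same file cannot take the lock while the first holds it. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Daemon side: serving the UNLINK request by unlinking through its own
 * mount needs the same directory lock again.  LOCK_NB lets the model
 * report the conflict instead of sleeping forever.
 * Returns 1 if the lock was free, 0 if it is held elsewhere, -1 on error. */
static int daemon_try_dir_lock(const char *lockfile)
{
    int fd = open(lockfile, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    int rc = flock(fd, LOCK_EX | LOCK_NB);
    int saved = errno;
    close(fd);                 /* closing the only fd also drops the lock */
    if (rc == 0)
        return 1;
    return saved == EWOULDBLOCK ? 0 : -1;
}

/* User side: unlink("/mnt/userfs/file") takes the parent directory's
 * lock, then waits for the daemon's reply while still holding it.
 * Returns 1 if the blocking version would deadlock, 0 if not, -1 on error. */
int recursive_unlink_deadlocks(const char *lockfile)
{
    int fd = open(lockfile, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    int r = daemon_try_dir_lock(lockfile);
    close(fd);
    return r < 0 ? -1 : (r == 0);
}
```

On a scratch path, recursive_unlink_deadlocks() returns 1: the nested request would wait for a lock that is only released once the nested request itself completes.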

> > Mounting with 'intr' probably solves this for NFS, but that has
> > unwanted side effects. FUSE only allows KILL to interrupt a request.
>
> Maybe these things can be solved in NFS?

Possibly.

>
> > > > - dcache invalidation policy
> > >
> > > What's that?
> >
> > Userspace can tell the kernel, how long a dentry should be valid. I
> > don't think the NFS protocol provides this. Same holds for the inode
> > attributes.
>
> Why is that needed?

Because I can well imagine a synthetic filesystem where file
data/metadata change arbitrarily. In this case the timeout heuristic
in NFS is not useful.

In fact, with NFS it's often a PITA that it doesn't want to refresh a
file's data/metadata which I _know_ has changed on the server.
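A minimal sketch of the per-reply validity idea: the filesystem daemon attaches a timeout to each lookup reply, and the cache trusts the entry only until that deadline. The struct and function names are hypothetical (libfuse exposes comparable entry and attribute timeouts, but this code does not use its API), and the current time is passed in explicitly so the behaviour is easy to check.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <time.h>

/* Hypothetical cached lookup result: name -> inode, valid until a
 * deadline derived from the timeout the daemon sent with its reply. */
struct cached_entry {
    const char *name;
    unsigned long ino;
    double valid_until;        /* seconds on a monotonic clock */
};

/* Current monotonic time in seconds, for callers that need a "now". */
double monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Record a lookup reply; entry_timeout is chosen by the daemon. */
void cache_fill(struct cached_entry *e, const char *name, unsigned long ino,
                double now, double entry_timeout)
{
    e->name = name;
    e->ino = ino;
    e->valid_until = now + entry_timeout;
}

/* True if the entry may be used without asking the daemon again. */
bool cache_valid(const struct cached_entry *e, double now)
{
    return now < e->valid_until;
}
```

A synthetic filesystem whose contents change arbitrarily can reply with a timeout of 0, forcing a fresh lookup on every access, while a mostly static one can use a long timeout; the policy stays with the userspace filesystem rather than with a heuristic in the kernel.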

> > > > - probably more, but I can't remember
> > >
> > > Please do..
> >
> > OK, I'll do a little research.
> >
>
> v9fs has a user-level server too. Maybe it has been used in FUSE-like
> scenarios more than NFS.

I think the p9 protocol is suffering from trying to be too generic.
The FUSE kernel interface is probably slightly tied to the linux VFS,
and would present problems if trying to port to other *NIX or god
forbid some other OS family altogether.

That may seem like a drawback, but I don't think it is:

- people are encouraged to use the FUSE library API instead of the
raw kernel interface

- if it is ported to other systems, even the kernel interface
could probably be made compatible, only at some loss of
simplicity/performance.

> Plus NFS and v9fs work across the network...

Yes. I consider that a drawback. FUSE does data transfer very
efficiently (single copy), without the heavy network infrastructure
being in the way.

Miklos

Miklos Szeredi

Jul 1, 2005, 6:29:06 AM
to fra...@frankvm.com, mik...@szeredi.hu, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > Perfectly valid argument. My question: is it not a security problem
> > to allow signals to reach a suid program?
>
> That's what I thought too so I asked it first on the security mailing list.
> Apparently this signal behavior is normal.

Well, I think it's a fertile ground for hole hunters out there. Just
needs a little publicity ;)

Is it considered DoS for example if I prevent other users from sending
email? SIGSTOP on sendmail at the right moment (when the database is
locked) should do it fine.
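The sendmail scenario boils down to stopping a process while it holds a lock. The sketch below demonstrates that with an exclusive flock() on a scratch file standing in for the mail database lock. The function name and lock file are hypothetical, and the sketch involves no setuid program at all: it only shows that a stopped holder keeps everyone else out until it is continued or killed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns true if a stopped lock holder keeps another process from
 * taking the same lock.  lockfile is a scratch path the caller may create. */
bool stopped_holder_blocks_others(const char *lockfile)
{
    int ready[2];
    if (pipe(ready) < 0)
        return false;

    pid_t child = fork();
    if (child < 0) {
        close(ready[0]);
        close(ready[1]);
        return false;
    }
    if (child == 0) {
        /* Victim: take the lock, tell the parent, then keep running. */
        close(ready[0]);
        int fd = open(lockfile, O_RDWR | O_CREAT, 0600);
        if (fd < 0 || flock(fd, LOCK_EX) < 0)
            _exit(1);
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Attacker: wait until the lock is held, then stop the holder. */
    close(ready[1]);
    char c;
    bool locked = read(ready[0], &c, 1) == 1;
    close(ready[0]);

    bool blocked = false;
    if (locked && kill(child, SIGSTOP) == 0 &&
        waitpid(child, NULL, WUNTRACED) == child) {
        /* Anyone else now finds the lock taken while the holder is stopped. */
        int fd = open(lockfile, O_RDWR | O_CREAT, 0600);
        if (fd >= 0) {
            blocked = flock(fd, LOCK_EX | LOCK_NB) < 0 && errno == EWOULDBLOCK;
            close(fd);
        }
    }

    kill(child, SIGKILL);      /* SIGKILL also ends a stopped process */
    waitpid(child, NULL, 0);
    return blocked;
}
```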

> Stopping is a special case. But it is effectively the same as being
> indefinitely slowed down by, say, 10000+ malicious processes and from
> that angle I don't see a fundamental difference w.r.t. security.

On a well protected multiuser system there will be ulimits in place to
prevent that.
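For reference, the per-user process limit behind `ulimit -u` (and the `nproc` item of pam_limits' limits.conf) is RLIMIT_NPROC. A minimal sketch that sets the soft limit, clamped to the hard limit; the helper name is hypothetical, and the limit is normally not enforced for privileged processes.

```c
#define _GNU_SOURCE
#include <sys/resource.h>

/* Set the calling process's RLIMIT_NPROC soft limit to max_procs,
 * clamped to the hard limit (an unprivileged process cannot exceed it).
 * The limit counts processes owned by the real uid.
 * Returns 0 on success, -1 with errno set. */
int cap_user_processes(rlim_t max_procs)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) < 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && max_procs > rl.rlim_max)
        max_procs = rl.rlim_max;
    rl.rlim_cur = max_procs;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```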

> Killing the malicious processes should solve the problem. And killing
> one FUSE process looks easier to me than killing 10000+ ones.

Killing always works if the sysadmin happens to be around. If not,
there's not a lot other users can do.

> I think this is not true. Every pathname passed to a setuid program
> by the user is basically "tainted". Standard I/O is tainted as well.

You mean suid programs are never to touch paths passed to them?

If that were true, then fuse_allow_task() would not be needed, but
would do no harm either, since it would never be invoked by a suid
program.

> > They can't even check if a file is in fact on a FUSE mount
>
> They shouldn't. The pathname is not to be trusted anyway.
>
> I think FUSE has shown to be conservative enough w.r.t. security to be
> merged. But it may be interesting to consider:
>
> - replace ptraceability test by a kill()ability test.

You didn't consider the information leak aspect (point B in fuse.txt).

> - some sort of "intr" mount option for most signals on by default.

KILL will always interrupt a request. So getting rid of a malicious
mount should present no problems.

> - Forbid hiding data by mounting a FUSE filesystem on top of it (does
> FUSE check for this already?)

Yes. It checks for writability on the mountpoint (excluding limited
writability, as with /tmp).
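A rough sketch of the mountpoint check described here: the user needs write access to the mountpoint, and "limited" write access through a sticky directory such as /tmp does not count unless the user also owns that directory. It is not claimed to match fusermount's actual code, and the function name is hypothetical. access() is used because it checks against the real uid, which is what matters inside a setuid-root mount helper; a real helper would also have to guard against the path changing between this check and the mount.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the real user may mount on mnt under the rule above. */
bool user_may_mount_on(const char *mnt)
{
    struct stat st;
    if (stat(mnt, &st) < 0 || !S_ISDIR(st.st_mode))
        return false;
    if (access(mnt, W_OK) < 0)               /* checked with the real uid */
        return false;
    if ((st.st_mode & S_ISVTX) && st.st_uid != getuid())
        return false;                        /* sticky, e.g. /tmp: limited */
    return true;
}
```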

> - /proc isn't a problem: most root processes tend to avoid it because
> it is synthetic and thus uninteresting. Maybe we should extend
> the idea of "synthetic file-systems being uninteresting" to any
> process which cannot receive signals from the FUSE mount owner. When
> one cannot hide data by a FUSE mount and it's synthetic anyway so not
> interesting then just show the original empty mount point.

Been there. People (like Al Viro) didn't like it.

Miklos

Miklos Szeredi

Jul 1, 2005, 6:51:30 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> >
> > Here's a description of a theoretical DoS scenario:
> >
> > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
>
> So the open() hangs indefinitely. but what if blackhat tries to install
> a package from a no longer existing server on /net or via NFS?
>
> A user supplied pathname is not to be trusted by any setuid (or full
> root) program.

If /net won't detect a dead server within a timeout, I think it can be
considered broken.

> Another example: I'm not sure if there are still /dev/tty devices which
> may block indefinitely upon open() but:
>
> - I have yet to see a setuid program which always uses O_NONBLOCK
> when opening user supplied pathnames.
> - one cannot stat() and then open() because that gives a race.

Is "being already broken" an excuse for preventing future breakage,
when these are fixed?

Miklos

Andrew Morton

Jul 1, 2005, 7:34:39 AM
to Miklos Szeredi, mik...@szeredi.hu, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
Miklos Szeredi <mik...@szeredi.hu> wrote:
>
> > > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > > works around by not buffering dirty pages (and not allowing writable
> > > mmap). Does NFS solve that? I'm interested :)
> >
> > I don't know - first you'd have to describe it.
>
> A dirty page is being written back, but the userspace server needs to
> allocate memory to complete the request. But the allocation will
> block, since there's no more free memory.

That shouldn't happen with write() traffic due to the dirty memory
balancing logic.

It'll happen with MAP_SHARED. Totally disallowing MAP_SHARED sounds a bit
drastic, but of course nfs/v9fs could be taught to do that.

> > > Then there's the usual "filesystem recursing into itself" deadlock.
> >
> > Describe this completely as well, please.
>
> User does unlink("/mnt/userfs/file"). Userspace server receives
> request to unlink "/file". Then the daemon does
> unlink("/mnt/userfs/file"). This will deadlock on i_sem.

eh? How can the fuse client and the fuse server both get access to the
same file in this manner? I don't see how you could set that up with NFS,
for example.

> > > Userspace can tell the kernel, how long a dentry should be valid. I
> > > don't think the NFS protocol provides this. Same holds for the inode
> > > attributes.
> >
> > Why is that needed?
>
> Because, I can well imagine a synthetic filesystem, where file
> data/metadata change arbitrarily. In this case the timeout heuristic
> in NFS is not useful.
>
> In fact with NFS it's often a PITA, that it doesn't want to refresh a
> file's data/metadata, which I _know_ has changed on the server.

I think nfs can do this, as long as the modification was done through the
server. I'd expect v9fs would be the same.

> > Plus NFS and v9fs work across the network...
>
> Yes. I consider that a drawback.

Others (many) would disagree.


Sorry, but I'm not buying it. I still don't see a solid reason why all
this could not be done with nfs/v9fs, some kernel tweaks and the rest in
userspace. It would take some effort, but that effort would end up
strengthening existing kernel capabilities rather than adding brand new
things, which is good.

Frank van Maarseveen

Jul 1, 2005, 7:39:00 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 12:45:22PM +0200, Miklos Szeredi wrote:
> > >
> > > Here's a description of a theoretical DoS scenario:
> > >
> > > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
> >
> > So the open() hangs indefinitely. but what if blackhat tries to install
> > a package from a no longer existing server on /net or via NFS?
> >
> > A user supplied pathname is not to be trusted by any setuid (or full
> > root) program.
>
> If /net won't detect a dead server within a timeout, I think it can be
> considered broken.
>
> > Another example: I'm not sure if there are still /dev/tty devices which
> > may block indefinitely upon open() but:
> >
> > - I have yet to see a setuid program which always uses O_NONBLOCK
> > when opening user supplied pathnames.
> > - one cannot stat() and then open() because that gives a race.
>
> Is "being already broken" an excuse for preventing future breakage,
> when these are fixed?

All this breakage points in the same direction: a user-supplied pathname
is not to be trusted by any setuid (or full root) program.

--
Frank

Frank van Maarseveen

Jul 1, 2005, 8:02:15 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 12:27:01PM +0200, Miklos Szeredi wrote:
>
> You mean suid programs are never to touch paths passed to them?

never when euid==root.
The pathname could even point into /proc or anything else yet unknown,
e.g. by putting some symlinks at the right places. The mere act of
opening the file as root could have unwanted side effects already.

>
> If that would be true, then fuse_allow_task() would not be needed, but
> would do no harm either, since it would never be invoked by a suid
> program.

In theory it should not be necessary. But on a practical side: we need
to provide security for daemons with elevated privileges which need to
traverse all local disks.

> You didn't consider the information leak aspect (point B in fuse.txt).

Correct. I have no answer to that other than: is it a real problem or
yet something else a setuid program should take into consideration?
And what info can we extract already using inotify/dnotify? There are
several ways to monitor activity and it is all information. /proc (ps)
gives information too.

> > - Forbid hiding data by mounting a FUSE filesystem on top of it (does
> > FUSE check for this already?)
>
> Yes. It checks for writability on the mountpoint (excluding limited
> writability as /tmp for example).

But can you mount FUSE on top of a populated tree, a non-leaf dir?

> > - /proc isn't a problem: most root processes tend to avoid it because
> > it is synthetic and thus uninteresting. Maybe we should extend
> > the idea of "synthetic file-systems being uninteresting" to any
> > process which cannot receive signals from the FUSE mount owner. When
> > one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > interesting then just show the original empty mount point.
>
> Been there. People (like Al Viro) didn't like it.

Including replacing the ptraceability test with a signal test, and including
the (IMHO) required emptiness of the mount stub?

Traversing a FUSE mountpoint is almost equivalent to talking with a
userspace program. Why should that be interesting when one simply wants
to traverse the FS? root isn't going to execute all user programs to
see what they do either.

--
Frank

Miklos Szeredi

Jul 1, 2005, 8:05:13 AM
to ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> > A dirty page is being written back, but the userspace server needs to
> > allocate memory to complete the request. But the allocation will
> > block, since there's no more free memory.
>
> That shouldn't happen with write() traffic due to the dirty memory
> balancing logic.

How? It either blocks other allocations until the writeback is
completed (DoS) or allows memory to be exhausted (deadlock).

Making unpriv mounts work securely is not a trivial thing I can tell
you ;)

> > User does unlink("/mnt/userfs/file"). Userspace server receives
> > request to unlink "/file". Then the daemon does
> > unlink("/mnt/userfs/file"). This will deadlock on i_sem.
>
> eh? How can the fuse client and the fuse server both get access to the
> same file in this manner? I don't see how you could set that up with NFS,
> for example.

With a custom userspace NFS server you can do whatever you want.
That's the whole purpose of the exercise.

> > Because, I can well imagine a synthetic filesystem, where file
> > data/metadata change arbitrarily. In this case the timeout heuristic
> > in NFS is not useful.
> >
> > In fact with NFS it's often a PITA, that it doesn't want to refresh a
> > file's data/metadata, which I _know_ has changed on the server.
>
> I think nfs can do this, as long as the modification was done through the
> server. I'd expect v9fs would be the same.

It's often not. Sshfs is a good example: the file server will not be
able to notify the client when anything changes. Polling is the only
solution, and NFS doesn't always get it right (and in fact it cannot).
It's much better to leave cache timeout policy to the userspace
filesystem than to try to guess it in the kernel.


> > > Plus NFS and v9fs work across the network...
> >
> > Yes. I consider that a drawback.
>
> Others (many) would disagree.
>
>
> Sorry, but I'm not buying it. I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace. It would take some effort, but that effort would end up
> strengthening existing kernel capabilities rather than adding brand new
> things, which is good.

I'm not sure. NFS is a monster, as everybody can agree. Meeting all the
requirements of FUSE (safe unprivileged mounts, etc.) would be a
nightmare.

FUSE does one thing, and it does that right. I think that's good.

Miklos

Frank van Maarseveen

Jul 1, 2005, 8:27:26 AM
to Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Fri, Jul 01, 2005 at 12:11:53PM +0200, Miklos Szeredi wrote:
> > > Userspace can tell the kernel, how long a dentry should be valid. I
> > > don't think the NFS protocol provides this. Same holds for the inode
> > > attributes.
> >
> > Why is that needed?
>
> Because, I can well imagine a synthetic filesystem, where file
> data/metadata change arbitrarily. In this case the timeout heuristic
> in NFS is not useful.
>
> In fact with NFS it's often a PITA, that it doesn't want to refresh a
> file's data/metadata, which I _know_ has changed on the server.

This NFS issue has been on my radar for years already. I have a patch which
is practical but a bit disgusting. IMHO it's orthogonal to FUSE.

--
Frank

bert hubert

Jul 1, 2005, 8:40:51 AM
to Andrew Morton, Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Thu, Jun 30, 2005 at 11:50:59PM -0700, Andrew Morton wrote:
> Speaking of which, dumb question: what does FUSE offer over simply using
> NFS protocol to talk to the userspace filesystem driver?

NFS does not currently engender warm feelings wrt ease of
programming and quality in general - especially under Linux sadly enough.

It is also a narrow window through which to speak to the rich set of
options, flags, attributes and features the Linux kernel offers.

I think Solaris used to implement bind mounts through loopback NFS, but that
went out of fashion as well.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services

Miklos Szeredi

Jul 1, 2005, 8:40:50 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > You mean suid programs are never to touch paths passed to them?
>
> never when euid==root.
> The pathname could even point into /proc or anything else yet unknown,
> e.g. by putting some symlinks at the right places. The mere act of
> opening the file as root could have unwanted side effects already.

OK, open is out. However other operations (stat, unlink, chmod etc)
are always without side effects on "normal" filesystems. However on
FUSE they are very much unsafe (can block, not do what was instructed
and return success, etc).

> > If that would be true, then fuse_allow_task() would not be needed, but
> > would do no harm either, since it would never be invoked by a suid
> > program.
>
> In theory it should not be necessary. But on a practical side: we need
> to provide security for daemons with elevated privileges which need to
> traverse all local disks.

I agree wholeheartedly. However, I'm not arguing this point, because
it has been (rightly) pointed out that private namespaces can be used
to solve this, whereas the suid issue is not solvable with private
namespaces.

> > You didn't consider the information leak aspect (point B in fuse.txt).
>
> Correct. I have no answer to that other than: is it a real problem or
> yet something else a setuid program should take into consideration?
> And what info can we extract already using inotify/dnotify?

Probably not file access patterns. But yes I don't consider this a
very grave problem.

> There are several ways to monitor activity and it is all
> information. /proc (ps) gives information too.
>
> > > - Forbid hiding data by mounting a FUSE filesystem on top of it (does
> > > FUSE check for this already?)
> >
> > Yes. It checks for writability on the mountpoint (excluding limited
> > writability as /tmp for example).
>
> But can you mount FUSE on top of a populated tree, a non-leaf dir?

Yes, but I think that's OK, because if the directory you mount on is
writable, then you can already hide the data (by unlinking it, but
keeping a reference through a file descriptor). And it's not very
effective hiding, since a bind mount of the mountpoint's filesystem
will reveal what's underneath the FUSE mount.
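The point that a writable directory already allows hiding data can be demonstrated directly: a file that has been unlinked while still open keeps its contents until the last descriptor is closed. A minimal sketch; the function name and the scratch path used with it are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path, write to it, unlink it while keeping it open, and check
 * that the name is gone but the data is still readable via the fd. */
bool data_survives_unlink(const char *path)
{
    const char msg[] = "hidden";
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return false;

    bool ok = write(fd, msg, sizeof msg) == (ssize_t)sizeof msg;
    ok = ok && unlink(path) == 0;

    struct stat st;
    ok = ok && stat(path, &st) < 0 && errno == ENOENT;   /* name is gone */

    char buf[sizeof msg];
    ok = ok && pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
         memcmp(buf, msg, sizeof msg) == 0;              /* data is not */

    close(fd);          /* last reference dropped: the data is freed now */
    return ok;
}
```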

> > > - /proc isn't a problem: most root processes tend to avoid it because
> > > it is synthetic and thus uninteresting. Maybe we should extend
> > > the idea of "synthetic file-systems being uninteresting" to any
> > > process which cannot receive signals from the FUSE mount owner. When
> > > one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > > interesting then just show the original empty mount point.
> >
> > Been there. People (like Al Viro) didn't like it.
>
> including changing the ptraceability test by a signal test and including
> the (IMHO) required emptiness of the mount stub?

It's been thrown out because it's considered unacceptable for suid
programs to see a different namespace than non-suid ones.

> Traversing a FUSE mountpoint is almost equivalent to talking with a
> userspace program. Why should that be interesting when one simply wants
> to traverse the FS? root isn't going to execute all user programs to
> see what they do either.

Yes. Please explain that to Al Viro, Christoph Hellwig et al.
Believe me it's not something that's easy to get across, and I'm very
happy that you see it this way too :).

Miklos

Anton Altaparmakov

Jul 1, 2005, 8:56:18 AM
to Andrew Morton, Miklos Szeredi, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> Sorry, but I'm not buying it. I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace. It would take some effort, but that effort would end up
> strengthening existing kernel capabilities rather than adding brand new
> things, which is good.

FUSE is a generic FS API which is _very_ easy to write an FS for
(learning curve is about 10-15 minutes starting after you have unpacked
the fuse source code, at least it took me that long to start writing an
FS based on the example one provided). NFS is not anything like that.

Also can the NFS approach provide me with different content depending on
the uid of the accessing process? With FUSE that is easy as pie. Even
easier than that actually...
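The per-uid example works because every request the kernel passes to the daemon carries the caller's uid, gid and pid. A sketch, assuming a locally declared request header that roughly mirrors the fields of `fuse_in_header` in older versions of `<linux/fuse.h>` (redeclared here so the code compiles without kernel headers) and a hypothetical handler name; in libfuse's high-level API the same values come from fuse_get_context().

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for the kernel-to-daemon request header. */
struct in_header {
    uint32_t len;
    uint32_t opcode;
    uint64_t unique;
    uint64_t nodeid;
    uint32_t uid;        /* caller's fsuid */
    uint32_t gid;        /* caller's fsgid */
    uint32_t pid;        /* caller's pid */
    uint32_t padding;
};

/* Hypothetical READ handler body: the same file shows different
 * content depending on who is asking.  Returns snprintf's result. */
int render_greeting(const struct in_header *req, char *buf, size_t size)
{
    if (req->uid == 0)
        return snprintf(buf, size, "hello, root\n");
    return snprintf(buf, size, "hello, uid %u\n", (unsigned)req->uid);
}
```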

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

Frank van Maarseveen

Jul 1, 2005, 9:07:16 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 02:36:22PM +0200, Miklos Szeredi wrote:
> > > You mean suid programs are never to touch paths passed to them?
> >
> > never when euid==root.
> > The pathname could even point into /proc or anything else yet unknown,
> > e.g. by putting some symlinks at the right places. The mere act of
> > opening the file as root could have unwanted side effects already.
>
> OK, open is out. However other operations (stat, unlink, chmod etc)
> are always without side effects on "normal" filesystems. However on
> FUSE they are very much unsafe (can block, not do what was instructed
> and return success, etc).

What about tricking a setuid program into calling stat() under /auto (/mnt/auto,
/misc, whatever it is called)? Then the automounter will act upon a root
request, again with possibly unwanted side effects. See how careful a
setuid/full-root program must be in handling user data, including pathnames?

FUSE suddenly makes this more obvious but it is not new.

> > > > - /proc isn't a problem: most root processes tend to avoid it because
> > > > it is synthetic and thus uninteresting. Maybe we should extend
> > > > the idea of "synthetic file-systems being uninteresting" to any
> > > > process which cannot receive signals from the FUSE mount owner. When
> > > > one cannot hide data by a FUSE mount and it's synthetic anyway so not
> > > > interesting then just show the original empty mount point.
> > >
> > > Been there. People (like Al Viro) didn't like it.
> >
> > including changing the ptraceability test by a signal test and including
> > the (IMHO) required emptiness of the mount stub?
>
> It's been thrown out for the reason, that it's unacceptable if suid
> programs see a different namespace as non-suid.

You mean root versus non-root, or user versus other user, I assume, because
the euid (fsuid) is what matters.

But then: this _is_ already the case for NFS when root_squash is in effect
(what about Kerberos et al.?). So there are several reasons to consider
FUSE a nonlocal fs instead of a local one, so nothing new there. FUSE could
be used to implement a usable (not perfect) userspace NFS/ftp client.

To require an empty stub to mount FUSE upon makes the whole picture
cleaner: users are only able to extend the namespace _leaf_ nodes for
themselves and processes they can send signals to: setuid programs
which do not fully become root. The existing namespace [nodes] remains
unchanged for everyone.

--
Frank

Anton Altaparmakov

Jul 1, 2005, 9:14:25 AM
to Andrew Morton, Miklos Szeredi, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Fri, 2005-07-01 at 13:53 +0100, Anton Altaparmakov wrote:
> On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> > Sorry, but I'm not buying it. I still don't see a solid reason why all
> > this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> > userspace. It would take some effort, but that effort would end up
> > strengthening existing kernel capabilities rather than adding brand new
> > things, which is good.
>
> FUSE is a generic FS API which is _very_ easy to write an FS for
> (learning curve is about 10-15 minutes starting after you have unpacked
> the fuse source code, at least it took me that long to start writing an
> FS based on the example one provided). NFS is not anything like that.
>
> Also can the NFS approach provide me with different content depending on
> the uid of the accessing process? With FUSE that is easy as pie. Even
> easier than that actually...

I forgot: And doesn't NFS require stable inode numbers and other
"invariables" like that for it to work? FUSE doesn't and those
requirements are a real PITA in a lot of cases where there simply are no
inodes and the numbers are synthetic and change on each remount or even
on each access after the dentry has expired...

And I always thought that doing FS in userspace via NFS is considered an
ugly hack. I didn't have the impression that that had changed recently.
(-;

Miklos Szeredi

Jul 1, 2005, 9:26:20 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > OK, open is out. However other operations (stat, unlink, chmod etc)
> > are always without side effects on "normal" filesystems. However on
> > FUSE they are very much unsafe (can block, not do what was instructed
> > and return success, etc).
>
> What about tricking a setuid program to stat into /auto (/mnt/auto,
> /misc, whatever it is called)? then the automounter will act upon a root
> request with again possibly unwanted side effects. See how careful a
> setuid/full-root program must be in handling userdata including pathnames?

I don't see why /auto is special. It's basically a userspace
filesystem too, but that's not what is special about FUSE. It's the
fact that it's a userspace filesystem controlled by an _ordinary user_.

> FUSE suddenly makes this more obvious but it is not new.

I believe it _is_ something new. If it were not, then your arguments
would be bulletproof. As it is, I think you miss the point that the
side effect is actually in the hands of the user invoking the suid
program, instead of something external.

> > > including changing the ptraceability test by a signal test and including
> > > the (IMHO) required emptiness of the mount stub?
> >
> > It's been thrown out for the reason, that it's unacceptable if suid
> > programs see a different namespace as non-suid.
>
> You mean root versus non-root. or user versus other user I assume. Because
> the euid (fsuid) is what matters.

Yes.

> But then: this _is_ already the case for NFS when squash_root is in effect
> (what about kerberos et.al?). So there are several reasons to consider
> FUSE a nonlocal fs instead of a local one so nothing new there. FUSE could
> be used to implement a usable (not perfect) userspace NFS/ftp client.

Yes. In fact even if the check were left out of the kernel, the
userspace filesystem could still return different data/error based on
fsuid/fsgid/pid.

So what's so controversial about it? I really fail to understand...

> To require an empty stub to mount FUSE upon makes the whole picture
> cleaner: users are only able to extend the namespace _leaf_ nodes for
> themselves and processes they can send signals to: setuid programs
> which do not fully become root. The existing namespace [nodes] remains
> unchanged for everyone.

It's not that simple. A filesystem can be mounted many times (either
with mount --bind, or just by mounting the same device on multiple
mountpoints). In this case you can't ensure, that a mountpoint will
remain a leaf node after being mounted on.

Miklos

Eric Van Hensbergen

Jul 1, 2005, 9:30:36 AM
to Andrew Morton, Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
On 7/1/05, Andrew Morton <ak...@osdl.org> wrote:

> Miklos Szeredi <mik...@szeredi.hu> wrote:
> > > > Userspace can tell the kernel, how long a dentry should be valid. I
> > > > don't think the NFS protocol provides this. Same holds for the inode
> > > > attributes.
> > >
> > > Why is that needed?
> >
> > Because, I can well imagine a synthetic filesystem, where file
> > data/metadata change arbitrarily. In this case the timeout heuristic
> > in NFS is not useful.
> >
> > In fact with NFS it's often a PITA, that it doesn't want to refresh a
> > file's data/metadata, which I _know_ has changed on the server.
>
> I think nfs can do this, as long as the modification was done through the
> server. I'd expect v9fs would be the same.
>

v9fs aggressively invalidates dentries by default -- it is our
experience that caching metadata (particularly in synthetics) causes
more problems than it is worth. That being said, there are prototype
designs for v9fs cache layers which actively detect if underlying file
systems are synthetic or static and allow parametrized cache policies
(for both the dcache and the page cache).

As a side-note which I know less about, I believe NFSv4 includes
server-push invalidation semantics, but I can't remember if that
applies to metadata or just data.

-eric

Eric Van Hensbergen

Jul 1, 2005, 9:32:58 AM
to Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
On 7/1/05, Miklos Szeredi <mik...@szeredi.hu> wrote:
> >
> > v9fs has a user-level server too. Maybe it has been used in FUSE-like
> > scenarios more than NFS.

We've really only dabbled with v9fs and user-level file services,
mostly through interacting with Plan 9 From User Space applications
(http://www.plan9.us). However, there are people actively improving
this area of functionality including providing an SDK to allow easy
creation of synthetic file systems. That being said, there are many
aspects of v9fs which have been written/re-written with the express
purpose of providing support for such synthetics.

>
> I think the p9 protocol is suffering from trying to be too generic.
> The FUSE kernel interface is probably slightly tied to the linux VFS,
> and would present problems if trying to port to other *NIX or god
> forbid some other OS family altogether.
>

I don't know where 9P "suffers" from being too generic; it's just
well-designed and has done a good job of keeping things simple --
something that the plethora of over-designed, bloated interfaces of
today could learn from.

>
> > Plus NFS and v9fs work across the network...
>
> Yes. I consider that a drawback. FUSE does data transfer very
> efficiently (single copy), without the heavy network infrastructure
> being in the way.
>

I'll grant you this is something v9fs-2.0 suffers from, but it's
something we are actively addressing in v9fs-2.1. We're working more
towards the implementation that is present in the Plan 9 kernel, where
the core efficiently multiplexes the requests either directly to local
servers (in Plan 9's case via function call APIs) or encapsulates them
for shipping across the network. The 9P interface is used for both,
it just has different embodiments depending on underlying transport.
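A minimal Python sketch of the "one interface, two embodiments" idea described above, assuming a toy message type and a hypothetical echo server (none of these names come from v9fs). It uses the 9P2000 framing `size[4] type[1] tag[2]` (little-endian, with size counting the whole message); the local transport hands the request over by function call, while the network-style transport marshals it to bytes and back, and both yield the same reply.

```python
import struct
from dataclasses import dataclass

# 9P frames every message as size[4] type[1] tag[2] + body, little-endian;
# size counts the whole message, including the size field itself.
HEADER = struct.Struct("<IBH")


@dataclass
class Msg:
    type: int    # e.g. 116 = Tread, 117 = Rread in 9P2000
    tag: int     # lets replies be matched to outstanding requests
    body: bytes


def encode(msg: Msg) -> bytes:
    return HEADER.pack(HEADER.size + len(msg.body), msg.type, msg.tag) + msg.body


def decode(data: bytes) -> Msg:
    size, mtype, tag = HEADER.unpack_from(data)
    if size != len(data):
        raise ValueError(f"size field says {size}, got {len(data)} bytes")
    return Msg(mtype, tag, data[HEADER.size:])


class EchoServer:
    """Hypothetical file server: answers each T-message with the matching R-message."""
    def handle(self, msg: Msg) -> Msg:
        return Msg(msg.type + 1, msg.tag, msg.body)


class LocalTransport:
    """Server in the same address space: plain function call, no marshalling."""
    def __init__(self, server):
        self.server = server

    def rpc(self, msg: Msg) -> Msg:
        return self.server.handle(msg)


class LoopbackNetTransport:
    """Network-style embodiment: marshal to bytes both ways, as a socket would."""
    def __init__(self, server):
        self.server = server

    def rpc(self, msg: Msg) -> Msg:
        request = decode(encode(msg))                        # request crosses the "wire"
        return decode(encode(self.server.handle(request)))   # so does the reply
```

The caller only sees `rpc()`; which embodiment sits behind it is a deployment choice, which is the point being made about 9P.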

That being said, I imagine the time spent context switching in and out
of the kernel dominates performance. With a proper mux there is no
reason why v9fs can't be made as efficient as FUSE - and that's what
we intend to demonstrate in v9fs-2.1. Plus, with v9fs you get the
benefit of being able to export your synthetic file systems over the
network with no additional copies.

Further, when you create an infrastructure which is meant to work over
a network, you take fewer things for granted -- which ultimately leads
to a more robust system capable of dealing with many of these
problems.

-eric

Frank van Maarseveen

Jul 1, 2005, 9:53:17 AM
to Anton Altaparmakov, Andrew Morton, Miklos Szeredi, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On Fri, Jul 01, 2005 at 01:53:54PM +0100, Anton Altaparmakov wrote:
> On Fri, 2005-07-01 at 04:29 -0700, Andrew Morton wrote:
> > Sorry, but I'm not buying it. I still don't see a solid reason why all
> > this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> > userspace. It would take some effort, but that effort would end up
> > strengthening existing kernel capabilities rather than adding brand new
> > things, which is good.
>
> Also can the NFS approach provide me with different content depending on
> the uid of the accessing process? With FUSE that is easy as pie. Even
> easier than that actually...

unfsd can do that, I believe. However, FUSE and user space NFSd are complementary.
For every NFS solution one still needs to do the mounting as root. FUSE
addresses the client side: it can implement a user space NFS client.

--
Frank

Miklos Szeredi

Jul 1, 2005, 9:56:15 AM
to eri...@gmail.com, mik...@szeredi.hu, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
> I don't know where 9P "suffers" from being too generic, it's just
> well-designed and has done a good job of keeping things simple --
> something that the plethora of over designed, bloated interfaces of
> today could learn from.

True. I very much like the simplicity of the 9P protocol. But its
system independence sometimes makes it fit poorly to the Linux VFS
interface. I guess you have a wide experience with this :)

> > > Plus NFS and v9fs work across the network...
> >
> > Yes. I consider that a drawback. FUSE does data transfer very
> > efficiently (single copy), without the heavy network infrastructure
> > being in the way.
> >
>
> I'll grant you this is something v9fs-2.0 suffers from, but it's
> something we are actively addressing in v9fs-2.1. We're working more
> towards the implementation that is present in the Plan 9 kernel, where
> the core efficiently multiplexes the requests either directly to local
> servers (in Plan 9's case via function call APIs) or encapsulates them
> for shipping across the network. The 9P interface is used for both,
> it just has different embodiments depending on underlying transport.
>
> That being said, I imagine the time spent context switching in and out
> of the kernel dominates performance.

Context switch happens from one process to the other, not when
entering/leaving the kernel (which is very efficient).

So it's much more important to reduce the number of round-trips for a
single operation, than multiplexing requests for multiple operations.
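To make the round-trip argument concrete, here is an idealized Python cost model (an illustration, not a measurement). It assumes each request/reply costs two process context switches, that one operation's round trips are sequential because each depends on the previous reply, and that multiplexing overlaps independent operations perfectly. Under those assumptions multiplexing shortens the time for a batch but leaves the latency of a single operation unchanged; only removing round trips does that.

```python
def single_op_latency(round_trips: int, switch_cost: float) -> float:
    """Latency of one operation: its round trips run one after another,
    and each costs two process context switches (to the server and back)."""
    return round_trips * 2 * switch_cost


def serial_makespan(n_ops: int, round_trips: int, switch_cost: float) -> float:
    """Total time for n independent operations issued one at a time."""
    return n_ops * single_op_latency(round_trips, switch_cost)


def multiplexed_makespan(n_ops: int, round_trips: int, switch_cost: float) -> float:
    """Idealized multiplexing: independent operations overlap perfectly,
    so the batch finishes when a single operation does."""
    return single_op_latency(round_trips, switch_cost) if n_ops > 0 else 0.0
```

With an arbitrary cost of 5 units per switch, an operation needing 3 round trips takes 30 units whether or not it is multiplexed with nine others; cutting it to 1 round trip brings it down to 10.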

> With a proper mux there is no reason why v9fs can't be made as
> efficient as FUSE - and that's what we intend to demonstrate in
> v9fs-2.1. Plus, with v9fs you get the benefit of being able to
> export your synthetic file systems over the network with no
> additional copies.

Yes, but does that matter? I'm not sure that it's a good idea
bundling network filesystem functionality together with userspace
filesystem functionality. Each has its own set of requirements, and
its own way of working optimally.

What would people say if ext3 was always mounted locally through NFS,
because the kernel would only provide the NFS filesystem client.

Differentiation of interfaces depending on the "closeness" of the
client to the server makes good sense IMO. We currently have
in-kernel and across-network. FUSE adds in-userspace in between those
two.

Sometimes these can overlap, but one interface will always be more
optimal (in terms of functionality as well as speed) for a specific
application.

> Further, when you create an infrastructure which is meant to work over
> a network, you take fewer things for granted -- which ultimately leads
> to a more robust system capable of dealing with many of these
> problems.

Yes. I'm not speaking against v9fs, which I think has a valid niche,
as well as FUSE.

Miklos

Eric Van Hensbergen

Jul 1, 2005, 10:20:55 AM
to Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
On 7/1/05, Miklos Szeredi <mik...@szeredi.hu> wrote:
> > I don't know where 9P "suffers" from being too generic, it's just
> > well-designed and has done a good job of keeping things simple --
> > something that the plethora of over designed, bloated interfaces of
> > today could learn from.
>
> True. I very much like the simplicity of the 9P protocol. But its
> system independence sometimes makes it fit poorly to the Linux VFS
> interface. I guess you have a wide experience with this :)
>

Yeah, but most of our problems had less to do with the VFS interface
per se, and more to do with the dcache/page-cache. In the long run,
the portability is something you may want though -- not only to
provide support under BSD or whatever, but also to insulate changes in
the VFS API from user file servers.



>
> So it's much more important to reduce the number of round-trips for a
> single operation, than multiplexing requests for multiple operations.
>

Agreed, this will be something we'll (v9fs) have to keep close tabs
on to keep things efficient.



> > With a proper mux there is no reason why v9fs can't be made as
> > efficient as FUSE - and that's what we intend to demonstrate in
> > v9fs-2.1. Plus, with v9fs you get the benefit of being able to
> > export your synthetic file systems over the network with no
> > additional copies.
>
> Yes, but does that matter? I'm not sure that it's a good idea
> bundling network filesystem functionality together with userspace
> filesystem functionality. Each has its own set of requirements, and
> its own way of working optimally.
>

I see your point, but increasingly common usage environments are
distributed systems and I think network synthetics will have their
niche.

> What would people say if ext3 was always mounted locally through NFS,
> because the kernel would only provide the NFS filesystem client.

Probably the same thing they would say if ext3 was a user-space
application that always needed to be mounted via FUSE ;)

>
> Differentiation of interfaces depending on the "closeness" of the
> client to the server makes good sense IMO. We currently have
> in-kernel and across-network. FUSE adds in-userspace in between those
> two.
>

I think that remains to be seen. There is much to be gained from
blurring the differentiation as we move Linux towards a first-class
distributed system. If unified interfaces can be made "good-enough"
performance wise, what justifies having multiple interfaces depending
on network versus local? Performance mongering at the cost of design
is what killed systems research. In the end, specialization has its
place, but I think it's
always worth striving towards unified interfaces when performance
doesn't suffer to a great degree.

>
> > Further, when you create an infrastructure which is meant to work over
> > a network, you take fewer things for granted -- which ultimately leads
> > to a more robust system capable of dealing with many of these
> > problems.
>
> Yes. I'm not speaking against v9fs, which I think has a valid niche,
> as well as FUSE.
>

FUSE certainly has its place, and has done a great job creating an
environment in which it is relatively easy to create new file systems
in user-space. My main point in responding was to take the position
that the v9fs mechanisms are adequate to provide user-space file
systems and that while it was not the primary motivation behind the
v9fs project, we are actively pursuing improving the performance and
robustness of our mechanisms for providing user-space (as well as
kernel-space) file service and developing an SDK to ease the
implementation of 9P-based synthetic file servers.

-eric

Miklos Szeredi

Jul 1, 2005, 10:33:34 AM
to eri...@gmail.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
> > What would people say if ext3 was always mounted locally through NFS,
> > because the kernel would only provide the NFS filesystem client.
>
> Probably the same thing they would say if ext3 was a user-space
> application that always needed to be mounted via FUSE ;)

Yes, and rightly.

One of the misunderstandings about userspace filesystems (Linus falls
into this) is to compare it with microkernels.

FUSE (and userspace filesystems in general) are NOT meant to replace
in kernel filesystems or the VFS. They are an addition with which
different kinds of filesystems can be implemented much better than
they could be in kernel.

Miklos

Frank van Maarseveen

Jul 1, 2005, 11:25:10 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 03:21:59PM +0200, Miklos Szeredi wrote:
>
> > To require an empty stub to mount FUSE upon makes the whole picture
> > cleaner: users are only able to extend the namespace _leaf_ nodes for
> > themselves and processes they can send signals to: setuid programs
> > which do not fully become root. The existing namespace [nodes] remains
> > unchanged for everyone.
>
> It's not as simple. A filesystem can be mounted many times (either
> with mount --bind, or just by mounting the same device on multiple
> mountpoints). In this case you can't ensure, that a mountpoint will
> remain a leaf node after being mounted on.

I have bind-mounted / on /net/blabla
I tried two experiments:

mounting something under / and looking for it under /net/blabla
mounting something under /net/blabla and looking for it under /

The experiment was done with bind mounts and by mounting a USB stick
(/dev/sdb1) and there was no auto propagation of mounts.

(2.6.12-rc6)

How can a leaf dir suddenly become non-leaf by a mount without an explicit
mount command?

--
Frank

Matthias Urlichs

Jul 1, 2005, 12:47:16 PM
to linux-...@vger.kernel.org
Hi, Andrew Morton wrote:

> Sorry, but I'm not buying it. I still don't see a solid reason why all
> this could not be done with nfs/v9fs, some kernel tweaks and the rest in
> userspace.

Let's forget about NFS here. It's stateless. You don't want a wholly
stateless layer between two stateful instances; the fact that it works for
a disk-based NFS server isn't proof that it'd work for gmailfs or sshfs.

There are a lot of FUSE server implementations out there already.
You want all of them to rewrite their code for v9fs?

I admit that I don't know zilch about how difficult it is to write a v9fs
server (is there sane sample code / a support library?) or how much
overhead such a server would incur or how safe it'd be to run a
user-controlled server on the same machine as the mountpoint.
The point is that the FUSE people already cover all these points,
thus: unless there's a major technical problem with it that v9fs solves
better, I'd advocate including it.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | sm...@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
Magpie, n.:
A bird whose thievish disposition suggested to someone that it
might be taught to talk.
-- Ambrose Bierce, "The Devil's Dictionary"

Miklos Szeredi

Jul 1, 2005, 1:07:59 PM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > It's not as simple. A filesystem can be mounted many times (either
> > with mount --bind, or just by mounting the same device on multiple
> > mountpoints). In this case you can't ensure, that a mountpoint will
> > remain a leaf node after being mounted on.
>
> I have bind-mounted / on /net/blabla
> I tried two experiments:
>
> mounting something under / and looking for it under /net/blabla
> mounting something under /net/blabla and looking for it under /
>
> The experiment was done with bind mounts and by mounting a USB stick
> (/dev/sdb1) and there was no auto propagation of mounts.

I'm not talking about auto propagation (that's only now being
implemented by Ram Pai, and is not in stock kernels).

What I'm saying is that mounting something over a leaf node, does not
guarantee, that it will remain a leaf node after it's been mounted on.

For example:

mkdir /tmp/leafdir
mkdir /tmp/rootcopy
mount --bind / /tmp/rootcopy
mount /dev/sdb1 /tmp/leafdir
mkdir /tmp/rootcopy/tmp/leafdir/child

Now 'leafdir' is no longer a leaf.

I'm not saying this is a problem, but also I don't see any
overwhelming reason to not allow user mounts over non-leaf
directories.

Miklos

Frank van Maarseveen

Jul 1, 2005, 2:05:41 PM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Fri, Jul 01, 2005 at 07:04:50PM +0200, Miklos Szeredi wrote:

> I'm not saying this is a problem, but also I don't see any
> overwhelming reason to not allow user mounts over non-leaf
> directories.

All things considered I'd still prefer forbidding FUSE mounts on non-leaf
dirs. For name space sanity. And it may be easier to get the whole thing
accepted:

- One could argue that the existing name space is extended rather than
changed [for a subset of processes], what Al Viro seems to reject.
- The processes which cannot be ptraced/sent a signal by the mount
owner are not "forced" to traverse the FUSE mount for the sake of
name space invariancy, with all associated security problems: they
can see everything up to the leaf node of all the usual mounts.
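A mount-time "leaf only" check along the lines proposed here could look like the following Python sketch (`is_leaf_dir` is a hypothetical helper, not part of fusermount). As the bind-mount example earlier in the thread shows, it only describes the directory at the moment of the check, and a real mount helper would also have to guard against the directory changing between the check and the mount.

```python
import os
import stat


def is_leaf_dir(path: str) -> bool:
    """True if `path` is an empty directory, i.e. a leaf of the namespace
    *at the time of the call*."""
    st = os.lstat(path)                 # don't follow a symlink planted at `path`
    if not stat.S_ISDIR(st.st_mode):
        return False
    with os.scandir(path) as entries:   # scandir never yields "." or ".."
        return next(entries, None) is None
```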

But put otherwise: is there a compelling reason to permit FUSE mounts on
non-leaf nodes?

Can FUSE mount on a file like NFS?

What is your opinion about replacing the ptrace check by a signal check
(later on, no hurry)?

--
Frank

Jeremy Maitin-Shepard

Jul 1, 2005, 3:42:57 PM
to linux-...@vger.kernel.org
Frank van Maarseveen <fra...@frankvm.com> writes:

[snip]

> But put otherwise: is there a compelling reason to permit FUSE mounts on
> non-leaf nodes?

In my own use of FUSE, I have found it handy to stick mount scripts in
some of the directories that I use as FUSE mount points.

--
Jeremy Maitin-Shepard

Eric W. Biederman

Jul 2, 2005, 6:04:38 AM
to Miklos Szeredi, eri...@gmail.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
Miklos Szeredi <mik...@szeredi.hu> writes:

>> > What would people say if ext3 was always mounted locally through NFS,
>> > because the kernel would only provide the NFS filesystem client.
>>
>> Probably the same thing they would say if ext3 was a user-space
>> application that always needed to be mounted via FUSE ;)
>
> Yes, and rightly.
>
> One of the misunderstandings about userspace filesystems (Linus falls
> into this) is to compare it with microkernels.
>
> FUSE (and userspace filesystems in general) are NOT meant to replace
> in kernel filesystems or the VFS. They are an addition with which
> different kinds of filesystems can be implemented much better than
> they could be in kernel.

Taking a quick glance at v9fs and fuse I fail to see how either
plays nicely with the page cache.

v9fs according to my reading of the protocol specification does
not have any concept of a lease. So you can't tell if you are
talking about a virtual filesystem where all calls should be passed
straight to the server or a real filesystem where you can perform
caching. The implementation simply appears to bypass the pagecache
which seems sane.

Skimming through the FUSE code I see the same problem, in that you can't
autodetect the right thing. This is currently hacked around with
"direct_io" mount option selecting between a cached and a non-cached
status on a filesystem basis at mount time. But having
a per file flag would be nicer. I also don't understand
why in fuse direct_io is an if statement in fuse_file_read/write
instead of simply being a different set of filesystem operations.

Neither implementation seems to forward user space locks to the
filesystem server.

Eric

Miklos Szeredi

Jul 2, 2005, 10:52:01 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > I'm not saying this is a problem, but also I don't see any
> > overwhelming reason to not allow user mounts over non-leaf
> > directories.
>
> All things considered I'd still prefer forbidding FUSE mounts on non-leaf
> dirs. For name space sanity. And it may be easier to get the whole thing
> accepted:
>
> - One could argue that the existing name space is extended rather than
> changed [for a subset of processes], what Al Viro seems to reject.
> - The processes which cannot be ptraced/sent a signal by the mount
> owner are not "forced" to traverse the FUSE mount for the sake of
> name space invariancy, with all associated security problems: they
> can see everything up to the leaf node of all the usual mounts.
>
> But put otherwise: is there a compelling reason to permit FUSE mounts on
> non-leaf nodes?

Not really. Maybe it does have some uses, but I'm not aware of any.

But I don't think it would matter in the acceptance of the mount
hiding patch, since that patch was not rejected on the basis of what
FUSE would use it for, rather for the general philosophy of not
allowing namespace differences based on user id.

> Can FUSE mount on a file like NFS?

Yes.

> What is your opinion about replacing the ptrace check by a signal check
> (later on, no hurry)?

Maybe. You'd still have to convince me, that signals sent to suid
programs are not a security problem.

Miklos

Miklos Szeredi

Jul 2, 2005, 11:01:45 AM
to ebie...@xmission.com, eri...@gmail.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
> Taking a quick glance at v9fs and fuse I fail to see how either
> plays nicely with the page cache.
>
> v9fs according to my reading of the protocol specification does
> not have any concept of a lease. So you can't tell if you are
> talking about a virtual filesystem where all calls should be passed
> straight to the server or a real filesystem where you can perform
> caching. The implementation simply appears to bypass the pagecache
> which seems sane.
>
> Skimming through the FUSE code I see the same problem, in that you can't
> autodetect the right thing. This is currently hacked around with
> "direct_io" mount option selecting between a cached and a non-cached
> status on a filesystem basis at mount time. But having
> a per file flag would be nicer.

There's a plan to make this work. The kernel ABI has already been
prepared for this, it would be relatively little work to implement.
But I usually wait with something like this until people actually
start asking for this feature.

> I also don't understand why in fuse direct_io is an if statement in
> fuse_file_read/write instead of simply being a different set of
> filesystem operations.

Good point. I'll fix that.
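The alternative suggested above, choosing the I/O path once per open file instead of testing a flag in every read, can be sketched in Python as an operations table picked at open time. Everything here is hypothetical (one dict stands in for the userspace daemon, another for the page cache); it is not FUSE's code, only an illustration of per-file dispatch that would also accommodate the per-file flag Eric mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileOps:
    """Per-file operations table, chosen once when the file is opened."""
    read: Callable[["OpenFile"], bytes]


@dataclass
class OpenFile:
    name: str
    server: Dict[str, bytes]                                # stands in for the userspace daemon
    cache: Dict[str, bytes] = field(default_factory=dict)  # stands in for the page cache (per file here, for brevity)
    ops: Optional[FileOps] = None
    server_reads: int = 0


def direct_read(f: OpenFile) -> bytes:
    """direct_io path: every read is forwarded to the server."""
    f.server_reads += 1
    return f.server[f.name]


def cached_read(f: OpenFile) -> bytes:
    """Cached path: ask the server once, then serve from the cache."""
    if f.name not in f.cache:
        f.cache[f.name] = direct_read(f)
    return f.cache[f.name]


DIRECT_OPS = FileOps(read=direct_read)
CACHED_OPS = FileOps(read=cached_read)


def open_file(name: str, server: Dict[str, bytes], direct_io: bool) -> OpenFile:
    # The mode is decided per file at open time (e.g. from a hypothetical
    # flag in the server's open reply), so the read path has no branch.
    f = OpenFile(name, server)
    f.ops = DIRECT_OPS if direct_io else CACHED_OPS
    return f
```

Callers always go through `f.ops.read(f)`; a synthetic status file opened with `direct_io=True` hits the server on every read, while a regular file is fetched once.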

> Neither implementation seems to forward user space locks to the
> filesystem server.

This too has been discussed. The last half year has been mostly spent
ironing out problems caught during integration. Sometime this
summer I'll start implementing these new features (inode based API,
locking, userspace NFS serving, maybe shared writable mmap support).

Miklos

Frank van Maarseveen

Jul 2, 2005, 12:02:34 PM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Sat, Jul 02, 2005 at 04:49:24PM +0200, Miklos Szeredi wrote:
> >
> > All things considered I'd still prefer forbidding FUSE mounts on non-leaf
> > dirs. For name space sanity. And it may be easier to get the whole thing
> > accepted:
> >
>
> But I don't think it would matter in the acceptance of the mount
> hiding patch, since that patch was not rejected on the basis of what
> FUSE would use it for, rather for the general philosophy of not
> allowing namespace differences based on user id.

That would really be a loss.

After some thinking, the whole "not allowing namespace differences
based on user id" philosophy is unenforceable and not even true sometimes
nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
what it can do. Think any other networked file system exported by a
machine with an unusual disk file-system underneath. IIRC ncpfs does
this on the server based on access and thus based on uid.

(hmm, I _hated_ it seeing empty directories only because I had no access
to anything below. Based on that I'd prefer EACCES instead of seeing an
empty mount stub when FUSE denies access to root or any other user.)

The thing is, root rules the _local_ part of the name space. So it should
make a _huge_ difference if FUSE can fiddle with that or only with what's
below the leaf nodes.

> > What is your opinion about replacing the ptrace check by a signal check
> > (later on, no hurry)?
>
> Maybe. You'd still have to convince me, that signals sent to suid
> programs are not a security problem.

google kill(2):

http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html

It is _defined_ behavior. So, it is up to the quality of the programmer
whether or not it results in a security problem ;-)
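For reference, the permission test behind kill(2) can be probed without delivering anything: the POSIX page above specifies that with signal 0 "error checking is performed but no signal is actually sent". A minimal Python sketch using `os.kill`; `can_signal` is a hypothetical helper name.

```python
import os


def can_signal(pid: int) -> bool:
    """Would kill(pid, sig) be permitted for the calling process?"""
    try:
        os.kill(pid, 0)           # signal 0: checks only, nothing is delivered
    except ProcessLookupError:    # ESRCH: no such process
        return False
    except PermissionError:       # EPERM: it exists, but we may not signal it
        return False
    return True
```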

--
Frank

Eric Van Hensbergen

Jul 2, 2005, 12:45:45 PM
to Eric W. Biederman, Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote:
>
> Taking a quick glance at v9fs and fuse I fail to see how either
> plays nicely with the page cache.
>

True, in fact it actively avoids using it. The previous version used
both the page cache and the dcache with undesirable effects on synthetic
file systems so we removed cache support. Our intention is to design a
cache layer (similar to cfs on Plan 9) which handles cache semantics
which can be parameterized with the appropriate cache policy depending
on the underlying file server.

> v9fs according to my reading of the protocol specification does
> not have any concept of a lease. So you can't tell if you are
> talking about a virtual filesystem where all calls should be passed
> straight to the server or a real filesystem where you can perform
> caching.

While 9P contains no explicit support for leases and caching there is
an informal mechanism which is used (at least for plan 9 file servers).
If the qid.vers is 0 the file can be assumed to be a synthetic file and
so it is not cached.
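A small Python sketch of that convention, assuming the 9P2000 qid layout (`type[1] vers[4] path[8]`, little-endian, 13 bytes on the wire). `parse_qid` and `cacheable` are hypothetical helpers, and the vers == 0 rule is the informal Plan 9 practice just described, not something the protocol mandates.

```python
import struct
from typing import NamedTuple

QID = struct.Struct("<BIQ")   # type[1] vers[4] path[8], no padding


class Qid(NamedTuple):
    type: int   # QTDIR (0x80), QTAPPEND, ... bits
    vers: int   # version, changed by the server when the file changes
    path: int   # server-unique file identifier


def parse_qid(raw: bytes) -> Qid:
    return Qid(*QID.unpack(raw))


def cacheable(qid: Qid) -> bool:
    # Informal convention: version 0 marks a synthetic file whose
    # contents should never be served from a client-side cache.
    return qid.vers != 0
```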

>
> Neither implementation seems to forward user space locks to the
> filesystem server.
>

Yup. We have exclusive open semantics but not locks in the Posix
sense. Lock support is on our 2.1 roadmap.

-eric

Eric W. Biederman

Jul 2, 2005, 1:38:04 PM
to Eric Van Hensbergen, Miklos Szeredi, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com, v9fs-de...@lists.sourceforge.net
Eric Van Hensbergen <eri...@gmail.com> writes:

> On Sat, 2 Jul 2005 6:15 am, Eric W. Biederman wrote:
>>
>> Taking a quick glance at v9fs and fuse I fail to see how either
>> plays nicely with the page cache.
>>
>
> True, in fact it actively avoids using it. The previous version used both the
> page cache and the dcache with undesirable effects on synthetic file systems so
> we removed cache support. Our intention is to design a cache layer (similar to
> cfs on Plan 9) which handles cache semantics which can be parameterized with the
> appropriate cache policy depending on the underlying file server.

Not having auto discovery for that kind of thing disturbs me. But
if you can discover what you must do and then the policy is about
what you can do, I guess I'm fine with that.

>> v9fs according to my reading of the protocol specification does
>> not have any concept of a lease. So you can't tell if you are
>> talking about a virtual filesystem where all calls should be passed
>> straight to the server or a real filesystem where you can perform
>> caching.
>
> While 9P contains no explicit support for leases and caching there is an
> informal mechanism which is used (at least for plan 9 file servers). If the
> qid.vers is 0 the file can be assumed to be a synthetic file and so it is not
> cached.

That sounds sane. With that you can at least do NFS style caching
with a lot of stat calls to verify your cache is coherent and by
implementing it as a write-through cache you can even do a halfway
decent job of being cache coherent. Which is probably about the
best you can do with the current unix API.
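The "write-through cache plus stat calls" scheme can be sketched in Python as follows. `Server` is a hypothetical in-memory stand-in whose per-file version plays the role of qid.vers (0 meaning uncacheable); every read revalidates with a cheap stat before reusing cached data, and every write goes straight to the server.

```python
class Server:
    """Hypothetical file server: contents plus a version number per file."""
    def __init__(self):
        self.data = {}
        self.vers = {}
        self.reads = 0              # counts expensive data transfers

    def stat(self, name):           # cheap metadata call
        return self.vers.get(name, 0)

    def read(self, name):           # expensive data transfer
        self.reads += 1
        return self.data[name]

    def write(self, name, value):
        self.data[name] = value
        self.vers[name] = self.vers.get(name, 0) + 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.entries = {}           # name -> (version, data)

    def read(self, name):
        vers = self.server.stat(name)           # revalidate on every access
        hit = self.entries.get(name)
        if vers != 0 and hit is not None and hit[0] == vers:
            return hit[1]
        data = self.server.read(name)
        if vers != 0:                           # version 0: never cache
            self.entries[name] = (vers, data)
        return data

    def write(self, name, value):
        self.server.write(name, value)          # write-through: server first
        self.entries.pop(name, None)            # drop our copy; next read refetches
```

A change made by another client bumps the version, so the next `read()` notices the mismatch and refetches; a file whose version stays 0 is fetched on every access.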

With a write-through cache you can likely achieve the same
semantic effect of totally not caching a file with an appropriate
number of stat calls. Not caching some files will like yield

I suggest you document the qid.vers == 0 magic for an uncacheable
file, so future interoperability is assured.

Eric

Miklos Szeredi

Jul 3, 2005, 2:18:29 AM
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> After some thinking, the whole "not allowing namespace differences
> based on user id" philosophy is unenforceable and not even true sometimes
> nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
> what it can do. Think any other networked file system exported by a
> machine with an unusual disk file-system underneath. IIRC ncpfs does
> this on the server based on access and thus based on uid.

Hmm, do you mean returning different directory contents based on uid?

> (hmm, I _hated_ it seeing empty directories only because I had no access
> to anything below. Based on that I'd prefer EACCES instead of seeing an
> empty mount stub when FUSE denies access to root or any other user.)

Well, it works that way currently, and there doesn't seem to be any
real problem with it.

> The thing is, root rules the _local_ part of the name space. So it should
> make a _huge_ difference if FUSE can fiddle with that or only with what's
> below the leaf nodes.

I don't really understand what you mean by "local".

The problem with this leaf node philosophy, is that it's not really
consistent. You can ensure that a mountpoint is a leaf node at mount
time, but you can force it to remain a leaf node after the mount. So
I don't see why this check at mount time would make _any_ difference.

> > > What is your opinion about replacing the ptrace check by a signal check
> > > (later on, no hurry)?
> >
> > Maybe. You'd still have to convince me, that signals sent to suid
> > programs are not a security problem.
>
> google kill(2):
>
> http://www.opengroup.org/onlinepubs/007908799/xsh/kill.html
>
> It is _defined_ behavior. So, it is up to the quality of the programmer
> whether or not it results in a security problem ;-)

Ahh, right.

The info leak argument still holds, but it's pretty weak.

So if the current behavior causes a problem for somebody, and relaxing
the check from ptraceability to killability fixes it, then I'll
consider doing it. Until then, let's keep the more secure check.
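The difference between the two checks can be modeled in Python on plain credential records (a sketch; `Cred` and both functions are hypothetical, and privileged senders are ignored). The strict check requires every uid and gid of the task to equal the mount owner's, in the spirit of the ptrace-style check discussed here; the looser one is the kill(2) rule that the sender's real or effective uid must match the target's real or saved set-user-ID. A setuid-root program started by the mount owner passes the second but not the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    uid: int    # real
    euid: int   # effective
    suid: int   # saved set-user-ID
    gid: int
    egid: int
    sgid: int


def ptrace_like_allowed(owner_uid: int, owner_gid: int, task: Cred) -> bool:
    """Strict: all of the task's ids must equal the mount owner's.
    A setuid/setgid program fails, since its effective/saved ids differ."""
    return (task.uid == task.euid == task.suid == owner_uid and
            task.gid == task.egid == task.sgid == owner_gid)


def kill_like_allowed(sender: Cred, target: Cred) -> bool:
    """kill(2) rule for unprivileged senders: the sender's real or effective
    uid must match the target's real or saved set-user-ID."""
    return bool({sender.uid, sender.euid} & {target.uid, target.suid})
```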

Miklos

Pavel Machek

Jul 3, 2005, 6:42:40 AM
to Miklos Szeredi, ar...@infradead.org, ak...@osdl.org, linux-...@vger.kernel.org
Hi!

> > if you are so interested in getting fuse merged... why not merge it
> > first with the security stuff removed entirely. And then start
> > discussing putting security stuff back in ?
>
> BTW, I can split out the security stuff into a separate patch from the
> rest, if people feel more comfortable discussing a concrete patch,
> instead of a range of lines (actually a 15 line function) of the
> whole.

Yes, I think that would help. [And also make it last in the series
;-)]
Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Frank van Maarseveen

Jul 3, 2005, 7:27:25 AM
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Sun, Jul 03, 2005 at 08:16:37AM +0200, Miklos Szeredi wrote:
> > After some thinking, the whole "not allowing namespace differences
> > based on user id" philosophy is unenforcable and not even true sometimes
> > nowadays. Think NFS: have a look at the unfsd server, you'll be surprised
> > what it can do. Think any other networked file system exported by a
> > machine with an unusual disk file-system underneath. IIRC ncpfs does
> > this on the server based on access and thus based on uid.
>
> Hmm, do you mean returning different directory contents based on uid?

http://clusternfs.sourceforge.net

Don't ask me how this plays with the dcache.

> > The thing is, root rules the _local_ part of the name space. So it should
> > make a _huge_ difference if FUSE can fiddle with that or only with what's
> > below the leaf nodes.
>
> I don't really understand what you mean by "local".

The opposite of "local" is "remote", i.e. networked filesystems:

mount foo:/bar /usr/src/bar

/, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not.
Namespace invariance can be guaranteed for the "/usr/src" part. Not for
anything below unless you control the peer.

>
> The problem with this leaf node philosophy, is that it's not really
> consistent. You can ensure that a mountpoint is a leaf node at mount

> time, but you cannot force it to remain a leaf node after the mount. So
^^^
inserted by me

ok, I just remembered that any process with an open directory handle
could still fchdir() underneath. I think the leaf node enforcing is
possible but it is indeed a bit more complicated.

(Hmm, it's a bit bizarre but could you mount FUSE on, for example, a
named pipe and change it into a directory?)

> I don't see why this check at mount time would make _any_ difference.

It should be possible to do audits on local filesystems, e.g. by:

find / /home /var -xdev ....

This can be done as root but sometimes you may want to do this with the
uid/gid of a specific user, for safety or for checking what the user
actually can access or damage. And that won't work as expected when the
user places a FUSE mount on top of his own login directory. But I don't
think leaf node enforcing is required from a security point of view. This
is the only thing I could come up with.
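The audit above relies on `find -xdev` not descending into other filesystems. A Python sketch of the same walk (`walk_xdev` is a hypothetical helper) compares each directory's `st_dev` with that of the starting point and prunes the ones that differ; a mount point that refuses the `lstat` (for example with EACCES) is treated as a boundary too, which is the situation Frank is worried about.

```python
import os


def walk_xdev(top):
    """Yield paths under `top` like `find top -xdev`: directories on another
    filesystem (e.g. a FUSE mount) are listed but not descended into."""
    root_dev = os.lstat(top).st_dev
    yield top
    for dirpath, dirnames, filenames in os.walk(top):
        keep = []
        for d in dirnames:
            p = os.path.join(dirpath, d)
            yield p
            try:
                same_fs = os.lstat(p).st_dev == root_dev
            except OSError:         # e.g. EACCES from a mount we may not enter
                same_fs = False
            if same_fs:
                keep.append(d)
        dirnames[:] = keep          # prune: os.walk won't enter the others
        for f in filenames:
            yield os.path.join(dirpath, f)
```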

IMHO the namespace argument against FUSE is weak for multiple reasons. The
only variancy I see is when crossing the mount point. And that disappears
once EACCES is returned when non-ptraceable processes try to cross it.
But that's not really acceptable (see previous audit case) unless FUSE
refuses to mount on non-leaf dirs.

--
Frank

Miklos Szeredi

Jul 3, 2005, 9:26:48 AM7/3/05
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > Hmm, do you mean returning different directory contents based on uid?
>
> http://clusternfs.sourceforge.net
>
> Don't ask me how this plays with the dcache.

But here the decision on what to return is in the _server_. There's
nothing magic about that. It's as if it was N different servers for N
different clients, only more efficient.

> The opposite of "local" is "remote", i.e. networked filesystems:
>
> mount foo:/bar /usr/src/bar
>
> /, /usr and /usr/src are stored on a local disk. /usr/src/bar/* is not.
> Namespace invariance can be guaranteed for the "/usr/src" part. Not for
> anything below unless you control the peer.

I think what you call namespace invariance is basically true for all
existing filesystems. There could be a filesystem which returns
different directory contents based on whatever it wants, but it can't
return a different "dentry" for the same name.

So file/directory _content_ can be made to vary, but the namespace
itself can't.

> >
> > The problem with this leaf node philosophy, is that it's not really
> > consistent. You can ensure that a mountpoint is a leaf node at mount
> > time, but you cannot force it to remain a leaf node after the mount. So
> ^^^
> inserted by me

[well corrected :)]

>
> ok, I just remembered that any process with an open directory handle
> could still fchdir() underneath. I think the leaf node enforcing is
> possible but it is indeed a bit more complicated.
>
> (Hmm, it's a bit bizarre but could you mount FUSE on, for example, a
> named pipe and change it into a directory?)

No. Fusermount checks file type and refuses the mount if there's a
mismatch (and it protects against races by mounting on '.' for
directories, and on '/proc/self/fd/X' for regular files).
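The race protection described here can be sketched in C as follows. This is not fusermount's actual source. `pin_mountpoint()` is a hypothetical helper that only works out which path to pass to mount(2). It assumes the type check and the mount should both refer to the object that was opened. For a directory, the process changes into it and mounts on ".". For a regular file, it goes through `/proc/self/fd/N`, which is Linux-specific.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open the mountpoint once, check its type on the
 * open descriptor, and produce a mount target that refers to that same
 * object. Returns the descriptor (caller closes it after mounting) or -1. */
int pin_mountpoint(const char *mnt, int want_dir, char *target, size_t len)
{
    struct stat st;
    int fd;

    assert(mnt != NULL && target != NULL);
    fd = open(mnt, O_RDONLY);
    if (fd == -1)
        return -1;

    /* Refuse on a type mismatch, e.g. a directory where a file is expected. */
    if (fstat(fd, &st) == -1 ||
        (want_dir ? !S_ISDIR(st.st_mode) : !S_ISREG(st.st_mode))) {
        close(fd);
        return -1;
    }

    if (want_dir) {
        /* Change into the checked directory; "." now names it even if the
         * original path is swapped for something else afterwards. */
        if (fchdir(fd) == -1) {
            close(fd);
            return -1;
        }
        snprintf(target, len, ".");
    } else {
        /* Regular file: name it through the already-open descriptor. */
        snprintf(target, len, "/proc/self/fd/%d", fd);
    }
    return fd;
}
```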

> > I don't see why this check at mount time would make _any_ difference.
>
> It should be possible to do audits on local filesystems, e.g. by:
>
> find / /home /var -xdev ....
>
> This can be done as root but sometimes you may want to do this with the
> uid/gid of a specific user, for safety or for checking what the user
> actually can access or damage.

But note that running with the uid/gid of the user exposes the
auditing script to manipulation (kill, ptrace) by the user. Running
with changed fsuid/fsgid is OK though.

> And that won't work as expected when the user places a FUSE mount on
> top of his own login directory. But I don't think leaf node
> enforcing is required from a security point of view. This is the
> only thing I could come up with.

OK, from the auditing POV, there's a slight hole in unprivileged
mounts. But I don't think this is grave, since it's not so hard to
hide any sensitive data from such scripts anyway (keeping data in
memory, or keeping a file descriptor to an unlinked file, etc).

> IMHO the namespace argument against FUSE is weak for multiple reasons. The
> only variance I see is when crossing the mount point. And that disappears
> once EACCES is returned when non-ptraceable processes try to cross it.

Yes, but still this is just a difference in permission, and not a
difference in namespace.

> But that's not really acceptable (see previous audit case) unless FUSE
> refuses to mount on non-leaf dirs.

I don't think the audit case is important. It's easy to work around
it manually by the sysadmin, and for the automatic case it doesn't
really matter (as detailed above).

Miklos

Frank van Maarseveen

Jul 3, 2005, 9:52:31 AM7/3/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Sun, Jul 03, 2005 at 03:24:04PM +0200, Miklos Szeredi wrote:
> > > Hmm, do you mean returning different directory contents based on uid?
> >
> > http://clusternfs.sourceforge.net
> >
> > Don't ask me how this plays with the dcache.
>
> But here the decision on what to return is in the _server_.

It still means that namespace invariance cannot be guaranteed.

> There's
> nothing magic about that. It's as if it was N different servers for N
> different clients, only more efficient.

Not entirely, there is a UID dependency.

> I think what you call namespace invariance is basically true for all
> existing filesystems. There could be a filesystem which returns
> different directory contents based on whatever it wants, but it can't
> return a different "dentry" for the same name.

This is not what I mean. The directory contents itself must be identical
for every user. And every name must of course correspond with only one
dentry. That's name-space invariance IMO.

> > IMHO the namespace argument against FUSE is weak for multiple reasons. The
> > only variance I see is when crossing the mount point. And that disappears
> > once EACCES is returned when non-ptraceable processes try to cross it.
>
> Yes, but still this is just a difference in permission, and not a
> difference in namespace.

Exactly. And such a difference in permission already exists for (sane)
networked file systems such as NFS with "root_squash" in effect on
the server.
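To make the NFS comparison concrete: with root squashing (the exports option is spelled `root_squash`), requests from uid 0 on the client are mapped to an anonymous uid on the server. Local root therefore gets EACCES on files it could otherwise read. Below is a simplified C model. It assumes 65534 as the anonymous uid and checks only the owner/other read bits, ignoring groups and ACLs. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

#define ANON_UID ((uid_t)65534)  /* assumed "nobody"; real servers use anonuid */

/* Map a client uid to the uid the server uses for permission checks. */
uid_t server_uid_for(uid_t client_uid, int root_squash)
{
    if (root_squash && client_uid == 0)
        return ANON_UID;
    return client_uid;
}

/* Read permission as the server would see it: 0 if allowed, -EACCES if not. */
int nfs_may_read(uid_t client_uid, int root_squash, uid_t owner, mode_t mode)
{
    uid_t uid = server_uid_for(client_uid, root_squash);

    if (uid == 0)
        return 0;                                  /* unsquashed root */
    if (uid == owner)
        return (mode & S_IRUSR) ? 0 : -EACCES;
    return (mode & S_IROTH) ? 0 : -EACCES;
}
```

For a mode 0600 file owned by uid 1000, the owner can still read it while squashed root cannot. The namespace is the same for both; only the permission differs.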

--
Frank

Miklos Szeredi

Jul 3, 2005, 10:06:39 AM7/3/05
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > There's
> > nothing magic about that. It's as if it was N different servers for N
> > different clients, only more efficient.
>
> Not entirely, there is a UID dependency.

Ahh, so there is.

Does it actually work? I doubt it. The VFS won't allow two different
dentries to refer to the same name. And without that, how would you
have several inodes for a single name?

> > I think what you call namespace invariance is basically true for all
> > existing filesystems. There could be a filesystem which returns
> > different directory contents based on whatever it wants, but it can't
> > return a different "dentry" for the same name.
>
> This is not what I mean. The directory contents itself must be identical
> for every user. And every name must of course correspond with only one
> dentry. That's name-space invariance IMO.

OK.

> > > IMHO the namespace argument against FUSE is weak for multiple
> > > reasons. The only variance I see is when crossing the mount
> > > point. And that disappears once EACCES is returned when
> > > non-ptraceable processes try to cross it.
> >
> > Yes, but still this is just a difference in permission, and not a
> > difference in namespace.
>
> Exactly. And such a difference in permission already exists for (sane)
> networked file systems such as NFS with "root_squash" in effect on
> the server.

Agreed.

Miklos

Frank van Maarseveen

Jul 3, 2005, 10:11:55 AM7/3/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Sun, Jul 03, 2005 at 03:24:04PM +0200, Miklos Szeredi wrote:
>
> > But that's not really acceptable (see previous audit case) unless FUSE
> > refuses to mount on non-leaf dirs.
>
> I don't think the audit case is important. It's easy to work around
> it manually by the sysadmin, and for the automatic case it doesn't
> really matter (as detailed above).

Note that the audit case "as user" is less important than the root case. I
consider the latter very important and EACCES will break it when FUSE
permits mounting on non-leaf dirs.

--
Frank

Miklos Szeredi

Jul 3, 2005, 12:07:40 PM7/3/05
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > > But that's not really acceptable (see previous audit case) unless FUSE
> > > refuses to mount on non-leaf dirs.
> >
> > I don't think the audit case is important. It's easy to work around
> > it manually by the sysadmin, and for the automatic case it doesn't
> > really matter (as detailed above).
>
> Note that the audit case "as user" is less important than the root case. I
> consider the latter very important and EACCES will break it when FUSE
> permits mounting on non-leaf dirs.

OK. Can you tell me, why you consider it important? And what's your
proposal for dealing with it?

Refusing to mount on non-leaf dir is not a solution, since it would
still allow arbitrary hiding.

Miklos

Frank van Maarseveen

Jul 3, 2005, 3:39:22 PM7/3/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Sun, Jul 03, 2005 at 05:47:58PM +0200, Miklos Szeredi wrote:
> > > > But that's not really acceptable (see previous audit case) unless FUSE
> > > > refuses to mount on non-leaf dirs.
> > >
> > > I don't think the audit case is important. It's easy to work around
> > > it manually by the sysadmin, and for the automatic case it doesn't
> > > really matter (as detailed above).
> >
> > Note that the audit case "as user" is less important than the root case. I
> > consider the latter very important and EACCES will break it when FUSE
> > permits mounting on non-leaf dirs.
>
> OK. Can you tell me, why you consider it important? And what's your
> proposal for dealing with it?

It is important because on UNIX, "root" rules on local filesystems.
I don't like the idea of root not being able to run "find -xdev" anymore
for administrative tasks, just because something got hidden by accident
or just for fun by a user. It's not about malicious users who want to
hide data: they can do that in tons of ways. The simple "find -xdev"
by root should just not be affected unless there is a very good reason
(SELinux or other "hardened" solutions).

IMHO the best thing FUSE could do is to make the mount totally invisible:
don't return EACCES, don't follow the FUSE mount but stay on the original
tree. I think it's either this or returning EACCES plus the leaf node
constraint at mount time.
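The two options listed above differ only in what happens when a path walk reaches a mount the caller may not use. Here is a toy C model. The hypothetical `struct dnode` stands in for a dentry, and the `allowed` flag stands in for a check such as fuse_allow_task(). It is not how the VFS is structured; it only contrasts the two policies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy directory node: 'mounted' is the root of a filesystem mounted here. */
struct dnode {
    const char *name;
    struct dnode *mounted;
};

enum cross_policy {
    CROSS_EACCES,     /* refuse the crossing with EACCES */
    CROSS_INVISIBLE   /* ignore the mount and stay on the covered tree */
};

/* Decide where a walk continues from 'dir'. Returns 0 and sets *next,
 * or returns -EACCES under the refusing policy. */
int cross_mount(struct dnode *dir, int allowed, enum cross_policy policy,
                struct dnode **next)
{
    assert(dir != NULL && next != NULL);
    if (dir->mounted == NULL || allowed) {
        *next = dir->mounted ? dir->mounted : dir;
        return 0;
    }
    if (policy == CROSS_EACCES)
        return -EACCES;
    *next = dir;      /* stay on the original tree */
    return 0;
}
```

Under `CROSS_INVISIBLE`, a `find -xdev` run by root would see the covered directory as if nothing were mounted there, instead of hitting an error.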

The namespace variance introduced by the first option is only minor:
Mounting anything over a tree which is still in use by a process is
much worse because it tends to be disruptive. And that has always been
possible.

[And I would use the kill() equivalence instead of ptrace() because it
is more appropriate. Doing so avoids the risk of accidentally breaking
useful setuid programs - I don't know if that will happen but I don't
see any security issues here.]

--
Frank

Pavel Machek

Jul 3, 2005, 3:43:49 PM7/3/05
to Andrew Morton, Miklos Szeredi, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
Hi!

> > > > > > I leave the decision to you ;) It's a separate independent patch
> > > > > > already (fuse-nfs-export.patch).
> > > > >
> > > > > Let's leave it out - that'll stimulate some activity in the
> > > > > userspace-nfs-server-for-FUSE area.
> > > > >
> > > > > Speaking of which, dumb question: what does FUSE offer over simply using
> > > > > NFS protocol to talk to the userspace filesystem driver?
> > > >
> > > > Oh lots:
> > > >
> > > > - no deadlocks (NFS mounted from localhost is riddled with them)
> > >
> > > It is? We had some low-memory problems a while back, but they got fixed.
> > > During that work I did some nfs-to-localhost testing and things seemed OK.
> >
> > Well, there's the "unsolvable" writeback deadlock problem, that FUSE
> > works around by not buffering dirty pages (and not allowing writable
> > mmap). Does NFS solve that? I'm interested :)
>
> I don't know - first you'd have to describe it.

Actually, the right question is "how is fuse better than coda". I've
asked that before; unlike nfs, userspace filesystems implemented with
coda actually *work*, but do not provide partial-file writes.

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Miklos Szeredi

Jul 4, 2005, 4:41:55 AM7/4/05
to pa...@ucw.cz, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
> Actually, the right question is "how is fuse better than coda". I've
> asked that before; unlike nfs, userspace filesystems implemented with
> coda actually *work*, but do not provide partial-file writes.

You answered your own question.

I did talk to Jan Harkes about the file I/O issue before starting
FUSE. [searching archives] here's a quote from him about this:

"I've been thinking about partial file accesses myself. However, I
really don't want to go all the way to block-level caching. That
would add a lot of overhead either in passing every read/write call
up to userspace, or by using a largish amount of memory to keep
track of availability of parts of the file. It also defeats the more
efficient 'streaming' fetch of a whole file.

However, something that would work reasonably well is a file offset
marker that indicates how much data is available. Basically, when the
application opens a file, the open upcall returns after the first...
let's say 64KB... have arrived. Any read's and write (and mmap's) that
access the available part of the file will be allowed. When any
operation tries to access beyond the marker an upcall is made which
blocks until the related part of the file has streamed in."
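The offset-marker scheme Jan describes can be sketched with a mutex and a condition variable. This is only an illustration under assumptions: POSIX threads, data always arriving in order from offset 0, and hypothetical names. It is not Coda code. Accesses below the marker return at once. Accesses beyond it block until the fetching side has advanced the marker far enough.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative state for one partially fetched file. */
struct partial_file {
    pthread_mutex_t lock;
    pthread_cond_t more;
    size_t available;   /* bytes fetched so far, counted from offset 0 */
    size_t size;        /* full file size, fixed at init */
};

void pf_init(struct partial_file *pf, size_t size)
{
    assert(pf != NULL);
    pthread_mutex_init(&pf->lock, NULL);
    pthread_cond_init(&pf->more, NULL);
    pf->available = 0;
    pf->size = size;
}

/* Fetching side: more data has streamed in; move the marker and wake waiters. */
void pf_advance(struct partial_file *pf, size_t nbytes)
{
    pthread_mutex_lock(&pf->lock);
    pf->available += nbytes;
    if (pf->available > pf->size)
        pf->available = pf->size;
    pthread_cond_broadcast(&pf->more);
    pthread_mutex_unlock(&pf->lock);
}

/* Before a read/write/mmap of [off, off+len): returns at once if the range
 * is below the marker, otherwise blocks until it has arrived. */
void pf_wait_range(struct partial_file *pf, size_t off, size_t len)
{
    size_t end = (off > pf->size || len > pf->size - off) ? pf->size
                                                          : off + len;

    pthread_mutex_lock(&pf->lock);
    while (pf->available < end)
        pthread_cond_wait(&pf->more, &pf->lock);
    pthread_mutex_unlock(&pf->lock);
}
```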

So true random access doesn't fit too well into the CODA philosophy.

Of course you could extend CODA to handle this as well (and all the
other things needed for safe user mounts), but the results would
probably not have pleased either side.

Miklos

Miklos Szeredi

Jul 4, 2005, 5:00:37 AM7/4/05
to fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> It is important because on UNIX, "root" rules on local filesystems.
> I don't like the idea of root not being able to run "find -xdev"
> anymore for administrative tasks, just because something got hidden
> by accident or just for fun by a user. It's not about malicious
> users who want to hide data: they can do that in tons of ways.

That's a sort of security by obscurity: if the user is dumb enough he
cannot do any harm. But I'm not interested in that sort of thing. If
this issue is important, then it should be solved properly, and not just
by "preventing accidents".

> IMHO the best thing FUSE could do is to make the mount totally
> invisible: don't return EACCES, don't follow the FUSE mount but stay
> on the original tree. I think it's either this or returning EACCES
> plus the leaf node constraint at mount time.

The leaf node constraint doesn't make sense. The hidden mount thing
does, but it has been very flatly rejected by Al Viro.

There's a nice solution to this (discussed at length earlier): private
namespaces.

I think we are still confusing these two issues, which are in fact
separate.

1) polluting global namespace is bad (find -xdev issue)

2) not ptraceable (or not killable) processes should not be able to
access an unprivileged mount

For 1) private namespaces are the proper solution. For 2) the
fuse_allow_task() in its current or modified form (to check
killability) should be OK.

1) is completely orthogonal to FUSE. 2) is currently provably secure,
and doesn't seem to cause problems in practice. Do you have a concrete
example, where it would cause problems?

Miklos

Miklos Szeredi

Jul 4, 2005, 5:09:24 AM7/4/05
to pa...@suse.cz, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
[CC restored]

> Okay, I just wanted to mention CODA. Modifying CODA is probably still
> better than modifying NFS (as akpm suggested at one point).

Definitely.

Here are some numbers on the size of these filesystems as in current -mm
('wc fs/${fs}/* include/linux/${fs}*')

nfs: 25495
9p: 6102
coda: 4752
fuse: 3733

I'm sure FUSE came out smallest because I'm biased and did something
wrong ;)

Frank van Maarseveen

Jul 4, 2005, 6:15:04 AM7/4/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Mon, Jul 04, 2005 at 10:56:30AM +0200, Miklos Szeredi wrote:
> > It is important because on UNIX, "root" rules on local filesystems.
> > I don't like the idea of root not being able to run "find -xdev"
> > anymore for administrative tasks, just because something got hidden
> > by accident or just for fun by a user. It's not about malicious
> > users who want to hide data: they can do that in tons of ways.
>
> That's a sort of security by obscurity: if the user is dumb enough he
> cannot do any harm. But I'm not interested in that sort of thing. If
> this issue is important, then it should be solved properly, and not just
> by "preventing accidents".

"solving it properly" refers to hardening the leaf node constraint
against circumvention I assume. Suppose there's a script for doing simple
on-line backups using "find". Now explain to the user why he lost his
data due to a backup script getting EACCES on a non-leaf FUSE mount. I
don't think that's acceptable. On the other hand, when the user stored
something _deliberately_ under a mountpoint, circumventing the leaf node
constraint by some trickery then it is clearly his own fault when the data
is lost. Anyway, a leaf node constraint can be hardened against misuse
later on, should it become necessary. Your bind-mount case to circumvent
this restriction is slightly flawed because it requires root interaction.

>
> There's a nice solution to this (discussed at length earlier): private
> namespaces.

I thought that's rejected because a process doesn't automatically get the
right namespace after rsh into such a machine? And fixing it by adjusting
the name-space of a process (by whatever means) is not transparent.

> I think we are still confusing these two issues, which are in fact
> separate.
>
> 1) polluting global namespace is bad (find -xdev issue)
>
> 2) not ptraceable (or not killable) processes should not be able to
> access an unprivileged mount
>
> For 1) private namespaces are the proper solution. For 2) the
> fuse_allow_task() in its current or modified form (to check
> killability) should be OK.
>
> 1) is completely orthogonal to FUSE. 2) is currently provably secure,
> and doesn't seem to cause problems in practice. Do you have a concrete
> example, where it would cause problems?

See above backup scenario.

Issues (1) and (2) are tied together I'm afraid:

When using a private name-space and thus assuming an unrelated process
needs to do something very special to get that name-space then (2)
would not be needed at all.

On the other hand, Name-space inheritance by setuid processes suddenly
becomes an issue: issue (2) is re-appearing but at another place.

--
Frank

Miklos Szeredi

Jul 4, 2005, 6:32:53 AM7/4/05
to fra...@frankvm.com, mik...@szeredi.hu, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> "solving it properly" refers to hardening the leaf node constraint
> against circumvention I assume. Suppose there's a script for doing simple
> on-line backups using "find". Now explain to the user why he lost his
> data due to a backup script getting EACCES on a non-leaf FUSE mount.

I see your point. But then this is really not a security issue, but
an "are you sure you want to format C:" style protection for the
user's own sake. Adding a mount option (checked by the library) for
this would be fine. E.g. with "mount_nonempty" it would not refuse to
mount on a non-leaf dir, and README would document, that using this
option might cause trouble. Otherwise the mount would be refused with
a reference to the above option.

Is that what you were thinking?
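Here is a minimal sketch of the check the library could do for the proposed option, assuming POSIX `opendir()`/`readdir()`. `dir_is_empty()` and `check_mountpoint()` are hypothetical names. `mount_nonempty` is the option name suggested above, not an existing interface.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Returns 1 if 'path' is a directory with no entries besides "." and "..",
 * 0 if it has entries, -1 (errno set by opendir) if it cannot be opened. */
int dir_is_empty(const char *path)
{
    DIR *d;
    struct dirent *e;
    int empty = 1;

    assert(path != NULL);
    d = opendir(path);
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0) {
            empty = 0;
            break;
        }
    }
    closedir(d);
    return empty;
}

/* Refuse a non-empty (non-leaf) mountpoint unless "mount_nonempty" was
 * given. Returns 0 if the mount may proceed, a negative errno otherwise. */
int check_mountpoint(const char *path, int mount_nonempty)
{
    int r = dir_is_empty(path);

    if (r < 0)
        return -errno;
    if (r == 0 && !mount_nonempty)
        return -ENOTEMPTY;  /* caller should mention the option in its message */
    return 0;
}
```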

> > There's a nice solution to this (discussed at length earlier): private
> > namespaces.
>
> I thought that's rejected because a process doesn't automatically get the
> right namespace after rsh into such a machine? And fixing it by adjusting
> the name-space of a process (by whatever means) is not transparent.

Private namespaces in their current form are not really useful. But
that's irrelevant to the current discussion. If somebody needs
private namespaces they will have to add the missing features (Ram Pai
is working on shared subtrees, the biggest chunk).

> > I think we are still confusing these two issues, which are in fact
> > separate.
> >
> > 1) polluting global namespace is bad (find -xdev issue)
> >
> > 2) not ptraceable (or not killable) processes should not be able to
> > access an unprivileged mount
> >
> > For 1) private namespaces are the proper solution. For 2) the
> > fuse_allow_task() in its current or modified form (to check
> > killability) should be OK.
> >
> > 1) is completely orthogonal to FUSE. 2) is currently provably secure,
> > and doesn't seem to cause problems in practice. Do you have a concrete
> > example, where it would cause problems?
>
> See above backup scenario.

The backup problem is a consequence of 1). It has absolutely zero to
do with 2). If the fuse_allow_task() security check didn't exist the
backup script would still not work.

> Issues (1) and (2) are tied together I'm afraid:
>
> When using a private name-space and thus assuming an unrelated process
> needs to do something very special to get that name-space then (2)
> would not be needed at all.

Wrong. It's still needed, because suid/sgid programs can

- run under the private namespace without doing anything special

- run with extra privileges, not possessed by the user executing the
program

> On the other hand, Name-space inheritance by setuid processes suddenly
> becomes an issue: issue (2) is re-appearing but at another place.

I don't think you could change the rules of namespace inheritance,
without causing trouble.

Miklos

Pekka Enberg

Jul 4, 2005, 6:50:36 AM7/4/05
to Miklos Szeredi, pa...@suse.cz, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org, fra...@frankvm.com
On 7/4/05, Miklos Szeredi <mik...@szeredi.hu> wrote:
> Here are some numbers on the size of these filesystems as in current -mm
> ('wc fs/${fs}/* include/linux/${fs}*')

Sloccount [1] gives more meaningful numbers than wc:

('sloccount fs/${fs}/* include/linux/${fs}*')

nfs: 21,046
9p: 3,856
coda: 3,358
fuse: 2,829

1. http://www.dwheeler.com/sloccount/

Pekka

Frank van Maarseveen

Jul 4, 2005, 7:36:35 AM7/4/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Mon, Jul 04, 2005 at 12:27:13PM +0200, Miklos Szeredi wrote:
> E.g. with "mount_nonempty" it would not refuse to
> mount on a non-leaf dir, and README would document, that using this
> option might cause trouble. Otherwise the mount would be refused with
> a reference to the above option.

that will do.

--
Frank

Bodo Eggert

Jul 4, 2005, 9:15:07 AM7/4/05
to Miklos Szeredi, fra...@frankvm.com, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
Miklos Szeredi <mik...@szeredi.hu> wrote:

> I see your point. But then this is really not a security issue, but
> an "are you sure you want to format C:" style protection for the
> user's own sake. Adding a mount option (checked by the library) for
> this would be fine. E.g. with "mount_nonempty" it would not refuse to
> mount on a non-leaf dir, and README would document, that using this
> option might cause trouble. Otherwise the mount would be refused with
> a reference to the above option.

IMO that should be a generic mount option, not FUSE specific.
Maybe the default could vary for each fs, but I'd vote against that.
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

Miklos Szeredi

Jul 4, 2005, 9:37:50 AM7/4/05
to 7eg...@gmx.de, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
> > I see your point. But then this is really not a security issue, but
> > an "are you sure you want to format C:" style protection for the
> > user's own sake. Adding a mount option (checked by the library) for
> > this would be fine. E.g. with "mount_nonempty" it would not refuse to
> > mount on a non-leaf dir, and README would document, that using this
> > option might cause trouble. Otherwise the mount would be refused with
> > a reference to the above option.
>
> IMO that should be a generic mount option, not FUSE specific.
> Maybe the default could vary for each fs, but I'd vote against that.

The option only makes sense with the default being restrictive. But
making that default for all filesystems can't be done, because that
would immediately break thousands of existing installations.

I think this makes some sense for unprivileged mounts, but otherwise
not really. If sysadmin is not careful about where the mounts go,
tough luck on him.

Miklos

Ragnar Kjørstad

Jul 4, 2005, 11:22:10 AM7/4/05
to Miklos Szeredi, 7eg...@gmx.de, ak...@osdl.org, ai...@cam.ac.uk, ar...@infradead.org, linux-...@vger.kernel.org
On Mon, Jul 04, 2005 at 03:17:35PM +0200, Miklos Szeredi wrote:
> > > I see your point. But then this is really not a security issue, but
> > > an "are you sure you want to format C:" style protection for the
> > > user's own sake. Adding a mount option (checked by the library) for
> > > this would be fine. E.g. with "mount_nonempty" it would not refuse to
> > > mount on a non-leaf dir, and README would document, that using this
> > > option might cause trouble. Otherwise the mount would be refused with
> > > a reference to the above option.
> >
> > IMO that should be a generic mount option, not FUSE specific.
> > Maybe the default could vary for each fs, but I'd vote against that.

Why a mount option at all?
Why not just a switch for the mount utility?

> The option only makes sense with the default being restrictive. But
> making that default for all filesystems can't be done, because that
> would immediately break thousands of existing installations.

I think it is acceptable to change this behaviour in a new version of
the mount utility. One could consider ignoring the restriction when
running with "-a" or when running as root - that would reduce or
eliminate the problems with the transition.

However, if this is implemented in mount itself, it is totally
orthogonal to the FUSE merge discussion.


--
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
