What's up with FUSE merging? Is there anything pending that I should
do?
Ted Ts'o's ideas about selective access to mountpoints are
interesting, but I wouldn't consider them merge critical, as they
solve a problem that hasn't yet come up in real life.
Thanks,
Miklos
Where are we up to with the fuse_allow_task() bunfight?
I think we agreed that there seem to be no alternatives.
Tytso said that the fuse_allow_task() thing is basically OK, but there
should be some method to make certain tasks exempt from this
limitation. I agree with this, but I think there should be at least
one (preferably more) users who actually need this before I start
thinking about implementing it.
Making a mount exempt is already possible with the 'allow_other'
(privileged by default) mount option.
Miklos
If you are so interested in getting fuse merged... why not merge it
first with the security stuff removed entirely, and then start
discussing putting the security stuff back in?
a) it's already been discussed to death (just search for 'fuse' on
lkml and fsdevel)
b) I don't consider it a good idea to ship a defunct version of it in
the mainline
Can you please accept my wish to have FUSE merged _with_ the
unprivileged mounts thing?
If anybody has anything to add to the discussion, please do it now,
and not later. Delaying this further won't get us any bonus IMO.
Miklos
BTW, I can split out the security stuff into a separate patch from the
rest, if people feel more comfortable discussing a concrete patch,
instead of a range of lines (actually a 15 line function) of the
whole.
Miklos
By the same argument:
Then can you please accept that FUSE will not get merged right now.
Yes.
My argument is: IF it's not going to get merged now, can we please
continue the discussion about why it's unacceptable, and what are the
alternatives.
Is that fair?
Miklos
Why should he? IMNSHO it should be merged right now with the security
stuff. FUSE works as is. Without the security stuff FUSE is useless.
I have yet to read even a single constructive argument why it should not
be merged as is.
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
Why has there not been more discussion about just making an option for
those 15 lines, just for merging's sake? Hopefully, after more
discussion, the option will go away one way or another. On the other
hand everyone says security, security, security and I don't remember
one person actually saying something negative about what it does to
security.
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
I believe that the requirement which fuse_allow_task() attempts to satisfy
is legitimate and is useful to FUSE users.
The fact that, AFAIK, nobody has found a way to implement it more nicely is
a Linux problem, not a FUSE problem.
Given that the actual amount of code involved is small, centralised and
well known about we can easily fix it up later if/when new infrastructure
or new ideas become available.
So unless someone is able to come up with a better approach in the next few
days I'm inclined to say "we suck" and merge the thing as-is.
However, a few things:
- is there anything in the current implementation of the permission stuff
which might tie our hands if it is later reimplemented? IOW: does the
current FUSE user interface in any way lock us into the current FUSE
implementation (fuse_allow_task())?
- the fuse mount options don't seem to be documented
- aren't we going to remove the nfs semi-server feature?
- Frank points out that a user can send a sigstop to his own setuid(0)
task and he intimates that this could cause DoS problems with FUSE. More
details needed please?
- I don't recall seeing an exhaustive investigation of how an
unprivileged user could use a FUSE mount to implement DoS attacks against
other users or against root.
You say
"If a sysadmin trusts the users enough, or can ensure through other
measures, that system processes will never enter non-privileged mounts,
it can relax the last limitation with a "user_allow_other" config
option. If this config option is set, the mounting user can add the
"allow_other" mount option which disables the check for other users'
processes."
What config option, where?
It's the other way around:
Apparently it is not a security problem to SIGSTOP or even SIGKILL a
setuid program. So why is it a security problem when such a program is
delayed by a supposedly malicious behaving FUSE mount?
I think that setuid programs take too many things for granted, especially
"time". I also think the ptrace equivalence principle (item C2 in the
FUSE doc) is too harsh for FUSE.
Suppose the process changes id to full root and we can no longer send
signals to it. Are there any other ways we could affect its scheduling
without FUSE? I think "yes", clearly not as easily as when it accesses a
FUSE mount but "yes". Think about typing ^S (XOFF), or by letting it read
from a pipe or from a file on a very very slow device. Or by renicing
the parent in advance. Regarding the pipe: yes the setuid program could
check that with fstat() but is such a check fundamentally the right
approach? I have doubts because unified I/O is a good thing and there is
no guarantee whatsoever about completion of any FS operation within a
certain amount of time. Suppose another malicious process does a lookup
in a huge directory without hashed names? What about a process consuming
lots of memory, pushing everything else into swap? What about deleting
a _huge_ file or do other things which might(?) take a considerable
amount of kernel time? [id]notify might even help using this to delay
a root process at a crucial point to exploit a race. So, I think there
are many ways to affect the execution speed of [setuid] programs. I
have never heard of a setuid root program which renices itself such
that it successfully avoids a race or DoS exploit.
And then the DoS thing using simulated endless files within FUSE. It is
already possible to create terabyte sized [sparse] files. Can the fstat()
size/blocks info be trusted from FUSE? No more than fstat() outside FUSE
because the file may still be growing!
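The sparse-file point above can be illustrated with a minimal sketch. The helper name `sparse_file_sizes` is hypothetical, and the sketch assumes a Unix system whose temporary directory lives on a filesystem that supports holes; it only shows that the size reported by fstat() need not reflect how much data is actually stored.

```python
import os
import tempfile


def sparse_file_sizes(size: int) -> tuple:
    """Create a file of apparent length `size` without writing any data.

    Returns (st_size, allocated_bytes) as reported by fstat().
    """
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # extends the file; a hole where supported
        st = os.fstat(f.fileno())
        # st_blocks is counted in 512-byte units.
        return st.st_size, st.st_blocks * 512
```

Calling `sparse_file_sizes(1 << 40)` asks for a terabyte of apparent size; on a filesystem with sparse-file support the allocated byte count is expected to stay far below that, though the exact figure depends on the filesystem.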
> - I don't recall seeing an exhaustive investigation of how an
> unprivileged user could use a FUSE mount to implement DoS attacks against
> other users or against root.
In general I think it is _hard_ to protect against a local DoS for many
reasons and I don't see any new fundamental problem here with FUSE:
it is just making it more obvious that it's hard to write secure setuid
programs. Those programs should _know_ that input data and anything else
from the user is "tainted" and that they must be _very_ careful with it,
in every detail.
--
Frank
There is a mount option: 'allow_other' which does just this. Or did
you mean a config option?
Thanks,
Miklos
No. This thing is above the userspace interface and completely
independent. Either a task is allowed, and then the request goes
through to the interface. Or if it's not, the request is stopped
right there, and never reaches the userspace interface.
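A minimal sketch of the gate described above, written as a user-space Python model rather than kernel code. The names `Task`, `Mount`, `allow_task` and `handle_request` are hypothetical, and the rule that all of a task's real, effective and saved IDs must match the mount owner is an assumption about what the check compares. Only the "allowed, or stopped before reaching userspace" behaviour and the 'allow_other' override come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Real, effective and saved user/group IDs of the calling process.
    uid: int
    euid: int
    suid: int
    gid: int
    egid: int
    sgid: int


@dataclass(frozen=True)
class Mount:
    owner_uid: int
    owner_gid: int
    allow_other: bool = False  # the 'allow_other' mount option


def allow_task(mount: Mount, task: Task) -> bool:
    """Return True if the task's request may be passed to the daemon."""
    if mount.allow_other:
        return True
    # Assumption: every ID must match the mount owner, so a setuid
    # program started by the owner is still refused.
    uids_match = {task.uid, task.euid, task.suid} == {mount.owner_uid}
    gids_match = {task.gid, task.egid, task.sgid} == {mount.owner_gid}
    return uids_match and gids_match


def handle_request(mount: Mount, task: Task, send_to_daemon):
    """Gate a request: refuse it here, or forward it to userspace."""
    if not allow_task(mount, task):
        raise PermissionError("task not allowed on this mount")
    return send_to_daemon()
```

In this model the daemon callback is never invoked for a refused task, which is the sense in which the check sits above the userspace interface.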
> - the fuse mount options don't seem to be documented
True. I'll send a patch (they are documented in the README of the
fuse distribution).
> - aren't we going to remove the nfs semi-server feature?
I leave the decision to you ;) It's a separate independent patch
already (fuse-nfs-export.patch).
> - Frank points out that a user can send a sigstop to his own setuid(0)
> task and he intimates that this could cause DoS problems with FUSE. More
> details needed please?
Will follow up in my reply to Frank.
> - I don't recall seeing an exhaustive investigation of how an
> unprivileged user could use a FUSE mount to implement DoS attacks against
> other users or against root.
Here's a description of a theoretical DoS scenario:
http://marc.theaimsgroup.com/?l=linux-fsdevel&m=111522019516694&w=2
Miklos
Currently that's a userspace issue. There's a /etc/fuse.conf file,
with two options:
max_mounts=X
user_allow_other
The fusermount helper reads this file, and decides if passing the
'allow_other' mount option to the kernel is OK or not.
If we want unprivileged sys_mount() these will have to be checked in
kernel (set via sysfs, etc).
Miklos
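The decision described above can be sketched roughly as follows. This is an illustration only: the real helper is a C program, the function names `parse_fuse_conf` and `may_pass_allow_other` are hypothetical, and the file syntax (one option per line, `#` starting a comment) and the rule that root may always use 'allow_other' are assumptions.

```python
def parse_fuse_conf(text: str) -> dict:
    """Parse fuse.conf-style text into a dict of the two options."""
    conf = {"max_mounts": None, "user_allow_other": False}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        if line == "user_allow_other":
            conf["user_allow_other"] = True
        elif line.startswith("max_mounts"):
            _, _, value = line.partition("=")
            conf["max_mounts"] = int(value.strip())  # ValueError if malformed
    return conf


def may_pass_allow_other(conf: dict, caller_uid: int) -> bool:
    """Decide whether 'allow_other' may be handed on to the kernel."""
    # Assumption: root may always use it; others only if the admin opted in.
    return caller_uid == 0 or conf["user_allow_other"]
```

With an empty configuration, an unprivileged caller asking for 'allow_other' is refused by the helper before the mount is attempted.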
Let's leave it out - that'll stimulate some activity in the
userspace-nfs-server-for-FUSE area.
Speaking of which, dumb question: what does FUSE offer over simply using
NFS protocol to talk to the userspace filesystem driver?
Perfectly valid argument. My question: is it not a security problem
to allow signals to reach a suid program?
> I think that setuid programs take too many things for granted, especially
> "time". I also think the ptrace equivalence principle (item C2 in the
> FUSE doc) is too harsh for FUSE.
It's obviously not equivalence. FUSE filesystem gets a subset of
ptrace's capabilities (and rather a small one).
> Suppose the process changes id to full root and we can no longer send
> signals to it. Are there any other ways we could affect its scheduling
> without FUSE? I think "yes", clearly not as easily as when it accesses a
> FUSE mount but "yes". Think about typing ^S (XOFF), or by letting it read
> from a pipe or from a file on a very very slow device. Or by renicing
> the parent in advance. Regarding the pipe: yes the setuid program could
> check that with fstat() but is such a check fundamentally the right
> approach? I have doubts because unified I/O is a good thing and there is
> no guarantee whatsoever about completion of any FS operation within a
> certain amount of time. Suppose another malicious process does a lookup
> in a huge directory without hashed names? What about a process consuming
> lots of memory, pushing everything else into swap? What about deleting
> a _huge_ file or do other things which might(?) take a considerable
> amount of kernel time? [id]notify might even help using this to delay
> a root process at a crucial point to exploit a race. So, I think there
> are many ways to affect the execution speed of [setuid] programs. I
> have never heard of a setuid root program which renices itself such
> that it successfully avoids a race or DoS exploit.
There's a huge difference between slowing down and stopping a
process. I wouldn't consider the first a true DoS.
> And then the DoS thing using simulated endless files within FUSE. It is
> already possible to create terabyte sized [sparse] files. Can the fstat()
> size/blocks info be trusted from FUSE? No more than fstat() outside FUSE
> because the file may still be growing!
>
> > - I don't recall seeing an exhaustive investigation of how an
> > unprivileged user could use a FUSE mount to implement DoS attacks against
> > other users or against root.
>
> In general I think it is _hard_ to protect against a local DoS for many
> reasons and I don't see any new fundamental problem here with FUSE:
> it is just making it more obvious that it's hard to write secure setuid
> programs. Those programs should _know_ that input data and anything else
> from the user is "tainted" and that they must be _very_ careful with it,
> in every detail.
Yes. The extra problem with FUSE is that they are not _able_ to be
careful. They can't even check if a file is in fact on a FUSE mount
or not without the FUSE daemon's intervention (lookup on a file will
be passed to userspace).
Thanks,
Miklos
Oh lots:
- no deadlocks (NFS mounted from localhost is riddled with them)
- efficient protocol, optimized for fewer context switches
- dcache invalidation policy
- probably more, but I can't remember
Miklos
It is? We had some low-memory problems a while back, but they got fixed.
During that work I did some nfs-to-localhost testing and things seemed OK.
> - efficient protocol, optimized for fewer context switches
One wouldn't really expect a userspace filesystem to be particularly fast,
and the performance will be dominated by memory copies and IO wait anyway.
> - dcache invalidation policy
What's that?
> - probably more, but I can't remember
Please do..
Well there's slow and then there's slow... numbers are always nice though.
-miles
--
[|nurgle|] ddt- demonic? so quake will have an evil kinda setting? one that
will make every christian in the world foamm at the mouth?
[iddt] nurg, that's the goal
Well, there's the "unsolvable" writeback deadlock problem, that FUSE
works around by not buffering dirty pages (and not allowing writable
mmap). Does NFS solve that? I'm interested :)
Then there's the usual "filesystem recursing into itself" deadlock.
Mounting with 'intr' probably solves this for NFS, but that has
unwanted side effects. FUSE only allows KILL to interrupt a request.
> > - efficient protocol, optimized for fewer context switches
>
> One wouldn't really expect a userspace filesystem to be particularly fast,
FUSE is pretty fast. Transfer speeds above 100 Mbytes/s on moderate
hardware are not unusual.
> and the performance will be dominated by memory copies and IO wait anyway.
Memory copies don't seem to be an issue (and FUSE does very few of
them). Performance is mostly dominated by context switch times (if the
underlying filesystem can keep up). Unfortunately unbuffered writes
mean a separate request for each written page, and thus a context
switch (on UP at least). This has a marked effect on write
performance.
> > - dcache invalidation policy
>
> What's that?
Userspace can tell the kernel how long a dentry should be valid. I
don't think the NFS protocol provides this. Same holds for the inode
attributes.
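The idea of a server-chosen validity period can be modelled in a few lines. This is a toy sketch, not the FUSE interface: the class name `EntryCache`, the shape of the `lookup` callback and the use of seconds are all assumptions made for illustration.

```python
import time


class EntryCache:
    """Cache of lookups whose lifetime is chosen by the server per entry."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # callable: name -> (value, valid_for_seconds)
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def get(self, name):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # still valid: no round trip to the server
        value, valid_for = self._lookup(name)
        self._entries[name] = (value, now + valid_for)
        return value
```

A server for a synthetic filesystem whose contents change arbitrarily could return a validity of zero, so that every access is revalidated, while a server for mostly static data could return a long period.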
> > - probably more, but I can't remember
>
> Please do..
OK, I'll do a little research.
Miklos
Fred
--
o---------------------------------------------o
| http://open-news.net : l'info alternative |
| Tech - Sciences - Politique - International |
o---------------------------------------------o
I don't know - first you'd have to describe it.
> Then there's the usual "filesystem recursing into itself" deadlock.
Describe this completely as well, please.
> Mounting with 'intr' probably solves this for NFS, but that has
> unwanted side effects. FUSE only allows KILL to interrupt a request.
Maybe these things can be solved in NFS?
> > > - dcache invalidation policy
> >
> > What's that?
>
> Userspace can tell the kernel how long a dentry should be valid. I
> don't think the NFS protocol provides this. Same holds for the inode
> attributes.
Why is that needed?
> > > - probably more, but I can't remember
> >
> > Please do..
>
> OK, I'll do a little research.
>
v9fs has a user-level server too. Maybe it has been used in FUSE-like
scenarios more than NFS.
Plus NFS and v9fs work across the network...
That's what I thought too, so I asked it first on the security mailing list.
Apparently this signal behavior is normal.
> There's a huge difference between slowing down and stopping a
> process. I wouldn't consider the first a true DoS.
Stopping is a special case. But it is effectively the same as being
indefinitely slowed down by, say, 10000+ malicious processes and from
that angle I don't see a fundamental difference w.r.t. security.
Killing the malicious processes should solve the problem. And killing
one FUSE process looks easier to me than killing 10000+ ones.
> Yes. The extra problem with FUSE is that they are not _able_ to be
> careful.
I think this is not true. Every pathname passed to a setuid program
by the user is basically "tainted". Standard I/O is tainted as well.
> They can't even check if a file is in fact on a FUSE mount
They shouldn't. The pathname is not to be trusted anyway.
I think FUSE has shown to be conservative enough w.r.t. security to be
merged. But it may be interesting to consider:
- replace ptraceability test by a kill()ability test.
- some sort of "intr" mount option for most signals on by default.
- Forbid hiding data by mounting a FUSE filesystem on top of it (does
FUSE check for this already?)
- /proc isn't a problem: most root processes tend to avoid it because
it is synthetic and thus uninteresting. Maybe we should extend
the idea of "synthetic file-systems being uninteresting" to any
process which cannot receive signals from the FUSE mount owner. When
one cannot hide data by a FUSE mount and it's synthetic anyway so not
interesting then just show the original empty mount point.
--
Frank
So the open() hangs indefinitely. But what if blackhat tries to install
a package from a no longer existing server on /net or via NFS?
A user supplied pathname is not to be trusted by any setuid (or full
root) program.
Another example: I'm not sure if there are still /dev/tty devices which
may block indefinitely upon open() but:
- I have yet to see a setuid program which always uses O_NONBLOCK
when opening user supplied pathnames.
- one cannot stat() and then open() because that gives a race.
--
Frank
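The two bullet points above (O_NONBLOCK on open, and no stat()-then-open() race) can be combined in one pattern: open first without blocking, then check the descriptor. A minimal sketch, assuming a Unix system; the helper name `open_regular_file` is hypothetical. It does not help when the pathname lookup itself is answered by a filesystem that never replies, which is the objection raised elsewhere in this thread.

```python
import os
import stat


def open_regular_file(path: str) -> int:
    """Open `path` without blocking, then verify it on the descriptor.

    Returns a file descriptor, or raises ValueError if the object
    that was opened is not a regular file.
    """
    # O_NONBLOCK keeps open() from waiting on FIFOs and many devices.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # fstat() inspects the object we actually opened, so there is
        # no window between the check and the use.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise ValueError(f"{path!r} is not a regular file")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The caller owns the returned descriptor and is responsible for closing it.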
Yes, but that would be a thousand times worse than the current solution.
You just can't know in advance what a "sane" timeout value is.
Miklos
A dirty page is being written back, but the userspace server needs to
allocate memory to complete the request. But the allocation will
block, since there's no more free memory.
> > Then there's the usual "filesystem recursing into itself" deadlock.
>
> Describe this completely as well, please.
User does unlink("/mnt/userfs/file"). Userspace server receives
request to unlink "/file". Then the daemon does
unlink("/mnt/userfs/file"). This will deadlock on i_sem.
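The recursion described above can be modelled in user space with an ordinary lock. This is a toy sketch, not kernel code: `dir_lock` stands in for the i_sem involved, the function names are hypothetical, and the timeout exists only so that the demonstration terminates. In the scenario described there is no timeout and both sides wait forever.

```python
import threading

# Stand-in for the i_sem taken during unlink("/mnt/userfs/file").
dir_lock = threading.Lock()


def daemon_handles_unlink(timeout: float) -> bool:
    """The daemon re-enters its own mount and needs the same lock."""
    got_it = dir_lock.acquire(timeout=timeout)
    if got_it:
        dir_lock.release()
    return got_it


def user_unlink(timeout: float = 0.2) -> bool:
    """Model the user's unlink(); True means the request completed."""
    result = []
    with dir_lock:  # the lock is taken, then the daemon is asked
        t = threading.Thread(
            target=lambda: result.append(daemon_handles_unlink(timeout)))
        t.start()
        t.join()  # the caller waits for the reply while holding the lock
    return result[0]
```

Here `user_unlink()` returns False because the daemon can never obtain the lock while the original caller is waiting for it.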
> > Mounting with 'intr' probably solves this for NFS, but that has
> > unwanted side effects. FUSE only allows KILL to interrupt a request.
>
> Maybe these things can be solved in NFS?
Possibly.
>
> > > > - dcache invalidation policy
> > >
> > > What's that?
> >
> > Userspace can tell the kernel how long a dentry should be valid. I
> > don't think the NFS protocol provides this. Same holds for the inode
> > attributes.
>
> Why is that needed?
Because I can well imagine a synthetic filesystem where file
data/metadata change arbitrarily. In this case the timeout heuristic
in NFS is not useful.
In fact with NFS it's often a PITA that it doesn't want to refresh a
file's data/metadata, which I _know_ has changed on the server.
> > > > - probably more, but I can't remember
> > >
> > > Please do..
> >
> > OK, I'll do a little research.
> >
>
> v9fs has a user-level server too. Maybe it has been used in FUSE-like
> scenarios more than NFS.
I think the p9 protocol is suffering from trying to be too generic.
The FUSE kernel interface is probably slightly tied to the linux VFS,
and would present problems if trying to port to other *NIX or god
forbid some other OS family altogether.
That may seem like a drawback, but I don't think it is:
- people are encouraged to use the FUSE library API instead of the
raw kernel interface
- if it will be ported to other systems, even the kernel interface
could probably be made compatible, only at the loss of
simplicity/performance.
> Plus NFS and v9fs work across the network...
Yes. I consider that a drawback. FUSE does data transfer very
efficiently (single copy), without the heavy network infrastructure
being in the way.
Miklos
Well, I think it's a fertile ground for hole hunters out there. Just
needs a little publicity ;)
Is it considered DoS for example if I prevent other users from sending
email? SIGSTOP on sendmail at the right moment (when the database is
locked) should do it fine.
> Stopping is a special case. But it is effectively the same as being
> indefinitely slowed down by, say, 10000+ malicious processes and from
> that angle I don't see a fundamental difference w.r.t. security.
On a well protected multiuser system there will be ulimits in place to
prevent that.
> Killing the malicious processes should solve the problem. And killing
> one FUSE process looks easier to me than killing 10000+ ones.
Killing always works, if the sysadmin happens to be around. If not
then there's not a lot other users can do.
> I think this is not true. Every pathname passed to a setuid program
> by the user is basically "tainted". Standard I/O is tainted as well.
You mean suid programs are never to touch paths passed to them?
If that were true, then fuse_allow_task() would not be needed, but
would do no harm either, since it would never be invoked by a suid
program.
> > They can't even check if a file is in fact on a FUSE mount
>
> They shouldn't. The pathname is not to be trusted anyway.
>
> I think FUSE has shown to be conservative enough w.r.t. security to be
> merged. But it may be interesting to consider:
>
> - replace ptraceability test by a kill()ability test.
You didn't consider the information leak aspect (point B in fuse.txt).
> - some sort of "intr" mount option for most signals on by default.
KILL will always interrupt a request. So getting rid of a malicious
mount should present no problems.
> - Forbid hiding data by mounting a FUSE filesystem on top of it (does
> FUSE check for this already?)
Yes. It checks for writability on the mountpoint (excluding limited
writability, as with /tmp for example).
> - /proc isn't a problem: most root processes tend to avoid it because
> it is synthetic and thus uninteresting. Maybe we should extend
> the idea of "synthetic file-systems being uninteresting" to any
> process which cannot receive signals from the FUSE mount owner. When
> one cannot hide data by a FUSE mount and it's synthetic anyway so not
> interesting then just show the original empty mount point.
Been there. People (like Al Viro) didn't like it.
Miklos
If /net won't detect a dead server within a timeout, I think it can be
considered broken.
> Another example: I'm not sure if there are still /dev/tty devices which
> may block indefinitely upon open() but:
>
> - I have yet to see a setuid program which always uses O_NONBLOCK
> when opening user supplied pathnames.
> - one cannot stat() and then open() because that gives a race.
Is "being already broken" an excuse for preventing future breakage,
when these are fixed?
Miklos
That shouldn't happen with write() traffic due to the dirty memory
balancing logic.
It'll happen with MAP_SHARED. Totally disallowing MAP_SHARED sounds a bit
drastic, but of course nfs/v9fs could be taught to do that.
> > > Then there's the usual "filesystem recursing into itself" deadlock.
> >
> > Describe this completely as well, please.
>
> User does unlink("/mnt/userfs/file"). Userspace server receives
> request to unlink "/file". Then the daemon does
> unlink("/mnt/userfs/file"). This will deadlock on i_sem.
eh? How can the fuse client and the fuse server both get access to the
same file in this manner? I don't see how you could set that up with NFS,
for example.
> > > Userspace can tell the kernel how long a dentry should be valid. I
> > > don't think the NFS protocol provides this. Same holds for the inode
> > > attributes.
> >
> > Why is that needed?
>
> Because I can well imagine a synthetic filesystem where file
> data/metadata change arbitrarily. In this case the timeout heuristic
> in NFS is not useful.
>
> In fact with NFS it's often a PITA that it doesn't want to refresh a
> file's data/metadata, which I _know_ has changed on the server.
I think nfs can do this, as long as the modification was done through the
server. I'd expect v9fs would be the same.
> > Plus NFS and v9fs work across the network...
>
> Yes. I consider that a drawback.
Others (many) would disagree.
Sorry, but I'm not buying it. I still don't see a solid reason why all
this could not be done with nfs/v9fs, some kernel tweaks and the rest in
userspace. It would take some effort, but that effort would end up
strengthening existing kernel capabilities rather than adding brand new
things, which is good.
All this breakage points in the same direction: A user supplied pathname
is not to be trusted by any setuid (or full root) program.
--
Frank