
[PATCH 0/3] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #5)


Jeff Layton
Nov 23, 2009, 12:50:03 PM
There are a few situations where a lookup can end up returning a dentry
without revalidating it, and without checking whether the calling
process has permissions to access it. Two situations identified so far
are:

1) LAST_BIND symlinks (such as those under /proc/<pid>)

2) file bind mounts

This patchset is intended to fix this by forcing revalidation of the
returned dentries at appropriate locations.

In the case of LAST_BIND symlinks it also adds a check to verify that
the target of the symlink is accessible by the current process by
walking mounts and dentries back up to the root and checking permission
on each inode.
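
To make the class of problem concrete, here is a minimal userspace sketch
(illustrative only, not Pavel's actual reproducer; the file names are made
up): once a file is open, /proc/self/fd/N keeps resolving even after the
directory leading to the file becomes untraversable, and the reopen is
checked only against the inode, not against the original path or open mode.

#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	char proclink[64];
	int fd;

	mkdir("dir", 0700);
	fd = open("dir/file", O_RDONLY | O_CREAT, 0644);

	chmod("dir", 0000);			/* lookups through "dir" now fail */
	if (open("dir/file", O_RDONLY) < 0)
		perror("open by path");		/* EACCES, as expected */

	snprintf(proclink, sizeof(proclink), "/proc/self/fd/%d", fd);
	if (open(proclink, O_RDWR) >= 0)	/* only the inode is checked here */
		printf("reopened %s read-write\n", proclink);

	chmod("dir", 0700);			/* clean up */
	unlink("dir/file");
	rmdir("dir");
	return 0;
}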

This set fixes the reproducers I have (including the reproducer that
Pavel provided for the permissions bypass). It's still pretty rough
though and I expect that it'll need revision. At this point, I'm mainly
looking to get these questions answered:

1) what should we do if these dentries are found to be invalid? Is it ok
to d_invalidate them? Or is that likely to break something (particularly
in the case of file bind mounts)?

2) I'm using FS_REVAL_DOT as an indicator of whether to force a
d_revalidate. I think that it's appropriate to key off of that flag, but
we may want to rename it (maybe FS_FORCE_D_REVAL ?).

3) is check_path_accessible racy? It seems to work, but something
doesn't seem quite right with this approach. Is this defeatable somehow?
Could a rename of one of the intermediate path components cause
problems?

4) do we need to hold the dcache_lock across the check_path_accessible
call?

This isn't my usual area of expertise, so I'm definitely open to
suggestions on this.

Jeff Layton (3):
vfs: force reval of target when following LAST_BIND symlinks
vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT
filesystems
vfs: check path permissions on target of LAST_BIND symlinks

fs/namei.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 102 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Jeff Layton
Nov 23, 2009, 12:50:03 PM
Because LAST_BIND symlinks aren't subject to the normal path walking
routines, they sidestep all of the permission checking that occurs while
resolving a path. Fix this by adding a routine to walk back up the
directory tree and check MAY_EXEC permission on the entire path back up to
the root.

Signed-off-by: Jeff Layton <jla...@redhat.com>
---
fs/namei.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 56 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 738b257..d4c1279 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -504,6 +504,57 @@ ok:
}

/*
+ * We have a cached struct path from a LAST_BIND symlink. The path is valid
+ * but it's possible that one of the components leading to this point
+ * might not be accessible. This check walks back up the tree to the root
+ * of the namespace and checks whether each component is accessible. The
+ * supplied path is put on error.
+ */
+static int
+check_path_accessible(struct path *path)
+{
+ int err;
+ struct dentry *parent, *tdentry = dget(path->dentry);
+ struct vfsmount *vfsmnt = mntget(path->mnt);
+ struct path root = current->fs->root;
+
+ path_get(&root);
+ for(;;) {
+ parent = dget_parent(tdentry);
+ err = inode_permission(parent->d_inode, MAY_EXEC);
+ if (err < 0) {
+ dput(parent);
+ break;
+ }
+ dput(tdentry);
+ tdentry = parent;
+
+ /* keep going if not to root of mnt */
+ if (!IS_ROOT(tdentry))
+ continue;
+
+ /* are we at global root or root of namespace? */
+ if ((tdentry == root.dentry && vfsmnt == root.mnt) ||
+ vfsmnt->mnt_parent == vfsmnt)
+ break;
+
+ /* cross to parent mount and keep walking */
+ mntput(vfsmnt);
+ vfsmnt = mntget(vfsmnt->mnt_parent);
+ tdentry = dget(vfsmnt->mnt_mountpoint);
+ dput(parent);
+ }
+ dput(tdentry);
+ mntput(vfsmnt);
+ path_put(&root);
+
+ if (err)
+ path_put(path);
+
+ return err;
+}
+
+/*
* This is called when everything else fails, and we actually have
* to go to the low-level filesystem to find out what we should do..
*
@@ -679,8 +730,12 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
error = 0;
if (s)
error = __vfs_follow_link(nd, s);
- else if (nd->last_type == LAST_BIND)
+ else if (nd->last_type == LAST_BIND) {
error = force_reval_path(&nd->path, nd);
+ if (!error)
+ error = check_path_accessible(&nd->path);
+ }
+
if (dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, nd, cookie);
}
--
1.5.5.6

Eric W. Biederman
Nov 23, 2009, 5:10:01 PM
Jeff Layton <jla...@redhat.com> writes:

> There are a few situations where a lookup can end up returning a dentry
> without revalidating it, and without checking whether the calling
> process has permissions to access it. Two situations identified so far
> are:
>
> 1) LAST_BIND symlinks (such as those under /proc/<pid>)
>
> 2) file bind mounts
>
> This patchset is intended to fix this by forcing revalidation of the
> returned dentries at appropriate locations.
>
> In the case of LAST_BIND symlinks it also adds a check to verify that
> the target of the symlink is accessible by the current process by
> walking mounts and dentries back up to the root and checking permission
> on each inode.
>
> This set fixes the reproducers I have (including the reproducer that
> Pavel provided for the permissions bypass). It's still pretty rough
> though and I expect that it'll need revision. At this point, I'm mainly
> looking to get these questions answered:
>
> 1) what should we do if these dentries are found to be invalid? Is it ok
> to d_invalidate them? Or is that likely to break something (particularly
> in the case of file bind mounts)?

The normal sequence in do_revalidate should be safe. In practice what we
should see is d_drop(). If we access the dentries via another path today
we already go through d_revalidate. It is only the reference count on
the dentry that keeps them alive and working. The cases I have looked
at for distributed filesystems have to call d_drop themselves so I don't
know if it would add anything if the vfs called d_revalidate. Especially
since FS_REVAL_DOT doesn't have that logic.
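
For reference, the do_revalidate() sequence in question looks roughly like
this in fs/namei.c of that era (paraphrased; exact details may differ):
d_revalidate() returning 0 leads to a d_invalidate() attempt, while a
negative return is propagated as an error.

static struct dentry *do_revalidate(struct dentry *dentry, struct nameidata *nd)
{
	int status = dentry->d_op->d_revalidate(dentry, nd);

	if (unlikely(status <= 0)) {
		/* 0 means "stale": try to invalidate and force a re-lookup;
		 * a negative value is an error that gets propagated as-is. */
		if (!status) {
			if (!d_invalidate(dentry)) {
				dput(dentry);
				dentry = NULL;
			}
		} else {
			dput(dentry);
			dentry = ERR_PTR(status);
		}
	}
	return dentry;
}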

> 2) I'm using FS_REVAL_DOT as an indicator of whether to force a
> d_revalidate. I think that it's appropriate to key off of that flag, but
> we may want to rename it (maybe FS_FORCE_D_REVAL ?).

Perhaps FS_ALWAYS_REVAL. I don't think it makes much of
a difference either way. I expect a rename should come after we fix
nfsv4 so there is a chance at pushing the fixes back to stable.

> 3) is check_path_accessible racy? It seems to work, but something
> doesn't seem quite right with this approach. Is this defeatable somehow?
> Could a rename of one of the intermediate path components cause
> problems?

check_path_accessible seems pretty horrible. If a process is running
inside of a subdirectory it doesn't have permissions to access, say
a chroot, /proc/self/fd/XXX becomes completely unusable.

Eric

Jeff Layton
Nov 23, 2009, 5:40:02 PM

Hmm...I have this in there:

+ /* are we at global root or root of namespace? */
+ if ((tdentry == root.dentry && vfsmnt == root.mnt) ||
+ vfsmnt->mnt_parent == vfsmnt)
+ break;

...In the case of a chroot, wouldn't "current->fs->root" point to the
root of the process' namespace? Or am I misunderstanding what
current->fs actually represents?

--
Jeff Layton <jla...@redhat.com>

Jamie Lokier
Nov 23, 2009, 6:00:02 PM
Jeff Layton wrote:
> > check_path_accessible seems pretty horrible. If a process is running
> > inside of a subdirectory it doesn't have permissions to access, say
> > a chroot, /proc/self/fd/XXX becomes completely unusable.
> >
>
> Hmm...I have this in there:
>
> + /* are we at global root or root of namespace? */
> + if ((tdentry == root.dentry && vfsmnt == root.mnt) ||
> + vfsmnt->mnt_parent == vfsmnt)
> + break;
>
> ...In the case of a chroot, wouldn't "current->fs->root" point to the
> root of the process' namespace? Or am I misunderstanding what
> current->fs actually represents?

A process can run inside a subdirectory it doesn't have permissions to
access without that being a chroot.

It can also run inside a subdirectory that isn't accessible from its
root, if that's how it was started - as well as having other
descriptors pointing to things outside its root.

It can also be passed file descriptors from outside its root while
it's running.

Really, I think the /proc/PID/fd/N check should restrict the open to
the O_* limitations that were used to open fd N before, and not have
any connection to actual paths at the time of this check.
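
Concretely, that check could look something like the sketch below; the
function name and call site are hypothetical, only the FMODE_*/MAY_* flags
are real kernel symbols.

/*
 * Hypothetical sketch of the suggestion above: when re-opening
 * /proc/PID/fd/N, refuse any access mode wider than the one used for
 * the original open. Not actual kernel code.
 */
static int fd_reopen_mode_ok(struct file *orig, int acc_mode)
{
	if ((acc_mode & MAY_READ) && !(orig->f_mode & FMODE_READ))
		return -EACCES;
	if ((acc_mode & MAY_WRITE) && !(orig->f_mode & FMODE_WRITE))
		return -EACCES;
	return 0;
}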

-- Jamie

Jeff Layton
Nov 23, 2009, 6:20:01 PM
On Mon, 23 Nov 2009 22:49:48 +0000
Jamie Lokier <ja...@shareable.org> wrote:

> Jeff Layton wrote:
> > > check_path_accessible seems pretty horrible. If a process is running
> > > inside of a subdirectory it doesn't have permissions to access, say
> > > a chroot, /proc/self/fd/XXX becomes completely unusable.
> > >
> >
> > Hmm...I have this in there:
> >
> > + /* are we at global root or root of namespace? */
> > + if ((tdentry == root.dentry && vfsmnt == root.mnt) ||
> > + vfsmnt->mnt_parent == vfsmnt)
> > + break;
> >
> > ...In the case of a chroot, wouldn't "current->fs->root" point to the
> > root of the process' namespace? Or am I misunderstanding what
> > current->fs actually represents?
>
> A process can run inside a subdirectory it doesn't have permissions to
> access without that being a chroot.
>

Certainly.

> It can also run inside a subdirectory that isn't accessible from its
> root, if that's how it was started - as well as having other
> descriptors pointing to things outside its root.
>

Yes.

> It can also be passed file descriptors from outside its root while
> it's running.
>

Yep.

> Really, I think the /proc/PID/fd/N check should restrict the open to
> the O_* limitations that were used to open fd N before, and not have
> any connection to actual paths at the time of this check.
>

The big question with all of this is: Should a task have the ability
to follow a /proc/pid symlink to a path that it wouldn't ordinarily be
able to resolve with a path lookup? The consensus that I got from the
bugtraq discussion was that it should not, and this patch is an attempt
to prevent that.

I take it from you and Eric's comments that you disagree? If so, what's
your rationale for allowing a task to resolve this symlink when it
wouldn't ordinarily be able to do so if it were a "normal" symlink?

--
Jeff Layton <jla...@redhat.com>

Eric W. Biederman
Nov 23, 2009, 6:40:02 PM
Jeff Layton <jla...@redhat.com> writes:

For myself I start with the simple fact that the code has worked the
way it currently does since around linux 0.99. We have to be very
careful if we change this to avoid breaking existing applications.

So if we change the existing behaviour it must be done in such a way
that legitimate applications do not break.

Eric

Jeff Layton
Nov 23, 2009, 7:40:01 PM
On Mon, 23 Nov 2009 15:35:44 -0800

ebie...@xmission.com (Eric W. Biederman) wrote:

I certainly don't want to break existing apps. That said, applications
that are depending on /proc/pid symlinks to allow them to bypass
directory permissions or access files that aren't in their namespace
would seem to be unsafe, no?

I think all we can reasonably do is try to clearly lay out how these
symlinks are intended to work. I think it's logical that the result of
following these links should be more or less the same as if you were to
resolve the results of the readlink.

Is there some reason that we should expect them to provide anything
more? Do you have apps in mind that you think will break with this
change? If you think this is unreasonable, perhaps you could suggest an
alternative?

If this approach is reasonable, there is one thing I'm pretty sure
will need to be fixed. It'll need to detect when the file
lies outside of its namespace altogether. I'm not quite sure how to
do that yet. I've not done much work with multiple namespaces, so I
could certainly use some guidance here.

--
Jeff Layton <jla...@redhat.com>

Jamie Lokier
Nov 23, 2009, 8:30:02 PM
Jeff Layton wrote:
> I certainly don't want to break existing apps. That said, applications
> that are depending on /proc/pid symlinks to allow them to bypass
> directory permissions or access files that aren't in their namespace
> would seem to be unsafe, no?

I think we can mostly agree on that :-)

> I think all we can reasonably do is try to clearly lay out how these
> symlinks are intended to work. I think it's logical that the result of
> following these links should be more or less the same as if you were to
> resolve the results of the readlink.
>
> Is there some reason that we should expect them to provide anything
> more? Do you have apps in mind that you think will break with this
> change?

Anything which compiled with and uses the openat(), mkdirat()
etc. emulation in gnulib (formerly known as libiberty), and anything
using the same technique.

You know, GNU coreutils and other obscure things :-)

Of course there are real system calls for that, now, but there are
still compiled programs that don't know about the real system calls.

The same technique (traversing /proc/self/fd/N) is used on Solaris, by
the way. It's probably worth keeping a modicum of compatibility with
whatever Solaris does.
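
That emulation boils down to something like the sketch below (illustrative
only, not gnulib's actual code):

#include <stdio.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/types.h>

/* Resolve a name relative to a directory fd without an openat() syscall,
 * by going through proc. Illustrative sketch, error handling omitted. */
int emulated_openat(int dirfd, const char *name, int flags, mode_t mode)
{
	char buf[PATH_MAX];

	snprintf(buf, sizeof(buf), "/proc/self/fd/%d/%s", dirfd, name);
	return open(buf, flags, mode);
}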

> If you think this is unreasonable, perhaps you could suggest an
> alternative?

I have, two mails up - did you read it? - and in the previous threads
which resulted in the bugtraq.

Please tell me why that approach does not work, thanks.

> > If this approach is reasonable, there is one thing I'm pretty sure
> > will need to be fixed.

It's not reasonable for /proc/self/fd/N because that has historically
been a way to follow a directory (like openat) or dup() an open file
without sharing the seek offset, which is useful for multithreaded
code.

Same goes for /proc/self/exe: That has historically been a way to read
your own executable, e.g. for self-extracting executables, executables
with additional data glued on. That breaks if the executable at the
link target is not yourself.

But just to prove we've been over this before and never came to a
consensus or conclusion:

http://lkml.org/lkml/2008/3/23/3

(the whole thread is worth a read, but Denys Vlasenko's remarks are
especially relevant).

And for those who remember 2.0 :-)

http://lkml.indiana.edu/hypermail/linux/kernel/9609.2/0371.html

-- Jamie

Jeff Layton
Nov 24, 2009, 6:30:02 AM
On Tue, 24 Nov 2009 01:20:27 +0000
Jamie Lokier <ja...@shareable.org> wrote:

> Jeff Layton wrote:
> > I certainly don't want to break existing apps. That said, applications
> > that are depending on /proc/pid symlinks to allow them to bypass
> > directory permissions or access files that aren't in their namespace
> > would seem to be unsafe, no?
>
> I think we can mostly agree on that :-)
>
> > I think all we can reasonably do is try to clearly lay out how these
> > symlinks are intended to work. I think it's logical that the result of
> > following these links should be more or less the same as if you were to
> > resolve the results of the readlink.
> >
> > Is there some reason that we should expect them to provide anything
> > more? Do you have apps in mind that you think will break with this
> > change?
>
> Anything which compiled with and uses the openat(), mkdirat()
> etc. emulation in gnulib (formerly known as libiberty), and anything
> using the same technique.
>
> You know, GNU coreutils and other obscure things :-)
>
> Of course there are real system calls for that, now, but there are
> still compiled programs that don't know about the real system calls.
>
> The same technique (traversing /proc/self/fd/N) is used on Solaris, by
> the way. It's probably worth keeping a modicum of compatibility with
> whatever Solaris does.
>

Oof...ok. I didn't realize that these symlinks were relied on in such a
fashion. I guess that means that there is a need to have them continue
to have special semantics beyond what a symlink would ordinarily have.

> > If you think this is unreasonable, perhaps you could suggest an
> > alternative?
>
> I have, two mails up - did you read it? - and in the previous threads
> which resulted in the bugtraq.
>
> Please tell me why that approach does not work, thanks.
>

I did read it, sorry I didn't comment on it before...

My immediate thought was that that approach only affects open calls. If
the containing directory weren't executable by the process it wouldn't
be able to (for instance) stat the file were these symlinks more
ordinary. Of course, the process could always do an fstat on the fd so
I don't suppose there's any harm in allowing it.

Since it's clear that these symlinks do need to have special semantics,
perhaps the approach you suggest would be the best thing. I'll have to
think about it a bit more.

> > If this approach is reasonable, there is one thing I'm pretty sure
> > will need to be fixed.
>
> It's not reasonable for /proc/self/fd/N because that has historically
> been a way to follow a directory (like openat) or dup() an open file
> without sharing the seek offset, which is useful for multithreaded
> code.
>
> Same goes for /proc/self/exe: That has historically been a way to read
> your own executable, e.g. for self-extracting executables, executables
> with additional data glued on. That breaks if the executable at the
> link target is not yourself.
>
> But just to prove we've been over this before and never came to a
> consensus or conclusion:
>
> http://lkml.org/lkml/2008/3/23/3
>
> (the whole thread is worth a read, but Denys Vlasenko's remarks are
> especially relevant).
>
> And for those who remember 2.0 :-)
>
> http://lkml.indiana.edu/hypermail/linux/kernel/9609.2/0371.html
>

Thanks for the links. Those help clarify where you and Eric are coming
from. I'll need to rethink this.

Thank you (and Eric) for the comments so far.

Cheers,
--
Jeff Layton <jla...@redhat.com>

Miklos Szeredi
Nov 24, 2009, 7:00:02 AM
On Tue, 24 Nov 2009, Jeff Layton wrote:
> Since it's clear that these symlinks do need to have special semantics,
> perhaps the approach you suggest would be the best thing. I'll have to
> think about it a bit more.

open() is not the only thing you need to think about. Anything that
checks read or write permission on the inode (truncate, utimes,
*xattr) would have to be changed to respect the open mode.
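
For instance (illustrative sketch), truncating through the proc symlink is
checked against the inode, not against the read-only open, so it succeeds
whenever the inode itself is writable:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char proclink[64];
	int fd = open("file", O_RDONLY | O_CREAT, 0644);	/* read-only descriptor */

	snprintf(proclink, sizeof(proclink), "/proc/self/fd/%d", fd);
	return truncate(proclink, 0);	/* permission is checked on the inode only */
}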

See, this is not just about hacking the proc follow_symlink code to
check some lookup intent. It's about changing the permission checking
mechanism for theses beasts. And since the permission checking is
inode based, this is not at all trivial to do.

I still believe leaving the current semantics and documenting them is
the best option.

Thanks,
Miklos

Pavel Machek
Nov 24, 2009, 7:30:02 AM
On Tue 2009-11-24 12:53:09, Miklos Szeredi wrote:
> On Tue, 24 Nov 2009, Jeff Layton wrote:
> > Since it's clear that these symlinks do need to have special semantics,
> > perhaps the approach you suggest would be the best thing. I'll have to
> > think about it a bit more.
>
> open() is not the only thing you need to think about. Anything that
> checks read or write permission on the inode (truncate, utimes,
> *xattr) would have to be changed to respect the open mode.
>
> See, this is not just about hacking the proc follow_symlink code to
> check some lookup intent. It's about changing the permission checking
> mechanism for these beasts. And since the permission checking is
> inode based, this is not at all trivial to do.
>
> I still believe leaving the current semantics and documenting them is
> the best option.

I believe that current semantics is ugly enough that 'documenting' it
is not enough... and people want to port from other systems, too, not
expecting nasty surprises like this...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Miklos Szeredi
Nov 24, 2009, 8:00:01 AM
On Tue, 24 Nov 2009, Pavel Machek wrote:
> I believe that current semantics is ugly enough that 'documenting' it
> is not enough... and people want to port from other systems, too, not
> expecting nasty surprises like this...

This hasn't been a problem for the last 12 years, and still we don't
see script kiddies exploiting this hole and sysadmins hurrying to
secure their system, even though it has been public for quite a while.

Why?

The reason might be, that there *is no* violation of security.

See this: the surprise isn't that an inode can be reached from
multiple paths, that has been possible with hard links for as long as
unix lived. The surprise is that the inode can be reached through
proc. So this "hole" that has been opened about 12 years ago in linux
is quite well known. Only this particular aspect of it isn't well
known, but that doesn't mean it's not right, does it?

Thanks,
Miklos

Duane Griffin
Nov 24, 2009, 8:20:02 AM
2009/11/24 Pavel Machek <pa...@ucw.cz>:

> On Tue 2009-11-24 12:53:09, Miklos Szeredi wrote:
>> I still believe leaving the current semantics and documenting them is
>> the best option.
>
> I believe that current semantics is ugly enough that 'documenting' it
> is not enough... and people want to port from other systems, too, not
> expecting nasty surprises like this...

Solaris 10 works the same way as Linux does now, so I don't think the
porting argument gets you anywhere.

> Pavel

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan

Pavel Machek
Nov 30, 2009, 7:30:02 AM
On Tue 2009-11-24 13:59:06, Miklos Szeredi wrote:
> On Tue, 24 Nov 2009, Pavel Machek wrote:
> > I believe that current semantics is ugly enough that 'documenting' it
> > is not enough... and people want to port from other systems, too, not
> > expecting nasty surprises like this...
>
> This hasn't been a problem for the last 12 years, and still we don't
> see script kiddies exploiting this hole and sysadmins hurrying to
> secure their system, even though it has been public for quite a while.
>
> Why?

Because the conditions under which it hits are quite unusual?

> The reason might be, that there *is no* violation of security.

Well, security people disagree with you.

> See this: the surprise isn't that an inode can be reached from
> multiple paths, that has been possible with hard links for as long as
> unix lived. The surprise is that the inode can be reached through
> proc. So this "hole" that has been opened about 12 years ago in linux
> is quite well known. Only this particular aspect of it isn't well
> known, but that doesn't mean it's not right, does it?

It does. Bypassing checks on read-only file descriptors is a design
misfeature, and users are clearly unaware of it. (See bugtraq.) Being "old"
does not mean it is right.

Jamie Lokier
Nov 30, 2009, 2:10:01 PM
Duane Griffin wrote:
> 2009/11/24 Pavel Machek <pa...@ucw.cz>:
> > On Tue 2009-11-24 12:53:09, Miklos Szeredi wrote:
> >> I still believe leaving the current semantics and documenting them is
> >> the best option.
> >
> > I believe that current semantics is ugly enough that 'documenting' it
> > is not enough... and people want to port from other systems, too, not
> > expecting nasty surprises like this...
>
> Solaris 10 works the same way as Linux does now, so I don't think the
> porting argument gets you anywhere.

It certainly must be similar, as gnulib uses the same technique on
both Solaris and Linux.

I don't have a Solaris to try this on. Can you use /proc to re-open
with O_RDWR a file descriptor previously opened with O_RDONLY on
Solaris 10, assuming the underlying inode allows writing?

-- Jamie

Eric W. Biederman
Nov 30, 2009, 2:30:02 PM
Pavel Machek <pa...@ucw.cz> writes:

> On Tue 2009-11-24 13:59:06, Miklos Szeredi wrote:
>> On Tue, 24 Nov 2009, Pavel Machek wrote:
>> > I believe that current semantics is ugly enough that 'documenting' it
>> > is not enough... and people want to port from other systems, too, not
>> > expecting nasty surprises like this...
>>
>> This hasn't been a problem for the last 12 years, and still we don't
>> see script kiddies exploiting this hole and sysadmins hurrying to
>> secure their system, even though it has been public for quite a while.
>>
>> Why?
>
> Because the conditions under which it hits are quite unusual?

So unusual perhaps that this is not a problem?

>> The reason might be, that there *is no* violation of security.
>
> Well, security people disagree with you.

Other security people disagree with you.

>> See this: the surprise isn't that an inode can be reached from
>> multiple paths, that has been possible with hard links for as long as
>> unix lived. The surprise is that the inode can be reached through
>> proc. So this "hole" that has been opened about 12 years ago in linux
>> is quite well known. Only this particular aspect of it isn't well
>> known, but that doesn't mean it's not right, does it?
>
> It does. Bypassing checks on read-only file descriptors is a design
> misfeature, and users are clearly unaware of it. (See bugtraq.) Being "old"
> does not mean it is right.

Being "old" does mean that changing it is a regression if any valid
application depends on this feature.

Eric

Duane Griffin
Dec 1, 2009, 4:00:01 AM
2009/11/30 Jamie Lokier <ja...@shareable.org>:

> Duane Griffin wrote:
>> 2009/11/24 Pavel Machek <pa...@ucw.cz>:
>> > On Tue 2009-11-24 12:53:09, Miklos Szeredi wrote:
>> >> I still believe leaving the current semantics and documenting them is
>> >> the best option.
>> >
>> > I believe that current semantics is ugly enough that 'documenting' it
>> > is not enough... and people want to port from other systems, too, not
>> > expecting nasty surprises like this...
>>
>> Solaris 10 works the same way as Linux does now, so I don't think the
>> porting argument gets you anywhere.
>
> It certainly must be similar, as gnulib uses the same technique on
> both Solaris and Linux.
>
> I don't have a Solaris to try this on. Can you use /proc to re-open
> with O_RDWR a file descriptor previously opened with O_RDONLY on
> Solaris 10, assuming the underlying inode allows writing?

Yep, I basically followed Pavel's recipe and got the same result. On
the other hand, it didn't work on AIX.

> -- Jamie

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan

Jeff Layton
Dec 1, 2009, 8:20:01 AM
On Mon, 23 Nov 2009 14:05:24 -0800

ebie...@xmission.com (Eric W. Biederman) wrote:

There seems to be a lot of disagreement about whether the issue that
Pavel raised is even a bug. I think what I'm going to do at this point
is respin this patchset without that patch (just add the missing
revalidations).

I'll also plan to just have force_reval_path call do_revalidate instead
so that invalid dentries get d_invalidated too. Any other thoughts on
the first two patches in this set?

--
Jeff Layton <jla...@redhat.com>

Al Viro
Dec 16, 2009, 7:40:02 AM
On Mon, Nov 23, 2009 at 06:15:45PM -0500, Jeff Layton wrote:

> The big question with all of this is: Should a task have the ability
> to follow a /proc/pid symlink to a path that it wouldn't ordinarily be
> able to resolve with a path lookup? The consensus that I got from the
> bugtraq discussion was that it should not, and this patch is an attempt
> to prevent that.
>
> I take it from you and Eric's comments that you disagree? If so, what's
> your rationale for allowing a task to resolve this symlink when it
> wouldn't ordinarily be able to do so if it were a "normal" symlink?

WTF not? It's convenient and doesn't lose any real security. If your
code relies on inaccessibility of <path> since some component of that
path is inaccessible, you are *already* fscked. Consider e.g. fchdir()
and its implications - if you have an opened descriptor for parent,
having no exec permissions on grandparent won't stop you at all. Already.
On all Unices, regardless of openat(), etc.
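
For concreteness, a sketch of the fchdir() point (illustrative only, error
handling omitted):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int dirfd;

	mkdir("outer", 0700);
	mkdir("outer/inner", 0700);
	close(creat("outer/inner/file", 0644));

	dirfd = open("outer/inner", O_RDONLY);
	chmod("outer", 0000);		/* the file is no longer reachable by full path */

	fchdir(dirfd);			/* ...but the held directory fd still gets us there */
	return open("file", O_RDONLY) >= 0 ? 0 : 1;
}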

And that's aside of being able to see the same object at some other pathname.
Which is also possible in a lot of ways. IOW, any code relying on that class
of assumptions is very likely to be widely b0rken, even if you leave aside the
long-standing behaviour of Linux.

I might buy the argument about restricting reopening with wider permissions,
but
a) we still are looking at possible userland breakage of the worst
kind - random scripts passing /dev/fd/42 as command line arguments to
random programs. Once in a while. With error checking being... not quite
sufficient.
b) it's not just open - we have at least chmod/chown/truncate to
deal with.

Prohibiting *all* access is a complete non-starter - things like
cmp foo /dev/stdin || ....
would bloody better work and nobody cares whether you have redirect
from something out of your reach at the moment.

Pavel Machek
Dec 20, 2009, 3:00:02 PM
On Wed 2009-12-16 12:31:43, Al Viro wrote:
> On Mon, Nov 23, 2009 at 06:15:45PM -0500, Jeff Layton wrote:
>
> > The big question with all of this is: Should a task have the ability
> > to follow a /proc/pid symlink to a path that it wouldn't ordinarily be
> > able to resolve with a path lookup? The consensus that I got from the
> > bugtraq discussion was that it should not, and this patch is an attempt
> > to prevent that.
> >
> > I take it from you and Eric's comments that you disagree? If so, what's
> > your rationale for allowing a task to resolve this symlink when it
> > wouldn't ordinarily be able to do so if it were a "normal" symlink?
>
> WTF not? It's convenient and doesn't lose any real security. If your
> code relies on inaccessibility of <path> since some component of that
> path is inaccessible, you are *already* fscked. Consider e.g. fchdir()
> and its implications - if you have an opened descriptor for parent,
> having no exec permissions on grandparent won't stop you at all. Already.
> On all Unices, regardless of openat(), etc.

Consider FD passing over unix socket. Passing R/O file descriptor to
the other task, then having the task write to the file is certainly bad.

> I might buy the argument about restricting reopening with wider permissions,
> but
> a) we still are looking at possible userland breakage of the worst
> kind - random scripts passing /dev/fd/42 as command line arguments to
> random programs. Once in a while. With error checking being... not quite
> sufficient.
> b) it's not just open - we have at least chmod/chown/truncate to
> deal with.

That's indeed the sane way to solve that.

> Prohibiting *all* access is a complete non-starter - things like
> cmp foo /dev/stdin || ....
> would bloody better work and nobody cares whether you have redirect
> from something out of your reach at the moment.

Ok.

Pavel Machek
Dec 20, 2009, 4:10:01 PM
On Sun 2009-12-20 21:04:04, Al Viro wrote:

> On Sun, Dec 20, 2009 at 08:59:03PM +0100, Pavel Machek wrote:
> > > WTF not? It's convenient and doesn't lose any real security. If your
> > > code relies on inaccessibility of <path> since some component of that
> > > path is inaccessible, you are *already* fscked. Consider e.g. fchdir()
> > > and its implications - if you have an opened descriptor for parent,
> > > having no exec permissions on grandparent won't stop you at all. Already.
> > > On all Unices, regardless of openat(), etc.
> >
> > Consider FD passing over unix socket. Passing R/O file descriptor to
> > the other task, then having the task write to the file is certainly bad.
>
> You've omitted the "R/O file descriptor of a file that is writable for
> that other task" part...

That is 666 for the other task. But the other task can't access it due
to directory being 700 or something. Your fchdir() argument does not
apply here.

Al Viro
Dec 20, 2009, 4:10:02 PM
On Sun, Dec 20, 2009 at 08:59:03PM +0100, Pavel Machek wrote:
> > WTF not? It's convenient and doesn't lose any real security. If your
> > code relies on inaccessibility of <path> since some component of that
> > path is inaccessible, you are *already* fscked. Consider e.g. fchdir()
> > and its implications - if you have an opened descriptor for parent,
> > having no exec permissions on grandparent won't stop you at all. Already.
> > On all Unices, regardless of openat(), etc.
>
> Consider FD passing over unix socket. Passing R/O file descriptor to
> the other task, then having the task write to the file is certainly bad.

You've omitted the "R/O file descriptor of a file that is writable for
that other task" part...

Al Viro
Dec 20, 2009, 4:30:02 PM
On Sun, Dec 20, 2009 at 10:06:19PM +0100, Pavel Machek wrote:
> > > Consider FD passing over unix socket. Passing R/O file descriptor to
> > > the other task, then having the task write to the file is certainly bad.
> >
> > You've omitted the "R/O file descriptor of a file that is writable for
> > that other task" part...
>
> That is 666 for the other task. But the other task can't access it due
> to directory being 700 or something. Your fchdir() argument does not
> apply here.

*snort*

What you are advocating is a very limited class of setups that might be
usable for protecting files if not for the existing behaviour on a shitload
of systems.

The thing is, that class *is* very limited. E.g. introduce links and it's
fallen apart. Introduce bindings and the same will happen. Just try to
extend it one level deeper and fchdir() will bite you, etc. All of that
is not dependent on procfs even being there.

Access rights belong to file, not to a pathname (and there's no such thing
as _the_ pathname of a file).

I'd buy that as a minor QoI issue; as a security one - no way.

Pavel Machek
Jan 1, 2010, 10:50:02 AM
Hi!

> > > > Consider FD passing over unix socket. Passing R/O file descriptor to
> > > > the other task, then having the task write to the file is certainly bad.
> > >
> > > You've omitted the "R/O file descriptor of a file that is writable for
> > > that other task" part...
> >
> > That is 666 for the other task. But the other task can't access it due
> > to directory being 700 or something. Your fchdir() argument does not
> > apply here.
>
> *snort*
>
> What you are advocating is a very limited class of setups that might be
> usable for protecting files if not for the existing behaviour on a shitload
> of systems.
>
> The thing is, that class *is* very limited. E.g. introduce links and it's
> fallen apart. Introduce bindings and the same will happen. Just try to
> extend it one level deeper and fchdir() will bite you, etc. All of that
> is not dependent on procfs even being there.
>
> Access rights belong to file, not to a pathname (and there's no such thing
> as _the_ pathname of a file).
>
> I'd buy that as a minor QoI issue; as a security one - no way.

Ok, so you see it as a (QoI) problem, but not too major. Good; I hope
it gets fixed one day.
Pavel

Al Viro
Jan 9, 2010, 11:50:02 PM
On Fri, Jan 01, 2010 at 04:40:27PM +0100, Pavel Machek wrote:
> > Access rights belong to file, not to a pathname (and there's no such thing
> > as _the_ pathname of a file).
> >
> > I'd buy that as a minor QoI issue; as a security one - no way.
>
> Ok, so you see it as a (QoI) problem, but not too major. Good; I hope
> it gets fixed one day.

Actually, I'm not even sure that it *is* worse than what we'd get after
such a change. Note that it's not just about trying to reopen a file
currently opened r/o for write; there's the opposite case. We'd break
scripts that try to read /dev/stderr and expect to be called with stderr
redirected to a caller-writable file. With redirects done with 2> and not
2<>. Sure, it's a lousy practice. And scripts in question are not
well-written in general. Downright unmaintainable, in fact. Written
by sysadmin that had left the job five years ago and can't be located,
even if he could be bribed into touching That Shite(tm) ever again.

We have far lousier kinds of behaviour we can't fix for compatibility
reasons. O_CREAT on dangling symlinks, for one. We tried to switch to the
sane variant (from the current "create file wherever that symlink points
to") and had to revert due to userland crap that actually relied on that
insanity.
