
RLIMIT_NPROC check in set_user()


Vasiliy Kulikov

Jun 12, 2011, 9:20:02 AM
Hi,

I'd like to start a discussion of the RLIMIT_NPROC check in set_user().
Eight years ago set_user() was changed to check whether the RLIMIT_NPROC
limit is reached and, if so, to abort set_user():

http://lkml.org/lkml/2003/7/13/226 [1]

Before the patch, setuid() and similar calls could not fail because of
a reached limit. So, daemons running as root that dropped root and
switched uid/gid to some user were able to greatly exceed that user's
limit of running processes.

The patch solved this problem, but it also unexpectedly created a new
security threat. Many poorly written programs running as root (or
owning CAP_SYS_ADMIN) don't expect setuid() failure and don't check its
return code. If it fails, they still assume the uid has changed, but
actually it has not, which leads to very sad consequences.

In the Linux 2.4 days the initial problem (the lack of an RLIMIT_NPROC
check) was solved in the -ow patches by introducing the check in execve()
rather than in setuid() and similar helpers, since a process usually
calls execve() after dropping privileges. While strictly speaking this is
not a full fix (it doesn't limit the number of processes that have
setuid()'d but not execve()'d), it works just fine for most programs.

Another possible workaround is not moving the check from setuid() to
execve(), but sending SIGSEGV to the current process if setuid() failed [2].
This should solve the problem of poor programs, and it does not appear
to break legitimate applications that handle setuid() failure, as they
usually just print an error message to the logfile/stderr and exit.
Also, as setuid() failure is a horribly rare case, the more complicated
code path might not be tested very well.

I want to repeat myself: I don't consider checking RLIMIT_NPROC in
setuid() a bug (the lack of syscall return code checking is the real
bug), but it pours oil on the flames for programs doing poorly
written privilege dropping. I believe the situation may be improved by
relatively small ABI changes that shouldn't be visible to normal
programs.

The first solution is reverting [1] and introducing a similar check in
execve(), just like in the -ow patch for 2.4. The second solution is
applying [2] and sending SIGSEGV in case of privileges dropping failure.

I'd be happy to hear opinions about improving the fixes above or
alternative fixes.

Related references:
[1] - http://lkml.org/lkml/2003/7/13/226
[2] - https://lkml.org/lkml/2006/8/19/129

Thanks,

--
Vasiliy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Vasiliy Kulikov

Jul 6, 2011, 1:40:02 PM
On Sun, Jun 12, 2011 at 17:09 +0400, Vasiliy Kulikov wrote:
> I'd be happy to hear opinions about improving the fixes above or
> alternative fixes.

No comments? Even "Sigh, what a nasty problem. But we cannot really
fix it without significantly breaking the stuff. Go and drink something." ?

Linus Torvalds

Jul 6, 2011, 2:10:02 PM
On Wed, Jul 6, 2011 at 10:36 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
> On Sun, Jun 12, 2011 at 17:09 +0400, Vasiliy Kulikov wrote:
>> I'd be happy to hear opinions about improving the fixes above or
>> alternative fixes.
>
> No comments?  Even "Sigh, what a nasty problem.  But we cannot really
> fix it without significantly breaking the stuff.  Go and drink something." ?

Thanks for reminding me.

My reaction is: "let's just remove the crazy check from set_user()
entirely". If somebody has credentials to change users, they damn well
have credentials to override the RLIMIT_NPROC too, and as you say,
failure is likely a bigger security threat than success.

The whole point of RLIMIT_NPROC is to avoid fork-bombs. If we go over
the limit for some other reason that is controlled by the super-user,
who cares?

So let's keep it in kernel/fork.c where we actually create a *new*
process (and where everybody knows exactly what the limit means, and
people who don't check for error cases are just broken). And remove it
from everywhere else.

Hmm?

Linus

Vasiliy Kulikov

Jul 6, 2011, 3:00:02 PM
On Wed, Jul 06, 2011 at 11:01 -0700, Linus Torvalds wrote:
> My reaction is: "let's just remove the crazy check from set_user()
> entirely".

Honestly, I didn't expect such a positive reaction from you in the first
reply :)


> The whole point of RLIMIT_NPROC is to avoid fork-bombs.

It is also used in cases where there is an implicit or explicit
per-process limit on some other resource, leading to a global limit of
RLIMIT_NPROC*X. The most obvious case of X is RLIMIT_AS.

A purely pragmatic approach is introducing the check in execve() to
heuristically limit the number of user processes. If the program uses
PAM to register a user session, maxlogins from pam_limits is the Right
Way. But many programs simply don't use PAM because of performance
issues. E.g. apache doesn't use PAM. On shared web hosting this is a
real issue.

In the -ow patch execve() checked for exceeded RLIMIT_NPROC, which
effectively solved Apache's problem.

...and execve() error handling is hard to miss ;-)


> So let's keep it in kernel/fork.c where we actually create a *new*
> process (and where everybody knows exactly what the limit means, and
> people who don't check for error cases are just broken). And remove it
> from everywhere else.

There are checks only in copy_process() and set_user().

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Vasiliy Kulikov

Jul 7, 2011, 4:00:02 AM
(Sorry, I've dropped Linus from CC somehow ;-)

Vasiliy Kulikov

Jul 7, 2011, 4:20:02 AM

Vasiliy Kulikov

Jul 12, 2011, 9:30:02 AM
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to catch NPROC exceeding via setuid() and similar
functions. Before the check, an unprivileged user could greatly exceed
the allowed number of processes if the program relied on the rlimit
only. But the check created a new security threat: many poorly written
programs simply don't check the setuid() return code and believe it
cannot fail if executed with root privileges. So, this patch removes
the check, because it has too often led to privilege escalations via
buggy programs.

The NPROC limit can still be enforced in the common code flow of
daemons spawning user processes: most daemons do
fork()+setuid()+execve(). The check introduced in execve() enforces the
same limit as in setuid() but doesn't create similar security issues.

A similar check was introduced in the -ow patches.

Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>
---
fs/exec.c | 13 +++++++++++++
kernel/sys.c | 6 ------
2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ea5f748..0baf5c9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1436,6 +1436,19 @@ static int do_execve_common(const char *filename,
struct files_struct *displaced;
bool clear_in_exec;
int retval;
+ const struct cred *cred = current_cred();
+
+ /*
+ * We check for RLIMIT_NPROC in execve() instead of set_user() because
+ * too many poorly written programs don't check setuid() return code.
+ * The check in execve() does the same thing for programs doing
+ * setuid()+execve(), but without similar security issues.
+ */
+ if (atomic_read(&cred->user->processes) > rlimit(RLIMIT_NPROC) &&
+ cred->user != INIT_USER) {
+ retval = -EAGAIN;
+ goto out_ret;
+ }

retval = unshare_files(&displaced);
if (retval)
diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..0e7509a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -591,12 +591,6 @@ static int set_user(struct cred *new)
if (!new_user)
return -EAGAIN;

- if (atomic_read(&new_user->processes) >= rlimit(RLIMIT_NPROC) &&
- new_user != INIT_USER) {
- free_uid(new_user);
- return -EAGAIN;
- }
-
free_uid(new->user);
new->user = new_user;
return 0;

Linus Torvalds

Jul 12, 2011, 5:20:02 PM
On Tue, Jul 12, 2011 at 6:27 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
>
> The NPROC can still be enforced in the common code flow of daemons
> spawning user processes.  Most of daemons do fork()+setuid()+execve().
> The check introduced in execve() enforces the same limit as in setuid()
> and doesn't create similar security issues.

Ok, this looks fine by me. I'd like to get some kind of comment from
the selinux etc people (James?) but I'd certainly be willing to take
this.

Failing when executing a suid application rather than when a suid
application releases its heightened credentials seems to be the
fundamentally saner approach. IOW, failing to raise privileges rather
than failing to lower them.

Linus

NeilBrown

Jul 12, 2011, 7:20:01 PM
On Tue, 12 Jul 2011 14:16:10 -0700 Linus Torvalds
<torv...@linux-foundation.org> wrote:

> On Tue, Jul 12, 2011 at 6:27 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
> >
> > The NPROC can still be enforced in the common code flow of daemons
> > spawning user processes.  Most of daemons do fork()+setuid()+execve().
> > The check introduced in execve() enforces the same limit as in setuid()
> > and doesn't create similar security issues.
>
> Ok, this looks fine by me. I'd like to get some kind of comment from
> the selinux etc people (James?) but I'd certainly be willing to take
> this.
>
> Failing when executing a suid application rather than when a suid
> application releases its heightened credentials seems to be the
> fundamentally saner approach. IOW, failing to raise privileges rather
> than failing to lower them.
>
> Linus

I am happy with the patch in general - it adequately addresses the problem
which I fixed by adding the test to set_user which is now being removed.

However I don't think your characterisation is quite correct Linus.
There is no setuid application, and there is no raising of privileges.

The contrast is really "failing when trying to use reduced privileges is
safer than failing to reduce privileges - if the reduced privileges are not
available".

Note that there is room for a race that could have unintended consequences.

Between the 'setuid(ordinary-user)' and a subsequent 'exit()' after execve()
has failed, any other process owned by the same user (and we know there are
quite a few) would fail an execve() where it really should not.

I think it would be safer to add a test for PF_SUPERPRIV and PF_FORKNOEXEC
in current->flags and only fail the execve if both are set.
i.e.
(current->flags & (PF_SUPERPRIV|PF_FORKNOEXEC)) == (PF_SUPERPRIV|PF_FORKNOEXEC)

That should narrow it down to only failing in the particular case that we are
interested in.

NeilBrown

KOSAKI Motohiro

Jul 13, 2011, 1:40:02 AM
(2011/07/12 22:27), Vasiliy Kulikov wrote:
> The patch http://lkml.org/lkml/2003/7/13/226 introduced a RLIMIT_NPROC
> check in set_user() to check for NPROC exceeding via setuid() and
> similar functions. Before the check there was a possibility to greatly
> exceed the allowed number of processes by an unprivileged user if the
> program relied on rlimit only. But the check created new security
> threat: many poorly written programs simply don't check setuid() return
> code and believe it cannot fail if executed with root privileges. So,
> the check is removed in this patch because of too often privilege
> escalations related to buggy programs.
>
> The NPROC can still be enforced in the common code flow of daemons
> spawning user processes. Most of daemons do fork()+setuid()+execve().
> The check introduced in execve() enforces the same limit as in setuid()
> and doesn't create similar security issues.
>
> Similar check was introduced in -ow patches.
>
> Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>

BSD folks tell me NetBSD has had exactly the same hack (i.e. check at
exec instead of setuid) since 2008. So I think this is a well-proven,
safer way.

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_exec.c?rev=1.316&content-type=text/x-cvsweb-markup&only_with_tag=MAIN

Thx.

Solar Designer

Jul 13, 2011, 2:40:01 AM
Linus, Neil, Motohiro - thank you for your comments!

On Wed, Jul 13, 2011 at 09:14:08AM +1000, NeilBrown wrote:
> The contrast is really "failing when trying to use reduced privileges is
> safer than failing to reduce privileges - if the reduced privileges are not
> available".

Right.

> Note that there is room for a race that could have unintended consequences.
>
> Between the 'setuid(ordinary-user)' and a subsequent 'exit()' after execve()
> has failed, any other process owned by the same user (and we know where are
> quite a few) would fail an execve() where it really should not.

It is not obvious to me that this is unintended, and that dealing with
it in some way makes much of a difference. (Also, it's not exactly "any
other process owned by the same user" - this only affects processes that
also run with similar or lower RLIMIT_NPROC. So, for example, if a web
server is set to use RLIMIT_NPROC of 30, but interactive logins use 40,
then the latter may succeed and allow for shell commands to succeed.
This is actually a common combination of settings that we've been using
on some systems for years.)

> I think it would be safer to add a test for PF_SUPERPRIV and PF_FORKNOEXEC
> in current->flags and only fail the execve if both are set.
> i.e.
> (current->flags & (PF_SUPERPRIV|PF_FORKNOEXEC)) == (PF_SUPERPRIV|PF_FORKNOEXEC)
>
> That should narrow it down to only failing in the particular case that we are
> interested in.

That's a curious idea, and apparently this is what NetBSD does, but
unfortunately it does not match a common use case that we are interested
in - specifically, Apache with suEXEC (which is part of the Apache
distribution). Here's what happens:

httpd runs as non-root. It forks, execs suexec (SUID root). suexec
calls setuid() to the target non-root user and execve() on the CGI
program (script, interpreter, whatever).

Notice how the fork() and the setuid() are separated by execve() of
suexec itself. Thus, we need to apply the RLIMIT_NPROC check on
execve() unconditionally (well, we may allow processes with
CAP_SYS_RESOURCE to proceed despite of the failed check, like it's
done in -ow patches), or at least not on the condition proposed above.

Alexander

NeilBrown

Jul 13, 2011, 3:10:02 AM
On Wed, 13 Jul 2011 10:31:42 +0400 Solar Designer <so...@openwall.com> wrote:

> Linus, Neil, Motohiro - thank you for your comments!
>
> On Wed, Jul 13, 2011 at 09:14:08AM +1000, NeilBrown wrote:
> > The contrast is really "failing when trying to use reduced privileges is
> > safer than failing to reduce privileges - if the reduced privileges are not
> > available".
>
> Right.
>
> > Note that there is room for a race that could have unintended consequences.
> >
> > Between the 'setuid(ordinary-user)' and a subsequent 'exit()' after execve()
> > has failed, any other process owned by the same user (and we know where are
> > quite a few) would fail an execve() where it really should not.
>
> It is not obvious to me that this is unintended, and that dealing with
> it in some way makes much of a difference. (Also, it's not exactly "any
> other process owned by the same user" - this only affects processes that
> also run with similar or lower RLIMIT_NPROC. So, for example, if a web
> server is set to use RLIMIT_NPROC of 30, but interactive logins use 40,
> then the latter may succeed and allow for shell commands to succeed.
> This is actually a common combination of settings that we've been using
> on some systems for years.)

I don't think it can be intended to cause 'execve' to fail when a user is at
the NPROC limit - except in the specific case where the process has previously
called setuid. So I feel justified in calling it an unintended consequence.
It may not be a very common consequence, but we all know that uncommon
things do happen.

I agree that having different limits for different cases could make this much
less of a problem, but it doesn't necessarily remove it.

>
> > I think it would be safer to add a test for PF_SUPERPRIV and PF_FORKNOEXEC
> > in current->flags and only fail the execve if both are set.
> > i.e.
> > (current->flags & (PF_SUPERPRIV|PF_FORKNOEXEC)) == (PF_SUPERPRIV|PF_FORKNOEXEC)
> >
> > That should narrow it down to only failing in the particular case that we are
> > interested in.
>
> That's a curious idea, and apparently this is what NetBSD does, but
> unfortunately it does not match a common use case that we are interested
> in - specifically, Apache with suEXEC (which is part of the Apache
> distribution). Here's what happens:
>
> httpd runs as non-root. It forks, execs suexec (SUID root). suexec
> calls setuid() to the target non-root user and execve() on the CGI
> program (script, interpreter, whatever).
>
> Notice how the fork() and the setuid() are separated by execve() of
> suexec itself. Thus, we need to apply the RLIMIT_NPROC check on
> execve() unconditionally (well, we may allow processes with
> CAP_SYS_RESOURCE to proceed despite of the failed check, like it's
> done in -ow patches), or at least not on the condition proposed above.
>
> Alexander

Yes, the PF_FORKNOEXEC test causes problems in that case.

Using just the PF_SUPERPRIV test would still be a good idea I think, but would
not be quite as thorough a check.
Adding a new PF flag would be possible (there seem to be 3 unused) but is
probably not justified.


NeilBrown

Linus Torvalds

Jul 13, 2011, 4:50:01 PM
It sounds like people are effectively Ack'ing the patch, but with this
kind of patch I don't want to add the "implicit Ack" that I often do
for regular stuff.

So could people who think that the patch is ok in its current form
just send me their acked-by or reviewed-by? I haven't heard any actual
objection to it, and I think it's valid for the current -rc.

Alternatively, feel free to send a comment like "I think it's the
right thing, but maybe it should wait for the next merge window"..

Linus

James Morris

Jul 13, 2011, 8:20:02 PM
On Wed, 13 Jul 2011, Linus Torvalds wrote:

> It sounds like people are effectively Ack'ing the patch, but with this
> kind of patch I don't want to add the "implicit Ack" that I often do
> for regular stuff.
>
> So could people who think that the patch is ok in its current form
> just send me their acked-by or reviewed-by? I haven't heard any actual
> objection to it, and I think it's valid for the current -rc.
>
> Alternatively, feel free to send a comment like "I think it's the
> right thing, but maybe it should wait for the next merge window"..

Count me in the latter.

It does look ok to me, but I'd be happier if it had more testing first (in
-mm perhaps). I think some security folk may be on summer vacation, too.


- James
--
James Morris
<jmo...@namei.org>

NeilBrown

Jul 13, 2011, 9:30:02 PM
On Thu, 14 Jul 2011 10:11:57 +1000 (EST) James Morris <jmo...@namei.org>
wrote:

> On Wed, 13 Jul 2011, Linus Torvalds wrote:
>
> > It sounds like people are effectively Ack'ing the patch, but with this
> > kind of patch I don't want to add the "implicit Ack" that I often do
> > for regular stuff.
> >
> > So could people who think that the patch is ok in its current form
> > just send me their acked-by or reviewed-by? I haven't heard any actual
> > objection to it, and I think it's valid for the current -rc.
> >
> > Alternatively, feel free to send a comment like "I think it's the
> > right thing, but maybe it should wait for the next merge window"..
>
> Count me in the latter.
>
> It does look ok to me, but I'd be happier if it had more testing first (in
> -mm perhaps). I think some security folk may be on summer vacation, too.
>
>
> - James

I'm still trying to understand the full consequences, but I agree that there
is no rush - the code has been this way for quite a while and there is no
obvious threat that would expedite things (as far as I know).

However I'm not convinced that testing will help all that much - if there are
problems they will be in rare corner cases that testing is unlikely to find.

The only case where this change will improve safety is where:
1/ a process has RLIMIT_NPROC set
2/ the process is running with root privileges
3/ the process calls setuid() and doesn't handle errors.


I think the only times that a root process would have RLIMIT_NPROC set are:
1/ if it explicitly set up rlimits before calling setuid. In this case
we should be able to expect that the process checks setuid .. maybe
this is naive(?)
2/ if the process was setuid-root and inherited rlimits from before, and
never re-set them. In this case it is easy to imagine that a setuid()
would not be checked.


So maybe an alternate 'fix' would be to reset all rlimits to match init_task
when a setuid-root happens. There are other corner cases where inappropriate
rlimits can cause setuid programs to behave in ways they don't expect.
Obviously such programs are buggy, but so are programs that don't check
'setuid'. (There is a CVE about mount potentially corrupting mtab.)

... maybe that would introduce other problems though.

In short: while I don't feel bold enough to 'nack' this patch, I don't really
feel comfortable enough to 'ack' it either.

NeilBrown

KOSAKI Motohiro

Jul 13, 2011, 9:40:02 PM
(2011/07/14 9:11), James Morris wrote:
> On Wed, 13 Jul 2011, Linus Torvalds wrote:
>
>> It sounds like people are effectively Ack'ing the patch, but with this
>> kind of patch I don't want to add the "implicit Ack" that I often do
>> for regular stuff.
>>
>> So could people who think that the patch is ok in its current form
>> just send me their acked-by or reviewed-by? I haven't heard any actual
>> objection to it, and I think it's valid for the current -rc.
>>
>> Alternatively, feel free to send a comment like "I think it's the
>> right thing, but maybe it should wait for the next merge window"..
>
> Count me in the latter.
>
> It does look ok to me, but I'd be happier if it had more testing first (in
> -mm perhaps). I think some security folk may be on summer vacation, too.

I don't think I am the best person to give an ack, but I also don't want to
hesitate to help Solar's good improvement.
Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

And I'll second James: the next merge window is probably safer.

Thanks.

Solar Designer

Jul 14, 2011, 11:10:01 AM
On Thu, Jul 14, 2011 at 11:27:51AM +1000, NeilBrown wrote:
> I'm still trying to understand the full consequences, but I agree that there
> is no rush - the code has been this way for quite a while and there is no
> obvious threat that would expedite things (as far as I know).

I don't insist on getting this in sooner than in the next merge window,
although I would have liked that. Relevant userspace vulnerabilities
are being found quite often - I'll include some examples below.

> However I'm not convinced that testing will help all that much - if there are
> problems they will be is rare corner cases that testing is unlikely to find.

This makes sense.

> The only case where this change will improve safety is where:
> 1/ a process has RLIMIT_NPROC set
> 2/ the process is running with root privileges
> 3/ the process calls setuid() and doesn't handle errors.

Yes, and this is a pretty common case.

> I think the only times that a root process would have RLIMIT_NPROC set are:
> 1/ if it explicitly set up rlimits before calling setuid. In this case
> we should be able to expect that the process checks setuid .. maybe
> this is naive(?)

RLIMIT_NPROC could be set by the parent process or by pam_limits. The
machine I am typing this on has:

* hard nproc 200

(as well as other limits) in /etc/security/limits.conf, so if this
machine's kernel let setuid() fail on RLIMIT_NPROC, I would be taking
extra risk of a security compromise by reducing the risk of system
crashes from inadvertent excessive resource consumption by runaway
processes - a tradeoff I'd rather avoid.

> 2/ if the process was setuid-root and inherited rlimits from before, and
> never re-set them. In this case it is easy to imagine that a setuid()
> would not be checked.

Right. (In practice, all kinds of programs tend to forget to check
setuid() return value, though.)

Actually, for the problem to apply to setuid-root programs, they need to
switch their real uid first (more fully become root), then try to switch
to a user - but this is common.

Here are some examples for 2011-2010:

"... missing setuid() retval check in opielogin which leads to easy root
compromise":

http://www.openwall.com/lists/oss-security/2011/06/22/6

"The /usr/lib/libgnomesu/gnomesu-pam-backend suid binary which belongs
to the libgnomesu package is not checking setuid() return values.

As a result, two cooperating users, or users with access to guest,
cgi or web accounts can run arbitrary commands as root very easily."

http://www.openwall.com/lists/oss-security/2011/05/30/2

pam_xauth (exploitable if pam_limits is also used):

http://www.openwall.com/lists/oss-security/2010/08/16/2

A collection of examples from 2006:

http://lists.openwall.net/linux-kernel/2006/08/20/156

> So maybe an alternate 'fix' would be to reset all rlimits to match init_task
> when a setuid-root happens. There are other corner cases where inappropriate
> rlimits can cause setuid programs to behave in ways they don't expect.
> Obviously such programs are buggy, but so are programs that don't check
> 'setuid'. (There is a CVE about mount potentially corrupting mtab.)

Right, but to me possibly resetting rlimits is not an "alternative" to
moving the RLIMIT_NPROC check. setuid-root exec is not the only case
where having setuid() fail on RLIMIT_NPROC is undesirable. We also
don't want such failures with pam_limits, nor on daemon startup:

http://www.openwall.com/lists/oss-security/2009/07/14/2

As to resetting rlimits on SUID/SGID exec, I think this would make
sense for RLIMIT_FSIZE, which would mitigate the mount mtab issue
(thank you for bringing it up!) But it's to be discussed separately.

Alexander

NeilBrown

Jul 14, 2011, 11:40:02 PM

This is useful background that I think would be good to have in the changelog.

I'm still bothered that the proposed patch can cause exec to fail for a
separate 'innocent' process.
It also seems to 'hide' the problem - just shuffling code around.
The comment in do_execve_common helps. A similar comment in set_user
wouldn't hurt.

But what do you think of this? It ensures that only the process which ignored
the return value from setuid is inconvenienced.
??

NeilBrown

commit e47554d38340d4c4c3eccf207de682fe6b29224e
Author: NeilBrown <ne...@suse.de>
Date: Fri Jul 15 13:20:09 2011 +1000

Protect execve from being used after 'setuid' has failed.

Many programs do not check setuid for failure, so when it fails they
might continue to use elevated privileges without intending to.
Of particular concern is a call to 'exec', as this is common after
dropping privileges and allows the elevated privileges to be misused.



The patch http://lkml.org/lkml/2003/7/13/226 introduced a RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions. Before the check there was a possibility to greatly
exceed the allowed number of processes by an unprivileged user if the
program relied on rlimit only. But the check created new security
threat: many poorly written programs simply don't check setuid() return
code and believe it cannot fail if executed with root privileges. So,

when the check fails we set a process flag which causes execve to fail.

Following examples of vulnerabilities due to processes not checking
error from setuid provided by Solar Designer <so...@openwall.com>:



Here are some examples for 2011-2010:

"... missing setuid() retval check in opielogin which leads to easy root
compromise":

http://www.openwall.com/lists/oss-security/2011/06/22/6

"The /usr/lib/libgnomesu/gnomesu-pam-backend suid binary which belongs
to the libgnomesu package is not checking setuid() return values.

As a result, two cooperating users, or users with access to guest,
cgi or web accounts can run arbitrary commands as root very easily."

http://www.openwall.com/lists/oss-security/2011/05/30/2

pam_xauth (exploitable if pam_limits is also used):

http://www.openwall.com/lists/oss-security/2010/08/16/2

A collection of examples from 2006:

http://lists.openwall.net/linux-kernel/2006/08/20/156

Signed-off-by: NeilBrown <ne...@suse.de>

diff --git a/fs/exec.c b/fs/exec.c
index 6075a1e..c282ebb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1434,6 +1434,16 @@ static int do_execve_common(const char *filename,
bool clear_in_exec;
int retval;

+ if (current->flags & PF_SETUSER_FAILED) {
+ /* setuid failed and program is trying to
+ * 'exec' anyway (rather than e.g. clean up and
+ * exit), so fail rather than allow a possible
+ * security breach
+ */
+ retval = -EAGAIN;
+ goto out_ret;
+ }
+
retval = unshare_files(&displaced);
if (retval)

goto out_ret;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..6b13923 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1759,6 +1759,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_DUMPCORE 0x00000200 /* dumped core */
#define PF_SIGNALED 0x00000400 /* killed by a signal */
#define PF_MEMALLOC 0x00000800 /* Allocating memory */
+#define PF_SETUSER_FAILED 0x00001000 /* set_user failed so privs should not be used */
#define PF_USED_MATH 0x00002000 /* if unset the fpu must be initialized before use */
#define PF_FREEZING 0x00004000 /* freeze in progress. do not account to load */
#define PF_NOFREEZE 0x00008000 /* this thread should not be frozen */
diff --git a/kernel/cred.c b/kernel/cred.c
index 174fa84..05c46fc 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -508,10 +508,8 @@ int commit_creds(struct cred *new)
key_fsgid_changed(task);

/* do it
- * - What if a process setreuid()'s and this brings the
- * new uid over his NPROC rlimit? We can check this now
- * cheaply with the new uid cache, so if it matters
- * we should be checking for it. -DaveM
+ * RLIMIT_NPROC limits on user->processes have already been imposed
+ * in set_user().
*/
alter_cred_subscribers(new, 2);
if (new->user != old->user)
diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..9b8f03f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -588,6 +588,8 @@ static int set_user(struct cred *new)
struct user_struct *new_user;

new_user = alloc_uid(current_user_ns(), new->uid);
+ current->flags |= PF_SETUSER_FAILED;
+
if (!new_user)
return -EAGAIN;

@@ -596,6 +598,7 @@ static int set_user(struct cred *new)
free_uid(new_user);
return -EAGAIN;
}
+ current->flags &= ~PF_SETUSER_FAILED;
free_uid(new->user);
new->user = new_user;

Willy Tarreau

Jul 15, 2011, 1:40:01 AM
Hi Neil,

On Fri, Jul 15, 2011 at 01:30:13PM +1000, NeilBrown wrote:
(...)


> But what do you think of this? It ensures that only the process which ignored
> the return value from setuid is inconvenienced.

(...)

I think this is a smart idea. But will the flag be inherited by children
across a fork()? If not, we might as well block fork(), because we can
expect a lot of fork+exec situations which are as dangerous as the simple
execve().

Regards,
Willy

Vasiliy Kulikov

Jul 15, 2011, 2:40:01 AM
Neil,

On Fri, Jul 15, 2011 at 13:30 +1000, NeilBrown wrote:
> I'm still bothered that the proposed patch can cause exec to fail for a
> separate 'innocent' process.
> It also seems to 'hide' the problem - just shuffling code around.
> The comment in do_execve_common helps. A similar comment in set_user
> wouldn't hurt.
>
> But what do you think of this? It ensures that only the process which ignored
> the return value from setuid is inconvenienced.

I don't like it. You're mixing the main problem and an RLIMIT check
enforcement. The main goal is to prevent setuid() from failing unless
privileges are insufficient; RLIMIT in execve() is just an *attempt* to still
count NPROC in *some* widespread cases. But you're trying to fix setuid(),
where RLIMIT accounting is simple :\

Your patch doesn't address the core issue in this situation:

setuid(); /* it fails because of RLIMIT */
do_some_fs();
execve();

do_some_fs() should be called ONLY if root is dropped. In your scheme
the process may interact with FS as root while thinking it is nonroot,
which almost always leads to privilege escalation.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

NeilBrown
Jul 15, 2011, 3:10:02 AM
On Fri, 15 Jul 2011 10:31:13 +0400 Vasiliy Kulikov <seg...@openwall.com>
wrote:

> Neil,
>
> On Fri, Jul 15, 2011 at 13:30 +1000, NeilBrown wrote:
> > I'm still bothered that the proposed patch can cause exec to fail for a
> > separate 'innocent' process.
> > It also seems to 'hide' the problem - just shuffling code around.
> > The comment in do_execve_common helps. A similar comment in set_user
> > wouldn't hurt.
> >
> > But what do you think of this. It ensures that only the process which ignored
> > the return value from setuid is inconvenienced.
>
> I don't like it. You're mixing the main problem and an RLIMIT check
> enforcement. The main goal is to prevent setuid() from failing unless
> privileges are insufficient; RLIMIT in execve() is just an *attempt* to still
> count NPROC in *some* widespread cases. But you're trying to fix setuid(),
> where RLIMIT accounting is simple :\
>
> Your patch doesn't address the core issue in this situation:
>
> setuid(); /* it fails because of RLIMIT */
> do_some_fs();
> execve();
>
> do_some_fs() should be called ONLY if root is dropped. In your scheme
> the process may interact with FS as root while thinking it is nonroot,
> which almost always leads to privilege escalation.
>
> Thanks,
>

How about this then?

From 4a1411d10c90510a4eedf94bfd6b2578425271ba Mon Sep 17 00:00:00 2001
From: NeilBrown <ne...@suse.de>
Date: Fri, 15 Jul 2011 13:20:09 +1000
Subject: [PATCH] Protect execve from being used after 'setuid' exceeds RLIMIT_NPROC

Many programs do not check setuid() for failure, so it is not safe
for it to fail. So impose the RLIMIT_NPROC limit by noting the
excess in set_user with a process flag, and failing exec() if the
flag is set.

http://www.openwall.com/lists/oss-security/2011/06/22/6

http://www.openwall.com/lists/oss-security/2011/05/30/2

http://www.openwall.com/lists/oss-security/2010/08/16/2

http://lists.openwall.net/linux-kernel/2006/08/20/156

Signed-off-by: NeilBrown <ne...@suse.de>

diff --git a/fs/exec.c b/fs/exec.c
index 6075a1e..dfe9c61 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1434,6 +1434,17 @@ static int do_execve_common(const char *filename,
bool clear_in_exec;
int retval;

+ if (current->flags & PF_NPROC_EXCEEDED) {
+ /* setuid noticed that RLIMIT_NPROC was
+ * exceeded, but didn't fail as that easily
+ * leads to privileges leaking. So fail
+ * here instead - we still limit the number of
+ * processes running the user's code.
+ */
+ retval = -EAGAIN;
+ goto out_ret;
+ }
+
retval = unshare_files(&displaced);
if (retval)
goto out_ret;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..f024c63 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1759,6 +1759,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_DUMPCORE 0x00000200 /* dumped core */
#define PF_SIGNALED 0x00000400 /* killed by a signal */
#define PF_MEMALLOC 0x00000800 /* Allocating memory */
+#define PF_NPROC_EXCEEDED 0x00001000 /* set_user noticed that RLIMIT_NPROC was exceeded */
#define PF_USED_MATH 0x00002000 /* if unset the fpu must be initialized before use */
#define PF_FREEZING 0x00004000 /* freeze in progress. do not account to load */
#define PF_NOFREEZE 0x00008000 /* this thread should not be frozen */
diff --git a/kernel/cred.c b/kernel/cred.c
index 174fa84..8ef31f5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -508,10 +508,8 @@ int commit_creds(struct cred *new)
key_fsgid_changed(task);

/* do it
- * - What if a process setreuid()'s and this brings the
- * new uid over his NPROC rlimit? We can check this now
- * cheaply with the new uid cache, so if it matters
- * we should be checking for it. -DaveM
+ * RLIMIT_NPROC limits on user->processes have already been checked
+ * in set_user().
*/
alter_cred_subscribers(new, 2);
if (new->user != old->user)
diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..dd1fb9d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -592,10 +592,9 @@ static int set_user(struct cred *new)
return -EAGAIN;

if (atomic_read(&new_user->processes) >= rlimit(RLIMIT_NPROC) &&
- new_user != INIT_USER) {
- free_uid(new_user);
- return -EAGAIN;
- }
+ new_user != INIT_USER)
+ /* Cause any subsequent 'exec' to fail */
+ current->flags |= PF_NPROC_EXCEEDED;

free_uid(new->user);
new->user = new_user;

Vasiliy Kulikov
Jul 15, 2011, 3:40:02 AM
Neil,

On Fri, Jul 15, 2011 at 17:06 +1000, NeilBrown wrote:
> How about this then?

AFAIU, with this patch:

1) setuid() doesn't fail in the NPROC-exceeded case.
2) NPROC is enforced on execve() after setuid().
3) execve() fails only if NPROC was exceeded during setuid() execution.
4) Other processes of the same user don't suffer from "occasional"
execve() failures.

If it is correct, then I like the patch :) It does RLIMIT_NPROC
enforcement without mixing other execve() calls like -ow patch did.

See minor suggestions about the comments below.


> Signed-off-by: NeilBrown <ne...@suse.de>

Acked-by: Vasiliy Kulikov <seg...@openwall.com>

> diff --git a/fs/exec.c b/fs/exec.c
> index 6075a1e..dfe9c61 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1434,6 +1434,17 @@ static int do_execve_common(const char *filename,
> bool clear_in_exec;
> int retval;
>
> + if (current->flags & PF_NPROC_EXCEEDED) {
> + /* setuid noticed that RLIMIT_NPROC was
> + * exceeded, but didn't fail as that easily
> + * leads to privileges leaking.

It doesn't lead to privilege leaking by itself; the threat is buggy
programs not checking the return code of syscalls that drop privileges
(setuid() here).

The comment about why all this bustle is needed is better placed here
because the check logically belongs to set_user().

Specifically, I'd mention that (1) we don't want setuid() to fail while
having enough privileges and (2) RLIMIT_NPROC can still be harmlessly
checked for programs doing setuid()+execve() if the error code is
deferred to the execve stage.

> + current->flags |= PF_NPROC_EXCEEDED;
>
> free_uid(new->user);
> new->user = new_user;

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Solar Designer
Jul 15, 2011, 9:10:02 AM
Neil, Vasiliy -

On Fri, Jul 15, 2011 at 11:38:23AM +0400, Vasiliy Kulikov wrote:
> On Fri, Jul 15, 2011 at 17:06 +1000, NeilBrown wrote:
> > How about this then?
>
> AFAIU, with this patch:
>
> 1) setuid() doesn't fail in the NPROC-exceeded case.
> 2) NPROC is enforced on execve() after setuid().
> 3) execve() fails only if NPROC was exceeded during setuid() execution.
> 4) Other processes of the same user don't suffer from "occasional"
> execve() failures.
>
> If it is correct, then I like the patch :)

I'm not convinced this has any advantages over the patch Vasiliy had
posted (which simply moved the RLIMIT_NPROC check), but I don't mind,
with one important correction:

Although in the primary use cases we're considering, setuid() is very
soon followed by execve(), this doesn't always have to be the case.
There may be cases where the process would sleep between these two calls
(for a variety of reasons). It would be surprising to see a process
fail on execve() because of RLIMIT_NPROC when that limit had been
reached, say, days ago and is no longer reached at the time of execve().

Thus, if we go with Neil's proposal (adding/checking an extra flag), we
should also re-check the process count against RLIMIT_NPROC on execve(),
and fail the exec only when both the flag is set and the current process
count is excessive (still or again).

Also, if we go with a patch like this, it brings up the question of
whether the flag should be preserved or reset on fork(). I think it
should be preserved.

> It does RLIMIT_NPROC
> enforcement without mixing other execve() calls like -ow patch did.

"Mixing other execve() calls" (4 on the list above) is not obviously
undesirable. Thus, either Vasiliy's original patch or Neil's patch with
the addition mentioned above would be fine by me.

Thanks,

Alexander

Stephen Smalley
Jul 15, 2011, 10:20:02 AM
On Fri, 2011-07-15 at 11:38 +0400, Vasiliy Kulikov wrote:
> Neil,
>
> On Fri, Jul 15, 2011 at 17:06 +1000, NeilBrown wrote:
> > How about this then?
>
> AFAIU, with this patch:
>
> 1) setuid() doesn't fail in the NPROC-exceeded case.
> 2) NPROC is enforced on execve() after setuid().
> 3) execve() fails only if NPROC was exceeded during setuid() execution.
> 4) Other processes of the same user don't suffer from "occasional"
> execve() failures.
>
> If it is correct, then I like the patch :) It does RLIMIT_NPROC
> enforcement without mixing other execve() calls like -ow patch did.

Does this have implications for Android's zygote model? There you have
a long running uid 0 / all caps process (the zygote), which forks itself
upon receiving a request to spawn an app and then calls setgroups();
setrlimit(); setgid(); setuid(); assuming the limits and credentials of
the app but never does an exec at all, as it is just loading the app's
class and executing it from memory.

Also, can't setuid() fail under other conditions, e.g. ENOMEM upon
prepare_creds() allocation failure? Is it ever reasonable for a program
to not check setuid() for failure? Certainly there are plenty of
examples of programs not doing that, but it isn't clear that this is a
bug in the kernel.

--
Stephen Smalley
National Security Agency

Vasiliy Kulikov
Jul 15, 2011, 11:30:02 AM
Hi Stephen,

On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote:
> Does this have implications for Android's zygote model? There you have
> a long running uid 0 / all caps process (the zygote), which forks itself
> upon receiving a request to spawn an app and then calls

> setgroups();
> setrlimit(); setgid(); setuid();

Is RLIMIT_NPROC forced in your model and setuid() is expected to fail
because of NPROC exceeding? If no, then it is not touched at all.

> Also, can't setuid() fail under other conditions, e.g. ENOMEM upon
> prepare_creds() allocation failure? Is it ever reasonable for a program
> to not check setuid() for failure? Certainly there are plenty of
> examples of programs not doing that, but it isn't clear that this is a
> bug in the kernel.

This thing is better discussed separately, after the patch is applied.
In short, in the current implementation it is not possible, as the mm layer
retries allocations of small structures (less than 8 pages) indefinitely.
cred and user_struct are less than 64kb, so ENOMEM is impossible.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Stephen Smalley
Jul 15, 2011, 4:00:02 PM
On Fri, 2011-07-15 at 19:26 +0400, Vasiliy Kulikov wrote:
> Hi Stephen,
>
> On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote:
> > Does this have implications for Android's zygote model? There you have
> > a long running uid 0 / all caps process (the zygote), which forks itself
> > upon receiving a request to spawn an app and then calls
>
> > setgroups();
> > setrlimit(); setgid(); setuid();
>
> Is RLIMIT_NPROC forced in your model and setuid() is expected to fail
> because of NPROC exceeding? If no, then it is not touched at all.

I don't know what their intent is. But it is an example of a case where
moving the enforcement from setuid() to a subsequent execve() causes the
check to never get applied. As to whether or not they care, I don't
know. An app that calls fork() repeatedly will still be stopped, but an
app that repeatedly connects to the zygote and asks to spawn another
instance of itself would be unlimited.

OTOH, the current RLIMIT_NPROC check and lack of setuid() error checking
has been a repeated issue for Android.

--
Stephen Smalley
National Security Agency

NeilBrown
Jul 21, 2011, 12:20:01 AM
On Fri, 15 Jul 2011 15:54:43 -0400 Stephen Smalley <s...@tycho.nsa.gov> wrote:

> On Fri, 2011-07-15 at 19:26 +0400, Vasiliy Kulikov wrote:
> > Hi Stephen,
> >
> > On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote:
> > > Does this have implications for Android's zygote model? There you have
> > > a long running uid 0 / all caps process (the zygote), which forks itself
> > > upon receiving a request to spawn an app and then calls
> >
> > > setgroups();
> > > setrlimit(); setgid(); setuid();
> >
> > Is RLIMIT_NPROC forced in your model and setuid() is expected to fail
> > because of NPROC exceeding? If no, then it is not touched at all.
>
> I don't know what their intent is. But it is an example of a case where
> moving the enforcement from setuid() to a subsequent execve() causes the
> check to never get applied. As to whether or not they care, I don't
> know. An app that calls fork() repeatedly will still be stopped, but an
> app that repeatedly connects to the zygote and asks to spawn another
> instance of itself would be unlimited.
>
> OTOH, the current RLIMIT_NPROC check and lack of setuid() error checking
> has been a repeated issue for Android.
>

So where does this leave us? Between a rock and a hard place?

It says to me that moving the check from set_user to execve is simply
Wrong(TM). It may be convenient and do TheRightThing in certain common
cases, but it also can do the Wrong thing in other cases and I don't think
that is an acceptable trade off.

Having setuid succeed when it should fail is simply incorrect.

The problem - as we all know - is that user space doesn't always check error
status properly. If we were to look for precedent I would point to SIGPIPE.
The only reason for that to exist is because programs don't always check that
a "write" succeeds so we have a mechanism to kill off processes that don't
check the error status and keep sending.

I would really like to apply that to this problem ... but that has already
been suggested (years ago) and found wanting. Maybe we can revisit it?

The value of the SIGPIPE approach (whether it is SIGPIPE or SIGXCPU or
SIGVEC ... if only there were a SIGXNPROC) is that the program remains in
complete control. It can find out if the NPROC limit has been exceeded at
the "right" time.


The only other solution that I can think of which isn't "Wrong(TM)" is my
first patch which introduced PF_SETUSER_FAILED.
With this patch setuid() still fails if it should, so the calling process
still remains in control. But if it fails to exercise that control, the
kernel steps in.

Vasiliy didn't like that because it allows a process to ignore the setuid
failure, perform some in-process activity as root when expecting it to be
some-other-user, and only fail when execve is attempted - possibly too late.

Against this I ask: what exactly is our goal here?
Is it to stop all possible abuses? I think not. That is impossible.
Is it to stop certain known and commonly repeated errors? I think so. That
is clearly valuable.

We know (Thanks to Solar Designer's list) that unchecked setuid followed by
execve is a commonly repeated error, so trapping that can be justified.
Do we know that accessing the filesystem first is a commonly repeated error?
If not, there is no clear motive to deal with that problem.
If, however, it is then maybe a check for PF_SETUSER_FAILED in
inode_permission() would be the right thing.

Or maybe we just set PF_SETUSER_FAILED and leave it up to some security
module to decide what to disable or report when that is set?

In short: I don't think there can be a solution that is both completely
correct and completely safe. I would go for "as correct as possible" with
"closes common vulnerabilities".

NeilBrown

Solar Designer
Jul 21, 2011, 8:50:02 AM
On Thu, Jul 21, 2011 at 02:09:36PM +1000, NeilBrown wrote:
> On Fri, 15 Jul 2011 15:54:43 -0400 Stephen Smalley <s...@tycho.nsa.gov> wrote:
> > On Fri, 2011-07-15 at 19:26 +0400, Vasiliy Kulikov wrote:
> > > On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote:
> > > > Does this have implications for Android's zygote model? There you have
> > > > a long running uid 0 / all caps process (the zygote), which forks itself
> > > > upon receiving a request to spawn an app and then calls
> > >
> > > > setgroups();
> > > > setrlimit(); setgid(); setuid();
> > >
> > > Is RLIMIT_NPROC forced in your model and setuid() is expected to fail
> > > because of NPROC exceeding? If no, then it is not touched at all.
> >
> > I don't know what their intent is. But it is an example of a case where
> > moving the enforcement from setuid() to a subsequent execve() causes the
> > check to never get applied. As to whether or not they care, I don't
> > know. An app that calls fork() repeatedly will still be stopped, but an
> > app that repeatedly connects to the zygote and asks to spawn another
> > instance of itself would be unlimited.
> >
> > OTOH, the current RLIMIT_NPROC check and lack of setuid() error checking
> > has been a repeated issue for Android.
>
> So where does this leave us? Between a rock and a hard place?

Maybe. I just took a look at Android's Zygote code, as found via Google
Code Search. This appears to be the relevant place:

http://www.google.com/codesearch#atE6BTe41-M/vm/native/dalvik_system_Zygote.c&q=forkAndSpecialize&type=cs&l=439

As we can see, Stephen's description of the sequence of calls is indeed
correct. Also, the return value from setuid() is checked here. :-)

As to which rlimits are actually set, I don't know. This appears to be
configured at a much higher level:

http://www.google.com/codesearch#uX1GffpyOZk/core/java/com/android/internal/os/ZygoteConnection.java&q=ZygoteConnection.java&type=cs&l=247

--rlimit=r,c,m tuple of values for setrlimit() call.

I have no idea whether this --rlimit argument is ever supplied and with
what settings. The intent of supporting it could well be other than
making use of RLIMIT_NPROC specifically.

> It says to me that moving the check from set_user to execve is simply
> Wrong(TM). It may be convenient and do TheRightThing in certain common
> cases, but it also can do the Wrong thing in other cases and I don't think
> that is an acceptable trade off.

I disagree. There's nothing wrong in having the check on execve(),
especially not if we combine it with a check of your proposed flag set
on setuid() (only fail execve() when the flag is set and RLIMIT_NPROC is
still or again exceeded).

> Having setuid succeed when it should fail is simply incorrect.

As far as I'm aware, no standard says that setuid() should fail if it
would exceed RLIMIT_NPROC for the target user. There's a notion of
"appropriate privileges", but what these are is implementation-defined
and it was hardly meant to include rlimits.

> The problem - as we all know - is that user space doesn't always check error
> status properly. If we were to look for precedent I would point to SIGPIPE.
> The only reason for that to exist is because programs don't always check that
> a "write" succeeds so we have a mechanism to kill off processes that don't
> check the error status and keep sending.
>
> I would really like to apply that to this problem ... but that has already
> been suggested (years ago) and found wanting. Maybe we can revisit it?
>
> The value of the SIGPIPE approach (whether it is SIGPIPE or SIGXCPU or
> SIGVEC ... if only there were a SIGXNPROC) is that the program remains in
> complete control. It can find out if the NPROC limit has been exceeded at
> the "right" time.

I don't mind having setuid() signal (and by default actually kill) a
process if it would exceed RLIMIT_NPROC. However, I doubt that others
will agree. BTW, as I told Vasiliy on the kernel-hardening list, I
think we should revisit the "can't happen" memory allocation failure in
set_user() _after_ we have dealt with the RLIMIT_NPROC issue. I would
support the killing of process with SIGKILL or SIGSEGV (as opposed to
return -EAGAIN) on that "can't happen" condition (which might become
possible in a further revision of the code, so better safe than sorry).
Let's actually revisit this later, after having the most important fix
applied.

> The only other solution that I can think of which isn't "Wrong(TM)" is my
> first patch which introduced PF_SETUSER_FAILED.
> With this patch setuid() still fails if it should, so the calling process
> still remains in control. But if it fails to exercise that control, the
> kernel steps in.
>
> Vasiliy didn't like that because it allows a process to ignore the setuid
> failure, perform some in-process activity as root when expecting it to be as
> some-other-user, and only fails when execve is attempted - possibly too late.

I am with Vasiliy on this.

> Against this I ask: what exactly is our goal here?
> Is it to stop all possible abuses? I think not. That is impossible.
> Is it to stop certain known and commonly repeated errors? I think so. That
> is clearly valuable.

Not checking the return value from setuid() and proceeding to do other
work is a known and commonly repeated error. As to whether it is also
common for that other work to involve risky syscalls other than execve(),
I expect that it is, although I did not research this.

> We know (Thanks to Solar Designer's list) that unchecked setuid followed by
> execve is a commonly repeated error, so trapping that can be justified.
> Do we know that accessing the filesystem first is a commonly repeated error?
> If not, there is no clear motive to deal with that problem.
> If, however, it is then maybe a check for PF_SETUSER_FAILED in
> inode_permission() would be the right thing.
>
> Or maybe we just set PF_SETUSER_FAILED and leave it up to some security
> module to decide what to disable or report when that is set?

I feel that we'd be inventing something more complicated yet worse than
simply moving the check would be.

Here's my current proposal:

1. Apply Vasiliy's patch to move the RLIMIT_NPROC check from setuid() to
execve(), optionally enhanced with setting PF_SETUSER_FAILED on
would-be-failed setuid() and checking this flag in execve() (in addition
to repeating the RLIMIT_NPROC check).

2. With a separate patch, add a prctl() to read the PF_SETUSER_FAILED flag.
Android will be able to use this if it wants to.

Yes, this might break RLIMIT_NPROC for Android (I wrote "might" because
I have no idea if it actually sets that specific limit or not) until it
learns to use the new prctl(). But I think that's fine, and it is not a
reason for us to introduce more complexity into the kernel, yet make our
security hardening change more limited. There was never a guarantee (in
any standard or piece of documentation) that setuid() would fail on
exceeding RLIMIT_NPROC, and the Android/Zygote code might not actually
rely on that anyway (there's no clear indication that it does;
RLIMIT_NPROC is not in the code, nor is it mentioned in any comment).

> In short: I don't think there can be a solution that is both completely
> correct and completely safe. I would go for "as correct as possible" with
> "closes common vulnerabilities".

Maybe, and if so I think that one I proposed above falls in this
category as well, but it closes more vulnerabilities (and/or does so
more fully).

Thanks,

Alexander

Linus Torvalds
Jul 21, 2011, 2:30:03 PM
On Thu, Jul 21, 2011 at 5:48 AM, Solar Designer <so...@openwall.com> wrote:
>
> Maybe, and if so I think that one I proposed above falls in this
> category as well, but it closes more vulnerabilities (and/or does so
> more fully).

I think we could have a pretty simple approach that "works in
practice": retain the check at setuid() time, but make it a higher
limit.

IOW, the logic is that we have two competing pressures:

(a) we should try to avoid failing on setuid(), because there is a
real risk that the setuid caller doesn't really check the failure case
and opens itself up for a security problem

and

(b) never failing setuid at all is in itself a security problem,
since it can lead to DoS attacks in the form of excessive resource use
by one user.

But the sane (intelligent) solution to that is to say that we *PREFER*
to fail in execve(), but that at some point a failure in setuid() is
preferable to no failure at all. After all, we have no hard knowledge
that there is any actual setuid() issue. Neither generally does the
user (iow, look at this whole discussion where intelligent people
simply have different inputs depending on "what could happen").

So it really seems like the natural approach would be to simply fail
*earlier* on execve() and fork(). That will catch most cases, and has
no downsides. But if we notice that we are in a situation where some
privileged user can be tricked into forking a lot and doing setuid(),
then at that point the setuid() path becomes relevant.

IOW, I'd suggest simply making the rule be that "setuid() allows 10%
more users than the limit technically says". It's not a guarantee, but
it means that in order to hit the problem, you need to have *both* a
setuid application that allows unconstrained user forking *and*
doesn't check the setuid() return value.

Put another way: a user cannot force the "we're at the edge of the
setuid() limit" on its own by just forking - the user will be stopped
10% before the setuid() failure case can ever trigger.

Is this some "guarantee of nothing bad can ever happen"? No. If you
have bad setuid applications, you will have problems. But it's a "you
need to really work harder at it and you need to find more things to
go wrong", which is after all what real security is all about.

No?

Linus

Solar Designer
Jul 21, 2011, 3:50:03 PM
On Thu, Jul 21, 2011 at 11:21:07AM -0700, Linus Torvalds wrote:
> I think we could have a pretty simple approach that "works in
> practice": retain the check at setuid() time, but make it a higher
> limit.
>
> IOW, the logic is that we have two competing pressures:
>
> (a) we should try to avoid failing on setuid(), because there is a
> real risk that the setuid caller doesn't really check the failure case
> and opens itself up for a security problem
>
> and
>
> (b) never failing setuid at all is in itself a security problem,
> since it can lead to DoS attacks in the form of excessive resource use
> by one user.

I don't recall anyone stating (b) the way you did above (or sufficiently
similar). I wouldn't consider setuid() never failing to be a security
problem. BTW, some people consider setuid() failing on RLIMIT_NPROC
kernel "brokenness", which applications have to "work around":

http://www.openwall.com/lists/musl/2011/07/21/3

"I'm aware of course that some interfaces *can* fail for nonstandard
reasons under Linux (dup2 and set*uid come to mind), and I've tried to
work around these and shield applications from the brokenness..."

So opinions on setuid() failing vary, whereas (a) is clear - there have
been vulnerabilities caused by setuid() failing.

The DoS that you mention doesn't necessarily have to be dealt with by
setuid() failing on RLIMIT_NPROC (nor on a higher limit). It could also
be dealt with by checking the limit on execve(), like we've been doing
on shared web hosting servers for years, and, if desired, by letting
applications like Android/Zygote check for the condition themselves via
a new prctl() (or they can simply pass an extra fork(), although that's
a bit costly).

> IOW, I'd suggest simply making the rule be that "setuid() allows 10%
> more users than the limit technically says". It's not a guarantee, but
> it means that in order to hit the problem, you need to have *both* a
> setuid application that allows unconstrained user forking *and*
> doesn't check the setuid() return value.
>
> Put another way: a user cannot force the "we're at the edge of the
> setuid() limit" on its own by just forking - the user will be stopped
> 10% before the setuid() failure case can ever trigger.

For a malicious user, this merely adds the task of triggering a race
condition - have a sufficient number of processes accumulate in the
state between setuid() and execve(). If the program in question can be
made to sleep, this may be trivial to do. Otherwise, it may require
very rapid requests (automated) and high system load.

(BTW, 10% of 0 would be 0, which would allow for attacks that are as
simple as they're now, but that's an implementation detail. Of course,
you'd actually add some constant as well.)

> Is this some "guarantee of nothing bad can ever happen"? No. If you
> have bad setuid applications, you will have problems. But it's a "you
> need to really work harder at it and you need to find more things to
> go wrong", which is after all what real security is all about.
>
> No?

I generally support having multiple layers of security even if some are
non-perfect, but in this case we have a problem that we can _fully_ deal
with rather than merely make attacks harder.

So my proposal remains to go with Vasiliy's trivial patch and maybe add
a few things on top of it as I mentioned in my previous message.

Thanks,

Alexander

Vasiliy Kulikov
Jul 25, 2011, 1:20:02 PM
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions. Before the check, an unprivileged user could greatly
exceed the allowed number of processes if a program relied on the rlimit
only. But the check created a new security threat: many poorly written
programs simply don't check the setuid() return code and believe it
cannot fail if executed with root privileges. So, the check is removed
in this patch because of the too frequent privilege escalations related
to buggy programs.

NPROC can still be enforced in the common code flow of daemons
spawning user processes. Most daemons do fork()+setuid()+execve().

The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested tracking which specific process has exceeded the
limit by setting the PF_NPROC_EXCEEDED process flag. With the change only
this process fails on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested re-checking whether the NPROC limit is still
exceeded at the moment of execve(). If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter meanwhile
dropped below the limit, a deferred execve() failure because the NPROC
limit was exceeded days ago would be unexpected.

A similar check was introduced in the -ow patches (without the process flag).

Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>
---
fs/exec.c | 13 +++++++++++++
include/linux/sched.h | 1 +
kernel/cred.c | 7 +++----
kernel/sys.c | 13 +++++++++----
4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 6075a1e..706a213 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1433,6 +1433,19 @@ static int do_execve_common(const char *filename,
struct files_struct *displaced;
bool clear_in_exec;
int retval;
+ const struct cred *cred = current_cred();
+
+ /*
+ * We move the actual failure in case of RLIMIT_NPROC excess from
+ * set*uid() to execve() because too many poorly written programs
+ * don't check setuid() return code. Here we additionally recheck
+ * whether NPROC limit is still exceeded.
+ */
+ if ((current->flags & PF_NPROC_EXCEEDED) &&
+ atomic_read(&cred->user->processes) > rlimit(RLIMIT_NPROC)) {


+ retval = -EAGAIN;
+ goto out_ret;
+ }

retval = unshare_files(&displaced);
if (retval)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..f024c63 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1759,6 +1759,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define PF_DUMPCORE	0x00000200	/* dumped core */
 #define PF_SIGNALED	0x00000400	/* killed by a signal */
 #define PF_MEMALLOC	0x00000800	/* Allocating memory */
+#define PF_NPROC_EXCEEDED 0x00001000	/* set_user noticed that RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH	0x00002000	/* if unset the fpu must be initialized before use */
 #define PF_FREEZING	0x00004000	/* freeze in progress. do not account to load */
 #define PF_NOFREEZE	0x00008000	/* this thread should not be frozen */
diff --git a/kernel/cred.c b/kernel/cred.c
index 174fa84..52f4342 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -508,11 +508,10 @@ int commit_creds(struct cred *new)
 	key_fsgid_changed(task);
 
 	/* do it
-	 * - What if a process setreuid()'s and this brings the
-	 *   new uid over his NPROC rlimit?  We can check this now
-	 *   cheaply with the new uid cache, so if it matters
-	 *   we should be checking for it.  -DaveM
+	 * RLIMIT_NPROC limits on user->processes have already been checked
+	 * in set_user().
 	 */
+
 	alter_cred_subscribers(new, 2);
 	if (new->user != old->user)
 		atomic_inc(&new->user->processes);
diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..fc40cbc 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -591,11 +591,16 @@ static int set_user(struct cred *new)
 	if (!new_user)
 		return -EAGAIN;
 
+	/*
+	 * We don't fail in case of NPROC limit excess here because too many
+	 * poorly written programs don't check set*uid() return code, assuming
+	 * it never fails if called by root.  We may still enforce NPROC limit
+	 * for programs doing set*uid()+execve() by harmlessly deferring the
+	 * failure to the execve() stage.
+	 */
 	if (atomic_read(&new_user->processes) >= rlimit(RLIMIT_NPROC) &&
-			new_user != INIT_USER) {
-		free_uid(new_user);
-		return -EAGAIN;
-	}
+			new_user != INIT_USER)
+		current->flags |= PF_NPROC_EXCEEDED;
 
 	free_uid(new->user);
 	new->user = new_user;

--
1.7.0.4

Solar Designer
Jul 25, 2011, 7:50:02 PM

Vasiliy,

On Mon, Jul 25, 2011 at 09:14:23PM +0400, Vasiliy Kulikov wrote:
> @@ -1433,6 +1433,19 @@ static int do_execve_common(const char *filename,
> struct files_struct *displaced;
> bool clear_in_exec;
> int retval;
> + const struct cred *cred = current_cred();
> +
> + /*
> + * We move the actual failure in case of RLIMIT_NPROC excess from
> + * set*uid() to execve() because too many poorly written programs
> + * don't check setuid() return code. Here we additionally recheck
> + * whether NPROC limit is still exceeded.
> + */
> + if ((current->flags & PF_NPROC_EXCEEDED) &&
> + atomic_read(&cred->user->processes) > rlimit(RLIMIT_NPROC)) {
> + retval = -EAGAIN;
> + goto out_ret;
> + }

Do you possibly need:

current->flags &= ~PF_NPROC_EXCEEDED;

somewhere after this point?

I think it's weird to have past set_user() failure affect other than the
very next execve().

Perhaps also reset the flag on fork() because we have an RLIMIT_NPROC
check on fork() anyway.

Thanks,

Alexander

NeilBrown
Jul 25, 2011, 8:50:01 PM

So we are hoping that no program uses execvp() or similar... Maybe that is
reasonable but "in for a penny, in for a pound" - I'd fail them all.

I think the flag should only be cleared once we notice that the limit is no
longer exceeded. So clearing the flag can appear *after* the code you quote
above, but not in the middle of it.

>
> Perhaps also reset the flag on fork() because we have an RLIMIT_NPROC
> check on fork() anyway.

I agree it should be cleared here too.

>
> Thanks,
>
> Alexander


But there is still the issue of 'zygote'-like services....

Let me try another suggestion. Instead of catching the error in
do_execve_common, how about we catch it in do_mmap_pgoff.
i.e. if the flag is set and an attempt is made to create an executable
mapping, we check the user->processes against the limit then - either failing
or clearing the flag and succeeding.

This will stop an execve, and an attempt to load a shared library and call it.

In the case of 'exec' the process will get a SIGKILL as well, which is
probably a good thing.

Thoughts?

NeilBrown

Solar Designer
Jul 25, 2011, 9:20:02 PM

Why? No, we don't, unless I am missing something.

> Maybe that is
> reasonable but "in for a penny, in for a pound" - I'd fail them all.
>
> I think the flag should only be cleared once we notice that the limit is no
> longer exceeded. So clearing the flag can appear *after* the code you quote
> above, but not in the middle of it.

Definitely. In case execve() fails because of the limit, the flag
remains set, so a second execve() by the process will fail too.

> > Perhaps also reset the flag on fork() because we have an RLIMIT_NPROC
> > check on fork() anyway.
>
> I agree it should be cleared here too.

Great. Just to clarify my own words: on fork(), clear the flag in the
child process only.

> But there is still the issue of 'zygote'-like services....

Here's my take on it:

1. It is not known (from the discussion so far) whether Android/Zygote
even cares about RLIMIT_NPROC specifically or not. The code is very
generic, usable for any rlimits, and the rationale behind it might have
been to be able to apply certain other limits. I don't know whether or
not there exists a system that actually sets RLIMIT_NPROC via that
mechanism and expects it working.

2. If desired, Android/Zygote will be able to check the
PF_NPROC_EXCEEDED flag, via procfs or via a prctl() interface that we
might introduce. Or it may simply pass an extra fork().

> Let me try another suggestion. Instead of catching the error in
> do_execve_common, how about we catch it in do_mmap_pgoff.
> > i.e. if the flag is set and an attempt is made to create an executable
> mapping, we check the user->processes against the limit then - either failing
> or clearing the flag and succeeding.
>
> This will stop an execve, and an attempt to load a shared library and call it.

This sounds too hackish to me, although if others are (unexpectedly) OK
with it, I don't mind.

Thanks,

Alexander

NeilBrown
Jul 26, 2011, 12:20:02 AM

On Tue, 26 Jul 2011 05:16:29 +0400 Solar Designer <so...@openwall.com> wrote:


> > Let me try another suggestion. Instead of catching the error in
> > do_execve_common, how about we catch it in do_mmap_pgoff.
> > i.e. if the flag is set and an attempt is made to create an executable
> > mapping, we check the user->processes against the limit then - either failing
> > or clearing the flag and succeeding.
> >
> > This will stop an execve, and an attempt to load a shared library and call it.
>
> This sounds too hackish to me, although if others are (unexpectedly) OK
> with it, I don't mind.

Thanks ... I think :-)

I don't really see that failing mmap is any more hackish than failing execve.

Both are certainly hacks. It is setuid that should fail, but that is
problematic.

We seem to agree that it is acceptable to delay the failure until the process
actually tries to run some code for the user. I just think that
mapping-a-file-for-exec is a more direct measure of "trying to run some code
for the user" than "execve" is.

So they are both hacks, but one is more thorough than the other. In the
world of security I would hope that "thorough" would win.

NeilBrown

Vasiliy Kulikov
Jul 26, 2011, 11:00:03 AM

Neil, Solar,

On Tue, Jul 26, 2011 at 14:11 +1000, NeilBrown wrote:
> I don't really see that failing mmap is any more hackish than failing execve.
>
> Both are certainly hacks. It is setuid that should fail, but that is
> problematic.
>
> We seem to agree that it is acceptable to delay the failure until the process
> actually tries to run some code for the user. I just think that
> mapping-a-file-for-exec is a more direct measure of "trying to run some code
> for the user" than "execve" is.
>
> > So they are both hacks, but one is more thorough than the other. In the
> world of security I would hope that "thorough" would win.

Well, I don't mind something more generic than the check in execve();
however, the usefulness of the check in mmap() is unclear to me.  You
want to make more programs fail after setuid(), but does mmap really
stop that many programs?  Do you know of any program doing mmap/dlopen
after a setuid() call?  What if the program does no mmap/dlopen at all
and e.g. starts to handle network connections or do some computations?
I suppose the latter case is much more common than mmap/dlopen.

(Also, a significant amount of data could be read via plain read()
without any mmap.)

A more generic place would be all resource allocation syscalls, but
that starts to get too complex for this sort of thing IMO.


Meanwhile, the patch with the PF flag cleared on successful fork/exec
is as follows:

-------------------------------------------
From: Vasiliy Kulikov <seg...@openwall.com>
Subject: [patch v2] move RLIMIT_NPROC check from set_user() to do_execve_common()

The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to catch NPROC excess via setuid() and similar
functions.  Before the check it was possible for an unprivileged user to
greatly exceed the allowed number of processes if a program relied on
the rlimit only.  But the check created a new security threat: many
poorly written programs simply don't check the setuid() return code and
believe it cannot fail if executed with root privileges.  So this patch
removes the check, because of the too frequent privilege escalations
caused by such buggy programs.

NPROC can still be enforced in the common code flow of daemons spawning
user processes: most daemons do fork()+setuid()+execve().  The check
introduced in execve() (1) enforces the same limit as in setuid() and
(2) doesn't create similar security issues.

Neil Brown suggested tracking which specific process has exceeded the
limit by setting the PF_NPROC_EXCEEDED process flag.  With this change
only that process fails on execve(); other processes' execve() behaviour
is unchanged.

Solar Designer suggested re-checking whether the NPROC limit is still
exceeded at the moment of execve().  If the process slept for days
between set*uid() and execve() and the NPROC counter meanwhile stepped
down below the limit, a deferred execve() failure because the limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on both execve() and fork().

A similar check was introduced in the -ow patches (without the process
flag).

Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>
---

 fs/exec.c             |   17 +++++++++++++++++
 include/linux/sched.h |    1 +
 kernel/cred.c         |    7 +++----
 kernel/fork.c         |    2 +-
 kernel/sys.c          |   13 +++++++++----
 5 files changed, 31 insertions(+), 9 deletions(-)

---
diff --git a/fs/exec.c b/fs/exec.c
index 6075a1e..e8b7c1a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1433,6 +1433,23 @@ static int do_execve_common(const char *filename,
 	struct files_struct *displaced;
 	bool clear_in_exec;
 	int retval;
+	const struct cred *cred = current_cred();
+
+	/*
+	 * We move the actual failure in case of RLIMIT_NPROC excess from
+	 * set*uid() to execve() because too many poorly written programs
+	 * don't check setuid() return code.  Here we additionally recheck
+	 * whether NPROC limit is still exceeded.
+	 */
+	if ((current->flags & PF_NPROC_EXCEEDED) &&
+	    atomic_read(&cred->user->processes) > rlimit(RLIMIT_NPROC)) {
+		retval = -EAGAIN;
+		goto out_ret;
+	}
+
+	/* If we step below the limit, we don't want to make the following
+	 * execve() fail. */
+	current->flags &= ~PF_NPROC_EXCEEDED;

diff --git a/kernel/fork.c b/kernel/fork.c
index 0276c30..0f44484 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -995,7 +995,7 @@ static void copy_flags(unsigned long clone_flags, struct task_struct *p)
 {
 	unsigned long new_flags = p->flags;
 
-	new_flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER);
+	new_flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_NPROC_EXCEEDED);
 	new_flags |= PF_FORKNOEXEC;
 	new_flags |= PF_STARTING;
 	p->flags = new_flags;


diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..fc40cbc 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -591,11 +591,16 @@ static int set_user(struct cred *new)
 	if (!new_user)
 		return -EAGAIN;
 
+	/*
+	 * We don't fail in case of NPROC limit excess here because too many
+	 * poorly written programs don't check set*uid() return code, assuming
+	 * it never fails if called by root.  We may still enforce NPROC limit
+	 * for programs doing set*uid()+execve() by harmlessly deferring the
+	 * failure to the execve() stage.
+	 */
 	if (atomic_read(&new_user->processes) >= rlimit(RLIMIT_NPROC) &&
-			new_user != INIT_USER) {
-		free_uid(new_user);
-		return -EAGAIN;
-	}
+			new_user != INIT_USER)
+		current->flags |= PF_NPROC_EXCEEDED;
 
 	free_uid(new->user);
 	new->user = new_user;

---


Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

NeilBrown
Jul 26, 2011, 10:20:02 PM

On Tue, 26 Jul 2011 18:48:48 +0400 Vasiliy Kulikov <seg...@openwall.com>
wrote:

> Neil, Solar,


>
> On Tue, Jul 26, 2011 at 14:11 +1000, NeilBrown wrote:
> > I don't really see that failing mmap is any more hackish than failing execve.
> >
> > Both are certainly hacks. It is setuid that should fail, but that is
> > problematic.
> >
> > We seem to agree that it is acceptable to delay the failure until the process
> > actually tries to run some code for the user. I just think that
> > mapping-a-file-for-exec is a more direct measure of "trying to run some code
> > for the user" than "execve" is.
> >
> > So they are both hacks, but one is more thorough than the other. In the
> > world of security I would hope that "thorough" would win.
>
> Well, I don't mind something more generic than the check in
> execve(); however, the usefulness of the check in mmap() is unclear to
> me.  You want to make more programs fail after setuid(), but does mmap
> really stop that many programs?  Do you know of any program doing
> mmap/dlopen after a setuid() call?  What if the program does no
> mmap/dlopen at all and e.g. starts to handle network connections