RFC: disablenetwork facility. (v4)

Michael Stone

Dec 26, 2009, 8:10:01 PM

Here's version 4 of my disablenetwork facility and a recap of the significant
design choices so far:

1. Per Ulrich's request, we provide the initial userland interface through
prctl() rather than through *rlimit() (or through sys_disablenetwork()).

2. Per Alan's request, we use the existing security_*() hook callsites to
integrate the access control logic into the networking subsystem.

3. The access control state and logic are now conditionally compiled under
the CONFIG_SECURITY_DISABLENETWORK option. The interface calls return
-ENOSYS when this symbol is not defined.

4. In order to interoperate as easily as possible with existing LSMs, we
store our state in a new (conditionally compiled) task_struct field named
current->network rather than in current->security. The access control
logic is called directly from the appropriate security_*() hook
implementations in security/security.c, as was done for IMA.

5. Per GeoffX's suggestion, the interface functions now take pointers to user
memory rather than passing the value of the flag field back and forth
directly. This permits prctl(PR_GET_NETWORK) to return an error code.

6. At the moment, we exempt all local networking which requires action by
both the sender and receiver and which has discretionary access control
comparable to regular Unix filesystem DAC.

In practice, this means that we leave all unix sockets, sysv IPC, and
kill()/killpg() alone.

We intercept ptrace() because its effect on the receiver is "involuntary"
and we intercept socket_create(), socket_bind(), socket_connect(), and
socket_sendmsg() because they're not otherwise access-controlled.

sendmsg() on previously connected sockets is exempted.

7. The documentation, kconfig option, and access control logic are named
"disablenetwork" because that's the name of the functionality. The fact
that it's exposed through prctl is incidental to its purpose and semantics
and may become less exclusively true in the future, e.g., if we decide
that we want a /proc interface for reading the networking restrictions of
other processes.
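
The policy in item 6 amounts to a small decision table. The following is a
minimal userspace sketch of that table, not code from the patches: the names
dn_op and dn_check() are hypothetical, only the operations named above are
modelled, and the ptrace() case is simplified to "always refused".

```c
#include <errno.h>
#include <stdbool.h>

#define PR_NETWORK_OFF 1UL	/* value taken from the patch */

/* Hypothetical operation labels, for illustration only. */
enum dn_op {
	DN_SOCKET_CREATE,
	DN_SOCKET_BIND,
	DN_SOCKET_CONNECT,
	DN_SOCKET_SENDMSG,
	DN_PTRACE,
	DN_SYSV_IPC,
	DN_KILL,
};

/*
 * Returns 0 if the operation is left to the usual checks, or -EPERM if
 * the policy described in item 6 refuses it.
 */
static int dn_check(unsigned long flags, enum dn_op op,
		    bool af_unix, bool connected_send)
{
	if (!(flags & PR_NETWORK_OFF))
		return 0;		/* networking not disabled */

	switch (op) {
	case DN_SYSV_IPC:
	case DN_KILL:
		return 0;		/* exempted in item 6 */
	case DN_SOCKET_SENDMSG:
		if (connected_send)
			return 0;	/* previously connected socket */
		/* fall through */
	case DN_SOCKET_CREATE:
	case DN_SOCKET_BIND:
	case DN_SOCKET_CONNECT:
		return af_unix ? 0 : -EPERM;
	case DN_PTRACE:
		return -EPERM;		/* simplified: no exception modelled */
	}
	return -EPERM;			/* fail safe for unknown operations */
}
```

For example, dn_check(PR_NETWORK_OFF, DN_SOCKET_CONNECT, true, false) yields 0
because unix sockets are left alone, while the same call with af_unix == false
yields -EPERM.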

Further suggestions?

Regards,

Michael

Michael Stone (3):
Security: Add disablenetwork interface. (v4)
Security: Implement disablenetwork semantics. (v4)
Security: Document disablenetwork. (v4)

Documentation/disablenetwork.txt | 84 ++++++++++++++++++++++++++++++++++++++
include/linux/disablenetwork.h | 22 ++++++++++
include/linux/prctl.h | 7 +++
include/linux/prctl_network.h | 7 +++
include/linux/sched.h | 4 ++
kernel/sys.c | 53 ++++++++++++++++++++++++
security/Kconfig | 11 +++++
security/Makefile | 1 +
security/disablenetwork.c | 73 +++++++++++++++++++++++++++++++++
security/security.c | 76 ++++++++++++++++++++++++++++++++--
10 files changed, 333 insertions(+), 5 deletions(-)
create mode 100644 Documentation/disablenetwork.txt
create mode 100644 include/linux/disablenetwork.h
create mode 100644 include/linux/prctl_network.h
create mode 100644 security/disablenetwork.c

Michael Stone

Dec 26, 2009, 8:10:02 PM

Daniel Bernstein has observed [1] that security-conscious userland processes
may benefit from the ability to irrevocably give up the ability to create,
bind, or connect sockets, or to send messages, except in the case of previously
connected sockets or AF_UNIX filesystem sockets.

This patch provides

* a new configuration option named CONFIG_SECURITY_DISABLENETWORK,
* a new prctl option-pair (PR_SET_NETWORK, PR_GET_NETWORK),
* a new prctl(PR_SET_NETWORK) flag named PR_NETWORK_OFF, and
* a new task_struct flags field named "network"

Signed-off-by: Michael Stone <mic...@laptop.org>
---
include/linux/prctl.h | 7 +++++
include/linux/prctl_network.h | 7 +++++
include/linux/sched.h | 4 +++
kernel/sys.c | 53 +++++++++++++++++++++++++++++++++++++++++
security/Kconfig | 11 ++++++++
5 files changed, 82 insertions(+), 0 deletions(-)
create mode 100644 include/linux/prctl_network.h

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..4eb4110 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,11 @@

#define PR_MCE_KILL_GET 34

+/* Get/set process disable-network flags */
+#define PR_SET_NETWORK 35
+#define PR_GET_NETWORK 36
+# define PR_NETWORK_ON 0
+# define PR_NETWORK_OFF 1
+# define PR_NETWORK_ALL_FLAGS 1
+
#endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/prctl_network.h b/include/linux/prctl_network.h
new file mode 100644
index 0000000..d18f8cb
--- /dev/null
+++ b/include/linux/prctl_network.h
@@ -0,0 +1,7 @@
+#ifndef _LINUX_PRCTL_NETWORK_H
+#define _LINUX_PRCTL_NETWORK_H
+
+extern long prctl_get_network(unsigned long*);
+extern long prctl_set_network(unsigned long*);
+
+#endif /* _LINUX_PRCTL_NETWORK_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f2f842d..6fcaef8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1403,6 +1403,10 @@ struct task_struct {
#endif
seccomp_t seccomp;

+#ifdef CONFIG_SECURITY_DISABLENETWORK
+	unsigned long network;
+#endif
+
/* Thread group tracking */
u32 parent_exec_id;
u32 self_exec_id;
diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..b48f021 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -35,6 +35,7 @@
#include <linux/cpu.h>
#include <linux/ptrace.h>
#include <linux/fs_struct.h>
+#include <linux/prctl_network.h>

#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -1578,6 +1579,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		else
 			error = PR_MCE_KILL_DEFAULT;
 		break;
+	case PR_SET_NETWORK:
+		error = prctl_set_network((unsigned long*)arg2);
+		break;
+	case PR_GET_NETWORK:
+		error = prctl_get_network((unsigned long*)arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
@@ -1585,6 +1592,52 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
return error;
}

+#ifdef CONFIG_SECURITY_DISABLENETWORK
+
+long prctl_get_network(unsigned long* user)
+{
+	return put_user(current->network, user);
+}
+
+long prctl_set_network(unsigned long* user)
+{
+	unsigned long network_flags;
+	long ret;
+
+	ret = -EFAULT;
+	if (copy_from_user(&network_flags, user, sizeof(network_flags)))
+		goto out;
+
+	ret = -EINVAL;
+	if (network_flags & ~PR_NETWORK_ALL_FLAGS)
+		goto out;
+
+	/* only dropping access is permitted */
+	ret = -EPERM;
+	if (current->network & ~network_flags)
+		goto out;
+
+	current->network = network_flags;
+	ret = 0;
+
+out:
+	return ret;
+}
+
+#else
+
+long prctl_get_network(unsigned long* user)
+{
+	return -ENOSYS;
+}
+
+long prctl_set_network(unsigned long* user)
+{
+	return -ENOSYS;
+}
+
+#endif /* ! CONFIG_SECURITY_DISABLENETWORK */
+
SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep,
struct getcpu_cache __user *, unused)
{
diff --git a/security/Kconfig b/security/Kconfig
index 226b955..afd7f76 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -137,6 +137,17 @@ config LSM_MMAP_MIN_ADDR
this low address space will need the permission specific to the
systems running LSM.

+config SECURITY_DISABLENETWORK
+	bool "Socket and networking discretionary access control"
+	depends on SECURITY_NETWORK
+	help
+	  This enables processes to drop networking privileges via
+	  prctl(PR_SET_NETWORK, PR_NETWORK_OFF).
+
+	  See Documentation/disablenetwork.txt for more information.
+
+	  If you are unsure how to answer this question, answer N.
+
source security/selinux/Kconfig
source security/smack/Kconfig
source security/tomoyo/Kconfig
--
1.6.6.rc2
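
The one-way behaviour of prctl_set_network() can be exercised outside the
kernel. What follows is a minimal userspace model, assuming a hypothetical
struct task_model in place of task_struct and a hypothetical
model_set_network() that performs the same two checks without the
copy_from_user() step. The flag values are copied from the prctl.h hunk above.

```c
#include <errno.h>

/* Values taken from the include/linux/prctl.h hunk above. */
#define PR_NETWORK_ON        0UL
#define PR_NETWORK_OFF       1UL
#define PR_NETWORK_ALL_FLAGS 1UL

/* Hypothetical stand-in for the per-task "network" field. */
struct task_model {
	unsigned long network;
};

/* Mirrors the checks in prctl_set_network(), minus the user copy. */
static long model_set_network(struct task_model *t, unsigned long requested)
{
	if (requested & ~PR_NETWORK_ALL_FLAGS)
		return -EINVAL;		/* undefined flag bits */
	if (t->network & ~requested)
		return -EPERM;		/* would clear a bit that is already set */
	t->network = requested;
	return 0;
}
```

Starting from PR_NETWORK_ON, a request for PR_NETWORK_OFF succeeds, a later
request for PR_NETWORK_ON fails with -EPERM, and repeating PR_NETWORK_OFF
succeeds as a no-op.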

Michael Stone

Dec 26, 2009, 8:10:02 PM

Explain the purpose, implementation, and semantics of the disablenetwork
facility.

Also reference some example userland clients.

Signed-off-by: Michael Stone <mic...@laptop.org>
---

Documentation/disablenetwork.txt | 84 ++++++++++++++++++++++++++++++++++++++
1 files changed, 84 insertions(+), 0 deletions(-)
create mode 100644 Documentation/disablenetwork.txt

diff --git a/Documentation/disablenetwork.txt b/Documentation/disablenetwork.txt
new file mode 100644
index 0000000..c885502
--- /dev/null
+++ b/Documentation/disablenetwork.txt
@@ -0,0 +1,84 @@
+Disablenetwork Purpose
+----------------------
+
+Daniel Bernstein has observed [1] that security-conscious userland processes
+may benefit from the ability to irrevocably give up the ability to create,
+bind, or connect sockets, or to send messages, except in the case of previously
+connected sockets or AF_UNIX filesystem sockets.
+
+This facility is particularly attractive to security platforms like OLPC
+Bitfrost [2] and to isolation programs like Rainbow [3] and Plash [4] because:
+
+ * it integrates well with standard techniques for writing privilege-separated
+ Unix programs
+
+ * it integrates well with the need to perform limited socket I/O, e.g., when
+ running X clients
+
+ * it's available to unprivileged programs
+
+ * it's a discretionary feature available to all of distributors,
+ administrators, authors, and users
+
+ * its effect is entirely local, rather than global (like netfilter)
+
+ * it's simple enough to have some hope of being used correctly
+
+
+Implementation
+--------------
+
+The initial userland interface for accessing the disablenetwork functionality
+is provided through the prctl() framework via a new pair of options named
+PR_{GET,SET}_NETWORK and a new flag named PR_NETWORK_OFF.
+
+The PR_{GET,SET}_NETWORK options access and modify a new (conditionally
+compiled) task_struct flags field named "network".
+
+Finally, the pre-existing
+
+ security_socket_create(),
+ security_socket_bind(),
+ security_socket_connect(),
+ security_socket_sendmsg(), and
+ security_ptrace_access_check()
+
+security hooks are modified to call the corresponding disablenetwork_*
+discretionary access control functions. These functions return -EPERM or 0 as
+described below.
+
+Semantics
+---------
+
+current->network is a task_struct flags field which is preserved across all
+variants of fork() and exec().
+
+Writes which attempt to clear bits in current->network return -EPERM.
+
+The default value for current->network is named PR_NETWORK_ON and is defined
+to be 0.
+
+Presently, only one flag is defined: PR_NETWORK_OFF.
+
+More flags may be defined in the future if they become needed.
+
+Attempts to set undefined flags result in -EINVAL.
+
+When PR_NETWORK_OFF is set, the disablenetwork security hooks for socket(),
+bind(), connect(), sendmsg(), and ptrace() will return -EPERM or 0.
+
+Exceptions are made for
+
+ * processes manipulating an AF_UNIX socket or,
+ * processes calling sendmsg() on a previously connected socket
+ (i.e. one with msg.msg_name == NULL && msg.msg_namelen == 0) or
+ * processes calling ptrace() on a target process which shares every
+ networking restriction flag set in current->network.
+
+References
+----------
+
+[1]: http://cr.yp.to/unix/disablenetwork.html
+[2]: http://wiki.laptop.org/go/OLPC_Bitfrost
+[3]: http://wiki.laptop.org/go/Rainbow
+[4]: http://plash.beasts.org/
--
1.6.6.rc2
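
The sendmsg() and ptrace() exceptions in the Semantics section reduce to two
predicates. The sketch below is a userspace model only: model_sendmsg_check()
and model_ptrace_check() are hypothetical names, the socket family is passed in
explicitly rather than read from a struct socket, and the ptrace() rule is
read as "the target must carry every flag the tracer carries".

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

#define PR_NETWORK_OFF 1UL	/* value taken from the patch */

/* A send with no explicit destination goes to the socket's connected peer. */
static int model_sendmsg_check(unsigned long flags, int family,
			       const struct msghdr *msg)
{
	if (!(flags & PR_NETWORK_OFF))
		return 0;		/* no restriction in force */
	if (family == AF_UNIX)
		return 0;		/* AF_UNIX sockets are exempt */
	if (msg->msg_name == NULL && msg->msg_namelen == 0)
		return 0;		/* previously connected socket */
	return -EPERM;
}

/* Allowed only if no tracer flag is missing from the target's flags. */
static int model_ptrace_check(unsigned long tracer_flags,
			      unsigned long target_flags)
{
	return (tracer_flags & ~target_flags) ? -EPERM : 0;
}
```

Under this reading a restricted tracer may attach to an equally restricted
target but not to an unrestricted one, and an unrestricted tracer is never
affected.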

Tetsuo Handa

Dec 26, 2009, 8:40:01 PM

Tetsuo Handa wrote:
> sendmsg(fd, (struct sockaddr *) &addr, sizeof(addr));
I meant
sendto(fd, buffer, len, 0, (struct sockaddr *) &addr, sizeof(addr));

Michael Stone wrote:
> +Exceptions are made for

> + * processes calling sendmsg() on a previously connected socket
> + (i.e. one with msg.msg_name == NULL && msg.msg_namelen == 0) or

What should we do for connectionless protocols (e.g. UDP) when the
destination has already been configured by a previous connect() request?

struct sockaddr_in addr = { ... };
int fd2 = socket(PF_INET, SOCK_DGRAM, 0);
connect(fd2, (struct sockaddr *) &addr, sizeof(addr));
prctl( ... );
sendto(fd2, buffer, len, 0, NULL, 0); /* Should we allow this? */
sendto(fd2, buffer, len, 0, (struct sockaddr *) &addr, sizeof(addr)); /* Should we reject this? */

Serge E. Hallyn

Dec 26, 2009, 10:20:02 PM

Is there any reason not to handle these in
disablenetwork_security_prctl()
?

Other than that, this looks quite good to me... (No need to
initialize ret=0 in your security_* updates, to get pedantic,
that's all I noticed)

I'll give it a closer look on monday before I ack.

thanks,
-serge

Pavel Machek

Dec 27, 2009, 3:00:01 AM

> index 26a6b73..b48f021 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -35,6 +35,7 @@
> #include <linux/cpu.h>
> #include <linux/ptrace.h>
> #include <linux/fs_struct.h>
> +#include <linux/prctl_network.h>
>
> #include <linux/compat.h>
> #include <linux/syscalls.h>

Something seems to be wrong with whitespace here. Damaged patch?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Tetsuo Handa

Dec 27, 2009, 3:40:02 AM

Michael Stone wrote:
> Further suggestions?

I expect that the future figure of this "disablenetwork" functionality becomes
"disablesyscall" functionality.

What about defining two types of masks, one is applied throughout the rest of
the task_struct's lifetime (inheritable mask), the other is cleared when
execve() succeeds (local mask)?

When an application is sure that "I know I don't need to call execve()" or
"I know execve()d programs need not to call ...()" or "I want execve()d
programs not to call ...()", the application sets inheritable mask.
When an application is not sure about what syscalls the execve()d programs
will call but is sure that "I know I don't need to call ...()", the application
sets local mask.

When I started TOMOYO project in 2003, I implemented above two types of masks.
I found that the characteristics of task_struct (i.e. duplicated upon fork(),
modified upon execve(), deleted upon exit()) suits well for implementing
discretionary dropping privileges.

Application writers know better what syscalls the application will call than
application users. I think that combination of policy based access control
(which restricts operations from outside applications, like SELinux, Smack,
TOMOYO) and voluntary access control (which restricts operations from inside
applications, like disablenetwork) is a good choice. Above two types of masks
can give application writers chance to drop unneeded privileges (in other
words, chance to disable unneeded syscalls).
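
The two-mask proposal can be written down as a small state model. This is a
sketch under assumptions, not TOMOYO code: struct mask_model and the three
helpers are hypothetical names, fork() is assumed to copy both masks, and the
effective restriction is assumed to be the union of the two.

```c
/* Hypothetical model of the two masks proposed above. */
struct mask_model {
	unsigned long inheritable;	/* kept for the rest of the task's lifetime */
	unsigned long local;		/* cleared when execve() succeeds */
};

/* A bit set in either mask restricts the task. */
static unsigned long mask_effective(const struct mask_model *m)
{
	return m->inheritable | m->local;
}

/* fork(): the child starts with a copy of both masks. */
static struct mask_model mask_fork(const struct mask_model *parent)
{
	return *parent;
}

/* Successful execve(): only the inheritable mask survives. */
static void mask_exec(struct mask_model *m)
{
	m->local = 0;
}
```

With inheritable = 0x1 and local = 0x2, a forked child is restricted by 0x3
until it calls execve(), after which only 0x1 remains. The parent is
unaffected.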

Pavel Machek

Dec 27, 2009, 3:40:02 AM

On Sun 2009-12-27 17:36:48, Tetsuo Handa wrote:
> Michael Stone wrote:
> > Further suggestions?
>
> I expect that the future figure of this "disablenetwork" functionality becomes
> "disablesyscall" functionality.
>
> What about defining two types of masks, one is applied throughout the rest of
> the task_struct's lifetime (inheritable mask), the other is cleared when
> execve() succeeds (local mask)?
>
> When an application is sure that "I know I don't need to call execve()" or
> "I know execve()d programs need not to call ...()" or "I want execve()d
> programs not to call ...()", the application sets inheritable mask.
> When an application is not sure about what syscalls the execve()d programs
> will call but is sure that "I know I don't need to call ...()", the application
> sets local mask.

Syscalls are very wrong granularity for security system. But easy to
implement, see seccomp.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Al Viro

Dec 27, 2009, 4:00:02 AM

On Sun, Dec 27, 2009 at 05:36:48PM +0900, Tetsuo Handa wrote:

> Application writers know better what syscalls the application will call than
> application users.

Aren't you forgetting about libc? Seriously, any interface along the
lines of "pass a set of syscall numbers to kernel" is DOA:
* syscall numbers are architecture-dependent
* there are socketcall-style multiplexors (sys_ipc, anyone?)
* libc is free to substitute one for another
* libc is free to do so in arch-specific manner
* libc is free to do so in kernel-revision-specific manner
* libc is free to do so in libc-revision-specific manner
(... and does all of the above)
* new syscalls get added
* e.g. on sparc64 32bit task can issue 64bit syscalls

Valdis.K...@vt.edu

Dec 27, 2009, 6:30:02 AM

On Sun, 27 Dec 2009 17:36:48 +0900, Tetsuo Handa said:

> What about defining two types of masks, one is applied throughout the rest of
> the task_struct's lifetime (inheritable mask), the other is cleared when
> execve() succeeds (local mask)?

A mask of permitted syscalls. You've re-invented SECCOMP. ;)

> When an application is sure that "I know I don't need to call execve()" or

OK, you *might* know that. Or more likely you just *think* you know that (ever
had a library routine do an execve() call behind your back?). Or glibc
decides to do a clone2() call behind your back instead of execve(),
except on ARM where it does either a clone_nommu47() or clone_backflip() :)

> "I know execve()d programs need not to call ...()"

Unless you've done a code review of the exec'ed program, you don't know.

The big problem is that it's *not* sufficient to just run an strace or two
of normal runs and proclaim "this is the set of syscalls I need" - you need
to check all the error paths in all the shared libraries too. It's no fun
when a program errors out, tries to do a syslog() of the fact - and then
*that* errors out too, causing the program to go into an infinite loop trying
to report the previous syslog() call just failed...

> "I want execve()d programs not to call ...()",

Congrats - you just re-invented the Sendmail capabilities bug. ;)

This stuff is harder than it looks, especially when you realize that
syscall-granularity is almost certainly not the right security model.

> Application writers know better what syscalls the application will call than
> application users.

But the application user will know better than the writer what *actual*
security constraints need to be applied. "I don't care *what* syscalls the
program uses, it's not allowed to access resource XYZ".

Tetsuo Handa

Dec 27, 2009, 6:50:01 AM

Pavel Machek wrote:
> Syscalls are very wrong granularity for security system. But easy to
> implement, see seccomp.

Quoting from http://en.wikipedia.org/wiki/Seccomp
> It allows a process to make a one-way transition into a "secure" state where
> it cannot make any system calls except exit(), read() and write() to
> already-open file descriptors.

I think seccomp() is too much restricted to apply for general applications.
Most applications will need some other syscalls in addition to exit(), read()
and write(). Most applications cannot use seccomp().

What I want to do is similar to seccomp(), but allows userland process to
forbid some syscalls like execve(), mount(), chroot(), link(), unlink(),
socket(), bind(), listen() etc. selectively.

Al Viro wrote:
>> Application writers know better what syscalls the application will call than
>> application users.
>

> Aren't you forgetting about libc? Seriously, any interface along the
> lines of "pass a set of syscall numbers to kernel" is DOA:

We can determine what syscalls we need from application's code and libc's code,
can't we?
Otherwise, I think disablenetwork can't be used. If we simply forbid use of
sendmsg() because application's code doesn't use UDP sockets, DNS requests (UDP
port 53) by libc's code cannot be handled and applications will stop working.
We must know what syscalls we need to allow when we forbid some syscalls.

> * syscall numbers are architecture-dependent
> * there are socketcall-style multiplexors (sys_ipc, anyone?)
> * libc is free to substitute one for another
> * libc is free to do so in arch-specific manner
> * libc is free to do so in kernel-revision-specific manner
> * libc is free to do so in libc-revision-specific manner
> (... and does all of the above)
> * new syscalls get added
> * e.g. on sparc64 32bit task can issue 64bit syscalls

I don't mean to tell the kernel by "syscall numbers".
To be able to handle socketcall-style multiplexors, we will need a hook inside
each syscall functions.

Al Viro

Dec 27, 2009, 7:20:02 AM

On Sun, Dec 27, 2009 at 08:49:17PM +0900, Tetsuo Handa wrote:

> We can determine what syscalls we need from application's code and libc's code,
> can't we?

_Which_ libc? And no, I'm not talking about other implementations; even
glibc is more than enough. It changes and it *does* change the set of
syscalls used to implement given function.

I'm not disagreeing about what's seccomp worth, BTW.

Andi Kleen

Dec 27, 2009, 7:50:02 AM

On Sun, Dec 27, 2009 at 05:36:48PM +0900, Tetsuo Handa wrote:

> Michael Stone wrote:
> > Further suggestions?
>
> I expect that the future figure of this "disablenetwork" functionality becomes
> "disablesyscall" functionality.

That's basically apparmor. I believe it has been re-submitted
recently.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.

Serge E. Hallyn

Dec 27, 2009, 10:00:01 AM

Quoting Tetsuo Handa (penguin...@I-love.SAKURA.ne.jp):
> Pavel Machek wrote:
> > Syscalls are very wrong granularity for security system. But easy to
> > implement, see seccomp.
>
> Quoting from http://en.wikipedia.org/wiki/Seccomp
> > It allows a process to make a one-way transition into a "secure" state where
> > it cannot make any system calls except exit(), read() and write() to
> > already-open file descriptors.
>
> I think seccomp() is too much restricted to apply for general applications.
> Most applications will need some other syscalls in addition to exit(), read()
> and write(). Most applications cannot use seccomp().
>
> What I want to do is similar to seccomp(), but allows userland process to
> forbid some syscalls like execve(), mount(), chroot(), link(), unlink(),
> socket(), bind(), listen() etc. selectively.

The nice thing about the disablenetwork module is that (AFAICS so far)
it actually is safe for an unprivileged user to do. I can't think of
any setuid-root software which, if started with restricted-network by
an unprivileged user, would become unsafe rather than simply failing (*1).

Adding syscalls becomes much scarier.

-serge

*1 - Michael Stone, without looking back over the patches, do you also
restrict opening netlink sockets? Should we worry about preventing
an error message from being sent to the audit daemon?

Michael Stone

Dec 27, 2009, 10:50:01 AM

Serge Hallyn writes:

> Michael Stone, without looking back over the patches, do you also
> restrict opening netlink sockets?

The current version of the patch restricts netlink sockets which were not bound
to an address before calling disablenetwork(). It does so primarily on the
grounds of "fail safe", due to the following sorts of discussions and
observations:

http://kerneltrap.org/mailarchive/linux-kernel/2007/12/7/493793/thread
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-5461
http://marc.info/?l=linux-kernel&m=125448727130301&w=2

I would be willing to entertain an argument that some kind of exemption for
AF_NETLINK ought to be introduced but I'd need to hear some more details before
I could implement it and before I could satisfy myself that the result was
sound.

> Should we worry about preventing an error message from being sent to the
> audit daemon?

I've considered the matter and I don't see much to worry about at this time.

The first reason why I'm not too worried is that anyone in a position to use
disablenetwork for nefarious purposes is also probably able to use ptrace(),
kill(), and/or LD_PRELOAD to similar ends.

The second reason why I'm not too worried is that I believe it to be
straightforward to use the pre-existing MAC frameworks to prevent individually
important processes from dropping networking privileges.

Do you have a specific concern in mind not addressed by either of these
observations?

Regards,

Michael

Serge E. Hallyn

Dec 27, 2009, 11:00:02 AM

Quoting Michael Stone (mic...@laptop.org):
> Serge Hallyn writes:
>
>> Michael Stone, without looking back over the patches, do you also
>> restrict opening netlink sockets?
>
> The current version of the patch restricts netlink sockets which were not bound
> to an address before calling disablenetwork(). It does so primarily on the
> grounds of "fail safe", due to the following sorts of discussions and
> observations:
>
> http://kerneltrap.org/mailarchive/linux-kernel/2007/12/7/493793/thread
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-5461
> http://marc.info/?l=linux-kernel&m=125448727130301&w=2
>
> I would be willing to entertain an argument that some kind of exemption for
> AF_NETLINK ought to be introduced but I'd need to hear some more details before
> I could implement it and before I could satisfy myself that the result was
> sound.
>
>> Should we worry about preventing an error message from being sent to the
>> audit daemon?
>
> I've considered the matter and I don't see much to worry about at this
> time.

I don't either, because I don't know of userspace programs other than
/bin/login (and I'm guessing at that) using netlink to send audit messages,
but I could be wrong, and there could be "important software" out there
that does so.

> The first reason why I'm not too worried is that anyone in a position to use
> disablenetwork for nefarious purposes is also probably able to use ptrace(),
> kill(), and/or LD_PRELOAD to similar ends.

How do you mean? I thought that disabling network was a completely
unprivileged operation? And subsequently executing a setuid-root
application won't reset the flag.

> The second reason why I'm not too worried is that I believe it to be
> straightforward to use the pre-existing MAC frameworks to prevent individually
> important processes from dropping networking privileges.
>
> Do you have a specific concern in mind not addressed by either of these
> observations?

Near as I can tell the worst one could do would be to prevent remote
admins from getting useful audit messages, which could give you unlimited
time to keep re-trying the server, on your quest to a brute-force attack
of some sort, i.e. restarting the server with random passwords, and now
no audit msg about the wrong password gets generated, so you're free to
exhaust the space of valid passwords.

Not saying I'm all that worried about it - just something that came to
mind.

-serge

Michael Stone

Dec 27, 2009, 11:00:02 AM

Tetsuo Handa wrote:

> I expect that the future figure of this "disablenetwork" functionality
> becomes "disablesyscall" functionality.

Thanks for the suggestion, but I'm not interested in pursuing a generic
disablesyscall facility at this time.

Michael

Michael Stone

Dec 27, 2009, 11:30:01 AM

Tetsuo Handa wrote:
> Michael Stone wrote:
>> +Exceptions are made for
>> + * processes calling sendmsg() on a previously connected socket
>> + (i.e. one with msg.msg_name == NULL && msg.msg_namelen == 0) or
>
> What should we do for connectionless protocols (e.g. UDP) when the
> destination has already been configured by a previous connect() request?
>
> struct sockaddr_in addr = { ... };
> int fd2 = socket(PF_INET, SOCK_DGRAM, 0);
> connect(fd2, (struct sockaddr *) &addr, sizeof(addr));
> prctl( ... );
> sendto(fd2, buffer, len, 0, NULL, 0); /* Should we allow this? */

This call should be allowed. man 2 send states that this call is equivalent to

send(fd2, buffer, len, 0)

and, since the flags field is 0, to

write(fd2, buffer, len)

which are both clearly permitted.
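
The equivalence of the three calls on a connected datagram socket can be
checked directly. The sketch below assumes that an AF_UNIX socketpair() is an
acceptable stand-in for a connect()ed UDP socket. It shows only the send-side
API equivalence and says nothing about the patch itself. The helper name
count_equivalent_sends() is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sends the same payload three ways over a connected datagram socket.
 * Returns how many datagrams arrived intact at the peer, or -1 if the
 * socket pair could not be created or a send failed.
 */
static int count_equivalent_sends(void)
{
	static const char payload[] = "ping";
	char buf[sizeof(payload)];
	int sv[2], i, ok = 0;

	/* An AF_UNIX pair stands in for a connect()ed UDP socket. */
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	if (sendto(sv[0], payload, sizeof(payload), 0, NULL, 0) < 0 ||
	    send(sv[0], payload, sizeof(payload), 0) < 0 ||
	    write(sv[0], payload, sizeof(payload)) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Each call above should have produced one identical datagram. */
	for (i = 0; i < 3; i++) {
		ssize_t n = recv(sv[1], buf, sizeof(buf), 0);

		if (n == (ssize_t)sizeof(payload) &&
		    memcmp(buf, payload, sizeof(payload)) == 0)
			ok++;
	}

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```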

> sendto(fd2, buffer, len, 0, (struct sockaddr *) &addr, sizeof(addr)); /* Should we reject this? */

It is reasonable to reject this call because it is not required to be
equivalent to a send() or write() call.

(In fact, the current UDP implementation unconditionally uses the addr argument
passed to sendmsg in favor of the socket addr whenever it exists.)

However, it might also be reasonable to permit the send when the call would be
equivalent to a permitted send() or write() call: for example, when the socket
destination address exists and matches the message destination address.

Unfortunately, I don't think that we have an appropriate generic address
comparison function for deciding this equivalence. Am I mistaken?
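
For a single address family the comparison is simple. A minimal sketch for
AF_INET only follows. The helper inet4_same_dest() is hypothetical, it is
userspace code rather than kernel code, and it deliberately treats every other
family as "not equal" so that a caller would fail safe. It does not answer the
question of a generic, family-independent comparison.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: true if two IPv4 socket addresses name the same
 * destination (same address and port). Handles AF_INET only.
 */
static bool inet4_same_dest(const struct sockaddr *a, socklen_t alen,
			    const struct sockaddr *b, socklen_t blen)
{
	const struct sockaddr_in *ia, *ib;

	/* Refuse anything too short to be a sockaddr_in. */
	if (alen < sizeof(*ia) || blen < sizeof(*ib))
		return false;
	if (a->sa_family != AF_INET || b->sa_family != AF_INET)
		return false;

	ia = (const struct sockaddr_in *)a;
	ib = (const struct sockaddr_in *)b;
	return ia->sin_port == ib->sin_port &&
	       ia->sin_addr.s_addr == ib->sin_addr.s_addr;
}
```

Comparing individual fields rather than using memcmp() on the whole structure
avoids depending on the contents of the sin_zero padding.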

Regards, and thanks for your questions,

Michael

P.S. - I would be happy to include a brief explanatory comment in the code
defining this test since this is by far the most complex test in the patch. Any
suggestions on what it might say?

Michael Stone

Dec 27, 2009, 11:40:02 AM

Serge Hallyn writes:

> Michael Stone writes:
>> The first reason why I'm not too worried is that anyone in a position to use
>> disablenetwork for nefarious purposes is also probably able to use ptrace(),
>> kill(), and/or LD_PRELOAD to similar ends.
>
> How do you mean?

I meant that, with the current interface, to set disablenetwork for pid P, you
have to either be pid P or have been one of P's ancestors. In either case, you
have lots of opportunity to mess with P's environment.

> I thought that disabling network was a completely
> unprivileged operation? And subsequently executing a setuid-root
> application won't reset the flag.

Correct and correct for the current patches.

>> The second reason why I'm not too worried is that I believe it to be
>> straightforward to use the pre-existing MAC frameworks to prevent individually
>> important processes from dropping networking privileges.
>>
>> Do you have a specific concern in mind not addressed by either of these
>> observations?
>
> Near as I can tell the worst one could do would be to prevent remote
> admins from getting useful audit messages, which could give you unlimited
> time to keep re-trying the server, on your quest to a brute-force attack
> of some sort, i.e. restarting the server with random passwords, and now
> no audit msg about the wrong password gets generated, so you're free to
> exhaust the space of valid passwords.
>
> Not saying I'm all that worried about it - just something that came to
> mind.

I'll think about it further. Fortunately, there's no need to be hasty. :)

Michael

Pavel Machek

Dec 27, 2009, 1:10:01 PM

> >I thought that disabling network was a completely
> >unprivileged operation? And subsequently executing a setuid-root
> >application won't reset the flag.
>
> Correct and correct for the current patches.

Then you are introducing a security problem. User can now mess with
setuid0 binary.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pavel Machek

Dec 27, 2009, 2:10:02 PM

Hi!

> > I think seccomp() is too much restricted to apply for general applications.
> > Most applications will need some other syscalls in addition to exit(), read()
> > and write(). Most applications cannot use seccomp().
> >
> > What I want to do is similar to seccomp(), but allows userland process to
> > forbid some syscalls like execve(), mount(), chroot(), link(), unlink(),
> > socket(), bind(), listen() etc. selectively.
>
> The nice thing about the disablenetwork module is that (AFAICS so far)
> it actually is safe for an unprivileged user to do. I can't think of
> any setuid-root software which, if started with restricted-network by
> an unprivileged user, would become unsafe rather than simply failing.

"I can't see" is not strong enough test, I'd say.

For example, I can easily imagine something like pam falling back to
local authentication when network is unavailable. If you disable
network for su...

It would be also extremely easy to DoS something like sendmail -- if
it forks into background and then serves other users' requests.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Michael Stone

Dec 28, 2009, 1:10:01 AM

Pavel Machek wrote:
> "I can't see" is not strong enough test, I'd say.
>
> For example, I can easily imagine something like pam falling back to
> local authentication when network is unavailable. If you disable
> network for su...
>
> It would be also extremely easy to DoS something like sendmail -- if
> it forks into background and then serves other users' requests.

Pavel,

I spent some time this afternoon reflecting on the scenarios that you sketched
above. This reflection resulted in three concrete responses:

1. Anyone depending on their network for authentication already has to deal
with availability faults. disablenetwork doesn't change anything
fundamental there.

2. Anyone able to use disablenetwork to block a privilege escalation via su
or to influence sendmail will be able to disrupt the privilege escalation
or mail transfer by manipulating the ancestors of su or sendmail in plenty
of other ways including, for example, via ptrace(), kill(), manipulation
of PATH, manipulation of X11 events and IPC, manipulation of TTYs, and so
on.

3. As I pointed out before, disablenetwork _is_ controlled by a build-time
configuration option, its use _is_ still subject to any existing MAC
policy, and it _is_ easy to control for simply by talking to a
known-unrestricted process over an unrestricted IPC channel like a Unix
socket.

and a short meta-response:

As I see it, the whole point of isolation facilities like disablenetwork is
to convert _nasty_ faults like secrecy and integrity faults into _local_
availability faults. Consequently, it is completely unsurprising that we
can't meet stringent availability goals via isolation without either relaxing
our other security goals or falling back to strong assumptions about the
state of our initial environment.

However, we should not therefore ignore the bottom line which is that
the additional isolation enabled by disablenetwork represents an excellent
security risk/reward tradeoff for many people and should be made more widely
available to these people on these grounds.

Regards,

Michael

P.S. - Thanks again for your assistance in thinking through these scenarios and
in refining the security case for the disablenetwork feature.

Pavel Machek

Dec 28, 2009, 5:20:03 AM

Hi!

> >"I can't see" is not strong enough test, I'd say.
> >
> >For example, I can easily imagine something like pam falling back to
> >local authentication when network is unavailable. If you disable
> >network for su...
> >
> >It would be also extremely easy to DoS something like sendmail -- if
> >it forks into background and then serves other users' requests.
>
> Pavel,
>
> I spent some time this afternoon reflecting on the scenarios that you sketched
> above. This reflection resulted in three concrete responses:

There's more. You are introducing security holes. Don't.

> 1. Anyone depending on their network for authentication already has to deal
> with availability faults. disablenetwork doesn't change anything
> fundamental there.

Actually it does. Policy may well be "If the network works, no one can
log in locally, because administration is normally done over
network. If the network fails, larger set of people is allowed in,
because something clearly went wrong and we want anyone going around
to fix it."

> 2. Anyone able to use disablenetwork to block a privilege escalation via su
> or to influence sendmail will be able to disrupt the privilege escalation
> or mail transfer by manipulating the ancestors of su or sendmail in plenty
> of other ways including, for example, via ptrace(), kill(), manipulation
> of PATH, manipulation of X11 events and IPC, manipulation of TTYs, and so
> on.

Please learn how setuid works. No, you can't ptrace su. Yes, su has to
deal with poisonous PATH. setuid programs are generally carefully
written to handle _known_ problems. You are adding disablenetwork that
they'll need to handle, and that's bad.

> 3. As I pointed out before, disablenetwork _is_ controlled by a build-time
> configuration option, its use _is_ still subject to any existing MAC

CONFIG_ADD_SECURITY_HOLE is still a bad idea.

You should either:

a) make disablenetwork reset to "enablenetwork" during setuid exec

or

b) disallow setuid execs for tasks that have network disabled.
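
To make the difference between (a) and (b) concrete, here is a minimal
user-space sketch of the decision taken at exec time. It is only a toy model:
struct exec_task, enum setuid_policy, and model_setuid_exec() are hypothetical
names for illustration and do not appear in the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* The two behaviours proposed above. */
enum setuid_policy {
    POLICY_RESET_ON_SETUID, /* option (a) */
    POLICY_DENY_SETUID      /* option (b) */
};

/* Toy stand-in for the per-task network state. */
struct exec_task {
    bool network_disabled;
};

/* Returns 0 if the exec may proceed, -EPERM if it is refused. */
int model_setuid_exec(struct exec_task *t, bool file_is_setuid,
                      enum setuid_policy policy)
{
    /* Non-setuid execs and unrestricted tasks need no special handling. */
    if (!file_is_setuid || !t->network_disabled)
        return 0;

    if (policy == POLICY_DENY_SETUID)
        return -EPERM; /* (b): the setuid program never runs restricted */

    t->network_disabled = false; /* (a): the setuid program sees the network */
    return 0;
}
```

Under (a) the restriction is lifted for the new program; under (b) the exec
itself fails, so the restriction is never lifted.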

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Valdis.K...@vt.edu

Dec 28, 2009, 9:40:02 AM

On Mon, 28 Dec 2009 11:10:06 +0100, Pavel Machek said:

> a) make disablenetwork reset to "enablenetwork" during setuid exec

That won't work either. If you only make it 'setuid==0' binaries, you still
break 'setuid-FOO' binaries that require the net. If you just check the setuid
bit, it allows a trivial escape by creating a setuid-yourself binary and using
that to exec something else (now with network access, because we apparently
don't have a way to remember the previous setting).

> b) disallow setuid execs for tasks that have network disabled.

This is probably safer...

Michael Stone

Dec 28, 2009, 11:30:02 AM

>> Pavel,
>>
>> I spent some time this afternoon reflecting on the scenarios that you sketched
>> above. This reflection resulted in three concrete responses:
>
> There's more. You are introducing security holes. Don't.

My users care about things like stopping compromised processes from leaking
their documents or sending spam; they do not use the network in the ways you
seem to be concerned about.

>> 1. Anyone depending on their network for authentication already has to deal
>> with availability faults. disablenetwork doesn't change anything
>> fundamental there.
>
> Actually it does. Policy may well be "If the network works, no one can
> log in locally, because administration is normally done over
> network. If the network fails, larger set of people is allowed in,
> because something clearly went wrong and we want anyone going around
> to fix it."

Have you actually seen this security policy in real life? I ask because it
seems quite far-fetched to me. Networks are just too easy to attack. Seems to
me, from this casual description, that you're just asking to be ARP- or
DNS-poisoned and rooted with this one.

>> 2. Anyone able to use disablenetwork to block a privilege escalation via su
>> or to influence sendmail will be able to disrupt the privilege escalation
>> or mail transfer by manipulating the ancestors of su or sendmail in plenty
>> of other ways including, for example, via ptrace(), kill(), manipulation
>> of PATH, manipulation of X11 events and IPC, manipulation of TTYs, and so
>> on.
>
> Please learn how setuid works.

I am quite familiar with how setuid works. I was suggesting a number of ways to
modify the behavior of su's *ancestors*; not su. (I apologize that my writing
was not more clear on this point.)

In retrospect, substituting "abort()" for "disablenetwork()" better explains my
point. Who can call disablenetwork() to cause a problem who can't just as well
have called abort() or kill(0, SIGSTOP) at the same time?

Still, I take your point that there may be people out there who have written
configurations for setuid executables under the belief that their networks are
reliable and available in the presence of attackers.

>> 3. As I pointed out before, disablenetwork _is_ controlled by a build-time
>> configuration option, its use _is_ still subject to any existing MAC
>
> CONFIG_ADD_SECURITY_HOLE is still bad idea.

Perhaps for your users. For me and for the users of my software, having
CONFIG_SECURITY_DISABLENETWORK is far better than not having it because it
permits us to close many other far more significant holes.

> You should either:

> a) make disablenetwork reset to "enablenetwork" during setuid exec

> b) disallow setuid execs for tasks that have network disabled.

Neither of these works. The first is incorrect because a disablenetwork'ed
process could transmit anything it wants through ping. The second is one that I
feel is unsafe because I don't feel that I can predict its consequences.

However, there's a third option that I think might work. What do you think of
treating being network-disabled the same way we treat RLIMIT_NOFILE? That is,
what about:

c) permit capable processes (such as euid 0) to remove networking restrictions
by further calls to prctl(PR_SET_NETWORK)?
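
As a rough sketch of (c), assuming rlimit-like semantics where any task may
add restrictions but only a capable task may remove them: the struct, the
NET_DISABLED bit, and model_set_network() below are hypothetical names used
for illustration, not the code from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real patch defines its own constants. */
#define NET_DISABLED 0x1u

/* Toy stand-in for the per-task state kept in current->network. */
struct task_model {
    unsigned int network; /* currently active restriction bits */
    bool capable;         /* models "capable process (such as euid 0)" */
};

/*
 * Models prctl(PR_SET_NETWORK) under option (c): setting new restriction
 * bits is always allowed; clearing an existing bit requires capability.
 */
int model_set_network(struct task_model *t, unsigned int flags)
{
    unsigned int cleared = t->network & ~flags;

    if (cleared != 0 && !t->capable)
        return -EPERM;
    t->network = flags;
    return 0;
}
```

Note that in this model a privileged program regains the network only if it
explicitly makes the call; nothing is lifted automatically at exec.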

Regards,

Michael

Serge E. Hallyn

Dec 28, 2009, 1:20:02 PM

Quoting Pavel Machek (pa...@ucw.cz):
> Hi!
>
> > > I think seccomp() is too much restricted to apply for general applications.
> > > Most applications will need some other syscalls in addition to exit(), read()
> > > and write(). Most applications cannot use seccomp().
> > >
> > > What I want to do is similar to seccomp(), but allows userland process to
> > > forbid some syscalls like execve(), mount(), chroot(), link(), unlink(),
> > > socket(), bind(), listen() etc. selectively.
> >
> > The nice thing about the disablenetwork module is that (AFAICS so far)
> > it actually is safe for an unprivileged user to do. I can't think of
> > any setuid-root software which, if started with restricted-network by
> > an unprivileged user, would become unsafe rather than simply
> > failing.
>
> "I can't see" is not strong enough test, I'd say.
>
> For example, I can easily imagine something like pam falling back to
> local authentication when network is unavailable. If you disable
> network for su...
>
> It would be also extremely easy to DoS something like sendmail -- if
> it forks into background and then serves other users' requests.

But you can just as easily unplug the network cable (or flip the
wireless switch). So in the case of authentication, either your
nsswitch.conf says to fall back to files, or it doesn't - in either
case it's what you expected...

Michael, a few possibilities have been brought up. To toss in
one more, what about making a separate capability CAP_NETWORK_REENABLE,
and requiring that in order to reset prctl(PR_SET_NETWORK) or
whatever? Then if you don't want to allow that, you can drop
CAP_NETWORK_REENABLE from your bounding set, and you'll never
be able to reset it.

It's not just a silly extra step - dropping CAP_NETWORK_REENABLE
from your bounding set requires privilege, so now we are at
least saying that it takes privilege to allow a less-privileged
process to stop a more-privileged process from regaining network
later.

That specific example isn't good - the problem is, someone has to sit
there knowing to do the prctl(PR_SET_NETWORK). It doesn't do anything
to prevent the nefarious unprivileged user from doing
prctl(PR_DROP_NETWORK) and then running a setuid-root daemon, and if the
daemon doesn't know about PR_SET_NETWORK then it still will run without
priv.

So I prefer a similar but slightly different construct - the key
being requiring privilege to be able to say "it's ok to deny privileged
software network". We can either

1. introduce a sysctl which says whether or not setuid-root
re-enables network by default,
or
2. add an extra bit to your per-task network data, which
again says "for root we re-enable network" or not.
or heck
3. make it a boot flag.

In any case, the idea would be that on your bitfrost systems init, or
some early privileged process, would say "for me and all my children,
if an unprivileged process does PR_DROP_NETWORK then that holds even
for setuid-root programs."
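
A minimal sketch of variant 2 (the extra per-task bit), assuming the bit can
only be set with privilege and is inherited by children. All names here
(struct net_state, hold_for_setuid_root, the model_* helpers) are hypothetical
and exist only to illustrate the idea.

```c
#include <stdbool.h>

/* Toy per-task network data with the proposed extra bit. */
struct net_state {
    bool network_disabled;     /* set by the unprivileged "drop network" call */
    bool hold_for_setuid_root; /* "it's ok to deny privileged software network" */
};

/* fork(): the child inherits both bits unchanged. */
struct net_state model_fork(const struct net_state *parent)
{
    return *parent;
}

/* Only a privileged caller may opt its subtree in. Returns 0 or -1. */
int model_allow_trapping(struct net_state *t, bool caller_privileged)
{
    if (!caller_privileged)
        return -1;
    t->hold_for_setuid_root = true;
    return 0;
}

/* exec of a setuid-root binary: network comes back unless the bit is set. */
void model_exec_setuid_root(struct net_state *t)
{
    if (!t->hold_for_setuid_root)
        t->network_disabled = false;
}
```

On a system where no privileged process has opted in, the default is that
setuid-root programs always get the network back.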

-serge

Serge E. Hallyn

Dec 28, 2009, 1:20:02 PM

But I'm going to wait to see a response (or new patch) about moving
this code to disablenetwork_security_prctl().

Pavel Machek

Dec 28, 2009, 4:00:03 PM

On Mon 2009-12-28 09:37:24, Valdis.K...@vt.edu wrote:
> On Mon, 28 Dec 2009 11:10:06 +0100, Pavel Machek said:
>
> > a) make disablenetwork reset to "enablenetwork" during setuid exec
>
> That won't work either. If you only make it 'setuid==0' binaries, you still
> break 'setuid-FOO' binaries that require the net. If you just check the setuid
> bit, it allows a trivial escape by creating a setuid-yourself binary and using
> that to exec something else (now with network access, because we apparently
> don't have a way to remember the previous setting).

It is really only required for binaries setuid to someone else, but
that would be too ugly. (Plus, as someone said, ping is great for
leaking data out.)

Pavel Machek

Dec 28, 2009, 4:10:01 PM

>>> 1. Anyone depending on their network for authentication already has to deal
>>> with availability faults. disablenetwork doesn't change anything
>>> fundamental there.
>>
>> Actually it does. Policy may well be "If the network works, no one can
>> log in locally, because administration is normally done over
>> network. If the network fails, larger set of people is allowed in,
>> because something clearly went wrong and we want anyone going around
>> to fix it."
>
> Have you actually seen this security policy in real life? I ask because it
> seems quite far-fetched to me. Networks are just too easy to attack. Seems to
> me, from this casual description, that you're just asking to be ARP- or
> DNS-poisoned and rooted with this one.

It is a little far-fetched; but it would make sense on a 'secure' network,
where you can't do arp attacks. You can bet that someone out there
does it.

>> Please learn how setuid works.
>
> I am quite familiar with how setuid works. I was suggesting a number of ways to
> modify the behavior of su's *ancestors*; not su. (I apologize that my writing
> was not more clear on this point.)
>
> In retrospect, substituting "abort()" for "disablenetwork()" better explains my
> point. Who can call disablenetwork() to cause a problem who can't just as well
> have called abort() or kill(0, SIGSTOP) at the same time?

You can't sigstop sendmail, right?

> Still, I take your point that there may be people out there who have written
> configurations for setuid executables under the belief that their networks are
> reliable and available in the presence of attackers.

Good.

>> You should either:
>
>> a) make disablenetwork reset to "enablenetwork" during setuid exec
>> b) disallow setuid execs for tasks that have network disabled.
>
> Neither of these work. The first is incorrect because a disablenetwork'ed
> process could transmit anything it wants through ping. The second is one that I
> feel is unsafe because I don't feel that I can predict its consequences.

Ok, you could just remove ping from your systems, but I see, b) is
the better solution.

Why do you think it is unsafe? It's clearly secure, at least from 'user
can't attack other users on shared machine'...

It may cause some failures, but given how rare setuid stuff is these
days, I doubt it.

> However, there's a third option that I think might work. What do you think of
> treating being network-disabled the same way we treat RLIMIT_NOFILE? That is,
> what about:
>
> c) permit capable processes (such as euid 0) to remove networking restrictions
> by further calls to prctl(PR_SET_NETWORK)?

I'm afraid that does not help... you'd have to audit/modify existing
setuid programs to keep system secure. No-no.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Valdis.K...@vt.edu

Dec 28, 2009, 4:30:02 PM

On Mon, 28 Dec 2009 11:31:09 EST, Michael Stone said:

> > Actually it does. Policy may well be "If the network works, no one can
> > log in locally, because administration is normally done over
> > network. If the network fails, larger set of people is allowed in,
> > because something clearly went wrong and we want anyone going around
> > to fix it."
>
> Have you actually seen this security policy in real life? I ask because it
> seems quite far-fetched to me. Networks are just too easy to attack. Seems to
> me, from this casual description, that you're just asking to be ARP- or
> DNS-poisoned and rooted with this one.

Actually, I've seen a *lot* of similar "if things fail, more people can login
to fix it" policies. For instance, a default Fedora box will require a root
password to login - but if you can't get to multi-user because the box is
scrozzled and boot into single user, no root password is required.

So if you're using Fedora and LDAP authentication, and reboot to single-user
to fix an LDAP issue, you do in fact have that policy in real life...

(And before you start shouting "but that's a stupid config to make root login
depend on LDAP", note that for many Microsoft Active Directory shops, they add
machines with Administrator rights for an Active Directory group, and then
disable local Administrator, which is exactly the same thing... Stupid or
not, it's a *very* common policy.)

Valdis.K...@vt.edu

Dec 28, 2009, 4:40:02 PM

On Mon, 28 Dec 2009 21:55:11 +0100, Pavel Machek said:

> it is really only required for binaries setuid to someone else, but
> that would be too ugly. (Plus, as someone said, ping is great for
> leaking data out.)

Hmm... How is it "too ugly"? It's just a 'euid != uid' comparison? Or
am I missing some contortion required?
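
For reference, the comparison being asked about could be sketched as below.
This is a simplified illustration only: it ignores setgid, file capabilities,
and nosuid mounts, and exec_switches_to_other_user() is a hypothetical helper,
not an existing kernel function.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * True when exec'ing the file would switch the effective uid to a user
 * other than the caller, i.e. the "setuid to someone else" case.
 */
bool exec_switches_to_other_user(uid_t caller_uid, bool setuid_bit,
                                 uid_t file_owner)
{
    return setuid_bit && file_owner != caller_uid;
}
```

With this test a setuid-yourself binary does not count, which closes the
trivial escape described earlier in the thread.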

Bryan Donlan

Dec 28, 2009, 4:40:02 PM

On Mon, Dec 28, 2009 at 3:55 PM, Pavel Machek <pa...@ucw.cz> wrote:
> On Mon 2009-12-28 09:37:24, Valdis.K...@vt.edu wrote:
>> On Mon, 28 Dec 2009 11:10:06 +0100, Pavel Machek said:
>>
>> > a) make disablenetwork reset to "enablenetwork" during setuid exec
>>
>> That won't work either. If you only make it 'setuid==0' binaries, you still
>> break 'setuid-FOO' binaries that require the net. If you just check the setuid
>> bit, it allows a trivial escape by creating a setuid-yourself binary and using
>> that to exec something else (now with network access, because we apparently
>> don't have a way to remember the previous setting).
>
>
> it is really only required for binaries setuid to someone else, but
> that would be too ugly. (Plus, as someone said, ping is great for
> leaking data out.)

No, this is not sufficient; one needs only to find a setuid process
that can be convinced to run a program with the original (pre-suid)
privileges. For example, one could invoke gpg (older versions setuid
so it can lock memory, executes user code for the passphrase input
agent) or pulseaudio (in some cases setuid to go realtime, loads user
plugins) or screen (setuid for sharing sessions, obviously executes
user programs) or at/cron (did you remember to deny access to these?)
...

Or one can target a non-root setuid program that may have security
holes - how about nethack?

While in modern distros these uses of setuid may be rare, they can
exist, and under the old security model they were safe. Not so
anymore. As such, re-enabling network access upon executing a setuid
program is not acceptable.

That said, I do feel this is a separate issue. The process should
first drop its ability to suid; then it can freely apply additional
restrictions without there being any risk of breaking setuid
applications.

In short, how does this sound:
* Add an API to allow processes to permanently revoke their own
ability to gain privileges from setuid-exec
* Add this disablenetwork facility, conditional on dropping
setuid-exec abilities

This also paves the way for:
* Allow processes that have dropped said suid ability to freely create
new namespaces (and chroot)

Which, combined with doing whatever audits are necessary to allow
cross-network-namespace uses of unix domain sockets, actually
eliminates the need for the disablenetwork API. :)
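
The two-step proposal above can be sketched as a small user-space model,
assuming both flags are one-way (they can be set but never cleared). The
struct and the model_* functions are hypothetical names for illustration;
no such API exists in the patch under discussion.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy per-task state for the two proposed restrictions. */
struct sandbox_task {
    bool no_setuid_gain;   /* task can no longer gain privileges via exec */
    bool network_disabled; /* task has given up network access */
};

/* Step 1: permanently revoke the ability to gain privileges from setuid-exec. */
void model_drop_setuid(struct sandbox_task *t)
{
    t->no_setuid_gain = true;
}

/* Step 2: disablenetwork is only granted once step 1 has been taken. */
int model_disable_network(struct sandbox_task *t)
{
    if (!t->no_setuid_gain)
        return -EPERM;
    t->network_disabled = true;
    return 0;
}

/* Does exec'ing a setuid file still hand this task new privileges? */
bool model_exec_gains_privilege(const struct sandbox_task *t,
                                bool file_is_setuid)
{
    return file_is_setuid && !t->no_setuid_gain;
}
```

Because a network-disabled task can never run a program with elevated
privileges in this model, no setuid program ever observes the restriction
while privileged.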

David Wagner

Dec 28, 2009, 5:20:02 PM

Pavel writes:
> Policy may well be "If the network works, no one can
> log in locally, because administration is normally done over
> network. If the network fails, larger set of people is allowed in,
> because something clearly went wrong and we want anyone going around
> to fix it."

Michael Stone writes:
> Have you actually seen this security policy in real life?

Pavel responds:
> Actually, I've seen a *lot* of similar [..] policies.

OK, so to translate: it sounds like the answer is No, you
haven't seen this policy in real life.

More to the point, the real question is whether this policy
is embedded in code anywhere such that Michael's mechanism would
introduce a new security hole, and if so, whether the cost of
that would outweigh the benefit of his mechanism. I think the
answer is, No, no one even has a plausible story for how this
policy might appear in some legacy executable that would then
be newly subvertible due to Michael Stone's policy. First off,
this sounds like a pretty wacko policy. Second, it's unlikely
to be embedded in a setuid-root executable that anyone can
execute. Third, if there were such a setuid-root executable
(which I've already argued is in fantasy land, but let's suppose
pigs could fly and such a thing existed in practice), there are
other ways to attack it: such as by using up all available
file descriptors and then forking and execing that executable.
Fourth, even if it existed, it would be a very rare one-off
site-specific thing. But most importantly, we're way off the
rails onto speculation. Of course you can always imagine some
conceivable scenario under which any new mechanism might have
unwanted side effects -- that's just the nature of any complex
system -- but I don't see any reasonable argument at all that
Michael's mechanism will cause more harm than good.
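
The file-descriptor attack mentioned above can be illustrated with standard
POSIX calls. In this sketch, lowering the soft RLIMIT_NOFILE to zero is used
as a stand-in for actually opening descriptors until none remain; resource
limits are preserved across fork() and execve(), so a child program would
start under the same limit. The function name is hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Temporarily sets the soft RLIMIT_NOFILE to zero, tries to create a socket,
 * then restores the limit. Returns 0 if socket() unexpectedly succeeded,
 * the errno of the failed socket() call otherwise, or -1 if the limit could
 * not be read or changed.
 */
int socket_errno_with_no_free_fds(void)
{
    struct rlimit saved, tight;
    int fd, err = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    tight = saved;
    tight.rlim_cur = 0; /* no descriptor number is allowed any more */
    if (setrlimit(RLIMIT_NOFILE, &tight) != 0)
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        err = errno; /* capture before any later call can overwrite it */

    /* Raising the soft limit back up to the hard limit needs no privilege. */
    setrlimit(RLIMIT_NOFILE, &saved);
    if (fd >= 0)
        close(fd);
    return err;
}
```

On Linux the failed call reports EMFILE, so a program that treats every
socket() failure as "the network is down" can already be pushed into that
path without disablenetwork.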

Bottom line: I agree with Michael Stone. I think this
objection is weak.

I think what Michael is trying to do has the potential to be very
valuable and should be supported, and this is not a convincing
argument against it.

Valdis.K...@vt.edu

Dec 28, 2009, 7:00:02 PM

On Mon, 28 Dec 2009 22:10:28 GMT, David Wagner said:

> Pavel responds:
> > Actually, I've seen a *lot* of similar [..] policies.

No, that was me, not Pavel.

> OK, so to translate: it sounds like the answer is No, you
> haven't seen this policy in real life.

As I point out in subsequent paragraphs, I *have* in fact seen systems that
implement essentially the same semantics.

> More to the point, the real question is whether this policy
> is embedded in code anywhere such that Michael's mechanism would
> introduce a new security hole, and if so, whether the cost of
> that would outweigh the benefit of his mechanism.

Granted - but "is it embedded in code anywhere" is different from "does
anybody use such a policy". The semantic is used by many shops, but isn't
embedded in code anywhere that I know of - it's always done via system
config.

Take a standard stock Fedora install. Configure it to use LDAP for user
authentication. Screw up the config with a typo. Reboot to single user to fix,
you get a # prompt without entering a password. You now have Pavel's policy:

> "If the network works, no one can
> log in locally, because administration is normally done over
> network. If the network fails, larger set of people is allowed in,
> because something clearly went wrong and we want anyone going around
> to fix it."

So yes, it *does* exist in the real world - unless there are *zero* Fedora
boxes that use LDAP, and haven't manually changed the init config to run
sulogin on a single-user boot.

> I think what Michael is trying to do has the potential to be very
> valuable and should be supported, and this is not a convincing
> argument against it.

In case you didn't notice, I've been on the "this looks sane if we can actually
do it correctly" side of the fence. Michael's code isn't something I'd
personally run, because it doesn't address the threat models I worry about -
but I see the value for those people who do worry about them.

And hey, maybe we'll get lucky and we'll get the ability to have a stacker
that does MAC LSM + targeted add-ons, because in my world, the easiest fix
is the distributed SELinux 'mls' policy plus an add-on LSM - even though it's
likely that most of the stuff I want *could* be done via SELinux policy, in
many cases 20 lines of C is easier than retrofitting a policy patch or getting
the policy patch pushed upstream...

Out of curiosity, any of the other security types here ever included "getting
the damned semi-clued auditor who insists on cargo-cult checklists out of your
office" as part of your threat model? Only a half-smiley on this one...

David Wagner

Dec 28, 2009, 7:50:01 PM

> Granted - but "is it embedded in code anywhere" is different from "does
> anybody use such a policy".

OK, that's fine. But "is it embedded in code anywhere" is the
question that matters to this thread. And not just in code "anywhere",
but in code in a setuid-root executable that would become vulnerable if
Michael's scheme is introduced (yet is not already vulnerable today).

To refresh: the original context was that Pavel objected to Michael's
disablenetwork scheme on the basis that it could introduce new security
vulnerabilities, if some setuid-root program somewhere is written to
enforce a specific policy. So, to my way of thinking, the only reason to
spend any energy on this question at all is to determine whether Pavel's
objection is persuasive. I'm arguing the objection is not persuasive.
And I'm suggesting that we focus on the question that matters, rather
than getting distracted by imprecise phrasing Michael may have used when
he asked the question.

(Sorry for the misattribution, by the way; I attempted to clean up
the quoting and made it worse! Sorry.)

> Out of curiosity, any of the other security types here ever included "getting
> the damned semi-clued auditor who insists on cargo-cult checklists out of your
> office" as part of your threat model? Only a half-smiley on this one...

Sure. :-) One big catch-phrase that covers a lot of this ground is
'compliance'. Recently there seems to be considerable discussion
among security professionals about the tension between 'compliance' and
'security', and whether increased attention to 'compliance' benefits
'security' or is in the end a distraction.

Michael Stone

Dec 28, 2009, 8:30:02 PM

Pavel Machek wrote:
>> index 26a6b73..b48f021 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -35,6 +35,7 @@
>> #include <linux/cpu.h>
>> #include <linux/ptrace.h>
>> #include <linux/fs_struct.h>
>> +#include <linux/prctl_network.h>
>>
>> #include <linux/compat.h>
>> #include <linux/syscalls.h>
>

> Something seems to be wrong with whitespace here. Damaged patch?

Nope; kernel/sys.c has a newline there:

http://repo.or.cz/w/linux-2.6.git/blob/HEAD:/kernel/sys.c#l36

Shall I remove it?

Michael

Michael Stone

Dec 28, 2009, 8:30:02 PM

Serge Hallyn wrote:
> Is there any reason not to handle these in
> disablenetwork_security_prctl()
> ?

I'm afraid that I don't understand what you're asking here. Are you just saying
that you'd like me to rename the functions that implement the interface logic
to something that begins with "disablenetwork_"?

Regards, and thanks for all your help,

Michael

Valdis.K...@vt.edu

Dec 28, 2009, 8:40:01 PM

On Tue, 29 Dec 2009 00:42:55 GMT, David Wagner said:

> Sure. :-) One big catch-phrase that covers a lot of this ground is
> 'compliance'. Recently there seems to be considerable discussion
> among security professionals about the tension between 'compliance' and
> 'security', and whether increased attention to 'compliance' benefits
> 'security' or is in the end a distraction.

There's what I can administer effectively, there's what the most junior admin
in my shop can administer effectively, what the DBA's will accept, and what
our auditors insist on. Every once in a while, all four actually line up,
but then my alarm clock goes off and it's another Monday in the office :)

Michael Stone

Dec 29, 2009, 12:00:01 AM

Serge,

I think that Pavel's point, at its strongest and most general, could be
rephrased as:

"Adding *any* interesting isolation facility to the kernel breaks backwards
compatibility for *some* program [in a way that violates security goals]."

The reason is the one that I identified in my previous note:

"The purpose of isolation facilities is to create membranes inside which
grievous security faults are converted into availability faults."

The question then is simply:

"How do we want to deal with the compatibility-breaking changes created by
introducing new isolation facilities?"

So far, I've seen the following suggestions:

a) setuid restores pre-isolation semantics

- Doesn't work for me because it violates the security guarantee of the
isolation primitive

b) setuid is an escape-hatch

- Probably the cleanest in the long-run

- Doesn't, by itself, suffice for Pavel since it violates backwards
compatibility

c) signal to the kernel through a privileged mechanism that
backwards-incompatible isolation may or may not be used

- No problems seen so far.

I would be happy with (c), assuming we can agree on an appropriate signalling
mechanism and default.

So far, two defaults have been proposed:

default-deny incompatible isolation (Pavel)
default-permit incompatible isolation (Michael)

So far, several signalling mechanisms have been proposed:

1) enabling a kernel config option implies default-permit

- My favorite; apparently insufficient for Pavel?

2) default-deny; disablesuid grants disablenetwork

- "disablesuid" is my name for the idea of dropping the privilege of
exec'ing setuid binaries

- Suggested by Pavel and supported by several others.

- I think it has the same backwards-compatibility problem as
disablenetwork: disablesuid is an isolation primitive.

3) default-deny; dropping a capability from the bounding set grants "permit"

- Suggested by Serge; seems nicely fine-grained but rather indirect

4) default-deny; setting a sysctl implies permit

- Suggested by Serge; works fine for me

5) default-deny; setting a kernel boot argument implies permit

- Suggested by Serge; I like the sysctl better.

I am happiest with (1) and, if (1) isn't good enough, with (4).

Pavel, what do you think of (4)?

Regards,

Michael

P.S. - I'd be happy to know more about existing precedent on introducing
compatibility-breaking changes if any comes to mind. (For example, how were the
Linux-specific rlimits handled?)

P.P.S. - On a completely unrelated note: imagine trying to use SELinux (or your
favorite MAC framework) to restrict the use of prctl(PR_SET_NETWORK,
PR_NETWORK_OFF). Am I right that sys_prctl() contains a
time-of-check-to-time-of-use (TOCTTOU) race (with security_task_prctl() as the
check and with prctl_set_network() as the use) as a result of the actual
argument being passed by address rather than by value?
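
To illustrate the kind of race being asked about (without claiming that
sys_prctl() actually behaves this way), here is a deterministic user-space
model. The "user memory" is simulated by a struct whose value changes after
the first read, as if another thread had rewritten it; every name in the
sketch is hypothetical.

```c
#include <stdbool.h>

#define NET_OFF 0x1u /* hypothetical flag value */

/* Simulated user memory: the value changes after the first read. */
struct user_word {
    unsigned int first; /* what the first read sees */
    unsigned int later; /* what every subsequent read sees */
    int reads;          /* number of reads so far */
};

unsigned int fetch_user(struct user_word *u)
{
    return (u->reads++ == 0) ? u->first : u->later;
}

/* Stand-in for a MAC policy that forbids turning the network off. */
bool policy_allows(unsigned int flags)
{
    return (flags & NET_OFF) == 0;
}

/* Racy: the check and the use each read user memory themselves. */
int racy_set_network(struct user_word *u, unsigned int *applied)
{
    if (!policy_allows(fetch_user(u))) /* time of check */
        return -1;
    *applied = fetch_user(u);          /* time of use: second read */
    return 0;
}

/* Safe: copy once, then check and use the same private copy. */
int safe_set_network(struct user_word *u, unsigned int *applied)
{
    unsigned int flags = fetch_user(u);

    if (!policy_allows(flags))
        return -1;
    *applied = flags;
    return 0;
}
```

In the racy variant the policy approves one value while a different value is
applied; copying the argument once removes the window.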

Serge E. Hallyn

Dec 29, 2009, 12:30:02 AM

Quoting Michael Stone (mic...@laptop.org):
> Serge Hallyn wrote:
> >Is there any reason not to handle these in
> > disablenetwork_security_prctl()
> >?
>
> I'm afraid that I don't understand what you're asking here. Are you just saying
> that you'd like me to rename the functions that implement the interface logic
> to something that begins with "disablenetwork_"?

Sorry, for some reason I was thinking you still had a
security_operations *disablenetwork_ops. But you don't, so
my comment is silly. Please ignore.

> Regards, and thanks for all your help,

My pleasure, and thanks for the patience.

-serge

Serge E. Hallyn

Dec 29, 2009, 1:00:02 AM

default under what conditions? any setuid? setuid-root?

> 2) default-deny; disablesuid grants disablenetwork
>
> - "disablesuid" is my name for the idea of dropping the privilege of
> exec'ing setuid binaries
>
> - Suggested by Pavel and supported by several others.
>
> - I think it has the same backwards-compatibility problem as
> disablenetwork: disablesuid is an isolation primitive.
>
> 3) default-deny; dropping a capability from the bounding set grants "permit"
>
> - Suggested by Serge; seems nicely fine-grained but rather indirect

Actually I think it's the opposite of what you said here: so long as the
capability is in pE, you can regain network. So it would require a privileged
process early on (like init or login) to remove the capability from the
bounding set (bc doing so requires CAP_SETPCAP), but once that was done,
the resulting process and its children could not acquire the capability,
and, without the capability, could not regain network. Point being that
privileged userspace had to actively allow userspace to trap a setuid root
binary without networking.

I think during exec we can simply check for this capability in pE, and
if present then re-enable network if turned off. Then setuid-root binaries
will raise that bit (if it's in the bounding set) automatically. Now,
that means setuid-nonroot binaries will not reset network. Though you
could make that happen by doing setcap cap_net_allownet+pe /the/file.
Does that suffice?

> 4) default-deny; setting a sysctl implies permit
>
> - Suggested by Serge; works fine for me

That still leaves the question of when we re-allow network. Any
setuid?

> 5) default-deny; setting a kernel boot argument implies permit
>
> - Suggested by Serge; I like the sysctl better.
>
> I am happiest with (1) and, if (1) isn't good enough, with (4).
>
> Pavel, what do you think of (4)?
>
> Regards,
>
> Michael
>
> P.S. - I'd be happy to know more about existing precedent on introducing
> compatibility-breaking changes if any comes to mind. (For example, how were the
> Linux-specific rlimits handled?)
>
> P.P.S. - On a completely unrelated note: imagine trying to use SELinux (or your
> favorite MAC framework) to restrict the use of prctl(PR_SET_NETWORK,
> PR_NETWORK_OFF). Am I right that sys_prctl() contains a
> time-of-check-to-time-of-use (TOCTTOU) race (with security_task_prctl() as the
> check and with prctl_set_network() as the use) as a result of the actual
> argument being passed by address rather than by value?

I'm probably misunderstanding your question, but just in case I'm not: the
answer is that you wouldn't use the prctl interface anyway. You would strictly
use domain transitions. Instead of doing prctl(PR_SET_NETWORK, PR_NETWORK_OFF)
you would move yourself from the user_u:user_r:network_allowed domain to the
user_u:user_r:network_disallowed domain.

-serge

Serge E. Hallyn

Dec 29, 2009, 1:10:01 AM

Well, this is possible now, but requires privilege: Remove
any bit not in pP from the bounding set.

Removing the requirement for privilege to do so raises some concerns. Do we
force a task to then run with absolutely no capabilities, or can it just
stop itself from gaining new ones? If the latter, then we are close to
re-raising the sendmail-capabilities bug. The main difference would be
that you must already have the capabilities you want to keep, but I'm
not convinced that's sufficient.

A function which can be called without privilege, which empties out
all capability sets and the bounding set, that may be safe. Still might
cause a setuid-root app which is running as root but with no privilege
to be confused and mess up the system...

> * Add this disablenetwork facility, conditional on dropping
> setuid-exec abilities
>
> This also paves the way for:
> * Allow processes that have dropped said suid ability to freely create
> new namespaces (and chroot)
>
> Which, combined with doing whatever audits are necessary to allow
> cross-network-namespace uses of unix domain sockets, actually
> eliminates the need for the disablenetwork API. :)

Eric W. Biederman

Dec 29, 2009, 6:10:02 AM

Michael Stone <mic...@laptop.org> writes:

> Serge,
>
> I think that Pavel's point, at its strongest and most general, could be
> rephrased as:
>
> "Adding *any* interesting isolation facility to the kernel breaks backwards
> compatibility for *some* program [in a way that violates security goals]."

*some* privileged program.

> The reason is the one that I identified in my previous note:
>
> "The purpose of isolation facilities is to create membranes inside which
> grievous security faults are converted into availability faults."
>
> The question then is simply:
>
> "How do we want to deal with the compatibility-breaking changes created by
> introducing new isolation facilities?"

You have a very peculiar taxonomy of the suggestions,
that fails to capture the concerns.

I strongly recommend working out a way to disable
setuid exec. Ideally we would use capabilities to
achieve this.

Serge, can we have a capability that unprivileged processes
normally have and can drop without privilege?

I can see one of two possible reasons you are avoiding the
suggestion to disable setuid root.
- You have a use for setuid root executables in your contained
environment. If you do what is that use?
- Disabling suid root executables is an indirect path to your
goal.

The problem with the disable_network semantics you want
is that they allow you to perform a denial of service attack
on privileged users. An unprivileged DOS attack is unsuitable
for a general purpose feature in a general purpose kernel.

Your sysctl, your boot option, your Kconfig option all fail
to be viable options for the same reason. Your facility is
only valid in an audited userspace.

Disabling setuid-exec especially to a subset of processes is
valid in an unaudited userspace as it does not allow propagating
the DOS to privileged processes.

Eric

Serge E. Hallyn

Dec 29, 2009, 10:20:01 AM

Quoting Eric W. Biederman (ebie...@xmission.com):
> Michael Stone <mic...@laptop.org> writes:
>
> > Serge,
> >
> > I think that Pavel's point, at its strongest and most general, could be
> > rephrased as:
> >
> > "Adding *any* interesting isolation facility to the kernel breaks backwards
> > compatibility for *some* program [in a way that violates security goals]."
>
> *some* privileged program.
>
> > The reason is the one that I identified in my previous note:
> >
> > "The purpose of isolation facilities is to create membranes inside which
> > grievous security faults are converted into availability faults."
> >
> > The question then is simply:
> >
> > "How do we want to deal with the compatibility-breaking changes created by
> > introducing new isolation facilities?"
>
> You have a very peculiar taxonomy of the suggestions,
> that fails to capture the concerns.
>
> I strongly recommend working out a way to disable
> setuid exec. Ideally we would use capabilities to
> achieve this.
>
> Serge, can we have a capability that unprivileged processes
> normally have and can drop without privilege?

David Madore suggested such a system several years ago:
http://lkml.org/lkml/2006/9/5/246

I have two comments on the idea:

1. We don't want to complicate the current capabilities
concepts and API, so if we do something like this,
we should make sure not to try to store these
unprivileged capabilities with the current privilege
capabilities.

2. This in itself does nothing to address the problem of
unprivileged tasks denying privilege from privileged
programs, thereby threatening the system.

In my last email last night I detailed a way to use regular
capabilities to make the prctl(PR_SET_NETWORK, PR_DROP_NET)
safer.

We could generalize that a bit:

1. we add a set of 'user_capabilities' like 'network', 'open',
etc, which are the rights to do an unprivileged network socket
create, file open, etc.

2. For each user_capability, we add a new 'CAP_REGAIN_$userpriv'
POSIX capability.

3. When a file is executed, we always add CAP_REGAIN_* to the
file permitted and effective sets. That means that after
exec, they will always be in pE.

4. So long as CAP_REGAIN_foo is in pE toward the end of exec,
we re-enable the user_capability foo.

5. A privileged program can remove CAP_REGAIN_foo from the
capability bounding set. It, and all its children, will then
not have CAP_REGAIN_foo in pE after exec, so that if userspace
has removed foo from the user_capabilities set, it will not
be returned.

So, again, /bin/login, or pam, or /sbin/init can drop
CAP_REGAIN_* from its bounding set if userspace is designed
with that functionality in mind, in other words the distro or
admin trusts that privileged programs won't ruin the system if
they are denied certain features.
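
The five steps above can be modelled in a few lines of userspace C. This is only a sketch of the proposal, not kernel code: USER_CAP_NETWORK, CAP_REGAIN_NETWORK, struct task_model and every function name are hypothetical, and the exec-time capability computation is reduced to the single rule "pE = file permitted set masked by the bounding set".

```c
#include <stdint.h>

/* Hypothetical bit assignments; neither name exists in Linux. */
#define USER_CAP_NETWORK   (1u << 0)  /* unprivileged right to create sockets */
#define CAP_REGAIN_NETWORK (1u << 8)  /* privilege that restores that right */

/* Simplified model of the per-task state the proposal talks about. */
struct task_model {
    uint32_t bounding;   /* capability bounding set */
    uint32_t user_caps;  /* the proposed 'user_capabilities' set */
};

/* Step 5: a privileged program removes CAP_REGAIN_* from the bounding set. */
static void drop_from_bounding(struct task_model *t, uint32_t cap)
{
    t->bounding &= ~cap;
}

/* Unprivileged drop of a user capability (the disablenetwork case). */
static void drop_user_cap(struct task_model *t, uint32_t ucap)
{
    t->user_caps &= ~ucap;
}

/*
 * Steps 3 and 4: every executed file is treated as carrying CAP_REGAIN_*
 * in its permitted and effective sets, so after exec pE holds whatever
 * part of it the bounding set still allows. If the regain capability
 * survives, the matching user capability is switched back on.
 * Returns the modelled pE.
 */
static uint32_t exec_model(struct task_model *t)
{
    uint32_t file_permitted = CAP_REGAIN_NETWORK;
    uint32_t pE = file_permitted & t->bounding;

    if (pE & CAP_REGAIN_NETWORK)
        t->user_caps |= USER_CAP_NETWORK;
    return pE;
}
```

In this model a task that drops USER_CAP_NETWORK gets it back on the next exec unless some ancestor first called drop_from_bounding(), which is the "privileged userspace must opt in" property described above.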

> I can see one of two possible reasons you are avoiding the
> suggestion to disable setuid root.
> - You have a use for setuid root executables in your contained
> environment. If you do what is that use?

I don't think Michael was avoiding that. Rather, we haven't quite
spelled out what it means to disable setuid root, and we haven't
(to my satisfaction) detailed how setuid root would undo the
prctl(PR_SET_NETWORK, PR_DROP_NET) - i.e. is it only on a
privilege-granting setuid-root, or all setuids?

Eric, let me specifically point out a 'disable setuid-root'
problem on linux: root still owns most of the system even when
it's not privileged. So does "disable setuid-root" mean
we don't allow exec of setuid-root binaries at all, or that
we don't setuid to root, or that we just don't raise privileges
for setuid-root?

> - Disabling suid root executables is an indirect path to your
> goal.
>
> The problem with the disable_network semantics you want
> is that they allow you to perform a denial of service attack
> on privileged users. An unprivileged DOS attack is unsuitable
> for a general purpose feature in a general purpose kernel.

Though to be honest I'm still unconvinced that the disablenetwork
is dangerous. I think long-term a more general solution like what
I outlined above might be good, but short-term a sysctl that
turns on or off the ability to drop network would suffice imo.
For ultra-secure sites at three-letter government agencies, we
could also provide a boot arg that disables the feature altogether,
and of course a grub password and IMA/EVM/trusted-boot could
ensure the boot arg isn't messed with.

> Your sysctl, your boot option, your Kconfig option all fail
> to be viable options for the same reason. Your facility is
> only valid in an audited userspace.
>
> Disabling setuid-exec especially to a subset of processes is
> valid in an unaudited userspace as it does not allow propagating
> the DOS to privileged processes.

-serge

Michael Stone

Dec 29, 2009, 11:10:02 AM

Eric Biederman writes:
> Serge,
>
> Michael Stone <mic...@laptop.org> writes:
>> I think that Pavel's point, at its strongest and most general, could be
>> rephrased as:
>>
>> "Adding *any* interesting isolation facility to the kernel breaks backwards
>> compatibility for *some* program [in a way that violates security goals]."
>
>*some* privileged program.

Your amendment is a good one; it makes the statement better reflect your and
Pavel's concern. Thanks.

>> The reason is the one that I identified in my previous note:
>>
>> "The purpose of isolation facilities is to create membranes inside which
>> grievous security faults are converted into availability faults."
>>
>> The question then is simply:
>>
>> "How do we want to deal with the compatibility-breaking changes created by
>> introducing new isolation facilities?"

> You have a very peculiar taxonomy of the suggestions,
> that fails to capture the concerns.

Do you agree with my assessment that this is fundamentally a
backwards-compatibility problem with security consequences?

> I strongly recommend working out a way to disable
> setuid exec. Ideally we would use capabilities to
> achieve this.
>
> ...
>
> I can see one of two possible reasons you are avoiding the
> suggestion to disable setuid root...

Heard and understood. I'll start thinking about how to do it (and about what
the consequences might be). However, those aren't my reasons for wariness.

My reasons have to do with history, preparation, and logic:

1. I first began playing with disablenetwork about two years ago and I missed
both the need to restrict ptrace and the fact that the interaction with
privileged executables would be a problem for other people.

2. With disablenetwork, I was already building on clear (if slightly
incomplete) design by djb. We have none of that prep work here.

3. We have definite reasons, laid out by my argument above about the general
compatibility cost of isolation facilities, to suspect that disablesuid
in any form will break at least one other interesting use case. I don't
know what that use case is yet but I'm fairly sure that it exists.

> The problem with the disable_network semantics you want
> is that they allow you to perform a denial of service attack
> on privileged users. An unprivileged DOS attack is unsuitable
> for a general purpose feature in a general purpose kernel.

Then where did rlimits come from? rlimits *can* DoS privileged processes and
people are pretty much okay with the idea. People who are concerned can raise
the rlimits in their privileged processes and get on with life.

Second, as you point out, I am willing to audit and to modify my setuid
executables *in exchange* for having a kernel that changes in useful ways.
Obviously, not everyone is willing to pay this upkeep.

At any rate, I hope all this helps make my position clearer. I'll think more
about your suggestions.

Regards,

Michael

Bryan Donlan

Dec 29, 2009, 11:10:03 AM

On Tue, Dec 29, 2009 at 10:11 AM, Serge E. Hallyn <se...@us.ibm.com> wrote:
> Eric, let me specifically point out a 'disable setuid-root'
> problem on linux: root still owns most of the system even when
> it's not privileged. So does "disable setuid-root" mean
> we don't allow exec of setuid-root binaries at all, or that
> we don't setuid to root, or that we just don't raise privileges
> for setuid-root?

I, for one, think it would be best to handle it exactly like the
nosuid mount option - that is, pretend the file doesn't have any
setuid bits set. There's no reason to deny execution; if the process
would otherwise be able to execute it, it can also copy the file to
make a non-suid version and execute that instead. And some programs
can operate with reduced function without setuid. For example, screen
comes to mind; it needs root to share screen sessions between multiple
users, but can operate for a single user just fine without root, and
indeed the latter is usually the default configuration.
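
A rough sketch of the "behave like nosuid" rule, written as a pure userspace function rather than kernel code. The struct, the function name and the no_suid flag are hypothetical; only S_ISUID is a real constant. The only point is that exec still succeeds and the setuid bit is simply not honored.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* The two pieces of file metadata that matter for setuid exec. */
struct exec_file {
    mode_t mode;   /* permission bits, possibly including S_ISUID */
    uid_t  owner;  /* file owner */
};

/*
 * Effective UID after a successful exec in this model.
 * 'no_suid' stands for a hypothetical per-process flag with the same
 * effect as the nosuid mount option: the file still runs, but its
 * setuid bit is treated as if it were not set.
 */
static uid_t euid_after_exec(uid_t current_euid,
                             const struct exec_file *f, int no_suid)
{
    if (!no_suid && (f->mode & S_ISUID))
        return f->owner;     /* normal setuid behaviour */
    return current_euid;     /* bit ignored, or file not setuid */
}
```

With a setuid-root file and current_euid 1000, the function returns 0 when no_suid is clear and 1000 when it is set.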

Michael Stone

Dec 29, 2009, 11:30:02 AM

Serge Hallyn writes:

> Quoting Michael Stone (mic...@laptop.org):
> So far, two defaults have been proposed:
>
> default-deny incompatible isolation (Pavel)
> default-permit incompatible isolation (Michael)
>
> So far, several signalling mechanisms have been proposed:
>
>> 1) enabling a kernel config option implies default-permit
>>
>> - My favorite; apparently insufficient for Pavel?
>
> default under what conditions? any setuid? setuid-root?

My favorite option is that CONFIG_SECURITY_DISABLENETWORK causes
disablenetwork to function like djb describes: unprivileged and irrevocable.

(I don't have any setuid executables that I'm worried about breaking; only ones
that I think /should/ be broken and aren't, like ping.)

> 2) default-deny; disablesuid grants disablenetwork
>
> - "disablesuid" is my name for the idea of dropping the privilege of
> exec'ing setuid binaries
>
> - Suggested by Pavel and supported by several others.
>
> - I think it has the same backwards-compatibility problem as
> disablenetwork: disablesuid is an isolation primitive.
>
> 3) default-deny; dropping a capability from the bounding set grants "permit"
>
> - Suggested by Serge; seems nicely fine-grained but rather indirect
>
> Actually I think it's the opposite of what you said here: so long as the
> capability is in pE, you can regain network. So it would require a privileged
> process early on (like init or login) to remove the capability from the
> bounding set (bc doing so requires CAP_SETPCAP), but once that was done,
> the resulting process and its children could not acquire the capability,
> and, without the capability, could not regain network. Point being that
> privileged userspace had to actively allow userspace to trap a setuid root
> binary without networking.

What I wrote accurately (if confusingly; sorry!) reflects what you suggest: by
default, the kernel should deny processes from irrevocably dropping networking
privilege until signalled that this is acceptable by the privileged mechanism
of dropping your cap from the bounding set.

> I think during exec we can simply check for this capability in pE, and
> if present then re-enable network if turned off. Then setuid-root binaries
> will raise that bit (if it's in the bounding set) automatically. Now,
> that means setuid-nonroot binaries will not reset network. Though you
> could make that happen by doing setcap cap_net_allownet+pe /the/file.
> Does that suffice?

I think I could live with it.

I find it weird that, if I call disablenetwork on a system *without* dropping
your capability, sendto(...) will fail but execve(['/bin/ping', '...']) will
succeed.

Still, it will do what I need.

>> 4) default-deny; setting a sysctl implies permit
>>
>> - Suggested by Serge; works fine for me
>
>That still leaves the question of when we re-allow network. Any
>setuid?

My intention was that prctl(PR_SET_NETWORK, PR_NETWORK_OFF) would return
-ENOTSUP or similar until the sysctl was enabled, at which point it would work
as I specified.

("As I specified" means one of "irrevocable" or "like rlimits; can be relaxed
by explicit action by privileged processes")
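
As a sketch of the sysctl-gated behaviour, assuming the "irrevocable" reading: the variable sysctl_allow_disablenetwork, struct net_task and set_network_off() are hypothetical names for illustration, and -ENOTSUP is just the "or similar" error mentioned above.

```c
#include <errno.h>

/* Hypothetical sysctl; 0 (the default) means the facility is refused. */
static int sysctl_allow_disablenetwork;

/* Minimal stand-in for the per-task state. */
struct net_task {
    unsigned long network_off;  /* nonzero once networking is dropped */
};

/*
 * Model of prctl(PR_SET_NETWORK, PR_NETWORK_OFF): refused until the
 * administrator enables the sysctl, and one-way once it succeeds
 * (there is deliberately no function that clears network_off).
 */
static int set_network_off(struct net_task *t)
{
    if (!sysctl_allow_disablenetwork)
        return -ENOTSUP;
    t->network_off = 1;
    return 0;
}
```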

>> P.P.S. - On a completely unrelated note: imagine trying to use SELinux (or your
>> favorite MAC framework) to restrict the use of prctl(PR_SET_NETWORK,
>> PR_NETWORK_OFF). Am I right that sys_prctl() contains a
>> time-of-check-to-time-of-use (TOCTTOU) race (with security_task_prctl() as the
>> check and with prctl_set_network() as the use) as a result of the actual
>> argument being passed by address rather than by value?
>
> I'm probably misunderstanding your question, but just in case I'm not: the
> answer is that you wouldn't use the prctl interface anyway. You would strictly
> use domain transitions. Instead of doing prctl(PR_SET_NETWORK, PR_NETWORK_OFF)
> you would move yourself from the user_u:user_r:network_allowed domain to the
> user_u:user_r:network_disallowed domain.

You misunderstood; sorry I wasn't more clear. :)

I was really saying:

Suppose process A and process B create and share a memory segment containing an
unsigned long pointed to by

unsigned long *flags;

Can't process A call prctl(PR_SET_NETWORK, flags) while, on another
processor, process B is twiddling bits in *flags so that
security_task_prctl() sees the bits that process A wrote and
prctl_set_network() sees the bits that process B wrote?

i.e. isn't there a TOCTTOU race [1] here in every prctl option that uses a
pointer argument? if not, what stops the race?
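
To make the question concrete, here is a small userspace sketch of the double-fetch pattern. It is not kernel code: check_flags() stands in for security_task_prctl(), the final store stands in for prctl_set_network(), and the racer callback plays process B, with the interleaving forced deterministically instead of actually raced. All of these names are hypothetical.

```c
/* Stand-in for process B: rewrites the shared word between check and use. */
typedef void (*racer_fn)(unsigned long *shared);

static void flip_to_one(unsigned long *shared)
{
    *shared = 1UL;
}

/* Stand-in for the security check: only the value 0 is permitted. */
static int check_flags(unsigned long v)
{
    return v == 0UL;
}

/*
 * Racy pattern: the shared word is fetched twice, once for the check
 * and once for the use, so the two fetches can disagree.
 */
static long racy_set(unsigned long *shared, racer_fn racer,
                     unsigned long *applied)
{
    if (!check_flags(*shared))   /* first fetch: time of check */
        return -1;
    if (racer)
        racer(shared);           /* "process B" writes here */
    *applied = *shared;          /* second fetch: time of use */
    return 0;
}

/* Single-fetch pattern: copy once, then check and use the same copy. */
static long safe_set(unsigned long *shared, racer_fn racer,
                     unsigned long *applied)
{
    unsigned long snapshot = *shared;  /* the only fetch */

    if (!check_flags(snapshot))
        return -1;
    if (racer)
        racer(shared);           /* too late to affect the snapshot */
    *applied = snapshot;
    return 0;
}
```

Starting from a shared value of 0, racy_set() passes the check and then applies 1, while safe_set() applies the 0 it checked. Whether sys_prctl() really has this shape depends on whether the hook dereferences the pointer at all, which is exactly the question being asked.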

Regards,

Michael

[1]: http://en.wikipedia.org/wiki/Time-of-check-to-time-of-use

Serge E. Hallyn

Dec 29, 2009, 11:40:01 AM

Quoting Bryan Donlan (bdo...@gmail.com):
> On Tue, Dec 29, 2009 at 10:11 AM, Serge E. Hallyn <se...@us.ibm.com> wrote:
> > Eric, let me specifically point out a 'disable setuid-root'
> > problem on linux: root still owns most of the system even when
> > it's not privileged. So does "disable setuid-root" mean
> > we don't allow exec of setuid-root binaries at all, or that
> > we don't setuid to root, or that we just don't raise privileges
> > for setuid-root?
>
> I, for one, think it would be best to handle it exactly like the
> nosuid mount option - that is, pretend the file doesn't have any
> setuid bits set. There's no reason to deny execution; if the process
> would otherwise be able to execute it, it can also copy the file to
> make a non-suid version and execute that instead. And some programs
> can operate with reduced function without setuid. For example, screen
> comes to mind; it needs root to share screen sessions between multiple
> users, but can operate for a single user just fine without root, and
> indeed the latter is usually the default configuration.

That's fine with me, seems safe for a fully unprivileged program to
use, and would make sense to do through one of the securebits set
with prctl(PR_SET_SECUREBITS).

In addition, I assume we would also refuse to honor file capabilities?

-serge

Bryan Donlan

Dec 29, 2009, 12:10:02 PM

On Tue, Dec 29, 2009 at 11:39 AM, Serge E. Hallyn <se...@us.ibm.com> wrote:
> Quoting Bryan Donlan (bdo...@gmail.com):
>> On Tue, Dec 29, 2009 at 10:11 AM, Serge E. Hallyn <se...@us.ibm.com> wrote:
>> > Eric, let me specifically point out a 'disable setuid-root'
>> > problem on linux: root still owns most of the system even when
>> > it's not privileged. So does "disable setuid-root" mean
>> > we don't allow exec of setuid-root binaries at all, or that
>> > we don't setuid to root, or that we just don't raise privileges
>> > for setuid-root?
>>
>> I, for one, think it would be best to handle it exactly like the
>> nosuid mount option - that is, pretend the file doesn't have any
>> setuid bits set. There's no reason to deny execution; if the process
>> would otherwise be able to execute it, it can also copy the file to
>> make a non-suid version and execute that instead. And some programs
>> can operate with reduced function without setuid. For example, screen
>> comes to mind; it needs root to share screen sessions between multiple
>> users, but can operate for a single user just fine without root, and
>> indeed the latter is usually the default configuration.
>
> That's fine with me, seems safe for a fully unprivileged program to
> use, and would make sense to do through one of the securebits set
> with prctl(PR_SET_SECUREBITS).
>
> In addition, I assume we would also refuse to honor file capabilities?

Yes - essentially a one-time switch saying "never allow me to gain
capabilities again".

Eric W. Biederman

Dec 29, 2009, 1:10:02 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> I have two comments on the idea:
>
> 1. We don't want to complicate the current capabilities
> concepts and API, so if we do something like this,
> we should make sure not to try to store these
> unprivileged capabilities with the current privilege
> capabilities.

I was afraid there might be a complication like that.
Then the user interface side of what I am proposing
needs some more thought.

> In my last email last night I detailed a way to use regular
> capabilities to make the prctl(PR_SET_NETWORK, PR_DROP_NET)
> safer.

Yes. I missed that earlier; my apologies.

>> I can see one of two possible reasons you are avoiding the
>> suggestion to disable setuid root.
>> - You have a use for setuid root executables in your contained
>> environment. If you do what is that use?
>
> I don't think Michael was avoiding that. Rather, we haven't quite
> spelled out what it means to disable setuid root, and we haven't
> (to my satisfaction) detailed how setuid root would undo the
> prctl(PR_SET_NETWORK, PR_DROP_NET) - i.e. is it only on a
> privilege-granting setuid-root, or all setuids?

Michael finds suid-root granting network access incompatible
with what he is trying to achieve. The practical example is
network denied applications may communicate with the outside
world by calling ping.

> Eric, let me specifically point out a 'disable setuid-root'
> problem on linux: root still owns most of the system even when
> it's not privileged. So does "disable setuid-root" mean
> we don't allow exec of setuid-root binaries at all, or that
> we don't setuid to root, or that we just don't raise privileges
> for setuid-root?

The semantics I am suggesting under the title "disable suid exec" are
to flag a process such that, that process and any future children may
not increase their kernel enforced privileges. In practice this means
attempts to exec any suid binaries will fail. Likewise attempting to
exec a binary flagged in the filesystem to gain capabilities will
also fail.

This would appear to require denying of ptracing applications
with other/more privileges, failing attempts to raise the
capabilities on one of the processes, and I think failing
all setuid/setgid calls.
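
A minimal model of these "fail the exec" semantics, for contrast with the nosuid-style "ignore the bits" behaviour. Everything here is hypothetical: struct proc_model, the no_priv_gain flag, both function names, and the choice of -EPERM as the error. It covers only the exec rule, not the ptrace or setuid()/setgid() restrictions.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-process flag: set once, never cleared. */
struct proc_model {
    int no_priv_gain;
};

/* Children inherit the flag unchanged. */
static struct proc_model fork_model(const struct proc_model *parent)
{
    return *parent;
}

/*
 * Exec decision in this model: a flagged process may not exec a suid
 * binary or a binary carrying file capabilities. Returns 0 if the exec
 * may proceed, or a negative errno (the specific value is an assumption).
 */
static int exec_allowed(const struct proc_model *p, mode_t mode,
                        int has_file_caps)
{
    if (p->no_priv_gain && ((mode & S_ISUID) || has_file_caps))
        return -EPERM;
    return 0;
}
```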

In my conception this is a useful subset of unshare(USERNS) that can
be easily implemented now.

>> - Disabling suid root executables is an indirect path to your
>> goal.
>>
>> The problem with the disable_network semantics you want
>> is that they allow you to perform a denial of service attack
>> on privileged users. An unprivileged DOS attack is unsuitable
>> for a general purpose feature in a general purpose kernel.
>
> Though to be honest I'm still unconvinced that the disablenetwork
> is dangerous. I think long-term a more general solution like what
> I outlined above might be good, but short-term a sysctl that
> turns on or off the ability to drop network would suffice imo.

I have seen the following arguments put forth.
- Certain kinds of logging are broken from suid executables.
- Certain questionable but existing user space authentication
policies will be broken.

That is enough for me to strongly suspect someone with a more
devious mind than myself could find something worse. I know
it took me over a year to figure out how someone could exploit
unshare(NETNS).

The goal is to find something that unprivileged applications can use
safely, and can be available by default in distro kernels.

A syscall that removes the ability to exec suid executables appears to
meet that goal, as well as be the necessary prerequisite for enabling
other forms of isolation without causing security problems.

Eric

Eric W. Biederman

Dec 29, 2009, 1:40:02 PM

Bryan Donlan <bdo...@gmail.com> writes:

That is what I was thinking. Does setresuid cause problems? Assuming
the application that drops permissions could have successfully
called setresuid?

Ignoring the bits instead of honoring them when execing an executable
makes sense as that is the existing precedent.

If it works prctl appears to be a fine interface.

Eric

Bryan Donlan

Dec 29, 2009, 2:10:01 PM

It's probably reasonable to require that real == effective == saved ==
fs UID (and same for GID); anything else brings up sticky issues of
"which UID is a higher capability?"
If a process does this call, it's effectively saying that the only way
it's going to be accessing resources beyond its current UID and
capabilities is by talking to another process over a (unix domain)
socket.
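
From userspace, the precondition can be checked roughly as follows. This is a sketch assuming Linux with glibc: getresuid() and setfsuid() are real calls, while ids_uniform() and current_uids_uniform() are illustrative helper names. The same check would be repeated for GIDs with getresgid() and setfsgid().

```c
#define _GNU_SOURCE
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* True when real == effective == saved == filesystem ID. */
static int ids_uniform(uid_t real, uid_t effective, uid_t saved, uid_t fs)
{
    return real == effective && effective == saved && saved == fs;
}

/* Returns 1 if the caller's four UIDs all match, 0 if not, -1 on error. */
static int current_uids_uniform(void)
{
    uid_t r, e, s, fs;

    if (getresuid(&r, &e, &s) != 0)
        return -1;
    /*
     * setfsuid() returns the previous fsuid; passing an invalid UID
     * such as -1 changes nothing, so this only reads the value.
     */
    fs = (uid_t)setfsuid((uid_t)-1);
    return ids_uniform(r, e, s, fs);
}
```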

Benny Amorsen

Dec 29, 2009, 3:40:02 PM

Bryan Donlan <bdo...@gmail.com> writes:

> I, for one, think it would be best to handle it exactly like the
> nosuid mount option - that is, pretend the file doesn't have any
> setuid bits set. There's no reason to deny execution; if the process
> would otherwise be able to execute it, it can also copy the file to
> make a non-suid version and execute that instead.

Execute != read. The executable file may contain secrets which must not
be available to the user running the setuid program. If you fail the
setuid, the user will be able to ptrace() and then the secret is
revealed.

It's amazing how many security holes appear from what seems like a very
simple request.

/Benny

Bryan Donlan

Dec 29, 2009, 3:50:02 PM

On Tue, Dec 29, 2009 at 3:40 PM, Eric W. Biederman
<ebie...@xmission.com> wrote:

> Benny Amorsen <benny+...@amorsen.dk> writes:
>
>> Bryan Donlan <bdo...@gmail.com> writes:
>>
>>> I, for one, think it would be best to handle it exactly like the
>>> nosuid mount option - that is, pretend the file doesn't have any
>>> setuid bits set. There's no reason to deny execution; if the process
>>> would otherwise be able to execute it, it can also copy the file to
>>> make a non-suid version and execute that instead.
>>
>> Execute != read. The executable file may contain secrets which must not
>> be available to the user running the setuid program. If you fail the
>> setuid, the user will be able to ptrace() and then the secret is
>> revealed.
>>
>> It's amazing how many security holes appear from what seems like a very
>> simple request.
>
> Do we have a security hole in nosuid mount option?

Looks like it:
$ /tmp/m/sudo
sudo: must be setuid root
$ ls -l /tmp/m/sudo
-rwsr-x--x 1 root root 123448 2009-06-22 12:14 /tmp/m/sudo

Eric W. Biederman

Dec 29, 2009, 3:50:01 PM

Benny Amorsen <benny+...@amorsen.dk> writes:

> Bryan Donlan <bdo...@gmail.com> writes:
>
>> I, for one, think it would be best to handle it exactly like the
>> nosuid mount option - that is, pretend the file doesn't have any
>> setuid bits set. There's no reason to deny execution; if the process
>> would otherwise be able to execute it, it can also copy the file to
>> make a non-suid version and execute that instead.
>
> Execute != read. The executable file may contain secrets which must not
> be available to the user running the setuid program. If you fail the
> setuid, the user will be able to ptrace() and then the secret is
> revealed.
>
> It's amazing how many security holes appear from what seems like a very
> simple request.

Do we have a security hole in nosuid mount option?
Can someone write a patch to fix it?

Eric

Eric W. Biederman

Dec 29, 2009, 4:00:04 PM

Bryan Donlan <bdo...@gmail.com> writes:

> It's probably reasonable to require that real == effective == saved ==
> fs UID (and same for GID); anything else brings up sticky issues of
> "which UID is a higher capability?"
> If a process does this call, it's effectively saying that the only way
> it's going to be accessing resources beyond its current UID and
> capabilities is by talking to another process over a (unix domain)
> socket.

Makes sense. Especially for the initial implementation,
and it keeps the code size small.

Eric

Bryan Donlan

Dec 29, 2009, 4:20:05 PM

2009/12/29 Alan Cox <al...@lxorguk.ukuu.org.uk>:

>> > Execute != read. The executable file may contain secrets which must not
>> > be available to the user running the setuid program. If you fail the
>> > setuid, the user will be able to ptrace() and then the secret is
>> > revealed.
>> >
>> > It's amazing how many security holes appear from what seems like a very
>> > simple request.
>>
>> Do we have a security hole in nosuid mount option?
>> Can someone write a patch to fix it?
>
> If a setuid app can read a key when it's erroneously not set setuid then
> the user can read it too.
>
> Anything you can do with ptrace you can do yourself !

The security hole is that secrets in a setuid application with
other-exec but no other-read permission can be read when the
filesystem is mounted nosuid. Normally the user would be unable to
ptrace the program, and unable to read the executable, so the secret
would not be divulged; when nosuid is set, the user is now able to
ptrace the program - ie, they gain abilities from nosuid.

Whether this is a severe issue is debatable, of course; it's unlikely
that the administrator will create a setuid program with weird
permissions and then go and mount the fs it's on with nosuid. However
with the proposed 'drop suiding abilities' API, this becomes a bigger
issue, since if we reuse the nosuid semantics, any user can trigger
it, without needing to get root to mount things nosuid.

That said, I do tend to agree that relying on the _presence_ of a suid
mode to protect your secrets is probably a bad idea...

Alan Cox

Dec 29, 2009, 4:20:05 PM

> > Execute != read. The executable file may contain secrets which must not
> > be available to the user running the setuid program. If you fail the
> > setuid, the user will be able to ptrace() and then the secret is
> > revealed.
> >
> > It's amazing how many security holes appear from what seems like a very
> > simple request.
>
> Do we have a security hole in nosuid mount option?
> Can someone write a patch to fix it?

If a setuid app can read a key when it's erroneously not set setuid then
the user can read it too.

Anything you can do with ptrace you can do yourself !

Eric W. Biederman

Dec 29, 2009, 4:30:02 PM

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

>> > Execute != read. The executable file may contain secrets which must not
>> > be available to the user running the setuid program. If you fail the
>> > setuid, the user will be able to ptrace() and then the secret is
>> > revealed.
>> >
>> > It's amazing how many security holes appear from what seems like a very
>> > simple request.
>>
>> Do we have a security hole in nosuid mount option?
>> Can someone write a patch to fix it?
>
> If a setuid app can read a key when its erroneously not set setuid then
> the user can read it too.
>
> Anything you can do with ptrace you can do yourself !

Now that I think about it, this is really something completely separate
from setuid. This is about being able to read the text segment with
ptrace when you only have execute permission on the file.

I just skimmed through fs/exec.c, and we set the undumpable process
flag in that case, so ptrace should not work.

So short of a bug in the implementation we have no security hole.

Eric

Serge E. Hallyn

Dec 29, 2009, 4:30:01 PM

I think I disagree. A uid is just a uid (or should be). One day we may
have a way for a factotum-style daemon to grant the ability to an unpriv
task to setuid without CAP_SETUID. I think slinging uids and gids
around that you already have access to should be fine.

> If a process does this call, it's effectively saying that the only way
> it's going to be accessing resources beyond its current UID and
> capabilities is by talking to another process over a (unix domain)
> socket.

Alan Cox

Dec 29, 2009, 4:40:03 PM

> The security hole is that secrets in a setuid application with
> other-exec but no other-read permission can be read when the
> filesystem is mounted nosuid.

Erm no

We enforce the following anyway to prevent execution being permitted to
make file copies. Most Unixen do although its historical value is
primarily to prevent people "stealing valuable proprietary intellectual
software assets".

        } else if (file_permission(bprm->file, MAY_READ) ||
                   bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP) {
                set_dumpable(current->mm, suid_dumpable);
        }

There does appear to be a small race in modern versions of that code
which wants swatting.
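
For readers following the dumpable argument, here is a minimal userspace sketch of the flag that the kernel snippet above sets. It assumes a Linux system; PR_GET_DUMPABLE and PR_SET_DUMPABLE are existing prctl options, while the helper name dumpable_after_clear() is made up for this illustration. It only shows that a task can inspect and clear its own dumpable flag, which is the same flag exec clears for unreadable binaries; it does not reproduce the exec path itself.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: clear the calling process's dumpable flag,
 * read it back, then try to restore the previous value.
 * Returns the value observed after clearing, or -1 on error.
 */
int dumpable_after_clear(void)
{
    int before = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
    int after;

    if (before < 0)
        return -1;

    /* 0 means "not dumpable": no core dumps, and ptrace attach by
     * an unprivileged tracer is refused. */
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0)
        return -1;

    after = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);

    /* Best-effort restore; the kernel only accepts 0 or 1 here. */
    prctl(PR_SET_DUMPABLE, before, 0, 0, 0);
    return after;
}
```

While the flag is clear, a second unprivileged process should be refused when it tries to attach with strace -p, which matches the behaviour described for exec of an unreadable file.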

Valdis.K...@vt.edu

Dec 29, 2009, 4:50:02 PM

On Tue, 29 Dec 2009 15:27:22 CST, "Serge E. Hallyn" said:
> I think i disagree. A uid is just a uid (or should be). One day we may
> have a way for a factotum-style daemon to grant the ability to an unpriv
> task to setuid without CAP_SETUID. I think slingling uids and gids
> around that you already have access to should be fine.

Yes, but not doing the clear and obvious simple thing now for a "one day
we may have" consideration seems a poor engineering tradeoff.

Yes, slinging uids and gids around *would* be nice. But first we need a clear
plan for making /usr/bin/newgrp a shell builtin - once that happens, *then*
we can re-address this code. ;)

Serge E. Hallyn

Dec 29, 2009, 5:20:01 PM

Absolutely agreed with the principle, but conflicted on the application.

I know earlier in the thread I said uid 0 even when unprivileged is
actually privileged merely by owning most of the system files. But
in fact I think it helps to think more clearly when we separately
consider the cases of (a) changing uid, and (b) enhancing privilege.

That's why I was recommending implementation through securebits - what
we're basically saying is the task should never gain privilege. And
effectively, since it won't have CAP_SETUID (unless it has and keeps it
in pI) it won't be able to change uids. But if we right off the bat
confuse changing uids with gaining privilege, I'm afraid we might end
up making some poor decisions.

Still, I won't say no to a check to refuse dropping the ability to
setuid to ensure that ruid=euid=suid and pP=pE=pI=empty. It may
come back to bite us, but like I say I'm conflicted - willing to
go either way.

-serge

Serge E. Hallyn

Dec 29, 2009, 5:40:02 PM

Quoting Eric W. Biederman (ebie...@xmission.com):

> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:
>
> >> > Execute != read. The executable file may contain secrets which must not
> >> > be available to the user running the setuid program. If you fail the
> >> > setuid, the user will be able to ptrace() and then the secret is
> >> > revealed.
> >> >
> >> > It's amazing how many security holes appear from what seems like a very
> >> > simple request.
> >>
> >> Do we have a security hole in nosuid mount option?
> >> Can someone write a patch to fix it?
> >
> > If a setuid app can read a key when its erroneously not set setuid then
> > the user can read it too.
> >
> > Anything you can do with ptrace you can do yourself !
>
> Now that I think about it this is really something completely separate
> from setuid. This is about being able to read the text segment with
> ptrace when you on have execute permissions on the file.
>
> I just skimmed through fs/exec.c and we set the undumpable process
> flag in that case so ptrace should not work in that case.

And in fact you can't do a new ptrace_attach, but if you're already
tracing the task when it execs the unreadable-but-executable file,
then the ptrace can continue.

Just looking at the code, it appears 2.2 was the same way (though I
could be missing where it used to enforce that).

So, is that intended? What exactly would we do about it if not?
Just refuse exec of a unreadable-but-executable file if we're
being traced?

Eric W. Biederman

Dec 29, 2009, 10:30:02 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

In commoncap we drop the new capabilities if we are being ptraced.
Look for bprm->unsafe.

Eric W. Biederman

Dec 29, 2009, 10:40:01 PM

If we can know that a process will never raise its privileges, we can
enable a lot of features that would otherwise be unsafe because they
could break the assumptions of existing suid executables.

To allow this to be used as a sandboxing feature, also disable
ptracing of other executables that do not have this new restriction.

For the moment I have used a per-thread flag because we are out of
per-process flags.

To ensure all descendants get this flag I rely on the default copying
of process structures.

The disabling of suid executables is exactly the same as MNT_NOSUID.

This should be what we have been talking about for disabling suid
exec. I chose not to use securebits as that interface requires
privilege and assumes capabilities. This implementation is more basic
than capabilities and only adds additional sanity checks when
Linux capabilities are not present.

I attempt to ensure there are no mixed privileges present when we
perform the disable, so I don't need to handle or think about
interactions with setreuid or capabilities in this code.

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 375c917..e716203 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -82,6 +82,7 @@ struct thread_info {
#define TIF_SYSCALL_EMU 6 /* syscall emulation active */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SECCOMP 8 /* secure computing */
+#define TIF_NOSUID 9 /* suid exec permanently disabled */
#define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
@@ -107,6 +108,7 @@ struct thread_info {
#define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
+#define _TIF_NOSUID (1 << TIF_NOSUID)
#define _TIF_MCE_NOTIFY (1 << TIF_MCE_NOTIFY)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_NOTSC (1 << TIF_NOTSC)
diff --git a/fs/exec.c b/fs/exec.c
index 632b02e..e6c9bc5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1132,7 +1132,8 @@ int prepare_binprm(struct linux_binprm *bprm)
bprm->cred->euid = current_euid();
bprm->cred->egid = current_egid();

- if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
+ if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
+ !test_tsk_thread_flag(current, TIF_NOSUID)) {
/* Set-uid? */
if (mode & S_ISUID) {
bprm->per_clear |= PER_CLEAR_ON_SETID;
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..acb3516 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,6 @@

#define PR_MCE_KILL_GET 34

+#define PR_SET_NOSUID 35
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..4b2643c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -152,6 +152,10 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
if (!dumpable && !capable(CAP_SYS_PTRACE))
return -EPERM;

+ if (test_tsk_thread_flag(current, TIF_NOSUID) &&
+ !test_tsk_thread_flag(task, TIF_NOSUID))
+ return -EPERM;
+
return security_ptrace_access_check(task, mode);
}

diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..1d1902a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1578,6 +1578,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
else
error = PR_MCE_KILL_DEFAULT;
break;
+ case PR_SET_NOSUID:
+ {
+ const struct cred *cred = current->cred;
+ error = -EINVAL;
+ if ( (cred->uid != cred->suid) ||
+ (cred->uid != cred->euid) ||
+ (cred->uid != cred->fsuid) ||
+ (cred->gid != cred->sgid) ||
+ (cred->gid != cred->egid) ||
+ (cred->gid != cred->fsgid) ||
+ (atomic_read(&current->signal->count) != 1))
+ break;
+ error = 0;
+ set_tsk_thread_flag(current, TIF_NOSUID);
+ break;
+ }
default:
error = -EINVAL;
break;
diff --git a/security/commoncap.c b/security/commoncap.c
index f800fdb..8abd3dc 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -392,6 +392,9 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective)
if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
return 0;

+ if (test_tsk_thread_flag(current, TIF_NOSUID))
+ return 0;
+
dentry = dget(bprm->file->f_dentry);

rc = get_vfs_caps_from_disk(dentry, &vcaps);
@@ -869,6 +872,18 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
goto changed;

+ case PR_SET_NOSUID:
+ {
+ const struct cred *cred = current->cred;
+ error = -EINVAL;
+ // Perform the capabilities checks
+ if (!cap_isclear(cred->cap_permitted) ||
+ !cap_isclear(cred->cap_effective))
+ goto error;
+ // Have the default perform the rest of the work.
+ error = -ENOSYS;
+ goto error;
+ }
default:
/* No functionality available - continue with default */
error = -ENOSYS;
--
1.6.5.2.143.g8cc62
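
The preconditions that the kernel/sys.c and security/commoncap.c hunks above place on PR_SET_NOSUID can be summarised in a small userspace model. This is only a sketch for reasoning about the checks: struct model_cred and nosuid_request_ok() are hypothetical names, capability sets are reduced to plain bitmasks, and nothing here calls into the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the credential fields the patch inspects. */
struct model_cred {
    unsigned uid, euid, suid, fsuid;
    unsigned gid, egid, sgid, fsgid;
    unsigned long long cap_permitted;   /* bitmask; 0 means empty */
    unsigned long long cap_effective;
};

/*
 * Returns true when the modelled PR_SET_NOSUID request would be accepted:
 * all uids equal, all gids equal, no permitted or effective capabilities,
 * and exactly one thread in the process.
 */
bool nosuid_request_ok(const struct model_cred *c, int thread_count)
{
    bool ids_uniform =
        c->uid == c->suid && c->uid == c->euid && c->uid == c->fsuid &&
        c->gid == c->sgid && c->gid == c->egid && c->gid == c->fsgid;
    bool no_caps = c->cap_permitted == 0 && c->cap_effective == 0;

    return ids_uniform && no_caps && thread_count == 1;
}
```

In the patch itself a failed check yields -EINVAL. The model collapses that to a boolean so the "no mixed privileges" rule is easy to see in one place.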

Serge E. Hallyn

Dec 29, 2009, 11:00:02 PM

Yes - that isn't the issue. The issue is with a file to which
we have execute permission but not read. If user hallyn has two
terminals open, and terminal one does ./foo then terminal two
cannot do strace -f -p `pidof foo`. But user hallyn can do
strace -f ./foo and succeed.

So we refuse ptrace_attach to an existing process with dumpable
turned off, but a pre-existing ptrace attach isn't affected by
executing a file which causes dumpable to be unset.

It goes back to finding a way to figure out what is inside the
file when the installer obviously thought we shouldn't be able
to read the file.

Do we care? <shrug>

-serge

Bryan Donlan

Dec 29, 2009, 11:00:02 PM

On Tue, Dec 29, 2009 at 10:35 PM, Eric W. Biederman
<ebie...@xmission.com> wrote:
>
> If we can know that a process will never raise its privileges, we can
> enable a lot of features that would otherwise be unsafe because they
> could break the assumptions of existing suid executables.
>
> To allow this to be used as a sandboxing feature, also disable
> ptracing of other executables that do not have this new restriction.
>
> For the moment I have used a per-thread flag because we are out of
> per-process flags.
>
> To ensure all descendants get this flag I rely on the default copying
> of process structures.
>
> The disabling of suid executables is exactly the same as MNT_NOSUID.
>
> This should be what we have been talking about for disabling suid
> exec. I chose not to use securebits as that interface requires
> privilege and assumes capabilities. This implementation is more basic
> than capabilities and only adds additional sanity checks when
> Linux capabilities are not present.
>
> I attempt to ensure there are no mixed privileges present when we
> perform the disable, so I don't need to handle or think about
> interactions with setreuid or capabilities in this code.

Is this sufficient for other security models such as selinux or
TOMOYO? Can processes in these models gain privileges through means
not restricted here?

Also, perhaps there should be a corresponding GET prctl?

Eric W. Biederman

Dec 29, 2009, 11:30:02 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

>> In commoncap we drop the new capabilities if we are being ptraced.
>> Look for bprm->unsafe.
>
> Yes - that isn't the issue.

Right. Sorry. I saw that we set unsafe and totally
missed that we don't act on it in that case.

> It goes back to finding a way to figure out what is inside the
> file when the installer obviously thought we shouldn't be able
> to read the file.
>
> Do we care? <shrug>

<shrug>

I expect two lines of testing bprm->unsafe and failing
at the right point would solve that.

Eric

Eric W. Biederman

Dec 29, 2009, 11:40:01 PM

Bryan Donlan <bdo...@gmail.com> writes:

> Is this sufficient for other security models such as selinux or
> TOMOYO? Can processes in these models gain privileges through means
> not restricted here?

The LSM is primarily about returning -EPERM more often.
Except for the prctl and the capability hooks I am not aware
of anywhere an LSM can increase a process's capabilities.

> Also, perhaps there should be a corresponding GET prctl?

Probably for the final version.

Eric

Bryan Donlan

Dec 30, 2009, 12:00:02 AM

On Tue, Dec 29, 2009 at 11:33 PM, Eric W. Biederman
<ebie...@xmission.com> wrote:
> Bryan Donlan <bdo...@gmail.com> writes:
>
>> Is this sufficient for other security models such as selinux or
>> TOMOYO? Can processes in these models gain privileges through means
>> not restricted here?
>
> The LSM is primarily about returning -EPERM more often.
> Except for the prctl and the capability hooks I am not aware
> of anywhere an LSM can increase a process's capabilities.

I'm more concerned about a case where a privilege that the LSM
currently denies is lifted by execing some executable - this is still
an increase in privilege, even though the LSM only adds additional
restrictions. That is:

1) Initial state: LSM denies access to /somefile (although normal
POSIX permissions would permit access)
2) Disable capability-gaining
3) Disable network access with proposed API
4) Exec some application, which is labeled in a way that permits
access to /somefile
5) Application fails to access the network, then does something to /somefile

I'm not entirely sure if step 4) can happen in any of the currently
existing LSMs - if it's not possible to gain privileges in them via a
suid-like mechanism, this isn't a problem, but it's something that
needs to be checked for.

David Wagner

Dec 30, 2009, 2:30:02 AM

Eric W. Biederman wrote:
>The problem with the disable_network semantics you want
>is that they allow you to perform a denial of service attack
>on privileged users. An unprivileged DOS attack is unsuitable
>for a general purpose feature in a general purpose kernel.

I'm not persuaded yet.

When you talk about DOS, let's be a bit more precise. disablenetwork
gives a way to deny setuid programs access to the network. It's not a
general-purpose DOS; it's denying access to the network only. And the
network is fundamentally unreliable. No security-critical mechanism
should be relying upon the availability of the network.

There are already a number of ways to deny service to the network.
Let me list a few:
* Limit the number of open file descriptors using rlimit,
open lots of file descriptors, and exec a setuid program.
* Flood the local network link.
* DOS the DNS servers (likely causing most connections to fail).
If there is a setuid-root program that fails catastrophically when the
network is unavailable, you've already got a serious problem -- and that
is true whether or not we introduce the disablenetwork service.

So while I certainly can't rule out the possibility that disablenetwork
might introduce minor issues, I think there are fundamental reasons to
be skeptical that disablenetwork will introduce serious new security
problems.
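
The first item in the list above (shrinking the descriptor limit before exec) can be illustrated with a short sketch. It assumes a POSIX system with getrlimit(), setrlimit() and socket(); the helper name socket_errno_under_fd_limit() is made up for this example. Instead of exec'ing a setuid program, it simply shows that once the soft RLIMIT_NOFILE is exhausted, socket() fails in the same process, which is the condition a child would inherit across exec.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the soft RLIMIT_NOFILE to zero, attempt to
 * create a TCP socket, then restore the previous soft limit.
 * Returns the errno reported by socket(), 0 if the call unexpectedly
 * succeeded, or -1 if the limit could not be read or changed.
 */
int socket_errno_under_fd_limit(void)
{
    struct rlimit saved, tight;
    int fd, err = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    tight = saved;
    tight.rlim_cur = 0;   /* no new descriptor numbers may be allocated */
    if (setrlimit(RLIMIT_NOFILE, &tight) != 0)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        err = errno;

    /* Raising the soft limit back up to its old value needs no privilege. */
    setrlimit(RLIMIT_NOFILE, &saved);
    if (fd >= 0)
        close(fd);
    return err;
}
```

On Linux the expected failure is EMFILE. Resource limits are preserved across execve(), so a setuid program started this way would begin with the same constraint.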

Pavel Machek

Dec 30, 2009, 5:10:02 AM

> Pavel Machek wrote:
> >>index 26a6b73..b48f021 100644
> >>--- a/kernel/sys.c
> >>+++ b/kernel/sys.c
> >>@@ -35,6 +35,7 @@
> >> #include <linux/cpu.h>
> >> #include <linux/ptrace.h>
> >> #include <linux/fs_struct.h>
> >>+#include <linux/prctl_network.h>
> >>
> >> #include <linux/compat.h>
> >> #include <linux/syscalls.h>
> >
> >Something seems to be wrong with whitespace here. Damaged patch?
>
> Nope; kernel/sys.c has a newline there:
>
> http://repo.or.cz/w/linux-2.6.git/blob/HEAD:/kernel/sys.c#l36
>
> Shall I remove it?

Notice the two spaces before include. Something was definitely wrong there.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Eric W. Biederman

Dec 30, 2009, 7:50:03 AM

Bryan Donlan <bdo...@gmail.com> writes:

A reasonable concern. When the glitches get worked out of this patch
I intend to allow much more dangerous things like unprivileged unsharing
of all of the namespaces, and unprivileged mounts.

It appears I missed a place where MNT_NOSUID was handled in selinux.
So I will be adding a bprm->nosuid field so I don't have to duplicate
the MNT_NOSUID check everywhere it is used.

I don't understand TOMOYO well; I think it is file-based access control,
which suggests there is no suid-like mechanism.

Smack and selinux are label based. Selinux at least can switch labels
on exec, but it handles NOSUID already.

Looking a little further, and assuming that only the LSM implementations
that implement the set_creds hook need attention: only selinux has
an interesting set_creds implementation, and it handles nosuid already.

So I think we are ok.

Eric

Eric W. Biederman

Dec 30, 2009, 8:00:02 AM

If we can know that a process will never raise
its privileges, we can enable a lot of features
that would otherwise be unsafe because they
could break the assumptions of existing suid executables.

To allow this to be used as a sandboxing feature,
also disable ptracing of other executables that do
not have this new restriction.

For the moment I have used a per-thread flag because
we are out of per-process flags.

To ensure all descendants get this flag I rely on
the default copying of process structures.

Added bprm->nosuid to remove the need to add
duplicate, error-prone checks. This ensures that
the disabling of suid executables is exactly the
same as MNT_NOSUID.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---
 arch/x86/include/asm/thread_info.h |    2 ++
 fs/exec.c                          |    6 ++++--
 include/linux/binfmts.h            |    1 +
 include/linux/prctl.h              |    2 ++
 kernel/ptrace.c                    |    4 ++++
 kernel/sys.c                       |   16 ++++++++++++++++
 security/commoncap.c               |   14 +++++++++++++-
 security/selinux/hooks.c           |    2 +-
 8 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 375c917..e716203 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -82,6 +82,7 @@ struct thread_info {
#define TIF_SYSCALL_EMU 6 /* syscall emulation active */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_SECCOMP 8 /* secure computing */
+#define TIF_NOSUID 9 /* suid exec permanently disabled */
#define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
@@ -107,6 +108,7 @@ struct thread_info {
#define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
+#define _TIF_NOSUID (1 << TIF_NOSUID)
#define _TIF_MCE_NOTIFY (1 << TIF_MCE_NOTIFY)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_NOTSC (1 << TIF_NOTSC)
diff --git a/fs/exec.c b/fs/exec.c
index 632b02e..5cba5ac 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1131,8 +1131,10 @@ int prepare_binprm(struct linux_binprm *bprm)
/* clear any previous set[ug]id data from a previous binary */
bprm->cred->euid = current_euid();
bprm->cred->egid = current_egid();
-
- if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
+ bprm->nosuid =
+ (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) ||
+ test_tsk_thread_flag(current, TIF_NOSUID);
+ if (!bprm->nosuid) {
/* Set-uid? */
if (mode & S_ISUID) {
bprm->per_clear |= PER_CLEAR_ON_SETID;

diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index cd4349b..c3b5a30 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -44,6 +44,7 @@ struct linux_binprm{
#ifdef __alpha__
unsigned int taso:1;
#endif
+ unsigned int nosuid:1; /* True if suid bits are ignored */
unsigned int recursion_depth;
struct file * file;
struct cred *cred; /* new credentials */

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..acb3516 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,6 @@

#define PR_MCE_KILL_GET 34

+#define PR_SET_NOSUID 35
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..b91040c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -152,6 +152,10 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
if (!dumpable && !capable(CAP_SYS_PTRACE))
return -EPERM;

+ if (test_tsk_thread_flag(current, TIF_NOSUID) &&
+ !test_tsk_thread_flag(task, TIF_NOSUID))

index f800fdb..28ab286 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -389,7 +389,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective)
if (!file_caps_enabled)
return 0;

- if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
+ if (bprm->nosuid)
return 0;

dentry = dget(bprm->file->f_dentry);

@@ -869,6 +869,18 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
goto changed;

+ case PR_SET_NOSUID:
+ {
+ const struct cred *cred = current->cred;
+ error = -EINVAL;
+ /* Perform the capabilities checks */
+ if (!cap_isclear(cred->cap_permitted) ||
+ !cap_isclear(cred->cap_effective))
+ goto error;
+ /* Have the default perform the rest of the work. */
+ error = -ENOSYS;
+ goto error;
+ }
default:
/* No functionality available - continue with default */
error = -ENOSYS;

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7a374c2..d14cd24 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2147,7 +2147,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
COMMON_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.path = bprm->file->f_path;

- if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
+ if (bprm->nosid)
new_tsec->sid = old_tsec->sid;

if (new_tsec->sid == old_tsec->sid) {
--
1.6.5.2.143.g8cc62
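
The two decisions this version of the patch makes at runtime can be modelled in a few lines of plain C. This is a sketch under stated assumptions, not kernel code: struct nosuid_task, exec_ignores_setid() and ptrace_check() are hypothetical names, and the thread flag is reduced to a boolean. The intended rule, as the commit message describes it, is that set-id bits are honoured only when neither the mount nor the task is marked nosuid, and that a restricted tracer may only trace tasks that carry the same restriction.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task carrying the TIF_NOSUID thread flag. */
struct nosuid_task {
    bool nosuid;
};

/*
 * Models the bprm->nosuid computation: set-id bits (and file
 * capabilities) are ignored if either the mount or the task says so.
 */
bool exec_ignores_setid(bool mount_nosuid, const struct nosuid_task *t)
{
    return mount_nosuid || t->nosuid;
}

/*
 * Models the added ptrace check: a restricted tracer is refused when
 * the target is not restricted. Returns 0 or -EPERM.
 */
int ptrace_check(const struct nosuid_task *tracer,
                 const struct nosuid_task *target)
{
    if (tracer->nosuid && !target->nosuid)
        return -EPERM;
    return 0;
}
```

Note that the model deliberately leaves unrestricted tracers alone; the usual dumpable and LSM checks still apply to them in the real code path.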

Andrew G. Morgan

Dec 30, 2009, 10:00:02 AM

Eric,

I'm not clear why capabilities need to be manipulated by this feature
(the pure capability support already has a feature for disabling
privilege and blocking unsafe, or insufficient privilege, execution).

Perhaps I'm just unclear what features can be more safely enabled with
this in effect - that is, your description suggests that this is why
you are doing this, but leaves it unclear what they are. Could you
take a few moments to enumerate some of them?

Thanks

Andrew


Valdis.K...@vt.edu

Dec 30, 2009, 11:30:03 AM

On Wed, 30 Dec 2009 07:24:11 GMT, David Wagner said:
> So while I certainly can't rule out the possibility that disablenetwork
> might introduce minor issues, I think there are fundamental reasons to
> be skeptical that disablenetwork will introduce serious new security
> problems.

I have to agree with David here - although there are many failure modes if
a security-relevant program wants to talk to the network, they're all already
prone to stuffage by an attacker.

Biggest danger is probably programs that rashly assume that 127.0.0.1 is
reachable. Seen a lot of *that* in my day (no, don't ask how I found out ;)

Serge E. Hallyn

Dec 30, 2009, 1:10:02 PM

Quoting Eric W. Biederman (ebie...@xmission.com):

> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> >> In commoncap we drop the new capabilities if we are being ptraced.
> >> Look for bprm->unsafe.
> >
> > Yes - that isn't the issue.
>
> Right. Sorry. I saw that we set unsafe and totally
> missed that we don't act on it in that case.
>
> > It goes back to finding a way to figure out what is inside the
> > file when the installer obviously thought we shouldn't be able
> > to read the file.
> >
> > Do we care? <shrug>
>
> <shrug>
>
> I expect two lines of testing bprm->unsafe and failing
> at the right point would solve that.

But what is the right response? Prevent execution? Stop the
tracer? Enter some one-shot mode where the whole exec appears
as one step, but tracing continues if execution continues on a
dumpable file?

-serge

Serge E. Hallyn

Dec 30, 2009, 1:30:02 PM

Quoting Eric W. Biederman (ebie...@xmission.com):
>

Should this be -EPERM? not sure...

> + /* Perform the capabilities checks */
> + if (!cap_isclear(cred->cap_permitted) ||
> + !cap_isclear(cred->cap_effective))

No need to check cap_effective, as no bits can be there which are not
in cap_permitted.

To be honest, I don't think there is much reason to not have this
check done in the main sys_prctl() - capabilities themselves are not
optional in the kernel, while cap_task_prctl() is. So you are setting
us up to have cases where say an apparmor user can call this with uid
0 and/or active capabilities.

> + goto error;
> + /* Have the default perform the rest of the work. */
> + error = -ENOSYS;
> + goto error;
> + }
> default:
> /* No functionality available - continue with default */
> error = -ENOSYS;
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 7a374c2..d14cd24 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2147,7 +2147,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
> COMMON_AUDIT_DATA_INIT(&ad, FS);
> ad.u.fs.path = bprm->file->f_path;
>
> - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> + if (bprm->nosid)

typo - nosuid?

> new_tsec->sid = old_tsec->sid;
>
> if (new_tsec->sid == old_tsec->sid) {
> --
> 1.6.5.2.143.g8cc62

Thanks, I think this looks good.

-serge

Serge E. Hallyn

Dec 30, 2009, 1:40:02 PM

Quoting Andrew G. Morgan (mor...@kernel.org):
> Eric,
>
> I'm not clear why capabilities need to be manipulated by this feature
> (the pure capability support already has a feature for disabling
> privilege and blocking unsafe, or insufficient privilege, execution).

Not entirely - this option would also prevent file capabilities from
being honored.

> Perhaps I'm just unclear what features can be more safely enabled with
> this in effect - that is, your description suggests that this is why
> you are doing this, but leaves it unclear what they are. Could you
> take a few moments to enumerate some of them?

There are two desirable features which are at the moment unsafe for
unprivileged users, because they allow them to fool privileged (setuid
or bearing file capabilities) programs. One is to unconditionally
restrict privilege for yourself and all your descendants. The recent
disablenetwork patchset is one example. The other is the ability to
make substantial changes to your environment in a private namespace.
A private namespace can protect already-running privileged programs,
but cannot protect privilege-bearing binaries, unless we prevent
them from bearing privilege, which is what this patch does.

-serge

Serge E. Hallyn

Dec 30, 2009, 1:50:01 PM

Quoting Michael Stone (mic...@laptop.org):
> Daniel Bernstein has observed [1] that security-conscious userland processes
> may benefit from the ability to irrevocably remove their ability to create,
> bind, connect to, or send messages except in the case of previously connected
> sockets or AF_UNIX filesystem sockets.
>
> This patch provides
>
> * a new configuration option named CONFIG_SECURITY_DISABLENETWORK,
> * a new prctl option-pair (PR_SET_NETWORK, PR_GET_NETWORK),
> * a new prctl(PR_SET_NETWORK) flag named PR_NETWORK_OFF, and
> * a new task_struct flags field named "network"
>
> Signed-off-by: Michael Stone <mic...@laptop.org>
> ---
> include/linux/prctl.h | 7 +++++
> include/linux/prctl_network.h | 7 +++++
> include/linux/sched.h | 4 +++
> kernel/sys.c | 53 +++++++++++++++++++++++++++++++++++++++++
> security/Kconfig | 11 ++++++++
> 5 files changed, 82 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/prctl_network.h
>
> diff --git a/include/linux/prctl.h b/include/linux/prctl.h
> index a3baeb2..4eb4110 100644
> --- a/include/linux/prctl.h
> +++ b/include/linux/prctl.h
> @@ -102,4 +102,11 @@
>
> #define PR_MCE_KILL_GET 34
>
> +/* Get/set process disable-network flags */
> +#define PR_SET_NETWORK 35
> +#define PR_GET_NETWORK 36
> +# define PR_NETWORK_ON 0
> +# define PR_NETWORK_OFF 1
> +# define PR_NETWORK_ALL_FLAGS 1
> +
> #endif /* _LINUX_PRCTL_H */
> diff --git a/include/linux/prctl_network.h b/include/linux/prctl_network.h
> new file mode 100644
> index 0000000..d18f8cb
> --- /dev/null
> +++ b/include/linux/prctl_network.h
> @@ -0,0 +1,7 @@
> +#ifndef _LINUX_PRCTL_NETWORK_H
> +#define _LINUX_PRCTL_NETWORK_H
> +
> +extern long prctl_get_network(unsigned long*);
> +extern long prctl_set_network(unsigned long*);
> +
> +#endif /* _LINUX_PRCTL_NETWORK_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f2f842d..6fcaef8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1403,6 +1403,10 @@ struct task_struct {
> #endif
> seccomp_t seccomp;
>
> +#ifdef CONFIG_SECURITY_DISABLENETWORK
> + unsigned long network;
> +#endif
> +
> /* Thread group tracking */
> u32 parent_exec_id;
> u32 self_exec_id;
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 26a6b73..b48f021 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c

> @@ -35,6 +35,7 @@
> #include <linux/cpu.h>
> #include <linux/ptrace.h>
> #include <linux/fs_struct.h>
> +#include <linux/prctl_network.h>
>
> #include <linux/compat.h>
> #include <linux/syscalls.h>
> @@ -1578,6 +1579,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> else
> error = PR_MCE_KILL_DEFAULT;
> break;
> + case PR_SET_NETWORK:
> + error = prctl_set_network((unsigned long*)arg2);
> + break;
> + case PR_GET_NETWORK:
> + error = prctl_get_network((unsigned long*)arg2);
> + break;
> default:
> error = -EINVAL;
> break;
> @@ -1585,6 +1592,52 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> return error;
> }
>
> +#ifdef CONFIG_SECURITY_DISABLENETWORK
> +
> +long prctl_get_network(unsigned long* user)
> +{
> + return put_user(current->network, user);
> +}
> +
> +long prctl_set_network(unsigned long* user)
> +{
> + unsigned long network_flags;
> + long ret;
> +
> + ret = -EFAULT;
> + if (copy_from_user(&network_flags, user, sizeof(network_flags)))
> + goto out;

Do you expect to pass more than 32 bits through this interface at
some point? If not, how about avoiding the copy, and just passing
a long into prctl_set_network(), and having prctl_get_network
return 0 or a positive value indicating the active bits?

So

long prctl_get_network(void)
{
	return current->network;
}

long prctl_set_network(unsigned long network_flags)
{
	if (network_flags & ~PR_NETWORK_ALL_FLAGS)
		return -EINVAL;
	if (current->network & ~network_flags)
		return -EPERM;
	current->network = network_flags;
	return 0;
}

> + ret = -EINVAL;
> + if (network_flags & ~PR_NETWORK_ALL_FLAGS)
> + goto out;
> +
> + /* only dropping access is permitted */
> + ret = -EPERM;
> + if (current->network & ~network_flags)

whitespace.

> + goto out;
> +
> + current->network = network_flags;
> + ret = 0;
> +
> +out:
> + return ret;
> +}
> +
> +#else
> +
> +long prctl_get_network(unsigned long* user)
> +{
> + return -ENOSYS;
> +}
> +
> +long prctl_set_network(unsigned long* user)
> +{
> + return -ENOSYS;
> +}
> +
> +#endif /* ! CONFIG_SECURITY_DISABLENETWORK */
> +
> SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep,
> struct getcpu_cache __user *, unused)
> {
> diff --git a/security/Kconfig b/security/Kconfig
> index 226b955..afd7f76 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -137,6 +137,17 @@ config LSM_MMAP_MIN_ADDR
> this low address space will need the permission specific to the
> systems running LSM.
>
> +config SECURITY_DISABLENETWORK
> + bool "Socket and networking discretionary access control"
> + depends on SECURITY_NETWORK
> + help
> + This enables processes to drop networking privileges via
> + prctl(PR_SET_NETWORK, PR_NETWORK_OFF).
> +
> + See Documentation/disablenetwork.txt for more information.
> +
> + If you are unsure how to answer this question, answer N.
> +
> source security/selinux/Kconfig
> source security/smack/Kconfig
> source security/tomoyo/Kconfig
> --
> 1.6.6.rc2
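To make the "only dropping access is permitted" rule from the patch concrete, here is a minimal userspace sketch of the flag check. The `struct task_model` type and the `model_set_network()` name are hypothetical stand-ins for `current->network` and `prctl_set_network()`; the flag values are copied from the patch. It models the allowed transitions only and is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values copied from the patch's include/linux/prctl.h hunk. */
#define PR_NETWORK_ON        0UL
#define PR_NETWORK_OFF       1UL
#define PR_NETWORK_ALL_FLAGS 1UL

/* Hypothetical stand-in for the task_struct field current->network. */
struct task_model {
    unsigned long network;
};

/*
 * Mirrors the checks in the proposed prctl_set_network(): reject unknown
 * bits, then reject any request that would clear a restriction that is
 * already in place.
 */
static long model_set_network(struct task_model *t, unsigned long flags)
{
    assert(t != NULL);

    if (flags & ~PR_NETWORK_ALL_FLAGS)
        return -EINVAL;
    if (t->network & ~flags)
        return -EPERM;

    t->network = flags;
    return 0;
}
```

Under these assumptions a task can move from PR_NETWORK_ON to PR_NETWORK_OFF, but a later request for PR_NETWORK_ON fails with -EPERM, and a request with unknown bits fails with -EINVAL.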

Eric W. Biederman

Dec 30, 2009, 3:10:03 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Andrew G. Morgan (mor...@kernel.org):
>> Eric,
>>
>> I'm not clear why capabilities need to be manipulated by this feature
>> (the pure capability support already has a feature for disabling
>> privilege and blocking unsafe, or insufficient privilege, execution).
>
> Not entirely - this option would also prevent file capabilities from
> being honored.

All my patch does is verify the caller doesn't have privilege.

>> Perhaps I'm just unclear what features can be more safely enabled with
>> this in effect - that is, your description suggests that this is why
>> you are doing this, but leaves it unclear what they are. Could you
>> take a few moments to enumerate some of them?
>
> There are two desirable features which are at the moment unsafe for
> unprivileged users, because it allows them to fool privileged (setuid
> or bearing file capabilities) programs. One is to unconditionally
> restrict privilege to yourself and all your descendents. The recent
> disablenetwork patchset is one example. The other is the ability to
> make substantial changes to your environment in a private namespace.
> A private namespace can protect already-running privileged program,
> but cannot protect privilege-bearing binaries. Unless we prevent
> them from bearing privilege. Which is what this patch does.

Effectively, by ensuring privileges cannot be raised, this removes
the set of circumstances that lead to the sendmail capabilities bug.

So any kernel feature that requires capabilities only because not
requiring them would break backwards compatibility with suid
applications can be made available to unprivileged processes.
This includes namespace manipulation, like plan 9.
This includes unsharing pid and network and sysvipc namespaces.

There are probably other useful but currently root-only features
that this will allow to be used by unprivileged processes, that
I am not aware of.

In addition, knowing that privileges cannot be escalated by a
process is a good feature all by itself. Run this in a chroot
and the programs will never be able to gain root access even if
there are suid binaries available for them to execute.

Eric

Serge E. Hallyn

Dec 30, 2009, 3:20:03 PM

Quoting Eric W. Biederman (ebie...@xmission.com):

> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> > Quoting Andrew G. Morgan (mor...@kernel.org):
> >> Eric,
> >>
> >> I'm not clear why capabilities need to be manipulated by this feature
> >> (the pure capability support already has a feature for disabling
> >> privilege and blocking unsafe, or insufficient privilege, execution).
> >
> > Not entirely - this option would also prevent file capabilities from
> > being honored.
>
> All my patch does is verify the caller doesn't have privilege.

No, you shortcut security/commoncap.c:get_file_caps() if (bprm->nosuid),
which is set if test_tsk_thread_flag(current, TIF_NOSUID) at exec.

So if we're in this new no-suid mode, then file capabilities are not
honored.

Which is the right thing to do.

Eric W. Biederman

Dec 30, 2009, 3:50:04 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

>> @@ -869,6 +869,18 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>> new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>> goto changed;
>>
>> + case PR_SET_NOSUID:
>> + {
>> + const struct cred *cred = current->cred;
>> + error = -EINVAL;
>
> Should this be -EPERM? not sure...

I intended -EINVAL to say it is simply a set of initial conditions
that are not supported today, but could be supported if someone
does the audit and finds there are no security issues.

>> + /* Perform the capabilities checks */
>> + if (!cap_isclear(cred->cap_permitted) ||
>> + !cap_isclear(cred->cap_effective))
>
> No need to check cap_effective, as no bits can be there which are not
> in cap_permitted.
>
> To be honest, I don't think there is much reason to not have this
> check done in the main sys_prctl(0 - capabilities themselves are not
> optional in the kernel, while cap_task_prctl() is. So you are setting
> us up to have cases where say an apparmor user can call this with uid
> 0 and/or active capabilities.

Sounds fine to me. I had noticed all of the capabilities checks were
off in their own file, so I had tried to maintain that. But you are
right: we can't remove capabilities, so splitting the code like this only
obfuscates it.

>> @@ -2147,7 +2147,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
>> COMMON_AUDIT_DATA_INIT(&ad, FS);
>> ad.u.fs.path = bprm->file->f_path;
>>
>> - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
>> + if (bprm->nosid)
>
> typo - nosuid?

Yep.

Eric

Eric W. Biederman

Dec 30, 2009, 4:20:02 PM

If we can know that a process will never raise
its privileges we can enable a lot of features
without privilege (such as unsharing namespaces
and unprivileged mounts) that otherwise would be unsafe,
because they could break assumptions of existing
suid executables.

To allow this to be used as a sandboxing feature
also disable ptracing other executables without
this new restriction.

For the moment I have used a per thread flag because
we are out of per process flags.

To ensure all descendants get this flag I rely on
the default copying of process structures.

Added bprm->nosuid to remove the need to add
duplicate error prone checks. This ensures that
the disabling of suid executables is exactly the
same as MNT_NOSUID.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---
arch/x86/include/asm/thread_info.h | 2 ++
fs/exec.c | 6 ++++--
include/linux/binfmts.h | 1 +

index a3baeb2..8adc517 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,7 @@

#define PR_MCE_KILL_GET 34

+#define PR_SET_NOSUID 35

+#define PR_GET_NOSUID 36

+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 23bd09c..b91040c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -152,6 +152,10 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
if (!dumpable && !capable(CAP_SYS_PTRACE))
return -EPERM;

+ if (test_tsk_thread_flag(current, TIF_NOSUID) &&
+ !test_tsk_thread_flag(task, TIF_NOSUID))
+ return -EPERM;
+
return security_ptrace_access_check(task, mode);
}

diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..8731f2a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1578,6 +1578,27 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
else
error = PR_MCE_KILL_DEFAULT;
break;
+ case PR_SET_NOSUID:
+ {
+ const struct cred *cred = current->cred;
+ error = -EINVAL;
+ /* Don't support cases that could be unsafe */
+ if ( (cred->uid != cred->suid) ||
+ (cred->uid != cred->euid) ||
+ (cred->uid != cred->fsuid) ||
+ (cred->gid != cred->sgid) ||
+ (cred->gid != cred->egid) ||
+ (cred->gid != cred->fsgid) ||
+ !cap_isclear(cred->cap_permitted) ||
+ (atomic_read(&current->signal->count) != 1))
+ break;
+ error = 0;
+ set_tsk_thread_flag(current, TIF_NOSUID);
+ break;
+ }
+ case PR_GET_NOSUID:
+ error = !!test_tsk_thread_flag(current, TIF_NOSUID);
+ break;
default:
error = -EINVAL;
break;
diff --git a/security/commoncap.c b/security/commoncap.c

index f800fdb..34500e3 100644

--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -389,7 +389,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective)
if (!file_caps_enabled)
return 0;

- if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
+ if (bprm->nosuid)
return 0;

dentry = dget(bprm->file->f_dentry);

@@ -868,7 +868,6 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
else

new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
goto changed;

-

default:
/* No functionality available - continue with default */
error = -ENOSYS;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c

index 7a374c2..bd77a2b 100644

--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2147,7 +2147,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
COMMON_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.path = bprm->file->f_path;

- if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)

+ if (bprm->nosuid)

new_tsec->sid = old_tsec->sid;

if (new_tsec->sid == old_tsec->sid) {
--
1.6.5.2.143.g8cc62
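The start conditions and the ptrace rule in this patch can be modeled in userspace as a rough sketch. The `struct cred_model` and `struct task_model` types and both function names are hypothetical; they stand in for `struct cred`, the TIF_NOSUID thread flag, the PR_SET_NOSUID case, and the `__ptrace_may_access()` hunk. Capabilities are reduced to a plain 64-bit mask, which is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the fields of struct cred used here. */
struct cred_model {
    unsigned int uid, euid, suid, fsuid;
    unsigned int gid, egid, sgid, fsgid;
    uint64_t cap_permitted;
    uint64_t cap_effective;
};

struct task_model {
    struct cred_model cred;
    int thread_count; /* models atomic_read(&current->signal->count) */
    bool nosuid;      /* models the TIF_NOSUID thread flag */
};

/* Same start conditions as the PR_SET_NOSUID case in the patch. */
static int model_set_nosuid(struct task_model *t)
{
    const struct cred_model *c = &t->cred;

    /*
     * Effective capabilities are always a subset of the permitted ones,
     * which is why only cap_permitted is tested below.
     */
    assert((c->cap_effective & ~c->cap_permitted) == 0);

    if (c->uid != c->suid || c->uid != c->euid || c->uid != c->fsuid ||
        c->gid != c->sgid || c->gid != c->egid || c->gid != c->fsgid ||
        c->cap_permitted != 0 || t->thread_count != 1)
        return -EINVAL;

    t->nosuid = true;
    return 0;
}

/*
 * Same rule as the __ptrace_may_access() hunk: a restricted tracer may
 * only attach to a task that carries the restriction too.
 */
static int model_ptrace_may_access(const struct task_model *tracer,
                                   const struct task_model *target)
{
    if (tracer->nosuid && !target->nosuid)
        return -EPERM;
    return 0;
}
```

In this model a single-threaded task whose uids and gids all match and which holds no permitted capabilities can set the flag; any mismatch yields -EINVAL, and a flagged task cannot trace an unflagged one.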


Eric W. Biederman

Dec 30, 2009, 4:20:03 PM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>>
>> >> In common cap we drop the new capabilities if we are being ptraced.
>> >> Look for brm->unsafe.
>> >
>> > Yes - that isn't the issue.
>>
>> Right. Sorry. I saw that we set unsafe and totally
>> missed that we don't act on it in that case.
>>
>> > It goes back to finding a way to figure out what is inside the
>> > file when the installer obviously thought we shouldn't be able
>> > to read the file.
>> >
>> > Do we care? <shrug>
>>
>> <shrug>
>>
>> I expect two lines of testing bprm->unsafe and failing
>> at the right point would solve that.
>
> But what is the right response? Prevent excecution? Stop the
> tracer? Enter some one-shot mode where the whole exec appears
> as one step, but tracing continues if execution continues on a
> dumpable file?

The whole exec should already appear as one step.

The right response is to either fail the exec or disable
the tracer. Since the other case drops privs, I expect
failing the exec is the simplest and most consistent thing
we can do.

Eric

Alan Cox

Dec 30, 2009, 4:30:02 PM

> Added bprm->nosuid to make remove the need to add
> duplicate error prone checks. This ensures that
> the disabling of suid executables is exactly the
> same as MNT_NOSUID.

Another fine example of why we have security hooks so that we don't get a
kernel full of other "random security idea of the day" hacks.

Alan

Eric W. Biederman

Dec 30, 2009, 4:40:01 PM

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

>> Added bprm->nosuid to make remove the need to add
>> duplicate error prone checks. This ensures that
>> the disabling of suid executables is exactly the
>> same as MNT_NOSUID.
>
> Another fine example of why we have security hooks so that we don't get a
> kernel full of other "random security idea of the day" hacks.

Well it comes from plan 9. Except there they just simply did not
implement suid. What causes you to think dropping the ability
to execute suid executables is a random security idea of the day?

Eric

Alan Cox

Dec 30, 2009, 6:10:02 PM

On Wed, 30 Dec 2009 13:36:57 -0800
ebie...@xmission.com (Eric W. Biederman) wrote:

> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:
>
> >> Added bprm->nosuid to make remove the need to add
> >> duplicate error prone checks. This ensures that
> >> the disabling of suid executables is exactly the
> >> same as MNT_NOSUID.
> >
> > Another fine example of why we have security hooks so that we don't get a
> > kernel full of other "random security idea of the day" hacks.
>
> Well it comes from plan 9. Except there they just simply did not
> implement suid. What causes you to think dropping the ability
> to execute suid executables is a random security idea of the day?

Well to be fair it's the random regurgitated security idea of every year or
two.

More to the point - we have security_* hooks so this kind of continuous
security proposal turdstream can stay out of the main part of the kernel.

Cleaning up the mechanism by which NOSUID is handled in kernel seems a
good idea. Adding wacky new prctls and gunk for it doesn't, and belongs
in whatever security model you are using via the security hooks.

Bryan Donlan

Dec 30, 2009, 9:50:02 PM

On Wed, Dec 30, 2009 at 6:00 PM, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
> On Wed, 30 Dec 2009 13:36:57 -0800
> ebie...@xmission.com (Eric W. Biederman) wrote:
>
>> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:
>>
>> >> Added bprm->nosuid to make remove the need to add
>> >> duplicate error prone checks. This ensures that
>> >> the disabling of suid executables is exactly the
>> >> same as MNT_NOSUID.
>> >
>> > Another fine example of why we have security hooks so that we don't get a
>> > kernel full of other "random security idea of the day" hacks.
>>
>> Well it comes from plan 9. Except there they just simply did not
>> implement suid. What causes you to think dropping the ability
>> to execute suid executables is a random security idea of the day?
>
> Well to be fair its random regurgitated security idea of every year or
> two.
>
> More to the point - we have security_* hooks so this kind of continuous
> security proposal turdstream can stay out of the main part of the kernel.
>
> Cleaning up the mechanism by which NOSUID is handled in kernel seems a
> good idea. Adding wacky new prctls and gunk for it doesn't, and belongs
> in whatever security model you are using via the security hooks.

I see this as being a security-model agnostic API - the reason being,
the application is specifying a policy for itself that has meaning in
all existing security models, and which does not require administrator
intervention to configure. Rather than reimplementing this for each
security model, it's far better to do it just once. Moreover, by
having a single, common API, the application can state the general
policy "I will never need to gain privileges over exec" without
needing to know what LSM is in use.

The future goal of this API is to allow us to relax restrictions on
creating new namespaces, chrooting, and otherwise altering the task's
environment in ways that may confuse privileged applications. Since
security hooks are all about making the existing security restrictions
_stricter_, it's not easy to later relax these using the security hook
model. And once we put in the general requirement that "this task
shall never gain privilege", it should be safe to relax these
restrictions for _all_ security models.

In short, this is something which is meaningful for all existing LSMs
and should be implemented in a central point, it will make things
easier for the namespace folks, and since it will lead to relaxing
restrictions later, it doesn't make sense to put it in a LSM as they
stand now.

Eric W. Biederman

Dec 31, 2009, 4:00:02 AM

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> On Wed, 30 Dec 2009 13:36:57 -0800
> ebie...@xmission.com (Eric W. Biederman) wrote:
>
>> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:
>>
>> >> Added bprm->nosuid to make remove the need to add
>> >> duplicate error prone checks. This ensures that
>> >> the disabling of suid executables is exactly the
>> >> same as MNT_NOSUID.
>> >
>> > Another fine example of why we have security hooks so that we don't get a
>> > kernel full of other "random security idea of the day" hacks.
>>
>> Well it comes from plan 9. Except there they just simply did not
>> implement suid. What causes you to think dropping the ability
>> to execute suid executables is a random security idea of the day?
>
> Well to be fair its random regurgitated security idea of every year or
> two.
>
> More to the point - we have security_* hooks so this kind of continuous
> security proposal turdstream can stay out of the main part of the kernel.
>
> Cleaning up the mechanism by which NOSUID is handled in kernel seems a
> good idea. Adding wacky new prctls and gunk for it doesn't, and belongs
> in whatever security model you are using via the security hooks.

I am more than happy to make this a proper system call, instead of a
prctl. The way this code is evolving that seems to be the clean way
to handle this. No point in hiding the functionality away in a corner
in shame.

In my book SUID applications are the root of all evil. They are
exploitable if you twist their environment in a way they have not
hardened themselves against, and simply supporting them prevents a lot
of good features from being used by ordinary applications.

To get SUID out of my way I have to do something. Disabling SUID
for this process and its children is a simple and direct way there.
My other path is much more complicated.

As this also has security related uses it seems even better as a feature.

Eric

Samir Bellabes

Dec 31, 2009, 8:10:02 AM

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> Well to be fair its random regurgitated security idea of every year or
> two.

True, last year the same kind of discussion occurred over the 'personal
firewall', aka a network MAC.
http://marc.info/?t=123247387500003&r=3&w=2
http://marc.info/?t=123187029200001&r=2&w=2

> More to the point - we have security_* hooks so this kind of continuous
> security proposal turdstream can stay out of the main part of the kernel.

Indeed, the LSM framework was designed to be the abstraction tool. The 3
design rules were:

0. truly generic, where using a different security model is merely a
matter of loading a different kernel module;
1. conceptually simple, minimally invasive, and efficient; and
2. able to support the existing POSIX.1e capabilities logic as an
optional security module.

So 'minimally invasive' is the keyword. That's why I don't understand the
purpose of this kind of patch, even if I see the goal it is trying to achieve:

int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen)
{
- return security_ops->socket_connect(sock, address, addrlen);
+ int ret = 0;
+
+ ret = security_ops->socket_connect(sock, address, addrlen);
+ if (ret)
+ goto out;
+
+#ifdef CONFIG_SECURITY_DISABLENETWORK
+ ret = disablenetwork_security_socket_connect(sock, address, addrlen);
+ if (ret)
+ goto out;
+#endif

+
+out:
+ return ret;
}

This really seems to be a kind of stacking, but it's not. So are we
going to move the LSM framework to support stacking, or are we respecting
the rules of the LSM framework (respecting the abstract hooks)?
This change makes the LSM framework no longer generic at all.
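For comparison, the difference between the hard-coded call in the hunk above and real stacking can be sketched in userspace: with stacking, the framework walks a list of hooks and stops at the first refusal, and no individual module is named in the framework code. Everything below is hypothetical (the `connect_hook_t` signature, the two stand-in hooks, and the -EPERM result chosen for the restricted case); it only illustrates the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook signature; the real hooks take socket arguments. */
typedef int (*connect_hook_t)(int restricted);

/* Stand-in for the primary LSM's socket_connect hook: always allows. */
static int primary_lsm_connect(int restricted)
{
    (void)restricted;
    return 0;
}

/* Stand-in for disablenetwork_security_socket_connect(). */
static int disablenetwork_connect(int restricted)
{
    /* -EPERM is an assumed error code for this illustration. */
    return restricted ? -EPERM : 0;
}

/* Call each registered hook in order and stop at the first refusal. */
static int run_connect_hooks(const connect_hook_t *hooks, size_t n,
                             int restricted)
{
    assert(hooks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int ret = hooks[i](restricted);
        if (ret)
            return ret;
    }
    return 0;
}
```

With both hooks registered, a restricted task is refused by the second hook even though the primary LSM allowed the call; with only the primary hook registered, the same call succeeds.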

Peter Dolding

Dec 31, 2009, 9:10:02 AM

On Thu, Dec 31, 2009 at 11:00 PM, Samir Bellabes <s...@synack.fr> wrote:
> Alan Cox <al...@lxorguk.ukuu.org.uk> writes:
>
>> Well to be fair its random regurgitated security idea of every year or
>> two.
>
> true, last year the same kind of discussion occurs with the 'personal
> firewall' aka a network MAC.
> http://marc.info/?t=123247387500003&r=3&w=2
> http://marc.info/?t=123187029200001&r=2&w=2

Let's step back for a moment. What is the common issue with both?

The issue is simple: "How do I generically tell the security system
that I want particular restrictions?"

There is no generic LSM API for applications or users to talk to the
LSM and say "I want the following restricted". Of course the
restrictions have to be tighter than what the profiles already say.

To control the LSM, applications are expected to know what the LSM is.
This has caused major issues for programs like Chrome.

Also, by providing a generic LSM API there would be a base set of
requirements for an LSM to provide to meet the requirements of the
generic interfaces.

Basically, until a generic interface to talk to the LSM module is provided
these requests are going to keep coming. Maybe assign security ops
string values so that applications can say "disable the following security
operations for me".

The application does not need to be informed what is disabled for it.

Peter Dolding

Serge E. Hallyn

Dec 31, 2009, 10:30:02 AM

Quoting Eric W. Biederman (ebie...@xmission.com):
>

I'm sorry, this may actually not be sufficient.

Could you try the following test on a kernel with this patch? :

1. become root
2. do prctl(PR_SET_NOSUID);
3. run bash, and examine your capabilities in /proc/self/status

I think the code in security/commoncap.c:457-458 will re-raise your
capabilities.

>
> dentry = dget(bprm->file->f_dentry);
> @@ -868,7 +868,6 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> else
> new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
> goto changed;
> -
> default:
> /* No functionality available - continue with default */
> error = -ENOSYS;
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 7a374c2..bd77a2b 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2147,7 +2147,7 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
> COMMON_AUDIT_DATA_INIT(&ad, FS);
> ad.u.fs.path = bprm->file->f_path;
>
> - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> + if (bprm->nosuid)
> new_tsec->sid = old_tsec->sid;
>
> if (new_tsec->sid == old_tsec->sid) {
> --
> 1.6.5.2.143.g8cc62
>

Eric W. Biederman

Dec 31, 2009, 11:50:02 AM

"Serge E. Hallyn" <se...@us.ibm.com> writes:

>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index f800fdb..34500e3 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -389,7 +389,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective)
>> if (!file_caps_enabled)
>> return 0;
>>
>> - if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
>> + if (bprm->nosuid)
>> return 0;
>
> I'm sorry, this may actually not be sufficient.
>
> Could you try the following test on a kernel with this patch? :
>
> 1. become root
> 2. do prctl(PR_SET_NOSUID);
> 3. run bash, and examine your capabilities in /proc/self/status
>
> I think the code in security/commoncap.c:457-458 will re-raise your
> capabilities.

Right. That is a legitimate issue.
I almost guard against it with my start condition test
of cap_isclear(cred->cap_permitted),
which causes this to fail for root in most situations. I will add a
test for the securebits, and deny this to root unless the securebits
are such that root cannot gain privilege.

Thanks for catching this. I figured I might need a uid == 0 exclusion.
Because the test was split when I wrote it, I wasn't certain where to put it.

Eric
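The issue Serge raises can be illustrated with a small userspace model of capability computation at exec. This is a deliberately simplified sketch, not the logic of security/commoncap.c: the `struct exec_model` type, the `CAP_FULL_SET` mask, and `model_exec_permitted()` are hypothetical, and the real code also involves the bounding and inheritable sets. The point is only that skipping file capabilities does not, by itself, stop the root special case from granting capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_FULL_SET UINT64_MAX /* hypothetical "all capabilities" mask */

/* Hypothetical inputs to a simplified exec-time capability computation. */
struct exec_model {
    unsigned int uid, euid;
    bool nosuid;        /* models bprm->nosuid */
    bool secure_noroot; /* models issecure(SECURE_NOROOT) */
    uint64_t file_caps; /* capabilities attached to the binary */
};

static uint64_t model_exec_permitted(const struct exec_model *e)
{
    uint64_t permitted = 0;

    assert(e != NULL);

    /* File capabilities are ignored when nosuid is set. */
    if (!e->nosuid)
        permitted |= e->file_caps;

    /*
     * Root special case (assumed simplification): unless SECURE_NOROOT
     * is set, a root uid or euid regains capabilities at exec, whether
     * or not nosuid is set.
     */
    if (!e->secure_noroot && (e->uid == 0 || e->euid == 0))
        permitted = CAP_FULL_SET;

    return permitted;
}
```

In this model an unprivileged task with nosuid set gains nothing from a capability-bearing binary, while root with nosuid set still ends up with a full set unless the securebits forbid it, which matches the exclusion Eric proposes to add.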

Alan Cox

Dec 31, 2009, 12:10:02 PM

> Lets step back for a moment. What is the common issue with both.
>
> The issue is simple. "How to I generically tell the secuirty system
> want particular restrictions."

You don't. It's not "the security system", it's a whole collection of
completely different models of security and differing tools.

> There is no generic LSM API for application or users to talk to the
> LSM and say I want the following restricted.

That's a meaningless observation I think because security doesn't work
that way. Removing specific features from a specific piece of code
generally isn't a security feature - it's only meaningful in the context
of a more general policy and that policy expression isn't generic.

> To control the LSM the applications are expected to know what the LSM.
> This has caused items like chrome major issues.

..

> Application does not need to be informed what is disabled from it.

So why does it cause chrome problems ?

There are multiple security models because nobody can agree on what they
should look like, just like multiple desktops. Each of them is based on a
totally different conceptual model so the idea of a single interface to
them is a bit meaningless.

Alan

Alan Cox

Dec 31, 2009, 12:40:02 PM

> I see this as being a security-model agnostic API - the reason being,

That's what everyone else says about their security model too.

> the application is specifying a policy for itself that has meaning in
> all existing security models, and which does not require administrator
> intervention to configure. Rather than reimplementing this for each
> security model, it's far better to do it just once. Moreover, by
> having a single, common API, the application can state the general
> policy "I will never need to gain priviliges over exec" without
> needing to know what LSM is in use.

So it can sit in the security hooks and stack.

> The future goal of this API is to allow us to relax restrictions on
> creating new namespaces, chrooting, and otherwise altering the task's
> environment in ways that may confuse privileged applications. Since

All of which are security policy, general purpose and frequently part of
the main LSM module loaded - in other words it's nothing of the sort when
it comes to being separate. It's just another magic interface hook, and as
I think the history of capability stuff in kernel shows it doesn't work
that way.

> security hooks are all about making the existing security restrictions

> _stricter_, it's not easy to later relax these using the security hook
> model. And once we put in the general requirement that "this task
> shall never gain privilege", it should be safe to relax these
> restrictions for _all_ security models.

In which case the hooks can be tweaked. It's an interface it can be
tuned - and has been - eg for Tomoyo.

> In short, this is something which is meaningful for all existing LSMs

But is it - and if it's combined with 500 other similar hooks and a set of
system policies can you even work out the result?

> restrictions later, it doesn't make sense to put it in a LSM as they
> stand now.

And it certainly doesn't make sense to add this and the several hundred
other variants of this "can't open sockets, can't mount, can't this,
can't that ...." stuff continually being suggested by randomly extending
other unrelated interfaces.

Look up the sendmail security archive and you'll even find examples where
enforcing extra security on setuid *caused* security problems to show up
that were basically impossible to hit otherwise.

We have a security system, with a set of interfaces for attaching
security models, please stop trying to go round the back of the kernel
design because you can't be bothered to do the required work to do the
job right and would rather add more unmaintainable crap all over the
place.

Yes it might mean the hooks need tweaking, yes it probably means the
people who want these need to do some trivial stacking work, but if as
many people are actually really interested as are having random 'lets add
a button to disable reading serial ports on wednesday' ideas there should
be no shortage of people to do the job right.

Alan

David Wagner

Dec 31, 2009, 1:00:02 PM

Alan Cox wrote:
>Look up the sendmail security archive and you'll even find examples where
>enforcing extra security on setuid *caused* security problems to show up
>that were basically impossible to hit otherwise.

Yes, we know: people have mentioned the sendmail bug multiple times
in this thread. That's exactly the hazard that this proposed patch
was intended to help address. That's exactly the hazard that all
this discussion has been focused on.

This patch is not a security model. It may facilitate other
security models, hopefully, but it's not intended as a security
model in itself.

>We have a security system, with a set of interfaces for attaching
>security models, please stop trying to go round the back of the kernel
>design because you can't be bothered to do the required work to do the
>job right and would rather add more unmaintainable crap all over the
>place.

Got a constructive suggestion for a better way to implement this?
My impression is that all thread participants have been happy to
listen to constructive suggestions about alternative ways to achieve
the goals, from people who have paid attention to the discussion and
taken the time to understand the points that have been raised so far.

Serge E. Hallyn

Dec 31, 2009, 1:00:02 PM

Quoting Alan Cox (al...@lxorguk.ukuu.org.uk):
> > I see this as being a security-model agnostic API - the reason being,
>
> Thats what everyone else says about their security model too

LOL

That's exactly what we're trying to avoid :) But I'm personally not
against making this an LSM. As you say:

> We have a security system, with a set of interfaces for attaching
> security models, please stop trying to go round the back of the kernel
> design because you can't be bothered to do the required work to do the
> job right and would rather add more unmaintainable crap all over the
> place.
>
> Yes it might mean the hooks need tweaking, yes it probably means the

Yes, and in particular, we'll need to do something about data
->security annotations, since, if we make this an LSM, then we can't
use a per-thread flag.

This feature is used during exec and ptrace, not on hot-paths, so
dereferencing task->security would be fine. But finding a way to
multiplex task->security so it can be used by Eric's nosuid lsm,
Michael's disablenetwork LSM, and SELinux/smack/apparmor, that
will likely take months, and, history shows, may never happen.

> people who want these need to do some trivial stacking work, but if as
> many people are actually really interested as are having random 'lets add
> a button to disable reading serial ports on wednesday' ideas there should
> be no shortage of people to do the job right.

Eric, the thing is, once an API goes upstream, we can't change it,
but in contrast we can change how task->security is used at any time.
So I'd suggest just adding

#ifdef CONFIG_SECURITY_NOSUID
short nosuid;
#endif

or something like it next to the

#ifdef CONFIG_SECURITY
void *security;
#endif

in struct cred and doing that for a first go. You could
share that field with Michael's disablenetwork, or not if you
prefer - either way, it keeps you and SELinux out of each other's
ways.

-serge

David Wagner

Dec 31, 2009, 1:00:02 PM

Alan Cox wrote:
>Removing specific features from a specific piece of code
>generally isn't a security feature -

You lost me there. The ability of a specific piece of code to voluntarily
relinquish privileges can be a big benefit to security. It enables
privilege-separated software architectures, which are a powerful way to
reduce risk. That's the motivation for the disablenetwork proposal that
has stimulated all this discussion. I hope this is obvious? Does it
need elaboration?