[PATCH] [request for inclusion] Realtime LSM

Lee Revell

Dec 29, 2004, 9:46:05 PM

to linux-kernel, Andrew Morton, Ingo Molnar, Jack O'Quin

The realtime LSM has been previously explained on this list. Its
function is to allow selected nonroot users to run RT tasks. The most
common application is low latency audio with JACK, http://jackit.sf.net.
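
For readers unfamiliar with what "running RT tasks" means from user space,
here is a minimal sketch. It is not taken from the patch or from JACK, and
the helper names clamp_fifo_priority() and try_realtime() are made up for
illustration. It asks the kernel for SCHED_FIFO and reports the result;
without a privilege such as the one this LSM grants, the request is expected
to fail with EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	assert(lo <= hi);
	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/*
 * Try to switch the calling process to SCHED_FIFO, then switch back.
 * Returns 0 if the kernel allowed it, otherwise the errno value
 * (EPERM when the caller lacks the privilege).
 */
int try_realtime(int prio)
{
	struct sched_param rt, normal;

	memset(&rt, 0, sizeof(rt));
	memset(&normal, 0, sizeof(normal));
	rt.sched_priority = clamp_fifo_priority(prio);

	if (sched_setscheduler(0, SCHED_FIFO, &rt) == -1)
		return errno;

	/* Drop back to the default policy; its priority must be 0. */
	sched_setscheduler(0, SCHED_OTHER, &normal);
	return 0;
}
```

A caller would check try_realtime(10) == 0 before starting its audio thread
and fall back to normal scheduling otherwise.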

Several people have reported that 2.6.10 is the best kernel yet for
audio latency, see
http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html.
If the realtime LSM were merged, then this would be the last step to
making low latency audio work well with the stock kernel.

We (the authors and the Linux audio community) would like to request its
inclusion in the next -mm release, with the eventual goal of having it
in mainline.

This is identical to the last version Jack O'Quin posted (but didn't cc:
Andrew, or make clear that we would like this added to -mm), so I
preserved his Signed-Off-By.

http://lkml.org/lkml/2004/11/24/242

Signed-Off-By: Jack O'Quin <j...@joq.us>

diff -ruN -X /home/joq/bin/kdiff.exclude linux-2.6.10-rc2-mm3/Documentation/realtime-lsm.txt linux-2.6.10-rc2-mm3-rt2/Documentation/realtime-lsm.txt
--- linux-2.6.10-rc2-mm3/Documentation/realtime-lsm.txt Wed Dec 31 18:00:00 1969
+++ linux-2.6.10-rc2-mm3-rt2/Documentation/realtime-lsm.txt Wed Nov 24 09:58:29 2004
@@ -0,0 +1,39 @@
+
+ Realtime Linux Security Module
+
+
+This Linux Security Module (LSM) enables realtime capabilities. It
+was written by Torben Hohn and Jack O'Quin, under the provisions of
+the GPL (see the COPYING file). We make no warranty concerning the
+safety, security or even stability of your system when using it. But,
+we will fix problems if you report them.
+
+Once the LSM has been installed and the kernel for which it was built
+is running, the root user can load it and pass parameters as follows:
+
+ # modprobe realtime any=1
+
+ Any program can request realtime privileges. This allows any local
+ user to crash the system by hogging the CPU in a tight loop or
+ locking down too much memory. But, it is simple to administer. :-)
+
+ # modprobe realtime gid=29
+
+ All users belonging to group 29 and programs that are setgid to that
+ group have realtime privileges. Use any group number you like. A
+ `gid' of -1 disables group access.
+
+ # modprobe realtime mlock=0
+
+ Grants realtime scheduling privileges without the ability to lock
+ memory using mlock() or mlockall() system calls. This option can be
+ used in conjunction with any of the other options.
+
+After the module is loaded, its parameters can be changed dynamically
+via sysfs.
+
+ # echo 1 > /sys/module/realtime/parameters/any
+ # echo 29 > /sys/module/realtime/parameters/gid
+ # echo 1 > /sys/module/realtime/parameters/mlock
+
+Jack O'Quin, j...@joq.us
diff -ruN -X /home/joq/bin/kdiff.exclude linux-2.6.10-rc2-mm3/security/Kconfig linux-2.6.10-rc2-mm3-rt2/security/Kconfig
--- linux-2.6.10-rc2-mm3/security/Kconfig Wed Nov 24 09:35:44 2004
+++ linux-2.6.10-rc2-mm3-rt2/security/Kconfig Wed Nov 24 09:58:29 2004
@@ -84,6 +84,17 @@

If you are unsure how to answer this question, answer N.

+config SECURITY_REALTIME
+	tristate "Realtime Capabilities"
+	depends on SECURITY && SECURITY_CAPABILITIES!=y
+	default n
+	help
+	  This module selectively grants realtime privileges
+	  controlled by parameters set at load time or via files in
+	  /sys/module/realtime/parameters.
+
+	  If you are unsure how to answer this question, answer N.
+
source security/selinux/Kconfig

endmenu
diff -ruN -X /home/joq/bin/kdiff.exclude linux-2.6.10-rc2-mm3/security/Makefile linux-2.6.10-rc2-mm3-rt2/security/Makefile
--- linux-2.6.10-rc2-mm3/security/Makefile Wed Nov 24 09:35:44 2004
+++ linux-2.6.10-rc2-mm3-rt2/security/Makefile Wed Nov 24 09:58:29 2004
@@ -17,3 +17,4 @@
obj-$(CONFIG_SECURITY_CAPABILITIES) += commoncap.o capability.o
obj-$(CONFIG_SECURITY_ROOTPLUG) += commoncap.o root_plug.o
obj-$(CONFIG_SECURITY_SECLVL) += seclvl.o
+obj-$(CONFIG_SECURITY_REALTIME) += commoncap.o realtime.o
diff -ruN -X /home/joq/bin/kdiff.exclude linux-2.6.10-rc2-mm3/security/realtime.c linux-2.6.10-rc2-mm3-rt2/security/realtime.c
--- linux-2.6.10-rc2-mm3/security/realtime.c Wed Dec 31 18:00:00 1969
+++ linux-2.6.10-rc2-mm3-rt2/security/realtime.c Wed Nov 24 09:59:01 2004
@@ -0,0 +1,147 @@
+/*
+ * Realtime Capabilities Linux Security Module
+ *
+ * Copyright (C) 2003 Torben Hohn
+ * Copyright (C) 2003, 2004 Jack O'Quin
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/security.h>
+
+#define RT_LSM "Realtime LSM " /* syslog module name prefix */
+#define RT_ERR "Realtime: " /* syslog error message prefix */
+
+#include <linux/vermagic.h>
+MODULE_INFO(vermagic,VERMAGIC_STRING);
+
+/* module parameters
+ *
+ * These values could change at any time due to some process writing
+ * a new value in /sys/module/realtime/parameters. This is OK,
+ * because each is referenced only once in each function call.
+ * Nothing depends on parameters having the same value every time.
+ */
+
+/* if TRUE, any process is realtime */
+static int rt_any;
+module_param_named(any, rt_any, int, 0644);
+MODULE_PARM_DESC(any, " grant realtime privileges to any process.");
+
+/* realtime group id, or -1 for no group */
+static int rt_gid = -1;
+module_param_named(gid, rt_gid, int, 0644);
+MODULE_PARM_DESC(gid, " the group ID with access to realtime privileges.");
+
+/* enable mlock() privileges */
+static int rt_mlock = 1;
+module_param_named(mlock, rt_mlock, int, 0644);
+MODULE_PARM_DESC(mlock, " enable memory locking privileges.");
+
+/* helper function for testing group membership */
+static inline int gid_ok(int gid)
+{
+	if (gid == -1)
+		return 0;
+
+	if (gid == current->gid)
+		return 1;
+
+	return in_egroup_p(gid);
+}
+
+static void realtime_bprm_apply_creds(struct linux_binprm *bprm, int unsafe)
+{
+	cap_bprm_apply_creds(bprm, unsafe);
+
+	/* If a non-zero `any' parameter was specified, we grant
+	 * realtime privileges to every process. If the `gid'
+	 * parameter was specified and it matches the group id of the
+	 * executable, of the current process or any supplementary
+	 * groups, we grant realtime capabilities.
+	 */
+
+	if (rt_any || gid_ok(rt_gid)) {
+		cap_raise(current->cap_effective, CAP_SYS_NICE);
+		if (rt_mlock) {
+			cap_raise(current->cap_effective, CAP_IPC_LOCK);
+			cap_raise(current->cap_effective, CAP_SYS_RESOURCE);
+		}
+	}
+}
+
+static struct security_operations capability_ops = {
+	.ptrace = cap_ptrace,
+	.capget = cap_capget,
+	.capset_check = cap_capset_check,
+	.capset_set = cap_capset_set,
+	.capable = cap_capable,
+	.netlink_send = cap_netlink_send,
+	.netlink_recv = cap_netlink_recv,
+	.bprm_apply_creds = realtime_bprm_apply_creds,
+	.bprm_set_security = cap_bprm_set_security,
+	.bprm_secureexec = cap_bprm_secureexec,
+	.task_post_setuid = cap_task_post_setuid,
+	.task_reparent_to_init = cap_task_reparent_to_init,
+	.syslog = cap_syslog,
+	.vm_enough_memory = cap_vm_enough_memory,
+};
+
+#define MY_NAME __stringify(KBUILD_MODNAME)
+
+static int secondary; /* flag to keep track of how we were registered */
+
+static int __init realtime_init(void)
+{
+	/* register ourselves with the security framework */
+	if (register_security(&capability_ops)) {
+
+		/* try registering with primary module */
+		if (mod_reg_security(MY_NAME, &capability_ops)) {
+			printk(KERN_INFO RT_ERR "Failure registering "
+			       "capabilities with primary security module.\n");
+			printk(KERN_INFO RT_ERR "Is kernel configured "
+			       "with CONFIG_SECURITY_CAPABILITIES=m?\n");
+			return -EINVAL;
+		}
+		secondary = 1;
+	}
+
+	if (rt_any)
+		printk(KERN_INFO RT_LSM
+		       "initialized (all groups, mlock=%d)\n", rt_mlock);
+	else if (rt_gid == -1)
+		printk(KERN_INFO RT_LSM
+		       "initialized (no groups, mlock=%d)\n", rt_mlock);
+	else
+		printk(KERN_INFO RT_LSM
+		       "initialized (group %d, mlock=%d)\n", rt_gid, rt_mlock);
+
+	return 0;
+}
+
+static void __exit realtime_exit(void)
+{
+	/* remove ourselves from the security framework */
+	if (secondary) {
+		if (mod_unreg_security(MY_NAME, &capability_ops))
+			printk(KERN_INFO RT_ERR "Failure unregistering "
+			       "capabilities with primary module.\n");
+
+	} else if (unregister_security(&capability_ops)) {
+		printk(KERN_INFO RT_ERR
+		       "Failure unregistering capabilities with the kernel\n");
+	}
+	printk(KERN_INFO "Realtime Capability LSM exiting\n");
+}
+
+late_initcall(realtime_init);
+module_exit(realtime_exit);
+
+MODULE_DESCRIPTION("Realtime Capabilities Security Module");
+MODULE_LICENSE("GPL");

--
joq
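
The grant decision made in realtime_bprm_apply_creds() in the patch above can
be modelled in user space. The following is only an illustrative sketch, not
kernel code: struct rt_params and granted_caps() are hypothetical names, the
group test is passed in as a flag, and the capability numbers are copied from
<linux/capability.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum {
	MODEL_CAP_IPC_LOCK = 14,
	MODEL_CAP_SYS_NICE = 23,
	MODEL_CAP_SYS_RESOURCE = 24
};

/* User-space stand-in for the three module parameters. */
struct rt_params {
	int any;	/* mirrors `any': non-zero grants to everyone */
	int gid;	/* mirrors `gid': -1 disables group access */
	int mlock;	/* mirrors `mlock': also grant memory locking */
};

/* Return the capability bits the hook would raise for this caller. */
unsigned long granted_caps(const struct rt_params *p, int caller_in_group)
{
	unsigned long caps = 0;

	assert(p != NULL);
	if (p->any || (p->gid != -1 && caller_in_group)) {
		caps |= 1UL << MODEL_CAP_SYS_NICE;
		if (p->mlock)
			caps |= (1UL << MODEL_CAP_IPC_LOCK) |
				(1UL << MODEL_CAP_SYS_RESOURCE);
	}
	return caps;
}
```

With any=1 and mlock=0 the model yields only the CAP_SYS_NICE bit; with
gid=29 and mlock=1 a group member gets all three bits and everyone else gets
none.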

Christoph Hellwig

Jan 3, 2005, 9:06:59 AM

to Lee Revell, linux-kernel, Andrew Morton, Ingo Molnar, Jack O'Quin

On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> The realtime LSM has been previously explained on this list. Its
> function is to allow selected nonroot users to run RT tasks. The most
> common application is low latency audio with JACK, http://jackit.sf.net.
>
> Several people have reported that 2.6.10 is the best kernel yet for
> audio latency, see
> http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html.
> If the realtime LSM were merged, then this would be the last step to
> making low latency audio work well with the stock kernel.
>
> We (the authors and the Linux audio community) would like to request its
> inclusion in the next -mm release, with the eventual goal of having it
> in mainline.
>
> This is identical to the last version Jack O'Quin posted (but didn't cc:
> Andrew, or make clear that we would like this added to -mm), so I
> preserved his Signed-Off-By.

This is far too specialized. An option to the capability LSM to grant
capabilities to certain uids/gids sounds like the better choice - and
would also allow us to get rid of the magic hugetlb uid horrors.

Arjan van de Ven

Jan 3, 2005, 9:17:09 AM

to Christoph Hellwig, Lee Revell, linux-kernel, Andrew Morton, Ingo Molnar, Jack O'Quin

On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:
> On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > The realtime LSM has been previously explained on this list. Its
> > function is to allow selected nonroot users to run RT tasks. The most
> > common application is low latency audio with JACK, http://jackit.sf.net.
> >
> > Several people have reported that 2.6.10 is the best kernel yet for
> > audio latency, see
> > http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html.
> > If the realtime LSM were merged, then this would be the last step to
> > making low latency audio work well with the stock kernel.
> >
> > We (the authors and the Linux audio community) would like to request its
> > inclusion in the next -mm release, with the eventual goal of having it
> > in mainline.
> >
> > This is identical to the last version Jack O'Quin posted (but didn't cc:
> > Andrew, or make clear that we would like this added to -mm), so I
> > preserved his Signed-Off-By.
>
> This is far too specialized. An option to the capability LSM to grant
> capabilities to certain uids/gids sounds like the better choice - and
> would also allow us to get rid of the magic hugetlb uid horrors.

Those can go away anyway now that there is an rlimit to achieve the
exact same thing.

I can see the point of making an rlimit-like thing instead, for both the
nice levels allowed and maybe the "can do rt" bit.
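
As a hedged illustration of the rlimit approach: the sketch below queries
per-process limits with getrlimit(). Note the assumption: RLIMIT_NICE and
RLIMIT_RTPRIO did not exist when this thread was written; they were added to
Linux later (2.6.12), so this shows where the idea ended up rather than
anything available in 2.6.10. The helper names show_limit() and
show_rt_limits() are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print one resource limit; returns 0 on success, -1 on failure. */
int show_limit(const char *name, int resource)
{
	struct rlimit rl;

	assert(name != NULL);
	if (getrlimit(resource, &rl) == -1)
		return -1;

	printf("%-16s soft=%llu hard=%llu\n", name,
	       (unsigned long long)rl.rlim_cur,
	       (unsigned long long)rl.rlim_max);
	return 0;
}

/* Report the limits relevant to realtime audio work. */
int show_rt_limits(void)
{
	int rc = 0;

	rc |= show_limit("RLIMIT_MEMLOCK", RLIMIT_MEMLOCK);
	rc |= show_limit("RLIMIT_NICE", RLIMIT_NICE);
	rc |= show_limit("RLIMIT_RTPRIO", RLIMIT_RTPRIO);
	return rc;
}
```

A soft RLIMIT_RTPRIO of 0 means the process may not request a realtime
priority unless it holds CAP_SYS_NICE.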

Lee Revell

Jan 4, 2005, 1:19:44 PM

to Christoph Hellwig, linux-kernel, Andrew Morton, Ingo Molnar, Jack O'Quin

On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:

> On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > The realtime LSM has been previously explained on this list. Its
> > function is to allow selected nonroot users to run RT tasks. The most
> > common application is low latency audio with JACK, http://jackit.sf.net.
> >
> > Several people have reported that 2.6.10 is the best kernel yet for
> > audio latency, see
> > http://ccrma-mail.stanford.edu/pipermail/planetccrma/2004-December/007341.html.
> > If the realtime LSM were merged, then this would be the last step to
> > making low latency audio work well with the stock kernel.
> >
> > We (the authors and the Linux audio community) would like to request its
> > inclusion in the next -mm release, with the eventual goal of having it
> > in mainline.
> >
> > This is identical to the last version Jack O'Quin posted (but didn't cc:
> > Andrew, or make clear that we would like this added to -mm), so I
> > preserved his Signed-Off-By.
>
> This is far too specialized. An option to the capability LSM to grant
> capabilities to certain uids/gids sounds like the better choice - and
> would also allow us to get rid of the magic hugetlb uid horrors.
>

Got a patch? Code talks, BS walks. This is working perfectly, right
now, and is being used by thousands of Linux audio users.

Lee

Christoph Hellwig

Jan 4, 2005, 1:23:50 PM

to Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> Got a patch? Code talks, BS walks. This is working perfectly, right
> now, and is being used by thousands of Linux audio users.

Which still doesn't mean it's the right design. And no, I don't need the
feature so I won't write it. If you want a certain feature it's up to
you to implement it in a way that's considered mergeable.

Jack O'Quin

Jan 4, 2005, 1:57:31 PM

to Christoph Hellwig, Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

Christoph Hellwig <h...@infradead.org> writes:

> Which still doesn't mean it's the right design. And no, I don't
> need the feature so I won't write it. If you want a certain feature
> it's up to you to implement it in a way that's considered mergeable.

Which is what I have done. I worked on it because no "real" kernel
developer seemed willing to solve it. Having worked on other kernels
in an "earlier lifetime", I have *no* desire to do that any more. I
would much rather write audio software.

But, the lack of this feature has been a continual impediment for
years now. It affects not just me, but most other serious Linux audio
developers and many of our users. We need a simple way for users to
configure a Digital Audio Workstation without having to run large,
complex, insecure audio applications as `root'. Our competition runs
on Windows and Mac systems where no such configuration is needed.

Statements of the form "had I cared enough to do something about this
problem, I would have implemented it differently" are not much help.
This patch is small and clean. It meshes with existing kernel LSM
mechanisms. It solves a real problem affecting many Linux desktop
users.

I respectfully request that it be accepted for inclusion in 2.6.11.
--
joq

Lee Revell

Jan 4, 2005, 2:00:05 PM

to Christoph Hellwig, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
> On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> > Got a patch? Code talks, BS walks. This is working perfectly, right
> > now, and is being used by thousands of Linux audio users.
>
> Which still doesn't mean it's the right design. And no, I don't need the
> feature so I won't write it. If you want a certain feature it's up to
> you to implement it in a way that's considered mergeable.
>

Please specify what's wrong with it. So far all your objection amounts
to is "I don't like it".

If you do have anything other than your opinion to back up your
assertion that it's a bad design, you should have raised it months ago
when this was first posted. Now that we have it to a mergeable state
(as far as the people who worked on it are concerned), you want to pop
up and say "Nope, bad design"?

Sorry but last time I checked you were not the ultimate arbiter of good
design on LKML. If you want to shitcan the _only known good, field
tested, working solution_ then you have to have overwhelming technical
arguments. So far I've seen zero.

Lee

Lee Revell

Jan 4, 2005, 2:04:29 PM

to Jack O'Quin, Christoph Hellwig, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

On Tue, 2005-01-04 at 12:55 -0600, Jack O'Quin wrote:
> But, the lack of this feature has been a continual impediment for
> years now. It affects not just me, but most other serious Linux audio
> developers and many of our users. We need a simple way for users to
> configure a Digital Audio Workstation without having to run large,
> complex, insecure audio applications as `root'. Our competition runs
> on Windows and Mac systems where no such configuration is needed.

We could do it the way OSX (our real competition) does if that would
make people happy. They just let any user run RT tasks. Oh wait, but
that's a "broken design", everyone knows that OSX is a joke, no one
would use *that* OS to mix a CD or score a movie. :-)

Lee

Alan Cox

Jan 4, 2005, 8:13:34 PM

to Lee Revell, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

On Maw, 2005-01-04 at 18:59, Lee Revell wrote:
> We could do it the way OSX (our real competition) does if that would
> make people happy. They just let any user run RT tasks. Oh wait, but
> that's a "broken design", everyone knows that OSX is a joke, no one
> would use *that* OS to mix a CD or score a movie. :-)

You can do that already, just make everyone root

The problem with uid/gid based hacks is that they get really ugly to
administer really fast. Especially once you have users who need realtime
and hugetlb, and users who need one only.

It would be far cleaner to split CAP_SYS_NICE capability down - which
should cover the real time OS functions nicely. Right now it gives a few
too many rights but that could be fixed easily.
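
To see which capabilities a process actually holds, one option is to read the
CapEff mask from /proc/<pid>/status. The sketch below is an illustration
under assumptions, not part of any proposal in this thread: the helper names
are made up, the capability numbers are copied from <linux/capability.h>, and
the parser works on status text passed in as a string so it can be tried
without special privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { EX_CAP_IPC_LOCK = 14, EX_CAP_SYS_NICE = 23 };

/*
 * Find the "CapEff:" line in text formatted like /proc/<pid>/status
 * and parse its hexadecimal mask. Returns 0 on success, -1 if absent.
 */
int parse_capeff(const char *status_text, unsigned long long *mask)
{
	const char *line;

	assert(status_text != NULL && mask != NULL);
	line = strstr(status_text, "CapEff:");
	if (line == NULL)
		return -1;
	return sscanf(line, "CapEff: %llx", mask) == 1 ? 0 : -1;
}

/* Non-zero if capability number `cap' is set in `mask'. */
int has_cap(unsigned long long mask, int cap)
{
	return (int)((mask >> cap) & 1ULL);
}

/* 1 if the status text shows `cap' as effective, 0 if not, -1 on error. */
int status_has_cap(const char *status_text, int cap)
{
	unsigned long long mask = 0;

	if (parse_capeff(status_text, &mask) == -1)
		return -1;
	return has_cap(mask, cap);
}
```

Feeding it the contents of /proc/self/status from a process started under the
realtime LSM should report CAP_SYS_NICE as effective.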

Lee Revell

Jan 4, 2005, 8:33:37 PM

to Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, Chris Wright

On Wed, 2005-01-05 at 00:01 +0000, Alan Cox wrote:
> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.
>

Sorry, how does hugetlb relate to this?

> It would be far cleaner to split CAP_SYS_NICE capability down - which
> should cover the real time OS functions nicely. Right now it gives a few
> too many rights but that could be fixed easily.
>

We need selected nonroot users to be able to run SCHED_FIFO tasks and
mlock(). It has to be easy to administer. That's it.
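
The memory-locking half of that requirement looks roughly like this from
user space. This is a sketch only; try_lock_buffer() and
try_lock_everything() are hypothetical helpers, not code from JACK. Each
returns 0 when the kernel permits the lock and the errno value otherwise, so
an unprivileged caller can see why it was refused.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Lock a buffer into RAM, then unlock it again.
 * Returns 0 if locking was allowed, otherwise the errno value.
 */
int try_lock_buffer(void *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (mlock(buf, len) == -1)
		return errno;
	munlock(buf, len);
	return 0;
}

/*
 * Lock all current and future pages of the process, as a realtime
 * audio client typically would, then undo it. Returns 0 or errno.
 */
int try_lock_everything(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
		return errno;
	munlockall();
	return 0;
}
```

A real client would keep the lock for its whole lifetime instead of releasing
it straight away.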

As Jack mentioned, the developers of this patch are not kernel hackers
by trade, they wrote this to solve a real problem. In other words, a
patch is worth a thousand words.

It seems distro vendors would be interested in solving this problem.
The Linux audio market is smaller than the general desktop, of course, but
many of the users are professionals who would gladly pay for support.
Look how many people pay for OSX. Wouldn't Red Hat and SuSE like some
of those customers?

Lee

Lee Revell

Jan 4, 2005, 8:38:30 PM

to Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

On Wed, 2005-01-05 at 00:01 +0000, Alan Cox wrote:

> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

Why? Just make a realtime group and a hugetlb group and add users to
one, the other, or both.

Lee

Andreas Steinmetz

Jan 4, 2005, 8:47:13 PM

to Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

Lee Revell wrote:
> On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
>
>>On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
>>
>>>Got a patch? Code talks, BS walks. This is working perfectly, right
>>>now, and is being used by thousands of Linux audio users.
>>
>>Which still doesn't mean it's the right design. And no, I don't need the
>>feature so I won't write it. If you want a certain feature it's up to
>>you to implement it in a way that's considered mergeable.
>>
>
>
> Please specify what's wrong with it. So far all your objection amounts
> to is "I don't like it".
>
> If you do have anything other than your opinion to back up your
> assertion that it's a bad design, you should have raised it months ago
> when this was first posted. Now that we have it to a mergeable state
> (as far as the people who worked on it are concerned), you want to pop
> up and say "Nope, bad design"?

Let me remind you all that according to lkml history hch has always been
biased and objecting to anything related to lsm. Nobody can take hch's
opinion here as objective. I would even go so far that when things are
related to lsm(s) he's just tro...
--
Andreas Steinmetz SPAMmers use robo...@domdv.de

Chris Wright

Jan 4, 2005, 8:55:08 PM

to Alan Cox, Lee Revell, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

* Alan Cox (al...@lxorguk.ukuu.org.uk) wrote:
> On Maw, 2005-01-04 at 18:59, Lee Revell wrote:
> > We could do it the way OSX (our real competition) does if that would
> > make people happy. They just let any user run RT tasks. Oh wait, but
> > that's a "broken design", everyone knows that OSX is a joke, no one
> > would use *that* OS to mix a CD or score a movie. :-)
>
> You can do that already, just make everyone root
>
> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

I don't believe the hugetlb gid stuff is useful anymore. It should be
handled nicely via rlimits.

> It would be far cleaner to split CAP_SYS_NICE capability down - which
> should cover the real time OS functions nicely. Right now it gives a few
> too many rights but that could be fixed easily.

Hmm, how do we do this w/out breaking things? Maybe I'm misunderstanding
your idea.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Lee Revell

Jan 4, 2005, 8:58:40 PM

to Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

On Tue, 2005-01-04 at 17:50 -0800, Chris Wright wrote:
> * Alan Cox (al...@lxorguk.ukuu.org.uk) wrote:
> >
> > The problem with uid/gid based hacks is that they get really ugly to
> > administer really fast. Especially once you have users who need realtime
> > and hugetlb, and users who need one only.
>
> I don't believe the hugetlb gid stuff is useful anymore. It should be
> handled nicely via rlimits.

The last time I checked users could belong to more than one group. Am I
missing something?

Lee

Chris Wright

Jan 4, 2005, 9:08:46 PM

to Lee Revell, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

* Lee Revell (rlre...@joe-job.com) wrote:
> The last time I checked users could belong to more than one group. Am I
> missing something?

No, you're not. I think Alan's just saying the gid based checks
are suboptimal if there's a cleaner way to do it (to which I agree).
Personally, I don't have a big problem with the Realtime LSM. I've helped
you with it, and suggested a few times that I'd prefer it to be generic;
but never stepped up to deliver code of that sort. Since it's your itch,
you've scratched it, and it's quite simple and contained, I consider
it acceptable.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Kyle Moffett

Jan 4, 2005, 10:06:33 PM

to Chris Wright, Ingo Molnar, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Christoph Hellwig, Lee Revell, Andrew Morton

On Jan 04, 2005, at 21:05, Chris Wright wrote:
> No, you're not. I think Alan's just saying the gid based checks
> are suboptimal if there's a cleaner way to do it (to which I agree).
> Personally, I don't have a big problem with the Realtime LSM. I've
> helped
> you with it, and suggested a few times that I'd prefer it to be
> generic;
> but never stepped up to deliver code of that sort. Since it's your
> itch,
> you've scratched it, and it's quite simple and contained, I consider
> it acceptable.

Here's a relatively simple idea: Why not make the "Realtime LSM"
just check for a certain "Realtime" credential in the new credential
store (Patch is in 2.6.10, see [1] for control program). You would
mark it as a system credential and give access to that credential via
the appropriate capability with a small utility program.

Of course, I _do_ respect that I am not providing a patch which they
have done. I think this serves a useful place and should probably be
included as-is, for now. A later update to make it use a better
mechanism would be nice, though. :-)

[1] http://people.redhat.com/~dhowells/keys/keyctl.c

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------

Chris Wright

Jan 4, 2005, 10:46:55 PM

to Kyle Moffett, Chris Wright, Ingo Molnar, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Christoph Hellwig, Lee Revell, Andrew Morton

* Kyle Moffett (mrmac...@mac.com) wrote:
> Here's a relatively simple idea: Why not make the "Realtime LSM"
> just check for a certain "Realtime" credential in the new credential
> store (Patch is in 2.6.10, see [1] for control program). You would
> mark it as a system credential and give access to that credential via
> the appropriate capability with a small utility program.

Well, that's basically what the gid is in this case. It's the credential
that's set at login time and has all the proper sharing and inheritance
rules. So, I'm not yet convinced that this would buy us much.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Jack O'Quin

Jan 4, 2005, 11:05:03 PM

to Alan Cox, Lee Revell, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

Alan Cox <al...@lxorguk.ukuu.org.uk> writes:

> On Maw, 2005-01-04 at 18:59, Lee Revell wrote:
>> We could do it the way OSX (our real competition) does if that would
>> make people happy. They just let any user run RT tasks. Oh wait, but
>> that's a "broken design", everyone knows that OSX is a joke, no one
>> would use *that* OS to mix a CD or score a movie. :-)
>
> You can do that already, just make everyone root

Surely you're joking. Is this actually a serious proposal?

> The problem with uid/gid based hacks is that they get really ugly to
> administer really fast. Especially once you have users who need realtime
> and hugetlb, and users who need one only.

This is why POSIX requires supplementary groups.
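
For illustration, a user-space counterpart of the patch's gid_ok() test can be
written with getgroups(). This is a sketch under assumptions: gid_in_list()
and in_group() are made-up names, and the kernel helper itself checks
current->gid and in_egroup_p() rather than calling anything like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if `gid' appears in list[0..n-1], else 0. */
int gid_in_list(const gid_t *list, int n, gid_t gid)
{
	int i;

	assert(list != NULL || n <= 0);
	for (i = 0; i < n; i++)
		if (list[i] == gid)
			return 1;
	return 0;
}

/*
 * Return 1 if the calling process has `gid' as its effective group
 * or among its supplementary groups, 0 if not, -1 on error.
 */
int in_group(gid_t gid)
{
	gid_t *groups;
	int n, found;

	if (gid == (gid_t)-1)
		return 0;	/* -1 means "no group", as in the patch */
	if (gid == getegid())
		return 1;

	n = getgroups(0, NULL);	/* ask how many supplementary groups */
	if (n < 0)
		return -1;
	if (n == 0)
		return 0;

	groups = malloc((size_t)n * sizeof(*groups));
	if (groups == NULL)
		return -1;
	n = getgroups(n, groups);
	found = (n < 0) ? -1 : gid_in_list(groups, n, gid);
	free(groups);
	return found;
}
```

Group changes made with adduser only show up here after the user logs in
again, because supplementary groups are set at login.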

All I had to do on my system was...

# adduser joq audio

That is considerably easier than hacking rlimits values via PAM.
--
joq

Jack O'Quin

Jan 4, 2005, 11:07:19 PM

to Chris Wright, Lee Revell, Alan Cox, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar

Chris Wright <chr...@osdl.org> writes:

> * Lee Revell (rlre...@joe-job.com) wrote:
>> The last time I checked users could belong to more than one group. Am I
>> missing something?
>
> No, you're not. I think Alan's just saying the gid based checks
> are suboptimal if there's a cleaner way to do it (to which I agree).
> Personally, I don't have a big problem with the Realtime LSM. I've helped
> you with it, and suggested a few times that I'd prefer it to be generic;
> but never stepped up to deliver code of that sort. Since it's your itch,
> you've scratched it, and it's quite simple and contained, I consider
> it acceptable.

We appreciate the help, Chris. The patch is considerably smaller and
cleaner thanks to your efforts.
--
joq

Alan Cox

Jan 5, 2005, 12:24:18 AM

to Andreas Steinmetz, Lee Revell, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, Jack O'Quin

On Mer, 2005-01-05 at 01:35, Andreas Steinmetz wrote:
> Let me remind you all that according to lkml history hch has always been
> biased and objecting to anything related to lsm. Nobody can take hch's
> opinion here as objective. I would even go so far that when things are
> related to lsm(s) he's just tro...

Oh I don't think so. Everyone thinks Christoph has it in for their
project (me included quite often). He's just blessed with a lot of taste
and determination to enforce it, and cursed (or perhaps blessed) with
the ability to explain bluntly and clearly his opinion.

gid hacks are not a good long term plan.

Can we use capabilities? If not, why not, and how do we fix it so we can
do the job right? Do we need some more capability bits that are
implicitly inherited and not touched by setuidness?

Andrew Morton

Jan 5, 2005, 12:53:35 AM

to Alan Cox, a...@domdv.de, rlre...@joe-job.com, linux-...@vger.kernel.org, mi...@elte.hu, j...@io.com

Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
>
> Can we use capabilities

capabilities don't work :(

http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html

Christoph Hellwig

Jan 5, 2005, 6:23:21 AM

to Jack O'Quin, Christoph Hellwig, Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

On Tue, Jan 04, 2005 at 12:55:15PM -0600, Jack O'Quin wrote:
> Statements of the form "had I cared enough to do something about this
> problem, I would have implemented it differently" are not much help.
> This patch is small and clean. It meshes with existing kernel LSM
> mechanisms. It solves a real problem affecting many Linux desktop
> users.

It solves problems - most kernel patches do that. But it solves this
problem in a way that doesn't fit very well into the grand design.

Christoph Hellwig

Jan 5, 2005, 6:26:28 AM

to Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

On Tue, Jan 04, 2005 at 01:57:13PM -0500, Lee Revell wrote:
> On Tue, 2005-01-04 at 18:20 +0000, Christoph Hellwig wrote:
> > On Tue, Jan 04, 2005 at 01:16:54PM -0500, Lee Revell wrote:
> > > Got a patch? Code talks, BS walks. This is working perfectly, right
> > > now, and is being used by thousands of Linux audio users.
> >
> > Which still doesn't mean it's the right design. And no, I don't need the
> > feature so I won't write it. If you want a certain feature it's up to
> > you to implement it in a way that's considered mergeable.
> >
>
> Please specify what's wrong with it. So far all your objection amounts
> to is "I don't like it".

It's tying privileges to uids/gids, and it does so in an overcomplicated
way and just for an extremely tiny, specialized subset of available
privileges.

In short it's a very specialized hack.

> If you do have anything other than your opinion to back up your
> assertion that it's a bad design, you should have raised it months ago
> when this was first posted. Now that we have it to a mergeable state
> (as far as the people who worked on it are concerned), you want to pop
> up and say "Nope, bad design"?

I'm very sorry but I don't have the time to comment on every single patch
posted somewhere. All the review and core kernel work I do on lkml is in my
unpaid spare time. If you want me to review specific things by a deadline
or want me to implement features in a way that fits the kernel grand plan
(which doesn't mean it will actually be accepted by other kernel
developers), you're free to contract me.

Christoph Hellwig

Jan 5, 2005, 6:32:45 AM

to Lee Revell, Jack O'Quin, Christoph Hellwig, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

On Tue, Jan 04, 2005 at 01:59:57PM -0500, Lee Revell wrote:
> We could do it the way OSX (our real competition) does if that would
> make people happy. They just let any user run RT tasks. Oh wait, but
> that's a "broken design", everyone knows that OSX is a joke, no one
> would use *that* OS to mix a CD or score a movie. :-)

No one sane (well, no one sane with a background in Operating Systems)
would use OS X at all.

Christoph Hellwig

Jan 5, 2005, 6:46:42 AM

to Andreas Steinmetz, Lee Revell, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

> Let me remind you all that according to lkml history hch has always been
> biased and objecting to anything related to lsm. Nobody can take hch's
> opinion here as objective. I would even go so far that when things are
> related to lsm(s) he's just tro...

I'm not a big fan of LSM, and I've explained the rationale why multiple
times. That doesn't mean everything done using LSM is bad - in practice
most things are bad though (from the things I've seen everything but lsm)

btw, any reason you drop me from the Cc list once you start the personal
attacks?

Ingo Molnar

Jan 5, 2005, 6:53:45 AM

to Lee Revell, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

the RT-LSM thing is a bit dangerous because it doesnt really protect
against a runaway, buggy app. So i think the right way to approach this
problem is to not apply RT-LSM for the time being, but to provide an
'advanced latency needs' scheduling class that is _still_ safe even if
the task is runaway, but behaves with near-RT priorities if the task is
'nice' (i.e. doesnt use up large amount of CPU time.)

incidentally, there is such a scheduling class already: negative nice
levels. Please skip any preconceptions you might have about nice levels,
nice levels have been improved in 2.6.10, the timeslices are now given
out exponentially, giving nice -20 tasks far more weight and priority
than they used to have. (They are obviously still preemptable if they
keep looping burning CPU - but that we can consider a feature.) (Also,
in 2.6 the negative nice levels have a much more aggressive interactivity
setting, allowing them to preempt everything lower-prio.)

so, could you try vanilla 2.6.10 (without LSM and without jackd running
with RT priorities), with jackd set to nice -20? Make sure the
jack-client process gets this priority too. Best to achieve this is to
renice a shell to -20 and start up everything from there - the nice
settings will be inherited. How does such an audio test compare to a
test done with jackd running at SCHED_FIFO with RT priority 1?

if this works out well then we could achieve something comparable to
RT-LSM, via nice levels alone.

Ingo
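A minimal sketch, in C, of the two setups Ingo asks to be compared: SCHED_FIFO at RT priority 1 versus the default time-sharing policy at nice -20. The function names `use_sched_fifo` and `use_nice_level` are illustrative and not taken from jackd; the system calls themselves (`sched_setscheduler`, `setpriority`) are the standard ones. On a stock kernel both calls normally fail for an unprivileged caller, so each function simply reports the errno value.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Switch the calling process to SCHED_FIFO at the given RT priority.
 * Returns 0 on success, or the errno value on failure (typically
 * EPERM when the caller lacks the needed privilege). */
int use_sched_fifo(int rt_priority)
{
    struct sched_param sp = { .sched_priority = rt_priority };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}

/* Keep the default time-sharing policy but change the nice level.
 * Lowering the value (for example to -20) normally needs privilege
 * too. Returns 0 on success, or the errno value on failure. */
int use_nice_level(int nice_value)
{
    if (setpriority(PRIO_PROCESS, 0, nice_value) == -1)
        return errno;
    return 0;
}
```

A test harness would call `use_sched_fifo(1)` for one run and `use_nice_level(-20)` for the other before starting the audio work; child processes inherit either setting.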

Herbert Poetzl

Jan 5, 2005, 7:07:42 AM

to Andrew Morton, Alan Cox, a...@domdv.de, rlre...@joe-job.com, linux-...@vger.kernel.org, mi...@elte.hu, j...@io.com

On Tue, Jan 04, 2005 at 09:50:10PM -0800, Andrew Morton wrote:
> Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
> >
> > Can we use capabilities
>
> capabilities don't work :(
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html

well, maybe it is time to fix them ..

I already proposed some methods to extend them,
and I'm also willing to dig into the various things
required to allow the capability system to be used for
what it was intended.

best,
Herbert

Lee Revell

Jan 5, 2005, 10:47:45 AM

to Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven, Paul Davis

On Wed, 2005-01-05 at 12:52 +0100, Ingo Molnar wrote:
> the RT-LSM thing is a bit dangerous because it doesnt really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesnt use up large amount of CPU time.)
>
> incidentally, there is such a scheduling class already: negative nice
> levels. Please skip any preconceptions you might have about nice levels,
> nice levels have been improved in 2.6.10, the timeslices are now given
> out exponentially, giving nice -20 tasks far more weight and priority
> than they used to have. (They are obviously still preemptable if they
> keep looping burning CPU - but that we can consider a feature.) (Also,
> in 2.6 the negative nice levels have a much more aggressive interactivity
> setting, allowing them to preempt everything lower-prio.)
>
> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?
>
> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.

Ugh, screwed up the cc: list. Sorry for the WOB.

Paul, care to comment on the above?

Lee

Lee Revell

Jan 5, 2005, 10:54:47 AM

to Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

On Wed, 2005-01-05 at 12:52 +0100, Ingo Molnar wrote:

> the RT-LSM thing is a bit dangerous because it doesnt really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesnt use up large amount of CPU time.)
>
> incidentally, there is such a scheduling class already: negative nice
> levels. Please skip any preconceptions you might have about nice levels,
> nice levels have been improved in 2.6.10, the timeslices are now given
> out exponentially, giving nice -20 tasks far more weight and priority
> than they used to have. (They are obviously still preemptable if they
> keep looping burning CPU - but that we can consider a feature.) (Also,
> in 2.6 the negative nice levels have a much more aggressive interactivity
> setting, allowing them to preempt everything lower-prio.)
>
> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?
>
> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.
>

Adding Paul Davis to the cc:, as he has expressed very strong opinions
on this in the past.

Of course this does not address the problem as you still need to be root
to run at a negative nice value.

Lee

Lee Revell

Jan 5, 2005, 12:40:40 PM

to Christoph Hellwig, Andreas Steinmetz, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

On Wed, 2005-01-05 at 11:39 +0000, Christoph Hellwig wrote:
> I'm not a big fan of LSM, and I've explained the rationale why multiple
> times. That doesn't mean everything done using LSM is bad - in practice
> most things are bad though (from the things I've seen everything but lsm)

^^^

Is this a typo? Maybe you mean SELinux?

Lee

Lee Revell

Jan 5, 2005, 12:44:35 PM

to Christoph Hellwig, Jack O'Quin, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

On Wed, 2005-01-05 at 11:25 +0000, Christoph Hellwig wrote:
> On Tue, Jan 04, 2005 at 01:59:57PM -0500, Lee Revell wrote:
> > We could do it the way OSX (our real competition) does if that would
> > make people happy. They just let any user run RT tasks. Oh wait, but
> > that's a "broken design", everyone knows that OSX is a joke, no one
> > would use *that* OS to mix a CD or score a movie. :-)
>
> No one sane (well, no one sane with a background in Operating Systems)
> would use OS X at all.
>

Really? I would expect any sane engineer to use the best tool for the
job. If you actually think it's Linux, I suggest you try it sometime.

Lee

Jack O'Quin

Jan 5, 2005, 1:21:42 PM

to Ingo Molnar, Lee Revell, Chris Wright, Alan Cox, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

Ingo Molnar <mi...@elte.hu> writes:

> the RT-LSM thing is a bit dangerous because it doesnt really protect
> against a runaway, buggy app. So i think the right way to approach this
> problem is to not apply RT-LSM for the time being, but to provide an
> 'advanced latency needs' scheduling class that is _still_ safe even if
> the task is runaway, but behaves with near-RT priorities if the task is
> 'nice' (i.e. doesnt use up large amount of CPU time.)

You are right that a runaway SCHED_FIFO application can freeze the
system. But, this really has nothing to do with the permissions
problem addressed by the realtime-lsm. In fact, it is needed by
non-root users for running `nice -20', just as for SCHED_FIFO.

I have no objection to creating a "better" RT scheduling class than
SCHED_FIFO. The "much-maligned" Mac OS X has a deadline scheduler
that works quite well for running JACK and its applications.

> so, could you try vanilla 2.6.10 (without LSM and without jackd running
> with RT priorities), with jackd set to nice -20? Make sure the
> jack-client process gets this priority too. Best to achieve this is to
> renice a shell to -20 and start up everything from there - the nice
> settings will be inherited. How does such an audio test compare to a
> test done with jackd running at SCHED_FIFO with RT priority 1?

For a quick comparison, I used a slightly modified version of the
jack_test3.2 script, that runs jackd without the -R (--realtime)
option...

                                   With -R          Without -R
                                 (SCHED_FIFO)       (nice -20)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . :   300
Number of clients . . . . . . :    20
Ports per client  . . . . . . :     4
Frames per buffer . . . . . . :    64
*********************************************
Timeout Count . . . . . . . . :(    1)           (    1)
XRUN Count  . . . . . . . . . :     2              2837
Delay Count (>spare time) . . :     0                 0
Delay Count (>1000 usecs) . . :     0                 0
Delay Maximum . . . . . . . . :  3130   usecs   5038044   usecs
Cycle Maximum . . . . . . . . :   960   usecs     18802   usecs
Average DSP Load. . . . . . . :    34.3 %            44.1 %
Average CPU System Load . . . :     8.7 %             7.5 %
Average CPU User Load . . . . :    29.8 %             5.2 %
Average CPU Nice Load . . . . :     0.0 %            20.3 %
Average CPU I/O Wait Load . . :     3.2 %             5.2 %
Average CPU IRQ Load  . . . . :     0.7 %             0.7 %
Average CPU Soft-IRQ Load . . :     0.0 %             0.2 %
Average Interrupt Rate  . . . :  1707.6 /sec       1677.3 /sec
Average Context-Switch Rate . : 11914.9 /sec      11197.6 /sec
*********************************************

This was not exactly the test you requested. The LSM is still
present. But, it makes no difference. In fact, I used it to grant
nice privileges, since I didn't feel like running it as root.

But this is otherwise vanilla 2.6.10, and the two scheduling
algorithms are fairly represented. Try it yourself, I think you'll
see similarly dramatic differences.

Note that 2.6.10 has by far the best realtime performance of any
vanilla Linux kernel I have ever tried. Although much better results
can be obtained with your Realtime Preemption patches, this is still a
very creditable result, quite usable for many relatively low-latency
applications. Kudos to you and the many others who contributed to
this achievement.

> if this works out well then we could achieve something comparable to
> RT-LSM, via nice levels alone.

As you see, it does not work at all.
--
joq

Christoph Hellwig

Jan 5, 2005, 2:12:13 PM

to Lee Revell, Jack O'Quin, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar

On Wed, Jan 05, 2005 at 12:32:47PM -0500, Lee Revell wrote:
> Really? I would expect any sane engineer to use the best tool for the
> job.

Sure.

> If you actually think it's Linux, I suggest you try it sometime.

You don't want to run Darwin, trust me. If you don't read through their
sources..

Christoph Hellwig

Jan 5, 2005, 2:14:10 PM

to Lee Revell, Andreas Steinmetz, linux-...@vger.kernel.org, Andrew Morton, Ingo Molnar, Jack O'Quin

On Wed, Jan 05, 2005 at 12:35:56PM -0500, Lee Revell wrote:
> On Wed, 2005-01-05 at 11:39 +0000, Christoph Hellwig wrote:
> > I'm not a big fan of LSM, and I've explained the rationale why multiple
> > times. That doesn't mean everything done using LSM is bad - in practice
> > most things are bad though (from the things I've seen everything but lsm)
> ^^^
>
> Is this a typo? Maybe you mean SELinux?

Yes.

Olaf Dietsche

Jan 5, 2005, 3:14:27 PM

to Andrew Morton, Alan Cox, a...@domdv.de, rlre...@joe-job.com, linux-...@vger.kernel.org, mi...@elte.hu, j...@io.com

Andrew Morton <ak...@osdl.org> writes:

> Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
>>
>> Can we use capabilities
>
> capabilities don't work :(
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html

Capabilities don't work, because of missing filesystem
capabilities. If you have them, it's a question of setting the
appropriate permitted, inheritable and effective capability sets.

I didn't follow the whole thread. But if you want to grant
capabilities on a per user/group basis, may I suggest accessfs user
based capabilities, for example? :-)

Regards, Olaf.
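Olaf's point about the permitted, inheritable and effective sets can be illustrated with the exec-time rule as I understand it for 2.6-era kernels: the new permitted set is built from the file's sets, so a binary with no file capabilities ends up with nothing, whatever the process carried in its inheritable set. The sketch below models that rule with plain 32-bit masks. The struct, the function name and the `EX_` constants are illustrative only; the bit numbers are meant to match `CAP_IPC_LOCK` (14) and `CAP_SYS_NICE` (23) in `<linux/capability.h>`.

```c
#include <stdint.h>

/* Illustrative bit masks; the bit numbers follow <linux/capability.h>
 * (CAP_IPC_LOCK = 14, CAP_SYS_NICE = 23). */
#define EX_CAP_IPC_LOCK (UINT32_C(1) << 14)
#define EX_CAP_SYS_NICE (UINT32_C(1) << 23)

struct cap_sets {
    uint32_t permitted;
    uint32_t inheritable;
    uint32_t effective;
};

/* Model of the exec-time transformation (an assumption based on the
 * 2.6-era rule, not code copied from the kernel):
 *   pI' = pI
 *   pP' = (fP & bset) | (fI & pI)
 *   pE' = pP' & fE
 * p* are the process sets, f* the file sets, and bset is the
 * system-wide capability bounding set. */
struct cap_sets caps_after_exec(struct cap_sets proc, struct cap_sets file,
                                uint32_t bset)
{
    struct cap_sets out;

    out.inheritable = proc.inheritable;
    out.permitted = (file.permitted & bset) |
                    (file.inheritable & proc.inheritable);
    out.effective = out.permitted & file.effective;
    return out;
}
```

With all three file sets empty, which is how a non-root binary is treated when the filesystem stores no capabilities, `caps_after_exec` returns empty permitted and effective sets even if the process inheritable set contains `EX_CAP_SYS_NICE`. Setting the file's inheritable and effective sets is what lets the capability survive the exec.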

Matt Mackall

Jan 6, 2005, 8:22:35 PM

to Andrew Morton, Alan Cox, a...@domdv.de, rlre...@joe-job.com, linux-...@vger.kernel.org, mi...@elte.hu, j...@io.com

On Wed, Jan 05, 2005 at 01:06:02PM +0100, Herbert Poetzl wrote:
> On Tue, Jan 04, 2005 at 09:50:10PM -0800, Andrew Morton wrote:
> > Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
> > >
> > > Can we use capabilities
> >
> > capabilities don't work :(
> >
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.0/0502.html
>
> well, maybe it is time to fix them ..
>
> I already proposed some methods to extend them,
> and I'm also willing to dig into the various things
> required to allow to use the capability system for
> what it was intended.

You can't fix them without changing the semantics for existing users
in ways they didn't expect. It could be done with a new personality flag,
but..

--
Mathematics is the supreme nostalgia of our time.

Matt Mackall

Jan 6, 2005, 8:31:30 PM

to Alan Cox, Andreas Steinmetz, Lee Revell, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, Jack O'Quin

On Wed, Jan 05, 2005 at 04:18:15AM +0000, Alan Cox wrote:
> On Mer, 2005-01-05 at 01:35, Andreas Steinmetz wrote:
> > Let me remind you all that according to lkml history hch has always been
> > biased and objecting to anything related to lsm. Nobody can take hch's
> > opinion here as objective. I would even go so far that when things are
> > related to lsm(s) he's just tro...
>
> Oh I don't think so. Everyone thinks Christoph has it in for their
> project (me included quite often). He's just blessed with a lot of taste
> and determination to enforce it, and cursed (or perhaps blessed) with
> the ability to explain bluntly and clearly his opinion.
>
> gid hacks are not a good long term plan.
>
> Can we use capabilities, if not - why not and how do we fix it so we can
> do the job right. Do we need some more capability bits that are
> implicitly inherited and not touched by setuidness ?

Why can't this be done with a simple SUID helper to promote given
tasks to RT with sched_setscheduler, doing essentially all the checks
this LSM is doing?

Objections of "because it requires dangerous root or suid" don't fly,
an RT app under user control can DoS the box trivially. Never mind you
need root to configure the LSM anyway..

--
Mathematics is the supreme nostalgia of our time.
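A rough sketch of what the core of such a SUID helper might look like, assuming the policy is "members of one configured group may be promoted". The names `caller_in_group` and `promote_to_fifo` and the `allowed_gid` parameter are hypothetical. A real helper would also have to verify that the target pid belongs to the invoking user, and this covers only the scheduling side of the problem.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy check: is `gid` the caller's real group or one
 * of its supplementary groups? In a setuid-root helper these still
 * describe the invoking user. Returns 1 if so, 0 otherwise. */
int caller_in_group(gid_t gid)
{
    gid_t groups[256];
    int n, i;

    if (getgid() == gid)
        return 1;
    n = getgroups(256, groups);   /* -1 on error: the loop is skipped */
    for (i = 0; i < n; i++)
        if (groups[i] == gid)
            return 1;
    return 0;
}

/* Promote process `pid` to SCHED_FIFO at `rt_priority` if the caller
 * passes the group check. Returns 0 on success or an errno value. */
int promote_to_fifo(pid_t pid, int rt_priority, gid_t allowed_gid)
{
    struct sched_param sp = { .sched_priority = rt_priority };

    if (!caller_in_group(allowed_gid))
        return EPERM;
    if (sched_setscheduler(pid, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}
```

The group test mirrors the kind of gid check the LSM performs; the difference is that the check runs in a privileged user-space process rather than inside the kernel's permission hooks.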

Lee Revell

Jan 6, 2005, 9:40:11 PM

to Matt Mackall, Alan Cox, Andreas Steinmetz, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, Jack O'Quin

On Thu, 2005-01-06 at 17:18 -0800, Matt Mackall wrote:
> Why can't this be done with a simple SUID helper to promote given
> tasks to RT with sched_setscheduler, doing essentially all the checks
> this LSM is doing?
>
> Objections of "because it requires dangerous root or suid" don't fly,
> an RT app under user control can DoS the box trivially. Never mind you
> need root to configure the LSM anyway..

Yes but a bug in an app running as root can trash the filesystem. The
worst you can do with RT privileges is lock up the machine.

Lee

Alan Cox

Jan 6, 2005, 10:05:41 PM

to Matt Mackall, Andrew Morton, a...@domdv.de, rlre...@joe-job.com, Linux Kernel Mailing List, mi...@elte.hu, j...@io.com

On Gwe, 2005-01-07 at 01:13, Matt Mackall wrote:
> You can't fix them without changing the semantics for existing users
> in ways they didn't expect. It could be done with a new personality flag,
> but..

I disagree. At the most trivial you could just add another 32bits of
sticky capability that are never touched by setuid/non-setuidness and
represent additional "user" (or more rightly session) abilities to do
limited overrides

Jack O'Quin

Jan 7, 2005, 12:54:37 AM

to Matt Mackall, Alan Cox, Andreas Steinmetz, Lee Revell, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

[Adding linux-audio-dev to the CC list]

Matt Mackall <m...@selenic.com> writes:

> On Wed, Jan 05, 2005 at 04:18:15AM +0000, Alan Cox wrote:
>> gid hacks are not a good long term plan.
>>
>> Can we use capabilities, if not - why not and how do we fix it so
>> we can do the job right. Do we need some more capability bits that
>> are implicitly inherited and not touched by setuidness ?
>
> Why can't this be done with a simple SUID helper to promote given
> tasks to RT with sched_setscheduler, doing essentially all the checks
> this LSM is doing?

The answer to your simple question is a long, sad story. :-(

There is clearly no practical way to write large audio applications
(many with elaborate graphical interfaces) securely enough to run them
as root. So, we have used capabilities with linux-2.4 systems for
several years. It was never a satisfactory solution, but was all we
could do at the time.

There is a small setuid program called `jackstart' that exec()s the
JACK server (`jackd') with appropriate privileges so it can pass
realtime privileges to its applications. Each client needs to create
a realtime thread and mlock() its storage to do its part of the
realtime audio cycle. Note that sched_setscheduler() provides no way
to handle the mlock() requirement, which cannot be done from another
process. Clients may come and go at any time, so dropping the
privilege after initialization is not an option.
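The two things each client has to do for itself can be sketched as follows. This is an illustration under assumptions, not JACK's actual client code: `lock_buffer` and `become_realtime_client` are made-up names, and a real client would apply the policy to a dedicated audio thread rather than to the caller. The point is that `mlock()` acts on the calling process's own address space, so it has to run inside the client, unlike `sched_setscheduler()`, which accepts another process's pid.

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>

/* Pin `len` bytes starting at `buf` into RAM so the realtime cycle
 * never takes a page fault on them. Returns 0 or an errno value. */
int lock_buffer(void *buf, size_t len)
{
    if (mlock(buf, len) == -1)
        return errno;
    return 0;
}

/* Lock the audio storage first, then move the calling thread to
 * SCHED_FIFO. If the scheduling change is refused, the buffer is
 * unlocked again. Returns 0 or an errno value. */
int become_realtime_client(void *buf, size_t len, int rt_priority)
{
    struct sched_param sp = { .sched_priority = rt_priority };
    int rc = lock_buffer(buf, len);

    if (rc != 0)
        return rc;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        rc = errno;
        munlock(buf, len);
        return rc;
    }
    return 0;
}
```

Because clients come and go, both calls can happen at any point in the server's lifetime, which is why the privilege cannot simply be dropped after start-up.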

Unfortunately, all this heavyweight mechanism only helps with JACK and
its many clients. Lots of other audio or video oriented applications
also have realtime needs.

The biggest problem was CAP_SETPCAP, which for good reasons[1] is
disabled in distributed kernels. This forced every user to patch and
build a custom kernel. Worse, it opened all our systems up to the
problems reported by this sendmail security advisory.

[1] http://www.securiteam.com/unixfocus/5KQ040A1RI.html

While stumbling along with this very unsatisfactory state of affairs,
many on the Linux Audio Developers mailing list were shocked[2] to
hear about an LKML discussion[3] suggesting a significant lack of
developer commitment to addressing these issues...

> Quoting Albert Cahalan[3]: "The authors of our code seem to have
> given up and moved on. Nobody cleaned up the mess. Is it any wonder
> the POSIX draft didn't ever make it beyond the draft state?"

[2] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-November/005332.html
[3] http://www.kerneltraffic.org/kernel-traffic/kt20031101_239.html#3

So, all our work, frustration and user confusion while trying to "do
the right thing" seemed doomed to failure. Since the Linux kernel
developers continued to show little interest in our needs, we started
a discussion about how to meet them ourselves[4].

[4] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-November/005345.html

Looking at our security requirements in a practical manner, we quickly
concluded that CAP_SETPCAP is the work of the devil. A true
filesystem-based privilege vector solution might be adequate, but is
clearly beyond the scope of what we audio programmers could hope to
accomplish. Even then, it would be difficult to administer.

A simple group ID test is far more secure than CAP_SETPCAP, and
perfectly adequate for us. When configuring a Digital Audio
Workstation, one is not terribly concerned about local Denial of
Service attacks or runaway realtime threads. That would be
unacceptable for many other systems, but not ours. Yet, we want to
avoid system integrity holes in network daemons like sendmail[1]. In
other words: we can tolerate the bad guys crashing the system, but we
don't want them turning it into an open spam relay or corrupting the
filesystem.

So, we needed to provide a simple way for an unskilled system admin
(aka "musician") to configure a personal workstation to run realtime
applications without opening egregious security holes. Equally
important, it must be easy for other system admins to ensure that
these privileges are *not* available on their server systems. It soon
became apparent that the then-new LSM framework provided a good
solution. Because LSMs can be built outside the kernel source tree,
we were no longer forced to wait for some kernel developer to take an
interest.

The realtime-lsm is the solution we evolved. It has been actively
used by thousands of Linux audio users for over a year now[5]. The
first supported SourceForge release was in April of 2004[6]. It is
now used by many popular audio-oriented distributions, including
Planet CCRMA[7] from Stanford University and the Debian Music
Distribution[8] from the AGNULA project.

[5] http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-December/005745.html
[6] http://eca.cx/laa/2004/04/0028.html
[7] http://ccrma.stanford.edu/planetccrma/software/
[8] http://www.agnula.org/

I understand that kernel developers are busy and have other problems
they consider more important than ours. But, you ought to at least
understand that this is really important to us. We needed a clean
solution two or three years ago. Now we finally have one.

Distributing it with the kernel sources would be a great convenience
for our users and would significantly simplify maintenance. It would
also (IMHO) close a significant security and usability deficiency in
the standard kernel. Any of the NSA and DoD experts will tell you: a
security solution that is difficult to administer is not secure.

It is no surprise that kernel developers should consider our solution
technically inferior to their own ideas on the subject. I would have
been delighted to have some kernel developer step in and provide a
clean, well-thought out solution several years ago. This is a kernel
deficiency, not an audio problem. I don't want to work on kernels.

But, I am feeling quite discouraged that so many kernel developers
still seem to consider this problem unimportant. I sense a distinct
unwillingness to move forward on this issue. I really hope I am wrong
about that.
--
joq

Paul Davis

Jan 7, 2005, 7:58:55 AM

to Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

I just read the thread of messages about this, and I am just
dumbfounded. Jack O'Quin has very politely explained the whole thing,
and it appears that almost nobody actually paid attention to what
he was saying.

1) capabilities: it has been explained by several people that
capabilities do not work, and in the past there has been an utter lack
of interest on the part of the kernel crowd to fix them, sometimes
even going as far as "it can't be fixed".

2) this is *not* only about scheduling. Realtime tasks need
mlockall() and/or mlock as well. even the man page for mlock
recognizes this, yet almost all the discussion here has focused on
scheduling.

3) christoph claims that using uid/gid to define privilege scope
is a bad idea. but that is the *desired* method. uid/gid corresponds exactly
to what the users of these systems want. they don't want privilege
accorded to specific applications - it's the *users* not the
applications that have the right to get RT scheduling, lock down
memory and so on. these applications will run without RT privileges,
just not very well (in general, so badly that they are unusable for
their intended purpose).

4) christoph's claims about OS X are nothing but ridiculous. whatever
the internals of Darwin may or may not be (and they certainly include
some of the best ideas about media-friendly kernels from the last 20
years, unlike our favorite OS), professional people are using OS X
(like they used OS 9 and OS 8 before) to get serious, paid work done
in a way that they cannot on Linux. and if attitudes like christoph's
prevail, in a way that they will never get to do on Linux without
going through steps that they will consider absurd. Alan jokes (i
presume) "oh, that's easy, make everyone root", but that's not what OS X
does. OS X says "we know that running realtime applications matters
for a broad class of our likely users, and so anyone can do it, not
just root". And note: "realtime applications" does not mean just
"rt-scheduled", as noted above.

5) setuid wrappers don't work for this, because even though you can
change the scheduling class of another process, you cannot "grant" it
the ability to use mlock. at least not without capabilities, so back
to (1) above ...

So, what do we have here? The two most successful media-friendly OS's
(BeOS and OS X) demonstrate clearly the way things need to be from the
user experience perspective, a development community within the Linux
world evolves a solution using the very nice new security modules in
2.6, and then people who don't appear to understand anything about
what is required or what the use cases are say "i don't like it and
because nobody pays me i don't have to tell you why".

I've probably burnt through $250,000 supporting myself and my
family over the last 5 years while I develop pro-level audio software
for Linux. I don't expect to see any of that back. So when Christoph
chimes in with the "I'm not paid, I don't have to tell you why I don't
like it, I just don't" ... that really, really, really irritates me in
a way that few other comments do.

We (Jack, Lee and now myself) have tried to explain what the problem
with the kernel is, how LSM makes a solution possible, acknowledged
issues and attempted to address them, and finally have offered up a
working patch that makes life easier for a bunch of people who don't
want to run webservers or compile kernels all day. If you're going to
publicly argue that what the "realtime" LSM does should not be part
of the kernel, at least do us the favor of showing us enough respect
to provide technical or policy based reasons for why it's such a bad
solution.

--p

Christoph Hellwig

Jan 7, 2005, 8:07:16 AM

to Paul Davis, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Christoph Hellwig, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
> 2) this is *not* only about scheduling. Realtime tasks need
> mlockall() and/or mlock as well. even the man page for mlock
> recognizes this, yet almost all the discussion here has focused on
> scheduling.

RLIMIT_MEMLOCK is your friend.

> 3) christoph claims that using uid/gid to define privilege scope
> is a bad idea. but that is the *desired* method. uid/gid corresponds exactly
> to what the users of these systems want. they don't want privilege
> accorded to specific applications - it's the *users* not the
> applications that have the right to get RT scheduling, lock down
> memory and so on. these applications will run without RT privileges,
> just not very well (in general, so badly that they are unusable for
> their intended purpose).

it doesn't really matter what you want, but how we can implement
something that fits in the kernel design.

> 4) christoph's claims about OS X are nothing but ridiculous. whatever
> the internals of Darwin may or may not be (and they certainly include
> some of the best ideas about media-friendly kernels from the last 20
> years, unlike our favorite OS), professional people are using OS X

professional people are also using Windows or Solaris. That doesn't
mean we have to copy every bad idea from them.

> 5) setuid wrappers don't work for this, because even though you can
> change the scheduling class of another process, you cannot "grant" it
> the ability to use mlock. at least not without capabilities, so back
> to (1) above ...

See above (RLIMIT_MEMLOCK).

> I've probably burnt through $250,000 supporting myself and my
> family over the last 5 years while I develop pro-level audio software
> for Linux. I don't expect to see any of that back. So when Christoph
> chimes in with the "I'm not paid, I don't have to tell you why I don't
> like it, I just don't" ... that really, really, really irritates me in
> a way that few other comments do.

I think you're taking things totally out of context here. Lee complained
I didn't review his patch earlier. I only have a limited time available
so I'll select patches that I'm gonna review - and that means they have
to either be very interesting or be proposed for inclusion. If you want
me to review other things you'll have to either pay me or ask me really
nicely offlist.

> We (Jack, Lee and now myself) have tried to explain what the problem
> with the kernel is, how LSM makes a solution possible, acknowledged
> issues and attempted to address them, and finally have offered up a
> working patch that makes life easier for a bunch of people who don't
> want to run webservers or compile kernels all day.

And we have told you that this solution is not okay. You can spend
more time whining which won't do anything or you could help brainstorming
how to implement a workable solution.

Paul Davis

Jan 7, 2005, 9:22:46 AM

to Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton, Arjan van de Ven

>On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
>> 2) this is *not* only about scheduling. Realtime tasks need
>> mlockall() and/or mlock as well. even the man page for mlock
>> recognizes this, yet almost all the discussion here has focused on
>> scheduling.
>
>RLIMIT_MEMLOCK is your friend.

rlimit_memlock limits the *amount* of memory that mlock() can be used
on, not whether mlock can be used. at least, that's my understanding of
the POSIX design for this. the man page and the source code for mlock
support make that reasonably clear.
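For reference, the limit under discussion can be read with `getrlimit()`. The sketch below is only an illustration: the helper names `read_memlock_limit` and `fits_under_memlock_limit` are made up. It shows the "amount" side of the question. Whether an unprivileged process may lock memory at all depends on the kernel version; as far as I know, kernels from 2.6.9 onward let an unprivileged process lock up to its soft limit, but treat that as an assumption to check against the mlock(2) man page for the kernel in use.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Read the soft and hard RLIMIT_MEMLOCK values, in bytes.
 * Returns 0 on success or the errno value on failure. */
int read_memlock_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return errno;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Would a request to lock `bytes` fit under the soft limit `soft`?
 * RLIM_INFINITY means no limit on the amount. */
int fits_under_memlock_limit(rlim_t soft, unsigned long long bytes)
{
    return soft == RLIM_INFINITY || bytes <= soft;
}
```

A process can lower its own limits or raise the soft limit up to the hard limit with `setrlimit()`; raising the hard limit needs privilege, which is the administration question raised in the next paragraph.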

moreover, AFAIK all the issues that existed for granting capabilities
exist for rlimit-based privileges. if they are not granted to all
users/processes, how are they granted, and can they be controlled by a
non-root process? last time i looked, the hard limit used by rlimits is
system-wide. you want to copy that idea from OSX or not?

>it doesn't really matter what you want, but how we can implement
>something that fits in the kernel design.

"realtime" LSM does fit into the kernel, quite demonstrably so. it
doesn't, it appears, fit into *your* idea of kernel design.

>> 4) christoph's claims about OS X are nothing but ridiculous. whatever
>> the internals of Darwin may or may not be (and they certainly include
>> some of the best ideas about media-friendly kernels from the last 20
>> years, unlike our favorite OS), professional people are using OS X
>
>professional people are also using Windows or Solaris. That doesn't
>mean we have to copy every bad idea from them.

I didn't say "copy every idea from them". The point of "realtime" LSM
is precisely *not* to copy every idea from them - instead of every
user being able to run RT apps, only specifically root-administered
uids and/or gids can.

>And we have told you that this solution is not okay. You can spend

You, Christoph, have told us that. There is no "we" here. You provided
no rationale other than "uid/gid based privilege control is the wrong
method".

>more time whining which won't do anything or you could help brainstorming
>how to implement a workable solution.

We (Jack, Torben and others on LAD) did brainstorm. We were told on
lkml that LSM was the right way to do this kind of thing these days,
because capabilities were broken. But you don't like LSM, so now,
totally post-facto you're telling us that this is not a "workable
solution."

Newsflash: its a totally workable and working solution, and its one
that distributions will adopt whether you get paid or i suck up and
ask you nicely offline. The question was whether we could make
distributions' and users' lives a little easier by not requiring them
to download additional stuff first. Apparently, your unexplained
convictions about the right and wrong way to grant privileges
(something that no OS has ever really gotten its head around except
VMS (maybe)) are more important.

Fine, we'll continue to tell people to use "realtime" LSM for audio
work. The people this really affects probably won't use vanilla
kernels anyway.

--p

Arjan van de Ven

Jan 7, 2005, 9:29:49 AM

to Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, Jan 07, 2005 at 09:16:50AM -0500, Paul Davis wrote:
> >On Fri, Jan 07, 2005 at 07:56:02AM -0500, Paul Davis wrote:
> >> 2) this is *not* only about scheduling. Realtime tasks need
> >> mlockall() and/or mlock as well. even the man page for mlock
> >> recognizes this, yet almost all the discussion here has focused on
> >> scheduling.
> >
> >RLIMIT_MEMLOCK is your friend.
>
> rlimit_memlock limits the *amount* of memory that mlock() can be used
> on, not whether mlock can be used. at least, thats my understanding of
> the POSIX design for this. the man page and the source code for mlock
> support make that reasonably clear.

eh no. It defaults to zero, but if you increase it for a specific user, that
user is allowed to mlock more.

>
> Fine, we'll continue to tell people to use "realtime" LSM for audio
> work. The people this really affects probably won't use vanilla
> kernels anyway.

that is so not a constructive way to make progress.
The realtime LSM is the wrong concept. It's a hack to work around other
design issues with linux. *THAT* is what makes it wrong. Not the fact that
it wouldn't work (I believe it works, I don't think anyone doubts that
much). If you are unwilling to even discuss fixing the underlying design
issues then I'm scared that this issue will never come to any workable
solution.

Paul Davis

Jan 7, 2005, 9:42:40 AM

to Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>> rlimit_memlock limits the *amount* of memory that mlock() can be used
>> on, not whether mlock can be used. at least, thats my understanding of
>> the POSIX design for this. the man page and the source code for mlock
>> support make that reasonably clear.
>
>eh no. It defaults to zero, but if you increase it for a specific user, that
>user is allowed to mlock more.

from mm/mlock.c:do_mlock() in 2.6.8:

    if (on && !capable(CAP_IPC_LOCK))
        return -EPERM;

i.e. only root or capabilities can make mlock() usable.

>much). If you are unwilling to even discuss fixing the underlying design
>issues then I'm scared that this issue will never come to any workable
>solution.

Lee, Jack and I have been very willing to discuss the issue. Christoph
isn't willing to discuss it, he's just told us "its the wrong design,
and I'm not telling you why or what's better". If there is a better
design that will end up in the mainstream kernel, we'd love to see it
implemented, and will likely be involved in doing it, because its
really important to us.

--p

Arjan van de Ven

Jan 7, 2005, 9:44:53 AM

to Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:
> >> rlimit_memlock limits the *amount* of memory that mlock() can be used
> >> on, not whether mlock can be used. at least, thats my understanding of
> >> the POSIX design for this. the man page and the source code for mlock
> >> support make that reasonably clear.
> >
> >eh no. It defaults to zero, but if you increase it for a specific user, that
> >user is allowed to mlock more.
>
> from mm/mlock.c:do_mlock() in 2.6.8:
>
>     if (on && !capable(CAP_IPC_LOCK))
>         return -EPERM;

now try 2.6.9 ;)
this deficiency has already been fixed
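
A minimal user-space sketch of the behaviour described above, assuming a 2.6.9 or later kernel where an unprivileged mlock() is checked against RLIMIT_MEMLOCK instead of being refused outright. The helper name try_mlock_buffer() is hypothetical; getrlimit(), mlock() and munlock() are the standard calls.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: print the current RLIMIT_MEMLOCK soft limit, then
 * try to pin `len` bytes of heap memory.
 * Returns 0 on success, or the errno value of the call that failed.
 */
int try_mlock_buffer(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return errno;

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK soft limit: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK soft limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);

    void *buf = malloc(len);
    if (buf == NULL)
        return ENOMEM;

    int rc = 0;
    if (mlock(buf, len) != 0)
        rc = errno;          /* typically ENOMEM (over the limit) or EPERM */
    else
        munlock(buf, len);   /* unpin again; this is only a probe */

    free(buf);
    return rc;
}
```

Locking is done in whole pages, so a request can fail even when `len` itself is slightly below the limit. A process with CAP_IPC_LOCK is not bound by the limit at all.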

Christoph Hellwig

Jan 7, 2005, 9:50:33 AM

to Paul Davis, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:

> Lee, Jack and I have been very willing to discuss the issue. Christoph
> isn't willing to discuss it, he's just told us "its the wrong design,
> and I'm not telling you why or what's better". If there is a better
> design that will end up in the mainstream kernel, we'd love to see it
> implemented, and will likely be involved in doing it, because its
> really important to us.

Calm down and read through the thread again.

Paul Davis

Jan 7, 2005, 10:28:37 AM

to Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>On Fri, Jan 07, 2005 at 09:38:38AM -0500, Paul Davis wrote:
>> Lee, Jack and I have been very willing to discuss the issue. Christoph
>> isn't willing to discuss it, he's just told us "its the wrong design,
>> and I'm not telling you why or what's better". If there is a better
>> design that will end up in the mainstream kernel, we'd love to see it
>> implemented, and will likely be involved in doing it, because its
>> really important to us.
>
>Calm down and read through the thread again.

Sure, lets. Distilling out the responses from kernel developers:

======================================================================

Christoph:
---------
This is far too specialized. An option to the capability LSM to grant
capabilities to certain uids/gids sounds like the better choice - and
would also allow to get rid of the magic hugetlb uid horrors.

Which still doesn't mean it's the right design. And no, I don't need the
feature so I won't write it. If you want a certain feature it's up to
you to implement it in a way that's considered mergeable.

Alan:
-----
The problem with uid/gid based hacks is that they get really ugly to
administer really fast. Especially once you have users who need realtime
and hugetlb, and users who need one only.

It would be far cleaner to split CAP_SYS_NICE capability down - which
should cover the real time OS functions nicely. Right now it gives a few
too many rights but that could be fixed easily.

gid hacks are not a good long term plan.

Can we use capabilities, if not - why not and how do we fix it so we can
do the job right. Do we need some more capability bits that are
implicitly inherited and not touched by setuidness ?

Andrew:
-------

capabilities don't work :(

Herbert:
--------

well, maybe it is time to fix them ..

I already proposed some methods to extend them,
and I'm also willing to dig into the various things
required to allow to use the capability system for
what it was intended.

Matt:
-----

You can't fix them without changing the semantics for existing users
in ways they didn't expect. It could be done with a new personality flag,
but..

Alan:
-----

I disagree. At the most trivial you could just add another 32bits of
sticky capability that are never touched by setuid/non-setuidness and
represent additional "user" (or more rightly session) abilities to do
limited overrides

Olaf:
-----

Capabilities don't work, because of missing filesystem
capabilities. If you have them, it's a question of setting the
appropriate permitted, inheritable and effective capability sets.

I didn't follow the whole thread. But if you want to grant
capabilities on a per user/group basis, may I suggest accessfs user
based capabilities, for example? :-)

======================================================================

So, we have a few responses, some references to various potential
solutions all of which have problems just as deep if not deeper than
the uid/gid-based model that this particular LSM adopts. No proposal
for any system that would actually work and address anyone's real
needs in a useful way. Please recall that we developed a
capability-based solution for 2.4, but it was cumbersome because the
vanilla kernel doesn't have capabilities enabled and there are lots of
reasons to not enable them given their current status.

Meanwhile, Jack already provided a very detailed, cross-referenced and
clear explanation of why various other ideas won't work very well from
a user-space perspective. And in this thread, both Lee and Jack have
attempted to deal with issues that have been raised about the uid/gid
approach.

In summary, on the one hand, we have a working, defensible solution,
and on the other some misgivings and suggestions to try again at
implementing some more generic privilege-granting system, something
that lkml has been arguing about for years, along with the rest of the
OS design community. Something that I suspect will never be properly
resolved, merely "muddled towards". There is no right way to grant
privileges - there are many ways, and the benefits and downfalls of
each depend on what you are trying to achieve. For years, POSIX based
systems have relied on uid/gid solutions and they continue to do
so. People understand how to manage them (as best as can be done), and
what the issues are. Capabilities were supposed to be the solution to
this, and instead have essentially been a dead-end. So I trust that
you'll be understanding of any scepticism that I might have of the
suggestion that we go away and work on some other "more generic"
system.

--p

Paul Davis

Jan 7, 2005, 10:31:51 AM

to Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>now try 2.6.9 ;)
>this deficiency got already fixed

well thats good, i hope someone updated the man page too :)

but is there actually any way to grant specific users a reasonable
rlimit, or are you proposing that we adopt another "bad idea" from OS
X and let everybody do this?

--p

Arjan van de Ven

Jan 7, 2005, 10:36:27 AM

to Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, Jan 07, 2005 at 10:27:33AM -0500, Paul Davis wrote:
> >now try 2.6.9 ;)
> >this deficiency got already fixed
>
> well thats good, i hope someone updated the man page too :)
>
> but is there actually any way to grant specific users a reasonable
> rlimit,

yes; most distributions will use pam for this, you can set per user or per
group limits there.

Paul Davis

Jan 7, 2005, 10:45:56 AM

to Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>> well thats good, i hope someone updated the man page too :)
>>
>> but is there actually any way to grant specific users a reasonable
>> rlimit,
>
>yes; most distributions will use pam for this, you can set per user or per
>group limits there.

isn't that a uid/gid based system? ok, i'm being a little snide :)

fine, so the mlock situation may have improved enough post-2.6.9 that
it can be considered fixed. that leaves the scheduler issue. but
apparently, a uid/gid solution is OK for mlock, and not for the
scheduler. am i missing something?

--p

Martin Mares

Jan 7, 2005, 11:07:23 AM

to Paul Davis, Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> >yes; most distributions will use pam for this, you can set per user or per
> >group limits there.
>
> isn't that a uid/gid based system? ok, i'm being a little snide :)

:) The big difference between this and a pure uid/gid based system is that
pam_limits is not the only place where you can change the ulimits. If your
system is simple enough that deciding on uid/gid is enough, you can use
pam_limits; if not and you for example want to make the limits depend
on the phase of the moon, it's easy to do so -- just write a simple user space
program which will set the limits accordingly. Also, if the user wishes to
restrict his abilities, because he's going to do some experiment and he
doesn't want to lock up the machine, he can easily do so.
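
As a rough illustration of the last point, a process can lower its own soft limit without any privilege. The sketch below assumes RLIMIT_MEMLOCK is the limit of interest; the function name cap_memlock_soft_limit() is made up for this example.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: make sure this process's RLIMIT_MEMLOCK soft limit
 * is no higher than `bytes`. The hard limit is left untouched, so the
 * process could raise the soft limit back up to it later.
 * Returns 0 on success, or errno from the failing call.
 */
int cap_memlock_soft_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return errno;

    /* Already at or below the requested value: nothing to do. */
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur <= bytes)
        return 0;

    /* The soft limit may never exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;

    rl.rlim_cur = bytes;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return errno;

    return 0;
}
```

A wrapper that calls this and then exec()s the experimental program would pass the reduced limit on to it, since resource limits are inherited across fork() and exec().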

Except for filesystem permissions, I think that it's exactly the usual UNIX
way of controlling access -- the kernel takes care of access checks based
on some trivial attributes like ulimits and capabilities, and user space
decides who should get which. I don't see any reason why the right to use
realtime scheduling should be treated differently. Do you?

It's quite probable that the current system of capabilities is not well
suited for this, but I think that although it's tempting to work around it
by introducing a new security module, in the long term it's much better
to extend and/or fix the capabilities -- I don't see any fundamental reason
for capabilities being unusable for this goal, it's much more likely to be
just minor details in the implementation.

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Always remember that you are absolutely unique ... just like everyone else.

Arjan van de Ven

Jan 7, 2005, 11:08:25 AM

to Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
>
> fine, so the mlock situation may have improved enough post-2.6.9 that
> it can be considered fixed. that leaves the scheduler issue. but
> apparently, a uid/gid solution is OK for mlock, and not for the
> scheduler. am i missing something?

I think you skipped a step. You don't have a scheduler requirement, you have
a latency requirement. You currently *solve* that latency requirement via a
scheduler "hack", yet it is quite clear that the "hard" realtime solution is
most likely not the right approach. Note that I'm not saying that you
shouldn't get the latency that that currently provides, but the downsides
(can hang the machine) are bad; a solution that solves that would be far
preferable.
Something like a soft realtime flag that acts as if it's the hard realtime
one unless the app shows "misbehavior" (e.g. eats its timeslice X times in
a row) might for example be such a solution. And with the anti-abuse
protection it can run with far lighter privileges.
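
One way to read the "soft realtime flag" idea is as a small piece of per-task bookkeeping. The sketch below is not kernel code and not a proposed patch; the struct, the function names and the threshold field are all hypothetical, and it only models the "X times in a row" rule.

```c
#include <stdbool.h>

/* Hypothetical per-task state for a "soft realtime" policy. */
struct soft_rt_state {
    unsigned consecutive_overruns; /* full timeslices eaten in a row */
    unsigned max_overruns;         /* the "X" from the description */
    bool demoted;                  /* true once RT treatment is revoked */
};

void soft_rt_init(struct soft_rt_state *s, unsigned max_overruns)
{
    s->consecutive_overruns = 0;
    s->max_overruns = max_overruns;
    s->demoted = false;
}

/*
 * Account for one timeslice. `used_full_slice` is true when the task ran
 * until its slice expired instead of blocking voluntarily.
 * Returns true while the task should still be treated as realtime.
 */
bool soft_rt_account(struct soft_rt_state *s, bool used_full_slice)
{
    if (s->demoted)
        return false;

    if (used_full_slice) {
        s->consecutive_overruns++;
        if (s->consecutive_overruns >= s->max_overruns)
            s->demoted = true;
    } else {
        /* A well-behaved slice resets the streak. */
        s->consecutive_overruns = 0;
    }
    return !s->demoted;
}
```

In this model demotion is permanent. Whether a demoted task should ever regain realtime treatment is a policy question the description leaves open.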

Martin Mares

Jan 7, 2005, 11:12:05 AM

to Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> Olaf:
> -----
> Capabilities don't work, because of missing filesystem
> capabilities. If you have them, it's a question of setting the
> appropriate permitted, inheritable and effective capability sets.

Sure, filesystem capabilities would be nice, but for the stuff Paul
mentions they aren't needed -- what you need is to grant capabilities
to the user's session, which can be easily done by a PAM module.

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth

"C++: an octopus made by nailing extra legs onto a dog." -- Steve Taylor

Paul Davis

Jan 7, 2005, 11:16:57 AM

to Martin Mares, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>Sure, filesystem capabilities would be nice, but for the stuff Paul
>mentions they aren't needed -- what you need is to grant capabilities
>to the user's session, which can be easily done by a PAM module.

i think this is true only if the kernel comes with capabilities
enabled.

various media-centric distributions (CCRMA, demudi, dyne:bolic and
others) enabled them for their 2.4 kernels, but not the major
desktop-centric ones. then the impression began to be received that in
2.6, capabilities were even more questionable of a mechanism to use.
In addition, the LSM system appeared, and seemed to offer a much
better solution entirely: no need to patch the kernel at all, or at
least it appeared to be so in the beginning. Hence the "realtime" LSM.

--p

Takashi Iwai

Jan 7, 2005, 11:23:50 AM

to Arjan van de Ven, Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

At Fri, 7 Jan 2005 17:03:51 +0100,

Arjan van de Ven wrote:
>
> On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
> >
> > fine, so the mlock situation may have improved enough post-2.6.9 that
> > it can be considered fixed. that leaves the scheduler issue. but
> > apparently, a uid/gid solution is OK for mlock, and not for the
> > scheduler. am i missing something?
>
> I think you skipped a step. You don't have a scheduler requirement, you have
> a latency requirement. You currently *solve* that latency requirement via a
> scheduler "hack", yet it is quite clear that the "hard" realtime solution is
> most likely not the right approach. Note that I'm not saying that you
> shouldn't get the latency that that currently provides, but the downsides
> (can hang the machine) are bad; a solution that solves that would be far
> preferable
> something like a soft realtime flag that acts as if it's the hard realtime
> one unless the app shows "misbehavior" (eg eats its timeslice for X times in
> a row) might for example be such a solution. And with the anti abuse
> protection it can run with far lighter privileges.

This reminds me of the soft-RT patch posted quite some time ago.
I feel such a handy pseudo-RT scheduler class would be useful for
other systems than JACK, too...

Takashi

Paul Davis

Jan 7, 2005, 11:25:24 AM

to Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
>>
>> fine, so the mlock situation may have improved enough post-2.6.9 that
>> it can be considered fixed. that leaves the scheduler issue. but
>> apparently, a uid/gid solution is OK for mlock, and not for the
>> scheduler. am i missing something?
>
>I think you skipped a step. You don't have a scheduler requirement, you have
>a latency requirement. You currently *solve* that latency requirement via a
>scheduler "hack", yet it is quite clear that the "hard" realtime solution is
>most likely not the right approach. Note that I'm not saying that you

Why is that clear? In just about every respect, realtime audio has the
same characteristics as hard realtime, except that nobody gets hurt
when a deadline is missed :) We have an IRQ source, and a deadline
(sometimes on the sub-msec range, but more typically 1-5msec) for the
work that has to be done. This deadline is tight enough that the task
essentially *has* to run with SCHED_FIFO scheduling, because doing
almost anything else instead will cause the deadline to be missed.
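
For reference, this is roughly what the request looks like from user space. The sketch uses the standard sched_setscheduler() call; the wrapper name request_sched_fifo() is hypothetical. Without the relevant privilege the call is expected to fail with EPERM, which is the situation this thread is about.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical wrapper: ask for SCHED_FIFO at `priority` for the calling
 * process. Returns 0 on success, EINVAL if the priority is outside the
 * range the system reports for SCHED_FIFO, or errno from the kernel
 * (EPERM when the caller is not allowed to use realtime scheduling).
 */
int request_sched_fifo(int priority)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo == -1 || hi == -1)
        return errno;
    if (priority < lo || priority > hi)
        return EINVAL;

    struct sched_param param;
    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    /* pid 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;

    return 0;
}
```

A realtime audio client would normally pair this with mlockall() so that a page fault cannot stall the thread once it is running under SCHED_FIFO.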

>shouldn't get the latency that that currently provides, but the downsides
>(can hang the machine) are bad; a solution that solves that would be far
>preferable

OS X's deadline scheduler is arguably better, though I don't believe
it can actually offer the guarantees it claims to with 100%
reliability. But they essentially do hard realtime via deadline
scheduling, combined with a task killer for any RT task that exceeds
its stated cycle consumption.

To do that in Linux would be great, but its really an addition to the
current scheduling mechanisms, not a replacement. The OS X realtime
task (its actually a Mach RT thread, to be more precise) can still
theoretically cause DOS *if* the kernel task killer was not present,
so its just the task killer that would be needed, presumably driven by
the timer interrupt.

>something like a soft realtime flag that acts as if it's the hard realtime
>one unless the app shows "misbehavior" (eg eats its timeslice for X times in
>a row) might for example be such a solution. And with the anti abuse
>protection it can run with far lighter privileges.

i guess we're suggesting almost the same thing, except that i consider
this to be hard realtime plus a task killer, not "soft realtime
pretending to be hard realtime" :)

--p

Paul Davis

Jan 7, 2005, 11:28:30 AM

to Martin Mares, Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>It's quite probable that the current system of capabilities is not well
>suited for this, but I think that although it's tempting to work around it
>by introducing a new security module, in the long term it's much better
>to extend and/or fix the capabilities -- I don't see any fundamental reason
>for capabilities being unusable for this goal, it's much more likely to be
>just minor details in the implementation.

capabilities work - we use them in 2.4 where a helper suid application
gets the ball rolling, and then its child grants capabilities to new
clients.

the problem we have with capabilities is that capabilities are not
enabled by default in the vanilla kernel, and there seems to be
considerable advice suggesting that they should not be enabled.

--p

Martin Mares

Jan 7, 2005, 11:32:22 AM

to Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> i think this is true only if the kernel comes with capabilities
> enabled.
>
> various media-centric distributions (CCRMA, demudi, dyne:bolic and
> others) enabled them for their 2.4 kernels, but not the major
> desktop-centric ones. then the impression began to be received that in
> 2.6, capabilities were even more questionable of a mechanism to use.
> In addition, the LSM system appeared, and seemed to offer a much
> better solution entirely: no need to patch the kernel at all, or at
> least it appeared to be so in the beginning. Hence the "realtime" LSM.

Yes, but is there really some difference between people having to enable
LSM and add a new LSM module, and people recompiling the kernel to include
capabilities?

Also, is somebody really shipping 2.4 kernels without capabilities?
I'm unable to find any such config switch in 2.4.28 -- maybe it's because
I'm almost sleeping now, but it doesn't seem to be there.

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth

return(ECRAY); /* Program exited before being run */

Paul Davis

Jan 7, 2005, 11:41:11 AM

to Martin Mares, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

>Yes, but is there really some difference between people having to enable
>LSM and add a new LSM module, and people recompiling the kernel to include
>capabilities?

Well, one is a configuration issue, the other involves hacking the
kernel headers before recompiling. Maybe you and I might not see much
difference, but many people would. One of them says "the kernel gang
think this is OK to use if you want to", the other one says "err, you
can do this but don't call me if it goes wrong".

>Also, is somebody really shipping 2.4 kernels without capabilities?
>I'm unable to find any such config switch in 2.4.28 -- maybe it's because
>I'm almost sleeping now, but it doesn't seem to be there.

They are present but disabled by default. You have to hack the initial
values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.

--p

Takashi Iwai

Jan 7, 2005, 11:44:29 AM

to Martin Mares, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

At Fri, 7 Jan 2005 17:29:02 +0100,

Martin Mares wrote:
>
> Hello!
>
> > i think this is true only if the kernel comes with capabilities
> > enabled.
> >
> > various media-centric distributions (CCRMA, demudi, dyne:bolic and
> > others) enabled them for their 2.4 kernels, but not the major
> > desktop-centric ones. then the impression began to be received that in
> > 2.6, capabilities were even more questionable of a mechanism to use.
> > In addition, the LSM system appeared, and seemed to offer a much
> > better solution entirely: no need to patch the kernel at all, or at
> > least it appeared to be so in the beginning. Hence the "realtime" LSM.
>
> Yes, but is there really some difference between people having to enable
> LSM and add a new LSM module, and people recompiling the kernel to include
> capabilities?

For distributors, it's much easier to provide an additional module
than to let people recompile kernels.

Takashi

Lee Revell

Jan 7, 2005, 11:48:40 AM

to Arjan van de Ven, Christoph Hellwig, linux-kernel, Andrew Morton, Ingo Molnar, Jack O'Quin, Paul Davis

[added Paul to cc:]

On Mon, 2005-01-03 at 15:15 +0100, Arjan van de Ven wrote:
> On Mon, 2005-01-03 at 14:03 +0000, Christoph Hellwig wrote:
> > On Wed, Dec 29, 2004 at 09:43:22PM -0500, Lee Revell wrote:
> > > The realtime LSM has been previously explained on this list. Its
> > > function is to allow selected nonroot users to run RT tasks. The most
> > > common application is low latency audio with JACK, http://jackit.sf.net.

> > >
> >
> > This is far too specialized. An option to the capability LSM to grant
> > capabilities to certain uids/gids sounds like the better choice - and
> > would also allow to get rid of the magic hugetlb uid horrors.

> those can go away anyway now that there is an rlimit to achieve the
> exact same thing.....
>
> I can see the point of making an rlimit like thing instead for both the
> nice levels allowed and maybe the "can do rt" bit
>

How about a "max RT prio" rlimit, that defaults to -1 (can't do RT).
Set it to 90 or something for audio users so you can still run a higher
prio watchdog thread.
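
A sketch of the check such a limit would imply, assuming the semantics suggested here: -1 means "no realtime at all", and any other value is the highest realtime priority an unprivileged task may request. The constant and function names are hypothetical and do not correspond to existing kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical default: unprivileged tasks may not use realtime at all. */
#define MAX_RT_PRIO_NONE (-1)

/*
 * Decide whether a request for realtime priority `requested` should be
 * allowed for a task whose "max RT prio" limit is `max_rt_prio`.
 * A privileged caller (today: one holding CAP_SYS_NICE) bypasses the limit.
 */
bool rt_prio_permitted(int max_rt_prio, int requested, bool privileged)
{
    if (privileged)
        return true;
    if (max_rt_prio == MAX_RT_PRIO_NONE)
        return false;
    /* Realtime priorities start at 1; the limit is an inclusive ceiling. */
    return requested >= 1 && requested <= max_rt_prio;
}
```

With the limit set to 90, an audio thread asking for priority 70 is allowed and one asking for 95 is refused, which leaves room above 90 for a root-owned watchdog.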

Lee

Martin Mares

Jan 7, 2005, 11:48:38 AM

to Takashi Iwai, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> > Yes, but is there really some difference between people having to enable
> > LSM and add a new LSM module, and people recompiling the kernel to include
> > capabilities?
>
> For distributors, it's much easier to provide an additional module
> than to let people recompile kernels.

Well, if LSM is enabled in the kernel, enabling capabilities should be
a single insmod, shouldn't it?

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth

The better the better, the better the bet.

Martin Mares

Jan 7, 2005, 12:07:34 PM

to Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> They are present but disabled by default. You have to hack the initial
> values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.

Oops. Does anybody know why this has been done?

Also, it seems that it has a relatively easy work-around: boot with
init=/sbin/simple-wrapper and let the wrapper set the cap_bset and exec real
init. (I agree that it's a hack, but a temporarily usable one.)

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth

"When I was a boy I was told that anybody could become President; I'm beginning to believe it." -- C. Darrow

Chris Wright

Jan 7, 2005, 12:32:48 PM

to Martin Mares, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

* Martin Mares (m...@ucw.cz) wrote:
> Hello!
>
> > They are present but disabled by default. You have to hack the initial
> > values of CAP_INIT_EFF_SET and CAP_INIT_INH_SET.
>
> Oops. Does anybody know why this has been done?

Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.

> Also, it seems that it has a relatively easy work-around: boot with
> init=/sbin/simple-wrapper and let the wrapper set the cap_bset and exec real
> init. (I agree that it's a hack, but a temporarily usable one.)

This won't work, you can't increase the bset, which is hardcoded to
leave out SETPCAP. Also, init is hard coded to start without SETPCAP.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Martin Mares

Jan 7, 2005, 12:36:43 PM

to Chris Wright, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Hello!

> Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.

Hmmm, I don't remember now, could you give me some pointer, please?

> This won't work, you can't increase the bset, which is hardcoded to
> leave out SETPCAP. Also, init is hard coded to start without SETPCAP.

If I read the source correctly, init is allowed to increase the bset,
the other processes aren't.

Have a nice fortnight
--
Martin `MJ' Mares <m...@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth

American patent law: two monkeys, fourteen days.

Chris Wright

Jan 7, 2005, 12:52:51 PM

to Martin Mares, Chris Wright, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

* Martin Mares (m...@ucw.cz) wrote:

> Hello!
>
> > Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.
>
> Hmmm, I don't remember now, could you give me some pointer, please?

Sure, the Wagner/Chen paper on setuid demystified has some references to
it IIRC. http://www.cs.ucdavis.edu/~hchen/paper/usenix02.ps

> > This won't work, you can't increase the bset, which is hardcoded to
> > leave out SETPCAP. Also, init is hard coded to start without SETPCAP.
>
> If I read the source correctly, init is allowed to increase the bset,
> the other processes aren't.

Yes, you're right I forgot about that.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

Jan 7, 2005, 1:03:40 PM

to Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

* Paul Davis (pa...@linuxaudiosystems.com) wrote:
> So, we have a few responses, some references to various potential
> solutions all of which have problems just as deep if not deeper than
> the uid/gid-based model that this particular LSM adopts. No proposal
> for any system that would actually work and address anyone's real
> needs in a useful way.

I don't think that's quite true. One repeated recommendation was to
simply generalize the idea so that it applies to all capabilities.
Another, which at this point appears quite workable, was Arjan's
recommendation to make scheduling policy/priority protected by an rlimit
(complicated only by representing the combinations sanely in a single
number).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

Jan 7, 2005, 1:08:46 PM

to Arjan van de Ven, Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

* Arjan van de Ven (arj...@redhat.com) wrote:
> eh no. It defaults to zero, but if you increase it for a specific user, that
> user is allowed to mlock more.

Actually, I think it defaults to 32k to keep gpg happy (at least in
mainline) ;-)

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Jack O'Quin

Jan 7, 2005, 3:10:08 PM

to Martin Mares, Chris Wright, Paul Davis, Christoph Hellwig, Arjan van de Ven, Lee Revell, Ingo Molnar, Alan Cox, Linux Kernel Mailing List, Andrew Morton

Martin Mares <m...@ucw.cz> writes:

>> Yes, SETPCAP became a gaping security hole. Recall the sendmail hole.
>
> Hmmm, I don't remember now, could you give me some pointer, please?

I already did that...

> Jack O'Quin wrote:
> > The biggest problem was CAP_SETPCAP, which for good reasons[1] is
> > disabled in distributed kernels. This forced every user to patch and
> > build a custom kernel. Worse, it opened all our systems up to the
> > problems reported by this sendmail security advisory.

[1] http://www.securiteam.com/unixfocus/5KQ040A1RI.html

--
joq

Matt Mackall

Jan 7, 2005, 3:13:36 PM

to Alan Cox, Andrew Morton, a...@domdv.de, rlre...@joe-job.com, Linux Kernel Mailing List, mi...@elte.hu, j...@io.com

On Fri, Jan 07, 2005 at 01:55:09AM +0000, Alan Cox wrote:

> On Gwe, 2005-01-07 at 01:13, Matt Mackall wrote:
> > You can't fix them without changing the semantics for existing users
> > in ways they didn't expect. It could be done with a new personality flag,
> > but..
>

> I disagree. At the most trivial you could just add another 32bits of
> sticky capability that are never touched by setuid/non-setuidness and
> represent additional "user" (or more rightly session) abilities to do
> limited overrides

I think we're referring to different brokenness. The problems I see
are with the semantics of inheritance of capabilities which make
wrapping applications painful. Those can't be changed without creating
holes in existing apps so the general utility of caps is limited.

--
Mathematics is the supreme nostalgia of our time.

Matt Mackall

Jan 7, 2005, 3:16:49 PM

to Jack O'Quin, Alan Cox, Andreas Steinmetz, Lee Revell, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> Note that sched_setscheduler() provides no way to handle the mlock()
> requirement, which cannot be done from another process.

I'm pretty sure that part can be done by a privileged server handing
out mlocked shared memory segments.

The trouble with introducing something into the kernel is that once
done, it can't be undone. So you're absolutely going to meet
resistance to anything that can be a) done sufficiently in userspace
or b) can reasonably be done in a more generic manner so as to meet
the needs of a wider future audience. The onus is on the submitter to
meet these requirements because we can't easily kick out a broken API
after we accept it.

Chris Wright

Jan 7, 2005, 3:32:30 PM

to Matt Mackall, Jack O'Quin, Alan Cox, Andreas Steinmetz, Lee Revell, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

* Matt Mackall (m...@selenic.com) wrote:
> On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> > Note that sched_setscheduler() provides no way to handle the mlock()
> > requirement, which cannot be done from another process.
>
> I'm pretty sure that part can be done by a privileged server handing
> out mlocked shared memory segments.

It can actually be done with plain ol' rlimits (RLIMIT_MEMLOCK).
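As a rough illustration of the rlimit route, the sketch below reads the soft RLIMIT_MEMLOCK value and then tries to mlock() a small buffer. The helper names memlock_soft_limit() and try_mlock() are made up for this example, and whether the lock succeeds depends on the limit the administrator has configured for the user.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft RLIMIT_MEMLOCK limit in bytes.
 * Returns 0 on success, -1 if getrlimit() fails. */
int memlock_soft_limit(rlim_t *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}

/* Hypothetical helper: try to lock len bytes of heap memory.
 * Returns 1 if the kernel allowed the lock, 0 if it refused
 * (for example because the request exceeds the rlimit). */
int try_mlock(size_t len)
{
	void *buf = malloc(len);
	int ok;

	if (buf == NULL)
		return 0;
	ok = (mlock(buf, len) == 0);
	if (ok)
		munlock(buf, len);
	free(buf);
	return ok;
}
```

An unprivileged process would call memlock_soft_limit() first and only attempt locks that fit under the returned value.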

> The trouble with introducing something into the kernel is that once
> done, it can't be undone. So you're absolutely going to meet
> resistance to anything that can be a) done sufficiently in userspace
> or b) can reasonably be done in a more generic manner so as to meet
> the needs of a wider future audience. The onus is on the submitter to
> meet these requirements because we can't easily kick out a broken API
> after we accept it.

Indeed (although in this case it's not adding an API as much as using an
existing one).

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Jack O'Quin

Jan 7, 2005, 3:33:46 PM

to Matt Mackall, Alan Cox, Andreas Steinmetz, Lee Revell, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

Matt Mackall <m...@selenic.com> writes:

> On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
>> Note that sched_setscheduler() provides no way to handle the mlock()
>> requirement, which cannot be done from another process.
>
> I'm pretty sure that part can be done by a privileged server handing
> out mlocked shared memory segments.

If you're "pretty sure", please explain how locking a shared memory
segment prevents the code and stack of the client's realtime thread
from page faulting.
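For context, the usual way a realtime client keeps its own code and stack resident is to call mlockall() in its own process, which is why the operation cannot simply be delegated. A minimal sketch, assuming a 4096-byte page size and a 64 KiB worst-case stack depth (both are illustrative guesses, as is the helper name lock_process_memory()):

```c
#include <stddef.h>
#include <sys/mman.h>

#define ASSUMED_PAGE_SIZE    4096          /* assumption, not queried */
#define STACK_PREFAULT_BYTES (64 * 1024)   /* assumed worst-case stack use */

/* Touch one byte per page so the stack pages are faulted in now,
 * not later inside the realtime path. */
static void prefault_stack(void)
{
	volatile unsigned char pad[STACK_PREFAULT_BYTES];
	size_t i;

	for (i = 0; i < sizeof pad; i += ASSUMED_PAGE_SIZE)
		pad[i] = 0;
	pad[sizeof pad - 1] = 0;
}

/* Lock every current and future mapping of the calling process
 * (text, data, heap, stack). Returns 0 on success, -1 if the
 * kernel refuses, e.g. because RLIMIT_MEMLOCK is too low. */
int lock_process_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -1;
	prefault_stack();
	return 0;
}
```

The call only affects the process that makes it, so a server locking a shared segment does not cover the client's text or stack.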
--
joq

Lee Revell

Jan 7, 2005, 3:50:04 PM

to Matt Mackall, Jack O'Quin, Alan Cox, Andreas Steinmetz, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

On Fri, 2005-01-07 at 12:02 -0800, Matt Mackall wrote:
> The trouble with introducing something into the kernel is that once
> done, it can't be undone. So you're absolutely going to meet
> resistance to anything that can be a) done sufficiently in userspace
> or b) can reasonably be done in a more generic manner so as to meet
> the needs of a wider future audience. The onus is on the submitter to
> meet these requirements because we can't easily kick out a broken API
> after we accept it.

For a big subsystem that exposes an API, you would be right. But this
is a *really* simple problem, all you need is a way to tell it who gets
RT privileges, which means uid or gid. So any future solution will be
orthogonal to this one, and when users upgrade even a not very smart
Perl script will be able to migrate the configuration. How many
different ways are there to say "these are the non-root users who have
realtime privileges", anyway?

Unless, of course, the solution that's eventually merged is *really*
overcomplicated by comparison, in which case users will (rightly) reject
it, and the system will have worked.

Lee

Matt Mackall

Jan 7, 2005, 3:52:56 PM

to Jack O'Quin, Alan Cox, Andreas Steinmetz, Lee Revell, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

On Fri, Jan 07, 2005 at 02:27:26PM -0600, Jack O'Quin wrote:
> Matt Mackall <m...@selenic.com> writes:
>
> > On Thu, Jan 06, 2005 at 11:54:05PM -0600, Jack O'Quin wrote:
> >> Note that sched_setscheduler() provides no way to handle the mlock()
> >> requirement, which cannot be done from another process.
> >
> > I'm pretty sure that part can be done by a privileged server handing
> > out mlocked shared memory segments.
>
> If you're "pretty sure", please explain how locking a shared memory
> segment prevents the code and stack of the client's realtime thread
> from page faulting.

You just map your RT-dependent routine (PIC, of course) into the
segment and move your stack pointer into a second segment. I didn't
say it was easy, but it's all just bits. There's also the rlimit
issue.

Or, going the other way, the client app can pass map handles to the
server to bless. Some juggling might be involved but it's obviously
doable.

As has been pointed out, an rlimit solution exists now as well.

--
Mathematics is the supreme nostalgia of our time.

Lee Revell

Jan 7, 2005, 4:00:59 PM

to Matt Mackall, Jack O'Quin, Alan Cox, Andreas Steinmetz, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

On Fri, 2005-01-07 at 12:46 -0800, Matt Mackall wrote:
> You just map your RT-dependent routine (PIC, of course) into the
> segment and move your stack pointer into a second segment. I didn't
> say it was easy, but it's all just bits. There's also the rlimit
> issue.
>
> Or, going the other way, the client app can pass map handles to the
> server to bless. Some juggling might be involved but it's obviously
> doable.
>

Christ, what a nightmare! Since when does "obviously doable" mean it's
a good idea? Please, reread your above statements, then go back and
look at the realtime LSM patch (it's less than 200 lines), and tell me
again that your way is more secure.

Please keep in mind that there are already 1000s of users using the
realtime LSM to do audio work. Sorry, but I will take a known good,
well understood, PROVEN solution over "it's obviously doable, it's all
bits anyway". Get back to me when you have some code, or at least some
reasonable suggestions as Alan, Christoph and others have made.

> As has been pointed out, an rlimit solution exists now as well.

Wrong, as was said repeatedly, rlimits only help with mlock! Have you
even been reading the thread?

Lee

Matt Mackall

Jan 7, 2005, 4:28:45 PM

to Lee Revell, Jack O'Quin, Alan Cox, Andreas Steinmetz, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

On Fri, Jan 07, 2005 at 03:55:12PM -0500, Lee Revell wrote:
> On Fri, 2005-01-07 at 12:46 -0800, Matt Mackall wrote:
> > You just map your RT-dependent routine (PIC, of course) into the
> > segment and move your stack pointer into a second segment. I didn't
> > say it was easy, but it's all just bits. There's also the rlimit
> > issue.
> >
> > Or, going the other way, the client app can pass map handles to the
> > server to bless. Some juggling might be involved but it's obviously
> > doable.
> >
>
> Christ, what a nightmare! Since when does "obviously doable" mean it's
> a good idea? Please, reread your above statements, then go back and
> look at the realtime LSM patch (it's less than 200 lines), and tell me
> again that your way is more secure.

My way simply proves that existing userspace methods have not been
exhausted. It's not impossible as was claimed and cleaner methods or
nicely wrapped variants of the above probably exist. And yes, doing
ugly things in userspace is preferable to adding application-specific
baggage to the kernel.

> > As has been pointed out, an rlimit solution exists now as well.
>
> Wrong, as was said repeatedly, rlimits only help with mlock! Have you
> even been reading the thread?

Feh. The RT scheduling class issue is orthogonal. Addressing mlock and
scheduling class at once (and nothing else) is actually an ugliness of
your LSM approach as there are folks who want mlock and not RT.

--
Mathematics is the supreme nostalgia of our time.

Lee Revell

Jan 7, 2005, 4:33:53 PM

to Paul Davis, Arjan van de Ven, Christoph Hellwig, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, 2005-01-07 at 11:20 -0500, Paul Davis wrote:
> >On Fri, Jan 07, 2005 at 10:41:40AM -0500, Paul Davis wrote:
> >>
> >> fine, so the mlock situation may have improved enough post-2.6.9 that
> >> it can be considered fixed. that leaves the scheduler issue. but
> >> apparently, a uid/gid solution is OK for mlock, and not for the
> >> scheduler. am i missing something?
> >
> >I think you skipped a step. You don't have a scheduler requirement, you have
> >a latency requirement. You currently *solve* that latency requirement via a
> >scheduler "hack", yet it is quite clear that the "hard" realtime solution is
> >most likely not the right approach. Note that I'm not saying that you
>
> Why is that clear? In just about every respect, realtime audio has the
> same characteristics as hard realtime, except that nobody gets hurt
> when a deadline is missed :) We have an IRQ source, and a deadline
> (sometimes on the sub-msec range, but more typically 1-5msec) for the
> work that has to be done. This deadline is tight enough that the task
> essentially *has* to run with SCHED_FIFO scheduling, because doing
> almost anything else instead will cause the deadline to be missed.
>

It's not like hard realtime, it is. All that makes a hard RT system is
that missing a deadline means the system has utterly failed. How is
this any different than an xrun causing a loud pop or click in a live
performance?

Really, I think Linux has owned the server space for so long that some
folks on this list are getting hubristic. Just because you have the
best server OS does not mean it's the best at everything.

Lee

Chris Wright

Jan 7, 2005, 4:43:48 PM

to Matt Mackall, Lee Revell, Jack O'Quin, Alan Cox, Andreas Steinmetz, Chris Wright, Linux Kernel Mailing List, Andrew Morton, Ingo Molnar, LAD mailing list

* Matt Mackall (m...@selenic.com) wrote:

> Feh. The RT scheduling class issue is orthogonal. Addressing mlock and
> scheduling class at once (and nothing else) is actually an ugliness of
> your LSM approach as there are folks who want mlock and not RT.

Last I checked they could be controlled separately in that module. It
has been suggested (by me and others) that one possible solution would
be to expand it to be generic for all caps.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Andrew Morton

Jan 7, 2005, 4:51:51 PM

to Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

Lee Revell <rlre...@joe-job.com> wrote:
>
> Really, I think Linux has owned the server space for so long that some
> folks on this list are getting hubristic. Just because you have the
> best server OS does not mean it's the best at everything.

nah, the requirement is clearly valid, and longstanding. We need to
satisfy it. It's just a matter of working out the best way.

Chris Wright <chr...@osdl.org> wrote:
>
> ...

> Last I checked they could be controlled separately in that module. It
> has been suggested (by me and others) that one possible solution would
> be to expand it to be generic for all caps.

Maybe this is the way?

Valdis.K...@vt.edu

Jan 7, 2005, 5:13:59 PM

to Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

On Fri, 07 Jan 2005 13:49:41 PST, Andrew Morton said:

> Chris Wright <chr...@osdl.org> wrote:

> > Last I checked they could be controlled separately in that module. It
> > has been suggested (by me and others) that one possible solution would
> > be to expand it to be generic for all caps.
>
> Maybe this is the way?

We already *know* how to (in principle) fix the capabilities system to make
it useful. We should probably investigate doing that and at the same time
fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
on fixing). The remaining problem is possible breakage of software that's doing
capability things The Old Way (as the inheritance rules are incompatible).

Linus at one time said that a 2.7 might open if there was some issue that
caused enough disruption to require a fork - could this be it, or does somebody
have a better way to address the backward-compatibility problem?

Christoph Hellwig

Jan 7, 2005, 5:26:04 PM

to Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

On Fri, Jan 07, 2005 at 01:49:41PM -0800, Andrew Morton wrote:
> Chris Wright <chr...@osdl.org> wrote:
> >
> > ...
> > Last I checked they could be controlled separately in that module. It
> > has been suggested (by me and others) that one possible solution would
> > be to expand it to be generic for all caps.
>
> Maybe this is the way?

It's at least not as bad as the current hack (when properly done in
the capabilities modules instead of adding one on top).

I must say I'm not exactly happy with that idea still. It ties the
privileges we have been separating from a special uid (0) to filesystem
permissions again. It's not necessarily a bad idea per se, but it doesn't
really fit into the model we've been working to. I'd expect quite a few
unpleasant surprises when a user detects that the distribution had been
binding various capabilities to uids/gids behind his back.

So to make forward progress I'd like the audio people to confirm whether
the mlock bits in 2.6.9+ do help that half of their requirement first
(and if not find a way to fix it) and then tackle the scheduling part.
For that one I really wonder whether the combination of the now actually
working nicelevels (see Mingo's post) and a simple wrapper for the really
high requirements cases doesn't work.

Paul Davis

Jan 7, 2005, 5:39:36 PM

to Andrew Morton, Lee Revell, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

>> Last I checked they could be controlled separately in that module. It
>> has been suggested (by me and others) that one possible solution would
>> be to expand it to be generic for all caps.
>
>Maybe this is the way?

that would make a much more complex LSM, and thus open the door to
some inadvertent security hazard that doesn't arise in the simpler
tool we have now.

other than that, it's not a terrible suggestion at all, just a lot, lot
more work.

--p

Chris Wright

Jan 7, 2005, 5:42:39 PM

to Christoph Hellwig, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

* Christoph Hellwig (h...@infradead.org) wrote:
> On Fri, Jan 07, 2005 at 01:49:41PM -0800, Andrew Morton wrote:
> > Chris Wright <chr...@osdl.org> wrote:
> > >
> > > ...
> > > Last I checked they could be controlled separately in that module. It
> > > has been suggested (by me and others) that one possible solution would
> > > be to expand it to be generic for all caps.
> >
> > Maybe this is the way?
>
> It's at least not as bad as the current hack (when properly done in
> the capabilities modules instead of adding one on top).
>
> I must say I'm not exactly happy with that idea still. It ties the
> privileges we have been separating from a special uid (0) to filesystem
> permissions again. It's not necessarily a bad idea per se, but it doesn't
> really fit into the model we've been working to. I'd expect quite a few
> unpleasant surprises when a user detects that the distribution had been
> binding various capabilities to uids/gids behind his back.

I agree, it's still a hack, just a generic and complete hack ;-)

> So to make forward progress I'd like the audio people to confirm whether
> the mlock bits in 2.6.9+ do help that half of their requirement first

It sure should, but I guess they can reply on that.

> (and if not find a way to fix it) and then tackle the scheduling part.
> For that one I really wonder whether the combination of the now actually
> working nicelevels (see Mingo's post) and a simple wrapper for the really
> high requirements cases doesn't work.

I saw Jack (I think) post some numbers showing that it wasn't enough.
What about making priority level protected via rlimit?

Here's an uncompiled, untested patch doing that (probably has some math
error or logic hole in it, but the idea seems sound enough). I think it
has at least one problem, where a nice 19 process could renice itself
back to 0. And it doesn't really handle the different scheduling
policies, other than the implicit 40 - 139 range being used for
SCHED_FIFO/SCHED_RR.

It takes the 140 priority levels (0-139), inverts their priority
order, and then uses that number as the basis for the rlimit (so that a
larger rlimit means higher priority, to fall in line with normal rlimit
semantics). It defaults to 19 (which should be a niceval of 0) and allows
CAP_SYS_NICE to continue to override if the rlimit is too low.
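The arithmetic behind that mapping can be sketched in userspace as follows. The three macros mirror the 2.6 scheduler constants; the function names nice_to_rlimit(), rt_prio_to_rlimit() and nice_allowed() are invented for this illustration and are not part of the patch.

```c
/* Constants mirroring the 2.6 scheduler: 100 realtime levels
 * followed by 40 nice levels, 140 in total (0-139). */
#define MAX_RT_PRIO        100
#define MAX_PRIO           (MAX_RT_PRIO + 40)
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)

/* Inverted priority for a nice value: nice 19 -> 0, nice 0 -> 19,
 * nice -20 -> 39. A larger result means a higher priority. */
int nice_to_rlimit(int nice)
{
	return (MAX_PRIO - 1) - NICE_TO_PRIO(nice);
}

/* SCHED_FIFO/SCHED_RR priorities 1..99 sit above the nice range. */
int rt_prio_to_rlimit(int rt_prio)
{
	return rt_prio + 40;
}

/* The check the patch performs for callers without CAP_SYS_NICE:
 * the request is allowed if it does not exceed the soft limit. */
int nice_allowed(int nice, int rlim_cur)
{
	return nice_to_rlimit(nice) <= rlim_cur;
}
```

With the default limit of 19, nice_allowed(0, 19) holds while nice_allowed(-1, 19) does not, which matches the existing behaviour for unprivileged users who want a negative nice value.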

===== kernel/sched.c 1.386 vs edited =====
--- 1.386/kernel/sched.c 2005-01-04 18:48:21 -08:00
+++ edited/kernel/sched.c 2005-01-07 14:23:32 -08:00
@@ -3009,12 +3009,8 @@ asmlinkage long sys_nice(int increment)
 	 * We don't have to worry. Conceptually one call occurs first
 	 * and we have a single winner.
 	 */
-	if (increment < 0) {
-		if (!capable(CAP_SYS_NICE))
-			return -EPERM;
-		if (increment < -40)
-			increment = -40;
-	}
+	if (increment < -40)
+		increment = -40;
 	if (increment > 40)
 		increment = 40;

@@ -3024,6 +3020,11 @@ asmlinkage long sys_nice(int increment)
 	if (nice > 19)
 		nice = 19;

+	if ((MAX_PRIO-1) - NICE_TO_PRIO(nice) >
+	    current->signal->rlim[RLIMIT_PRIO].rlim_cur &&
+	    !capable(CAP_SYS_NICE))
+		return -EPERM;
+
 	retval = security_task_setnice(current, nice);
 	if (retval)
 		return retval;
@@ -3057,6 +3058,15 @@ int task_nice(const task_t *p)
 }

 /**
+ * nice_to_prio - return priority of given nice value
+ * @nice: nice value
+ */
+int nice_to_prio(const int nice)
+{
+	return NICE_TO_PRIO(nice);
+}
+
+/**
  * idle_cpu - is a given cpu idle currently?
  * @cpu: the processor in question.
  */
@@ -3140,6 +3150,7 @@ recheck:

 	retval = -EPERM;
 	if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
+	    lp.sched_priority+40 > p->signal->rlim[RLIMIT_PRIO].rlim_cur &&
 	    !capable(CAP_SYS_NICE))
 		goto out_unlock;
 	if ((current->euid != p->euid) && (current->euid != p->uid) &&
===== kernel/sys.c 1.102 vs edited =====
--- 1.102/kernel/sys.c 2005-01-06 23:25:46 -08:00
+++ edited/kernel/sys.c 2005-01-07 14:13:37 -08:00
@@ -225,7 +225,9 @@ static int set_one_prio(struct task_stru
 		error = -EPERM;
 		goto out;
 	}
-	if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
+	if ((MAX_PRIO-1) - nice_to_prio(niceval) >
+	    p->signal->rlim[RLIMIT_PRIO].rlim_cur &&
+	    !capable(CAP_SYS_NICE)) {
 		error = -EACCES;
 		goto out;
 	}
===== include/asm-i386/resource.h 1.5 vs edited =====
--- 1.5/include/asm-i386/resource.h 2004-08-23 01:15:26 -07:00
+++ edited/include/asm-i386/resource.h 2005-01-07 13:55:37 -08:00
@@ -18,8 +18,9 @@
 #define RLIMIT_LOCKS 10 /* maximum file locks held */
 #define RLIMIT_SIGPENDING 11 /* max number of pending signals */
 #define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
+#define RLIMIT_PRIO 13 /* maximum scheduling priority */

-#define RLIM_NLIMITS 13
+#define RLIM_NLIMITS 14

 /*
@@ -45,6 +46,7 @@
 	{ RLIM_INFINITY, RLIM_INFINITY }, \
 	{ MAX_SIGPENDING, MAX_SIGPENDING }, \
 	{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
+	{ 19, 19 }, \
 }

 #endif /* __KERNEL__ */
===== include/linux/sched.h 1.280 vs edited =====
--- 1.280/include/linux/sched.h 2005-01-04 18:48:20 -08:00
+++ edited/include/linux/sched.h 2005-01-07 14:14:16 -08:00
@@ -760,6 +760,7 @@ extern void sched_idle_next(void);
 extern void set_user_nice(task_t *p, long nice);
 extern int task_prio(const task_t *p);
 extern int task_nice(const task_t *p);
+extern int nice_to_prio(const int nice);
 extern int task_curr(const task_t *p);
 extern int idle_cpu(int cpu);

Chris Wright

Jan 7, 2005, 5:46:39 PM

to Valdis.K...@vt.edu, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

* Valdis.K...@vt.edu (Valdis.K...@vt.edu) wrote:
> On Fri, 07 Jan 2005 13:49:41 PST, Andrew Morton said:
>
> > Chris Wright <chr...@osdl.org> wrote:
>
> > > Last I checked they could be controlled separately in that module. It
> > > has been suggested (by me and others) that one possible solution would
> > > be to expand it to be generic for all caps.
> >
> > Maybe this is the way?
>
> We already *know* how to (in principle) fix the capabilities system to make
> it useful. We should probably investigate doing that and at the same time
> fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
> on fixing). The remaining problem is possible breakage of software that's doing
> capability things The Old Way (as the inheritance rules are incompatible).

Fixing CAP_SYS_ADMIN is a whole other can o' worms. No point in tangling
the two.

> Linus at one time said that a 2.7 might open if there was some issue that
> caused enough disruption to require a fork - could this be it, or does somebody
> have a better way to address the backward-compatibility problem?

There are at least two ways. Introduce a new capability module or introduce
a PF flag to opt in. Neither is great.

--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Andreas Steinmetz

Jan 7, 2005, 5:55:42 PM

to Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

Andrew Morton wrote:
> Lee Revell <rlre...@joe-job.com> wrote:
>
>>Really, I think Linux has owned the server space for so long that some
>>folks on this list are getting hubristic. Just because you have the
>>best server OS does not mean it's the best at everything.
>
>
> nah, the requirement is clearly valid, and longstanding. We need to
> satisfy it. It's just a matter of working out the best way.
>
> Chris Wright <chr...@osdl.org> wrote:
>
>>...
>>Last I checked they could be controlled separately in that module. It
>>has been suggested (by me and others) that one possible solution would
>>be to expand it to be generic for all caps.
>
>
> Maybe this is the way?

This could give an advantage for e.g. networked daemons, too. No more
root privilege would be necessary for applications just to bind to a
privileged port (CAP_NET_BIND_SERVICE), which does make life easier.
Other ideas for e.g. CAP_NET_RAW or CAP_SYS_RAWIO come to mind. Using the
current capabilities in this design as all-including supersets that can
be defined more fine-grained in a later step should, I guess, suit
others, too. The remaining problem would then be the design of an
extensible interface that is backwards compatible.
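To make the privileged-port case concrete, here is a small sketch of what a daemon runs into today. The helper try_bind_tcp() is a made-up name; on a stock configuration a process without CAP_NET_BIND_SERVICE is expected to get EACCES for ports below 1024, though the exact outcome depends on how the system is set up.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to bind a TCP socket to the given port on
 * all interfaces. Returns 0 on success or a negative errno value
 * (typically -EACCES for a privileged port without the capability). */
int try_bind_tcp(unsigned short port)
{
	struct sockaddr_in addr;
	int rc = 0;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -errno;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
		rc = -errno;

	close(fd);
	return rc;
}
```

A daemon granted only CAP_NET_BIND_SERVICE could call try_bind_tcp(80) successfully without holding any other root privilege.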

--
Andreas Steinmetz SPAMmers use robo...@domdv.de

Valdis.K...@vt.edu

Jan 7, 2005, 6:11:55 PM

to Chris Wright, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

On Fri, 07 Jan 2005 14:36:38 PST, Chris Wright said:

> > We already *know* how to (in principle) fix the capabilities system to make
> > it useful. We should probably investigate doing that and at the same time
> > fixing the current CAP_SYS_ADMIN mess (which we also have at least some ideas
> > on fixing). The remaining problem is possible breakage of software that's doing
> > capability things The Old Way (as the inheritance rules are incompatible).
>
> > Fixing CAP_SYS_ADMIN is a whole other can o' worms. No point in tangling
> > the two.

Yes, it's two entire cans. The problem is that in *both* cases, we're probably
going to have to do an API change. It may be preferable to only require changes
on the userspace side once, rather than change it once to fix the inheritance
problems in 2.7/2.6.N+10 or whatever it will be, and then again in 2.9/2.6.N+20
or whatever....

> > Linus at one time said that a 2.7 might open if there was some issue that
> > caused enough disruption to require a fork - could this be it, or does somebody
> > have a better way to address the backward-compatibility problem?
>
> > There are at least two ways. Introduce a new capability module or introduce
> > a PF flag to opt in. Neither is great.

A new PF flag strikes me as marginally better, especially if we have a way to
propagate from Elf headers in a way similar to Execshield's use of elf_ex.e_phnum
to set the executable-stack...
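For readers unfamiliar with the executable-stack marking, the sketch below walks a binary's program headers from userspace and reports the PT_GNU_STACK flags, which is roughly the kind of per-binary marking being referred to. The function name gnu_stack_marking() is invented here, the code assumes the file has the same ELF class as the program reading it, and a capability opt-in flag carried this way is purely hypothetical.

```c
#include <elf.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Scan the e_phnum program headers of an ELF file for PT_GNU_STACK.
 * Returns 1 if the stack is marked executable, 0 if it is marked
 * non-executable, and -1 if there is no such header or on error. */
int gnu_stack_marking(const char *path)
{
	ElfW(Ehdr) eh;
	ElfW(Phdr) ph;
	int result = -1;
	unsigned int i;
	FILE *f = fopen(path, "rb");

	if (f == NULL)
		return -1;

	/* Validate the ELF magic and the program header entry size. */
	if (fread(&eh, sizeof eh, 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_phentsize != sizeof ph)
		goto out;

	for (i = 0; i < eh.e_phnum; i++) {
		long off = (long)eh.e_phoff + (long)i * (long)eh.e_phentsize;

		if (fseek(f, off, SEEK_SET) != 0 ||
		    fread(&ph, sizeof ph, 1, f) != 1)
			break;
		if (ph.p_type == PT_GNU_STACK) {
			result = (ph.p_flags & PF_X) ? 1 : 0;
			break;
		}
	}
out:
	fclose(f);
	return result;
}
```

Calling gnu_stack_marking("/proc/self/exe") inspects the running binary itself.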

Lee Revell

Jan 7, 2005, 6:16:31 PM

to Christoph Hellwig, Andrew Morton, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, Chris Wright, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

On Fri, 2005-01-07 at 22:10 +0000, Christoph Hellwig wrote:
> It's not necessarily a bad idea per se, but it doesn't
> really fit into the model we've been working to. I'd expect quite a few
> unpleasant surprises when a user detects that the distribution had been
> binding various capabilities to uids/gids behind his back.
>

Point taken, but do keep in mind that this will *certainly* be disabled
by default, unless you run an audio oriented distro, and we assume those
people know what they're doing ;-)

> For that one I really wonder whether the combination of the now actually
> working nicelevels (see Mingo's post)

Ingo said "it should work". It currently doesn't, as you can see from
Jack's post. My concern here is, the semantics of SCHED_FIFO are well
defined and stable. The highest priority runnable SCHED_FIFO process
*always* runs. The semantics of "nice -20" apparently change from
release to release, as you can see. We can't have the scheduler
deciding to run something else when jackd needs to run because it
decides jackd is hogging the CPU or whatever. Everyone knows that when
dealing with realtime constraints the important case is not the average
but the worst.
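For reference, this is roughly what the request looks like from the application side. The wrapper name request_sched_fifo() is made up for the example; the underlying call is the standard sched_setscheduler(), which fails with EPERM when the caller lacks the needed privilege.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical wrapper: ask for SCHED_FIFO at the given priority for
 * the calling process. Returns 0 on success or a negative errno
 * value, typically -EPERM for an unprivileged caller. */
int request_sched_fifo(int priority)
{
	struct sched_param sp = { 0 };

	sp.sched_priority = priority;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return -errno;
	return 0;
}
```

A caller would normally pick a priority between sched_get_priority_min(SCHED_FIFO) and sched_get_priority_max(SCHED_FIFO) and fall back to non-realtime operation if the request is refused.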

In a live audio situation an xrun storm and a complete system lockup are
both catastrophic failures.

Lee

Andrew Morton

Jan 7, 2005, 6:38:02 PM

to Valdis.K...@vt.edu, chr...@osdl.org, rlre...@joe-job.com, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

Valdis.K...@vt.edu wrote:
>
> fix the inheritance problems

Does anyone actually have a handle on what's involved in fixing the
inheritance problem?

It's risky, but it is something which we should do.

<grumpytroll> We really shouldn't have merged all that new fancy security
stuff when the existing security framework was known-badly-broken.
Especially as the new stuff seems incapable of doing simple things which
unbroken inherited caps would do perfectly.</grumpytroll>

Valdis.K...@vt.edu

Jan 7, 2005, 6:43:23 PM

to Andrew Morton, chr...@osdl.org, rlre...@joe-job.com, pa...@linuxaudiosystems.com, arj...@redhat.com, h...@infradead.org, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

On Fri, 07 Jan 2005 15:20:04 PST, Andrew Morton said:

> Does anyone actually have a handle on what's involved in fixing the
> inheritance problem?

Andy Lutomirski was looking at that, and it's actually a very small but
incompatible change that allows filesystem support for set-capability files to
be actually usable. He posted some patches back in May....

Paul Davis

Jan 7, 2005, 7:31:21 PM

to Christoph Hellwig, Andrew Morton, Lee Revell, arj...@redhat.com, mi...@elte.hu, chr...@osdl.org, al...@lxorguk.ukuu.org.uk, j...@io.com, linux-...@vger.kernel.org

>So to make forward progress I'd like the audio people to confirm whether
>the mlock bits in 2.6.9+ do help that half of their requirement first

it does, although it would be nicer to not have two separate
components to administering the usability of realtime applications.

>(and if not find a way to fix it) and then tackle the scheduling part.
>For that one I really wonder whether the combination of the now actually
>working nicelevels (see Mingo's post) and a simple wrapper for the really
>high requirements cases doesn't work.

Jack already posted results: the nice levels are massively inferior as
they currently stand.

The wrapper is incredibly inconvenient for applications: when you use
JACK, starting clients would require a different command depending on
whether JACK is using RT mode or not. That is extremely inelegant, and
it's why we've developed these solutions (caps+jackstart for 2.4,
"realtime" LSM for 2.6).

--p

Con Kolivas

Jan 8, 2005, 12:41:27 AM

to Takashi Iwai, Arjan van de Ven, Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

Takashi Iwai wrote:
> At Fri, 7 Jan 2005 17:03:51 +0100,
> Arjan van de Ven wrote:
>>something like a soft realtime flag that acts as if it's the hard realtime
>>one unless the app shows "misbehavior" (eg eats its timeslice for X times in
>>a row) might for example be such a solution. And with the anti abuse
>>protection it can run with far lighter privileges.
>
>
> This reminds me about the soft-RT patch posted quite some time ago.
> I feel such a handy pseudo-RT scheduler class would be useful for
> other systems than JACK, too...

You've already proven that soft RT does not suit your requirements. The
current scheduler running a task at nice -20 has extremely long periods
of cpu availability at the expense of lower priority tasks and is close
to the behaviour you would get with a soft RT patch. Your concern is
exactly the scenario where nice -20 fails, and would be the same
scenario where a soft RT policy would fail. With such a scheduling
policy, you want CPU time long after there is any hope of fairness or
of safety from hanging. From experimentation with such soft RT policies, we
find average latencies can be reduced but the maximum ones, which are
the ones that concern professional audio, remain the same.

Cheers,
Con

Jack O'Quin

Jan 8, 2005, 4:48:33 AM

to Con Kolivas, Takashi Iwai, Arjan van de Ven, Paul Davis, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Linux Kernel Mailing List, Andrew Morton

Con Kolivas <ker...@kolivas.org> writes:

Yes, this is exactly right. The corrected test results I just posted
support your contention.

For realtime, most of the OS tricks we all know and love are
counter-productive. It's the worst case that matters, not the
average.
--
joq

Jack O'Quin

Jan 8, 2005, 4:55:21 AM

to Chris Wright, Christoph Hellwig, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, linux-...@vger.kernel.org

Chris Wright <chr...@osdl.org> writes:

> * Christoph Hellwig (h...@infradead.org) wrote:
>> So to make forward progress I'd like the audio people to confirm whether
>> the mlock bits in 2.6.9+ do help that half of their requirement first
>
> It sure should, but I guess they can reply on that.

That does seem to work now (finally). It looks like that longstanding
CAP_IPC_LOCK bug is finally fixed, too.

I find it hard to understand why some of you think PAM is an adequate
solution. As currently deployed, it is poorly documented and nearly
impossible for non-experts to administer securely. On my Debian woody
system, when I log in from the console I get one fairly sensible set of
ulimit values, but from gdm I get a much more permissive set (with
unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
config includes `session required pam_limits.so' but the system comes
with an empty /etc/security/limits.conf. I'm just guessing about that
because I can't find any decent documentation for any of this crap.

Remember, if something is difficult to administer, it's *not* secure.

>> (and if not find a way to fix it) and then tackle the scheduling part.
>> For that one I really wonder whether the combination of the now actually
>> working nicelevels (see Mingo's post) and a simple wrapper for the really
>> high requirements cases doesn't work.
>
> I saw Jack (I think) post some numbers showing that it wasn't enough.
> What about making priority level protected via rlimit?

The numbers I reported yesterday were so bad I couldn't figure out why
anyone even thought it was worth trying. Now I realize why.

When Ingo said to try "nice -20", I took him literally, forgetting
that the stupid command to achieve a nice value of -20 is `nice --20'.
So I was actually testing with a nice value of 19. Bah! No wonder it
sucked.
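
The mix-up is easy to reproduce on paper. Below is a small Python sketch of how the legacy `-N' option form of nice(1) is read: the first dash is the option marker and whatever follows is the adjustment. The function name nice_after is hypothetical, and the model ignores the fact that a negative adjustment needs privilege.

```python
NICE_MIN, NICE_MAX = -20, 19


def nice_after(option: str, current: int = 0) -> int:
    """Nice value a command ends up with when run as `nice <option> cmd`.

    Models only the legacy `-N` option form: the leading dash marks the
    option, and N is the adjustment added to the current nice value.
    """
    if not option.startswith("-"):
        raise ValueError("expected an option of the form -N")
    adjustment = int(option[1:])  # "-20" -> 20, "--20" -> -20
    # The kernel clamps the result into the valid nice range.
    return max(NICE_MIN, min(NICE_MAX, current + adjustment))


print(nice_after("-20"))   # 19: the lowest priority, not the highest
print(nice_after("--20"))  # -20: the value that was actually intended
```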

Running `nice --20' is still significantly worse than SCHED_FIFO, but
not the unmitigated disaster shown in the middle column. But, this
improved performance is still not adequate for audio work. The worst
delay was absurdly long (~1/2 sec).

Here are the corrected results...

                                With -R         Without -R      Without -R
                                (SCHED_FIFO)    (nice -20)      (nice --20)

************* SUMMARY RESULT ****************
Total seconds ran . . . . . . : 300
Number of clients . . . . . . : 20
Ports per client  . . . . . . : 4
Frames per buffer . . . . . . : 64
*********************************************
Timeout Count . . . . . . . . : ( 1)            ( 1)            ( 1)
XRUN Count  . . . . . . . . . : 2               2837            43
Delay Count (>spare time) . . : 0               0               0
Delay Count (>1000 usecs) . . : 0               0               0
Delay Maximum . . . . . . . . : 3130 usecs      5038044 usecs   501374 usecs
Cycle Maximum . . . . . . . . : 960 usecs       18802 usecs     1036 usecs
Average DSP Load. . . . . . . : 34.3 %          44.1 %          34.3 %
Average CPU System Load . . . : 8.7 %           7.5 %           7.8 %
Average CPU User Load . . . . : 29.8 %          5.2 %           25.3 %
Average CPU Nice Load . . . . : 0.0 %           20.3 %          0.0 %
Average CPU I/O Wait Load . . : 3.2 %           5.2 %           0.1 %
Average CPU IRQ Load  . . . . : 0.7 %           0.7 %           0.7 %
Average CPU Soft-IRQ Load . . : 0.0 %           0.2 %           0.0 %
Average Interrupt Rate  . . . : 1707.6 /sec     1677.3 /sec     1692.9 /sec
Average Context-Switch Rate . : 11914.9 /sec    11197.6 /sec    11611.2 /sec
*********************************************

> Here's an uncompiled, untested patch doing that (probably has some math
> error or logic hole in it, but idea seems sound enough). I think it has
> at least one problem, where a nice 19 process could renice itself back to
> 0. And it doesn't really handle the different scheduling policies,
> other than implicit 40 - 139 being used for SCHED_FIFO/SCHED_RR.
>
> It takes the 140 priority levels (0-139), inverts their priority
> order, and then uses that number as the basis for the rlimit (so that a
> larger rlimit means higher priority, to fall in line with normal rlimit
> semantics). Defaults to 19 (which should be niceval of 0). And allows
> CAP_SYS_NICE to continue to override if the rlimit is too low.

If you really want to use PAM for everything, then this idea makes a
lot of sense.

But, what about all the other programs that would need updating to
make it useful? We'd need at least a new pam_limits.so module and a
new shell (since ulimit is built-in). I expect I will need to
maintain the realtime-lsm for at least another year before all that
can trickle down to actual end users.
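
To make the arithmetic in Chris's description concrete, here is a rough Python sketch of the inverted-priority rlimit. It is not his patch, which is not reproduced here. The constants and the nice/RT-to-priority formulas are assumptions based on the usual 2.6 scheduler layout (0..99 realtime, 100..139 for nice -20..19), and all function names are hypothetical.

```python
MAX_RT_PRIO = 100  # assumed: kernel priorities 0..99 are realtime
MAX_PRIO = 140     # assumed: 100..139 cover nice -20..19


def kernel_prio_from_nice(nice: int) -> int:
    """Kernel priority of a normal task (lower number = more important)."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return MAX_RT_PRIO + 20 + nice


def kernel_prio_from_rt(rt_priority: int) -> int:
    """Kernel priority of a SCHED_FIFO/SCHED_RR task."""
    if not 1 <= rt_priority <= 99:
        raise ValueError("rt_priority must be in 1..99")
    return MAX_RT_PRIO - 1 - rt_priority


def rlimit_needed(kernel_prio: int) -> int:
    """Invert the 0..139 scale so a larger rlimit means higher priority."""
    return MAX_PRIO - 1 - kernel_prio


def allowed(kernel_prio: int, rlimit: int, cap_sys_nice: bool = False) -> bool:
    """CAP_SYS_NICE still overrides a limit that is too low."""
    return cap_sys_nice or rlimit_needed(kernel_prio) <= rlimit


print(rlimit_needed(kernel_prio_from_nice(0)))    # 19, the proposed default
print(allowed(kernel_prio_from_nice(-1), 19))     # False: needs a limit of 20
print(rlimit_needed(kernel_prio_from_rt(99)))     # 139, top of the scale
```

Under these assumptions a process at nice 19 needs a limit of only 0, so with the default of 19 it could renice itself back to 0, which is the problem noted in the quoted text.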

--
joq

Paul Jakma

Jan 8, 2005, 8:09:58 AM

to Paul Davis, Martin Mares, Arjan van de Ven, Christoph Hellwig, Lee Revell, Ingo Molnar, Chris Wright, Alan Cox, Jack O'Quin, Linux Kernel Mailing List, Andrew Morton

On Fri, 7 Jan 2005, Paul Davis wrote:

> capabilities work - we use them in 2.4 where a helper suid application
> gets the ball rolling, and then its child grants capabilities to new
> clients.

We use them too in Quagga. Reasonably happy with them.

Not a panacea, but far better to retain just a few capabilities, than
retaining ruid 0 (as we must on other systems).

Only issue really is "graininess" of capabilities, which i'd guess is
a double-edged sword.

regards,
--
Paul Jakma pa...@clubi.ie pa...@jakma.org Key ID: 64A2FF6A
Fortune:
Kill Ugly Radio
- Frank Zappa

ro...@jose.lug.udel.edu

Jan 8, 2005, 11:58:24 AM

to Jack O'Quin, Chris Wright, Christoph Hellwig, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, linux-...@vger.kernel.org

On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
> I find it hard to understand why some of you think PAM is an adequate
> solution. As currently deployed, it is poorly documented and nearly
> impossible for non-experts to administer securely. On my Debian woody
> system, when I login from the console I get one fairly sensible set of
> ulimit values, but from gdm I get a much more permissive set (with
> unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> config includes `session required pam_limits.so' but the system comes
> with an empty /etc/security/limits.conf. I'm just guessing about that
> because I can't find any decent documentation for any of this crap.
>
> Remember, if something is difficult to administer, it's *not* secure.

Not to mention that not everyone chooses to use PAM for precisely this
reason. Slackware has never included PAM and probably never will.
My audio workstation has worked swell with the 2.4+caps solution and
the 2.6+LSM solution. PAM would break me ::-(

--
Ross Vandegrift
ro...@lug.udel.edu

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37

Christoph Hellwig

Jan 8, 2005, 1:28:15 PM

to ro...@jose.lug.udel.edu, Jack O'Quin, Chris Wright, Andrew Morton, Lee Revell, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, linux-...@vger.kernel.org

On Sat, Jan 08, 2005 at 11:56:57AM -0500, ro...@lug.udel.edu wrote:
> On Sat, Jan 08, 2005 at 12:12:59AM -0600, Jack O'Quin wrote:
> > I find it hard to understand why some of you think PAM is an adequate
> > solution. As currently deployed, it is poorly documented and nearly
> > impossible for non-experts to administer securely. On my Debian woody
> > system, when I login from the console I get one fairly sensible set of
> > ulimit values, but from gdm I get a much more permissive set (with
> > unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> > config includes `session required pam_limits.so' but the system comes
> > with an empty /etc/security/limits.conf. I'm just guessing about that
> > because I can't find any decent documentation for any of this crap.
> >
> > Remember, if something is difficult to administer, it's *not* secure.
>
> Not to mention that not everyone chooses to use PAM for precisely this
> reason. Slackware has never included PAM and probably never will.
> My audio workstation has worked swell with the 2.4+caps solution and
> the 2.6+LSM solution. PAM would break me ::-(

you can set rlimits as well without pam. it's just more complicated.
But hey, it was you who didn't want to use it :)

Lee Revell

Jan 8, 2005, 5:22:15 PM

to Jack O'Quin, Chris Wright, Christoph Hellwig, Andrew Morton, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, linux-...@vger.kernel.org

On Sat, 2005-01-08 at 00:12 -0600, Jack O'Quin wrote:
> I find it hard to understand why some of you think PAM is an adequate
> solution. As currently deployed, it is poorly documented and nearly
> impossible for non-experts to administer securely. On my Debian woody
> system, when I login from the console I get one fairly sensible set of
> ulimit values, but from gdm I get a much more permissive set (with
> unlimited mlocking, BTW). Apparently, this is because the `gdm' PAM
> config includes `session required pam_limits.so' but the system comes
> with an empty /etc/security/limits.conf. I'm just guessing about that
> because I can't find any decent documentation for any of this crap.

Eh, PAM is a perfectly fine solution. Documentation is lacking, but
it's easy to find examples. On my system /etc/security/limits.conf has
this sample config, commented out:

#<domain> <type> <item> <value>
#

#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0

So add your audio users (or cdrecord users, or whoever) to group
realtime and add:

@realtime hard memlock 100000
@realtime soft prio 100

Problem solved.
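
One way to check that the memlock line actually took effect after logging in again is to read the limit back from inside a session. The Python sketch below does that, assuming a Linux host; the helper name memlock_limits_kib is hypothetical. limits.conf expresses memlock in KB while getrlimit reports bytes, hence the conversion. The sketch makes no attempt to check the `prio' item.

```python
import resource


def memlock_limits_kib():
    """Return (soft, hard) RLIMIT_MEMLOCK in KiB; None means unlimited."""
    def to_kib(value):
        return None if value == resource.RLIM_INFINITY else value // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return to_kib(soft), to_kib(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits_kib()
    print(f"memlock soft={soft} hard={hard} (KiB, None = unlimited)")
```

Running it from both a console login and a gdm session would show whether the two paths really apply different limits.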

Lee

Lee Revell

Jan 8, 2005, 5:26:26 PM

to ro...@jose.lug.udel.edu, Jack O'Quin, Chris Wright, Christoph Hellwig, Andrew Morton, pa...@linuxaudiosystems.com, arj...@redhat.com, mi...@elte.hu, al...@lxorguk.ukuu.org.uk, linux-...@vger.kernel.org

On Sat, 2005-01-08 at 11:56 -0500, ro...@lug.udel.edu wrote:
> Not to mention that not everyone chooses to use PAM for precisely this
> reason. Slackware has never included PAM and probably never will.
> My audio workstation has worked swell with the 2.4+caps solution and
> the 2.6+LSM solution. PAM would break me ::-(

Hmm. How could you (for example) configure all your machines to
authenticate against an LDAP server without PAM?

Lee