The %pK format specifier is designed to hide exposed kernel pointers
from unprivileged users, specifically via /proc interfaces. Its
behavior depends on the kptr_restrict sysctl, whose default value
depends on CONFIG_SECURITY_KPTR_RESTRICT. If kptr_restrict is set to 0,
no deviation from the standard %p behavior occurs. If kptr_restrict is
set to 1, if the current user (intended to be a reader via seq_printf(),
etc.) does not have CAP_SYSLOG (which is currently in the LSM tree),
kernel pointers using %pK are printed as 0's. This was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".
Signed-off-by: Dan Rosenberg <drose...@vsecurity.com>
---
Documentation/sysctl/kernel.txt | 14 ++++++++++++++
include/linux/kernel.h | 2 ++
kernel/sysctl.c | 9 +++++++++
lib/vsprintf.c | 18 ++++++++++++++++++
security/Kconfig | 12 ++++++++++++
5 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..e5373f3 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
- hotplug
- java-appletviewer [ binfmt_java, obsolete ]
- java-interpreter [ binfmt_java, obsolete ]
+- kptr_restrict
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If
==============================================================
+kptr_restrict:
+
+This toggle indicates whether unprivileged users are prevented from reading
+kernel addresses via /proc and other interfaces. When kptr_restrict is set
+to (0), there are no restrictions. When kptr_restrict is set to (1), kernel
+pointers printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG.
+
+The kernel config option CONFIG_SECURITY_KPTR_RESTRICT sets the default
+value of kptr_restrict.
+
+==============================================================
+
kstack_depth_to_print: (X86 only)
Controls the number of words to print when dumping the raw
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b6de9a6..b4f4863 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,6 +201,8 @@ extern int sscanf(const char *, const char *, ...)
extern int vsscanf(const char *, const char *, va_list)
__attribute__ ((format (scanf, 2, 0)));
+extern int kptr_restrict; /* for sysctl */
+
extern int get_option(char **str, int *pint);
extern char *get_options(const char *str, int nints, int *ints);
extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..de46e47 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
},
#endif
{
+ .procname = "kptr_restrict",
+ .data = &kptr_restrict,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ {
.procname = "ngroups_max",
.data = &ngroups_max,
.maxlen = sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..c011249 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
return string(buf, end, uuid, spec);
}
+int kptr_restrict = CONFIG_SECURITY_KPTR_RESTRICT;
+
/*
* Show a '%p' thing. A kernel extension is that the '%p' is followed
* by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
* Implements a "recursive vsnprintf".
* Do not use this feature without some mechanism to verify the
* correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
*
* Note: The difference between 'S' and 'F' is that on ia64 and ppc64
* function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,21 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return buf + vsnprintf(buf, end - buf,
((struct va_format *)ptr)->fmt,
*(((struct va_format *)ptr)->va));
+ case 'K':
+ if (kptr_restrict) {
+ if (in_interrupt())
+ WARN(1, "%%pK used in interrupt context.\n");
+
+ else if (capable(CAP_SYSLOG))
+ break;
+
+ if (spec.field_width == -1) {
+ spec.field_width = 2 * sizeof(void *);
+ spec.flags |= ZEROPAD;
+ }
+ return number(buf, end, 0, spec);
+ }
+ break;
}
spec.flags |= SMALL;
if (spec.field_width == -1) {
diff --git a/security/Kconfig b/security/Kconfig
index e80da95..944fc73 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -51,6 +51,18 @@ config SECURITY_DMESG_RESTRICT
If you are unsure how to answer this question, answer N.
+config SECURITY_KPTR_RESTRICT
+ bool "Hide kernel pointers from unprivileged users"
+ default n
+ help
+ This enforces restrictions on unprivileged users reading kernel
+ addresses via various interfaces, e.g. /proc.
+
+ If this option is not selected, no restrictions will be enforced
+ unless the kptr_restrict sysctl is explicitly set to (1).
+
+ If you are unsure how to answer this question, answer N.
+
config SECURITY
bool "Enable different security models"
depends on SYSFS
Thanks for not giving credits to people suggesting this idea to you
(Thomas if I remember well), and not Ccing netdev where original
discussion took place.
> Signed-off-by: Dan Rosenberg <drose...@vsecurity.com>
> ---
> Documentation/sysctl/kernel.txt | 14 ++++++++++++++
> include/linux/kernel.h | 2 ++
> kernel/sysctl.c | 9 +++++++++
> lib/vsprintf.c | 18 ++++++++++++++++++
> security/Kconfig | 12 ++++++++++++
> 5 files changed, 55 insertions(+), 0 deletions(-)
...
So caller can not block BH ?
This seems wrong to me, please consider :
normal process context :
spin_lock_bh() ...
for (...)
{xxx}printf( ... "%pK" ...)
spin_unlock_bh();
> Thanks for not giving credits to people suggesting this idea to you
> (Thomas if I remember well), and not Ccing netdev where original
> discussion took place.
Yes, credits should be given to Thomas Graf
http://www.spinics.net/lists/netdev/msg146606.html
Thanks
I am happy to credit Thomas, even though he is far from the first person
to have suggested this approach to me. Thanks for the suggestion.
>
> So caller can not block BH ?
>
> This seems wrong to me, please consider :
>
> normal process context :
>
> spin_lock_bh() ...
>
> for (...)
> {xxx}printf( ... "%pK" ...)
>
> spin_unlock_bh();
>
I will think about this and address it.
-Dan
Would you be happier if I omitted the in_interrupt() check entirely?
Well, it seems difficult to make a check here, it's a generic function
that happens to be used from different contexts.
Even using in_irq() might be a problem.
I agree it seems difficult - my only goal was to prevent subsequent
breakage with the capability check. Does anyone have any suggestions
for a better approach here?
-Dan
That's a bug in in_interrupt(), one I've been pointing out for a long
while. Luckily we recently grew the infrastructure to deal with it.
If you write it as: if (in_irq() || in_serving_softirq() || in_nmi())
you'll not trigger for the above example.
Ideally in_serving_softirq() wouldn't exist and in_softirq() would do
what in_serving_softirq() does -- which would make it symmetric with the
hardirq functions -- but nobody has found time to audit all in_softirq()
users.
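
To make the difference concrete for the spin_lock_bh() example, here is a
simplified userspace model of the preempt_count bookkeeping. Everything
prefixed with MODEL_ or model_ is a stand-in invented for illustration, and
the bit widths are assumptions rather than the kernel's actual definitions.
The one property the sketch relies on is that disabling bottom halves adds
two softirq units to the counter while serving a softirq adds one, so the
lowest softirq bit is only set while a softirq is actually running.

```c
#include <stdbool.h>

/* Assumed layout: bits 8-15 softirq state, bits 16-25 hardirq, bit 26 NMI. */
#define MODEL_SOFTIRQ_SHIFT	8
#define MODEL_HARDIRQ_SHIFT	16
#define MODEL_NMI_SHIFT		26

#define MODEL_SOFTIRQ_OFFSET	(1u << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_OFFSET	(1u << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_OFFSET	(1u << MODEL_NMI_SHIFT)

#define MODEL_SOFTIRQ_MASK	(0xffu << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK	(0x3ffu << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_MASK		(1u << MODEL_NMI_SHIFT)

/* What spin_lock_bh() is assumed to add: two units, leaving bit 8 clear. */
#define MODEL_BH_DISABLE_OFFSET	(2u * MODEL_SOFTIRQ_OFFSET)

/* Old check: true for hardirq, NMI, and *any* softirq count, BH-disabled included. */
bool model_in_interrupt(unsigned int count)
{
	return (count & (MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK |
			 MODEL_NMI_MASK)) != 0;
}

bool model_in_irq(unsigned int count)
{
	return (count & MODEL_HARDIRQ_MASK) != 0;
}

/* True only while a softirq handler is running, not when BH is merely disabled. */
bool model_in_serving_softirq(unsigned int count)
{
	return (count & MODEL_SOFTIRQ_OFFSET) != 0;
}

bool model_in_nmi(unsigned int count)
{
	return (count & MODEL_NMI_MASK) != 0;
}

/* The suggested replacement check. */
bool model_pk_context_bad(unsigned int count)
{
	return model_in_irq(count) || model_in_serving_softirq(count) ||
	       model_in_nmi(count);
}
```

With count = MODEL_BH_DISABLE_OFFSET (process context inside
spin_lock_bh()), model_in_interrupt() is true but model_pk_context_bad() is
false, which is the behaviour wanted for the example above.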
The %pK format specifier is designed to hide exposed kernel pointers
from unprivileged users, specifically via /proc interfaces. Its
behavior depends on the kptr_restrict sysctl, whose default value
depends on CONFIG_SECURITY_KPTR_RESTRICT. If kptr_restrict is set to 0,
no deviation from the standard %p behavior occurs. If kptr_restrict is
set to 1, if the current user (intended to be a reader via seq_printf(),
etc.) does not have CAP_SYSLOG (which is currently in the LSM tree),
kernel pointers using %pK are printed as 0's. This was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".
v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra. Thanks to Thomas Graf for suggesting use of a centralized
format specifier.
Signed-off-by: Dan Rosenberg <drose...@vsecurity.com>
CC: James Morris <jmo...@namei.org>
CC: Eugene Teo <euge...@kernel.org>
CC: Kees Cook <kees...@canonical.com>
CC: Ingo Molnar <mi...@elte.hu>
CC: David S. Miller <da...@davemloft.net>
CC: linux-secu...@vger.kernel.org
CC: net...@vger.kernel.org
---
Documentation/sysctl/kernel.txt | 14 ++++++++++++++
include/linux/kernel.h | 2 ++
kernel/sysctl.c | 9 +++++++++
lib/vsprintf.c | 18 ++++++++++++++++++
security/Kconfig | 12 ++++++++++++
5 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..ceb1a3b 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
return string(buf, end, uuid, spec);
}
+int kptr_restrict = CONFIG_SECURITY_KPTR_RESTRICT;
+
/*
* Show a '%p' thing. A kernel extension is that the '%p' is followed
* by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
* Implements a "recursive vsnprintf".
* Do not use this feature without some mechanism to verify the
* correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
*
* Note: The difference between 'S' and 'F' is that on ia64 and ppc64
* function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,21 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return buf + vsnprintf(buf, end - buf,
((struct va_format *)ptr)->fmt,
*(((struct va_format *)ptr)->va));
+ case 'K':
+ if (kptr_restrict) {
+ if (in_irq() || in_serving_softirq() || in_nmi())
+ WARN(1, "%%pK used in interrupt context.\n");
This will come in very handy! Thanks for working on this approach. :)
Acked-by: Kees Cook <kees...@canonical.com>
-Kees
--
Kees Cook
Ubuntu Security Team
> The below patch adds the %pK format specifier, the
> CONFIG_SECURITY_KPTR_RESTRICT configuration option, and the
> kptr_restrict sysctl.
>
> The %pK format specifier is designed to hide exposed kernel pointers
> from unprivileged users, specifically via /proc interfaces. Its
> behavior depends on the kptr_restrict sysctl, whose default value
> depends on CONFIG_SECURITY_KPTR_RESTRICT. If kptr_restrict is set to 0,
> no deviation from the standard %p behavior occurs. If kptr_restrict is
> set to 1, if the current user (intended to be a reader via seq_printf(),
> etc.) does not have CAP_SYSLOG (which is currently in the LSM tree),
> kernel pointers using %pK are printed as 0's. This was chosen over the
> default "(null)", which cannot be parsed by userland %p, which expects
> "(nil)".
>
> v2 improves checking for inappropriate context, on suggestion by Peter
> Zijlstra. Thanks to Thomas Graf for suggesting use of a centralized
> format specifier.
The changelog doesn't describe why CONFIG_SECURITY_KPTR_RESTRICT
exists, nor why the kptr_restrict sysctl exists. I can kinda guess why
this was done, but it would be much better if your reasoning was
present here.
And I'd question whether we need CONFIG_SECURITY_KPTR_RESTRICT at all.
Disabling it saves no memory. Its presence just increases the level of
incompatibility between different vendors' kernels and potentially
doubles the number of kernels which distros must ship (which they of
course won't do). It might be better to add a kptr_restrict=1 kernel boot
option (although people sometimes have problems with boot options in
embedded environments).
All that being said, distro initscripts can just set the sysctl to the
desired value before any non-root process has even started, but this
apparently is far too hard for them :(
Finally, the changelog and the documentation changes don't tell us the
full /proc path to the kptr_restrict pseudo-file. That would be useful
info. Seems that it's /proc/sys/kernel/kptr_restrict?
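
For reference, a minimal userspace sketch of reading the knob, assuming the
path really is /proc/sys/kernel/kptr_restrict (which is what the kern_table
entry in the patch implies). The helper name read_int_file() is made up for
this example; it only reads, since writing the value needs privilege.

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads a single decimal integer from a file.
 * Returns 0 and stores the value on success, or -1 if the file is
 * missing, unreadable, or does not start with an integer. *value is
 * left untouched on failure.
 */
int read_int_file(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int tmp;
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%d", &tmp) == 1);
	fclose(f);

	if (!ok)
		return -1;

	*value = tmp;
	return 0;
}
```

A caller would pass "/proc/sys/kernel/kptr_restrict" as the path and treat a
-1 return as "this kernel does not have the sysctl".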
>
> ...
And the reason why it's unusable in interrupt context is that we can't
meaningfully check CAP_SYSLOG from interrupt.
Fair enough, but this does restrict %pK's usefulness.
I think I'd be more comfortable with a WARN_ONCE here. If someone
screws up then we don't want to spew thousands of repeated warnings at
our poor users - one will do.
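
As a rough illustration of the difference, here is a userspace stand-in for
a warn-once macro. MODEL_WARN_ONCE, model_warn_count and
model_format_in_irq() are all hypothetical names for this sketch; the idea
shown is simply a static per-call-site flag that suppresses every warning
after the first.

```c
#include <stdio.h>

/* Counts warnings actually emitted; exists only so the effect is observable. */
int model_warn_count;

/* Warns the first time cond is true at this call site, then stays silent. */
#define MODEL_WARN_ONCE(cond, msg)				\
	do {							\
		static int warned_;				\
		if ((cond) && !warned_) {			\
			warned_ = 1;				\
			model_warn_count++;			\
			fprintf(stderr, "%s", (msg));		\
		}						\
	} while (0)

/* Stand-in for a buggy caller that uses %pK from interrupt context. */
void model_format_in_irq(void)
{
	MODEL_WARN_ONCE(1, "%pK used in interrupt context.\n");
}
```

Calling model_format_in_irq() in a loop prints the message a single time,
however many iterations run.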
So what's next? We need to convert 1,000,000 %p callsites to use %pK?
That'll be fun. Please consider adding a new checkpatch rule which
detects %p and asks people whether they should have used %pK.
> + case 'K':
> + if (kptr_restrict) {
> + if (in_irq() || in_serving_softirq() || in_nmi())
> + WARN(1, "%%pK used in interrupt context.\n");
> +
> + else if (capable(CAP_SYSLOG))
> + break;
> +
> + if (spec.field_width == -1) {
> + spec.field_width = 2 * sizeof(void *);
> + spec.flags |= ZEROPAD;
> + }
> + return number(buf, end, 0, spec);
> + }
> + break;
Also, we should emit the runtime warning even if kptr_restrict is
false. Otherwise programmers might ship buggy code because they didn't
enable kptr_restrict during testing.
So what I ended up with was
	case 'K':
		/*
		 * %pK cannot be used in IRQ context because it tests
		 * CAP_SYSLOG.
		 */
		if (in_irq() || in_serving_softirq() || in_nmi())
			WARN_ONCE(1, "%%pK used in interrupt context.\n");

		if (!kptr_restrict)
			break;		/* %pK does not obscure pointers */

		if (capable(CAP_SYSLOG))
			break;		/* privileged apps expose pointers */

		if (spec.field_width == -1) {
			spec.field_width = 2 * sizeof(void *);
			spec.flags |= ZEROPAD;
		}
		return number(buf, end, 0, spec);
How does that look?
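
The visible effect of that flow can be approximated in userspace as below.
This is only a sketch: model_format_kptr() is a hypothetical function,
restrict_on stands in for the kptr_restrict sysctl, and privileged stands in
for the result of capable(CAP_SYSLOG). It shows that an obscured pointer
comes out as a zero-padded field of 2 * sizeof(void *) hex digits, the same
width as an ordinary pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats ptr as zero-padded lowercase hex, or as all zeros when the
 * (modelled) restriction is on and the caller is not privileged.
 * Returns what snprintf() returns.
 */
int model_format_kptr(char *buf, size_t len, const void *ptr,
		      bool restrict_on, bool privileged)
{
	int width = (int)(2 * sizeof(void *));
	uintptr_t value = (uintptr_t)ptr;

	if (restrict_on && !privileged)
		value = 0;	/* obscure the pointer */

	return snprintf(buf, len, "%0*jx", width, (uintmax_t)value);
}
```

Because the zeros are plain hex digits, a userland parser that expects a
pointer-sized hex number still accepts the field.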
Also... permitting root to bypass the %pK obscuring seems pretty lame,
really. I bet a *lot* of the existing %p sites are already root-only
(eg, driver initialisation). So much of the value is lost.
I'll send a clean-up patch tomorrow fixing the documentation issues.
I'm also willing to take more feedback on the need for a config - this
was the approach that was recommended to me recently with
dmesg_restrict, but I also understand your reasoning.
Agreed.
>
> So what's next? We need to convert 1,000,000 %p callsites to use %pK?
> That'll be fun. Please consider adding a new checkpatch rule which
> detects %p and asks people whether they should have used %pK.
The goal of this format specifier is specifically for pointers that are
exposed to unprivileged users. I agree that hiding all kernel pointers
would be nice, but I don't expect the angry masses to ever agree to
that. For now, I'll isolate specific cases, especially in /proc, that
are clear risks in terms of information leakage. I'll also be skipping
over pointers written to the syslog, since I think hiding that
information is dmesg_restrict's job.
Thanks,
Dan
> >
> > So what's next? We need to convert 1,000,000 %p callsites to use %pK?
> > That'll be fun. Please consider adding a new checkpatch rule which
> > detects %p and asks people whether they should have used %pK.
>
> The goal of this format specifier is specifically for pointers that are
> exposed to unprivileged users. I agree that hiding all kernel pointers
> would be nice, but I don't expect the angry masses to ever agree to
> that. For now, I'll isolate specific cases, especially in /proc, that
> are clear risks in terms of information leakage. I'll also be skipping
> over pointers written to the syslog, since I think hiding that
> information is dmesg_restrict's job.
Well... some administrators may wish to hide the pointer values even
for privileged callers. That's a pretty trivial add-on for the code
which you have, and means that those admins can also suppress the
pointers for IRQ-time callers. More /proc knobs :)
Then again, perhaps those admins would be OK if we simply disabled
plain old %p everywhere. In which case we're looking at a separate
patch, I suggest.
I can add a "2" setting that hides %pK pointers regardless of privilege
level, which I agree is a useful option. But because it would be built
into the same format specifier, you still couldn't use %pK in interrupt
context (in case the sysctl wasn't set to 2).
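
A sketch of what the three-level policy could look like, assuming the
proposed "2" setting is added. None of this is in the posted patch:
model_kptr_hidden() and the level meanings beyond 0 and 1 are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical decision function:
 *   level 0: never hide
 *   level 1: hide unless the reader has CAP_SYSLOG (the posted patch)
 *   level 2: always hide, regardless of privilege (the proposed addition)
 */
bool model_kptr_hidden(int level, bool has_cap_syslog)
{
	switch (level) {
	case 0:
		return false;
	case 1:
		return !has_cap_syslog;
	default:
		return true;
	}
}
```

Only level 1 consults the capability, but since the level is a runtime
setting, a %pK call site cannot know in advance that the check will be
skipped, which is why the interrupt-context restriction would remain.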
> Then again, perhaps those admins would be OK if we simply disabled
> plain old %p everywhere. In which case we're looking at a separate
> patch, I suggest.
I would be happy to do this from a security perspective, but I'd imagine
there's a pretty high risk of things breaking by doing such a sweeping
change.
-Dan