[PATCH 00/23] Hardened usercopy whitelisting

Kees Cook

unread,

Jun 19, 2017, 7:40:06 PM6/19/17

to

This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY code
in the last public patch of grsecurity/PaX based on our understanding
of the code. Changes or omissions from the original code are ours and
don't reflect the original grsecurity/PaX code.

David Windsor did the bulk of the porting, refactoring, splitting,
testing, etc; I just did some extra tweaks, hunk moving, and small
extra patches.

This updates the slab allocator to add annotations (useroffset and
usersize) to define allowed usercopy regions. Currently, hardened
usercopy performs dynamic bounds checking on whole slab cache objects.
This is good, but still leaves a lot of kernel slab memory available to
be copied to/from userspace in the face of bugs. To further restrict
what memory is available for copying, this creates a way to whitelist
specific areas of a given slab cache object for copying to/from userspace,
allowing much finer granularity of access control. Slab caches that are
never exposed to userspace can declare no whitelist for their objects,
thereby keeping them unavailable to userspace via dynamic copy operations.
(Note, an implicit form of whitelisting is the use of constant sizes
in usercopy operations and get_user()/put_user(); these bypass hardened
usercopy checks since these sizes cannot change at runtime.)

Two good examples of how much more this protects are with task_struct
(a huge structure that only needs two fields exposed to userspace)
and mm_struct (another large and sensitive structure that only needs
auxv exposed). Other places for whitelists are mostly VFS name related,
and some areas of network caches.

To support the whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function that creates a new cache with a usercopy region defined
(kmem_cache_create_usercopy()), suitable for storing objects that get
copied to/from userspace. The default cache creation function
(kmem_cache_create()) remains unchanged and defaults to having no
whitelist region.

Additionally, a way to split trivially size-controllable kmallocs away
from the general-purpose kmalloc is added.

Finally, a Kconfig is created to control slab_nomerge, since it
would be nice to make this build-time controllable.

-Kees (and David)

Kees Cook

unread,

Jun 19, 2017, 7:50:05 PM6/19/17

to

Some hardened environments want to build kernels with slab_nomerge
already set (so that they do not depend on remembering to set the kernel
command line option). This is desired to reduce the risk of kernel heap
overflows being able to overwrite objects from merged caches, increasing
the difficulty of these attacks. By keeping caches unmerged, these kinds
of exploits can usually only damage objects in the same cache (though the
risk to metadata exploitation is unchanged).

Signed-off-by: Kees Cook <kees...@chromium.org>
---
mm/slab_common.c | 5 ++---
security/Kconfig | 13 +++++++++++++
2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6c14d765379f..17a4c4b33283 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -47,13 +47,12 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,

/*
* Merge control. If this is set then no merging of slab caches will occur.
- * (Could be removed. This was introduced to pacify the merge skeptics.)
*/
-static int slab_nomerge;
+static bool slab_nomerge = !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT);

static int __init setup_slab_nomerge(char *str)
{
- slab_nomerge = 1;
+ slab_nomerge = true;
return 1;
}

diff --git a/security/Kconfig b/security/Kconfig
index 0c181cebdb8a..e40bd2a260f8 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -166,6 +166,19 @@ config HARDENED_USERCOPY_SPLIT_KMALLOC
confined to a separate cache, attackers must find other ways
to prepare heap attacks that will be near their desired target.

+config SLAB_MERGE_DEFAULT
+ bool "Allow slab caches to be merged"
+ default y
+ help
+ For reduced kernel memory fragmentation, slab caches can be
+ merged when they share the same size and other characteristics.
+ This carries a small risk of kernel heap overflows being able
+ to overwrite objects from merged caches, which reduces the
+ difficulty of such heap attacks. By keeping caches unmerged,
+ these kinds of exploits can usually only damage objects in the
+ same cache. To disable merging at runtime, "slab_nomerge" can be
+ passed on the kernel command line.
+
config STATIC_USERMODEHELPER
bool "Force all usermode helper calls through a single binary"
help
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:05 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

The befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

In support of usercopy hardening, this patch defines a region in
the befs_inode_cache slab cache in which userspace copy operations
are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/befs/linuxvfs.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 63e7c4760bfb..893607591805 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -442,11 +442,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
static int __init
befs_init_inodecache(void)
{
- befs_inode_cachep = kmem_cache_create("befs_inode_cache",
- sizeof (struct befs_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+ befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
+ sizeof(struct befs_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ offsetof(struct befs_inode_info,
+ i_data.symlink),
+ sizeof_field(struct befs_inode_info,
+ i_data.symlink),
+ init_once);
if (befs_inode_cachep == NULL)
return -ENOMEM;

--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:05 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

Mark the kmalloc slab caches as entirely whitelisted. These
caches are frequently used to fulfill kernel allocations that contain
data to be copied to/from userspace. It is impossible to know
at build time what objects will be contained in these caches,
so the entire cache is whitelisted.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>

[kees: merged in moved kmalloc hunks, adjust commit log]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

mm/slab.c | 3 ++-
mm/slab.h | 3 ++-
mm/slab_common.c | 10 ++++++----
3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 5c78830aeea0..4cafbe13f239 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
*/
kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
kmalloc_info[INDEX_NODE].name,
- kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+ kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
+ 0, kmalloc_size(INDEX_NODE));
slab_state = PARTIAL_NODE;
setup_kmalloc_cache_index_table();

diff --git a/mm/slab.h b/mm/slab.h
index 92c0cedb296d..4cdc8e64fdbd 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -96,7 +96,8 @@ struct kmem_cache *kmalloc_slab(size_t, gfp_t);
extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);

extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
- unsigned long flags);
+ unsigned long flags, size_t useroffset,
+ size_t usersize);
extern void create_boot_cache(struct kmem_cache *, const char *name,
size_t size, unsigned long flags, size_t useroffset,
size_t usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index af97465b99e6..685321a0d355 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -917,14 +917,15 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t siz
}

struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
- unsigned long flags)
+ unsigned long flags, size_t useroffset,
+ size_t usersize)
{
struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

if (!s)
panic("Out of memory when creating slab %s\n", name);

- create_boot_cache(s, name, size, flags, 0, size);
+ create_boot_cache(s, name, size, flags, useroffset, usersize);
list_add(&s->list, &slab_caches);
memcg_link_cache(s);
s->refcount = 1;
@@ -1078,7 +1079,8 @@ void __init setup_kmalloc_cache_index_table(void)
static void __init new_kmalloc_cache(int idx, unsigned long flags)
{
kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
- kmalloc_info[idx].size, flags);
+ kmalloc_info[idx].size, flags, 0,
+ kmalloc_info[idx].size);
}

/*
@@ -1119,7 +1121,7 @@ void __init create_kmalloc_caches(unsigned long flags)

BUG_ON(!n);
kmalloc_dma_caches[i] = create_kmalloc_cache(n,
- size, SLAB_CACHE_DMA | flags);
+ size, SLAB_CACHE_DMA | flags, 0, 0);
}
}
#endif
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:05 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory

available to be copied to/from userspace in the face of bugs. To
further restrict what memory is available for copying, this creates a
way to whitelist specific areas of a given slab cache object for copying
to/from userspace, allowing much finer granularity of access control.
Slab caches that are never exposed to userspace can declare no whitelist
for their objects, thereby keeping them unavailable to userspace via
dynamic copy operations. (Note, an implicit form of whitelisting is the
use of constant sizes in usercopy operations and get_user()/put_user();
these bypass hardened usercopy checks since these sizes cannot change at
runtime.)

To support this whitelist annotation, usercopy region offset and size

members are added to struct kmem_cache. The slab allocator receives a

new function that creates a new cache with a usercopy region defined,

suitable for storing objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted. Once all whitelists have been added, this will be changed
back to a usersize of 0.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

include/linux/slab.h | 3 +++
include/linux/slab_def.h | 3 +++
include/linux/slub_def.h | 3 +++
include/linux/stddef.h | 2 ++
mm/slab.c | 2 +-
mm/slab.h | 5 ++++-
mm/slab_common.c | 42 ++++++++++++++++++++++++++++++++++--------
mm/slub.c | 11 +++++++++--
8 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 04a7f7993e67..a48f54238273 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -129,6 +129,9 @@ bool slab_is_available(void);
struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
void (*)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *, size_t, size_t,
+ unsigned long, size_t, size_t,
+ void (*)(void *));
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4ad2c5a26399..03eef0df8648 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -84,6 +84,9 @@ struct kmem_cache {
unsigned int *random_seq;
#endif

+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
+
struct kmem_cache_node *node[MAX_NUMNODES];
};

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 07ef550c6627..05b7343f69eb 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -108,6 +108,9 @@ struct kmem_cache {
struct kasan_cache kasan_info;
#endif

+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
+
struct kmem_cache_node *node[MAX_NUMNODES];
};

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 9c61c7cda936..f00355086fb2 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -18,6 +18,8 @@ enum {
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#endif

+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
/**
* offsetofend(TYPE, MEMBER)
*
diff --git a/mm/slab.c b/mm/slab.c
index 2a31ee3c5814..cf77f1691588 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN);
+ SLAB_HWCACHE_ALIGN, 0, 0);
list_add(&kmem_cache->list, &slab_caches);
slab_state = PARTIAL;

diff --git a/mm/slab.h b/mm/slab.h
index 9cfcf099709c..92c0cedb296d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,8 @@ struct kmem_cache {
unsigned int size; /* The aligned/padded/added on size */
unsigned int align; /* Alignment as calculated */
unsigned long flags; /* Active flags on the slab */
+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
const char *name; /* Slab name for sysfs */
int refcount; /* Use counter */
void (*ctor)(void *); /* Called on object slot creation */
@@ -96,7 +98,8 @@ extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);

extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,

unsigned long flags);

extern void create_boot_cache(struct kmem_cache *, const char *name,

- size_t size, unsigned long flags);
+ size_t size, unsigned long flags, size_t useroffset,
+ size_t usersize);

int slab_unmergeable(struct kmem_cache *s);
struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 01a0fe2eb332..af97465b99e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -273,6 +273,9 @@ int slab_unmergeable(struct kmem_cache *s)
if (s->ctor)
return 1;

+ if (s->usersize)
+ return 1;
+
/*
* We may have set a slab to be unmergeable during bootstrap.
*/
@@ -358,12 +361,15 @@ unsigned long calculate_alignment(unsigned long flags,

static struct kmem_cache *create_cache(const char *name,
size_t object_size, size_t size, size_t align,
- unsigned long flags, void (*ctor)(void *),

+ unsigned long flags, size_t useroffset,

+ size_t usersize, void (*ctor)(void *),
struct mem_cgroup *memcg, struct kmem_cache *root_cache)
{
struct kmem_cache *s;
int err;

+ BUG_ON(useroffset + usersize > object_size);
+
err = -ENOMEM;
s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
if (!s)
@@ -374,6 +380,8 @@ static struct kmem_cache *create_cache(const char *name,
s->size = size;
s->align = align;
s->ctor = ctor;
+ s->useroffset = useroffset;
+ s->usersize = usersize;

err = init_memcg_params(s, memcg, root_cache);
if (err)
@@ -398,11 +406,13 @@ static struct kmem_cache *create_cache(const char *name,
}

/*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache.
* @name: A string which is used in /proc/slabinfo to identify this cache.
* @size: The size of objects to be created in this cache.
* @align: The required alignment for the objects.
* @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
* @ctor: A constructor for the objects.
*
* Returns a ptr to the cache on success, NULL on failure.
@@ -422,8 +432,9 @@ static struct kmem_cache *create_cache(const char *name,
* as davem.
*/
struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
- unsigned long flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+ unsigned long flags, size_t useroffset, size_t usersize,
+ void (*ctor)(void *))
{
struct kmem_cache *s = NULL;
const char *cache_name;
@@ -454,7 +465,10 @@ kmem_cache_create(const char *name, size_t size, size_t align,
*/
flags &= CACHE_CREATE_MASK;

- s = __kmem_cache_alias(name, size, align, flags, ctor);
+ BUG_ON(!usersize && useroffset);
+ BUG_ON(size < usersize || size - usersize < useroffset);
+ if (!usersize)
+ s = __kmem_cache_alias(name, size, align, flags, ctor);
if (s)
goto out_unlock;

@@ -466,7 +480,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,

s = create_cache(cache_name, size, size,
calculate_alignment(flags, align, size),
- flags, ctor, NULL, NULL);
+ flags, useroffset, usersize, ctor, NULL, NULL);
if (IS_ERR(s)) {
err = PTR_ERR(s);
kfree_const(cache_name);
@@ -492,6 +506,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
}
return s;
}
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+ return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+ ctor);
+}
EXPORT_SYMBOL(kmem_cache_create);

static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -604,6 +627,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
s = create_cache(cache_name, root_cache->object_size,
root_cache->size, root_cache->align,
root_cache->flags & CACHE_CREATE_MASK,
+ root_cache->useroffset, root_cache->usersize,
root_cache->ctor, memcg, root_cache);
/*
* If we could not create a memcg cache, do not complain, because
@@ -871,13 +895,15 @@ bool slab_is_available(void)
#ifndef CONFIG_SLOB
/* Create a cache during boot when no slab services are available yet */
void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
- unsigned long flags)
+ unsigned long flags, size_t useroffset, size_t usersize)
{
int err;

s->name = name;
s->size = s->object_size = size;
s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+ s->useroffset = useroffset;
+ s->usersize = usersize;

slab_init_memcg_params(s);

@@ -898,7 +924,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,

if (!s)
panic("Out of memory when creating slab %s\n", name);

- create_boot_cache(s, name, size, flags);
+ create_boot_cache(s, name, size, flags, 0, size);

list_add(&s->list, &slab_caches);
memcg_link_cache(s);
s->refcount = 1;

diff --git a/mm/slub.c b/mm/slub.c
index 7449593fca72..b8cbbc31b005 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4164,7 +4164,7 @@ void __init kmem_cache_init(void)
kmem_cache = &boot_kmem_cache;

create_boot_cache(kmem_cache_node, "kmem_cache_node",
- sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+ sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);

register_hotmemory_notifier(&slab_memory_callback_nb);

@@ -4174,7 +4174,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN);
+ SLAB_HWCACHE_ALIGN, 0, 0);

kmem_cache = bootstrap(&boot_kmem_cache);

@@ -5040,6 +5040,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
SLAB_ATTR_RO(cache_dma);
#endif

+static ssize_t usercopy_show(struct kmem_cache *s, char *buf)
+{
+ return sprintf(buf, "%d\n", !!s->usersize);
+}
+SLAB_ATTR_RO(usercopy);
+
static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
{
return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5414,6 +5420,7 @@ static struct attribute *slab_attrs[] = {
#ifdef CONFIG_FAILSLAB
&failslab_attr.attr,
#endif
+ &usercopy_attr.attr,

NULL
};
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

The mnt_id field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/fhandle.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 58a61f55e0d0..46e00ccca8f0 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -68,8 +68,7 @@ static long do_sys_name_to_handle(struct path *path,
} else
retval = 0;
/* copy the mount id */
- if (copy_to_user(mnt_id, &real_mount(path->mnt)->mnt_id,
- sizeof(*mnt_id)) ||
+ if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) ||
copy_to_user(ufh, handle,
sizeof(struct file_handle) + handle_bytes))
retval = -EFAULT;
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

vfs pathnames stored internally in inodes and contained in
the names_cache slab cache need to be copied to/from userspace.

In support of usercopy hardening, this patch defines the entire
cache object in the names_cache slab cache as whitelisted, since
it holds name strings to be copied to userspace.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log]
Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/dcache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index cddf39777835..f7f3c4114baa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3616,8 +3616,8 @@ void __init vfs_caches_init_early(void)

void __init vfs_caches_init(void)
{
- names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
- SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+ names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
+ SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);

dcache_init();
inode_init();
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

With all known usercopied cache whitelists now defined in the kernel, switch
the default usercopy region of kmem_cache_create() to size 0. Any new caches
with usercopy regions will now need to use kmem_cache_create_usercopy()
instead of kmem_cache_create().

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: Kees Cook <kees...@chromium.org>
Cc: David Windsor <da...@nullcore.net>
---
mm/slab_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 685321a0d355..2365dd21623d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -512,7 +512,7 @@ struct kmem_cache *

kmem_cache_create(const char *name, size_t size, size_t align,

unsigned long flags, void (*ctor)(void *))

{
- return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+ return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
ctor);
}
EXPORT_SYMBOL(kmem_cache_create);
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

cifs request buffers, stored in the cifs_request slab cache,

need to be copied to/from userspace.

In support of usercopy hardening, this patch defines a region in
the cifs_request slab cache in which userspace copy operations

are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log]
Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/cifs/cifsfs.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 9a1667e0e8d6..385c5cc8903e 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1234,9 +1234,11 @@ cifs_init_request_bufs(void)
cifs_dbg(VFS, "CIFSMaxBufSize %d 0x%x\n",
CIFSMaxBufSize, CIFSMaxBufSize);
*/
- cifs_req_cachep = kmem_cache_create("cifs_request",
+ cifs_req_cachep = kmem_cache_create_usercopy("cifs_request",
CIFSMaxBufSize + max_hdr_size, 0,
- SLAB_HWCACHE_ALIGN, NULL);
+ SLAB_HWCACHE_ALIGN, 0,
+ CIFSMaxBufSize + max_hdr_size,
+ NULL);
if (cifs_req_cachep == NULL)
return -ENOMEM;

@@ -1262,9 +1264,9 @@ cifs_init_request_bufs(void)
more SMBs to use small buffer alloc and is still much more
efficient to alloc 1 per page off the slab compared to 17K (5page)
alloc of large cifs buffers even when page debugging is on */
- cifs_sm_req_cachep = kmem_cache_create("cifs_small_rq",
+ cifs_sm_req_cachep = kmem_cache_create_usercopy("cifs_small_rq",
MAX_CIFS_SMALL_BUFFER_SIZE, 0, SLAB_HWCACHE_ALIGN,
- NULL);
+ 0, MAX_CIFS_SMALL_BUFFER_SIZE, NULL);
if (cifs_sm_req_cachep == NULL) {
mempool_destroy(cifs_req_poolp);
kmem_cache_destroy(cifs_req_cachep);
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

XFS inline inode data, stored in struct xfs_inode_t.i_df.if_u2.if_inline_data
and therefore contained in the xfs_inode slab cache, needs to be copied

to/from userspace.

In support of usercopy hardening, this patch defines a region in

the xfs_inode slab cache in which userspace copy operations

are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log]
Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/xfs/kmem.h | 10 ++++++++++
fs/xfs/xfs_super.c | 7 +++++--
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index d6ea520162b2..b3f02b6226b3 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -100,6 +100,16 @@ kmem_zone_init_flags(int size, char *zone_name, unsigned long flags,
return kmem_cache_create(zone_name, size, 0, flags, construct);
}

+static inline kmem_zone_t *
+kmem_zone_init_flags_usercopy(int size, char *zone_name, unsigned long flags,
+ size_t useroffset, size_t usersize,
+ void (*construct)(void *))
+{
+ return kmem_cache_create_usercopy(zone_name, size, 0, flags,
+ useroffset, usersize, construct);
+}
+
+
static inline void
kmem_zone_free(kmem_zone_t *zone, void *ptr)
{
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 455a575f101d..b6963baa3ac8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1828,9 +1828,12 @@ xfs_init_zones(void)
goto out_destroy_efd_zone;

xfs_inode_zone =
- kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
+ kmem_zone_init_flags_usercopy(sizeof(xfs_inode_t), "xfs_inode",
KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD |
- KM_ZONE_ACCOUNT, xfs_fs_inode_init_once);
+ KM_ZONE_ACCOUNT,
+ offsetof(xfs_inode_t, i_df.if_u2.if_inline_data),
+ sizeof_field(xfs_inode_t, i_df.if_u2.if_inline_data),
+ xfs_fs_inode_init_once);
if (!xfs_inode_zone)
goto out_destroy_efi_zone;

--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

When a dentry name is short enough, it can be stored directly in
the dentry itself. These dentry short names, stored in struct
dentry.d_iname and therefore contained in the dentry_cache slab cache,
need to be coped to/from userspace.

In support of usercopy hardening, this patch defines a region in

the dentry_cache slab cache in which userspace copy operations

are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my

understanding of the code. Changes or omissions from the original code are

mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <da...@nullcore.net>

[kees: adjust hunks for kmalloc-specific things moved later, adjust commit log]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

fs/dcache.c | 5 +++--
include/linux/slab.h | 5 +++++
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f7f3c4114baa..bae2e148946c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3580,8 +3580,9 @@ static void __init dcache_init(void)
* but it is probably not worth it because of the cache nature
* of the dcache.
*/
- dentry_cache = KMEM_CACHE(dentry,
- SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
+ dentry_cache = KMEM_CACHE_USERCOPY(dentry,
+ SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ d_iname);

/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index a48f54238273..97f4a0117b3b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -151,6 +151,11 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
sizeof(struct __struct), __alignof__(struct __struct),\
(__flags), NULL)

+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field) kmem_cache_create_usercopy(#__struct,\
+ sizeof(struct __struct), __alignof__(struct __struct),\
+ (__flags), offsetof(struct __struct, __field),\
+ sizeof_field(struct __struct, __field), NULL)
+
/*
* Common kmalloc functions provided by all allocators
*/
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:06 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from

userspace.

In support of usercopy hardening, this patch defines a region in

the scsi_sense_cache slab cache in which userspace copy operations

are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

Signed-off-by: David Windsor <da...@nullcore.net>

Signed-off-by: Kees Cook <kees...@chromium.org>
---

drivers/scsi/scsi_lib.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 99e16ac479e3..fc5052aded84 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -77,14 +77,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
if (shost->unchecked_isa_dma) {
scsi_sense_isadma_cache =
kmem_cache_create("scsi_sense_cache(DMA)",
- SCSI_SENSE_BUFFERSIZE, 0,
- SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
+ SCSI_SENSE_BUFFERSIZE, 0,
+ SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
if (!scsi_sense_isadma_cache)
ret = -ENOMEM;
} else {
scsi_sense_cache =
- kmem_cache_create("scsi_sense_cache",
- SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL);
+ kmem_cache_create_usercopy("scsi_sense_cache",
+ SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
+ 0, SCSI_SENSE_BUFFERSIZE, NULL);
if (!scsi_sense_cache)
ret = -ENOMEM;
}
--
2.7.4

Kees Cook

unread,

Jun 19, 2017, 7:50:08 PM6/19/17

to

From: David Windsor <da...@nullcore.net>

This patch adds the enforcement component of usercopy cache whitelisting,
and is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and

don't reflect the original grsecurity/PaX code.

The SLAB and SLUB allocators are modified to deny all copy operations
in which the kernel heap memory being modified falls outside of the cache's
defined usercopy region.

Signed-off-by: David Windsor <da...@nullcore.net>
[kees: adjust commit log and comments]

Signed-off-by: Kees Cook <kees...@chromium.org>
---

mm/slab.c | 16 +++++++++++-----
mm/slub.c | 18 +++++++++++-------
2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index cf77f1691588..5c78830aeea0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4416,7 +4416,9 @@ module_init(slab_proc_init);

#ifdef CONFIG_HARDENED_USERCOPY
/*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
*
* Returns NULL if check passes, otherwise const char * to name of cache
* to indicate an error.
@@ -4436,11 +4438,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
/* Find offset within object. */
offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);

- /* Allow address range falling entirely within object size. */
- if (offset <= cachep->object_size && n <= cachep->object_size - offset)
- return NULL;
+ /* Make sure object falls entirely within cache's usercopy region. */
+ if (offset < cachep->useroffset)
+ return cachep->name;
+ if (offset - cachep->useroffset > cachep->usersize)
+ return cachep->name;
+ if (n > cachep->useroffset - offset + cachep->usersize)
+ return cachep->name;

- return cachep->name;
+ return NULL;
}
#endif /* CONFIG_HARDENED_USERCOPY */

diff --git a/mm/slub.c b/mm/slub.c
index b8cbbc31b005..e12a2bfbca1e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3796,7 +3796,9 @@ EXPORT_SYMBOL(__kmalloc_node);

#ifdef CONFIG_HARDENED_USERCOPY
/*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
*
* Returns NULL if check passes, otherwise const char * to name of cache
* to indicate an error.
@@ -3806,11 +3808,9 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
{
struct kmem_cache *s;
unsigned long offset;
- size_t object_size;

/* Find object and usable object size. */
s = page->slab_cache;
- object_size = slab_ksize(s);

/* Reject impossible pointers. */
if (ptr < page_address(page))
@@ -3826,11 +3826,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
offset -= s->red_left_pad;
}

- /* Allow address range falling entirely within object size. */
- if (offset <= object_size && n <= object_size - offset)
- return NULL;
+ /* Make sure object falls entirely within cache's usercopy region. */
+ if (offset < s->useroffset)
+ return s->name;
+ if (offset - s->useroffset > s->usersize)
+ return s->name;
+ if (n > s->useroffset - offset + s->usersize)
+ return s->name;

- return s->name;
+ return NULL;
}
#endif /* CONFIG_HARDENED_USERCOPY */

--
2.7.4

Eric Biggers

unread,

Jun 20, 2017, 12:10:05 AM6/20/17

to

Hi David + Kees,

On Mon, Jun 19, 2017 at 04:36:35PM -0700, Kees Cook wrote:
> With all known usercopied cache whitelists now defined in the kernel, switch
> the default usercopy region of kmem_cache_create() to size 0. Any new caches
> with usercopy regions will now need to use kmem_cache_create_usercopy()
> instead of kmem_cache_create().
>

While I'd certainly like to see the caches be whitelisted, it needs to be made
very clear that it's being done (the cover letter for this series falsely claims
that kmem_cache_create() is unchanged) and what the consequences are. Is there
any specific plan for identifying caches that were missed? If it's expected for
people to just fix them as they are found, then they need to be helped a little
--- at the very least by putting a big comment above report_usercopy() that
explains the possible reasons why the error might have triggered and what to do
about it.

- Eric

Eric Biggers

unread,

Jun 20, 2017, 12:10:05 AM6/20/17

to

On Mon, Jun 19, 2017 at 04:36:31PM -0700, Kees Cook wrote:
> From: David Windsor <da...@nullcore.net>
>
> When a dentry name is short enough, it can be stored directly in
> the dentry itself. These dentry short names, stored in struct
> dentry.d_iname and therefore contained in the dentry_cache slab cache,
> need to be coped to/from userspace.
>
> In support of usercopy hardening, this patch defines a region in
> the dentry_cache slab cache in which userspace copy operations
> are allowed.
>
> This region is known as the slab cache's usercopy region. Slab
> caches can now check that each copy operation involving cache-managed
> memory falls entirely within the slab's usercopy region.
>
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
>

For all these patches please mention *where* the data is being copied to/from
userspace.

> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index a48f54238273..97f4a0117b3b 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -151,6 +151,11 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
> sizeof(struct __struct), __alignof__(struct __struct),\
> (__flags), NULL)
>
> +#define KMEM_CACHE_USERCOPY(__struct, __flags, __field) kmem_cache_create_usercopy(#__struct,\
> + sizeof(struct __struct), __alignof__(struct __struct),\
> + (__flags), offsetof(struct __struct, __field),\
> + sizeof_field(struct __struct, __field), NULL)
> +

This helper macro should be added in the patch which adds
kmem_cache_create_usercopy(), not in this one.

- Eric

Daniel Micay

unread,

Jun 20, 2017, 12:10:05 AM6/20/17

to

On Mon, 2017-06-19 at 16:36 -0700, Kees Cook wrote:
> Some hardened environments want to build kernels with slab_nomerge
> already set (so that they do not depend on remembering to set the
> kernel
> command line option). This is desired to reduce the risk of kernel
> heap
> overflows being able to overwrite objects from merged caches,
> increasing
> the difficulty of these attacks. By keeping caches unmerged, these
> kinds
> of exploits can usually only damage objects in the same cache (though
> the
> risk to metadata exploitation is unchanged).

It also further fragments the ability to influence slab cache layout,
i.e. primitives to do things like filling up slabs to set things up for
an exploit might not be able to deal with the target slabs anymore. It
doesn't need to be mentioned but it's something to think about too. In
theory, disabling merging can make it *easier* to get the right layout
too if there was some annoyance that's now split away. It's definitely a
lot more good than bad for security though, but allocator changes have
subtle impact on exploitation. This can make caches more deterministic.

Eric Biggers

unread,

Jun 20, 2017, 12:30:04 AM6/20/17

to

It's good to at least have this option, but again it's logically separate and
shouldn't just be hidden in patch 23/23. And again, is it really just about
heap overflows?

Please also fix the documentation for slab_nomerge in
Documentation/admin-guide/kernel-parameters.txt.

- Eric

Rik van Riel

unread,

Jun 20, 2017, 3:50:05 PM6/20/17

to

On Mon, 2017-06-19 at 16:36 -0700, Kees Cook wrote:

> This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> code
> in the last public patch of grsecurity/PaX based on our understanding
> of the code. Changes or omissions from the original code are ours and
> don't reflect the original grsecurity/PaX code.
>
> David Windsor did the bulk of the porting, refactoring, splitting,
> testing, etc; I just did some extra tweaks, hunk moving, and small
> extra patches.
>
>
> This updates the slab allocator to add annotations (useroffset and
> usersize) to define allowed usercopy regions.

This is a great improvement over the old system
of having a few whitelisted kmalloc caches, and
bounce buffering to copy data from caches that
are not whitelisted!

I like it.

--
All rights reversed

signature.asc

Kees Cook

unread,

Jun 20, 2017, 7:00:05 PM6/20/17

to

Good point about changes to heap grooming; I'll adjust the commit log.

-Kees

--
Kees Cook
Pixel Security

Kees Cook

unread,

Jun 20, 2017, 7:10:04 PM6/20/17

to

Some hardened environments want to build kernels with slab_nomerge
already set (so that they do not depend on remembering to set the kernel
command line option). This is desired to reduce the risk of kernel heap

overflows being able to overwrite objects from merged caches and changes
the requirements for cache layout control, increasing the difficulty of

these attacks. By keeping caches unmerged, these kinds of exploits can
usually only damage objects in the same cache (though the risk to metadata
exploitation is unchanged).

Cc: Daniel Micay <danie...@gmail.com>
Cc: David Windsor <da...@nullcore.net>
Cc: Eric Biggers <ebig...@gmail.com>

Signed-off-by: Kees Cook <kees...@chromium.org>
---

v2: split out of slab whitelisting series
---
Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++--
init/Kconfig | 14 ++++++++++++++
mm/slab_common.c | 5 ++---
3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7737ab5d04b2..94d8b8195cb8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3715,8 +3715,14 @@
slab_nomerge [MM]
Disable merging of slabs with similar size. May be
necessary if there is some reason to distinguish
- allocs to different slabs. Debug options disable
- merging on their own.
+ allocs to different slabs, especially in hardened
+ environments where the risk of heap overflows and
+ layout control by attackers can usually be
+ frustrated by disabling merging. This will reduce
+ most of the exposure of a heap attack to a single
+ cache (risks via metadata attacks are mostly
+ unchanged). Debug options disable merging on their
+ own.
For more information see Documentation/vm/slub.txt.

slab_max_order= [MM, SLAB]
diff --git a/init/Kconfig b/init/Kconfig
index 1d3475fc9496..ce813acf2f4f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1891,6 +1891,20 @@ config SLOB

endchoice

+config SLAB_MERGE_DEFAULT
+ bool "Allow slab caches to be merged"
+ default y
+ help
+ For reduced kernel memory fragmentation, slab caches can be
+ merged when they share the same size and other characteristics.

+ This carries a risk of kernel heap overflows being able to
+ overwrite objects from merged caches (and more easily control
+ cache layout), which makes such heap attacks easier to exploit
+ by attackers. By keeping caches unmerged, these kinds of exploits
+ can usually only damage objects in the same cache. To disable
+ merging at runtime, "slab_nomerge" can be passed on the kernel
+ command line.
+
config SLAB_FREELIST_RANDOM
default n
depends on SLAB || SLUB
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 01a0fe2eb332..904a83be82de 100644

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -47,13 +47,12 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,

/*
* Merge control. If this is set then no merging of slab caches will occur.
- * (Could be removed. This was introduced to pacify the merge skeptics.)
*/
-static int slab_nomerge;
+static bool slab_nomerge = !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT);

static int __init setup_slab_nomerge(char *str)
{
- slab_nomerge = 1;
+ slab_nomerge = true;
return 1;
}

--
2.7.4

Randy Dunlap

unread,

Jun 20, 2017, 7:20:05 PM6/20/17

to

On 06/20/2017 04:09 PM, Kees Cook wrote:
> Some hardened environments want to build kernels with slab_nomerge
> already set (so that they do not depend on remembering to set the kernel
> command line option). This is desired to reduce the risk of kernel heap
> overflows being able to overwrite objects from merged caches and changes
> the requirements for cache layout control, increasing the difficulty of
> these attacks. By keeping caches unmerged, these kinds of exploits can
> usually only damage objects in the same cache (though the risk to metadata
> exploitation is unchanged).
>
> Cc: Daniel Micay <danie...@gmail.com>
> Cc: David Windsor <da...@nullcore.net>
> Cc: Eric Biggers <ebig...@gmail.com>
> Signed-off-by: Kees Cook <kees...@chromium.org>
> ---
> v2: split out of slab whitelisting series
> ---
> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++--
> init/Kconfig | 14 ++++++++++++++
> mm/slab_common.c | 5 ++---
> 3 files changed, 24 insertions(+), 5 deletions(-)

> diff --git a/init/Kconfig b/init/Kconfig
> index 1d3475fc9496..ce813acf2f4f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1891,6 +1891,20 @@ config SLOB
>
> endchoice
>
> +config SLAB_MERGE_DEFAULT
> + bool "Allow slab caches to be merged"
> + default y
> + help
> + For reduced kernel memory fragmentation, slab caches can be
> + merged when they share the same size and other characteristics.
> + This carries a risk of kernel heap overflows being able to
> + overwrite objects from merged caches (and more easily control
> + cache layout), which makes such heap attacks easier to exploit
> + by attackers. By keeping caches unmerged, these kinds of exploits
> + can usually only damage objects in the same cache. To disable
> + merging at runtime, "slab_nomerge" can be passed on the kernel
> + command line.

command line or this option can be disabled in the kernel config.

> +
> config SLAB_FREELIST_RANDOM
> default n
> depends on SLAB || SLUB

--
~Randy

Kees Cook

unread,

Jun 20, 2017, 7:20:06 PM6/20/17

to

I've split it out and updated the docs, thanks!

-Kees

Kees Cook

unread,

Jun 20, 2017, 7:30:05 PM6/20/17

to

Isn't that implicit in that it is Kconfig help text? Happy to add it,
but seems redundant to me.

-Kees

>
>> +
>> config SLAB_FREELIST_RANDOM
>> default n
>> depends on SLAB || SLUB
>
> --
> ~Randy

Randy Dunlap

unread,

Jun 20, 2017, 8:10:06 PM6/20/17

to

Just trying for completeness instead of being implicit.

>
>>
>>> +
>>> config SLAB_FREELIST_RANDOM
>>> default n
>>> depends on SLAB || SLUB
>>
>> --
>> ~Randy
>
>
>

--

~Randy

Michal Hocko

unread,

Jun 23, 2017, 10:10:09 AM6/23/17

to

On Tue 20-06-17 16:09:11, Kees Cook wrote:
> Some hardened environments want to build kernels with slab_nomerge
> already set (so that they do not depend on remembering to set the kernel
> command line option). This is desired to reduce the risk of kernel heap
> overflows being able to overwrite objects from merged caches and changes
> the requirements for cache layout control, increasing the difficulty of
> these attacks. By keeping caches unmerged, these kinds of exploits can
> usually only damage objects in the same cache (though the risk to metadata
> exploitation is unchanged).

Do we really want to have a dedicated config for each hardening specific
kernel command line? I believe we have quite a lot of config options
already. Can we rather have a CONFIG_HARDENED_CMD_OPIONS and cover all
those defauls there instead?

> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majo...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"do...@kvack.org"> em...@kvack.org </a>

--
Michal Hocko
SUSE Labs

Kees Cook

unread,

Jun 23, 2017, 3:30:05 PM6/23/17

to

On Fri, Jun 23, 2017 at 7:06 AM, Michal Hocko <mho...@kernel.org> wrote:
> On Tue 20-06-17 16:09:11, Kees Cook wrote:
>> Some hardened environments want to build kernels with slab_nomerge
>> already set (so that they do not depend on remembering to set the kernel
>> command line option). This is desired to reduce the risk of kernel heap
>> overflows being able to overwrite objects from merged caches and changes
>> the requirements for cache layout control, increasing the difficulty of
>> these attacks. By keeping caches unmerged, these kinds of exploits can
>> usually only damage objects in the same cache (though the risk to metadata
>> exploitation is unchanged).
>
> Do we really want to have a dedicated config for each hardening specific
> kernel command line? I believe we have quite a lot of config options
> already. Can we rather have a CONFIG_HARDENED_CMD_OPIONS and cover all
> those defauls there instead?

There's not been a lot of success with grouped Kconfigs in the past
(e.g. CONFIG_EXPERIMENTAL), but one thing that has been suggested is a
defconfig-like make target that would collect all the things together.
I haven't had time for that, but that would let us group the various
configs.

Additionally, using something like CONFIG_CMDLINE seems a little clunky to me.

-Kees

Michal Hocko

unread,

Jun 26, 2017, 5:10:06 AM6/26/17

to

On Fri 23-06-17 12:20:25, Kees Cook wrote:
> On Fri, Jun 23, 2017 at 7:06 AM, Michal Hocko <mho...@kernel.org> wrote:
> > On Tue 20-06-17 16:09:11, Kees Cook wrote:
> >> Some hardened environments want to build kernels with slab_nomerge
> >> already set (so that they do not depend on remembering to set the kernel
> >> command line option). This is desired to reduce the risk of kernel heap
> >> overflows being able to overwrite objects from merged caches and changes
> >> the requirements for cache layout control, increasing the difficulty of
> >> these attacks. By keeping caches unmerged, these kinds of exploits can
> >> usually only damage objects in the same cache (though the risk to metadata
> >> exploitation is unchanged).
> >
> > Do we really want to have a dedicated config for each hardening specific
> > kernel command line? I believe we have quite a lot of config options
> > already. Can we rather have a CONFIG_HARDENED_CMD_OPIONS and cover all
> > those defauls there instead?
>
> There's not been a lot of success with grouped Kconfigs in the past
> (e.g. CONFIG_EXPERIMENTAL), but one thing that has been suggested is a
> defconfig-like make target that would collect all the things together.

Which wouldn't reduce the number of config options, would it? I don't
know but is there any usecase when somebody wants to have hardened
kernel and still want to have different defaults than you are
suggesting?

Kees Cook

unread,

Jun 28, 2017, 2:00:12 PM6/28/17

to

On Mon, Jun 19, 2017 at 9:04 PM, Eric Biggers <ebig...@gmail.com> wrote:
> Hi David + Kees,
>
> On Mon, Jun 19, 2017 at 04:36:35PM -0700, Kees Cook wrote:
>> With all known usercopied cache whitelists now defined in the kernel, switch
>> the default usercopy region of kmem_cache_create() to size 0. Any new caches
>> with usercopy regions will now need to use kmem_cache_create_usercopy()
>> instead of kmem_cache_create().
>>
>
> While I'd certainly like to see the caches be whitelisted, it needs to be made
> very clear that it's being done (the cover letter for this series falsely claims
> that kmem_cache_create() is unchanged) and what the consequences are. Is there

Well, in the first patch it is semantically unchanged: calls to
kmem_cache_create() after the first patch whitelist the entire cache
object. Only from this patch on does it change behavior to no longer
whitelist the object.

> any specific plan for identifying caches that were missed? If it's expected for

The plan for finding caches needing whitelisting is mainly code audit
and operational testing. Encountering it is quite loud in that it BUGs
the kernel during the hardened usercopy checks.

> people to just fix them as they are found, then they need to be helped a little
> --- at the very least by putting a big comment above report_usercopy() that
> explains the possible reasons why the error might have triggered and what to do
> about it.

That sounds reasonable. It should have a comment even for the existing
protections.

Thanks!

Kees Cook

unread,

Jun 28, 2017, 2:00:13 PM6/28/17

to

On Mon, Jun 19, 2017 at 9:08 PM, Eric Biggers <ebig...@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 04:36:31PM -0700, Kees Cook wrote:
>> From: David Windsor <da...@nullcore.net>
>>
>> When a dentry name is short enough, it can be stored directly in
>> the dentry itself. These dentry short names, stored in struct
>> dentry.d_iname and therefore contained in the dentry_cache slab cache,
>> need to be coped to/from userspace.
>>
>> In support of usercopy hardening, this patch defines a region in
>> the dentry_cache slab cache in which userspace copy operations
>> are allowed.
>>
>> This region is known as the slab cache's usercopy region. Slab
>> caches can now check that each copy operation involving cache-managed
>> memory falls entirely within the slab's usercopy region.
>>
>> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
>> whitelisting code in the last public patch of grsecurity/PaX based on my
>> understanding of the code. Changes or omissions from the original code are
>> mine and don't reflect the original grsecurity/PaX code.
>>
>
> For all these patches please mention *where* the data is being copied to/from
> userspace.

Can you explain what you mean here? The field being copied is already
mentioned in the commit log; do you mean where in the kernel source
does the copy happen?

>
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index a48f54238273..97f4a0117b3b 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -151,6 +151,11 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
>> sizeof(struct __struct), __alignof__(struct __struct),\
>> (__flags), NULL)
>>
>> +#define KMEM_CACHE_USERCOPY(__struct, __flags, __field) kmem_cache_create_usercopy(#__struct,\
>> + sizeof(struct __struct), __alignof__(struct __struct),\
>> + (__flags), offsetof(struct __struct, __field),\
>> + sizeof_field(struct __struct, __field), NULL)
>> +
>
> This helper macro should be added in the patch which adds
> kmem_cache_create_usercopy(), not in this one.

It got moved here since this was the only user of this function and
there was already enough happening in the first patch. But yes,
probably it should stay with the first patch. It can be moved.

Eric Biggers

unread,

Jun 28, 2017, 2:00:19 PM6/28/17

to

On Wed, Jun 28, 2017 at 09:44:13AM -0700, Kees Cook wrote:
> On Mon, Jun 19, 2017 at 9:08 PM, Eric Biggers <ebig...@gmail.com> wrote:
> > On Mon, Jun 19, 2017 at 04:36:31PM -0700, Kees Cook wrote:
> >> From: David Windsor <da...@nullcore.net>
> >>
> >> When a dentry name is short enough, it can be stored directly in
> >> the dentry itself. These dentry short names, stored in struct
> >> dentry.d_iname and therefore contained in the dentry_cache slab cache,
> >> need to be coped to/from userspace.
> >>
> >> In support of usercopy hardening, this patch defines a region in
> >> the dentry_cache slab cache in which userspace copy operations
> >> are allowed.
> >>
> >> This region is known as the slab cache's usercopy region. Slab
> >> caches can now check that each copy operation involving cache-managed
> >> memory falls entirely within the slab's usercopy region.
> >>
> >> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> >> whitelisting code in the last public patch of grsecurity/PaX based on my
> >> understanding of the code. Changes or omissions from the original code are
> >> mine and don't reflect the original grsecurity/PaX code.
> >>
> >
> > For all these patches please mention *where* the data is being copied to/from
> > userspace.
>
> Can you explain what you mean here? The field being copied is already
> mentioned in the commit log; do you mean where in the kernel source
> does the copy happen?
>

Yes, for the ones where it isn't obvious, mentioning a syscall or ioctl might be
sufficient. Others may need more explanation.

Eric