[PATCH v1 00/11] mm/kasan: support per-page shadow memory to reduce memory consumption

js1...@gmail.com

May 15, 2017, 9:17:35 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

Hello, all.

This is an attempt to reduce the memory consumption of KASAN. Please
see the following description for more information.

1. What is per-page shadow memory

This patch introduces infrastructure to support per-page shadow memory.
Per-page shadow memory is the same as the original shadow memory except
for its granularity: one byte of per-page shadow holds the shadow value
for an entire page. The purpose of introducing this new shadow memory
is to reduce memory consumption.
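As a rough sketch of the two granularities (the constants and helper
names below are illustrative, not the actual values from this series),
both mappings are a shift plus an offset; only the shift differs:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; real offsets are chosen per architecture. */
#define PAGE_SHIFT               12          /* 4 KB pages */
#define KASAN_SHADOW_SCALE_SHIFT 3           /* 8 bytes of memory per shadow byte */
#define SHADOW_OFFSET            0x100000UL  /* hypothetical */
#define PSHADOW_OFFSET           0x200000UL  /* hypothetical */

/* Original shadow: one shadow byte covers 8 bytes of memory. */
static uintptr_t mem_to_shadow(uintptr_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Per-page shadow: one shadow byte covers a whole page. */
static uintptr_t mem_to_pshadow(uintptr_t addr)
{
	return (addr >> PAGE_SHIFT) + PSHADOW_OFFSET;
}
```

Every address within a page shares a single per-page shadow byte, which
is why only page-granularity states can be expressed there.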

2. Problem of current approach

Until now, KASAN has needed shadow memory for the entire memory range,
so the amount of statically allocated memory is very large. As a
result, KASAN cannot run on systems with tight memory constraints. Even
when KASAN can run, its large memory consumption changes the behaviour
of the workload, so we cannot validate the exact situation we want to
check.

3. How does this patch fix the problem

This patchset tries to fix the problem by reducing the memory consumed
by the shadow memory. It is based on two observations.

1) The type of memory usage can be distinguished well.
2) Shadow memory is manipulated/checked at byte granularity only for
slab objects, kernel stacks and global variables. Shadow memory for
other use cases just shows KASAN_FREE_PAGE or 0 (meaning valid) at page
granularity.

With these two observations, I came up with an optimized way to support
the KASAN feature.

1) Introduce a per-page shadow that covers all of memory.
2) Check the validity of an access through the per-page shadow, except
when the accessed object is a slab object, kernel stack or global
variable.
3) For those byte-accessible types of object, allocate/map the original
shadow on demand and check the validity of the access through the
original shadow.

While the original shadow statically consumes 1/8 of total memory, the
per-page shadow statically consumes only 1/PAGE_SIZE of it. Extra
memory is required at runtime, on demand, for slab objects, kernel
stacks and global variables; however, it would not be larger than
before.
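A quick back-of-the-envelope check of the static cost (assuming 4 KB
pages and the usual 1:8 shadow scale):

```c
#include <assert.h>

/* Static original-shadow cost: one shadow byte per 8 bytes of memory. */
static unsigned long orig_shadow_bytes(unsigned long mem_bytes)
{
	return mem_bytes >> 3;
}

/* Static per-page-shadow cost: one shadow byte per 4 KB page. */
static unsigned long pshadow_bytes(unsigned long mem_bytes)
{
	return mem_bytes >> 12;
}
```

For 1 GB of memory that is 128 MB of original shadow versus 256 KB of
per-page shadow; the difference is what shows up as the MemTotal gain
below.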

4. Result

The following is the memory consumption measured on my QEMU system.
'runtime' shows the maximum memory usage for on-demand shadow
allocation during a kernel build workload.

base vs patched

MemTotal: 858 MB vs 987 MB
runtime: 0 MB vs 30 MB
Net Available: 858 MB vs 957 MB

For a 4096 MB QEMU system:

MemTotal: 3477 MB vs 4000 MB
runtime: 0 MB vs 50 MB
Net Available: 3477 MB vs 3950 MB

base vs patched (2048 MB QEMU system)
204 s vs 224 s

Thanks.

Joonsoo Kim (11):
mm/kasan: rename XXX_is_zero to XXX_is_nonzero
mm/kasan: don't fetch the next shadow value speculatively
mm/kasan: handle unaligned end address in zero_pte_populate
mm/kasan: extend kasan_populate_zero_shadow()
mm/kasan: introduce per-page shadow memory infrastructure
mm/kasan: mark/unmark the target range that is for original shadow
memory
x86/kasan: use per-page shadow memory
mm/kasan: support on-demand shadow allocation/mapping
x86/kasan: support on-demand shadow mapping
mm/kasan: support dynamic shadow memory free
mm/kasan: change the order of shadow memory check

arch/arm64/mm/kasan_init.c | 17 +-
arch/x86/include/asm/kasan.h | 8 +
arch/x86/include/asm/processor.h | 4 +
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/setup_percpu.c | 2 +
arch/x86/mm/kasan_init_64.c | 191 ++++++++++++--
include/linux/kasan.h | 71 ++++-
kernel/fork.c | 7 +
mm/kasan/kasan.c | 555 +++++++++++++++++++++++++++++++++------
mm/kasan/kasan.h | 22 +-
mm/kasan/kasan_init.c | 158 ++++++++---
mm/kasan/report.c | 28 ++
mm/page_alloc.c | 10 +
mm/slab.c | 9 +
mm/slab_common.c | 11 +-
mm/slub.c | 8 +
16 files changed, 957 insertions(+), 148 deletions(-)

--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:39 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

These functions return a positive value, that is, true, when a non-zero
value is found. Rename them to reduce confusion.
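A simplified model of the helper shows why the old name misled (the
real function walks the shadow bytes and returns the address of the
first non-zero one):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Formerly named bytes_is_zero(): despite that name, it returns a
 * non-zero value (the address of the offending byte) precisely when a
 * non-zero byte IS found, and 0 when the range is all zero.
 */
static uintptr_t bytes_is_nonzero(const uint8_t *start, size_t size)
{
	while (size) {
		if (*start)
			return (uintptr_t)start;
		start++;
		size--;
	}
	return 0;
}
```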

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index c81549d..85ee45b0 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -224,7 +224,7 @@ static __always_inline bool memory_is_poisoned_16(unsigned long addr)
return false;
}

-static __always_inline unsigned long bytes_is_zero(const u8 *start,
+static __always_inline unsigned long bytes_is_nonzero(const u8 *start,
size_t size)
{
while (size) {
@@ -237,7 +237,7 @@ static __always_inline unsigned long bytes_is_zero(const u8 *start,
return 0;
}

-static __always_inline unsigned long memory_is_zero(const void *start,
+static __always_inline unsigned long memory_is_nonzero(const void *start,
const void *end)
{
unsigned int words;
@@ -245,11 +245,11 @@ static __always_inline unsigned long memory_is_zero(const void *start,
unsigned int prefix = (unsigned long)start % 8;

if (end - start <= 16)
- return bytes_is_zero(start, end - start);
+ return bytes_is_nonzero(start, end - start);

if (prefix) {
prefix = 8 - prefix;
- ret = bytes_is_zero(start, prefix);
+ ret = bytes_is_nonzero(start, prefix);
if (unlikely(ret))
return ret;
start += prefix;
@@ -258,12 +258,12 @@ static __always_inline unsigned long memory_is_zero(const void *start,
words = (end - start) / 8;
while (words) {
if (unlikely(*(u64 *)start))
- return bytes_is_zero(start, 8);
+ return bytes_is_nonzero(start, 8);
start += 8;
words--;
}

- return bytes_is_zero(start, (end - start) % 8);
+ return bytes_is_nonzero(start, (end - start) % 8);
}

static __always_inline bool memory_is_poisoned_n(unsigned long addr,
@@ -271,7 +271,7 @@ static __always_inline bool memory_is_poisoned_n(unsigned long addr,
{
unsigned long ret;

- ret = memory_is_zero(kasan_mem_to_shadow((void *)addr),
+ ret = memory_is_nonzero(kasan_mem_to_shadow((void *)addr),
kasan_mem_to_shadow((void *)addr + size - 1) + 1);

if (unlikely(ret)) {
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:42 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

Fetching the next shadow value speculatively has pros and cons. If the
shadow bytes are zero, we can exit the check with a single branch.
However, it can cause unaligned accesses, and if the next shadow value
isn't zero, we need to do an additional check. The next shadow value
can be non-zero for various reasons.

Moreover, a following patch will introduce on-demand shadow memory
allocation/mapping, and this speculative fetch would cause more stale
TLB cases.

So I think the side effects outweigh the benefit. This patch removes
the speculative fetch.
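To make the unaligned-access concern concrete: the removed fast path
for a 2-byte access loaded two shadow bytes at once through a u16
pointer, and that load is unaligned whenever the first byte's shadow
address is odd. A hypothetical sketch (shadow offset left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3  /* 8 bytes of memory per shadow byte */

/* Shadow address of a memory address, with the offset omitted. */
static uintptr_t mem_to_shadow(uintptr_t addr)
{
	return addr >> KASAN_SHADOW_SCALE_SHIFT;
}

/*
 * The old memory_is_poisoned_2() did *(u16 *)mem_to_shadow(addr).
 * That u16 load is unaligned for any access starting in an
 * odd-numbered 8-byte granule.
 */
static bool speculative_u16_fetch_is_unaligned(uintptr_t addr)
{
	return mem_to_shadow(addr) & 1;
}
```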

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan.c | 104 +++++++++++++++++++++++--------------------------------
1 file changed, 44 insertions(+), 60 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 85ee45b0..97d3560 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -136,90 +136,74 @@ static __always_inline bool memory_is_poisoned_1(unsigned long addr)

static __always_inline bool memory_is_poisoned_2(unsigned long addr)
{
- u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr);
-
- if (unlikely(*shadow_addr)) {
- if (memory_is_poisoned_1(addr + 1))
- return true;
-
- /*
- * If single shadow byte covers 2-byte access, we don't
- * need to do anything more. Otherwise, test the first
- * shadow byte.
- */
- if (likely(((addr + 1) & KASAN_SHADOW_MASK) != 0))
- return false;
+ if (unlikely(memory_is_poisoned_1(addr)))
+ return true;

- return unlikely(*(u8 *)shadow_addr);
- }
+ /*
+ * If single shadow byte covers 2-byte access, we don't
+ * need to do anything more. Otherwise, test the first
+ * shadow byte.
+ */
+ if (likely(((addr + 1) & KASAN_SHADOW_MASK) != 0))
+ return false;

- return false;
+ return memory_is_poisoned_1(addr + 1);
}

static __always_inline bool memory_is_poisoned_4(unsigned long addr)
{
- u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr);
-
- if (unlikely(*shadow_addr)) {
- if (memory_is_poisoned_1(addr + 3))
- return true;
-
- /*
- * If single shadow byte covers 4-byte access, we don't
- * need to do anything more. Otherwise, test the first
- * shadow byte.
- */
- if (likely(((addr + 3) & KASAN_SHADOW_MASK) >= 3))
- return false;
+ if (unlikely(memory_is_poisoned_1(addr + 3)))
+ return true;

- return unlikely(*(u8 *)shadow_addr);
- }
+ /*
+ * If single shadow byte covers 4-byte access, we don't
+ * need to do anything more. Otherwise, test the first
+ * shadow byte.
+ */
+ if (likely(((addr + 3) & KASAN_SHADOW_MASK) >= 3))
+ return false;

- return false;
+ return memory_is_poisoned_1(addr);
}

static __always_inline bool memory_is_poisoned_8(unsigned long addr)
{
- u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr);
+ u8 *shadow_addr = (u8 *)kasan_mem_to_shadow((void *)addr);

- if (unlikely(*shadow_addr)) {
- if (memory_is_poisoned_1(addr + 7))
- return true;
+ if (unlikely(*shadow_addr))
+ return true;

- /*
- * If single shadow byte covers 8-byte access, we don't
- * need to do anything more. Otherwise, test the first
- * shadow byte.
- */
- if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE)))
- return false;
+ /*
+ * If single shadow byte covers 8-byte access, we don't
+ * need to do anything more. Otherwise, test the first
+ * shadow byte.
+ */
+ if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE)))
+ return false;

- return unlikely(*(u8 *)shadow_addr);
- }
+ if (unlikely(memory_is_poisoned_1(addr + 7)))
+ return true;

return false;
}

static __always_inline bool memory_is_poisoned_16(unsigned long addr)
{
- u32 *shadow_addr = (u32 *)kasan_mem_to_shadow((void *)addr);
-
- if (unlikely(*shadow_addr)) {
- u16 shadow_first_bytes = *(u16 *)shadow_addr;
+ u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr);

- if (unlikely(shadow_first_bytes))
- return true;
+ if (unlikely(*shadow_addr))
+ return true;

- /*
- * If two shadow bytes covers 16-byte access, we don't
- * need to do anything more. Otherwise, test the last
- * shadow byte.
- */
- if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE)))
- return false;
+ /*
+ * If two shadow bytes covers 16-byte access, we don't
+ * need to do anything more. Otherwise, test the last
+ * shadow byte.
+ */
+ if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE)))
+ return false;

- return memory_is_poisoned_1(addr + 15);
- }
+ if (unlikely(memory_is_poisoned_1(addr + 15)))
+ return true;

return false;
}
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:46 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

zero_pte_populate() doesn't handle an unaligned end address, so the
last pte may not be initialized. Fix it.

Note that this shadow memory can be used by others, so map an actual
page in this case.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan_init.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index 554e4c0..48559d9 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -61,6 +61,14 @@ static void __init zero_pte_populate(pmd_t *pmd, unsigned long addr,
addr += PAGE_SIZE;
pte = pte_offset_kernel(pmd, addr);
}
+
+ if (addr == end)
+ return;
+
+ /* Population for unaligned end address */
+ zero_pte = pfn_pte(PFN_DOWN(
+ __pa(early_alloc(PAGE_SIZE, NUMA_NO_NODE))), PAGE_KERNEL);
+ set_pte_at(&init_mm, addr, pte, zero_pte);
}

static void __init zero_pmd_populate(pud_t *pud, unsigned long addr,
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:50 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

In a following patch, per-page shadow memory will be introduced: some
ranges will be checked via the per-page shadow and the others via the
original shadow. To mark the range type, the per-page shadow will be
mapped with a page filled with a special shadow value,
KASAN_PER_PAGE_BYPASS. Using an actual page for each such mapping would
waste memory, so this patch introduces the black shadow page, which is
conceptually similar to the zero shadow page. This patch also extends
kasan_populate_zero_shadow() to handle/map the black shadow page.

In addition, this patch adds a 'private' argument to this function to
force population of intermediate-level page tables. It will also be
used by a following patch to reduce memory consumption.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
arch/arm64/mm/kasan_init.c | 17 +++---
arch/x86/mm/kasan_init_64.c | 15 +++---
include/linux/kasan.h | 11 +++-
mm/kasan/kasan_init.c | 123 ++++++++++++++++++++++++++++++--------------
4 files changed, 112 insertions(+), 54 deletions(-)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 687a358..f60b74d 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -168,21 +168,24 @@ void __init kasan_init(void)
* vmemmap_populate() has populated the shadow region that covers the
* kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round
* the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent
- * kasan_populate_zero_shadow() from replacing the page table entries
+ * kasan_populate_shadow() from replacing the page table entries
* (PMD or PTE) at the edges of the shadow region for the kernel
* image.
*/
kimg_shadow_start = round_down(kimg_shadow_start, SWAPPER_BLOCK_SIZE);
kimg_shadow_end = round_up(kimg_shadow_end, SWAPPER_BLOCK_SIZE);

- kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
- (void *)mod_shadow_start);
- kasan_populate_zero_shadow((void *)kimg_shadow_end,
- kasan_mem_to_shadow((void *)PAGE_OFFSET));
+ kasan_populate_shadow((void *)KASAN_SHADOW_START,
+ (void *)mod_shadow_start,
+ true, false);
+ kasan_populate_shadow((void *)kimg_shadow_end,
+ kasan_mem_to_shadow((void *)PAGE_OFFSET),
+ true, false);

if (kimg_shadow_start > mod_shadow_end)
- kasan_populate_zero_shadow((void *)mod_shadow_end,
- (void *)kimg_shadow_start);
+ kasan_populate_shadow((void *)mod_shadow_end,
+ (void *)kimg_shadow_start,
+ true, false);

for_each_memblock(memory, reg) {
void *start = (void *)__phys_to_virt(reg->base);
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0c7d812..adc673b 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -127,8 +127,9 @@ void __init kasan_init(void)

clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);

- kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
- kasan_mem_to_shadow((void *)PAGE_OFFSET));
+ kasan_populate_shadow((void *)KASAN_SHADOW_START,
+ kasan_mem_to_shadow((void *)PAGE_OFFSET),
+ true, false);

for (i = 0; i < E820_MAX_ENTRIES; i++) {
if (pfn_mapped[i].end == 0)
@@ -137,16 +138,18 @@ void __init kasan_init(void)
if (map_range(&pfn_mapped[i]))
panic("kasan: unable to allocate shadow!");
}
- kasan_populate_zero_shadow(
+ kasan_populate_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
- kasan_mem_to_shadow((void *)__START_KERNEL_map));
+ kasan_mem_to_shadow((void *)__START_KERNEL_map),
+ true, false);

vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext),
(unsigned long)kasan_mem_to_shadow(_end),
NUMA_NO_NODE);

- kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
- (void *)KASAN_SHADOW_END);
+ kasan_populate_shadow(kasan_mem_to_shadow((void *)MODULES_END),
+ (void *)KASAN_SHADOW_END,
+ true, false);

load_cr3(init_level4_pgt);
__flush_tlb_all();
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index a5c7046..7e501b3 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -21,8 +21,15 @@ extern pmd_t kasan_zero_pmd[PTRS_PER_PMD];
extern pud_t kasan_zero_pud[PTRS_PER_PUD];
extern p4d_t kasan_zero_p4d[PTRS_PER_P4D];

-void kasan_populate_zero_shadow(const void *shadow_start,
- const void *shadow_end);
+extern unsigned char kasan_black_page[PAGE_SIZE];
+extern pte_t kasan_black_pte[PTRS_PER_PTE];
+extern pmd_t kasan_black_pmd[PTRS_PER_PMD];
+extern pud_t kasan_black_pud[PTRS_PER_PUD];
+extern p4d_t kasan_black_p4d[PTRS_PER_P4D];
+
+void kasan_populate_shadow(const void *shadow_start,
+ const void *shadow_end,
+ bool zero, bool private);

static inline void *kasan_mem_to_shadow(const void *addr)
{
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index 48559d9..cd0a551 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -21,6 +21,8 @@
#include <asm/page.h>
#include <asm/pgalloc.h>

+#include "kasan.h"
+
/*
* This page serves two purposes:
* - It used as early shadow memory. The entire shadow region populated
@@ -30,16 +32,26 @@
*/
unsigned char kasan_zero_page[PAGE_SIZE] __page_aligned_bss;

+/*
+ * The shadow memory range that this page is mapped will be considered
+ * to be checked later by another shadow memory.
+ */
+unsigned char kasan_black_page[PAGE_SIZE] __page_aligned_bss;
+
#if CONFIG_PGTABLE_LEVELS > 4
p4d_t kasan_zero_p4d[PTRS_PER_P4D] __page_aligned_bss;
+p4d_t kasan_black_p4d[PTRS_PER_P4D] __page_aligned_bss;
#endif
#if CONFIG_PGTABLE_LEVELS > 3
pud_t kasan_zero_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_black_pud[PTRS_PER_PUD] __page_aligned_bss;
#endif
#if CONFIG_PGTABLE_LEVELS > 2
pmd_t kasan_zero_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_black_pmd[PTRS_PER_PMD] __page_aligned_bss;
#endif
pte_t kasan_zero_pte[PTRS_PER_PTE] __page_aligned_bss;
+pte_t kasan_black_pte[PTRS_PER_PTE] __page_aligned_bss;

static __init void *early_alloc(size_t size, int node)
{
@@ -47,32 +59,38 @@ static __init void *early_alloc(size_t size, int node)
BOOTMEM_ALLOC_ACCESSIBLE, node);
}

-static void __init zero_pte_populate(pmd_t *pmd, unsigned long addr,
- unsigned long end)
+static void __init kasan_pte_populate(pmd_t *pmd, unsigned long addr,
+ unsigned long end, bool zero)
{
- pte_t *pte = pte_offset_kernel(pmd, addr);
- pte_t zero_pte;
+ pte_t *ptep = pte_offset_kernel(pmd, addr);
+ pte_t pte;
+ unsigned char *page;

- zero_pte = pfn_pte(PFN_DOWN(__pa_symbol(kasan_zero_page)), PAGE_KERNEL);
- zero_pte = pte_wrprotect(zero_pte);
+ pte = pfn_pte(PFN_DOWN(zero ?
+ __pa_symbol(kasan_zero_page) : __pa_symbol(kasan_black_page)),
+ PAGE_KERNEL);
+ pte = pte_wrprotect(pte);

while (addr + PAGE_SIZE <= end) {
- set_pte_at(&init_mm, addr, pte, zero_pte);
+ set_pte_at(&init_mm, addr, ptep, pte);
addr += PAGE_SIZE;
- pte = pte_offset_kernel(pmd, addr);
+ ptep = pte_offset_kernel(pmd, addr);
}

if (addr == end)
return;

/* Population for unaligned end address */
- zero_pte = pfn_pte(PFN_DOWN(
- __pa(early_alloc(PAGE_SIZE, NUMA_NO_NODE))), PAGE_KERNEL);
- set_pte_at(&init_mm, addr, pte, zero_pte);
+ page = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+ if (!zero)
+ __memcpy(page, kasan_black_page, end - addr);
+
+ pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
+ set_pte_at(&init_mm, addr, ptep, pte);
}

-static void __init zero_pmd_populate(pud_t *pud, unsigned long addr,
- unsigned long end)
+static void __init kasan_pmd_populate(pud_t *pud, unsigned long addr,
+ unsigned long end, bool zero, bool private)
{
pmd_t *pmd = pmd_offset(pud, addr);
unsigned long next;
@@ -80,8 +98,11 @@ static void __init zero_pmd_populate(pud_t *pud, unsigned long addr,
do {
next = pmd_addr_end(addr, end);

- if (IS_ALIGNED(addr, PMD_SIZE) && end - addr >= PMD_SIZE) {
- pmd_populate_kernel(&init_mm, pmd, lm_alias(kasan_zero_pte));
+ if (IS_ALIGNED(addr, PMD_SIZE) && end - addr >= PMD_SIZE &&
+ !private) {
+ pmd_populate_kernel(&init_mm, pmd,
+ zero ? lm_alias(kasan_zero_pte) :
+ lm_alias(kasan_black_pte));
continue;
}

@@ -89,24 +110,30 @@ static void __init zero_pmd_populate(pud_t *pud, unsigned long addr,
pmd_populate_kernel(&init_mm, pmd,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
- zero_pte_populate(pmd, addr, next);
+
+ kasan_pte_populate(pmd, addr, next, zero);
} while (pmd++, addr = next, addr != end);
}

-static void __init zero_pud_populate(p4d_t *p4d, unsigned long addr,
- unsigned long end)
+static void __init kasan_pud_populate(p4d_t *p4d, unsigned long addr,
+ unsigned long end, bool zero, bool private)
{
pud_t *pud = pud_offset(p4d, addr);
unsigned long next;

do {
next = pud_addr_end(addr, end);
- if (IS_ALIGNED(addr, PUD_SIZE) && end - addr >= PUD_SIZE) {
+ if (IS_ALIGNED(addr, PUD_SIZE) && end - addr >= PUD_SIZE &&
+ !private) {
pmd_t *pmd;

- pud_populate(&init_mm, pud, lm_alias(kasan_zero_pmd));
+ pud_populate(&init_mm, pud,
+ zero ? lm_alias(kasan_zero_pmd) :
+ lm_alias(kasan_black_pmd));
pmd = pmd_offset(pud, addr);
- pmd_populate_kernel(&init_mm, pmd, lm_alias(kasan_zero_pte));
+ pmd_populate_kernel(&init_mm, pmd,
+ zero ? lm_alias(kasan_zero_pte) :
+ lm_alias(kasan_black_pte));
continue;
}

@@ -114,28 +141,34 @@ static void __init zero_pud_populate(p4d_t *p4d, unsigned long addr,
pud_populate(&init_mm, pud,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
- zero_pmd_populate(pud, addr, next);
+ kasan_pmd_populate(pud, addr, next, zero, private);
} while (pud++, addr = next, addr != end);
}

-static void __init zero_p4d_populate(pgd_t *pgd, unsigned long addr,
- unsigned long end)
+static void __init kasan_p4d_populate(pgd_t *pgd, unsigned long addr,
+ unsigned long end, bool zero, bool private)
{
p4d_t *p4d = p4d_offset(pgd, addr);
unsigned long next;

do {
next = p4d_addr_end(addr, end);
- if (IS_ALIGNED(addr, P4D_SIZE) && end - addr >= P4D_SIZE) {
+ if (IS_ALIGNED(addr, P4D_SIZE) && end - addr >= P4D_SIZE &&
+ !private) {
pud_t *pud;
pmd_t *pmd;

- p4d_populate(&init_mm, p4d, lm_alias(kasan_zero_pud));
+ p4d_populate(&init_mm, p4d,
+ zero ? lm_alias(kasan_zero_pud) :
+ lm_alias(kasan_black_pud));
pud = pud_offset(p4d, addr);
- pud_populate(&init_mm, pud, lm_alias(kasan_zero_pmd));
+ pud_populate(&init_mm, pud,
+ zero ? lm_alias(kasan_zero_pmd) :
+ lm_alias(kasan_black_pmd));
pmd = pmd_offset(pud, addr);
pmd_populate_kernel(&init_mm, pmd,
- lm_alias(kasan_zero_pte));
+ zero ? lm_alias(kasan_zero_pte) :
+ lm_alias(kasan_black_pte));
continue;
}

@@ -143,18 +176,21 @@ static void __init zero_p4d_populate(pgd_t *pgd, unsigned long addr,
p4d_populate(&init_mm, p4d,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
- zero_pud_populate(p4d, addr, next);
+ kasan_pud_populate(p4d, addr, next, zero, private);
} while (p4d++, addr = next, addr != end);
}

/**
- * kasan_populate_zero_shadow - populate shadow memory region with
- * kasan_zero_page
+ * kasan_populate_shadow - populate shadow memory region with
+ * kasan_(zero|black)_page
* @shadow_start - start of the memory range to populate
* @shadow_end - end of the memory range to populate
+ * @zero - type of populated shadow, zero and black
+ * @private - force to populate private shadow except the last page
*/
-void __init kasan_populate_zero_shadow(const void *shadow_start,
- const void *shadow_end)
+void __init kasan_populate_shadow(const void *shadow_start,
+ const void *shadow_end,
+ bool zero, bool private)
{
unsigned long addr = (unsigned long)shadow_start;
unsigned long end = (unsigned long)shadow_end;
@@ -164,7 +200,8 @@ void __init kasan_populate_zero_shadow(const void *shadow_start,
do {
next = pgd_addr_end(addr, end);

- if (IS_ALIGNED(addr, PGDIR_SIZE) && end - addr >= PGDIR_SIZE) {
+ if (IS_ALIGNED(addr, PGDIR_SIZE) && end - addr >= PGDIR_SIZE &&
+ !private) {
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -187,14 +224,22 @@ void __init kasan_populate_zero_shadow(const void *shadow_start,
* architectures will switch to pgtable-nop4d.h.
*/
#ifndef __ARCH_HAS_5LEVEL_HACK
- pgd_populate(&init_mm, pgd, lm_alias(kasan_zero_p4d));
+ pgd_populate(&init_mm, pgd,
+ zero ? lm_alias(kasan_zero_p4d) :
+ lm_alias(kasan_black_p4d));
#endif
p4d = p4d_offset(pgd, addr);
- p4d_populate(&init_mm, p4d, lm_alias(kasan_zero_pud));
+ p4d_populate(&init_mm, p4d,
+ zero ? lm_alias(kasan_zero_pud) :
+ lm_alias(kasan_black_pud));
pud = pud_offset(p4d, addr);
- pud_populate(&init_mm, pud, lm_alias(kasan_zero_pmd));
+ pud_populate(&init_mm, pud,
+ zero ? lm_alias(kasan_zero_pmd) :
+ lm_alias(kasan_black_pmd));
pmd = pmd_offset(pud, addr);
- pmd_populate_kernel(&init_mm, pmd, lm_alias(kasan_zero_pte));
+ pmd_populate_kernel(&init_mm, pmd,
+ zero ? lm_alias(kasan_zero_pte) :
+ lm_alias(kasan_black_pte));
continue;
}

@@ -202,6 +247,6 @@ void __init kasan_populate_zero_shadow(const void *shadow_start,
pgd_populate(&init_mm, pgd,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
- zero_p4d_populate(pgd, addr, next);
+ kasan_p4d_populate(pgd, addr, next, zero, private);
} while (pgd++, addr = next, addr != end);
}
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:54 PM
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

The following is the memory consumption measured on my QEMU system.
'runtime' shows the maximum memory usage for on-demand shadow
allocation during a kernel build workload. Note that this patch just
introduces the infrastructure; the benefit will be observed with the
last patch in this series.

base vs patched

MemTotal: 858 MB vs 987 MB
runtime: 0 MB vs 30 MB
Net Available: 858 MB vs 957 MB

For a 4096 MB QEMU system:

MemTotal: 3477 MB vs 4000 MB
runtime: 0 MB vs 50 MB
Net Available: 3477 MB vs 3950 MB

Memory consumption is reduced by 99 MB and 473 MB, respectively.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
include/linux/kasan.h | 41 +++++++++++++++++++++
mm/kasan/kasan.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/kasan/kasan.h | 12 +++++--
mm/kasan/kasan_init.c | 31 ++++++++++++++++
mm/kasan/report.c | 28 +++++++++++++++
5 files changed, 207 insertions(+), 3 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 7e501b3..4390788 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -15,6 +15,18 @@ struct task_struct;
#include <asm/kasan.h>
#include <asm/pgtable.h>

+#ifndef KASAN_PSHADOW_SIZE
+#define KASAN_PSHADOW_SIZE 0
+#endif
+#ifndef KASAN_PSHADOW_START
+#define KASAN_PSHADOW_START 0
+#endif
+#ifndef KASAN_PSHADOW_END
+#define KASAN_PSHADOW_END 0
+#endif
+
+extern unsigned long kasan_pshadow_offset;
+
extern unsigned char kasan_zero_page[PAGE_SIZE];
extern pte_t kasan_zero_pte[PTRS_PER_PTE];
extern pmd_t kasan_zero_pmd[PTRS_PER_PMD];
@@ -30,6 +42,13 @@ extern p4d_t kasan_black_p4d[PTRS_PER_P4D];
void kasan_populate_shadow(const void *shadow_start,
const void *shadow_end,
bool zero, bool private);
+void kasan_early_init_pshadow(void);
+
+static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
+{
+ return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
+ << KASAN_SHADOW_SCALE_SHIFT);
+}

static inline void *kasan_mem_to_shadow(const void *addr)
{
@@ -37,6 +56,24 @@ static inline void *kasan_mem_to_shadow(const void *addr)
+ KASAN_SHADOW_OFFSET;
}

+static inline void *kasan_mem_to_pshadow(const void *addr)
+{
+ return (void *)((unsigned long)addr >> PAGE_SHIFT)
+ + kasan_pshadow_offset;
+}
+
+static inline void *kasan_shadow_to_pshadow(const void *addr)
+{
+ /*
+ * KASAN_SHADOW_END needs special handling since
+ * it will overflow in kasan_shadow_to_mem()
+ */
+ if ((unsigned long)addr == KASAN_SHADOW_END)
+ return (void *)KASAN_PSHADOW_END;
+
+ return kasan_mem_to_pshadow(kasan_shadow_to_mem(addr));
+}
+
/* Enable reporting bugs after kasan_disable_current() */
extern void kasan_enable_current(void);

@@ -44,6 +81,8 @@ extern void kasan_enable_current(void);
extern void kasan_disable_current(void);

void kasan_unpoison_shadow(const void *address, size_t size);
+void kasan_poison_pshadow(const void *address, size_t size);
+void kasan_unpoison_pshadow(const void *address, size_t size);

void kasan_unpoison_task_stack(struct task_struct *task);
void kasan_unpoison_stack_above_sp_to(const void *watermark);
@@ -89,6 +128,8 @@ void kasan_restore_multi_shot(bool enabled);
#else /* CONFIG_KASAN */

static inline void kasan_unpoison_shadow(const void *address, size_t size) {}
+static inline void kasan_poison_pshadow(const void *address, size_t size) {}
+static inline void kasan_unpoison_pshadow(const void *address, size_t size) {}

static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
static inline void kasan_unpoison_stack_above_sp_to(const void *watermark) {}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 97d3560..76b7b89 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -116,6 +116,30 @@ void kasan_unpoison_stack_above_sp_to(const void *watermark)
kasan_unpoison_shadow(sp, size);
}

+static void kasan_mark_pshadow(const void *address, size_t size, u8 value)
+{
+ void *pshadow_start;
+ void *pshadow_end;
+
+ if (!kasan_pshadow_inited())
+ return;
+
+ pshadow_start = kasan_mem_to_pshadow(address);
+ pshadow_end = kasan_mem_to_pshadow(address + size);
+
+ memset(pshadow_start, value, pshadow_end - pshadow_start);
+}
+
+void kasan_poison_pshadow(const void *address, size_t size)
+{
+ kasan_mark_pshadow(address, size, KASAN_PER_PAGE_BYPASS);
+}
+
+void kasan_unpoison_pshadow(const void *address, size_t size)
+{
+ kasan_mark_pshadow(address, size, 0);
+}
+
/*
* All functions below always inlined so compiler could
* perform better optimizations in each of __asan_loadX/__assn_storeX
@@ -269,8 +293,82 @@ static __always_inline bool memory_is_poisoned_n(unsigned long addr,
return false;
}

+static __always_inline u8 pshadow_val_builtin(unsigned long addr, size_t size)
+{
+ u8 shadow_val = *(u8 *)kasan_mem_to_pshadow((void *)addr);
+
+ if (shadow_val == KASAN_PER_PAGE_FREE)
+ return shadow_val;
+
+ if (likely(((addr + size - 1) & PAGE_MASK) >= (size - 1)))
+ return shadow_val;
+
+ if (shadow_val != *(u8 *)kasan_mem_to_pshadow((void *)addr + size - 1))
+ return KASAN_PER_PAGE_FREE;
+
+ return shadow_val;
+}
+
+static __always_inline u8 pshadow_val_n(unsigned long addr, size_t size)
+{
+ u8 *start, *end;
+ u8 shadow_val;
+
+ start = kasan_mem_to_pshadow((void *)addr);
+ end = kasan_mem_to_pshadow((void *)addr + size - 1);
+ size = end - start + 1;
+
+ shadow_val = *start;
+ if (shadow_val == KASAN_PER_PAGE_FREE)
+ return shadow_val;
+
+ while (size) {
+ /*
+ * Different shadow value means that access is over
+ * the boundary. Report the error even if it is
+ * in the valid area.
+ */
+ if (shadow_val != *start)
+ return KASAN_PER_PAGE_FREE;
+
+ start++;
+ size--;
+ }
+
+ return shadow_val;
+}
+
+static __always_inline u8 pshadow_val(unsigned long addr, size_t size)
+{
+ if (!kasan_pshadow_inited())
+ return KASAN_PER_PAGE_BYPASS;
+
+ if (__builtin_constant_p(size)) {
+ switch (size) {
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ case 16:
+ return pshadow_val_builtin(addr, size);
+ default:
+ BUILD_BUG();
+ }
+ }
+
+ return pshadow_val_n(addr, size);
+}
+
static __always_inline bool memory_is_poisoned(unsigned long addr, size_t size)
{
+ u8 shadow_val = pshadow_val(addr, size);
+
+ if (!shadow_val)
+ return false;
+
+ if (shadow_val != KASAN_PER_PAGE_BYPASS)
+ return true;
+
if (__builtin_constant_p(size)) {
switch (size) {
case 1:
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 1229298..e9a67ac 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -13,6 +13,9 @@
#define KASAN_KMALLOC_FREE 0xFB /* object was freed (kmem_cache_free/kfree) */
#define KASAN_GLOBAL_REDZONE 0xFA /* redzone for global variable */

+#define KASAN_PER_PAGE_BYPASS 0xFF /* page should be checked by per-byte shadow */
+#define KASAN_PER_PAGE_FREE 0xFE /* page was freed */
+
/*
* Stack redzone shadow values
* (Those are compiler's ABI, don't change them)
@@ -90,10 +93,13 @@ struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object);

-static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
+static inline bool kasan_pshadow_inited(void)
{
- return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
- << KASAN_SHADOW_SCALE_SHIFT);
+#ifdef HAVE_KASAN_PER_PAGE_SHADOW
+ return true;
+#else
+ return false;
+#endif
}

void kasan_report(unsigned long addr, size_t size,
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index cd0a551..da9dcab 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -17,12 +17,15 @@
#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/pfn.h>
+#include <linux/vmalloc.h>

#include <asm/page.h>
#include <asm/pgalloc.h>

#include "kasan.h"

+unsigned long kasan_pshadow_offset __read_mostly;
+
/*
* This page serves two purposes:
* - It used as early shadow memory. The entire shadow region populated
@@ -250,3 +253,31 @@ void __init kasan_populate_shadow(const void *shadow_start,
kasan_p4d_populate(pgd, addr, next, zero, private);
} while (pgd++, addr = next, addr != end);
}
+
+void __init kasan_early_init_pshadow(void)
+{
+ static struct vm_struct pshadow;
+ unsigned long kernel_offset;
+ int i;
+
+ /*
+ * Temporarily map the per-page shadow to the per-byte shadow in order to
+ * pass the KASAN checks in vm_area_register_early()
+ */
+ kernel_offset = (unsigned long)kasan_shadow_to_mem(
+ (void *)KASAN_SHADOW_START);
+ kasan_pshadow_offset = KASAN_SHADOW_START -
+ (kernel_offset >> PAGE_SHIFT);
+
+ pshadow.size = KASAN_PSHADOW_SIZE;
+ pshadow.flags = VM_ALLOC | VM_NO_GUARD;
+ vm_area_register_early(&pshadow,
+ (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT));
+
+ kasan_pshadow_offset = (unsigned long)pshadow.addr -
+ (kernel_offset >> PAGE_SHIFT);
+
+ BUILD_BUG_ON(KASAN_FREE_PAGE != KASAN_PER_PAGE_BYPASS);
+ for (i = 0; i < PAGE_SIZE; i++)
+ kasan_black_page[i] = KASAN_FREE_PAGE;
+}
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index beee0e9..9b47e10 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -39,6 +39,26 @@
#define SHADOW_BYTES_PER_ROW (SHADOW_BLOCKS_PER_ROW * SHADOW_BYTES_PER_BLOCK)
#define SHADOW_ROWS_AROUND_ADDR 2

+static bool bad_in_pshadow(const void *addr, size_t size)
+{
+ u8 shadow_val;
+ const void *end = addr + size;
+
+ if (!kasan_pshadow_inited())
+ return false;
+
+ shadow_val = *(u8 *)kasan_mem_to_pshadow(addr);
+ if (shadow_val == KASAN_PER_PAGE_FREE)
+ return true;
+
+ for (; addr < end; addr += PAGE_SIZE) {
+ if (shadow_val != *(u8 *)kasan_mem_to_pshadow(addr))
+ return true;
+ }
+
+ return false;
+}
+
static const void *find_first_bad_addr(const void *addr, size_t size)
{
u8 shadow_val = *(u8 *)kasan_mem_to_shadow(addr);
@@ -62,6 +82,11 @@ static const char *get_shadow_bug_type(struct kasan_access_info *info)
const char *bug_type = "unknown-crash";
u8 *shadow_addr;

+ if (bad_in_pshadow(info->access_addr, info->access_size)) {
+ info->first_bad_addr = NULL;
+ bug_type = "use-after-free";
+ return bug_type;
+ }
info->first_bad_addr = find_first_bad_addr(info->access_addr,
info->access_size);

@@ -290,6 +315,9 @@ static void print_shadow_for_address(const void *addr)
const void *shadow = kasan_mem_to_shadow(addr);
const void *shadow_row;

+ if (!addr)
+ return;
+
shadow_row = (void *)round_down((unsigned long)shadow,
SHADOW_BYTES_PER_ROW)
- SHADOW_ROWS_AROUND_ADDR * SHADOW_BYTES_PER_ROW;
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:17:58 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

Now we have the per-page shadow. Its purpose is to cover pages that are
only used and checked at page-size granularity; file cache and anonymous
pages fall into this category. The other category is memory used at
byte-size granularity: global variables, kernel stacks and slab memory.

This patch distinguishes the two and marks the pages that must be
checked via the original shadow. Validity checks for those pages are
performed against the original shadow, so we lose no checking accuracy
even though other pages are checked via the per-page shadow.

Note that this patch contains no code for global variables since they
live in a static area that is handled directly by architecture-specific
code.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
include/linux/kasan.h | 15 ++++++++--
kernel/fork.c | 7 +++++
mm/kasan/kasan.c | 77 +++++++++++++++++++++++++++++++++++++++++++++------
mm/slab.c | 9 ++++++
mm/slab_common.c | 11 ++++++--
mm/slub.c | 8 ++++++
6 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 4390788..c8ef665 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -83,6 +83,10 @@ extern void kasan_disable_current(void);
void kasan_unpoison_shadow(const void *address, size_t size);
void kasan_poison_pshadow(const void *address, size_t size);
void kasan_unpoison_pshadow(const void *address, size_t size);
+int kasan_stack_alloc(const void *address, size_t size);
+void kasan_stack_free(const void *addr, size_t size);
+int kasan_slab_page_alloc(const void *address, size_t size, gfp_t flags);
+void kasan_slab_page_free(const void *addr, size_t size);

void kasan_unpoison_task_stack(struct task_struct *task);
void kasan_unpoison_stack_above_sp_to(const void *watermark);
@@ -100,7 +104,7 @@ void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
void kasan_poison_object_data(struct kmem_cache *cache, void *object);
void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);

-void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+int kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
void kasan_kfree_large(const void *ptr);
void kasan_poison_kfree(void *ptr);
void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
@@ -130,6 +134,12 @@ void kasan_restore_multi_shot(bool enabled);
static inline void kasan_unpoison_shadow(const void *address, size_t size) {}
static inline void kasan_poison_pshadow(const void *address, size_t size) {}
static inline void kasan_unpoison_pshadow(const void *address, size_t size) {}
+static inline int kasan_stack_alloc(const void *address,
+ size_t size) { return 0; }
+static inline void kasan_stack_free(const void *addr, size_t size) {}
+static inline int kasan_slab_page_alloc(const void *address, size_t size,
+ gfp_t flags) { return 0; }
+static inline void kasan_slab_page_free(const void *addr, size_t size) {}

static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
static inline void kasan_unpoison_stack_above_sp_to(const void *watermark) {}
@@ -154,7 +164,8 @@ static inline void kasan_poison_object_data(struct kmem_cache *cache,
static inline void kasan_init_slab_obj(struct kmem_cache *cache,
const void *object) {}

-static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
+static inline int kasan_kmalloc_large(void *ptr, size_t size,
+ gfp_t flags) { return 0; }
static inline void kasan_kfree_large(const void *ptr) {}
static inline void kasan_poison_kfree(void *ptr) {}
static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
diff --git a/kernel/fork.c b/kernel/fork.c
index 5d32780..6741d3c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,12 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
struct page *page = alloc_pages_node(node, THREADINFO_GFP,
THREAD_SIZE_ORDER);

+ if (kasan_stack_alloc(page ? page_address(page) : NULL,
+ PAGE_SIZE << THREAD_SIZE_ORDER)) {
+ __free_pages(page, THREAD_SIZE_ORDER);
+ page = NULL;
+ }
+
return page ? page_address(page) : NULL;
#endif
}
@@ -264,6 +270,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
}
#endif

+ kasan_stack_free(tsk->stack, PAGE_SIZE << THREAD_SIZE_ORDER);
__free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
}
# else
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 76b7b89..fb18283 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -455,16 +455,31 @@ void *memcpy(void *dest, const void *src, size_t len)

void kasan_alloc_pages(struct page *page, unsigned int order)
{
- if (likely(!PageHighMem(page)))
- kasan_unpoison_shadow(page_address(page), PAGE_SIZE << order);
+ if (likely(!PageHighMem(page))) {
+ if (!kasan_pshadow_inited()) {
+ kasan_unpoison_shadow(page_address(page),
+ PAGE_SIZE << order);
+ return;
+ }
+
+ kasan_unpoison_pshadow(page_address(page), PAGE_SIZE << order);
+ }
}

void kasan_free_pages(struct page *page, unsigned int order)
{
- if (likely(!PageHighMem(page)))
- kasan_poison_shadow(page_address(page),
- PAGE_SIZE << order,
- KASAN_FREE_PAGE);
+ if (likely(!PageHighMem(page))) {
+ if (!kasan_pshadow_inited()) {
+ kasan_poison_shadow(page_address(page),
+ PAGE_SIZE << order,
+ KASAN_FREE_PAGE);
+ return;
+ }
+
+ kasan_mark_pshadow(page_address(page),
+ PAGE_SIZE << order,
+ KASAN_PER_PAGE_FREE);
+ }
}

/*
@@ -700,19 +715,25 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
}
EXPORT_SYMBOL(kasan_kmalloc);

-void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+int kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
{
struct page *page;
unsigned long redzone_start;
unsigned long redzone_end;
+ int err;

if (gfpflags_allow_blocking(flags))
quarantine_reduce();

if (unlikely(ptr == NULL))
- return;
+ return 0;

page = virt_to_page(ptr);
+ err = kasan_slab_page_alloc(ptr,
+ PAGE_SIZE << compound_order(page), flags);
+ if (err)
+ return err;
+
redzone_start = round_up((unsigned long)(ptr + size),
KASAN_SHADOW_SCALE_SIZE);
redzone_end = (unsigned long)ptr + (PAGE_SIZE << compound_order(page));
@@ -720,6 +741,8 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
kasan_unpoison_shadow(ptr, size);
kasan_poison_shadow((void *)redzone_start, redzone_end - redzone_start,
KASAN_PAGE_REDZONE);
+
+ return 0;
}

void kasan_krealloc(const void *object, size_t size, gfp_t flags)
@@ -758,6 +781,25 @@ void kasan_kfree_large(const void *ptr)
KASAN_FREE_PAGE);
}

+int kasan_slab_page_alloc(const void *addr, size_t size, gfp_t flags)
+{
+ if (!kasan_pshadow_inited() || !addr)
+ return 0;
+
+ kasan_unpoison_shadow(addr, size);
+ kasan_poison_pshadow(addr, size);
+
+ return 0;
+}
+
+void kasan_slab_page_free(const void *addr, size_t size)
+{
+ if (!kasan_pshadow_inited() || !addr)
+ return;
+
+ kasan_poison_shadow(addr, size, KASAN_FREE_PAGE);
+}
+
int kasan_module_alloc(void *addr, size_t size)
{
void *ret;
@@ -792,6 +834,25 @@ void kasan_free_shadow(const struct vm_struct *vm)
vfree(kasan_mem_to_shadow(vm->addr));
}

+int kasan_stack_alloc(const void *addr, size_t size)
+{
+ if (!kasan_pshadow_inited() || !addr)
+ return 0;
+
+ kasan_unpoison_shadow(addr, size);
+ kasan_poison_pshadow(addr, size);
+
+ return 0;
+}
+
+void kasan_stack_free(const void *addr, size_t size)
+{
+ if (!kasan_pshadow_inited() || !addr)
+ return;
+
+ kasan_poison_shadow(addr, size, KASAN_FREE_PAGE);
+}
+
static void register_global(struct kasan_global *global)
{
size_t aligned_size = round_up(global->size, KASAN_SHADOW_SCALE_SIZE);
diff --git a/mm/slab.c b/mm/slab.c
index 2a31ee3..77b8be6 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1418,7 +1418,15 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
return NULL;
}

+ if (kasan_slab_page_alloc(page_address(page),
+ PAGE_SIZE << cachep->gfporder, flags)) {
+ __free_pages(page, cachep->gfporder);
+ return NULL;
+ }
+
if (memcg_charge_slab(page, flags, cachep->gfporder, cachep)) {
+ kasan_slab_page_free(page_address(page),
+ PAGE_SIZE << cachep->gfporder);
__free_pages(page, cachep->gfporder);
return NULL;
}
@@ -1474,6 +1482,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += nr_freed;
memcg_uncharge_slab(page, order, cachep);
+ kasan_slab_page_free(page_address(page), PAGE_SIZE << order);
__free_pages(page, order);
}

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 01a0fe2..4545975 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1112,9 +1112,16 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)

flags |= __GFP_COMP;
page = alloc_pages(flags, order);
- ret = page ? page_address(page) : NULL;
+ if (!page)
+ return NULL;
+
+ ret = page_address(page);
+ if (kasan_kmalloc_large(ret, size, flags)) {
+ __free_pages(page, order);
+ return NULL;
+ }
+
kmemleak_alloc(ret, size, 1, flags);
- kasan_kmalloc_large(ret, size, flags);
return ret;
}
EXPORT_SYMBOL(kmalloc_order);
diff --git a/mm/slub.c b/mm/slub.c
index 57e5156..721894c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1409,7 +1409,14 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
else
page = __alloc_pages_node(node, flags, order);

+ if (kasan_slab_page_alloc(page ? page_address(page) : NULL,
+ PAGE_SIZE << order, flags)) {
+ __free_pages(page, order);
+ page = NULL;
+ }
+
if (page && memcg_charge_slab(page, flags, order, s)) {
+ kasan_slab_page_free(page_address(page), PAGE_SIZE << order);
__free_pages(page, order);
page = NULL;
}
@@ -1667,6 +1674,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
memcg_uncharge_slab(page, order, s);
+ kasan_slab_page_free(page_address(page), PAGE_SIZE << order);
__free_pages(page, order);
}

--
2.7.4

js1...@gmail.com

May 15, 2017, 9:18:02 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

This patch enables x86 to use per-page shadow memory. Most of the
initialization code for the per-page shadow is copied from that of the
original shadow memory.

Two things are not trivial:
1. Per-page shadow memory for global variables is initialized as the
bypass range. It is not a target for on-demand shadow allocation
since shadow memory for global variables is always required.
2. Per-page shadow memory for modules is initialized as the bypass
range since on-demand shadow allocation for modules is already
implemented.

Note that on-demand allocation of the original shadow memory is not
implemented yet, so this patch saves no memory by itself. That is
implemented in a following patch.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
arch/x86/include/asm/kasan.h | 6 +++
arch/x86/mm/kasan_init_64.c | 87 +++++++++++++++++++++++++++++++++++++++-----
2 files changed, 84 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
index f527b02..cfa63c7 100644
--- a/arch/x86/include/asm/kasan.h
+++ b/arch/x86/include/asm/kasan.h
@@ -18,6 +18,12 @@
*/
#define KASAN_SHADOW_END (KASAN_SHADOW_START + (1ULL << (__VIRTUAL_MASK_SHIFT - 3)))

+#define HAVE_KASAN_PER_PAGE_SHADOW 1
+#define KASAN_PSHADOW_SIZE ((1ULL << (47 - PAGE_SHIFT)))
+#define KASAN_PSHADOW_START (kasan_pshadow_offset + \
+ (0xffff800000000000ULL >> PAGE_SHIFT))
+#define KASAN_PSHADOW_END (KASAN_PSHADOW_START + KASAN_PSHADOW_SIZE)
+
#ifndef __ASSEMBLY__

#ifdef CONFIG_KASAN
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index adc673b..1c300bf 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -15,19 +15,29 @@
extern pgd_t early_level4_pgt[PTRS_PER_PGD];
extern struct range pfn_mapped[E820_MAX_ENTRIES];

-static int __init map_range(struct range *range)
+static int __init map_range(struct range *range, bool pshadow)
{
unsigned long start;
unsigned long end;

- start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start));
- end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end));
+ start = (unsigned long)pfn_to_kaddr(range->start);
+ end = (unsigned long)pfn_to_kaddr(range->end);

/*
* end + 1 here is intentional. We check several shadow bytes in advance
* to slightly speed up fastpath. In some rare cases we could cross
* boundary of mapped shadow, so we just map some more here.
*/
+ if (pshadow) {
+ start = (unsigned long)kasan_mem_to_pshadow((void *)start);
+ end = (unsigned long)kasan_mem_to_pshadow((void *)end);
+
+ return vmemmap_populate(start, end + 1, NUMA_NO_NODE);
+ }
+
+ start = (unsigned long)kasan_mem_to_shadow((void *)start);
+ end = (unsigned long)kasan_mem_to_shadow((void *)end);
+
return vmemmap_populate(start, end + 1, NUMA_NO_NODE);
}

@@ -49,11 +59,10 @@ static void __init clear_pgds(unsigned long start,
}
}

-static void __init kasan_map_early_shadow(pgd_t *pgd)
+static void __init kasan_map_early_shadow(pgd_t *pgd,
+ unsigned long start, unsigned long end)
{
int i;
- unsigned long start = KASAN_SHADOW_START;
- unsigned long end = KASAN_SHADOW_END;

for (i = pgd_index(start); start < end; i++) {
switch (CONFIG_PGTABLE_LEVELS) {
@@ -109,8 +118,35 @@ void __init kasan_early_init(void)
for (i = 0; CONFIG_PGTABLE_LEVELS >= 5 && i < PTRS_PER_P4D; i++)
kasan_zero_p4d[i] = __p4d(p4d_val);

- kasan_map_early_shadow(early_level4_pgt);
- kasan_map_early_shadow(init_level4_pgt);
+ kasan_map_early_shadow(early_level4_pgt,
+ KASAN_SHADOW_START, KASAN_SHADOW_END);
+ kasan_map_early_shadow(init_level4_pgt,
+ KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+ kasan_early_init_pshadow();
+
+ kasan_map_early_shadow(early_level4_pgt,
+ KASAN_PSHADOW_START, KASAN_PSHADOW_END);
+ kasan_map_early_shadow(init_level4_pgt,
+ KASAN_PSHADOW_START, KASAN_PSHADOW_END);
+
+ /* Prepare black shadow memory */
+ pte_val = __pa_nodebug(kasan_black_page) | __PAGE_KERNEL_RO;
+ pmd_val = __pa_nodebug(kasan_black_pte) | _KERNPG_TABLE;
+ pud_val = __pa_nodebug(kasan_black_pmd) | _KERNPG_TABLE;
+ p4d_val = __pa_nodebug(kasan_black_pud) | _KERNPG_TABLE;
+
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ kasan_black_pte[i] = __pte(pte_val);
+
+ for (i = 0; i < PTRS_PER_PMD; i++)
+ kasan_black_pmd[i] = __pmd(pmd_val);
+
+ for (i = 0; i < PTRS_PER_PUD; i++)
+ kasan_black_pud[i] = __pud(pud_val);
+
+ for (i = 0; CONFIG_PGTABLE_LEVELS >= 5 && i < PTRS_PER_P4D; i++)
+ kasan_black_p4d[i] = __p4d(p4d_val);
}

void __init kasan_init(void)
@@ -135,7 +171,7 @@ void __init kasan_init(void)
if (pfn_mapped[i].end == 0)
break;

- if (map_range(&pfn_mapped[i]))
+ if (map_range(&pfn_mapped[i], false))
panic("kasan: unable to allocate shadow!");
}
kasan_populate_shadow(
@@ -151,6 +187,39 @@ void __init kasan_init(void)
(void *)KASAN_SHADOW_END,
true, false);

+ /* For per-page shadow */
+ clear_pgds(KASAN_PSHADOW_START, KASAN_PSHADOW_END);
+
+ kasan_populate_shadow((void *)KASAN_PSHADOW_START,
+ kasan_mem_to_pshadow((void *)PAGE_OFFSET),
+ true, false);
+
+ for (i = 0; i < E820_MAX_ENTRIES; i++) {
+ if (pfn_mapped[i].end == 0)
+ break;
+
+ if (map_range(&pfn_mapped[i], true))
+ panic("kasan: unable to allocate shadow!");
+ }
+ kasan_populate_shadow(
+ kasan_mem_to_pshadow((void *)PAGE_OFFSET + MAXMEM),
+ kasan_mem_to_pshadow((void *)__START_KERNEL_map),
+ true, false);
+
+ kasan_populate_shadow(
+ kasan_mem_to_pshadow(_stext),
+ kasan_mem_to_pshadow(_end),
+ false, false);
+
+ kasan_populate_shadow(
+ kasan_mem_to_pshadow((void *)MODULES_VADDR),
+ kasan_mem_to_pshadow((void *)MODULES_END),
+ false, false);
+
+ kasan_populate_shadow(kasan_mem_to_pshadow((void *)MODULES_END),
+ (void *)KASAN_PSHADOW_END,
+ true, false);
+
load_cr3(init_level4_pgt);
__flush_tlb_all();

--
2.7.4

js1...@gmail.com

May 15, 2017, 9:18:06 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

The original shadow memory is needed only for specific types of access.
We can distinguish those and allocate the actual shadow memory on
demand to reduce memory consumption.

On-demand shadow memory has one problem: after setting up a new
mapping, we need to flush the TLB entry on all cpus, which is not
possible in some contexts. Solving this requires architecture-specific
knowledge, so this patch introduces two architecture-specific
functions. An architecture that wants to use this feature needs to
implement them correctly.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
arch/x86/mm/kasan_init_64.c | 9 +++
mm/kasan/kasan.c | 133 +++++++++++++++++++++++++++++++++++++++++++-
mm/kasan/kasan.h | 16 ++++--
mm/kasan/kasan_init.c | 2 +
4 files changed, 154 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 1c300bf..136b73d 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -239,3 +239,12 @@ void __init kasan_init(void)
init_task.kasan_depth = 0;
pr_info("KernelAddressSanitizer initialized\n");
}
+
+void arch_kasan_map_shadow(unsigned long s, unsigned long e)
+{
+}
+
+bool arch_kasan_recheck_prepare(unsigned long addr, size_t size)
+{
+ return false;
+}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index fb18283..8d59cf0 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -36,9 +36,13 @@
#include <linux/types.h>
#include <linux/vmalloc.h>
#include <linux/bug.h>
+#include <asm/cacheflush.h>

#include "kasan.h"
#include "../slab.h"
+#include "../internal.h"
+
+static DEFINE_SPINLOCK(shadow_lock);

void kasan_enable_current(void)
{
@@ -140,6 +144,103 @@ void kasan_unpoison_pshadow(const void *address, size_t size)
kasan_mark_pshadow(address, size, 0);
}

+static bool kasan_black_shadow(pte_t *ptep)
+{
+ pte_t pte = *ptep;
+
+ if (pte_none(pte))
+ return true;
+
+ if (pte_pfn(pte) == kasan_black_page_pfn)
+ return true;
+
+ return false;
+}
+
+static int kasan_exist_shadow_pte(pte_t *ptep, pgtable_t token,
+ unsigned long addr, void *data)
+{
+ unsigned long *count = data;
+
+ if (kasan_black_shadow(ptep))
+ return 0;
+
+ (*count)++;
+ return 0;
+}
+
+static int kasan_map_shadow_pte(pte_t *ptep, pgtable_t token,
+ unsigned long addr, void *data)
+{
+ pte_t pte;
+ gfp_t gfp_flags = *(gfp_t *)data;
+ struct page *page;
+ unsigned long flags;
+
+ if (!kasan_black_shadow(ptep))
+ return 0;
+
+ page = alloc_page(gfp_flags);
+ if (!page)
+ return -ENOMEM;
+
+ __memcpy(page_address(page), kasan_black_page, PAGE_SIZE);
+
+ spin_lock_irqsave(&shadow_lock, flags);
+ if (!kasan_black_shadow(ptep))
+ goto out;
+
+ pte = mk_pte(page, PAGE_KERNEL);
+ set_pte_at(&init_mm, addr, ptep, pte);
+ page = NULL;
+
+out:
+ spin_unlock_irqrestore(&shadow_lock, flags);
+ if (page)
+ __free_page(page);
+
+ return 0;
+}
+
+static int kasan_map_shadow(const void *addr, size_t size, gfp_t flags)
+{
+ int err;
+ unsigned long shadow_start, shadow_end;
+ unsigned long count = 0;
+
+ if (!kasan_pshadow_inited())
+ return 0;
+
+ flags = flags & GFP_RECLAIM_MASK;
+ shadow_start = (unsigned long)kasan_mem_to_shadow(addr);
+ shadow_end = (unsigned long)kasan_mem_to_shadow(addr + size);
+ shadow_start = round_down(shadow_start, PAGE_SIZE);
+ shadow_end = ALIGN(shadow_end, PAGE_SIZE);
+
+ err = apply_to_page_range(&init_mm, shadow_start,
+ shadow_end - shadow_start,
+ kasan_exist_shadow_pte, &count);
+ if (err) {
+ pr_err("checking shadow entries failed\n");
+ return err;
+ }
+
+ if (count == (shadow_end - shadow_start) / PAGE_SIZE)
+ goto out;
+
+ err = apply_to_page_range(&init_mm, shadow_start,
+ shadow_end - shadow_start,
+ kasan_map_shadow_pte, (void *)&flags);
+
+out:
+ arch_kasan_map_shadow(shadow_start, shadow_end);
+ flush_cache_vmap(shadow_start, shadow_end);
+ if (err)
+ pr_err("mapping shadow entries failed\n");
+
+ return err;
+}
+
/*
* All functions below always inlined so compiler could
* perform better optimizations in each of __asan_loadX/__assn_storeX
@@ -389,6 +490,24 @@ static __always_inline bool memory_is_poisoned(unsigned long addr, size_t size)
return memory_is_poisoned_n(addr, size);
}

+static noinline void check_memory_region_slow(unsigned long addr,
+ size_t size, bool write,
+ unsigned long ret_ip)
+{
+ preempt_disable();
+ if (!arch_kasan_recheck_prepare(addr, size))
+ goto report;
+
+ if (!memory_is_poisoned(addr, size)) {
+ preempt_enable();
+ return;
+ }
+
+report:
+ preempt_enable();
+ kasan_report(addr, size, write, ret_ip);
+}
+
static __always_inline void check_memory_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
@@ -405,7 +524,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (likely(!memory_is_poisoned(addr, size)))
return;

- kasan_report(addr, size, write, ret_ip);
+ check_memory_region_slow(addr, size, write, ret_ip);
}

static void check_memory_region(unsigned long addr,
@@ -783,9 +902,15 @@ void kasan_kfree_large(const void *ptr)

int kasan_slab_page_alloc(const void *addr, size_t size, gfp_t flags)
{
+ int err;
+
if (!kasan_pshadow_inited() || !addr)
return 0;

+ err = kasan_map_shadow(addr, size, flags);
+ if (err)
+ return err;
+
kasan_unpoison_shadow(addr, size);
kasan_poison_pshadow(addr, size);

@@ -836,9 +961,15 @@ void kasan_free_shadow(const struct vm_struct *vm)

int kasan_stack_alloc(const void *addr, size_t size)
{
+ int err;
+
if (!kasan_pshadow_inited() || !addr)
return 0;

+ err = kasan_map_shadow(addr, size, THREADINFO_GFP);
+ if (err)
+ return err;
+
kasan_unpoison_shadow(addr, size);
kasan_poison_pshadow(addr, size);

diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index e9a67ac..db04087 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -88,19 +88,25 @@ struct kasan_free_meta {
struct qlist_node quarantine_link;
};

+extern unsigned long kasan_black_page_pfn;
+
struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
const void *object);
struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object);

-static inline bool kasan_pshadow_inited(void)
-{
#ifdef HAVE_KASAN_PER_PAGE_SHADOW
- return true;
+void arch_kasan_map_shadow(unsigned long s, unsigned long e);
+bool arch_kasan_recheck_prepare(unsigned long addr, size_t size);
+
+static inline bool kasan_pshadow_inited(void) { return true; }
+
#else
- return false;
+static inline void arch_kasan_map_shadow(unsigned long s, unsigned long e) { }
+static inline bool arch_kasan_recheck_prepare(unsigned long addr,
+ size_t size) { return false; }
+static inline bool kasan_pshadow_inited(void) { return false; }
#endif
-}

void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index da9dcab..85dff70 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -25,6 +25,7 @@
#include "kasan.h"

unsigned long kasan_pshadow_offset __read_mostly;
+unsigned long kasan_black_page_pfn __read_mostly;

/*
* This page serves two purposes:
@@ -278,6 +279,7 @@ void __init kasan_early_init_pshadow(void)
(kernel_offset >> PAGE_SHIFT);

BUILD_BUG_ON(KASAN_FREE_PAGE != KASAN_PER_PAGE_BYPASS);
+ kasan_black_page_pfn = PFN_DOWN(__pa(kasan_black_page));
for (i = 0; i < PAGE_SIZE; i++)
kasan_black_page[i] = KASAN_FREE_PAGE;
}
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:18:10 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

Enable on-demand shadow mapping on x86.

x86 uses separate per-cpu kernel stacks for interrupt/exception
context. We need to populate shadow memory for them before they are
used.

There are also two possible problems caused by stale TLB entries when
using on-demand shadow mapping, since we cannot fully flush the TLB in
some contexts and need to handle these situations:

1. Write-protection fault: by default, the original shadow for a page
is mapped to the write-protected black shadow page. When the page is
allocated for a slab or kernel stack, a new mapping is established,
but stale TLB entries are not fully flushed. So when another cpu marks
the shadow value, a write-protection fault occurs. Thanks to x86's
spurious fault handling, the stale TLB entry is invalidated after one
such fault, so there is no actual problem in this case.

2. False positive in the KASAN shadow check: in the same situation, if
someone checks the shadow memory, a wrong value may be read through
the stale TLB entry. We then need to flush the stale TLB and recheck.
This is implemented in arch_kasan_recheck_prepare() and the generic
KASAN check function.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
arch/x86/include/asm/kasan.h | 2 +
arch/x86/include/asm/processor.h | 4 ++
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/setup_percpu.c | 2 +
arch/x86/mm/kasan_init_64.c | 82 +++++++++++++++++++++++++++++++++++++++-
5 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
index cfa63c7..91a29ed 100644
--- a/arch/x86/include/asm/kasan.h
+++ b/arch/x86/include/asm/kasan.h
@@ -29,9 +29,11 @@
#ifdef CONFIG_KASAN
void __init kasan_early_init(void);
void __init kasan_init(void);
+void __init kasan_init_late(void);
#else
static inline void kasan_early_init(void) { }
static inline void kasan_init(void) { }
+static inline void kasan_init_late(void) { }
#endif

#endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3cada99..516c972 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -377,6 +377,10 @@ DECLARE_INIT_PER_CPU(irq_stack_union);

DECLARE_PER_CPU(char *, irq_stack_ptr);
DECLARE_PER_CPU(unsigned int, irq_count);
+
+#define EXCEPTION_STKSZ_TOTAL ((N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ)
+DECLARE_PER_CPU(char, exception_stacks[EXCEPTION_STKSZ_TOTAL]);
+
extern asmlinkage void ignore_sysret(void);
#else /* X86_64 */
#ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c8b3987..d16c65a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1328,8 +1328,8 @@ static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
[DEBUG_STACK - 1] = DEBUG_STKSZ
};

-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
- [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+ [EXCEPTION_STKSZ_TOTAL]);

/* May not be marked __init: used by software suspend */
void syscall_init(void)
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 10edd1e..cb3aeef 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -21,6 +21,7 @@
#include <asm/cpumask.h>
#include <asm/cpu.h>
#include <asm/stackprotector.h>
+#include <asm/kasan.h>

DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
EXPORT_PER_CPU_SYMBOL(cpu_number);
@@ -309,4 +310,5 @@ void __init setup_per_cpu_areas(void)
swapper_pg_dir + KERNEL_PGD_BOUNDARY,
min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
#endif
+ kasan_init_late();
}
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 136b73d..a185668 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -7,6 +7,7 @@
#include <linux/sched.h>
#include <linux/sched/task.h>
#include <linux/vmalloc.h>
+#include <linux/memblock.h>

#include <asm/e820/types.h>
#include <asm/tlbflush.h>
@@ -15,6 +16,12 @@
extern pgd_t early_level4_pgt[PTRS_PER_PGD];
extern struct range pfn_mapped[E820_MAX_ENTRIES];

+static __init void *early_alloc(size_t size, int node)
+{
+ return memblock_virt_alloc_try_nid(size, size, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE, node);
+}
+
static int __init map_range(struct range *range, bool pshadow)
{
unsigned long start;
@@ -38,7 +45,9 @@ static int __init map_range(struct range *range, bool pshadow)
start = (unsigned long)kasan_mem_to_shadow((void *)start);
end = (unsigned long)kasan_mem_to_shadow((void *)end);

- return vmemmap_populate(start, end + 1, NUMA_NO_NODE);
+ kasan_populate_shadow((void *)start, (void *)end + 1,
+ false, true);
+ return 0;
}

static void __init clear_pgds(unsigned long start,
@@ -240,11 +249,80 @@ void __init kasan_init(void)
pr_info("KernelAddressSanitizer initialized\n");
}

+static void __init kasan_map_shadow_late(unsigned long start,
+ unsigned long end)
+{
+ unsigned long addr;
+ unsigned char *page;
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *ptep;
+ pte_t pte;
+
+ for (addr = start; addr < end; addr += PAGE_SIZE) {
+ pgd = pgd_offset_k(addr);
+ p4d = p4d_offset(pgd, addr);
+ pud = pud_offset(p4d, addr);
+ pmd = pmd_offset(pud, addr);
+ ptep = pte_offset_kernel(pmd, addr);
+
+ page = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+ pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
+ set_pte_at(&init_mm, addr, ptep, pte);
+ }
+}
+
+static void __init __kasan_init_late(unsigned long start, unsigned long end)
+{
+ unsigned long shadow_start, shadow_end;
+
+ shadow_start = (unsigned long)kasan_mem_to_shadow((void *)start);
+ shadow_start = round_down(shadow_start, PAGE_SIZE);
+ shadow_end = (unsigned long)kasan_mem_to_shadow((void *)end);
+ shadow_end = ALIGN(shadow_end, PAGE_SIZE);
+
+ kasan_map_shadow_late(shadow_start, shadow_end);
+ kasan_poison_pshadow((void *)start, ALIGN(end, PAGE_SIZE) - start);
+}
+
+void __init kasan_init_late(void)
+{
+ int cpu;
+ unsigned long start, end;
+
+ for_each_possible_cpu(cpu) {
+ end = (unsigned long)per_cpu(irq_stack_ptr, cpu);
+ start = end - IRQ_STACK_SIZE;
+
+ __kasan_init_late(start, end);
+
+ start = (unsigned long)per_cpu(exception_stacks, cpu);
+ end = start + sizeof(exception_stacks);
+
+ __kasan_init_late(start, end);
+ }
+}
+
+/*
+ * We cannot flush the TLBs on other cpus due to deadlock, so just
+ * flush the TLB on the current cpu. Accessing a stale TLB entry can
+ * cause the following two problems, both of which we can handle:
+ *
+ * 1. write protection fault: handled by the spurious fault
+ * handler, which invalidates the stale TLB entry.
+ * 2. false-positive in the KASAN shadow check: handled by
+ * re-checking after flushing the local TLB.
+ */
void arch_kasan_map_shadow(unsigned long s, unsigned long e)
{
+ __flush_tlb_all();
}

bool arch_kasan_recheck_prepare(unsigned long addr, size_t size)
{
- return false;
+ __flush_tlb_all();
+
+ return true;
}
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:18:14 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

On-demand allocation/mapping of the shadow memory isn't sufficient
to reduce memory consumption, since the shadow memory would eventually
be populated for the whole memory range on a long-running system.
This patch implements dynamic shadow memory unmap/free to solve this
problem.

Since shadow memory is populated in order-3 page units, we can also
unmap/free it in order-3 page units. Therefore, this patch inserts
a hook into the buddy allocator to detect the free of an order-3 page.

Note that unmapping needs to flush TLBs on all cpus, so the actual
unmap/free is delegated to a workqueue.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
include/linux/kasan.h | 4 ++
mm/kasan/kasan.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 10 ++++
3 files changed, 148 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index c8ef665..9e44cf6 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -87,6 +87,8 @@ int kasan_stack_alloc(const void *address, size_t size);
void kasan_stack_free(const void *addr, size_t size);
int kasan_slab_page_alloc(const void *address, size_t size, gfp_t flags);
void kasan_slab_page_free(const void *addr, size_t size);
+bool kasan_free_buddy(struct page *page, unsigned int order,
+ unsigned int max_order);

void kasan_unpoison_task_stack(struct task_struct *task);
void kasan_unpoison_stack_above_sp_to(const void *watermark);
@@ -140,6 +142,8 @@ static inline void kasan_stack_free(const void *addr, size_t size) {}
static inline int kasan_slab_page_alloc(const void *address, size_t size,
gfp_t flags) { return 0; }
static inline void kasan_slab_page_free(const void *addr, size_t size) {}
+static inline bool kasan_free_buddy(struct page *page, unsigned int order,
+ unsigned int max_order) { return false; }

static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
static inline void kasan_unpoison_stack_above_sp_to(const void *watermark) {}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 8d59cf0..e5612be 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -36,13 +36,19 @@
#include <linux/types.h>
#include <linux/vmalloc.h>
#include <linux/bug.h>
+#include <linux/page-isolation.h>
#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/sections.h>

#include "kasan.h"
#include "../slab.h"
#include "../internal.h"

static DEFINE_SPINLOCK(shadow_lock);
+static LIST_HEAD(unmap_list);
+static void kasan_unmap_shadow_workfn(struct work_struct *work);
+static DECLARE_WORK(kasan_unmap_shadow_work, kasan_unmap_shadow_workfn);

void kasan_enable_current(void)
{
@@ -241,6 +247,125 @@ static int kasan_map_shadow(const void *addr, size_t size, gfp_t flags)
return err;
}

+static int kasan_unmap_shadow_pte(pte_t *ptep, pgtable_t token,
+ unsigned long addr, void *data)
+{
+ pte_t pte;
+ struct page *page;
+ struct list_head *list = data;
+
+ if (kasan_black_shadow(ptep))
+ return 0;
+
+ if (addr >= (unsigned long)_text && addr < (unsigned long)_end)
+ return 0;
+
+ pte = *ptep;
+ page = pfn_to_page(pte_pfn(pte));
+ list_add(&page->lru, list);
+
+ pte = pfn_pte(PFN_DOWN(__pa(kasan_black_page)), PAGE_KERNEL);
+ pte = pte_wrprotect(pte);
+ set_pte_at(&init_mm, addr, ptep, pte);
+
+ return 0;
+}
+
+static void kasan_unmap_shadow_workfn(struct work_struct *work)
+{
+ struct page *page, *next;
+ LIST_HEAD(list);
+ LIST_HEAD(shadow_list);
+ unsigned long flags;
+ unsigned int order;
+ unsigned long shadow_addr, shadow_size;
+ unsigned long tlb_start = ULONG_MAX, tlb_end = 0;
+ int err;
+
+ spin_lock_irqsave(&shadow_lock, flags);
+ list_splice_init(&unmap_list, &list);
+ spin_unlock_irqrestore(&shadow_lock, flags);
+
+ if (list_empty(&list))
+ return;
+
+ list_for_each_entry_safe(page, next, &list, lru) {
+ order = page_private(page);
+ post_alloc_hook(page, order, GFP_NOWAIT);
+ set_page_private(page, order);
+
+ shadow_addr = (unsigned long)kasan_mem_to_shadow(
+ page_address(page));
+ shadow_size = PAGE_SIZE << (order - KASAN_SHADOW_SCALE_SHIFT);
+
+ tlb_start = min(shadow_addr, tlb_start);
+ tlb_end = max(shadow_addr + shadow_size, tlb_end);
+
+ flush_cache_vunmap(shadow_addr, shadow_addr + shadow_size);
+ err = apply_to_page_range(&init_mm, shadow_addr, shadow_size,
+ kasan_unmap_shadow_pte, &shadow_list);
+ if (err) {
+ pr_err("invalid shadow entry is found");
+ list_del(&page->lru);
+ }
+ }
+ flush_tlb_kernel_range(tlb_start, tlb_end);
+
+ list_for_each_entry_safe(page, next, &list, lru) {
+ list_del(&page->lru);
+ __free_pages(page, page_private(page));
+ }
+ list_for_each_entry_safe(page, next, &shadow_list, lru) {
+ list_del(&page->lru);
+ __free_page(page);
+ }
+}
+
+static bool kasan_unmap_shadow(struct page *page, unsigned int order,
+ unsigned int max_order)
+{
+ int err;
+ unsigned long shadow_addr, shadow_size;
+ unsigned long count = 0;
+ LIST_HEAD(list);
+ unsigned long flags;
+ struct zone *zone;
+ int mt;
+
+ if (order < KASAN_SHADOW_SCALE_SHIFT)
+ return false;
+
+ if (max_order != (KASAN_SHADOW_SCALE_SHIFT + 1))
+ return false;
+
+ shadow_addr = (unsigned long)kasan_mem_to_shadow(page_address(page));
+ shadow_size = PAGE_SIZE << (order - KASAN_SHADOW_SCALE_SHIFT);
+ err = apply_to_page_range(&init_mm, shadow_addr, shadow_size,
+ kasan_exist_shadow_pte, &count);
+ if (err) {
+ pr_err("checking shadow entry is failed");
+ return false;
+ }
+
+ if (!count)
+ return false;
+
+ zone = page_zone(page);
+ mt = get_pageblock_migratetype(page);
+ if (!is_migrate_isolate(mt))
+ __mod_zone_freepage_state(zone, -(1UL << order), mt);
+
+ set_page_private(page, order);
+
+ spin_lock_irqsave(&shadow_lock, flags);
+ list_add(&page->lru, &unmap_list);
+ spin_unlock_irqrestore(&shadow_lock, flags);
+
+ schedule_work(&kasan_unmap_shadow_work);
+
+ return true;
+}
+
/*
* All functions below always inlined so compiler could
* perform better optimizations in each of __asan_loadX/__assn_storeX
@@ -601,6 +726,15 @@ void kasan_free_pages(struct page *page, unsigned int order)
}
}

+bool kasan_free_buddy(struct page *page, unsigned int order,
+ unsigned int max_order)
+{
+ if (!kasan_pshadow_inited())
+ return false;
+
+ return kasan_unmap_shadow(page, order, max_order);
+}
+
/*
* Adaptive redzone policy taken from the userspace AddressSanitizer runtime.
* For larger allocations larger redzones are used.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b175c3..4a6f722 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -797,6 +797,12 @@ static inline void __free_one_page(struct page *page,

max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);

+#ifdef CONFIG_KASAN
+ /* Suppress merging at initial attempt to unmap shadow memory */
+ max_order = min_t(unsigned int,
+ KASAN_SHADOW_SCALE_SHIFT + 1, max_order);
+#endif
+
VM_BUG_ON(!zone_is_initialized(zone));
VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);

@@ -832,6 +838,10 @@ static inline void __free_one_page(struct page *page,
pfn = combined_pfn;
order++;
}
+
+ if (unlikely(kasan_free_buddy(page, order, max_order)))
+ return;
+
if (max_order < MAX_ORDER) {
/* If we are here, it means order is >= pageblock_order.
* We want to prevent merge between freepages on isolate
--
2.7.4

js1...@gmail.com

May 15, 2017, 9:18:18 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
From: Joonsoo Kim <iamjoon...@lge.com>

The majority of accesses in the kernel are accesses to slab objects.
In the current implementation, we check two types of shadow memory
in this case, and that causes a performance regression.

kernel build (2048 MB QEMU)
Base vs per-page
219 sec vs 238 sec

Although the current per-page shadow implementation is easy
to understand conceptually, this performance regression is
too severe, so this patch changes the check order from per-page
shadow then per-byte shadow to per-byte shadow then per-page shadow.

This change increases the chance of hitting the stale TLB problem,
since the mapping for the per-byte shadow isn't fully synchronized
and we now access this shadow memory for every checked region.
However, it doesn't hurt correctness, so the new implementation is
safe. The following is the result with this patch.

kernel build (2048 MB QEMU)
base vs per-page vs this patch
219 sec vs 238 sec vs 222 sec

Performance is restored.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index e5612be..76c1c37 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -587,14 +587,6 @@ static __always_inline u8 pshadow_val(unsigned long addr, size_t size)

static __always_inline bool memory_is_poisoned(unsigned long addr, size_t size)
{
- u8 shadow_val = pshadow_val(addr, size);
-
- if (!shadow_val)
- return false;
-
- if (shadow_val != KASAN_PER_PAGE_BYPASS)
- return true;
-
if (__builtin_constant_p(size)) {
switch (size) {
case 1:
@@ -649,6 +641,9 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (likely(!memory_is_poisoned(addr, size)))
return;

+ if (!pshadow_val(addr, size))
+ return;
+
check_memory_region_slow(addr, size, write, ret_ip);
}

--
2.7.4

Joonsoo Kim

May 15, 2017, 9:24:03 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
In-Reply-To: <1494897409-14408-1-git-s...@lge.com>

Looks like there was a sending failure, so I'm resending this patch.

-------------------------->8-----------------------------
From 989d7b079c1fd0b934d98d738cbecf7a56f8c6e6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoon...@lge.com>
Date: Fri, 3 Feb 2017 12:52:13 +0900
Subject: [PATCH v1 01/11] mm/kasan: rename XXX_is_zero to XXX_is_nonzero

They return a positive value, that is, true, if a non-zero value
is found. Rename them to reduce confusion.

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index c81549d..85ee45b0 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
static __always_inline bool memory_is_poisoned_n(unsigned long addr,

Joonsoo Kim

May 15, 2017, 9:28:35 PM5/15/17
to Andrew Morton, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Sorry for the noise.
The failure was due to a suspicious subject line.
Change it and resend.

---------------------->8-------------------
From 989d7b079c1fd0b934d98d738cbecf7a56f8c6e6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoon...@lge.com>
Date: Fri, 3 Feb 2017 12:52:13 +0900
Subject: [PATCH v1 01/11] mm/kasan: rename _is_zero to _is_nonzero

Dmitry Vyukov

May 16, 2017, 12:34:38 AM5/16/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
On Mon, May 15, 2017 at 6:16 PM, <js1...@gmail.com> wrote:
> From: Joonsoo Kim <iamjoon...@lge.com>
>
> Hello, all.
>
> This is an attempt to reduce memory consumption of KASAN. Please see
> the following description to get more information.
>
> 1. What is per-page shadow memory

Hi Joonsoo,

First I need to say that this is great work. I wanted KASAN to consume
1/8-th of _kernel_ memory rather than total physical memory for a long
time.

However, this implementation does not work with inline instrumentation.
And inline instrumentation is the main mode for KASAN. Outline
instrumentation is merely a rudiment to support gcc 4.9, and it needs
to be removed as soon as we stop caring about gcc 4.9 (do we at all?
Is it the current compiler in any distro? Ubuntu 12 has 4.8, Ubuntu 14
already has 5.4. And if you build gcc yourself or get a fresher
compiler from somewhere else, you hopefully get something better than
4.9).

Here is an example boot+scp log with inline instrumentation:
https://gist.githubusercontent.com/dvyukov/dfdc8b6972ddd260b201a85d5d5cdb5d/raw/2a032cd5be371c7ad6cad8f14c0a0610e6fa772e/gistfile1.txt

Joonsoo, can you think of a way to take advantages of your approach,
but make it work with inline instrumentation?

Will it work if we map a single zero page for the whole shadow initially,
and then lazily map real shadow pages only for kernel memory, and then
remap them to zero pages again when the whole KASAN_SHADOW_SCALE_SHIFT
range of pages becomes unused (similarly to what you do in
kasan_unmap_shadow())?

Dmitry Vyukov

May 16, 2017, 12:48:09 AM5/16/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
Just in case, I've uploaded a squashed version of this to codereview
site, if somebody will find it useful:
https://codereview.appspot.com/325780043
(side-by-side diffs is what you want)

Joonsoo Kim

May 16, 2017, 2:23:28 AM5/16/17
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Mon, May 15, 2017 at 09:34:17PM -0700, Dmitry Vyukov wrote:
> On Mon, May 15, 2017 at 6:16 PM, <js1...@gmail.com> wrote:
> > From: Joonsoo Kim <iamjoon...@lge.com>
> >
> > Hello, all.
> >
> > This is an attempt to reduce memory consumption of KASAN. Please see
> > the following description to get more information.
> >
> > 1. What is per-page shadow memory
>
> Hi Joonsoo,

Hello, Dmitry.

>
> First I need to say that this is great work. I wanted KASAN to consume

Thanks!

> 1/8-th of _kernel_ memory rather than total physical memory for a long
> time.
>
> However, this implementation does not work with inline instrumentation. And
> the inline instrumentation is the main mode for KASAN. Outline
> instrumentation is merely a rudiment to support gcc 4.9, and it needs
> to be removed as soon as we stop caring about gcc 4.9 (do we at all?
> is it the current compiler in any distro? Ubuntu 12 has 4.8, Ubuntu 14
> already has 5.4. And if you build gcc yourself or get a fresher
> compiler from somewhere else, you hopefully get something better than
> 4.9).

Hmm... I don't think that outline instrumentation is something to be
removed. In the embedded world, there is a fixed partition table, and
enlarging the kernel binary would cause problems. Changing that
table is possible but is really uncomfortable when debugging
something. So, I think that outline instrumentation has its own merit.

Anyway, I have missed inline instrumentation completely.

I will attach the fix in the bottom. It doesn't look beautiful
since it breaks layer design (some check will be done at report
function). However, I think that it's a good trade-off.

>
> Here is an example boot+scp log with inline instrumentation:
> https://gist.githubusercontent.com/dvyukov/dfdc8b6972ddd260b201a85d5d5cdb5d/raw/2a032cd5be371c7ad6cad8f14c0a0610e6fa772e/gistfile1.txt
>
> Joonsoo, can you think of a way to take advantages of your approach,
> but make it work with inline instrumentation?
>
> Will it work if we map a single zero page for whole shadow initially,
> and then lazily map real shadow pages only for kernel memory, and then
> remap it again to zero pages when the whole KASAN_SHADOW_SCALE_SHIFT
> range of pages becomes unused (similarly to what you do in
> kasan_unmap_shadow())?

Mapping the zero page to non-kernel memory could cause a false-negative
problem, since we cannot flush the TLB on all cpus. We would read a zero
shadow value in this case even if the actual shadow value is not
zero. This is one of the reasons that the black page is introduced in
this patchset.

Thanks.

-------------------->8------------------
From b2d38de92f2b1c20de6c29682b7a5c29e0f3fe26 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoon...@lge.com>
Date: Tue, 16 May 2017 14:56:27 +0900
Subject: [PATCH] mm/kasan: fix-up CONFIG_KASAN_INLINE

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
mm/kasan/kasan.c | 13 +++++++++++--
mm/kasan/kasan.h | 2 ++
mm/kasan/report.c | 2 +-
3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 76c1c37..fd6b7d4 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -622,7 +622,7 @@ static noinline void check_memory_region_slow(unsigned long addr,

report:
preempt_enable();
- kasan_report(addr, size, write, ret_ip);
+ __kasan_report(addr, size, write, ret_ip);
}

static __always_inline void check_memory_region_inline(unsigned long addr,
@@ -634,7 +634,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,

if (unlikely((void *)addr <
kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
- kasan_report(addr, size, write, ret_ip);
+ __kasan_report(addr, size, write, ret_ip);
return;
}

@@ -692,6 +692,15 @@ void *memcpy(void *dest, const void *src, size_t len)
return __memcpy(dest, src, len);
}

+void kasan_report(unsigned long addr, size_t size,
+ bool is_write, unsigned long ip)
+{
+ if (!pshadow_val(addr, size))
+ return;
+
+ check_memory_region_slow(addr, size, is_write, ip);
+}
+
void kasan_alloc_pages(struct page *page, unsigned int order)
{
if (likely(!PageHighMem(page))) {
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index db04087..7a20707 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -108,6 +108,8 @@ static inline bool arch_kasan_recheck_prepare(unsigned long addr,
static inline bool kasan_pshadow_inited(void) { return false; }
#endif

+void __kasan_report(unsigned long addr, size_t size,
+ bool is_write, unsigned long ip);
void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
void kasan_report_double_free(struct kmem_cache *cache, void *object,
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 9b47e10..7831d58 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -418,7 +418,7 @@ static inline bool kasan_report_enabled(void)
return !test_and_set_bit(KASAN_BIT_REPORTED, &kasan_flags);
}

-void kasan_report(unsigned long addr, size_t size,
+void __kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip)
{
struct kasan_access_info info;
--
2.7.4

Dmitry Vyukov

unread,
May 16, 2017, 4:49:31 PM5/16/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Mon, May 15, 2017 at 11:23 PM, Joonsoo Kim <js1...@gmail.com> wrote:
>> >
>> > Hello, all.
>> >
>> > This is an attempt to reduce memory consumption of KASAN. Please see
>> > the following description to get more information.
>> >
>> > 1. What is per-page shadow memory
>>
>> Hi Joonsoo,
>
> Hello, Dmitry.
>
>>
>> First I need to say that this is great work. I wanted KASAN to consume
>
> Thanks!
>
>> 1/8-th of _kernel_ memory rather than total physical memory for a long
>> time.
>>
>> However, this implementation does not work with inline instrumentation. And
>> the inline instrumentation is the main mode for KASAN. Outline
>> instrumentation is merely a rudiment to support gcc 4.9, and it needs
>> to be removed as soon as we stop caring about gcc 4.9 (do we at all?
>> is it the current compiler in any distro? Ubuntu 12 has 4.8, Ubuntu 14
>> already has 5.4. And if you build gcc yourself or get a fresher
>> compiler from somewhere else, you hopefully get something better than
>> 4.9).
>
> Hmm... I don't think that outline instrumentation is something to be
> removed. In the embedded world, there is a fixed partition table, and
> enlarging the kernel binary would cause problems. Changing that
> table is possible but is really uncomfortable when debugging
> something. So, I think that outline instrumentation has its own merit.

Fair. Let's consider both as important.

> Anyway, I have missed inline instrumentation completely.
>
> I will attach the fix in the bottom. It doesn't look beautiful
> since it breaks layer design (some check will be done at report
> function). However, I think that it's a good trade-off.


I can confirm that inline works with that patch.

I can also confirm that it reduces memory usage. I've booted qemu with
2G ram and ran a fixed workload. Before:
31853 dvyukov 20 0 3043200 765464 21312 S 366.0 4.7 2:39.53
qemu-system-x86
7528 dvyukov 20 0 3043200 732444 21676 S 333.3 4.5 2:23.19
qemu-system-x86
After:
6192 dvyukov 20 0 3043200 394244 20636 S 17.9 2.4 2:32.95
qemu-system-x86
6265 dvyukov 20 0 3043200 388860 21416 S 399.3 2.4 3:02.88
qemu-system-x86
9005 dvyukov 20 0 3043200 383564 21220 S 397.1 2.3 2:35.33
qemu-system-x86

However, I see some very significant slowdowns with inline
instrumentation. I did 3 tests:
1. Boot speed: I measured the time for a particular message to appear on
the console. Before:
[ 2.504652] random: crng init done
[ 2.435861] random: crng init done
[ 2.537135] random: crng init done
After:
[ 7.263402] random: crng init done
[ 7.263402] random: crng init done
[ 7.174395] random: crng init done

That's ~3x slowdown.

2. I've run bench_readv benchmark:
https://raw.githubusercontent.com/google/sanitizers/master/address-sanitizer/kernel_buildbot/slave/bench_readv.c
as:
while true; do time ./bench_readv bench_readv 300000 1; done

Before:
sys 0m7.299s
sys 0m7.218s
sys 0m6.973s
sys 0m6.892s
sys 0m7.035s
sys 0m6.982s
sys 0m6.921s
sys 0m6.940s
sys 0m6.905s
sys 0m7.006s

After:
sys 0m8.141s
sys 0m8.077s
sys 0m8.067s
sys 0m8.116s
sys 0m8.128s
sys 0m8.115s
sys 0m8.108s
sys 0m8.326s
sys 0m8.529s
sys 0m8.164s
sys 0m8.380s

This is ~19% slowdown.

3. I've run bench_pipes benchmark:
https://raw.githubusercontent.com/google/sanitizers/master/address-sanitizer/kernel_buildbot/slave/bench_pipes.c
as:
while true; do time ./bench_pipes 10 10000 1; done

Before:
sys 0m5.393s
sys 0m6.178s
sys 0m5.909s
sys 0m6.024s
sys 0m5.874s
sys 0m5.737s
sys 0m5.826s
sys 0m5.664s
sys 0m5.758s
sys 0m5.421s
sys 0m5.444s
sys 0m5.479s
sys 0m5.461s
sys 0m5.417s

After:
sys 0m8.718s
sys 0m8.281s
sys 0m8.268s
sys 0m8.334s
sys 0m8.246s
sys 0m8.267s
sys 0m8.265s
sys 0m8.437s
sys 0m8.228s
sys 0m8.312s
sys 0m8.556s
sys 0m8.680s

This is ~52% slowdown.


This does not look acceptable to me. I would be ready to pay,
say, 10% of performance for this. But it seems that this can have up to
a 2-4x slowdown for some workloads.


Your use-case is embedded devices where you care a lot about both code
size and memory consumption, right?

I see 2 possible ways forward:
1. Enable this new mode only for outline, but keep current scheme for
inline. Then outline will be "small but slow" type of configuration.
2. Somehow fix slowness (at least in inline mode).


> Mapping the zero page to non-kernel memory could cause a false-negative
> problem, since we cannot flush the TLB on all cpus. We would read a zero
> shadow value in this case even if the actual shadow value is not
> zero. This is one of the reasons that the black page is introduced in
> this patchset.

What makes your current patch work, then?
Say we map a new shadow page and update the page shadow to say that there
is mapped shadow. Then another CPU loads the page shadow and then
loads from the newly mapped shadow. If we don't flush the TLB, what makes
the second CPU see the newly mapped shadow?

Joonsoo Kim

May 17, 2017, 3:23:28 AM5/17/17
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Tue, May 16, 2017 at 01:49:10PM -0700, Dmitry Vyukov wrote:
> > Anyway, I have missed inline instrumentation completely.
> >
> > I will attach the fix in the bottom. It doesn't look beautiful
> > since it breaks layer design (some check will be done at report
> > function). However, I think that it's a good trade-off.
>
>
> I can confirm that inline works with that patch.

Thanks for confirming!
I found the reasons for the above regression. There are two.

1. In my implementation, the original shadow for the memory allocated
from memblock is the black shadow, so it causes kasan_report() to be
called. The access will pass the check since the per-page shadow would
be the zero shadow, but it still causes some overhead.

2. Memory used by stackdepot is in a similar situation to #1. It
allocates a page and divides it into many objects, then uses them as
objects. Although there is "KASAN_SANITIZE_stackdepot.o := n", which
tries to disable the sanitizer, there is a function call (memcmp() in
find_stack()) into another file, and the sanitizer works on it.

Problem #1 can be fixed, but more investigation is needed. I will
respin the series after fixing it.

Problem #2 can also be fixed. There are two options here. First, use a
private memcmp() for stackdepot and disable the sanitizer for it. I think
that this is the right approach, since the current behaviour slows down
performance in all KASAN build cases, and we don't want to sanitize
KASAN itself. Second, I can provide a function to map the actual shadow
manually. It will reduce the cases calling kasan_report().

See the attached patch. It implements the latter approach for problem #2.
It should reduce the performance regression. I have tested your
bench_pipes test with it and found that performance is restored. However,
the remaining problem, #1, still exists, so I'm not sure that it
completely removes your regression. Could you check, if possible?

Anyway, I think that a respin is needed to fix this performance problem
completely.

>
>
> Your use-case is embedded devices where you care a lot about both code
> size and memory consumption, right?

Yes.

> I see 2 possible ways forward:
> 1. Enable this new mode only for outline, but keep current scheme for
> inline. Then outline will be "small but slow" type of configuration.

The performance problem is not that bad in an OUTLINE build. Therefore,
this is a reasonable option to have.

> 2. Somehow fix slowness (at least in inline mode).

I will try to fix the slowness as much as possible. If the slowness
cannot be made acceptable after that effort, we can choose the direction
at that point.

>
> > Mapping the zero page to non-kernel memory could cause a false-negative
> > problem, since we cannot flush the TLB on all cpus. We would read a zero
> > shadow value in this case even if the actual shadow value is not
> > zero. This is one of the reasons that the black page is introduced in
> > this patchset.
>
> What does make your current patch work then?
> Say we map a new shadow page, update the page shadow to say that there
> is mapped shadow. Then another CPU loads the page shadow and then
> loads from the newly mapped shadow. If we don't flush TLB, what makes
> the second CPU see the newly mapped shadow?

There is a fix-up process to see the newly mapped shadow on other
cpus; check_memory_region_slow() exists for that purpose. In the stale
TLB case, we will see the black shadow and fall into this function.
There, we flush the stale TLB and re-check, so we can see the correct
result.

Thanks.

Joonsoo Kim

May 17, 2017, 3:26:01 AM5/17/17
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Oops... I forgot to attach the patch.

Thanks.

--------------------->8-------------------
From 7798620be07c2c0c7197dfbc1ebeb0b603ab35c7 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoon...@lge.com>
Date: Wed, 17 May 2017 15:34:43 +0900
Subject: [PATCH] lib/stackdepot: use original shadow

Signed-off-by: Joonsoo Kim <iamjoon...@lge.com>
---
lib/stackdepot.c | 7 ++++++-
mm/kasan/kasan.c | 12 ++++++++++++
2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index f87d138..cc98ce2 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -80,6 +80,8 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_SPINLOCK(depot_lock);

+extern void kasan_map_shadow_private(const void *addr, size_t size, gfp_t flags);
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -245,8 +247,11 @@ depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
alloc_flags |= __GFP_NOWARN;
page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
- if (page)
+ if (page) {
prealloc = page_address(page);
+ kasan_map_shadow_private(prealloc,
+ PAGE_SIZE << STACK_ALLOC_ORDER, alloc_flags);
+ }
}

spin_lock_irqsave(&depot_lock, flags);
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index fd6b7d4..3c18d18 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -247,6 +247,18 @@ static int kasan_map_shadow(const void *addr, size_t size, gfp_t flags)
return err;
}

+void kasan_map_shadow_private(const void *addr, size_t size, gfp_t flags)
+{
+ int err;
+
+ err = kasan_map_shadow(addr, size, flags);
+ if (err)
+ return;
+
+ kasan_unpoison_shadow(addr, size);
+ kasan_poison_pshadow(addr, size);
+}
+
static int kasan_unmap_shadow_pte(pte_t *ptep, pgtable_t token,
unsigned long addr, void *data)
{
--
2.7.4

Andrey Ryabinin

unread,
May 17, 2017, 8:15:30 AM5/17/17
to js1...@gmail.com, Andrew Morton, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com, Joonsoo Kim
On 05/16/2017 04:16 AM, js1...@gmail.com wrote:
> From: Joonsoo Kim <iamjoon...@lge.com>
>
> Hello, all.
>
> This is an attempt to reduce memory consumption of KASAN. Please see
> the following description for more information.
>
> 1. What is per-page shadow memory
>
> This patch introduces infrastructure to support per-page shadow memory.
> Per-page shadow memory is the same as the original shadow memory except
> for the granularity: one byte holds the shadow value for a whole page.
> The purpose of introducing this new shadow memory is to reduce memory
> consumption.
>
> 2. Problem of current approach
>
> Until now, KASAN has needed shadow memory for the whole memory range,
> so the amount of statically allocated memory is very large. This causes
> the problem that KASAN cannot run on systems with hard memory
> constraints. Even if KASAN can run, the large memory consumption due to
> KASAN changes the behaviour of the workload, so we cannot validate
> the moment that we want to check.
>
> 3. How does this patch fix the problem
>
> This patch tries to fix the problem by reducing memory consumption for
> the shadow memory. There are two observations.
>


I think that the best way to deal with your problem is to increase shadow scale size.

You'll need to add a tunable to gcc to control the shadow scale. I expect that gcc has some
places where the shadow scale size of 8 is hardcoded, but that should be fixable.

The kernel also has a small amount of code written with KASAN_SHADOW_SCALE_SIZE == 8 in mind,
which should be easy to fix.

Note that a bigger shadow scale size requires bigger alignment of allocated memory and variables.
However, according to comments in gcc/asan.c, gcc already aligns stack and global variables at
a 32-byte boundary.
So we could bump the shadow scale up to 32 without increasing current stack consumption.

On a small machine (1GB) 1/32 shadow is just 32MB, which is comparable to your 30MB, but I expect it to be
much faster. More importantly, this will require only a small amount of simple changes in code, which will be
a *lot* easier to maintain.

I'd start by implementing this on the kernel side only. With KASAN_OUTLINE and disabled
stack instrumentation (--param asan-stack=0) it's doable without any changes in gcc.


...

Joonsoo Kim

unread,
May 18, 2017, 9:54:00 PM5/18/17
to Andrey Ryabinin, Andrew Morton, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
I agree that it is also a good option to reduce memory consumption.
Nevertheless, there are two reasons that justify this patchset.

1) With this patchset, memory consumption doesn't increase in
proportion to total memory size. Please consider my 4GB system
example below. With the shadow scale size increased to 32, 128MB of
memory would be consumed. However, this patchset consumed 50MB. This
difference can be larger if we run KASAN on a bigger machine.

2) These two optimizations can be applied simultaneously; they are
orthogonal features. If the shadow scale size is increased to 32, memory
consumption decreases with my patchset, too.

Therefore, I think that this patchset is useful in any case.

Note that increasing the shadow scale has its own trade-off. It requires
that the size of a slab object be aligned to the shadow scale, which will
increase memory consumption due to slab.

Thanks.

Dmitry Vyukov

unread,
May 22, 2017, 2:02:57 AM5/22/17
to Joonsoo Kim, Andrey Ryabinin, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Interesting option. We never considered increasing the scale in user space
due to performance implications. But the algorithm has always supported up
to a 128x scale. Definitely worth considering as an option.


> I agree that it is also a good option to reduce memory consumption.
> Nevertheless, there are two reasons that justifies this patchset.
>
> 1) With this patchset, memory consumption isn't increased in
> proportional to total memory size. Please consider my 4Gb system
> example on the below. With increasing shadow scale size to 32, memory
> would be consumed by 128M. However, this patchset consumed 50MB. This
> difference can be larger if we run KASAN with bigger machine.
>
> 2) These two optimization can be applied simulatenously. It is just an
> orthogonal feature. If shadow scale size is increased to 32, memory
> consumption will be decreased in case of my patchset, too.
>
> Therefore, I think that this patchset is useful in any case.

It is definitely useful, all else being equal. But it does considerably
increase code size and complexity, which is an important aspect.

Also note that there is a fixed-size quarantine (1/32 of RAM) and
redzones. Reducing shadow overhead beyond some threshold has
diminishing returns, because the overall overhead will just be dominated
by the quarantine/redzones.

What are your target devices and constraints? We run KASAN on phones
today without any issues.


> Note that increasing shadow scale has it's own trade-off. It requires
> that the size of slab object is aligned to shadow scale. It will
> increase memory consumption due to slab.

I've tried to retest your latest change on top of
http://git.cmpxchg.org/cgit.cgi/linux-mmots.git
d9cd9c95cc3b2fed0f04d233ebf2f7056741858c, but now this version
https://codereview.appspot.com/325780043 always crashes during boot
for me. The report points to zero shadow.

[ 0.123434] ==================================================================
[ 0.125153] BUG: KASAN: double-free or invalid-free in
cleanup_uevent_env+0x2c/0x40
[ 0.126900]
[ 0.127318] CPU: 1 PID: 226 Comm: kworker/u8:0 Not tainted
4.12.0-rc1-mm1+ #376
[ 0.128995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 0.130896] Call Trace:
[ 0.131202] kworker/u8:0 (277) used greatest stack depth: 22976 bytes left
[ 0.133129] dump_stack+0xb0/0x13d
[ 0.133958] ? _atomic_dec_and_lock+0x1e3/0x1e3
[ 0.135020] ? load_image_and_restore+0xf6/0xf6
[ 0.136083] ? kmemdup+0x31/0x40
[ 0.136143] kworker/u8:0 (320) used greatest stack depth: 22112 bytes left
[ 0.138294] ? cleanup_uevent_env+0x2c/0x40
[ 0.139255] print_address_description+0x6a/0x270
[ 0.140285] ? cleanup_uevent_env+0x2c/0x40
[ 0.141224] ? cleanup_uevent_env+0x2c/0x40
[ 0.142168] kasan_report_double_free+0x55/0x80
[ 0.143162] kasan_slab_free+0xa4/0xc0
[ 0.143934] ? cleanup_uevent_env+0x2c/0x40
[ 0.144882] kfree+0x8f/0x190
[ 0.145561] cleanup_uevent_env+0x2c/0x40
[ 0.146455] umh_complete+0x3c/0x60
[ 0.147180] call_usermodehelper_exec_async+0x671/0x950
[ 0.148334] ? __asan_report_store_n_noabort+0x12/0x20
[ 0.149460] ? native_load_sp0+0xa3/0xb0
[ 0.150213] ? umh_complete+0x60/0x60
[ 0.150990] ? kasan_end_report+0x20/0x50
[ 0.151829] ? finish_task_switch+0x510/0x7d0
[ 0.152760] ? copy_user_overflow+0x20/0x20
[ 0.153565] ? umh_complete+0x60/0x60
[ 0.154341] ? umh_complete+0x60/0x60
[ 0.155125] ret_from_fork+0x2c/0x40
[ 0.155888]
[ 0.156190] Allocated by task 1:
[ 0.156890] save_stack_trace+0x16/0x20
[ 0.157629] save_stack+0x43/0xd0
[ 0.158299] kasan_kmalloc+0xad/0xe0
[ 0.159068] kmem_cache_alloc_trace+0x61/0x170
[ 0.159920] kobject_uevent_env+0x1b2/0xa20
[ 0.160819] kobject_uevent+0xb/0x10
[ 0.161551] param_sysfs_init+0x28e/0x2d2
[ 0.162375] do_one_initcall+0x8c/0x290
[ 0.163083] kernel_init_freeable+0x4a2/0x554
[ 0.163958] kernel_init+0xe/0x120
[ 0.164669] ret_from_fork+0x2c/0x40
[ 0.165393]
[ 0.165685] Freed by task 0:
[ 0.166232] (stack is not available)
[ 0.166954]
[ 0.167247] The buggy address belongs to the object at ffff88007b45e818
[ 0.167247] which belongs to the cache kmalloc-4096 of size 4096
[ 0.169709] The buggy address is located 0 bytes inside of
[ 0.169709] 4096-byte region [ffff88007b45e818, ffff88007b45f818)
[ 0.171897] The buggy address belongs to the page:
[ 0.172833] page:ffffea0001ed1600 count:1 mapcount:0 mapping:
(null) index:0x0 compound_mapcount: 0
[ 0.174560] flags: 0x100000000008100(slab|head)
[ 0.175410] raw: 0100000000008100 0000000000000000 0000000000000000
0000000100070007
[ 0.176819] raw: ffffea0001ed0c20 ffffea0001ed3c20 ffff88007c80ed40
0000000000000000
[ 0.178250] page dumped because: kasan: bad access detected
[ 0.179312]
[ 0.179586] Memory state around the buggy address:
[ 0.180488] ffff88007b45e700: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 0.181801] ffff88007b45e780: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 0.183112] >ffff88007b45e800: fc fc fc 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 0.184518] ^
[ 0.185177] ffff88007b45e880: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 0.186420] ffff88007b45e900: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 0.187723] ==================================================================

Andrey Ryabinin

unread,
May 22, 2017, 9:58:44 AM5/22/17
to Joonsoo Kim, Andrew Morton, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Well, yes, but I assume that a bigger machine implies that we can use more memory without
causing a significant change in the system's behavior.

> 2) These two optimization can be applied simulatenously. It is just an
> orthogonal feature. If shadow scale size is increased to 32, memory
> consumption will be decreased in case of my patchset, too.
>
> Therefore, I think that this patchset is useful in any case.

These are valid points, but IMO they're not enough to justify this patchset.
Too much hacky and fragile code.

If our goal is to make KASAN eat less memory, the first step definitely would be a 1/32 shadow,
simply because it's the best way to achieve that goal.
Only if that's not enough should we think about something else, like decreasing/turning off the quarantine
and/or smaller redzones.


> Note that increasing shadow scale has it's own trade-off. It requires
> that the size of slab object is aligned to shadow scale. It will
> increase memory consumption due to slab.
>

Yes, but I don't think it will be significant; many objects are aligned already.
I've tried the kernel with 32-byte ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN, and
the difference in Slab consumption after booting a 1GB VM was not significant:

8-byte align:
Slab: 126516 kB
SReclaimable: 31140 kB
SUnreclaim: 95376 kB

32-byte align:
Slab: 126712 kB
SReclaimable: 30912 kB
SUnreclaim: 95800 kB


Numbers slightly vary from boot to boot.


> Thanks.
>

Joonsoo Kim

unread,
May 24, 2017, 2:04:45 AM5/24/17
to Dmitry Vyukov, Andrey Ryabinin, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Could you explain to me how increasing the scale reduces performance? I
tried to guess the reason but failed.

>
>
> > I agree that it is also a good option to reduce memory consumption.
> > Nevertheless, there are two reasons that justifies this patchset.
> >
> > 1) With this patchset, memory consumption isn't increased in
> > proportional to total memory size. Please consider my 4Gb system
> > example on the below. With increasing shadow scale size to 32, memory
> > would be consumed by 128M. However, this patchset consumed 50MB. This
> > difference can be larger if we run KASAN with bigger machine.
> >
> > 2) These two optimization can be applied simulatenously. It is just an
> > orthogonal feature. If shadow scale size is increased to 32, memory
> > consumption will be decreased in case of my patchset, too.
> >
> > Therefore, I think that this patchset is useful in any case.
>
> It is definitely useful all else being equal. But it does considerably
> increase code size and complexity, which is an important aspect.
>
> Also note that there is also fixed size quarantine (1/32 of RAM) and
> redzones. Reducing shadow overhead beyond some threshold has
> diminishing returns, because overall overhead will be just dominated
> by quarantine/redzones.

My usecase doesn't use the quarantine yet, since it uses an old kernel
version and the quarantine isn't back-ported. But this 1/32 of RAM for the
quarantine could also affect the system, and I think that we need a switch to
disable it. In our case, making the feature work is more important
than detecting more bugs.

Redzones are also a good target to make selectable, since the
error pattern can change with a different object layout. I have
sometimes seen an error disappear when KASAN is enabled. I'm not sure
what causes it, but, in some cases, it would be helpful if everything
except what is strictly necessary were the same as a non-KASAN build.

> What's your target devices and constraints? We run KASAN on phones
> today without any issues.

My target devices are smart TVs or embedded systems in cars. Usually,
these devices have a specific use scenario, and memory is managed more
tightly than on a phone. I have heard that some systems with 1GB of memory
cannot run if 128MB is used for KASAN. I'm not sure whether a 1/32 scale
changes the picture, but, yes, I guess most of the problems will disappear.

>
> > Note that increasing shadow scale has it's own trade-off. It requires
> > that the size of slab object is aligned to shadow scale. It will
> > increase memory consumption due to slab.
>
> I've tried to retest your latest change on top of
> http://git.cmpxchg.org/cgit.cgi/linux-mmots.git
> d9cd9c95cc3b2fed0f04d233ebf2f7056741858c, but now this version
> https://codereview.appspot.com/325780043 always crashes during boot
> for me. Report points to zero shadow.

Oops... Maybe it's due to the lack of stale-TLB handling in the double-free
check in kasan_slab_free(). I fixed it in my version 2 patchset,
and I also fixed a performance problem with memory allocated by the early
allocators (memblock or (no)bootmem).

https://github.com/JoonsooKim/linux/tree/kasan-opt-memory-consumption-v2.0-next-20170511

This branch is based on next-20170511.

Thanks.

Joonsoo Kim

unread,
May 24, 2017, 2:18:48 AM5/24/17
to Andrey Ryabinin, Andrew Morton, Alexander Potapenko, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
In the common case, yes. But I guess there are systems that
statically use most of their memory, with just a little left for others.
For example, consider a 64GB system where some program (a DB?) runs
using 60GB. Only 4GB is left. If KASAN uses 2GB, just 2GB remains, and
that would cause problems. So, I'd like to insist that merit 1)
should be considered valuable.

>
> > 2) These two optimization can be applied simulatenously. It is just an
> > orthogonal feature. If shadow scale size is increased to 32, memory
> > consumption will be decreased in case of my patchset, too.
> >
> > Therefore, I think that this patchset is useful in any case.
>
> These are valid points, but IMO it's not enough to justify this patchset.
> Too much of hacky and fragile code.
>
> If our goal is to make KASAN to eat less memory, the first step definitely would be a 1/32 shadow.
> Simply because it's the best way to achieve that goal.
> And only if it's not enough we could think about something else, like decreasing/turning off quarantine
> and/or smaller redzones.

Please refer to my reply to Dmitry. I think that we need an option where
everything except what is strictly necessary is the same as a non-KASAN
build, as much as possible. A 1/32 scale would change the object layout, so
it will not work for this option.

Thanks.

Dmitry Vyukov

unread,
May 24, 2017, 2:57:45 AM5/24/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
/\/\/\/\/\/\

Joonsoo, please answer this question above.
I am trying to understand if there is any chance to make mapping a
single page for all non-interesting shadow ranges work. That would be
a much simpler change that does not require changing the instrumentation,
and would not force inline instrumentation onto a slow path for some
ranges (vmalloc?).

Joonsoo Kim

unread,
May 24, 2017, 3:45:51 AM5/24/17
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Hello. I answered this in another e-mail, but it may not have been
sufficient, so let me try again.

If a page isn't used for kernel stack, slab, or global variables
(aka kernel memory), the black shadow is mapped for the page. We map a
new shadow page if the page will be used for kernel memory. We would need
to flush the TLB on all CPUs when mapping a new shadow; however, that's
not possible in some cases, so this patch flushes only the local CPU's
TLB. Another CPU could have a stale TLB entry that points to the black
shadow for this page. If that CPU with the stale TLB tries to check the
validity of an object on this page, the result would be invalid, since the
stale TLB points to the black shadow and its shadow value is non-zero. We
need some magic here. At this moment, we cannot be sure whether "invalid"
is the correct result, since we didn't do a full TLB flush. So fixup
processing is started; it is implemented in check_memory_region_slow():
flush the local TLB and re-check the shadow value. After flushing the
local TLB, we use fresh TLB entries, so we can pass the
validity check as usual.

> I am trying to understand if there is any chance to make mapping a
> single page for all non-interesting shadow ranges work. That would be

This is what this patchset does: mapping a single (zero/black) shadow
page for all non-interesting (non-kernel-memory) shadow ranges.
There is only a single instance of the zero/black shadow page. In v1,
I used only the black shadow page, so I failed to get enough performance.
In v2, mentioned in another thread, I use the zero shadow for some regions;
I guess the performance problem is gone.

> much simpler change that does not require changing instrumentation,

Yes! I think that it is really good benefit of this patchset.

> and will not force inline instrumentation onto slow path for some
> ranges (vmalloc?).

Thanks.

Dmitry Vyukov

unread,
May 24, 2017, 12:31:26 PM5/24/17
to Joonsoo Kim, Andrey Ryabinin, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
The main reason is inline instrumentation. Inline instrumentation for
a check of an 8-byte access (which is very common in 64-bit code) is
just a compare of the shadow byte against 0. For smaller accesses we have
more complex instrumentation that first checks the shadow for 0 and then
does a precise check based on the size/offset of the access plus the shadow
value. That's slower and also increases register pressure and code size
(which can further reduce performance due to icache overflow). If we
increase the scale to 16/32, all accesses will need that slow path.
Another thing is stack instrumentation: a larger scale will require
larger redzones to ensure proper alignment. That will increase stack
frames and also add more instructions to poison/unpoison the stack shadow
on function entry/exit.

Dmitry Vyukov

unread,
May 24, 2017, 1:20:11 PM5/24/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
I can't say I understand everything here, but after staring at the
patch I don't understand why we need pshadow at all now. Especially
with this commit
https://github.com/JoonsooKim/linux/commit/be36ee65f185e3c4026fe93b633056ea811120fb.
It seems that the current shadow is enough.
If we see bad shadow when the actual shadow value is good, we fall
onto the slow path, flush the TLB, reload the shadow, see that it is good,
and return. Pshadow is not needed in this case.
If we see good shadow when the actual shadow value is bad, we return
immediately and get a false negative. Pshadow is not involved as well.
What am I missing?

Joonsoo Kim

unread,
May 24, 2017, 8:41:17 PM5/24/17
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
pshadow exists for non-kernel memory, such as page cache or anonymous pages.
This patch doesn't map a new shadow (per-byte shadow) for those pages,
to reduce memory consumption. However, we need to know whether those pages
are allocated in order to check the validity of accesses to them. We
cannot utilize the zero/black shadow page here, since a single mapped
zero/black shadow page represents the shadow values of eight real
pages. Instead, we use the per-page shadow and mark/unmark it when
allocation and free happen. With it, we know the state of the
page and can determine the validity of accesses to it.

> If we see bad shadow when the actual shadow value is good, we fall
> onto slow path, flush tlb, reload shadow, see that it is good and
> return. Pshadow is not needed in this case.

For kernel memory, if we see bad shadow due to a *stale TLB*, we
fall onto the slow path (check_memory_region_slow()), flush the TLB, and
reload the shadow.

For non-kernel memory, if we see bad shadow, we fall back to the
pshadow_val() check, and we can see the actual state of the page.

> If we see good shadow when the actual shadow value is bad, we return
> immediately and get false negative. Pshadow is not involved as well.
> What am I missing?

In this patchset, there is no case where we see good shadow when the
actual (p)shadow value is bad. That case must not happen, since we
could miss an actual error.

Please let me know if this explanation is insufficient. I will try
more. :)

Thanks.

Joonsoo Kim

unread,
May 24, 2017, 8:46:50 PM5/24/17
to Dmitry Vyukov, Andrey Ryabinin, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Now, I see. Thanks for explanation.

Thanks.

Dmitry Vyukov

unread,
May 29, 2017, 11:08:10 AM5/29/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
I see the problem with 8 kernel pages mapped to a single shadow page.


>> If we see bad shadow when the actual shadow value is good, we fall
>> onto slow path, flush tlb, reload shadow, see that it is good and
>> return. Pshadow is not needed in this case.
>
> For the kernel memory, if we see bad shadow due to *stale TLB*, we
> fall onto slow path (check_memory_region_slow()) and flush tlb and
> reload shadow.
>
> For the non-kernel memory, if we see bad shadow, we fall onto
> pshadow_val() check and we can see actual state of the page.
>
>> If we see good shadow when the actual shadow value is bad, we return
>> immediately and get false negative. Pshadow is not involved as well.
>> What am I missing?
>
> In this patchset, there is no case that we see good shadow when the
> actual (p)shadow value is bad. This case should not happen since we
> can miss actual error.

But why is it not possible?
Let's say we have a real shadow page allocated for a range of kernel
memory. Then we unmap the shadow page and map the black page (maybe
even unmap the black page and map another real shadow page). Then
another CPU reads the shadow for this range. What prevents it from seeing
the old shadow page?

Dmitry Vyukov

unread,
May 29, 2017, 11:12:36 AM5/29/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Re the async processing in kasan_unmap_shadow_workfn: can't it lead to
shadow corruption? It seems that it can cause an unsynchronized state of
shadow pages and the corresponding kernel pages in the page allocator.
Consider that we schedule an unmap of some pages in kasan_unmap_shadow.
Then the range is reallocated in page_alloc and we get into
kasan_map_shadow, which tries to map shadow for these pages again, but
since they are already mapped it bails out. Then
kasan_unmap_shadow_workfn starts and unmaps the shadow for the range.

Dmitry Vyukov

unread,
May 29, 2017, 11:29:51 AM5/29/17
to Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Joonsoo,

I guess my (and Andrey's) main concern is the amount of additional
complexity (I am still struggling to understand how it all works) and
more arch-dependent code in exchange for a moderate memory win.

Joonsoo, Andrey,

I have an alternative proposal. It should be conceptually simpler and
also less arch-dependent. But I don't know if I'm missing something
important that would render it non-working.
Namely, we add a pointer to shadow to the page struct. Then we create a
slab allocator for 512B shadow blocks and attach/detach these
shadow blocks to page structs as necessary. It should lead to even
smaller memory consumption, because we won't need a whole shadow page
when only 1 out of 8 corresponding kernel pages is used (we will need
just a single 512B block). I guess that with some fragmentation the
currently proposed patch needs a lot of excess shadow.
This does not depend on the TLB in any way and does not require hooking
into the buddy allocator.
The main downside is that we will need to be careful not to assume
that the shadow is contiguous. In particular this means that this mode
will work only with outline instrumentation and will need some ifdefs.
It will also be slower due to the additional indirection when
accessing the shadow, but that's meant as a "small but slow" mode as far
as I understand.

But the main win, as I see it, is that it's basically complete support
for 32-bit arches. People do ask about arm32 support:
https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
and probably mips32 is relevant as well.
Such a mode does not require a huge contiguous address space range, has
minimal memory consumption, and requires minimal arch-dependent code.
It works only with outline instrumentation, but I think that's a
reasonable compromise.

What do you think?

Vladimir Murzin

unread,
May 30, 2017, 3:58:33 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
.. or you can just keep the shadow in the page extension. It was suggested back in
2015 [1], but it seems that the lack of stack instrumentation was a "no-way"...

[1] https://lkml.org/lkml/2015/8/24/573

Cheers
Vladimir

Dmitry Vyukov

unread,
May 30, 2017, 4:15:55 AM5/30/17
to Vladimir Murzin, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Right. It describes basically the same idea.

How is page_ext better than adding data to the page struct?
It seems that memory for all page_ext is preallocated along with the page
structs; it's just that the lookup is slower.

Vladimir Murzin

unread,
May 30, 2017, 4:31:44 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
page_ext is already here along with some other debug options ;)

> It seems that memory for all page_ext is preallocated along with page
> structs; but just the lookup is slower.
>

Yup. Lookup would look like (based on v4.0):

...
page_ext = lookup_page_ext_begin(virt_to_page(start));

do {
        page_ext->shadow[idx++] = value;
} while (idx < bound);

lookup_page_ext_end((void *)page_ext);

...

Cheers
Vladimir


Vladimir Murzin

unread,
May 30, 2017, 4:40:58 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On 30/05/17 09:31, Vladimir Murzin wrote:
Correction: please, ignore that *_{begin,end} stuff - mainline only
lookup_page_ext() is only used.

Cheers
Vladimir

>
> Cheers
> Vladimir
>

Dmitry Vyukov

unread,
May 30, 2017, 4:49:40 AM5/30/17
to Vladimir Murzin, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Tue, May 30, 2017 at 10:40 AM, Vladimir Murzin
But page struct is also here. What am I missing?


>>> It seems that memory for all page_ext is preallocated along with page
>>> structs; but just the lookup is slower.
>>>
>>
>> Yup. Lookup would look like (based on v4.0):
>>
>> ...
>> page_ext = lookup_page_ext_begin(virt_to_page(start));
>>
>> do {
>> page_ext->shadow[idx++] = value;
>> } while (idx < bound);
>>
>> lookup_page_ext_end((void *)page_ext);
>>
>> ...
>
> Correction: please, ignore that *_{begin,end} stuff - mainline only
> lookup_page_ext() is only used.


Note that this added code will be executed while handling each and
every memory access in the kernel. Every instruction matters on that path.
The additional indirection via the page struct will also slow it down, but
that's the cost of lower memory consumption and potentially 32-bit
support. For page_ext it looks like even more overhead for no gain.

Vladimir Murzin

unread,
May 30, 2017, 5:08:27 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Probably free room in the page struct? I guess most of the page_ext stuff
would love to live in the page struct, but... for instance, look at page idle
tracking, which has to live in page_ext only on 32-bit.

>
>>>> It seems that memory for all page_ext is preallocated along with page
>>>> structs; but just the lookup is slower.
>>>>
>>>
>>> Yup. Lookup would look like (based on v4.0):
>>>
>>> ...
>>> page_ext = lookup_page_ext_begin(virt_to_page(start));
>>>
>>> do {
>>> page_ext->shadow[idx++] = value;
>>> } while (idx < bound);
>>>
>>> lookup_page_ext_end((void *)page_ext);
>>>
>>> ...
>>
>> Correction: please, ignore that *_{begin,end} stuff - mainline only
>> lookup_page_ext() is only used.
>
>
> Note that this added code will be executed during handling of each and
> every memory access in kernel. Every instruction matters on that path.

I know, I know... still better than nothing.

> The additional indirection via page struct will also slow down it, but
> that's the cost for lower memory consumption and potentially 32-bit
> support. For page_ext it looks like even more overhead for no gain.
>

eefa864 ("mm/page_ext: resurrect struct page extending code for debugging")
describes some cases where keeping data in page_ext is beneficial.

Cheers
Vladimir

Dmitry Vyukov

unread,
May 30, 2017, 5:27:16 AM5/30/17
to Vladimir Murzin, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Tue, May 30, 2017 at 11:08 AM, Vladimir Murzin
Sorry for my ignorance. What's the fundamental problem with just
pushing everything into page struct?

I don't see anything relevant in the page struct comment. Nor do I see "idle"
or "tracking" in page struct. I see only 2 mentions of CONFIG_64BIT, but
both declare the same fields, just with different types (int vs short).

Vladimir Murzin

unread,
May 30, 2017, 5:39:23 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
I think [1] has an answer to your question ;)

>
> I don't see anything relevant in page struct comment. Nor I see "idle"
> nor "tracking" page struct. I see only 2 mentions of CONFIG_64BIT, but
> both declare the same fields just with different types (int vs short).

Right, it is because the implementation is based on page flags [2]:

Note, since there is no room for extra page flags on 32 bit, this feature
uses extended page flags when compiled on 32 bit.


[1] https://lwn.net/Articles/565097/
[2] 33c3fc7 ("mm: introduce idle page tracking")

Cheers
Vladimir

Dmitry Vyukov

unread,
May 30, 2017, 5:46:03 AM5/30/17
to Vladimir Murzin, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On Tue, May 30, 2017 at 11:39 AM, Vladimir Murzin
It also has an answer to why we should put it into page struct :)

Vladimir Murzin

unread,
May 30, 2017, 5:54:38 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Andrey Ryabinin, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Glad you find it useful ;) I'd be glad to see it land in the 32-bit world :)

Cheers
Vladimir

Andrey Ryabinin

unread,
May 30, 2017, 10:15:06 AM5/30/17
to Dmitry Vyukov, Joonsoo Kim, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On 05/29/2017 06:29 PM, Dmitry Vyukov wrote:
> Joonsoo,
>
> I guess mine (and Andrey's) main concern is the amount of additional
> complexity (I am still struggling to understand how it all works) and
> more arch-dependent code in exchange for moderate memory win.
>
> Joonsoo, Andrey,
>
> I have an alternative proposal. It should be conceptually simpler and
> also less arch-dependent. But I don't know if I miss something
> important that will render it non working.
> Namely, we add a pointer to shadow to the page struct. Then, create a
> slab allocator for 512B shadow blocks. Then, attach/detach these
> shadow blocks to page structs as necessary. It should lead to even
> smaller memory consumption because we won't need a whole shadow page
> when only 1 out of 8 corresponding kernel pages are used (we will need
> just a single 512B block). I guess with some fragmentation we need
> lots of excessive shadow with the current proposed patch.
> This does not depend on TLB in any way and does not require hooking
> into buddy allocator.
> The main downside is that we will need to be careful not to assume
> that shadow is contiguous. In particular this means that this mode
> will work only with outline instrumentation and will need some ifdefs.
> Also it will be slower due to the additional indirection when
> accessing shadow, but that's meant as "small but slow" mode as far as
> I understand.

It seems that you are forgetting about stack instrumentation.
You'll have to disable it completely, at least with the current implementation of it in gcc.

> But the main win as I see it is that that's basically complete support
> for 32-bit arches. People do ask about arm32 support:
> https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
> https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
> and probably mips32 is relevant as well.

I don't see how the above is relevant for 32-bit arches. The current design
is perfectly fine for 32-bit arches. I did a POC arm32 port a couple of years
ago - https://github.com/aryabinin/linux/commits/kasan/arm_v0_1
It has some ugly hacks and non-critical bugs. AFAIR it was also super-slow because I (mistakenly)
made shadow memory uncached. But otherwise it works.

> Such mode does not require a huge continuous address space range, has
> minimal memory consumption and requires minimal arch-dependent code.
> Works only with outline instrumentation, but I think that's a
> reasonable compromise.
>
> What do you think?

I don't understand why we are trying to invent hacky/complex schemes when we already have
a simple one - scaling shadow to 1/32. It's easy to implement and should be more performant compared
to the suggested schemes.


Joonsoo Kim

unread,
May 31, 2017, 1:51:00 AM5/31/17
to Andrey Ryabinin, Dmitry Vyukov, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
Correct. Even if we use the OUTLINE build, gcc directly inserts code into the
function prologue/epilogue to mark/unmark the shadow. And I'm not
sure we can change that, since it would affect performance greatly. In
the current situation, the alternative proposal loses most of the benefits
mentioned above.
>
> > But the main win as I see it is that that's basically complete support
> > for 32-bit arches. People do ask about arm32 support:
> > https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
> > https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
> > and probably mips32 is relevant as well.
>
> I don't see how the above is relevant for 32-bit arches. The current design
> is perfectly fine for 32-bit arches. I did a POC arm32 port a couple of years
> ago - https://github.com/aryabinin/linux/commits/kasan/arm_v0_1
> It has some ugly hacks and non-critical bugs. AFAIR it was also super-slow because I (mistakenly)
> made shadow memory uncached. But otherwise it works.

Could you explain where the code that maps shadow memory uncached is?
I can't find anything related to it.

> > Such mode does not require a huge continuous address space range, has
> > minimal memory consumption and requires minimal arch-dependent code.
> > Works only with outline instrumentation, but I think that's a
> > reasonable compromise.
> >
> > What do you think?
>
> I don't understand why we are trying to invent hacky/complex schemes when we already have
> a simple one - scaling shadow to 1/32. It's easy to implement and should be more performant compared
> to the suggested schemes.

My approach can co-exist with the scaling approach. It has its
own benefits.

And, as Dmitry mentioned before, scaling shadow to 1/32 also has downsides,
especially for inline instrumentation. And it requires a compiler
modification, and users need to update their compiler to a newer version,
which is not so simple in terms of usability.

Thanks.

Andrey Ryabinin

unread,
May 31, 2017, 12:30:01 PM5/31/17
to Joonsoo Kim, Dmitry Vyukov, Andrew Morton, Alexander Potapenko, kasan-dev, linu...@kvack.org, LKML, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, kerne...@lge.com
On 05/31/2017 08:50 AM, Joonsoo Kim wrote:
>>> But the main win as I see it is that that's basically complete support
>>> for 32-bit arches. People do ask about arm32 support:
>>> https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
>>> https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
>>> and probably mips32 is relevant as well.
>>
>> I don't see how the above is relevant for 32-bit arches. The current design
>> is perfectly fine for 32-bit arches. I did a POC arm32 port a couple of years
>> ago - https://github.com/aryabinin/linux/commits/kasan/arm_v0_1
>> It has some ugly hacks and non-critical bugs. AFAIR it was also super-slow because I (mistakenly)
>> made shadow memory uncached. But otherwise it works.
>
> Could you explain where the code that maps shadow memory uncached is?
> I can't find anything related to it.
>

I didn't set any cache policy (L_PTE_MT_*) on the shadow mapping (see the set_pte_at() calls),
which means it's L_PTE_MT_UNCACHED.

王靖天

unread,
Jun 1, 2017, 11:16:04 AM6/1/17