[PATCH v7 00/43] Add KernelMemorySanitizer infrastructure


Alexander Potapenko

Sep 15, 2022, 11:04:39 AM
to gli...@google.com, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
uninitialized memory. It relies on compile-time Clang instrumentation
(similar to MSan in userspace [1]) and tracks the state of every bit of
kernel memory, reporting an error whenever an uninitialized value is
used in a condition, is dereferenced, or escapes to userspace, USB or
DMA.
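
To make the bug class concrete, here is a purely illustrative sketch
(not taken from any real driver; all names are made up). KMSAN would
flag both the branch on the uninitialized field and the copy to
userspace:

/*
 * Illustrative sketch only: "status" is never written, yet it is used in a
 * condition and then copied to userspace, so KMSAN would report an
 * uninit-value use and a kernel-infoleak.
 */
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct example_reply {
	u32 id;
	u32 status;	/* never initialized below */
};

static long example_fill(struct example_reply __user *arg)
{
	struct example_reply r;

	r.id = 42;
	if (r.status)				/* KMSAN: uninit-value */
		return -EINVAL;
	if (copy_to_user(arg, &r, sizeof(r)))	/* KMSAN: kernel-infoleak */
		return -EFAULT;
	return 0;
}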

KMSAN has reported more than 300 bugs in the past few years (recently
fixed bugs: [2]), most of them with the help of syzkaller. Such bugs
keep getting introduced into the kernel despite new compiler warnings
and other analyses (the 6.0 cycle already resulted in several
KMSAN-reported bugs, e.g. [3]). Mitigations like total stack and heap
initialization are unfortunately very far from being deployable.

The proposed patchset contains the KMSAN runtime implementation together
with small changes to other subsystems that are needed to make KMSAN work.

The latter changes fall into several categories:

1. Changes and refactorings of existing code required to add KMSAN:
- [01/43] x86: add missing include to sparsemem.h
- [02/43] stackdepot: reserve 5 extra bits in depot_stack_handle_t
- [03/43] instrumented.h: allow instrumenting both sides of copy_from_user()
- [04/43] x86: asm: instrument usercopy in get_user() and __put_user_size()
- [05/43] asm-generic: instrument usercopy in cacheflush.h
- [10/43] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

2. KMSAN-related declarations in generic code, KMSAN runtime library,
docs and configs:
- [06/43] kmsan: add ReST documentation
- [07/43] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
- [09/43] x86: kmsan: pgtable: reduce vmalloc space
- [11/43] kmsan: add KMSAN runtime core
- [13/43] MAINTAINERS: add entry for KMSAN
- [24/43] kmsan: add tests for KMSAN
- [31/43] objtool: kmsan: list KMSAN API functions as uaccess-safe
- [35/43] x86: kmsan: use __msan_ string functions where possible
- [43/43] x86: kmsan: enable KMSAN builds for x86

3. Adding hooks from different subsystems to notify KMSAN about memory
state changes:
- [14/43] mm: kmsan: maintain KMSAN metadata for page operations
- [15/43] mm: kmsan: call KMSAN hooks from SLUB code
- [16/43] kmsan: handle task creation and exiting
- [17/43] init: kmsan: call KMSAN initialization routines
- [18/43] instrumented.h: add KMSAN support
- [19/43] kmsan: add iomap support
- [20/43] Input: libps2: mark data received in __ps2_command() as initialized
- [21/43] dma: kmsan: unpoison DMA mappings
- [34/43] x86: kmsan: handle open-coded assembly in lib/iomem.c
- [36/43] x86: kmsan: sync metadata pages on page fault

4. Changes that prevent false reports by explicitly initializing memory,
disabling optimized code that may trick KMSAN, selectively skipping
instrumentation:
- [08/43] kmsan: mark noinstr as __no_sanitize_memory
- [12/43] kmsan: disable instrumentation of unsupported common kernel code
- [22/43] virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
- [23/43] kmsan: handle memory sent to/from USB
- [25/43] kmsan: disable strscpy() optimization under KMSAN
- [26/43] crypto: kmsan: disable accelerated configs under KMSAN
- [27/43] kmsan: disable physical page merging in biovec
- [28/43] block: kmsan: skip bio block merging logic for KMSAN
- [29/43] kcov: kmsan: unpoison area->list in kcov_remote_area_put()
- [30/43] security: kmsan: fix interoperability with auto-initialization
- [32/43] x86: kmsan: disable instrumentation of unsupported code
- [33/43] x86: kmsan: skip shadow checks in __switch_to()
- [37/43] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
- [38/43] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
- [39/43] x86: kmsan: don't instrument stack walking functions
- [40/43] entry: kmsan: introduce kmsan_unpoison_entry_regs()

5. Fixes for bugs detected with CONFIG_KMSAN_CHECK_PARAM_RETVAL:
- [41/43] bpf: kmsan: initialize BPF registers with zeroes
- [42/43] mm: fs: initialize fsdata passed to write_begin/write_end interface

This patchset allows one to boot and run a defconfig+KMSAN kernel under
QEMU without known false positives. It does not, however, guarantee the
absence of false positives in drivers of certain devices or in less
tested subsystems, although KMSAN is actively tested on syzbot with a
large config.

By default, KMSAN enforces conservative checks of most kernel function
parameters passed by value (via CONFIG_KMSAN_CHECK_PARAM_RETVAL, which
maps to the -fsanitize-memory-param-retval compiler flag). As discussed
in [4] and [5], passing uninitialized values as function parameters is
considered undefined behavior; therefore, KMSAN now reports such cases
as errors. Several newly added patches fix known manifestations of these
errors.
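
To illustrate, here is a minimal made-up example of what
-fsanitize-memory-param-retval catches; the report fires at the call
site rather than when the value is used inside the callee:

/*
 * Hypothetical example: with CONFIG_KMSAN_CHECK_PARAM_RETVAL, the report
 * fires at the call below, because an uninitialized value is passed to a
 * function by value.
 */
static int consume(int v)
{
	return v + 1;
}

static int example(void)
{
	int uninit;			/* never initialized */

	return consume(uninit);		/* reported at this call site */
}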

The patchset was generated relative to Linux v6.0-rc5. The most
up-to-date KMSAN tree currently resides at
https://github.com/google/kmsan/. One may find it handy to review these
patches in Gerrit [6].

Patchset v7 includes only minor changes to origin tracking that allowed
us to drop "kmsan: unpoison @tlb in arch_tlb_gather_mmu()" from the
series.

For the following patches, the diff from v6 is non-trivial:
- kmsan: add KMSAN runtime core
- kmsan: add tests for KMSAN

A huge thanks goes to the reviewers of the RFC patch series sent to LKML
in 2020 ([7]).

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-kmsan-gce
[3] https://lore.kernel.org/all/0000000000002c...@google.com/
[4] https://lore.kernel.org/all/20220614144853....@google.com/
[5] https://lore.kernel.org/linux-mm/20220701142310.2...@google.com/
[6] https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/12604/
[7] https://lore.kernel.org/all/20200325161249...@google.com/


Alexander Potapenko (42):
stackdepot: reserve 5 extra bits in depot_stack_handle_t
instrumented.h: allow instrumenting both sides of copy_from_user()
x86: asm: instrument usercopy in get_user() and put_user()
asm-generic: instrument usercopy in cacheflush.h
kmsan: add ReST documentation
kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
kmsan: mark noinstr as __no_sanitize_memory
x86: kmsan: pgtable: reduce vmalloc space
libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
kmsan: add KMSAN runtime core
kmsan: disable instrumentation of unsupported common kernel code
MAINTAINERS: add entry for KMSAN
mm: kmsan: maintain KMSAN metadata for page operations
mm: kmsan: call KMSAN hooks from SLUB code
kmsan: handle task creation and exiting
init: kmsan: call KMSAN initialization routines
instrumented.h: add KMSAN support
kmsan: add iomap support
Input: libps2: mark data received in __ps2_command() as initialized
dma: kmsan: unpoison DMA mappings
virtio: kmsan: check/unpoison scatterlist in vring_map_one_sg()
kmsan: handle memory sent to/from USB
kmsan: add tests for KMSAN
kmsan: disable strscpy() optimization under KMSAN
crypto: kmsan: disable accelerated configs under KMSAN
kmsan: disable physical page merging in biovec
block: kmsan: skip bio block merging logic for KMSAN
kcov: kmsan: unpoison area->list in kcov_remote_area_put()
security: kmsan: fix interoperability with auto-initialization
objtool: kmsan: list KMSAN API functions as uaccess-safe
x86: kmsan: disable instrumentation of unsupported code
x86: kmsan: skip shadow checks in __switch_to()
x86: kmsan: handle open-coded assembly in lib/iomem.c
x86: kmsan: use __msan_ string functions where possible.
x86: kmsan: sync metadata pages on page fault
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for
KASAN/KMSAN
x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
x86: kmsan: don't instrument stack walking functions
entry: kmsan: introduce kmsan_unpoison_entry_regs()
bpf: kmsan: initialize BPF registers with zeroes
mm: fs: initialize fsdata passed to write_begin/write_end interface
x86: kmsan: enable KMSAN builds for x86

Dmitry Vyukov (1):
x86: add missing include to sparsemem.h

Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 427 +++++++++++++++++
MAINTAINERS | 13 +
Makefile | 1 +
arch/s390/lib/uaccess.c | 3 +-
arch/x86/Kconfig | 9 +-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 3 +
arch/x86/include/asm/checksum.h | 16 +-
arch/x86/include/asm/kmsan.h | 55 +++
arch/x86/include/asm/page_64.h | 7 +
arch/x86/include/asm/pgtable_64_types.h | 47 +-
arch/x86/include/asm/sparsemem.h | 2 +
arch/x86/include/asm/string_64.h | 23 +-
arch/x86/include/asm/uaccess.h | 22 +-
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/dumpstack.c | 6 +
arch/x86/kernel/process_64.c | 1 +
arch/x86/kernel/unwind_frame.c | 11 +
arch/x86/lib/Makefile | 2 +
arch/x86/lib/iomem.c | 5 +
arch/x86/mm/Makefile | 2 +
arch/x86/mm/fault.c | 23 +-
arch/x86/mm/init_64.c | 2 +-
arch/x86/mm/ioremap.c | 3 +
arch/x86/realmode/rm/Makefile | 1 +
block/bio.c | 2 +
block/blk.h | 7 +
crypto/Kconfig | 30 ++
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/input/serio/libps2.c | 5 +-
drivers/net/Kconfig | 1 +
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
drivers/usb/core/urb.c | 2 +
drivers/virtio/virtio_ring.c | 10 +-
fs/buffer.c | 4 +-
fs/namei.c | 2 +-
include/asm-generic/cacheflush.h | 14 +-
include/linux/compiler-clang.h | 23 +
include/linux/compiler-gcc.h | 6 +
include/linux/compiler_types.h | 3 +-
include/linux/fortify-string.h | 2 +
include/linux/highmem.h | 3 +
include/linux/instrumented.h | 59 ++-
include/linux/kmsan-checks.h | 83 ++++
include/linux/kmsan.h | 330 ++++++++++++++
include/linux/kmsan_types.h | 35 ++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
include/linux/stackdepot.h | 8 +
include/linux/uaccess.h | 19 +-
init/main.c | 3 +
kernel/Makefile | 1 +
kernel/bpf/core.c | 2 +-
kernel/dma/mapping.c | 10 +-
kernel/entry/common.c | 5 +
kernel/exit.c | 2 +
kernel/fork.c | 2 +
kernel/kcov.c | 7 +
kernel/locking/Makefile | 3 +-
lib/Kconfig.debug | 1 +
lib/Kconfig.kmsan | 62 +++
lib/Makefile | 3 +
lib/iomap.c | 44 ++
lib/iov_iter.c | 9 +-
lib/stackdepot.c | 29 +-
lib/string.c | 8 +
lib/usercopy.c | 3 +-
mm/Makefile | 1 +
mm/filemap.c | 2 +-
mm/internal.h | 6 +
mm/kasan/common.c | 2 +-
mm/kmsan/Makefile | 28 ++
mm/kmsan/core.c | 450 ++++++++++++++++++
mm/kmsan/hooks.c | 384 ++++++++++++++++
mm/kmsan/init.c | 235 ++++++++++
mm/kmsan/instrumentation.c | 307 +++++++++++++
mm/kmsan/kmsan.h | 209 +++++++++
mm/kmsan/kmsan_test.c | 581 ++++++++++++++++++++++++
mm/kmsan/report.c | 219 +++++++++
mm/kmsan/shadow.c | 294 ++++++++++++
mm/memory.c | 2 +
mm/page_alloc.c | 19 +
mm/slab.h | 1 +
mm/slub.c | 17 +
mm/vmalloc.c | 20 +-
scripts/Makefile.kmsan | 8 +
scripts/Makefile.lib | 9 +
security/Kconfig.hardening | 4 +
tools/objtool/check.c | 20 +
93 files changed, 4316 insertions(+), 56 deletions(-)
create mode 100644 Documentation/dev-tools/kmsan.rst
create mode 100644 arch/x86/include/asm/kmsan.h
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 include/linux/kmsan_types.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/init.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/kmsan_test.c
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:41 AM
From: Dmitry Vyukov <dvy...@google.com>

Including sparsemem.h from other files (e.g. transitively via
asm/pgtable_64_types.h) results in compilation errors due to unknown
types:

sparsemem.h:34:32: error: unknown type name 'phys_addr_t'
extern int phys_to_target_node(phys_addr_t start);
^
sparsemem.h:36:39: error: unknown type name 'u64'
extern int memory_add_physaddr_to_nid(u64 start);
^

Fix these errors by including linux/types.h from sparsemem.h.
This is required for the upcoming KMSAN patches.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/Ifae221ce85d870d8f8d17173bd44d5cf9be2950f
---
arch/x86/include/asm/sparsemem.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 6a9ccc1b2be5d..64df897c0ee30 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_SPARSEMEM_H
#define _ASM_X86_SPARSEMEM_H

+#include <linux/types.h>
+
#ifdef CONFIG_SPARSEMEM
/*
* generic non-linear memory support:
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:46 AM
Some users (currently only KMSAN) may want to use spare bits in
depot_stack_handle_t. Let them do so by adding @extra_bits to
__stack_depot_save() to store arbitrary flags, and providing
stack_depot_get_extra_bits() to retrieve those flags.

Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN
does not intend to store additional information in the stack handle.
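
For illustration, a minimal usage sketch of the new interface (the tag
value and helper names below are made up):

#include <linux/kernel.h>
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

#define MY_TAG 0x3	/* must fit into STACK_DEPOT_EXTRA_BITS (5) bits */

static depot_stack_handle_t save_tagged_stack(gfp_t flags)
{
	unsigned long entries[16];
	unsigned int nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);

	/* The new @extra_bits argument is stored in the spare handle bits. */
	return __stack_depot_save(entries, nr, MY_TAG, flags, true);
}

static unsigned int read_tag(depot_stack_handle_t handle)
{
	/* Retrieve the tag without fetching the stack trace itself. */
	return stack_depot_get_extra_bits(handle);
}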

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v4:
-- per Marco Elver's request, fold "kasan: common: adapt to the new
prototype of __stack_depot_save()" into this patch to prevent
bisection breakages.

Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
---
include/linux/stackdepot.h | 8 ++++++++
lib/stackdepot.c | 29 ++++++++++++++++++++++++-----
mm/kasan/common.c | 2 +-
3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index bc2797955de90..9ca7798d7a318 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -14,9 +14,15 @@
#include <linux/gfp.h>

typedef u32 depot_stack_handle_t;
+/*
+ * Number of bits in the handle that stack depot doesn't use. Users may store
+ * information in them.
+ */
+#define STACK_DEPOT_EXTRA_BITS 5

depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t gfp_flags, bool can_alloc);

/*
@@ -59,6 +65,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
int spaces);

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e73fda23388d8..79e894cf84064 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -43,7 +43,8 @@
#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
STACK_ALLOC_ALIGN)
#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
- STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
+ STACK_ALLOC_NULL_PROTECTION_BITS - \
+ STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
#define STACK_ALLOC_SLABS_CAP 8192
#define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -56,6 +57,7 @@ union handle_parts {
u32 slabindex : STACK_ALLOC_INDEX_BITS;
u32 offset : STACK_ALLOC_OFFSET_BITS;
u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
+ u32 extra : STACK_DEPOT_EXTRA_BITS;
};
};

@@ -77,6 +79,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
+{
+ union handle_parts parts = { .handle = handle };
+
+ return parts.extra;
+}
+EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -140,6 +150,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
stack->handle.valid = 1;
+ stack->handle.extra = 0;
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
depot_offset += required_size;

@@ -382,6 +393,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*
* @entries: Pointer to storage array
* @nr_entries: Size of the storage array
+ * @extra_bits: Flags to store in unused bits of depot_stack_handle_t
* @alloc_flags: Allocation gfp flags
* @can_alloc: Allocate stack slabs (increased chance of failure if false)
*
@@ -393,6 +405,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
* If the stack trace in @entries is from an interrupt, only the portion up to
* interrupt entry is saved.
*
+ * Additional opaque flags can be passed in @extra_bits, stored in the unused
+ * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
+ * without calling stack_depot_fetch().
+ *
* Context: Any context, but setting @can_alloc to %false is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case from contexts where neither %GFP_ATOMIC nor
@@ -402,10 +418,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*/
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t alloc_flags, bool can_alloc)
{
struct stack_record *found = NULL, **bucket;
- depot_stack_handle_t retval = 0;
+ union handle_parts retval = { .handle = 0 };
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
@@ -489,9 +506,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
}
if (found)
- retval = found->handle.handle;
+ retval.handle = found->handle.handle;
fast_exit:
- return retval;
+ retval.extra = extra_bits;
+
+ return retval.handle;
}
EXPORT_SYMBOL_GPL(__stack_depot_save);

@@ -511,6 +530,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8be..94caa2d46a327 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
unsigned int nr_entries;

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
}

void kasan_set_track(struct kasan_track *track, gfp_t flags)
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:48 AM
Introduce instrument_copy_from_user_before() and
instrument_copy_from_user_after() hooks to be invoked before and after
the call to copy_from_user().

KASAN and KCSAN will only use instrument_copy_from_user_before(), but
KMSAN will also need to insert code after copy_from_user().
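
The calling convention used throughout this patch looks as follows (a
simplified sketch, not an actual kernel function):

/*
 * Sketch of the calling pattern: the "before" hook runs prior to the raw
 * copy, and the "after" hook also receives the number of bytes that were
 * NOT copied, so a tool like KMSAN can mark only the successfully copied
 * bytes as initialized.
 */
#include <linux/instrumented.h>
#include <linux/uaccess.h>

static unsigned long example_copy_from_user(void *to,
					    const void __user *from,
					    unsigned long n)
{
	unsigned long res;

	instrument_copy_from_user_before(to, from, n);
	res = raw_copy_from_user(to, from, n);	/* returns bytes not copied */
	instrument_copy_from_user_after(to, from, n, res);
	return res;
}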

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v4:
-- fix _copy_from_user_key() in arch/s390/lib/uaccess.c (Reported-by:
kernel test robot <l...@intel.com>)

Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
---
arch/s390/lib/uaccess.c | 3 ++-
include/linux/instrumented.h | 21 +++++++++++++++++++--
include/linux/uaccess.h | 19 ++++++++++++++-----
lib/iov_iter.c | 9 ++++++---
lib/usercopy.c | 3 ++-
5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/s390/lib/uaccess.c b/arch/s390/lib/uaccess.c
index d7b3b193d1088..58033dfcb6d45 100644
--- a/arch/s390/lib/uaccess.c
+++ b/arch/s390/lib/uaccess.c
@@ -81,8 +81,9 @@ unsigned long _copy_from_user_key(void *to, const void __user *from,

might_fault();
if (!should_fail_usercopy()) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user_key(to, from, n, key);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202a..ee8f7d17d34f5 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
}

/**
- * instrument_copy_from_user - instrument writes of copy_from_user
+ * instrument_copy_from_user_before - add instrumentation before copy_from_user
*
* Instrument writes to kernel memory, that are due to copy_from_user (and
* variants). The instrumentation should be inserted before the accesses.
@@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
* @n number of bytes to copy
*/
static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
{
kasan_check_write(to, n);
kcsan_check_write(to, n);
}

+/**
+ * instrument_copy_from_user_after - add instrumentation after copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted after the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ * @left number of bytes not copied (as returned by copy_from_user)
+ */
+static __always_inline void
+instrument_copy_from_user_after(const void *to, const void __user *from,
+ unsigned long n, unsigned long left)
+{
+}
+
#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 47e5d374c7ebe..afb18f198843b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -58,20 +58,28 @@
static __always_inline __must_check unsigned long
__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
{
- instrument_copy_from_user(to, from, n);
+ unsigned long res;
+
+ instrument_copy_from_user_before(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

static __always_inline __must_check unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
+ unsigned long res;
+
might_fault();
+ instrument_copy_from_user_before(to, from, n);
if (should_fail_usercopy())
return n;
- instrument_copy_from_user(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

/**
@@ -115,8 +123,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4b7fce72e3e52..c3ca28ca68a65 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -174,13 +174,16 @@ static int copyout(void __user *to, const void *from, size_t n)

static int copyin(void *to, const void __user *from, size_t n)
{
+ size_t res = n;
+
if (should_fail_usercopy())
return n;
if (access_ok(from, n)) {
- instrument_copy_from_user(to, from, n);
- n = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
- return n;
+ return res;
}

static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe,
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516e..1505a52f23a01 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:51 AM
Use hooks from instrumented.h to notify bug detection tools about
usercopy events in variations of get_user() and put_user().
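
For illustration, a simplified sketch of the resulting pattern; the
macro name and arch_fetch_from_user() below are made up, see the diff
for the real get_user() and __put_user_size() changes:

/*
 * Simplified sketch of the pattern this patch adds: the hook runs after
 * the value has been fetched from userspace, but before it reaches the
 * caller.
 */
#define example_get_user(x, ptr)					\
({									\
	__typeof__(*(ptr)) __val;					\
	int __ret = arch_fetch_from_user(&__val, (ptr));		\
	instrument_get_user(__val);		/* notify tools */	\
	(x) = __val;							\
	__ret;								\
})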

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v5:
-- handle put_user(), make sure to not evaluate pointer/value twice

v6:
-- add missing empty definitions of instrument_get_user() and
instrument_put_user()

Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
---
arch/x86/include/asm/uaccess.h | 22 +++++++++++++++-------
include/linux/instrumented.h | 28 ++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 913e593a3b45f..c1b8982899eca 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -5,6 +5,7 @@
* User space memory access functions
*/
#include <linux/compiler.h>
+#include <linux/instrumented.h>
#include <linux/kasan-checks.h>
#include <linux/string.h>
#include <asm/asm.h>
@@ -103,6 +104,7 @@ extern int __get_user_bad(void);
: "=a" (__ret_gu), "=r" (__val_gu), \
ASM_CALL_CONSTRAINT \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
+ instrument_get_user(__val_gu); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
})
@@ -192,9 +194,11 @@ extern void __put_user_nocheck_8(void);
int __ret_pu; \
void __user *__ptr_pu; \
register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); \
- __chk_user_ptr(ptr); \
- __ptr_pu = (ptr); \
- __val_pu = (x); \
+ __typeof__(*(ptr)) __x = (x); /* eval x once */ \
+ __typeof__(ptr) __ptr = (ptr); /* eval ptr once */ \
+ __chk_user_ptr(__ptr); \
+ __ptr_pu = __ptr; \
+ __val_pu = __x; \
asm volatile("call __" #fn "_%P[size]" \
: "=c" (__ret_pu), \
ASM_CALL_CONSTRAINT \
@@ -202,6 +206,7 @@ extern void __put_user_nocheck_8(void);
"r" (__val_pu), \
[size] "i" (sizeof(*(ptr))) \
:"ebx"); \
+ instrument_put_user(__x, __ptr, sizeof(*(ptr))); \
__builtin_expect(__ret_pu, 0); \
})

@@ -248,23 +253,25 @@ extern void __put_user_nocheck_8(void);

#define __put_user_size(x, ptr, size, label) \
do { \
+ __typeof__(*(ptr)) __x = (x); /* eval x once */ \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
- __put_user_goto(x, ptr, "b", "iq", label); \
+ __put_user_goto(__x, ptr, "b", "iq", label); \
break; \
case 2: \
- __put_user_goto(x, ptr, "w", "ir", label); \
+ __put_user_goto(__x, ptr, "w", "ir", label); \
break; \
case 4: \
- __put_user_goto(x, ptr, "l", "ir", label); \
+ __put_user_goto(__x, ptr, "l", "ir", label); \
break; \
case 8: \
- __put_user_goto_u64(x, ptr, label); \
+ __put_user_goto_u64(__x, ptr, label); \
break; \
default: \
__put_user_bad(); \
} \
+ instrument_put_user(__x, ptr, size); \
} while (0)

#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
@@ -305,6 +312,7 @@ do { \
default: \
(x) = __get_user_bad(); \
} \
+ instrument_get_user(x); \
} while (0)

#define __get_user_asm(x, addr, itype, ltype, label) \
diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index ee8f7d17d34f5..9f1dba8f717b0 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -153,4 +153,32 @@ instrument_copy_from_user_after(const void *to, const void __user *from,
{
}

+/**
+ * instrument_get_user() - add instrumentation to get_user()-like macros
+ *
+ * get_user() and friends are fragile, so it may depend on the implementation
+ * whether the instrumentation happens before or after the data is copied from
+ * the userspace.
+ *
+ * @to destination variable, may not be address-taken
+ */
+#define instrument_get_user(to) \
+({ \
+})
+
+/**
+ * instrument_put_user() - add instrumentation to put_user()-like macros
+ *
+ * put_user() and friends are fragile, so it may depend on the implementation
+ * whether the instrumentation happens before or after the data is copied to
+ * userspace.
+ *
+ * @from source address
+ * @ptr userspace pointer to copy to
+ * @size number of bytes to copy
+ */
+#define instrument_put_user(from, ptr, size) \
+({ \
+})
+
#endif /* _LINUX_INSTRUMENTED_H */
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:53 AM
Notify memory tools about usercopy events in copy_to_user_page() and
copy_from_user_page().

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v5:
-- cast user pointers to `void __user *`

Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
---
include/asm-generic/cacheflush.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc239..f46258d1a080f 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,8 @@
#ifndef _ASM_GENERIC_CACHEFLUSH_H
#define _ASM_GENERIC_CACHEFLUSH_H

+#include <linux/instrumented.h>
+
struct mm_struct;
struct vm_area_struct;
struct page;
@@ -105,14 +107,22 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
#ifndef copy_to_user_page
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
+ instrument_copy_to_user((void __user *)dst, src, len); \
memcpy(dst, src, len); \
flush_icache_user_page(vma, page, vaddr, len); \
} while (0)
#endif

+
#ifndef copy_from_user_page
-#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
- memcpy(dst, src, len)
+#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
+ do { \
+ instrument_copy_from_user_before(dst, (void __user *)src, \
+ len); \
+ memcpy(dst, src, len); \
+ instrument_copy_from_user_after(dst, (void __user *)src, len, \
+ 0); \
+ } while (0)
#endif

#endif /* _ASM_GENERIC_CACHEFLUSH_H */
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:04:56 AM
Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- added a note that KMSAN is not intended for production use

v5:
-- mention CONFIG_KMSAN_CHECK_PARAM_RETVAL, drop mentions of cpu_entry_area
-- add SPDX license
-- address Marco Elver's comments: reorganize doc structure, fix minor
nits

Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 427 ++++++++++++++++++++++++++++++
2 files changed, 428 insertions(+)
create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4621eac290f46..6b0663075dc04 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
kcov
gcov
kasan
+ kmsan
ubsan
kmemleak
kcsan
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 0000000000000..2a53a801198cb
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,427 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright (C) 2022, Google LLC.
+
+===================================
+The Kernel Memory Sanitizer (KMSAN)
+===================================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+An important note is that KMSAN is not intended for production use, because it
+drastically increases kernel memory footprint and slows the whole system down.
+
+Usage
+=====
+
+Building the kernel
+-------------------
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.6+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+Example report
+--------------
+
+Here is an example of a KMSAN report::
+
+ =====================================================
+ BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+ test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Uninit was stored to memory at:
+ do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Local variable uninit created at:
+ do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+ Bytes 4-7 of 8 are uninitialized
+ Memory access of size 8 starts at ffff888083fe3da0
+
+ CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+ =====================================================
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The third stack trace corresponds to the place
+where this variable was created.
+
+The first stack trace shows where the uninit value was used (in
+``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
+ - in a condition, e.g. ``if (v) { ... }``;
+ - in an indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
+ - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
+ - when it is passed as an argument to a function, and
+ ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
+
+The mentioned cases (apart from copying data to userspace or hardware, which is
+a security issue) are considered undefined behavior from the C11 Standard point
+of view.
+
+Disabling the instrumentation
+-----------------------------
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting
+it, which can be helpful if we do not want the compiler to interfere with some
+low-level code (e.g. that marked with ``noinstr`` which implicitly adds
+``__no_sanitize_memory``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+ KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+ KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Support
+=======
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+The runtime library only supports x86_64 at the moment.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
+
+Compiler instrumentation also tracks the shadow values as they are used along
+the code. When needed, instrumentation code invokes the runtime library in
+``mm/kmsan/`` to persist shadow values.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+ int a = 0xff; // i.e. 0x000000ff
+ int b;
+ int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin mapped to them.
+This origin describes the point in program execution at which the uninitialized
+value was created. Every origin is associated with either the full allocation
+stack (for heap-allocated memory), or the function containing the uninitialized
+variable (for locals).
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value. When a
+value is read from memory, its origin is also read and kept together with the
+shadow. For every instruction that takes one or more values, the origin of the
+result is one of the origins corresponding to any of the uninitialized inputs.
+If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+ int a = 42;
+ int b;
+ int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+ int combine(short a, short b) {
+ union ret_t {
+ int i;
+ short s[2];
+ } ret;
+ ret.s[0] = a;
+ ret.s[1] = b;
+ return ret.i;
+ }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+The Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+ typedef struct {
+ void *shadow, *origin;
+ } shadow_origin_ptr_t
+
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
+
+Handling locals
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+ void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+ kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+ struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+ };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions (unless the parameters are checked immediately by
+``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
+
+Passing uninitialized values to functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Clang's MemorySanitizer instrumentation has an option,
+``-fsanitize-memory-param-retval``, which makes the compiler check function
+parameters passed by value, as well as function return values.
+
+The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
+enabled by default to let KMSAN report uninitialized values earlier.
+Please refer to the `LKML discussion`_ for more details.
+
+Because of the way the checks are implemented in LLVM (they are only applied to
+parameters marked as ``noundef``), not all parameters are guaranteed to be
+checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+with the data::
+
+ void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+ void *__msan_memmove(void *dst, void *src, uintptr_t n)
+ void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each use of a value the compiler emits a shadow check that calls
+``__msan_warning()`` in the case that value is poisoned::
+
+ void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+ void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+, which unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+ struct kmsan_ctx {
+ ...
+ bool allow_reporting;
+ struct kmsan_context_state cstate;
+ ...
+ }
+
+ struct task_struct {
+ ...
+ struct kmsan_ctx kmsan;
+ ...
+ }
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+But in the case the kernel is running in the interrupt, softirq or NMI context,
+where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
+
+ DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+ struct page {
+ ...
+ struct page *shadow, *origin;
+ ...
+ };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that in general for two contiguous memory pages their shadow/origin
+pages may not be contiguous. Consequently, if a memory access crosses the
+boundary of a memory block, accesses to shadow/origin memory may potentially
+corrupt other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there also are no guarantees on metadata contiguity.
+
+In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+ char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+ char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
+.. _LKML discussion: https://lore.kernel.org/all/20220614144853....@google.com/
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:00 AM
__no_sanitize_memory is a function attribute that instructs KMSAN to
skip a function during instrumentation. This is needed, for example, to
implement the noinstr functions.

__no_kmsan_checks is a function attribute that makes KMSAN
ignore the uninitialized values coming from the function's
inputs, and initialize the function's outputs.

Functions marked with this attribute can't be inlined into functions
not marked with it, and vice versa. This behavior is overridden by
__always_inline.

__SANITIZE_MEMORY__ is a macro that's defined iff the file is
instrumented with KMSAN. This is not the same as CONFIG_KMSAN, which is
defined for every file.
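
For illustration, a sketch of how the two attributes are intended to be
applied (the functions below are made up):

/* Not instrumented at all: KMSAN inserts no code into this function. */
__no_sanitize_memory
static void early_entry_helper(void)
{
	/* ... low-level code that must not call into the KMSAN runtime ... */
}

/*
 * Still instrumented, but KMSAN initializes all locals and stores, skips
 * shadow checks, and passes initialized arguments to callees, so callers
 * never see false positives originating here.
 */
__no_kmsan_checks
static int read_hw_status(const volatile int *reg)
{
	return *reg;	/* value is trusted to be initialized */
}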

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
Link: https://linux-review.googlesource.com/id/I004ff0360c918d3cd8b18767ddd1381c6d3281be
---
include/linux/compiler-clang.h | 23 +++++++++++++++++++++++
include/linux/compiler-gcc.h | 6 ++++++
2 files changed, 29 insertions(+)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index c84fec767445d..4fa0cc4cbd2c8 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -51,6 +51,29 @@
#define __no_sanitize_undefined
#endif

+#if __has_feature(memory_sanitizer)
+#define __SANITIZE_MEMORY__
+/*
+ * Unlike other sanitizers, KMSAN still inserts code into functions marked with
+ * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation
+ * provides the behavior consistent with other __no_sanitize_ attributes,
+ * guaranteeing that __no_sanitize_memory functions remain uninstrumented.
+ */
+#define __no_sanitize_memory __disable_sanitizer_instrumentation
+
+/*
+ * The __no_kmsan_checks attribute ensures that a function does not produce
+ * false positive reports by:
+ * - initializing all local variables and memory stores in this function;
+ * - skipping all shadow checks;
+ * - passing initialized arguments to this function's callees.
+ */
+#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
+#else
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+#endif
+
/*
* Support for __has_feature(coverage_sanitizer) was added in Clang 13 together
* with no_sanitize("coverage"). Prior versions of Clang support coverage
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 9b157b71036f1..f55a37efdb974 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -114,6 +114,12 @@
#define __SANITIZE_ADDRESS__
#endif

+/*
+ * GCC does not support KMSAN.
+ */
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+
/*
* Turn individual warnings and errors on and off locally, depending
* on version.
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:02 AM
noinstr functions should never be instrumented, so make KMSAN skip them
by applying the __no_sanitize_memory attribute.

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v2:
-- moved this patch earlier in the series per Mark Rutland's request

Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
---
include/linux/compiler_types.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 4f2a819fd60a3..015207a6e2bf5 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -229,7 +229,8 @@ struct ftrace_likely_data {
/* Section for code which can't be instrumented at all */
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
- __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+ __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
+ __no_sanitize_memory

#endif /* __KERNEL__ */

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:05 AM
KMSAN is going to use 3/4 of the existing vmalloc space to hold its
metadata, so lower VMALLOC_END to make sure vmalloc() doesn't allocate
past the first 1/4.
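
As an aside (the helpers below are illustrative only and not part of
this patch), the quartered layout means that the metadata for a vmalloc
address is reachable by adding a constant offset; vmalloc_meta() in
mm/kmsan/shadow.c, added later in this series, computes the addresses
the same way:

/* Sketch: locate KMSAN metadata for an address in the vmalloc area. */
static void *example_vmalloc_shadow(void *addr)
{
	return (void *)((unsigned long)addr + KMSAN_VMALLOC_SHADOW_OFFSET);
}

static void *example_vmalloc_origin(void *addr)
{
	return (void *)((unsigned long)addr + KMSAN_VMALLOC_ORIGIN_OFFSET);
}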

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- added x86: to the title

v5:
-- add comment for VMEMORY_END

Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
---
arch/x86/include/asm/pgtable_64_types.h | 47 ++++++++++++++++++++++++-
arch/x86/mm/init_64.c | 2 +-
2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 70e360a2e5fb7..04f36063ad546 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -139,7 +139,52 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */

-#define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+/*
+ * End of the region for which vmalloc page tables are pre-allocated.
+ * For non-KMSAN builds, this is the same as VMALLOC_END.
+ * For KMSAN builds, VMALLOC_START..VMEMORY_END is 4 times bigger than
+ * VMALLOC_START..VMALLOC_END (see below).
+ */
+#define VMEMORY_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+
+#ifndef CONFIG_KMSAN
+#define VMALLOC_END VMEMORY_END
+#else
+/*
+ * In KMSAN builds vmalloc area is four times smaller, and the remaining 3/4
+ * are used to keep the metadata for virtual pages. The memory formerly
+ * belonging to vmalloc area is now laid out as follows:
+ *
+ * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
+ * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
+ * VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
+ * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
+ * VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
+ * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
+ * - shadow for modules,
+ * KMSAN_MODULES_ORIGIN_START to
+ * KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
+ */
+#define VMALLOC_QUARTER_SIZE ((VMALLOC_SIZE_TB << 40) >> 2)
+#define VMALLOC_END (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
+
+/*
+ * vmalloc metadata addresses are calculated by adding shadow/origin offsets
+ * to vmalloc address.
+ */
+#define KMSAN_VMALLOC_SHADOW_OFFSET VMALLOC_QUARTER_SIZE
+#define KMSAN_VMALLOC_ORIGIN_OFFSET (VMALLOC_QUARTER_SIZE << 1)
+
+#define KMSAN_VMALLOC_SHADOW_START (VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
+#define KMSAN_VMALLOC_ORIGIN_START (VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
+
+/*
+ * The shadow/origin for modules are placed one by one in the last 1/4 of
+ * vmalloc space.
+ */
+#define KMSAN_MODULES_SHADOW_START (VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
+#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
+#endif /* CONFIG_KMSAN */

#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0fe690ebc269b..39b6bfcaa0ed4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
unsigned long addr;
const char *lvl;

- for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+ for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
pgd_t *pgd = pgd_offset_k(addr);
p4d_t *p4d;
pud_t *pud;
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:07 AM
KMSAN adds extra metadata fields to struct page, so it no longer fits
into 64 bytes.

This change leads to increased memory consumption of the nvdimm driver,
regardless of whether the kernel is built with KMSAN.

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ec5219680092d..85ca5b4da3cf3 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
struct nd_namespace_common *ndns);
#if IS_ENABLED(CONFIG_ND_CLAIM)
/* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
#else
static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 0e92ab4b32833..61af072ac98f9 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -787,7 +787,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
* when populating the vmemmap. This *should* be equal to
* PMD_SIZE for most architectures.
*
- * Also make sure size of struct page is less than 64. We
+ * Also make sure size of struct page is less than 128. We
* want to make sure we use large enough size here so that
* we don't have a dynamic reserve space depending on
* struct page size. But we also want to make sure we notice
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:10 AM
For each memory location KernelMemorySanitizer maintains two types of
metadata:
1. The so-called shadow of that location - a byte:byte mapping describing
   whether individual bits of memory are initialized (shadow is 0)
   or not (shadow is 1).
2. The origins of that location - a 4-byte:4-byte mapping containing
   4-byte IDs of the stack traces where uninitialized values were
   created. A short illustration of both mappings follows below.
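
For instance (illustration only; the origin id value below is made up),
after a function writes only the first half of an 8-byte local:

	u64 buf;		/* poisoned by KMSAN on function entry */

	memset(&buf, 0, 4);	/* bytes 0..3 written, bytes 4..7 not  */

	/*
	 * shadow(&buf): 00 00 00 00 ff ff ff ff
	 *	- the written half is initialized (all shadow bits 0),
	 *	  the untouched half is not (all shadow bits 1).
	 * origin(&buf): 0x00000000 0xabcd1234
	 *	- one 4-byte stack trace id per 4 bytes of data; the
	 *	  second slot records where the uninitialized value
	 *	  was created.
	 */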

Each struct page now contains pointers to two struct pages holding
KMSAN metadata (shadow and origins) for the original struct page.
Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
metadata creation, addressing, copying and checking.
mm/kmsan/report.c performs error reporting in cases where an
uninitialized value is used in a way that leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory. mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.
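
Conceptually (a hand-written sketch, not literal compiler output), an
instrumented 8-byte store "*p = v;" is turned into something like:

	/*
	 * shadow_of_v and origin_of_v are placeholders for the shadow
	 * and origin the compiler propagates for v.
	 */
	struct shadow_origin_ptr sop = __msan_metadata_ptr_for_store_8(p);

	*(u64 *)sop.shadow = shadow_of_v;	/* copy v's init state */
	if (shadow_of_v)			/* v partially uninit?  */
		*(u32 *)sop.origin = __msan_chain_origin(origin_of_v);
	*p = v;					/* the original store   */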

To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_ctx, whose struct
kmsan_context_state is used to track the metadata of function parameters
and return values for that task.

Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and
declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN.
The KMSAN_SANITIZE:=n Makefile directive can be used to completely
disable KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and treats newly
created stack memory as initialized.

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.
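
A hypothetical caller (the helper name and buffer handling below are
made up) could look like this:

#include <linux/kmsan-checks.h>

static void hypothetical_handle_rx(void *buf, size_t len)
{
	/*
	 * The device filled @buf via a path KMSAN cannot observe (e.g.
	 * DMA), so mark it initialized to avoid false positives:
	 */
	kmsan_unpoison_memory(buf, len);

	/* Conversely, data that must not be trusted can be re-poisoned: */
	kmsan_poison_memory(buf, len, GFP_KERNEL);

	/* Explicitly report if any byte of @buf is still uninitialized: */
	kmsan_check_memory(buf, len);
}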

Signed-off-by: Alexander Potapenko <gli...@google.com>
Acked-by: Marco Elver <el...@google.com>

---
v2:
-- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
rewrote the patch description;
-- addressed comments by Dmitry Vyukov;
-- added a note about KMSAN being not intended for production use.
-- fix case of unaligned dst in kmsan_internal_memmove_metadata()

v3:
-- print build IDs in reports where applicable
-- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
-- remove a stray BUG()

v4: (mostly fixes suggested by Marco Elver)
-- add missing SPDX headers
-- move CC_IS_CLANG && CLANG_VERSION under HAVE_KMSAN_COMPILER
-- replace occurrences of |var| with @var
-- reflow KMSAN_WARN_ON(), fix comments
-- remove x86-specific code from shadow.c to improve portability
-- convert kmsan_report_lock to raw spinlock
-- add enter_runtime/exit_runtime around kmsan_internal_memmove_metadata()
-- remove unnecessary include from kmsan.h (reported by <l...@intel.com>)
-- introduce CONFIG_KMSAN_CHECK_PARAM_RETVAL (on by default), which
maps to -fsanitize-memory-param-retval and makes KMSAN eagerly check
values passed as function parameters and returned from functions.

v5:
-- do not return dummy shadow from within runtime
-- preserve shadow when calling memcpy()/memmove()/memset()
-- reword some code comments
-- reapply clang-format, switch to modern style for-loops
-- move kmsan_internal_is_vmalloc_addr() and
kmsan_internal_is_module_addr() to the header
-- refactor lib/Kconfig.kmsan as suggested by Marco Elver
-- remove forward declaration of `struct page` from this patch

v6:
-- move definitions of `struct kmsan_context_state` and `struct
kmsan_ctx` to <linux/kmsan_types.h> to avoid circular header
dependencies that manifested in -mm tree.

v7:
-- instead of printing warnings about origin chains exceeding max
depth, notify the user about truncated chains when reporting
errors.

Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
---
Makefile | 1 +
include/linux/kmsan-checks.h | 64 +++++
include/linux/kmsan_types.h | 35 +++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
lib/Kconfig.debug | 1 +
lib/Kconfig.kmsan | 50 ++++
mm/Makefile | 1 +
mm/kmsan/Makefile | 23 ++
mm/kmsan/core.c | 440 +++++++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 66 ++++++
mm/kmsan/instrumentation.c | 307 ++++++++++++++++++++++++
mm/kmsan/kmsan.h | 204 ++++++++++++++++
mm/kmsan/report.c | 219 +++++++++++++++++
mm/kmsan/shadow.c | 147 ++++++++++++
scripts/Makefile.kmsan | 8 +
scripts/Makefile.lib | 9 +
17 files changed, 1592 insertions(+)
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan_types.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

diff --git a/Makefile b/Makefile
index a5e9d9388649a..f0f8e912287d3 100644
--- a/Makefile
+++ b/Makefile
@@ -1015,6 +1015,7 @@ include-y := scripts/Makefile.extrawarn
include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
include-$(CONFIG_KASAN) += scripts/Makefile.kasan
include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
+include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
new file mode 100644
index 0000000000000..a6522a0c28df9
--- /dev/null
+++ b/include/linux/kmsan-checks.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN checks to be used for one-off annotations in subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#ifndef _LINUX_KMSAN_CHECKS_H
+#define _LINUX_KMSAN_CHECKS_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_KMSAN
+
+/**
+ * kmsan_poison_memory() - Mark the memory range as uninitialized.
+ * @address: address to start with.
+ * @size: size of buffer to poison.
+ * @flags: GFP flags for allocations done by this function.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * uninitialized. Error reports for this memory will reference the call site of
+ * kmsan_poison_memory() as origin.
+ */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
+
+/**
+ * kmsan_unpoison_memory() - Mark the memory range as initialized.
+ * @address: address to start with.
+ * @size: size of buffer to unpoison.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * initialized.
+ */
+void kmsan_unpoison_memory(const void *address, size_t size);
+
+/**
+ * kmsan_check_memory() - Check the memory range for being initialized.
+ * @address: address to start with.
+ * @size: size of buffer to check.
+ *
+ * If any piece of the given range is marked as uninitialized, KMSAN will report
+ * an error.
+ */
+void kmsan_check_memory(const void *address, size_t size);
+
+#else
+
+static inline void kmsan_poison_memory(const void *address, size_t size,
+ gfp_t flags)
+{
+}
+static inline void kmsan_unpoison_memory(const void *address, size_t size)
+{
+}
+static inline void kmsan_check_memory(const void *address, size_t size)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_CHECKS_H */
diff --git a/include/linux/kmsan_types.h b/include/linux/kmsan_types.h
new file mode 100644
index 0000000000000..8bfa6c98176d4
--- /dev/null
+++ b/include/linux/kmsan_types.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * A minimal header declaring types added by KMSAN to existing kernel structs.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+#ifndef _LINUX_KMSAN_TYPES_H
+#define _LINUX_KMSAN_TYPES_H
+
+/* These constants are defined in the MSan LLVM instrumentation pass. */
+#define KMSAN_RETVAL_SIZE 800
+#define KMSAN_PARAM_SIZE 800
+
+struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ u32 retval_origin_tls;
+};
+
+#undef KMSAN_PARAM_SIZE
+#undef KMSAN_RETVAL_SIZE
+
+struct kmsan_ctx {
+ struct kmsan_context_state cstate;
+ int kmsan_in_runtime;
+ bool allow_reporting;
+};
+
+#endif /* _LINUX_KMSAN_TYPES_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cf97f3884fda2..8be4f34cb8caa 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -223,6 +223,18 @@ struct page {
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */

+#ifdef CONFIG_KMSAN
+ /*
+ * KMSAN metadata for this page:
+ * - shadow page: every bit indicates whether the corresponding
+ * bit of the original page is initialized (0) or not (1);
+ * - origin page: every 4 bytes contain an id of the stack trace
+ * where the uninitialized value was created.
+ */
+ struct page *kmsan_shadow;
+ struct page *kmsan_origin;
+#endif
+
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7b2f8a5c711c..b6de1045f044e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -14,6 +14,7 @@
#include <linux/pid.h>
#include <linux/sem.h>
#include <linux/shm.h>
+#include <linux/kmsan_types.h>
#include <linux/mutex.h>
#include <linux/plist.h>
#include <linux/hrtimer.h>
@@ -1355,6 +1356,10 @@ struct task_struct {
#endif
#endif

+#ifdef CONFIG_KMSAN
+ struct kmsan_ctx kmsan_ctx;
+#endif
+
#if IS_ENABLED(CONFIG_KUNIT)
struct kunit *kunit_test;
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index bcbe60d6c80c1..ff098746c16be 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -971,6 +971,7 @@ config DEBUG_STACKOVERFLOW

source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
+source "lib/Kconfig.kmsan"

endmenu # "Memory Debugging"

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
new file mode 100644
index 0000000000000..5b19dbd34d76e
--- /dev/null
+++ b/lib/Kconfig.kmsan
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HAVE_ARCH_KMSAN
+ bool
+
+config HAVE_KMSAN_COMPILER
+ # Clang versions <14.0.0 also support -fsanitize=kernel-memory, but not
+ # all the features necessary to build the kernel with KMSAN.
+ depends on CC_IS_CLANG && CLANG_VERSION >= 140000
+ def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)
+
+config KMSAN
+ bool "KMSAN: detector of uninitialized values use"
+ depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
+ depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
+ select STACKDEPOT
+ select STACKDEPOT_ALWAYS_INIT
+ help
+ KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
+ uninitialized values in the kernel. It is based on compiler
+ instrumentation provided by Clang and thus requires Clang to build.
+
+ An important note is that KMSAN is not intended for production use,
+ because it drastically increases kernel memory footprint and slows
+ the whole system down.
+
+ See <file:Documentation/dev-tools/kmsan.rst> for more details.
+
+if KMSAN
+
+config HAVE_KMSAN_PARAM_RETVAL
+ # -fsanitize-memory-param-retval is supported only by Clang >= 14.
+ depends on HAVE_KMSAN_COMPILER
+ def_bool $(cc-option,-fsanitize=kernel-memory -fsanitize-memory-param-retval)
+
+config KMSAN_CHECK_PARAM_RETVAL
+ bool "Check for uninitialized values passed to and returned from functions"
+ default y
+ depends on HAVE_KMSAN_PARAM_RETVAL
+ help
+ If the compiler supports -fsanitize-memory-param-retval, KMSAN will
+ eagerly check every function parameter passed by value and every
+ function return value.
+
+ Disabling KMSAN_CHECK_PARAM_RETVAL will result in tracking shadow for
+ function parameters and return values across function borders. This
+ is a more relaxed mode, but it generates more instrumentation code and
+ may potentially report errors in corner cases when non-instrumented
+ functions call instrumented ones.
+
+endif
diff --git a/mm/Makefile b/mm/Makefile
index 9a564f8364035..cce88e5b6d76f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_KASAN) += kasan/
obj-$(CONFIG_KFENCE) += kfence/
+obj-$(CONFIG_KMSAN) += kmsan/
obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_MEMTEST) += memtest.o
obj-$(CONFIG_MIGRATION) += migrate.o
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
new file mode 100644
index 0000000000000..550ad8625e4f9
--- /dev/null
+++ b/mm/kmsan/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for KernelMemorySanitizer (KMSAN).
+#
+#
+obj-y := core.o instrumentation.o hooks.o report.o shadow.o
+
+KMSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+UBSAN_SANITIZE := n
+
+# Disable instrumentation of KMSAN runtime with other tools.
+CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
+CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
+CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
+
+CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
new file mode 100644
index 0000000000000..5330138fda5bc
--- /dev/null
+++ b/mm/kmsan/core.c
@@ -0,0 +1,440 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN runtime library.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include <asm/page.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/kmsan_types.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/mmzone.h>
+#include <linux/percpu-defs.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include "../slab.h"
+#include "kmsan.h"
+
+bool kmsan_enabled __read_mostly;
+
+/*
+ * Per-CPU KMSAN context to be used in interrupts, where current->kmsan_ctx
+ * is unavailable.
+ */
+DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags)
+{
+ u32 extra_bits =
+ kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
+ bool checked = poison_flags & KMSAN_POISON_CHECK;
+ depot_stack_handle_t handle;
+
+ handle = kmsan_save_stack_with_flags(flags, extra_bits);
+ kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
+}
+
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
+{
+ kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
+}
+
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra)
+{
+ unsigned long entries[KMSAN_STACK_DEPTH];
+ unsigned int nr_entries;
+
+ nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
+
+ /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
+ flags &= ~__GFP_DIRECT_RECLAIM;
+
+ return __stack_depot_save(entries, nr_entries, extra, flags, true);
+}
+
+/* Copy the metadata following the memmove() behavior. */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
+{
+ depot_stack_handle_t old_origin = 0, new_origin = 0;
+ int src_slots, dst_slots, i, iter, step, skip_bits;
+ depot_stack_handle_t *origin_src, *origin_dst;
+ void *shadow_src, *shadow_dst;
+ u32 *align_shadow_src, shadow;
+ bool backwards;
+
+ shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
+ if (!shadow_dst)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
+
+ shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
+ if (!shadow_src) {
+ /*
+ * @src is untracked: zero out destination shadow, ignore the
+ * origins, we're done.
+ */
+ __memset(shadow_dst, 0, n);
+ return;
+ }
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
+
+ __memmove(shadow_dst, shadow_src, n);
+
+ origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
+ origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin_dst || !origin_src);
+ src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
+ KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
+ (dst_slots - src_slots < -1));
+
+ backwards = dst > src;
+ i = backwards ? min(src_slots, dst_slots) - 1 : 0;
+ iter = backwards ? -1 : 1;
+
+ align_shadow_src =
+ (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
+ for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
+ KMSAN_WARN_ON(i < 0);
+ shadow = align_shadow_src[i];
+ if (i == 0) {
+ /*
+ * If @src isn't aligned on KMSAN_ORIGIN_SIZE, don't
+ * look at the first @src % KMSAN_ORIGIN_SIZE bytes
+ * of the first shadow slot.
+ */
+ skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow >> skip_bits) << skip_bits;
+ }
+ if (i == src_slots - 1) {
+ /*
+ * If @src + n isn't aligned on
+ * KMSAN_ORIGIN_SIZE, don't look at the last
+ * (@src + n) % KMSAN_ORIGIN_SIZE bytes of the
+ * last shadow slot.
+ */
+ skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow << skip_bits) >> skip_bits;
+ }
+ /*
+ * Overwrite the origin only if the corresponding
+ * shadow is nonempty.
+ */
+ if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
+ old_origin = origin_src[i];
+ new_origin = kmsan_internal_chain_origin(old_origin);
+ /*
+ * kmsan_internal_chain_origin() may return
+ * NULL, but we don't want to lose the previous
+ * origin value.
+ */
+ if (!new_origin)
+ new_origin = old_origin;
+ }
+ if (shadow)
+ origin_dst[i] = new_origin;
+ else
+ origin_dst[i] = 0;
+ }
+ /*
+ * If dst_slots is greater than src_slots (i.e.
+ * dst_slots == src_slots + 1), there is an extra origin slot at the
+ * beginning or end of the destination buffer, for which we take the
+ * origin from the previous slot.
+ * This is only done if the part of the source shadow corresponding to
+ * slot is non-zero.
+ *
+ * E.g. if we copy 8 aligned bytes that are marked as uninitialized
+ * and have origins o111 and o222, to an unaligned buffer with offset 1,
+ * these two origins are copied to three origin slots, so one of them
+ * needs to be duplicated, depending on the copy direction (@backwards)
+ *
+ * src shadow: |uuuu|uuuu|....|
+ * src origin: |o111|o222|....|
+ *
+ * backwards = 0:
+ * dst shadow: |.uuu|uuuu|u...|
+ * dst origin: |....|o111|o222| - fill the empty slot with o111
+ * backwards = 1:
+ * dst shadow: |.uuu|uuuu|u...|
+ * dst origin: |o111|o222|....| - fill the empty slot with o222
+ */
+ if (src_slots < dst_slots) {
+ if (backwards) {
+ shadow = align_shadow_src[src_slots - 1];
+ skip_bits = (((u64)dst + n) % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow << skip_bits) >> skip_bits;
+ if (shadow)
+ /* src_slots > 0, therefore dst_slots is at least 2 */
+ origin_dst[dst_slots - 1] =
+ origin_dst[dst_slots - 2];
+ } else {
+ shadow = align_shadow_src[0];
+ skip_bits = ((u64)dst % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow >> skip_bits) << skip_bits;
+ if (shadow)
+ origin_dst[0] = origin_dst[1];
+ }
+ }
+}
+
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
+{
+ unsigned long entries[3];
+ u32 extra_bits;
+ int depth;
+ bool uaf;
+
+ if (!id)
+ return id;
+ /*
+ * Make sure we have enough spare bits in @id to hold the UAF bit and
+ * the chain depth.
+ */
+ BUILD_BUG_ON(
+ (1 << STACK_DEPOT_EXTRA_BITS) <= (KMSAN_MAX_ORIGIN_DEPTH << 1));
+
+ extra_bits = stack_depot_get_extra_bits(id);
+ depth = kmsan_depth_from_eb(extra_bits);
+ uaf = kmsan_uaf_from_eb(extra_bits);
+
+ /*
+ * Stop chaining origins once the depth reached KMSAN_MAX_ORIGIN_DEPTH.
+ * This mostly happens in the case structures with uninitialized padding
+ * are copied around many times. Origin chains for such structures are
+ * usually periodic, and it does not make sense to fully store them.
+ */
+ if (depth == KMSAN_MAX_ORIGIN_DEPTH)
+ return id;
+
+ depth++;
+ extra_bits = kmsan_extra_bits(depth, uaf);
+
+ entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
+ entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
+ entries[2] = id;
+ /*
+ * @entries is a local var in non-instrumented code, so KMSAN does not
+ * know it is initialized. Explicitly unpoison it to avoid false
+ * positives when __stack_depot_save() passes it to instrumented code.
+ */
+ kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
+ return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
+ GFP_ATOMIC, true);
+}
+
+void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
+ u32 origin, bool checked)
+{
+ u64 address = (u64)addr;
+ void *shadow_start;
+ u32 *origin_start;
+ size_t pad = 0;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
+ if (!shadow_start) {
+ /*
+ * kmsan_metadata_is_contiguous() is true, so either all shadow
+ * and origin pages are NULL, or all are non-NULL.
+ */
+ if (checked) {
+ pr_err("%s: not memsetting %ld bytes starting at %px, because the shadow is NULL\n",
+ __func__, size, addr);
+ KMSAN_WARN_ON(true);
+ }
+ return;
+ }
+ __memset(shadow_start, b, size);
+
+ if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ pad = address % KMSAN_ORIGIN_SIZE;
+ address -= pad;
+ size += pad;
+ }
+ size = ALIGN(size, KMSAN_ORIGIN_SIZE);
+ origin_start =
+ (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
+
+ for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
+ origin_start[i] = origin;
+}
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
+{
+ struct page *page;
+
+ if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
+ !kmsan_internal_is_module_addr(vaddr))
+ return NULL;
+ page = vmalloc_to_page(vaddr);
+ if (pfn_valid(page_to_pfn(page)))
+ return page;
+ else
+ return NULL;
+}
+
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason)
+{
+ depot_stack_handle_t cur_origin = 0, new_origin = 0;
+ unsigned long addr64 = (unsigned long)addr;
+ depot_stack_handle_t *origin = NULL;
+ unsigned char *shadow = NULL;
+ int cur_off_start = -1;
+ int chunk_size;
+ size_t pos = 0;
+
+ if (!size)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ while (pos < size) {
+ chunk_size = min(size - pos,
+ PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
+ shadow = kmsan_get_metadata((void *)(addr64 + pos),
+ KMSAN_META_SHADOW);
+ if (!shadow) {
+ /*
+ * This page is untracked. If there were uninitialized
+ * bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos - 1, user_addr,
+ reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ pos += chunk_size;
+ continue;
+ }
+ for (int i = 0; i < chunk_size; i++) {
+ if (!shadow[i]) {
+ /*
+ * This byte is unpoisoned. If there were
+ * poisoned bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ continue;
+ }
+ origin = kmsan_get_metadata((void *)(addr64 + pos + i),
+ KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin);
+ new_origin = *origin;
+ /*
+ * Encountered new origin - report the previous
+ * uninitialized range.
+ */
+ if (cur_origin != new_origin) {
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = new_origin;
+ cur_off_start = pos + i;
+ }
+ }
+ pos += chunk_size;
+ }
+ KMSAN_WARN_ON(pos != size);
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+}
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size)
+{
+ char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
+ *next_origin = NULL;
+ u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
+ depot_stack_handle_t *origin_p;
+ bool all_untracked = false;
+
+ if (!size)
+ return true;
+
+ /* The whole range belongs to the same page. */
+ if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
+ ALIGN_DOWN(cur_addr, PAGE_SIZE))
+ return true;
+
+ cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
+ if (!cur_shadow)
+ all_untracked = true;
+ cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
+ if (all_untracked && cur_origin)
+ goto report;
+
+ for (; next_addr < (u64)addr + size;
+ cur_addr = next_addr, cur_shadow = next_shadow,
+ cur_origin = next_origin, next_addr += PAGE_SIZE) {
+ next_shadow = kmsan_get_metadata((void *)next_addr, false);
+ next_origin = kmsan_get_metadata((void *)next_addr, true);
+ if (all_untracked) {
+ if (next_shadow || next_origin)
+ goto report;
+ if (!next_shadow && !next_origin)
+ continue;
+ }
+ if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
+ ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
+ continue;
+ goto report;
+ }
+ return true;
+
+report:
+ pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
+ pr_err("Access of size %ld at %px.\n", size, addr);
+ pr_err("Addresses belonging to different ranges: %px and %px\n",
+ (void *)cur_addr, (void *)next_addr);
+ pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
+ next_shadow);
+ pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
+ next_origin);
+ origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
+ if (origin_p) {
+ pr_err("Origin: %08x\n", *origin_p);
+ kmsan_print_origin(*origin_p);
+ } else {
+ pr_err("Origin: unavailable\n");
+ }
+ return false;
+}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
new file mode 100644
index 0000000000000..4ac62fa67a02a
--- /dev/null
+++ b/mm/kmsan/hooks.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN hooks for kernel subsystems.
+ *
+ * These functions handle creation of KMSAN metadata for memory allocations.
+ *
+ * Copyright (C) 2018-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include <linux/cacheflush.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "../internal.h"
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Instrumented functions shouldn't be called under
+ * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
+ * skipping effects of functions like memset() inside instrumented code.
+ */
+
+/* Functions from kmsan-checks.h follow. */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ /* The users may want to poison/unpoison random memory. */
+ kmsan_internal_poison_memory((void *)address, size, flags,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_poison_memory);
+
+void kmsan_unpoison_memory(const void *address, size_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ kmsan_enter_runtime();
+ /* The users may want to poison/unpoison random memory. */
+ kmsan_internal_unpoison_memory((void *)address, size,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_unpoison_memory);
+
+void kmsan_check_memory(const void *addr, size_t size)
+{
+ if (!kmsan_enabled)
+ return;
+ return kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+}
+EXPORT_SYMBOL(kmsan_check_memory);
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
new file mode 100644
index 0000000000000..280d154132684
--- /dev/null
+++ b/mm/kmsan/instrumentation.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN compiler API.
+ *
+ * This file implements __msan_XXX hooks that Clang inserts into the code
+ * compiled with -fsanitize=kernel-memory.
+ * See Documentation/dev-tools/kmsan.rst for more information on how KMSAN
+ * instrumentation works.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include "kmsan.h"
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
+{
+ if ((u64)addr < TASK_SIZE)
+ return true;
+ if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
+ return true;
+ return false;
+}
+
+static inline struct shadow_origin_ptr
+get_shadow_origin_ptr(void *addr, u64 size, bool store)
+{
+ unsigned long ua_flags = user_access_save();
+ struct shadow_origin_ptr ret;
+
+ ret = kmsan_get_shadow_origin_ptr(addr, size, store);
+ user_access_restore(ua_flags);
+ return ret;
+}
+
+/* Get shadow and origin pointers for a memory load with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ false);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
+
+/* Get shadow and origin pointers for a memory store with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ true);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
+
+/*
+ * Declare functions that obtain shadow/origin pointers for loads and stores
+ * with fixed size.
+ */
+#define DECLARE_METADATA_PTR_GETTER(size) \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ false); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size); \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ true); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
+
+DECLARE_METADATA_PTR_GETTER(1);
+DECLARE_METADATA_PTR_GETTER(2);
+DECLARE_METADATA_PTR_GETTER(4);
+DECLARE_METADATA_PTR_GETTER(8);
+
+/*
+ * Handle a memory store performed by inline assembly. KMSAN conservatively
+ * attempts to unpoison the outputs of asm() directives to prevent false
+ * positives caused by missed stores.
+ */
+void __msan_instrument_asm_store(void *addr, uintptr_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ /*
+ * Most of the accesses are below 32 bytes. The two exceptions so far
+ * are clwb() (64 bytes) and FPU state (512 bytes).
+ * It's unlikely that the assembly will touch more than 512 bytes.
+ */
+ if (size > 512) {
+ WARN_ONCE(1, "assembly store size too big: %ld\n", size);
+ size = 8;
+ }
+ if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
+ user_access_restore(ua_flags);
+ return;
+ }
+ kmsan_enter_runtime();
+ /* Unpoison the memory on a best-effort basis. */
+ kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_instrument_asm_store);
+
+/*
+ * KMSAN instrumentation pass replaces LLVM memcpy, memmove and memset
+ * intrinsics with calls to respective __msan_ functions. We use
+ * get_param0_metadata() and set_retval_metadata() to store the shadow/origin
+ * values for the destination argument of these functions and use them for the
+ * functions' return values.
+ */
+static inline void get_param0_metadata(u64 *shadow,
+ depot_stack_handle_t *origin)
+{
+ struct kmsan_ctx *ctx = kmsan_get_context();
+
+ *shadow = *(u64 *)(ctx->cstate.param_tls);
+ *origin = ctx->cstate.param_origin_tls[0];
+}
+
+static inline void set_retval_metadata(u64 shadow, depot_stack_handle_t origin)
+{
+ struct kmsan_ctx *ctx = kmsan_get_context();
+
+ *(u64 *)(ctx->cstate.retval_tls) = shadow;
+ ctx->cstate.retval_origin_tls = origin;
+}
+
+/* Handle llvm.memmove intrinsic. */
+void *__msan_memmove(void *dst, const void *src, uintptr_t n)
+{
+ depot_stack_handle_t origin;
+ void *result;
+ u64 shadow;
+
+ get_param0_metadata(&shadow, &origin);
+ result = __memmove(dst, src, n);
+ if (!n)
+ /* Some people call memmove() with zero length. */
+ return result;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_enter_runtime();
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+ kmsan_leave_runtime();
+
+ set_retval_metadata(shadow, origin);
+ return result;
+}
+EXPORT_SYMBOL(__msan_memmove);
+
+/* Handle llvm.memcpy intrinsic. */
+void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
+{
+ depot_stack_handle_t origin;
+ void *result;
+ u64 shadow;
+
+ get_param0_metadata(&shadow, &origin);
+ result = __memcpy(dst, src, n);
+ if (!n)
+ /* Some people call memcpy() with zero length. */
+ return result;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_enter_runtime();
+ /* Using memmove instead of memcpy doesn't affect correctness. */
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+ kmsan_leave_runtime();
+
+ set_retval_metadata(shadow, origin);
+ return result;
+}
+EXPORT_SYMBOL(__msan_memcpy);
+
+/* Handle llvm.memset intrinsic. */
+void *__msan_memset(void *dst, int c, uintptr_t n)
+{
+ depot_stack_handle_t origin;
+ void *result;
+ u64 shadow;
+
+ get_param0_metadata(&shadow, &origin);
+ result = __memset(dst, c, n);
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_enter_runtime();
+ /*
+ * Clang doesn't pass parameter metadata here, so it is impossible to
+ * use shadow of @c to set up the shadow for @dst.
+ */
+ kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
+ kmsan_leave_runtime();
+
+ set_retval_metadata(shadow, origin);
+ return result;
+}
+EXPORT_SYMBOL(__msan_memset);
+
+/*
+ * Create a new origin from an old one. This is done when storing an
+ * uninitialized value to memory. When reporting an error, KMSAN unrolls and
+ * prints the whole chain of stores that preceded the use of this value.
+ */
+depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
+{
+ depot_stack_handle_t ret = 0;
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return ret;
+
+ ua_flags = user_access_save();
+
+ /* Creating new origins may allocate memory. */
+ kmsan_enter_runtime();
+ ret = kmsan_internal_chain_origin(origin);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+ return ret;
+}
+EXPORT_SYMBOL(__msan_chain_origin);
+
+/* Poison a local variable when entering a function. */
+void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
+{
+ depot_stack_handle_t handle;
+ unsigned long entries[4];
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
+ entries[1] = (u64)descr;
+ entries[2] = (u64)__builtin_return_address(0);
+ /*
+ * With frame pointers enabled, it is possible to quickly fetch the
+ * second frame of the caller stack without calling the unwinder.
+ * Without them, simply do not bother.
+ */
+ if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
+ entries[3] = (u64)__builtin_return_address(1);
+ else
+ entries[3] = 0;
+
+ /* stack_depot_save() may allocate memory. */
+ kmsan_enter_runtime();
+ handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
+ kmsan_leave_runtime();
+
+ kmsan_internal_set_shadow_origin(address, size, -1, handle,
+ /*checked*/ true);
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_poison_alloca);
+
+/* Unpoison a local variable. */
+void __msan_unpoison_alloca(void *address, uintptr_t size)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ kmsan_enter_runtime();
+ kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_unpoison_alloca);
+
+/*
+ * Report that an uninitialized value with the given origin was used in a way
+ * that constituted undefined behavior.
+ */
+void __msan_warning(u32 origin)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_report(origin, /*address*/ 0, /*size*/ 0,
+ /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_warning);
+
+/*
+ * At the beginning of an instrumented function, obtain the pointer to
+ * `struct kmsan_context_state` holding the metadata for function parameters.
+ */
+struct kmsan_context_state *__msan_get_context_state(void)
+{
+ return &kmsan_get_context()->cstate;
+}
+EXPORT_SYMBOL(__msan_get_context_state);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
new file mode 100644
index 0000000000000..97d48b45dba58
--- /dev/null
+++ b/mm/kmsan/kmsan.h
@@ -0,0 +1,204 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Functions used by the KMSAN runtime.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#ifndef __MM_KMSAN_KMSAN_H
+#define __MM_KMSAN_KMSAN_H
+
+#include <asm/pgtable_64_types.h>
+#include <linux/irqflags.h>
+#include <linux/sched.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/nmi.h>
+#include <linux/mm.h>
+#include <linux/printk.h>
+
+#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
+#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
+
+#define KMSAN_POISON_NOCHECK 0x0
+#define KMSAN_POISON_CHECK 0x1
+#define KMSAN_POISON_FREE 0x2
+
+#define KMSAN_ORIGIN_SIZE 4
+#define KMSAN_MAX_ORIGIN_DEPTH 7
+
+#define KMSAN_STACK_DEPTH 64
+
+#define KMSAN_META_SHADOW (false)
+#define KMSAN_META_ORIGIN (true)
+
+extern bool kmsan_enabled;
+extern int panic_on_kmsan;
+
+/*
+ * KMSAN performs a lot of consistency checks that are currently enabled by
+ * default. BUG_ON is normally discouraged in the kernel, unless used for
+ * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
+ * recover if something goes wrong.
+ */
+#define KMSAN_WARN_ON(cond) \
+ ({ \
+ const bool __cond = WARN_ON(cond); \
+ if (unlikely(__cond)) { \
+ WRITE_ONCE(kmsan_enabled, false); \
+ if (panic_on_kmsan) { \
+ /* Can't call panic() here because */ \
+ /* of uaccess checks. */ \
+ BUG(); \
+ } \
+ } \
+ __cond; \
+ })
+
+/*
+ * A pair of metadata pointers to be returned by the instrumentation functions.
+ */
+struct shadow_origin_ptr {
+ void *shadow, *origin;
+};
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
+ bool store);
+void *kmsan_get_metadata(void *addr, bool is_origin);
+
+enum kmsan_bug_reason {
+ REASON_ANY,
+ REASON_COPY_TO_USER,
+ REASON_SUBMIT_URB,
+};
+
+void kmsan_print_origin(depot_stack_handle_t origin);
+
+/**
+ * kmsan_report() - Report a use of uninitialized value.
+ * @origin: Stack ID of the uninitialized value.
+ * @address: Address at which the memory access happens.
+ * @size: Memory access size.
+ * @off_first: Offset (from @address) of the first byte to be reported.
+ * @off_last: Offset (from @address) of the last byte to be reported.
+ * @user_addr: When non-NULL, denotes the userspace address to which the kernel
+ * is leaking data.
+ * @reason: Error type from enum kmsan_bug_reason.
+ *
+ * kmsan_report() prints an error message for a consecutive group of bytes
+ * sharing the same origin. If an uninitialized value is used in a comparison,
+ * this function is called once without specifying the addresses. When checking
+ * a memory range, KMSAN may call kmsan_report() multiple times with the same
+ * @address, @size, @user_addr and @reason, but different @off_first and
+ * @off_last corresponding to different @origin values.
+ */
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason);
+
+DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+static __always_inline struct kmsan_ctx *kmsan_get_context(void)
+{
+ return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
+}
+
+/*
+ * When a compiler hook or KMSAN runtime function is invoked, it may make a
+ * call to instrumented code and eventually call itself recursively. To avoid
+ * that, we guard the runtime entry regions with
+ * kmsan_enter_runtime()/kmsan_leave_runtime() and exit the hook if
+ * kmsan_in_runtime() is true.
+ *
+ * Non-runtime code may occasionally get executed in nested IRQs from the
+ * runtime code (e.g. when called via smp_call_function_single()). Because some
+ * KMSAN routines may take locks (e.g. for memory allocation), we conservatively
+ * bail out instead of calling them. To minimize the effect of this (potentially
+ * missing initialization events) kmsan_in_runtime() is not checked in
+ * non-blocking runtime functions.
+ */
+static __always_inline bool kmsan_in_runtime(void)
+{
+ if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
+ return true;
+ return kmsan_get_context()->kmsan_in_runtime;
+}
+
+static __always_inline void kmsan_enter_runtime(void)
+{
+ struct kmsan_ctx *ctx;
+
+ ctx = kmsan_get_context();
+ KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
+}
+
+static __always_inline void kmsan_leave_runtime(void)
+{
+ struct kmsan_ctx *ctx = kmsan_get_context();
+
+ KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
+}
+
+depot_stack_handle_t kmsan_save_stack(void);
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra_bits);
+
+/*
+ * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
+ * provided by the stack depot.
+ * The UAF flag is stored in the lowest bit, followed by the depth in the upper
+ * bits.
+ * set_dsh_extra_bits() is responsible for clamping the value.
+ */
+static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
+ bool uaf)
+{
+ return (depth << 1) | uaf;
+}
+
+static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
+{
+ return extra_bits & 1;
+}
+
+static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
+{
+ return extra_bits >> 1;
+}
+
+/*
+ * kmsan_internal_ functions are supposed to be very simple and not require the
+ * kmsan_in_runtime() checks.
+ */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags);
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
+void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
+ u32 origin, bool checked);
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size);
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason);
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+
+/*
+ * kmsan_internal_is_module_addr() and kmsan_internal_is_vmalloc_addr() are
+ * non-instrumented versions of is_module_address() and is_vmalloc_addr() that
+ * are safe to call from KMSAN runtime without recursion.
+ */
+static inline bool kmsan_internal_is_module_addr(void *vaddr)
+{
+ return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
+}
+
+static inline bool kmsan_internal_is_vmalloc_addr(void *addr)
+{
+ return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
+}
+
+#endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
new file mode 100644
index 0000000000000..02736ec757f2c
--- /dev/null
+++ b/mm/kmsan/report.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN error reporting routines.
+ *
+ * Copyright (C) 2019-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include <linux/console.h>
+#include <linux/moduleparam.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/uaccess.h>
+
+#include "kmsan.h"
+
+static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
+#define DESCR_SIZE 128
+/* Protected by kmsan_report_lock */
+static char report_local_descr[DESCR_SIZE];
+int panic_on_kmsan __read_mostly;
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "kmsan."
+module_param_named(panic, panic_on_kmsan, int, 0);
+
+/*
+ * Skip internal KMSAN frames.
+ */
+static int get_stack_skipnr(const unsigned long stack_entries[],
+ int num_entries)
+{
+ int len, skip;
+ char buf[64];
+
+ for (skip = 0; skip < num_entries; ++skip) {
+ len = scnprintf(buf, sizeof(buf), "%ps",
+ (void *)stack_entries[skip]);
+
+ /* Never show __msan_* or kmsan_* functions. */
+ if ((strnstr(buf, "__msan_", len) == buf) ||
+ (strnstr(buf, "kmsan_", len) == buf))
+ continue;
+
+ /*
+ * No match for runtime functions -- @skip entries to skip to
+ * get to first frame of interest.
+ */
+ break;
+ }
+
+ return skip;
+}
+
+/*
+ * Currently the descriptions of locals generated by Clang look as follows:
+ * ----local_name@function_name
+ * We want to print only the name of the local, as other information in that
+ * description can be confusing.
+ * The meaningful part of the description is copied to a global buffer to avoid
+ * allocating memory.
+ */
+static char *pretty_descr(char *descr)
+{
+ int pos = 0, len = strlen(descr);
+
+ for (int i = 0; i < len; i++) {
+ if (descr[i] == '@')
+ break;
+ if (descr[i] == '-')
+ continue;
+ report_local_descr[pos] = descr[i];
+ if (pos + 1 == DESCR_SIZE)
+ break;
+ pos++;
+ }
+ report_local_descr[pos] = 0;
+ return report_local_descr;
+}
+
+void kmsan_print_origin(depot_stack_handle_t origin)
+{
+ unsigned long *entries = NULL, *chained_entries = NULL;
+ unsigned int nr_entries, chained_nr_entries, skipnr;
+ void *pc1 = NULL, *pc2 = NULL;
+ depot_stack_handle_t head;
+ unsigned long magic;
+ char *descr = NULL;
+ unsigned int depth;
+
+ if (!origin)
+ return;
+
+ while (true) {
+ nr_entries = stack_depot_fetch(origin, &entries);
+ depth = kmsan_depth_from_eb(stack_depot_get_extra_bits(origin));
+ magic = nr_entries ? entries[0] : 0;
+ if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
+ descr = (char *)entries[1];
+ pc1 = (void *)entries[2];
+ pc2 = (void *)entries[3];
+ pr_err("Local variable %s created at:\n",
+ pretty_descr(descr));
+ if (pc1)
+ pr_err(" %pSb\n", pc1);
+ if (pc2)
+ pr_err(" %pSb\n", pc2);
+ break;
+ }
+ if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
+ /*
+ * Origin chains deeper than KMSAN_MAX_ORIGIN_DEPTH are
+ * not stored, so the output may be incomplete.
+ */
+ if (depth == KMSAN_MAX_ORIGIN_DEPTH)
+ pr_err("<Zero or more stacks not recorded to save memory>\n\n");
+ head = entries[1];
+ origin = entries[2];
+ pr_err("Uninit was stored to memory at:\n");
+ chained_nr_entries =
+ stack_depot_fetch(head, &chained_entries);
+ kmsan_internal_unpoison_memory(
+ chained_entries,
+ chained_nr_entries * sizeof(*chained_entries),
+ /*checked*/ false);
+ skipnr = get_stack_skipnr(chained_entries,
+ chained_nr_entries);
+ stack_trace_print(chained_entries + skipnr,
+ chained_nr_entries - skipnr, 0);
+ pr_err("\n");
+ continue;
+ }
+ pr_err("Uninit was created at:\n");
+ if (nr_entries) {
+ skipnr = get_stack_skipnr(entries, nr_entries);
+ stack_trace_print(entries + skipnr, nr_entries - skipnr,
+ 0);
+ } else {
+ pr_err("(stack is not available)\n");
+ }
+ break;
+ }
+}
+
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason)
+{
+ unsigned long stack_entries[KMSAN_STACK_DEPTH];
+ int num_stack_entries, skipnr;
+ char *bug_type = NULL;
+ unsigned long ua_flags;
+ bool is_uaf;
+
+ if (!kmsan_enabled)
+ return;
+ if (!current->kmsan_ctx.allow_reporting)
+ return;
+ if (!origin)
+ return;
+
+ current->kmsan_ctx.allow_reporting = false;
+ ua_flags = user_access_save();
+ raw_spin_lock(&kmsan_report_lock);
+ pr_err("=====================================================\n");
+ is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
+ switch (reason) {
+ case REASON_ANY:
+ bug_type = is_uaf ? "use-after-free" : "uninit-value";
+ break;
+ case REASON_COPY_TO_USER:
+ bug_type = is_uaf ? "kernel-infoleak-after-free" :
+ "kernel-infoleak";
+ break;
+ case REASON_SUBMIT_URB:
+ bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
+ "kernel-usb-infoleak";
+ break;
+ }
+
+ num_stack_entries =
+ stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
+ skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+
+ pr_err("BUG: KMSAN: %s in %pSb\n", bug_type,
+ (void *)stack_entries[skipnr]);
+ stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+ 0);
+ pr_err("\n");
+
+ kmsan_print_origin(origin);
+
+ if (size) {
+ pr_err("\n");
+ if (off_first == off_last)
+ pr_err("Byte %d of %d is uninitialized\n", off_first,
+ size);
+ else
+ pr_err("Bytes %d-%d of %d are uninitialized\n",
+ off_first, off_last, size);
+ }
+ if (address)
+ pr_err("Memory access of size %d starts at %px\n", size,
+ address);
+ if (user_addr && reason == REASON_COPY_TO_USER)
+ pr_err("Data copied to user address %px\n", user_addr);
+ pr_err("\n");
+ dump_stack_print_info(KERN_ERR);
+ pr_err("=====================================================\n");
+ add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+ raw_spin_unlock(&kmsan_report_lock);
+ if (panic_on_kmsan)
+ panic("kmsan.panic set ...\n");
+ user_access_restore(ua_flags);
+ current->kmsan_ctx.allow_reporting = true;
+}
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
new file mode 100644
index 0000000000000..acc5279acc3be
--- /dev/null
+++ b/mm/kmsan/shadow.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN shadow implementation.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include <asm/kmsan.h>
+#include <asm/tlbflush.h>
+#include <linux/cacheflush.h>
+#include <linux/memblock.h>
+#include <linux/mm_types.h>
+#include <linux/percpu-defs.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stddef.h>
+
+#include "../internal.h"
+#include "kmsan.h"
+
+#define shadow_page_for(page) ((page)->kmsan_shadow)
+
+#define origin_page_for(page) ((page)->kmsan_origin)
+
+static void *shadow_ptr_for(struct page *page)
+{
+ return page_address(shadow_page_for(page));
+}
+
+static void *origin_ptr_for(struct page *page)
+{
+ return page_address(origin_page_for(page));
+}
+
+static bool page_has_metadata(struct page *page)
+{
+ return shadow_page_for(page) && origin_page_for(page);
+}
+
+static void set_no_shadow_origin_page(struct page *page)
+{
+ shadow_page_for(page) = NULL;
+ origin_page_for(page) = NULL;
+}
+
+/*
+ * Dummy load and store pages to be used when the real metadata is unavailable.
+ * There are separate pages for loads and stores, so that every load returns a
+ * zero, and every store doesn't affect other loads.
+ */
+static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+static unsigned long vmalloc_meta(void *addr, bool is_origin)
+{
+ unsigned long addr64 = (unsigned long)addr, off;
+
+ KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
+ if (kmsan_internal_is_vmalloc_addr(addr)) {
+ off = addr64 - VMALLOC_START;
+ return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
+ KMSAN_VMALLOC_SHADOW_START);
+ }
+ if (kmsan_internal_is_module_addr(addr)) {
+ off = addr64 - MODULES_VADDR;
+ return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
+ KMSAN_MODULES_SHADOW_START);
+ }
+ return 0;
+}
+
+static struct page *virt_to_page_or_null(void *vaddr)
+{
+ if (kmsan_virt_addr_valid(vaddr))
+ return virt_to_page(vaddr);
+ else
+ return NULL;
+}
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
+ bool store)
+{
+ struct shadow_origin_ptr ret;
+ void *shadow;
+
+ /*
+ * Even if we redirect this memory access to the dummy page, it will
+ * go out of bounds.
+ */
+ KMSAN_WARN_ON(size > PAGE_SIZE);
+
+ if (!kmsan_enabled)
+ goto return_dummy;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
+ shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
+ if (!shadow)
+ goto return_dummy;
+
+ ret.shadow = shadow;
+ ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
+ return ret;
+
+return_dummy:
+ if (store) {
+ /* Ignore this store. */
+ ret.shadow = dummy_store_page;
+ ret.origin = dummy_store_page;
+ } else {
+ /* This load will return zero. */
+ ret.shadow = dummy_load_page;
+ ret.origin = dummy_load_page;
+ }
+ return ret;
+}
+
+/*
+ * Obtain the shadow or origin pointer for the given address, or NULL if there's
+ * none. The caller must check the return value for being non-NULL if needed.
+ * The return value of this function should not depend on whether we're in the
+ * runtime or not.
+ */
+void *kmsan_get_metadata(void *address, bool is_origin)
+{
+ u64 addr = (u64)address, pad, off;
+ struct page *page;
+
+ if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
+ pad = addr % KMSAN_ORIGIN_SIZE;
+ addr -= pad;
+ }
+ address = (void *)addr;
+ if (kmsan_internal_is_vmalloc_addr(address) ||
+ kmsan_internal_is_module_addr(address))
+ return (void *)vmalloc_meta(address, is_origin);
+
+ page = virt_to_page_or_null(address);
+ if (!page)
+ return NULL;
+ if (!page_has_metadata(page))
+ return NULL;
+ off = addr % PAGE_SIZE;
+
+ return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
+}
diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
new file mode 100644
index 0000000000000..b5b0aa61322ec
--- /dev/null
+++ b/scripts/Makefile.kmsan
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+kmsan-cflags := -fsanitize=kernel-memory
+
+ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+kmsan-cflags += -fsanitize-memory-param-retval
+endif
+
+export CFLAGS_KMSAN := $(kmsan-cflags)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3fb6a99e78c47..ac32429e93b73 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -157,6 +157,15 @@ _c_flags += $(if $(patsubst n%,, \
endif
endif

+ifeq ($(CONFIG_KMSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
+ $(CFLAGS_KMSAN))
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
+ , -mllvm -msan-disable-checks=1)
+endif
+
ifeq ($(CONFIG_UBSAN),y)
_c_flags += $(if $(patsubst n%,, \
$(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:12 AM
The EFI stub cannot be linked with the KMSAN runtime, so we disable
instrumentation for it.

Instrumenting kcov, stackdepot, or lockdep leads to infinite recursion:
the instrumentation hooks end up calling instrumented code again.
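
A rough sketch of the loop this avoids, pieced together from names used
elsewhere in this series (an illustration of the idea, not an exact call
chain):

  instrumented code in lib/stackdepot.c
    -> compiler-inserted KMSAN hook (e.g. __msan_poison_alloca())
       -> runtime needs to record an origin stack trace
          -> stack_depot_save() runs more instrumented code
             -> the KMSAN hook fires again -> ...

Building these files with KMSAN_SANITIZE := n (as done below) breaks the
cycle at the first step.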

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>
---
v4:
-- This patch was previously part of "kmsan: disable KMSAN
instrumentation for certain kernel parts", but was split away per
Mark Rutland's request.

v5:
-- remove unnecessary comment belonging to another patch

Link: https://linux-review.googlesource.com/id/I41ae706bd3474f074f6a870bfc3f0f90e9c720f7
---
drivers/firmware/efi/libstub/Makefile | 1 +
kernel/Makefile | 1 +
kernel/locking/Makefile | 3 ++-
lib/Makefile | 3 +++
4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 2c67f71f23753..2c1eb1fb0f226 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -53,6 +53,7 @@ GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

diff --git a/kernel/Makefile b/kernel/Makefile
index 318789c728d32..d754e0be1176d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -38,6 +38,7 @@ KCOV_INSTRUMENT_kcov.o := n
KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
UBSAN_SANITIZE_kcov.o := n
+KMSAN_SANITIZE_kcov.o := n
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector

# Don't instrument error handlers
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f382..ea925731fa40f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,8 +5,9 @@ KCOV_INSTRUMENT := n

obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o

-# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
KCSAN_SANITIZE_lockdep.o := n
+KMSAN_SANITIZE_lockdep.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/lib/Makefile b/lib/Makefile
index ffabc30a27d4e..fcebece0f5b6f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -275,6 +275,9 @@ obj-$(CONFIG_POLYNOMIAL) += polynomial.o
CFLAGS_stackdepot.o += -fno-builtin
obj-$(CONFIG_STACKDEPOT) += stackdepot.o
KASAN_SANITIZE_stackdepot.o := n
+# In particular, instrumenting stackdepot.c with KMSAN will result in infinite
+# recursion.
+KMSAN_SANITIZE_stackdepot.o := n
KCOV_INSTRUMENT_stackdepot.o := n

obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:15 AM
Add entry for KMSAN maintainers/reviewers.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---

v5:
-- add arch/*/include/asm/kmsan.h

Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
---
MAINTAINERS | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 936490dcc97b6..517e71ea02156 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11373,6 +11373,19 @@ F: kernel/kmod.c
F: lib/test_kmod.c
F: tools/testing/selftests/kmod/

+KMSAN
+M: Alexander Potapenko <gli...@google.com>
+R: Marco Elver <el...@google.com>
+R: Dmitry Vyukov <dvy...@google.com>
+L: kasa...@googlegroups.com
+S: Maintained
+F: Documentation/dev-tools/kmsan.rst
+F: arch/*/include/asm/kmsan.h
+F: include/linux/kmsan*.h
+F: lib/Kconfig.kmsan
+F: mm/kmsan/
+F: scripts/Makefile.kmsan
+
KPROBES
M: Naveen N. Rao <naveen...@linux.ibm.com>
M: Anil S Keshavamurthy <anil.s.kes...@intel.com>
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:18 AM
Insert KMSAN hooks that make the necessary bookkeeping changes:
- poison page shadow and origins in alloc_pages()/free_page();
- clear page shadow and origins in clear_page(), copy_user_highpage();
- copy page metadata in copy_highpage(), wp_page_copy();
- handle vmap()/vunmap()/iounmap();
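
To illustrate the copy_highpage() hook from the list above, here is a
minimal sketch (assuming a CONFIG_KMSAN=y build; kmsan_demo_page_copy() is
an invented name used only for this example):

  #include <linux/gfp.h>
  #include <linux/highmem.h>

  /* Sketch only: metadata follows page contents on copy. */
  static void kmsan_demo_page_copy(void)
  {
          /* Poisoned by kmsan_alloc_page(): no __GFP_ZERO. */
          struct page *src = alloc_page(GFP_KERNEL);
          /* Zeroed, so its shadow starts out fully initialized. */
          struct page *dst = alloc_page(GFP_KERNEL | __GFP_ZERO);

          if (!src || !dst)
                  goto out;
          copy_highpage(dst, src);
          /*
           * kmsan_copy_page_meta(dst, src) copied the shadow of @src, so
           * the contents of @dst are considered uninitialized again;
           * using them in a condition would now produce a report.
           */
  out:
          if (src)
                  __free_page(src);
          if (dst)
                  __free_page(dst);
  }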

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- move page metadata hooks implementation here
-- remove call to kmsan_memblock_free_pages()

v3:
-- use PAGE_SHIFT in kmsan_ioremap_page_range()

v4:
-- change sizeof(type) to sizeof(*ptr)
-- replace occurrences of |var| with @var
-- swap mm: and kmsan: in the subject
-- drop __no_sanitize_memory from clear_page()

v5:
-- do not export KMSAN hooks that are not called from modules
-- use modern style for-loops
-- simplify clear_page() instrumentation as suggested by Marco Elver
-- move forward declaration of `struct page` in kmsan.h to this patch

v6:
-- <linux/kmsan.h> doesn't exist prior to this patch

Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
---
arch/x86/include/asm/page_64.h | 7 ++
arch/x86/mm/ioremap.c | 3 +
include/linux/highmem.h | 3 +
include/linux/kmsan.h | 145 +++++++++++++++++++++++++++++++++
mm/internal.h | 6 ++
mm/kmsan/hooks.c | 86 +++++++++++++++++++
mm/kmsan/shadow.c | 113 +++++++++++++++++++++++++
mm/memory.c | 2 +
mm/page_alloc.c | 11 +++
mm/vmalloc.c | 20 ++++-
10 files changed, 394 insertions(+), 2 deletions(-)
create mode 100644 include/linux/kmsan.h

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index baa70451b8df5..198e03e59ca19 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -8,6 +8,8 @@
#include <asm/cpufeatures.h>
#include <asm/alternative.h>

+#include <linux/kmsan-checks.h>
+
/* duplicated to the one in bootmem.h */
extern unsigned long max_pfn;
extern unsigned long phys_base;
@@ -47,6 +49,11 @@ void clear_page_erms(void *page);

static inline void clear_page(void *page)
{
+ /*
+ * Clean up KMSAN metadata for the page being cleared. The assembly call
+ * below clobbers @page, so we perform unpoisoning before it.
+ */
+ kmsan_unpoison_memory(page, PAGE_SIZE);
alternative_call_2(clear_page_orig,
clear_page_rep, X86_FEATURE_REP_GOOD,
clear_page_erms, X86_FEATURE_ERMS,
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 1ad0228f8ceb9..78c5bc654cff5 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
#include <linux/cc_platform.h>
#include <linux/efi.h>
#include <linux/pgtable.h>
+#include <linux/kmsan.h>

#include <asm/set_memory.h>
#include <asm/e820/api.h>
@@ -479,6 +480,8 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ kmsan_iounmap_page_range((unsigned long)addr,
+ (unsigned long)addr + get_vm_area_size(p));
memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));

/* Finally remove it */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 25679035ca283..e9912da5441b4 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/bug.h>
#include <linux/cacheflush.h>
+#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <linux/hardirq.h>
@@ -311,6 +312,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_user_page(vto, vfrom, vaddr, to);
+ kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
kunmap_local(vto);
kunmap_local(vfrom);
}
@@ -326,6 +328,7 @@ static inline void copy_highpage(struct page *to, struct page *from)
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_page(vto, vfrom);
+ kmsan_copy_page_meta(to, from);
kunmap_local(vto);
kunmap_local(vfrom);
}
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
new file mode 100644
index 0000000000000..b36bf3db835ee
--- /dev/null
+++ b/include/linux/kmsan.h
@@ -0,0 +1,145 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN API for subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+#ifndef _LINUX_KMSAN_H
+#define _LINUX_KMSAN_H
+
+#include <linux/gfp.h>
+#include <linux/kmsan-checks.h>
+#include <linux/types.h>
+
+struct page;
+
+#ifdef CONFIG_KMSAN
+
+/**
+ * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
+ * @page: struct page pointer returned by alloc_pages().
+ * @order: order of allocated struct page.
+ * @flags: GFP flags used by alloc_pages()
+ *
+ * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
+ * @flags contain __GFP_ZERO.
+ */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
+
+/**
+ * kmsan_free_page() - Notify KMSAN about a free_pages() call.
+ * @page: struct page pointer passed to free_pages().
+ * @order: order of deallocated struct page.
+ *
+ * KMSAN marks freed memory as uninitialized.
+ */
+void kmsan_free_page(struct page *page, unsigned int order);
+
+/**
+ * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
+ * @dst: destination page.
+ * @src: source page.
+ *
+ * KMSAN copies the contents of metadata pages for @src into the metadata pages
+ * for @dst. If @dst has no associated metadata pages, nothing happens.
+ * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
+ */
+void kmsan_copy_page_meta(struct page *dst, struct page *src);
+
+/**
+ * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
+ * @start: start of vmapped range.
+ * @end: end of vmapped range.
+ * @prot: page protection flags used for vmap.
+ * @pages: array of pages.
+ * @page_shift: page_shift passed to vmap_range_noflush().
+ *
+ * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
+ * vmalloc metadata address range.
+ */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
+/**
+ * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap.
+ * @start: start of vunmapped range.
+ * @end: end of vunmapped range.
+ *
+ * KMSAN unmaps the contiguous metadata ranges created by
+ * kmsan_map_kernel_range_noflush().
+ */
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call.
+ * @addr: range start.
+ * @end: range end.
+ * @phys_addr: physical range start.
+ * @prot: page protection flags used for ioremap_page_range().
+ * @page_shift: page_shift argument passed to vmap_range_noflush().
+ *
+ * KMSAN creates new metadata pages for the physical pages mapped into the
+ * virtual memory.
+ */
+void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift);
+
+/**
+ * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call.
+ * @start: range start.
+ * @end: range end.
+ *
+ * KMSAN unmaps the metadata pages for the given range and, unlike for
+ * vunmap_page_range(), also deallocates them.
+ */
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
+
+#else
+
+static inline int kmsan_alloc_page(struct page *page, unsigned int order,
+ gfp_t flags)
+{
+ return 0;
+}
+
+static inline void kmsan_free_page(struct page *page, unsigned int order)
+{
+}
+
+static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+}
+
+static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
+ unsigned long end,
+ pgprot_t prot,
+ struct page **pages,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_vunmap_range_noflush(unsigned long start,
+ unsigned long end)
+{
+}
+
+static inline void kmsan_ioremap_page_range(unsigned long start,
+ unsigned long end,
+ phys_addr_t phys_addr,
+ pgprot_t prot,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_iounmap_page_range(unsigned long start,
+ unsigned long end)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_H */
diff --git a/mm/internal.h b/mm/internal.h
index 785409805ed79..fd7247a2367ed 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -847,8 +847,14 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
}
#endif

+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
void vunmap_range_noflush(unsigned long start, unsigned long end);

+void __vunmap_range_noflush(unsigned long start, unsigned long end);
+
int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 4ac62fa67a02a..040111bb9f6a3 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -11,6 +11,7 @@

#include <linux/cacheflush.h>
#include <linux/gfp.h>
+#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/mm_types.h>
#include <linux/slab.h>
@@ -26,6 +27,91 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+static unsigned long vmalloc_shadow(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_SHADOW);
+}
+
+static unsigned long vmalloc_origin(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_ORIGIN);
+}
+
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ __vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
+ __vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+}
+
+/*
+ * This function creates new shadow/origin pages for the physical pages mapped
+ * into the virtual memory. If those physical pages already had shadow/origin,
+ * those are ignored.
+ */
+void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift)
+{
+ gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+ struct page *shadow, *origin;
+ unsigned long off = 0;
+ int nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ for (int i = 0; i < nr; i++, off += PAGE_SIZE) {
+ shadow = alloc_pages(gfp_mask, 1);
+ origin = alloc_pages(gfp_mask, 1);
+ __vmap_pages_range_noflush(
+ vmalloc_shadow(start + off),
+ vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
+ PAGE_SHIFT);
+ __vmap_pages_range_noflush(
+ vmalloc_origin(start + off),
+ vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
+ PAGE_SHIFT);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
+{
+ unsigned long v_shadow, v_origin;
+ struct page *shadow, *origin;
+ int nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ v_shadow = (unsigned long)vmalloc_shadow(start);
+ v_origin = (unsigned long)vmalloc_origin(start);
+ for (int i = 0; i < nr;
+ i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
+ shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
+ origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
+ __vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
+ __vunmap_range_noflush(v_origin, vmalloc_origin(end));
+ if (shadow)
+ __free_pages(shadow, 1);
+ if (origin)
+ __free_pages(origin, 1);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index acc5279acc3be..8c81a059beea6 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -145,3 +145,116 @@ void *kmsan_get_metadata(void *address, bool is_origin)

return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
}
+
+void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ if (!dst || !page_has_metadata(dst))
+ return;
+ if (!src || !page_has_metadata(src)) {
+ kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
+ /*checked*/ false);
+ return;
+ }
+
+ kmsan_enter_runtime();
+ __memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
+ __memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
+{
+ bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
+ struct page *shadow, *origin;
+ depot_stack_handle_t handle;
+ int pages = 1 << order;
+
+ if (!page)
+ return;
+
+ shadow = shadow_page_for(page);
+ origin = origin_page_for(page);
+
+ if (initialized) {
+ __memset(page_address(shadow), 0, PAGE_SIZE * pages);
+ __memset(page_address(origin), 0, PAGE_SIZE * pages);
+ return;
+ }
+
+ /* Zero pages allocated by the runtime should also be initialized. */
+ if (kmsan_in_runtime())
+ return;
+
+ __memset(page_address(shadow), -1, PAGE_SIZE * pages);
+ kmsan_enter_runtime();
+ handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
+ kmsan_leave_runtime();
+ /*
+ * Addresses are page-aligned, pages are contiguous, so it's ok
+ * to just fill the origin pages with @handle.
+ */
+ for (int i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
+ ((depot_stack_handle_t *)page_address(origin))[i] = handle;
+}
+
+void kmsan_free_page(struct page *page, unsigned int order)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(page_address(page),
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift)
+{
+ unsigned long shadow_start, origin_start, shadow_end, origin_end;
+ struct page **s_pages, **o_pages;
+ int nr, mapped;
+
+ if (!kmsan_enabled)
+ return;
+
+ shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
+ shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
+ if (!shadow_start)
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ s_pages = kcalloc(nr, sizeof(*s_pages), GFP_KERNEL);
+ o_pages = kcalloc(nr, sizeof(*o_pages), GFP_KERNEL);
+ if (!s_pages || !o_pages)
+ goto ret;
+ for (int i = 0; i < nr; i++) {
+ s_pages[i] = shadow_page_for(pages[i]);
+ o_pages[i] = origin_page_for(pages[i]);
+ }
+ prot = __pgprot(pgprot_val(prot) | _PAGE_NX);
+ prot = PAGE_KERNEL;
+
+ origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
+ origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
+ kmsan_enter_runtime();
+ mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
+ s_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
+ o_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ kmsan_leave_runtime();
+ flush_tlb_kernel_range(shadow_start, shadow_end);
+ flush_tlb_kernel_range(origin_start, origin_end);
+ flush_cache_vmap(shadow_start, shadow_end);
+ flush_cache_vmap(origin_start, origin_end);
+
+ret:
+ kfree(s_pages);
+ kfree(o_pages);
+}
diff --git a/mm/memory.c b/mm/memory.c
index 4ba73f5aa8bb7..6cc35d2cae8fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -52,6 +52,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/memremap.h>
+#include <linux/kmsan.h>
#include <linux/ksm.h>
#include <linux/rmap.h>
#include <linux/export.h>
@@ -3128,6 +3129,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
delayacct_wpcopy_end();
return 0;
}
+ kmsan_copy_page_meta(new_page, old_page);
}

if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e5486d47406e8..d488dab76a6e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -27,6 +27,7 @@
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/module.h>
#include <linux/suspend.h>
#include <linux/pagevec.h>
@@ -1398,6 +1399,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
VM_BUG_ON_PAGE(PageTail(page), page);

trace_mm_page_free(page, order);
+ kmsan_free_page(page, order);

if (unlikely(PageHWPoison(page)) && !order) {
/*
@@ -3817,6 +3819,14 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
/*
* Allocate a page from the given zone. Use pcplists for order-0 allocations.
*/
+
+/*
+ * Do not instrument rmqueue() with KMSAN. This function may call
+ * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
+ * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
+ * may call rmqueue() again, which will result in a deadlock.
+ */
+__no_sanitize_memory
static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
@@ -5535,6 +5545,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
}

trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ kmsan_alloc_page(page, order, alloc_gfp);

return page;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd6cdb2011953..68b656e0125c9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -320,6 +320,9 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
+ if (!err)
+ kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+ ioremap_max_page_shift);
return err;
}

@@ -416,7 +419,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-void vunmap_range_noflush(unsigned long start, unsigned long end)
+void __vunmap_range_noflush(unsigned long start, unsigned long end)
{
unsigned long next;
pgd_t *pgd;
@@ -438,6 +441,12 @@ void vunmap_range_noflush(unsigned long start, unsigned long end)
arch_sync_kernel_mappings(start, end);
}

+void vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ kmsan_vunmap_range_noflush(start, end);
+ __vunmap_range_noflush(start, end);
+}
+
/**
* vunmap_range - unmap kernel virtual addresses
* @addr: start of the VM area to unmap
@@ -575,7 +584,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
@@ -601,6 +610,13 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
return 0;
}

+int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int page_shift)
+{
+ kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+ return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+}
+
/**
* vmap_pages_range - map pages to a kernel virtual address
* @addr: start of the VM area to map
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:22 AM
In order to report uninitialized memory coming from heap allocations,
KMSAN has to poison them unless they're created with __GFP_ZERO.

Conveniently, the KMSAN hooks can be added in the same places where
init_on_alloc/init_on_free initialization is performed.

In addition, we apply __no_kmsan_checks to get_freepointer_safe() to
suppress reports when accessing freelist pointers that reside in freed
objects.
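
As a rough illustration of what these hooks enable (a sketch assuming a
CONFIG_KMSAN=y build; kmsan_uninit_demo() is an invented name): an object
allocated without __GFP_ZERO is poisoned by kmsan_slab_alloc(), so
branching on its contents triggers a report.

  #include <linux/slab.h>
  #include <linux/printk.h>

  static void kmsan_uninit_demo(void)
  {
          int *val = kmalloc(sizeof(*val), GFP_KERNEL);

          if (!val)
                  return;
          if (*val % 2)   /* use of an uninitialized value */
                  pr_info("odd\n");
          kfree(val);
  }

The resulting report (printed by kmsan_report()) begins with a line roughly
of the form "BUG: KMSAN: uninit-value in kmsan_uninit_demo", followed by
the current call stack and an "Uninit was created at:" stack pointing back
at the kmalloc() call.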

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v2:
-- move the implementation of SLUB hooks here

v4:
-- change sizeof(type) to sizeof(*ptr)
-- swap mm: and kmsan: in the subject
-- get rid of kmsan_init(), replace it with __no_kmsan_checks

v5:
-- do not export KMSAN hooks that are not called from modules
-- drop an unnecessary whitespace change

Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
---
include/linux/kmsan.h | 57 ++++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 76 +++++++++++++++++++++++++++++++++++++++++++
mm/slab.h | 1 +
mm/slub.c | 17 ++++++++++
4 files changed, 151 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index b36bf3db835ee..5c4e0079054e6 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -14,6 +14,7 @@
#include <linux/types.h>

struct page;
+struct kmem_cache;

#ifdef CONFIG_KMSAN

@@ -48,6 +49,44 @@ void kmsan_free_page(struct page *page, unsigned int order);
*/
void kmsan_copy_page_meta(struct page *dst, struct page *src);

+/**
+ * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
+ * newly created object, marking it as initialized or uninitialized.
+ */
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
+
+/**
+ * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ *
+ * KMSAN marks the freed object as uninitialized.
+ */
+void kmsan_slab_free(struct kmem_cache *s, void *object);
+
+/**
+ * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
+ * @ptr: object pointer.
+ * @size: object size.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Similar to kmsan_slab_alloc(), but for large allocations.
+ */
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+
+/**
+ * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
+ * @ptr: object pointer.
+ *
+ * Similar to kmsan_slab_free(), but for large allocations.
+ */
+void kmsan_kfree_large(const void *ptr);
+
/**
* kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
* @start: start of vmapped range.
@@ -114,6 +153,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
{
}

+static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+}
+
+static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_kfree_large(const void *ptr)
+{
+}
+
static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
unsigned long end,
pgprot_t prot,
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 040111bb9f6a3..000703c563a4d 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -27,6 +27,82 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
+{
+ if (unlikely(object == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * There's a ctor or this is an RCU cache - do nothing. The memory
+ * status hasn't changed since last use.
+ */
+ if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
+ return;
+
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory(object, s->object_size,
+ KMSAN_POISON_CHECK);
+ else
+ kmsan_internal_poison_memory(object, s->object_size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+
+void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ /* RCU slabs could be legally used after free within the RCU period */
+ if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+ return;
+ /*
+ * If there's a constructor, freed memory must remain in the same state
+ * until the next allocation. We cannot save its state to detect
+ * use-after-free bugs, instead we just keep it unpoisoned.
+ */
+ if (s->ctor)
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+{
+ if (unlikely(ptr == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory((void *)ptr, size,
+ /*checked*/ true);
+ else
+ kmsan_internal_poison_memory((void *)ptr, size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+
+void kmsan_kfree_large(const void *ptr)
+{
+ struct page *page;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ page = virt_to_head_page((void *)ptr);
+ KMSAN_WARN_ON(ptr != page_address(page));
+ kmsan_internal_poison_memory((void *)ptr,
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+
static unsigned long vmalloc_shadow(unsigned long addr)
{
return (unsigned long)kmsan_get_metadata((void *)addr,
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ecd..9d0afd2985df7 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -729,6 +729,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
memset(p[i], 0, s->object_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
+ kmsan_slab_alloc(s, p[i], flags);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f52..2c323d83d0526 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -22,6 +22,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
@@ -359,6 +360,17 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
prefetchw(object + s->offset);
}

+/*
+ * When running under KMSAN, get_freepointer_safe() may return an uninitialized
+ * pointer value in the case the current thread loses the race for the next
+ * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
+ * slab_alloc_node() will fail, so the uninitialized value won't be used, but
+ * KMSAN will still check all arguments of cmpxchg because of imperfect
+ * handling of inline assembly.
+ * To work around this problem, we apply __no_kmsan_checks to ensure that
+ * get_freepointer_safe() returns initialized memory.
+ */
+__no_kmsan_checks
static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
unsigned long freepointer_addr;
@@ -1709,6 +1721,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
ptr = kasan_kmalloc_large(ptr, size, flags);
/* As ptr might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ptr, size, 1, flags);
+ kmsan_kmalloc_large(ptr, size, flags);
return ptr;
}

@@ -1716,12 +1729,14 @@ static __always_inline void kfree_hook(void *x)
{
kmemleak_free(x);
kasan_kfree_large(x);
+ kmsan_kfree_large(x);
}

static __always_inline bool slab_free_hook(struct kmem_cache *s,
void *x, bool init)
{
kmemleak_free_recursive(x, s->flags);
+ kmsan_slab_free(s, x);

debug_check_no_locks_freed(x, s->object_size);

@@ -5915,6 +5930,7 @@ static char *create_unique_id(struct kmem_cache *s)
p += sprintf(p, "%07u", s->size);

BUG_ON(p > name + ID_STR_LENGTH - 1);
+ kmsan_unpoison_memory(name, p - name);
return name;
}

@@ -6016,6 +6032,7 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
al->name = name;
al->next = alias_list;
alias_list = al;
+ kmsan_unpoison_memory(al, sizeof(*al));
return 0;
}

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:25 AM
Tell KMSAN that a new task is being created, so that the tool can create a
backing metadata structure for that task.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- move implementation of kmsan_task_create() and kmsan_task_exit() here

v4:
-- change sizeof(type) to sizeof(*ptr)

v5:
-- do not export KMSAN hooks that are not called from modules
-- minor comment fix

Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
---
include/linux/kmsan.h | 21 +++++++++++++++++++++
kernel/exit.c | 2 ++
kernel/fork.c | 2 ++
mm/kmsan/core.c | 10 ++++++++++
mm/kmsan/hooks.c | 17 +++++++++++++++++
mm/kmsan/kmsan.h | 2 ++
6 files changed, 54 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 5c4e0079054e6..354aee6f7b1a2 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -15,9 +15,22 @@

struct page;
struct kmem_cache;
+struct task_struct;

#ifdef CONFIG_KMSAN

+/**
+ * kmsan_task_create() - Initialize KMSAN state for the task.
+ * @task: task to initialize.
+ */
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -139,6 +152,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_task_create(struct task_struct *task)
+{
+}
+
+static inline void kmsan_task_exit(struct task_struct *task)
+{
+}
+
static inline int kmsan_alloc_page(struct page *page, unsigned int order,
gfp_t flags)
{
diff --git a/kernel/exit.c b/kernel/exit.c
index 84021b24f79e3..f5d620c315662 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -60,6 +60,7 @@
#include <linux/writeback.h>
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/kmsan.h>
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
@@ -741,6 +742,7 @@ void __noreturn do_exit(long code)
WARN_ON(tsk->plug);

kcov_task_exit(tsk);
+ kmsan_task_exit(tsk);

coredump_task_exit(tsk);
ptrace_event(PTRACE_EVENT_EXIT, code);
diff --git a/kernel/fork.c b/kernel/fork.c
index 8a9e92068b150..a438f5ee3aed5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
#include <linux/fdtable.h>
#include <linux/iocontext.h>
#include <linux/key.h>
+#include <linux/kmsan.h>
#include <linux/binfmts.h>
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
@@ -1026,6 +1027,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->worker_private = NULL;

kcov_task_init(tsk);
+ kmsan_task_create(tsk);
kmap_local_fork(tsk);

#ifdef CONFIG_FAULT_INJECTION
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 5330138fda5bc..112dce135c7f6 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -37,6 +37,16 @@ bool kmsan_enabled __read_mostly;
*/
DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

+void kmsan_internal_task_create(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+ struct thread_info *info = current_thread_info();
+
+ __memset(ctx, 0, sizeof(*ctx));
+ ctx->allow_reporting = true;
+ kmsan_internal_unpoison_memory(info, sizeof(*info), false);
+}
+
void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
unsigned int poison_flags)
{
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 000703c563a4d..6f3e64b0b61f8 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -27,6 +27,23 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_task_create(struct task_struct *task)
+{
+ kmsan_enter_runtime();
+ kmsan_internal_task_create(task);
+ kmsan_leave_runtime();
+}
+
+void kmsan_task_exit(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ctx->allow_reporting = false;
+}
+
void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
{
if (unlikely(object == NULL))
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index 97d48b45dba58..77ee068c04ae9 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -180,6 +180,8 @@ void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
u32 origin, bool checked);
depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);

+void kmsan_internal_task_create(struct task_struct *task);
+
bool kmsan_metadata_is_contiguous(void *addr, size_t size);
void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
int reason);
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:27 AM
kmsan_init_shadow() scans the mappings created at boot time and creates
metadata pages for those mappings.

When the memblock allocator returns pages to pagealloc, we reserve 2/3
of those pages and use them as metadata for the remaining 1/3. Once KMSAN
starts, every page allocated by pagealloc has its associated shadow and
origin pages.

kmsan_init_runtime() initializes the bookkeeping for init_task and enables
KMSAN.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- move mm/kmsan/init.c and kmsan_memblock_free_pages() to this patch
-- print a warning that KMSAN is a debugging tool (per Greg K-H's
request)

v4:
-- change sizeof(type) to sizeof(*ptr)
-- replace occurrences of |var| with @var
-- swap init: and kmsan: in the subject

v5:
-- address Marco Elver's comments
-- don't export initialization routines
-- use modern style for-loops
-- better name for struct page_pair
-- delete duplicate function prototypes

Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
---
include/linux/kmsan.h | 36 +++++++
init/main.c | 3 +
mm/kmsan/Makefile | 3 +-
mm/kmsan/init.c | 235 ++++++++++++++++++++++++++++++++++++++++++
mm/kmsan/kmsan.h | 3 +
mm/kmsan/shadow.c | 34 ++++++
mm/page_alloc.c | 4 +
7 files changed, 317 insertions(+), 1 deletion(-)
create mode 100644 mm/kmsan/init.c

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 354aee6f7b1a2..e00de976ee438 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -31,6 +31,28 @@ void kmsan_task_create(struct task_struct *task);
*/
void kmsan_task_exit(struct task_struct *task);

+/**
+ * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
+ *
+ * Allocate and initialize KMSAN metadata for early allocations.
+ */
+void __init kmsan_init_shadow(void);
+
+/**
+ * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
+ */
+void __init kmsan_init_runtime(void);
+
+/**
+ * kmsan_memblock_free_pages() - handle freeing of memblock pages.
+ * @page: struct page to free.
+ * @order: order of @page.
+ *
+ * Freed pages are either returned to buddy allocator or held back to be used
+ * as metadata pages.
+ */
+bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -152,6 +174,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_init_shadow(void)
+{
+}
+
+static inline void kmsan_init_runtime(void)
+{
+}
+
+static inline bool kmsan_memblock_free_pages(struct page *page,
+ unsigned int order)
+{
+ return true;
+}
+
static inline void kmsan_task_create(struct task_struct *task)
{
}
diff --git a/init/main.c b/init/main.c
index 1fe7942f5d4a8..3afed7bf9f683 100644
--- a/init/main.c
+++ b/init/main.c
@@ -34,6 +34,7 @@
#include <linux/percpu.h>
#include <linux/kmod.h>
#include <linux/kprobes.h>
+#include <linux/kmsan.h>
#include <linux/vmalloc.h>
#include <linux/kernel_stat.h>
#include <linux/start_kernel.h>
@@ -836,6 +837,7 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
+ kmsan_init_shadow();
stack_depot_early_init();
mem_init();
mem_init_print_info();
@@ -853,6 +855,7 @@ static void __init mm_init(void)
init_espfix_bsp();
/* Should be run after espfix64 is set up. */
pti_init();
+ kmsan_init_runtime();
}

#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 550ad8625e4f9..401acb1a491ce 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -3,7 +3,7 @@
# Makefile for KernelMemorySanitizer (KMSAN).
#
#
-obj-y := core.o instrumentation.o hooks.o report.o shadow.o
+obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o

KMSAN_SANITIZE := n
KCOV_INSTRUMENT := n
@@ -18,6 +18,7 @@ CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)

CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
new file mode 100644
index 0000000000000..7fb794242fad0
--- /dev/null
+++ b/mm/kmsan/init.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN initialization routines.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include "kmsan.h"
+
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+
+#include "../internal.h"
+
+#define NUM_FUTURE_RANGES 128
+struct start_end_pair {
+ u64 start, end;
+};
+
+static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
+static int future_index __initdata;
+
+/*
+ * Record a range of memory for which the metadata pages will be created once
+ * the page allocator becomes available.
+ */
+static void __init kmsan_record_future_shadow_range(void *start, void *end)
+{
+ u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
+ bool merged = false;
+
+ KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
+ KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+ nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
+ nend = ALIGN(nend, PAGE_SIZE);
+
+ /*
+ * Scan the existing ranges to see if any of them overlaps with
+ * [start, end). In that case, merge the two ranges instead of
+ * creating a new one.
+ * The number of ranges is less than 20, so there is no need to organize
+ * them into a more intelligent data structure.
+ */
+ for (int i = 0; i < future_index; i++) {
+ cstart = start_end_pairs[i].start;
+ cend = start_end_pairs[i].end;
+ if ((cstart < nstart && cend < nstart) ||
+ (cstart > nend && cend > nend))
+ /* ranges are disjoint - do not merge */
+ continue;
+ start_end_pairs[i].start = min(nstart, cstart);
+ start_end_pairs[i].end = max(nend, cend);
+ merged = true;
+ break;
+ }
+ if (merged)
+ return;
+ start_end_pairs[future_index].start = nstart;
+ start_end_pairs[future_index].end = nend;
+ future_index++;
+}
+
+/*
+ * Initialize the shadow for existing mappings during kernel initialization.
+ * These include kernel text/data sections, NODE_DATA and future ranges
+ * registered while creating other data (e.g. percpu).
+ *
+ * Allocations via memblock can be only done before slab is initialized.
+ */
+void __init kmsan_init_shadow(void)
+{
+ const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+ phys_addr_t p_start, p_end;
+ u64 loop;
+ int nid;
+
+ for_each_reserved_mem_range(loop, &p_start, &p_end)
+ kmsan_record_future_shadow_range(phys_to_virt(p_start),
+ phys_to_virt(p_end));
+ /* Allocate shadow for .data */
+ kmsan_record_future_shadow_range(_sdata, _edata);
+
+ for_each_online_node(nid)
+ kmsan_record_future_shadow_range(
+ NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
+
+ for (int i = 0; i < future_index; i++)
+ kmsan_init_alloc_meta_for_range(
+ (void *)start_end_pairs[i].start,
+ (void *)start_end_pairs[i].end);
+}
+
+struct metadata_page_pair {
+ struct page *shadow, *origin;
+};
+static struct metadata_page_pair held_back[MAX_ORDER] __initdata;
+
+/*
+ * Eager metadata allocation. When the memblock allocator is freeing pages to
+ * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
+ * We store the pointers to the returned blocks of pages in held_back[] grouped
+ * by their order: when kmsan_memblock_free_pages() is called for the first
+ * time with a certain order, it is reserved as a shadow block, for the second
+ * time - as an origin block. On the third time the incoming block receives its
+ * shadow and origin ranges from the previously saved shadow and origin blocks,
+ * after which held_back[order] can be used again.
+ *
+ * At the very end there may be leftover blocks in held_back[]. They are
+ * collected later by kmsan_memblock_discard().
+ */
+bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
+{
+ struct page *shadow, *origin;
+
+ if (!held_back[order].shadow) {
+ held_back[order].shadow = page;
+ return false;
+ }
+ if (!held_back[order].origin) {
+ held_back[order].origin = page;
+ return false;
+ }
+ shadow = held_back[order].shadow;
+ origin = held_back[order].origin;
+ kmsan_setup_meta(page, shadow, origin, order);
+
+ held_back[order].shadow = NULL;
+ held_back[order].origin = NULL;
+ return true;
+}
+
+#define MAX_BLOCKS 8
+struct smallstack {
+ struct page *items[MAX_BLOCKS];
+ int index;
+ int order;
+};
+
+static struct smallstack collect = {
+ .index = 0,
+ .order = MAX_ORDER,
+};
+
+static void smallstack_push(struct smallstack *stack, struct page *pages)
+{
+ KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
+ stack->items[stack->index] = pages;
+ stack->index++;
+}
+#undef MAX_BLOCKS
+
+static struct page *smallstack_pop(struct smallstack *stack)
+{
+ struct page *ret;
+
+ KMSAN_WARN_ON(stack->index == 0);
+ stack->index--;
+ ret = stack->items[stack->index];
+ stack->items[stack->index] = NULL;
+ return ret;
+}
+
+static void do_collection(void)
+{
+ struct page *page, *shadow, *origin;
+
+ while (collect.index >= 3) {
+ page = smallstack_pop(&collect);
+ shadow = smallstack_pop(&collect);
+ origin = smallstack_pop(&collect);
+ kmsan_setup_meta(page, shadow, origin, collect.order);
+ __free_pages_core(page, collect.order);
+ }
+}
+
+static void collect_split(void)
+{
+ struct smallstack tmp = {
+ .order = collect.order - 1,
+ .index = 0,
+ };
+ struct page *page;
+
+ if (!collect.order)
+ return;
+ while (collect.index) {
+ page = smallstack_pop(&collect);
+ smallstack_push(&tmp, &page[0]);
+ smallstack_push(&tmp, &page[1 << tmp.order]);
+ }
+ __memcpy(&collect, &tmp, sizeof(tmp));
+}
+
+/*
+ * Memblock is about to go away. Split the page blocks left over in held_back[]
+ * and return 1/3 of that memory to the system.
+ */
+static void kmsan_memblock_discard(void)
+{
+ /*
+ * For each order=N:
+ * - push held_back[N].shadow and .origin to @collect;
+ * - while there are >= 3 elements in @collect, do garbage collection:
+ * - pop 3 ranges from @collect;
+ * - use two of them as shadow and origin for the third one;
+ * - repeat;
+ * - split each remaining element from @collect into 2 ranges of
+ * order=N-1,
+ * - repeat.
+ */
+ collect.order = MAX_ORDER - 1;
+ for (int i = MAX_ORDER - 1; i >= 0; i--) {
+ if (held_back[i].shadow)
+ smallstack_push(&collect, held_back[i].shadow);
+ if (held_back[i].origin)
+ smallstack_push(&collect, held_back[i].origin);
+ held_back[i].shadow = NULL;
+ held_back[i].origin = NULL;
+ do_collection();
+ collect_split();
+ }
+}
+
+void __init kmsan_init_runtime(void)
+{
+ /* Assuming current is init_task */
+ kmsan_internal_task_create(current);
+ kmsan_memblock_discard();
+ pr_info("Starting KernelMemorySanitizer\n");
+ pr_info("ATTENTION: KMSAN is a debugging tool! Do not use it on production machines!\n");
+ kmsan_enabled = true;
+}
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index 77ee068c04ae9..7019c46d33a74 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -67,6 +67,7 @@ struct shadow_origin_ptr {
struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
bool store);
void *kmsan_get_metadata(void *addr, bool is_origin);
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end);

enum kmsan_bug_reason {
REASON_ANY,
@@ -187,6 +188,8 @@ void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
int reason);

struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order);

/*
* kmsan_internal_is_module_addr() and kmsan_internal_is_vmalloc_addr() are
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 8c81a059beea6..6e90a806a7045 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -258,3 +258,37 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
kfree(s_pages);
kfree(o_pages);
}
+
+/* Allocate metadata for pages allocated at boot time. */
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
+{
+ struct page *shadow_p, *origin_p;
+ void *shadow, *origin;
+ struct page *page;
+ u64 size;
+
+ start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
+ size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
+ shadow = memblock_alloc(size, PAGE_SIZE);
+ origin = memblock_alloc(size, PAGE_SIZE);
+ for (u64 addr = 0; addr < size; addr += PAGE_SIZE) {
+ page = virt_to_page_or_null((char *)start + addr);
+ shadow_p = virt_to_page_or_null((char *)shadow + addr);
+ set_no_shadow_origin_page(shadow_p);
+ shadow_page_for(page) = shadow_p;
+ origin_p = virt_to_page_or_null((char *)origin + addr);
+ set_no_shadow_origin_page(origin_p);
+ origin_page_for(page) = origin_p;
+ }
+}
+
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order)
+{
+ for (int i = 0; i < (1 << order); i++) {
+ set_no_shadow_origin_page(&shadow[i]);
+ set_no_shadow_origin_page(&origin[i]);
+ shadow_page_for(&page[i]) = &shadow[i];
+ origin_page_for(&page[i]) = &origin[i];
+ }
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d488dab76a6e8..b28093e3bb42a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1806,6 +1806,10 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
{
if (early_page_uninitialised(pfn))
return;
+ if (!kmsan_memblock_free_pages(page, order)) {
+ /* KMSAN will take care of these pages. */
+ return;
+ }
__free_pages_core(page, order);
}

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:29 AM
To avoid false positives, KMSAN needs to unpoison the data copied from
userspace. To detect infoleaks, it also checks the memory buffer passed
to copy_to_user().
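
For illustration, a minimal sketch of the kind of infoleak this catches.
The struct and ioctl handler below are hypothetical (assuming the usual
<linux/types.h> and <linux/uaccess.h> includes); only copy_to_user()
itself is the real API:

  struct demo_info {
          u32 version;
          u32 flags;      /* never written below */
  };

  static long demo_ioctl(void __user *arg)
  {
          struct demo_info info;

          info.version = 1;
          /*
           * info.flags stays uninitialized on the stack; with this patch
           * KMSAN reports a kernel-infoleak when it is copied to
           * userspace below.
           */
          if (copy_to_user(arg, &info, sizeof(info)))
                  return -EFAULT;
          return 0;
  }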

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Marco Elver <el...@google.com>

---
v2:
-- move implementation of kmsan_copy_to_user() here

v5:
-- simplify kmsan_copy_to_user()
-- provide instrument_get_user() and instrument_put_user()

v6:
-- rebase after changing "x86: asm: instrument usercopy in get_user()
and put_user()"

Link: https://linux-review.googlesource.com/id/I43e93b9c02709e6be8d222342f1b044ac8bdbaaf
---
include/linux/instrumented.h | 18 ++++++++++++-----
include/linux/kmsan-checks.h | 19 ++++++++++++++++++
mm/kmsan/hooks.c | 38 ++++++++++++++++++++++++++++++++++++
3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 9f1dba8f717b0..501fa84867494 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -2,7 +2,7 @@

/*
* This header provides generic wrappers for memory access instrumentation that
- * the compiler cannot emit for: KASAN, KCSAN.
+ * the compiler cannot emit for: KASAN, KCSAN, KMSAN.
*/
#ifndef _LINUX_INSTRUMENTED_H
#define _LINUX_INSTRUMENTED_H
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>
+#include <linux/kmsan-checks.h>
#include <linux/types.h>

/**
@@ -117,6 +118,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
{
kasan_check_read(from, n);
kcsan_check_read(from, n);
+ kmsan_copy_to_user(to, from, n, 0);
}

/**
@@ -151,6 +153,7 @@ static __always_inline void
instrument_copy_from_user_after(const void *to, const void __user *from,
unsigned long n, unsigned long left)
{
+ kmsan_unpoison_memory(to, n - left);
}

/**
@@ -162,10 +165,14 @@ instrument_copy_from_user_after(const void *to, const void __user *from,
*
* @to destination variable, may not be address-taken
*/
-#define instrument_get_user(to) \
-({ \
+#define instrument_get_user(to) \
+({ \
+ u64 __tmp = (u64)(to); \
+ kmsan_unpoison_memory(&__tmp, sizeof(__tmp)); \
+ to = __tmp; \
})

+
/**
* instrument_put_user() - add instrumentation to put_user()-like macros
*
@@ -177,8 +184,9 @@ instrument_copy_from_user_after(const void *to, const void __user *from,
* @ptr userspace pointer to copy to
* @size number of bytes to copy
*/
-#define instrument_put_user(from, ptr, size) \
-({ \
+#define instrument_put_user(from, ptr, size) \
+({ \
+ kmsan_copy_to_user(ptr, &from, sizeof(from), 0); \
})

#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index a6522a0c28df9..c4cae333deec5 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -46,6 +46,21 @@ void kmsan_unpoison_memory(const void *address, size_t size);
*/
void kmsan_check_memory(const void *address, size_t size);

+/**
+ * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace.
+ * @to: destination address in the userspace.
+ * @from: source address in the kernel.
+ * @to_copy: number of bytes to copy.
+ * @left: number of bytes not copied.
+ *
+ * If this is a real userspace data transfer, KMSAN checks the bytes that were
+ * actually copied to ensure there was no information leak. If @to belongs to
+ * the kernel space (which is possible for compat syscalls), KMSAN just copies
+ * the metadata.
+ */
+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+ size_t left);
+
#else

static inline void kmsan_poison_memory(const void *address, size_t size,
@@ -58,6 +73,10 @@ static inline void kmsan_unpoison_memory(const void *address, size_t size)
static inline void kmsan_check_memory(const void *address, size_t size)
{
}
+static inline void kmsan_copy_to_user(void __user *to, const void *from,
+ size_t to_copy, size_t left)
+{
+}

#endif

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 6f3e64b0b61f8..5c0eb25d984d7 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -205,6 +205,44 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
kmsan_leave_runtime();
}

+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+ size_t left)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * At this point we've copied the memory already. It's hard to check it
+ * before copying, as the size of actually copied buffer is unknown.
+ */
+
+ /* copy_to_user() may copy zero bytes. No need to check. */
+ if (!to_copy)
+ return;
+ /* Or maybe copy_to_user() failed to copy anything. */
+ if (to_copy <= left)
+ return;
+
+ ua_flags = user_access_save();
+ if ((u64)to < TASK_SIZE) {
+ /* This is a user memory access, check it. */
+ kmsan_internal_check_memory((void *)from, to_copy - left, to,
+ REASON_COPY_TO_USER);
+ } else {
+ /* Otherwise this is a kernel memory access. This happens when a
+ * compat syscall passes an argument allocated on the kernel
+ * stack to a real syscall.
+ * Don't check anything, just copy the shadow of the copied
+ * bytes.
+ */
+ kmsan_internal_memmove_metadata((void *)to, (void *)from,
+ to_copy - left);
+ }
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_copy_to_user);
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:32 AM
Functions from lib/iomap.c interact with hardware, so KMSAN must ensure
that:
- every read function returns an initialized value
- every write function checks values before sending them to hardware.
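
A hypothetical sketch of what this means for a driver (the function
below is made up; ioread32()/iowrite32() are the accessors changed by
this patch):

  /* Illustrative only. */
  static void demo_mmio_access(void __iomem *regs)
  {
          u32 status = ioread32(regs);    /* treated as initialized */
          u32 cmd;                        /* deliberately uninitialized */

          if (status & 0x1)               /* no false positive here */
                  iowrite32(cmd, regs);   /* kmsan_check_memory() reports cmd */
  }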

Signed-off-by: Alexander Potapenko <gli...@google.com>

---

v4:
-- switch from __no_sanitize_memory (which now means "no KMSAN
instrumentation") to __no_kmsan_checks (i.e. "unpoison everything")

Link: https://linux-review.googlesource.com/id/I45527599f09090aca046dfe1a26df453adab100d
---
lib/iomap.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/lib/iomap.c b/lib/iomap.c
index fbaa3e8f19d6c..4f8b31baa5752 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -6,6 +6,7 @@
*/
#include <linux/pci.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#include <linux/export.h>

@@ -70,26 +71,35 @@ static void bad_io_access(unsigned long port, const char *access)
#define mmio_read64be(addr) swab64(readq(addr))
#endif

+/*
+ * Here and below, we apply __no_kmsan_checks to functions reading data from
+ * hardware, to ensure that KMSAN marks their return values as initialized.
+ */
+__no_kmsan_checks
unsigned int ioread8(const void __iomem *addr)
{
IO_COND(addr, return inb(port), return readb(addr));
return 0xff;
}
+__no_kmsan_checks
unsigned int ioread16(const void __iomem *addr)
{
IO_COND(addr, return inw(port), return readw(addr));
return 0xffff;
}
+__no_kmsan_checks
unsigned int ioread16be(const void __iomem *addr)
{
IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
return 0xffff;
}
+__no_kmsan_checks
unsigned int ioread32(const void __iomem *addr)
{
IO_COND(addr, return inl(port), return readl(addr));
return 0xffffffff;
}
+__no_kmsan_checks
unsigned int ioread32be(const void __iomem *addr)
{
IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
@@ -142,18 +152,21 @@ static u64 pio_read64be_hi_lo(unsigned long port)
return lo | (hi << 32);
}

+__no_kmsan_checks
u64 ioread64_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_kmsan_checks
u64 ioread64_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_kmsan_checks
u64 ioread64be_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_lo_hi(port),
@@ -161,6 +174,7 @@ u64 ioread64be_lo_hi(const void __iomem *addr)
return 0xffffffffffffffffULL;
}

+__no_kmsan_checks
u64 ioread64be_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_hi_lo(port),
@@ -188,22 +202,32 @@ EXPORT_SYMBOL(ioread64be_hi_lo);

void iowrite8(u8 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outb(val,port), writeb(val, addr));
}
void iowrite16(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outw(val,port), writew(val, addr));
}
void iowrite16be(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
}
void iowrite32(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outl(val,port), writel(val, addr));
}
void iowrite32be(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
@@ -239,24 +263,32 @@ static void pio_write64be_hi_lo(u64 val, unsigned long port)

void iowrite64_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_lo_hi(val, port),
writeq(val, addr));
}

void iowrite64_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_hi_lo(val, port),
writeq(val, addr));
}

void iowrite64be_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_lo_hi(val, port),
mmio_write64be(val, addr));
}

void iowrite64be_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_hi_lo(val, port),
mmio_write64be(val, addr));
}
@@ -328,14 +360,20 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count);
}
void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 2);
}
void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 4);
}
EXPORT_SYMBOL(ioread8_rep);
EXPORT_SYMBOL(ioread16_rep);
@@ -343,14 +381,20 @@ EXPORT_SYMBOL(ioread32_rep);

void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count);
IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
}
void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 2);
IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
}
void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 4);
IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
}
EXPORT_SYMBOL(iowrite8_rep);
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:35 AM
KMSAN does not know that the device initializes certain bytes in
ps2dev->cmdbuf. Call kmsan_unpoison_memory() to explicitly mark them as
initialized.
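
The same pattern applies to any buffer that is filled outside of KMSAN's
view, e.g. from an interrupt handler. A generic, hypothetical sketch:

  /* Illustrative only: unpoison exactly the bytes the device produced. */
  static void demo_consume_rx(u8 *buf, size_t bytes_from_device)
  {
          kmsan_unpoison_memory(buf, bytes_from_device);
          /* Reads of buf[0..bytes_from_device - 1] no longer cause reports. */
  }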

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/I2d26f6baa45271d37320d3f4a528c39cb7e545f0
---
drivers/input/serio/libps2.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 250e213cc80c6..3e19344eda93c 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -12,6 +12,7 @@
#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/input.h>
+#include <linux/kmsan-checks.h>
#include <linux/serio.h>
#include <linux/i8042.h>
#include <linux/libps2.h>
@@ -294,9 +295,11 @@ int __ps2_command(struct ps2dev *ps2dev, u8 *param, unsigned int command)

serio_pause_rx(ps2dev->serio);

- if (param)
+ if (param) {
for (i = 0; i < receive; i++)
param[i] = ps2dev->cmdbuf[(receive - 1) - i];
+ kmsan_unpoison_memory(param, receive);
+ }

if (ps2dev->cmdcnt &&
(command != PS2_CMD_RESET_BAT || ps2dev->cmdcnt != 1)) {
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:37 AM
KMSAN doesn't know about DMA memory writes performed by devices.
We unpoison such memory when it's mapped to avoid false positive
reports.
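
For illustration, a hypothetical receive path (buffer size and device
pointer are made up; dma_map_single() is the real API that now reaches
kmsan_handle_dma()):

  /* Illustrative only. */
  static void demo_dma_rx(struct device *dev)
  {
          void *buf = kmalloc(512, GFP_KERNEL);
          dma_addr_t handle;

          if (!buf)
                  return;
          /*
           * DMA_FROM_DEVICE: the buffer is unpoisoned at map time, so
           * reading it after the transfer produces no false positives.
           * DMA_TO_DEVICE would instead check the (still uninitialized)
           * contents and report an error.
           */
          handle = dma_map_single(dev, buf, 512, DMA_FROM_DEVICE);
          if (dma_mapping_error(dev, handle))
                  goto out;
          /* ... the device writes into buf here ... */
          dma_unmap_single(dev, handle, 512, DMA_FROM_DEVICE);
  out:
          kfree(buf);
  }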

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- move implementation of kmsan_handle_dma() and kmsan_handle_dma_sg() here

v4:
-- swap dma: and kmsan: in the subject

v5:
-- do not export KMSAN hooks that are not called from modules

v6:
-- add a missing #include <linux/kmsan.h>

Link: https://linux-review.googlesource.com/id/Ia162dc4c5a92e74d4686c1be32a4dfeffc5c32cd
---
include/linux/kmsan.h | 41 ++++++++++++++++++++++++++++++
kernel/dma/mapping.c | 10 +++++---
mm/kmsan/hooks.c | 59 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index e00de976ee438..dac296da45c55 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -9,6 +9,7 @@
#ifndef _LINUX_KMSAN_H
#define _LINUX_KMSAN_H

+#include <linux/dma-direction.h>
#include <linux/gfp.h>
#include <linux/kmsan-checks.h>
#include <linux/types.h>
@@ -16,6 +17,7 @@
struct page;
struct kmem_cache;
struct task_struct;
+struct scatterlist;

#ifdef CONFIG_KMSAN

@@ -172,6 +174,35 @@ void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
*/
void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

+/**
+ * kmsan_handle_dma() - Handle a DMA data transfer.
+ * @page: first page of the buffer.
+ * @offset: offset of the buffer within the first page.
+ * @size: buffer size.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @direction, KMSAN:
+ * * checks the buffer, if it is copied to device;
+ * * initializes the buffer, if it is copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir);
+
+/**
+ * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist.
+ * @sg: scatterlist holding DMA buffers.
+ * @nents: number of scatterlist entries.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @direction, KMSAN:
+ * * checks the buffers in the scatterlist, if they are copied to device;
+ * * initializes the buffers, if they are copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir);
+
#else

static inline void kmsan_init_shadow(void)
@@ -254,6 +285,16 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}

+static inline void kmsan_handle_dma(struct page *page, size_t offset,
+ size_t size, enum dma_data_direction dir)
+{
+}
+
+static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 27f272381cf27..33437d6206445 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -10,6 +10,7 @@
#include <linux/dma-map-ops.h>
#include <linux/export.h>
#include <linux/gfp.h>
+#include <linux/kmsan.h>
#include <linux/of_device.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
@@ -156,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ kmsan_handle_dma(page, offset, size, dir);
debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);

return addr;
@@ -194,11 +196,13 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);

- if (ents > 0)
+ if (ents > 0) {
+ kmsan_handle_dma_sg(sg, nents, dir);
debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
- else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
- ents != -EIO && ents != -EREMOTEIO))
+ } else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
+ ents != -EIO && ents != -EREMOTEIO)) {
return -EIO;
+ }

return ents;
}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 5c0eb25d984d7..563c09443a37a 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -10,10 +10,12 @@
*/

#include <linux/cacheflush.h>
+#include <linux/dma-direction.h>
#include <linux/gfp.h>
#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

@@ -243,6 +245,63 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
}
EXPORT_SYMBOL(kmsan_copy_to_user);

+static void kmsan_handle_dma_page(const void *addr, size_t size,
+ enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_BIDIRECTIONAL:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_TO_DEVICE:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ break;
+ case DMA_FROM_DEVICE:
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_NONE:
+ break;
+ }
+}
+
+/* Helper function to handle DMA data transfers. */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir)
+{
+ u64 page_offset, to_go, addr;
+
+ if (PageHighMem(page))
+ return;
+ addr = (u64)page_address(page) + offset;
+ /*
+ * The kernel may occasionally give us adjacent DMA pages not belonging
+ * to the same allocation. Process them separately to avoid triggering
+ * internal KMSAN checks.
+ */
+ while (size > 0) {
+ page_offset = addr % PAGE_SIZE;
+ to_go = min(PAGE_SIZE - page_offset, (u64)size);
+ kmsan_handle_dma_page((void *)addr, to_go, dir);
+ addr += to_go;
+ size -= to_go;
+ }
+}
+
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *item;
+ int i;
+
+ for_each_sg(sg, item, nents, i)
+ kmsan_handle_dma(sg_page(item), item->offset, item->length,
+ dir);
+}

Alexander Potapenko

Sep 15, 2022, 11:05:40 AM
If vring doesn't use the DMA API, KMSAN is unable to tell whether the
memory is initialized by hardware. Explicitly call kmsan_handle_dma()
from vring_map_one_sg() in this case to prevent false positives.

Signed-off-by: Alexander Potapenko <gli...@google.com>
Acked-by: Michael S. Tsirkin <m...@redhat.com>

---
v4:
-- swap virtio: and kmsan: in the subject

v6:
-- use <linux/kmsan.h> instead of <linux/kmsan-checks.h>

Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
---
drivers/virtio/virtio_ring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4620e9d79dde8..8974c34b40fda 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
+#include <linux/kmsan.h>
#include <linux/spinlock.h>
#include <xen/xen.h>

@@ -352,8 +353,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
struct scatterlist *sg,
enum dma_data_direction direction)
{
- if (!vq->use_dma_api)
+ if (!vq->use_dma_api) {
+ /*
+ * If DMA is not used, KMSAN doesn't know that the scatterlist
+ * is initialized by the hardware. Explicitly check/unpoison it
+ * depending on the direction.
+ */
+ kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
return (dma_addr_t)sg_phys(sg);
+ }

/*
* We can't use dma_map_sg, because we don't use scatterlists in
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:43 AM
Depending on the value of is_out, kmsan_handle_urb() either marks the
data copied from a USB device to the kernel as initialized, or checks
the data sent to the device for being initialized.
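
A hypothetical sketch of the OUT case (demo_send() and demo_complete()
are made up and assume @pipe is a bulk OUT endpoint; usb_fill_bulk_urb()
and usb_submit_urb() are the real API):

  /* Illustrative only: submitting an OUT urb with an uninitialized buffer. */
  static void demo_complete(struct urb *urb)
  {
  }

  static int demo_send(struct usb_device *udev, struct urb *urb,
                       unsigned int pipe)
  {
          u8 *buf = kmalloc(64, GFP_KERNEL);      /* never initialized */

          if (!buf)
                  return -ENOMEM;
          usb_fill_bulk_urb(urb, udev, pipe, buf, 64, demo_complete, NULL);
          /* is_out is true, so kmsan_handle_urb() checks all 64 bytes. */
          return usb_submit_urb(urb, GFP_KERNEL);
  }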

Signed-off-by: Alexander Potapenko <gli...@google.com>

---
v2:
-- move kmsan_handle_urb() implementation to this patch

v5:
-- do not export KMSAN hooks that are not called from modules

v6:
-- use <linux/kmsan.h> instead of <linux/kmsan-checks.h>

Link: https://linux-review.googlesource.com/id/Ifa67fb72015d4de14c30e971556f99fc8b2ee506
---
drivers/usb/core/urb.c | 2 ++
include/linux/kmsan.h | 15 +++++++++++++++
mm/kmsan/hooks.c | 16 ++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 33d62d7e3929f..9f3c54032556e 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -8,6 +8,7 @@
#include <linux/bitops.h>
#include <linux/slab.h>
#include <linux/log2.h>
+#include <linux/kmsan.h>
#include <linux/usb.h>
#include <linux/wait.h>
#include <linux/usb/hcd.h>
@@ -426,6 +427,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL |
URB_DMA_SG_COMBINED);
urb->transfer_flags |= (is_out ? URB_DIR_OUT : URB_DIR_IN);
+ kmsan_handle_urb(urb, is_out);

if (xfertype != USB_ENDPOINT_XFER_CONTROL &&
dev->state < USB_STATE_CONFIGURED)
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index dac296da45c55..c473e0e21683c 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -18,6 +18,7 @@ struct page;
struct kmem_cache;
struct task_struct;
struct scatterlist;
+struct urb;

#ifdef CONFIG_KMSAN

@@ -203,6 +204,16 @@ void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
enum dma_data_direction dir);

+/**
+ * kmsan_handle_urb() - Handle a USB data transfer.
+ * @urb: struct urb pointer.
+ * @is_out: data transfer direction (true means output to hardware).
+ *
+ * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise,
+ * KMSAN initializes the transfer buffer.
+ */
+void kmsan_handle_urb(const struct urb *urb, bool is_out);
+
#else

static inline void kmsan_init_shadow(void)
@@ -295,6 +306,10 @@ static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
{
}

+static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 563c09443a37a..79d7e73e2cfd8 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -18,6 +18,7 @@
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
+#include <linux/usb.h>

#include "../internal.h"
#include "../slab.h"
@@ -245,6 +246,21 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
}
EXPORT_SYMBOL(kmsan_copy_to_user);

+/* Helper function to check an URB. */
+void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+ if (!urb)
+ return;
+ if (is_out)
+ kmsan_internal_check_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*user_addr*/ 0, REASON_SUBMIT_URB);
+ else
+ kmsan_internal_unpoison_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*checked*/ false);
+}
+
static void kmsan_handle_dma_page(const void *addr, size_t size,
enum dma_data_direction dir)
{
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:45 AM
The testing module triggers KMSAN warnings in different cases and checks
that the errors are properly reported, using console probes to capture
the tool's output.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- add memcpy tests

v4:
-- change sizeof(type) to sizeof(*ptr)
-- add test expectations for CONFIG_KMSAN_CHECK_PARAM_RETVAL

v5:
-- reapply clang-format
-- use modern style for-loops
-- address Marco Elver's comments

v7:
-- add test_long_origin_chain to check for origin chains exceeding max
depth

Link: https://linux-review.googlesource.com/id/I49c3f59014cc37fd13541c80beb0b75a75244650
---
lib/Kconfig.kmsan | 12 +
mm/kmsan/Makefile | 4 +
mm/kmsan/kmsan_test.c | 581 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 597 insertions(+)
create mode 100644 mm/kmsan/kmsan_test.c

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
index 5b19dbd34d76e..b2489dd6503fa 100644
--- a/lib/Kconfig.kmsan
+++ b/lib/Kconfig.kmsan
@@ -47,4 +47,16 @@ config KMSAN_CHECK_PARAM_RETVAL
may potentially report errors in corner cases when non-instrumented
functions call instrumented ones.

+config KMSAN_KUNIT_TEST
+ tristate "KMSAN integration test suite" if !KUNIT_ALL_TESTS
+ default KUNIT_ALL_TESTS
+ depends on TRACEPOINTS && KUNIT
+ help
+ Test suite for KMSAN, testing various error detection scenarios,
+ and checking that reports are correctly output to console.
+
+ Say Y here if you want the test to be built into the kernel and run
+ during boot; say M if you want the test to build as a module; say N
+ if you are unsure.
+
endif
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 401acb1a491ce..98eab2856626f 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -22,3 +22,7 @@ CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
+
+obj-$(CONFIG_KMSAN_KUNIT_TEST) += kmsan_test.o
+KMSAN_SANITIZE_kmsan_test.o := y
+CFLAGS_kmsan_test.o += $(call cc-disable-warning, uninitialized)
diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
new file mode 100644
index 0000000000000..9a29ea2dbfb9b
--- /dev/null
+++ b/mm/kmsan/kmsan_test.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KMSAN.
+ * For each test case checks the presence (or absence) of generated reports.
+ * Relies on 'console' tracepoint to capture reports as they appear in the
+ * kernel log.
+ *
+ * Copyright (C) 2021-2022, Google LLC.
+ * Author: Alexander Potapenko <gli...@google.com>
+ *
+ */
+
+#include <kunit/test.h>
+#include "kmsan.h"
+
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+static DEFINE_PER_CPU(int, per_cpu_var);
+
+/* Report as observed from console. */
+static struct {
+ spinlock_t lock;
+ bool available;
+ bool ignore; /* Stop console output collection. */
+ char header[256];
+} observed = {
+ .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+ unsigned long flags;
+
+ if (observed.ignore)
+ return;
+ spin_lock_irqsave(&observed.lock, flags);
+
+ if (strnstr(buf, "BUG: KMSAN: ", len)) {
+ /*
+ * A KMSAN report, assumed to be related to the test.
+ *
+ * The provided @buf is not NUL-terminated; copy no more than
+ * @len bytes and let strscpy() add the missing NUL-terminator.
+ */
+ strscpy(observed.header, buf,
+ min(len + 1, sizeof(observed.header)));
+ WRITE_ONCE(observed.available, true);
+ observed.ignore = true;
+ }
+ spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+ return READ_ONCE(observed.available);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+ const char *error_type; /* Error type. */
+ /*
+ * Kernel symbol from the error header, or NULL if no report is
+ * expected.
+ */
+ const char *symbol;
+};
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+ typeof(observed.header) expected_header;
+ unsigned long flags;
+ bool ret = false;
+ const char *end;
+ char *cur;
+
+ /* Double-checked locking. */
+ if (!report_available() || !r->symbol)
+ return (!report_available() && !r->symbol);
+
+ /* Generate expected report contents. */
+
+ /* Title */
+ cur = expected_header;
+ end = &expected_header[sizeof(expected_header) - 1];
+
+ cur += scnprintf(cur, end - cur, "BUG: KMSAN: %s", r->error_type);
+
+ scnprintf(cur, end - cur, " in %s", r->symbol);
+ /* The exact offset won't match, remove it; also strip module name. */
+ cur = strchr(expected_header, '+');
+ if (cur)
+ *cur = '\0';
+
+ spin_lock_irqsave(&observed.lock, flags);
+ if (!report_available())
+ goto out; /* A new report is being captured. */
+
+ /* Finally match expected output to what we actually observed. */
+ ret = strstr(observed.header, expected_header);
+out:
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return ret;
+}
+
+/* ===== Test cases ===== */
+
+/* Prevent replacing branch with select in LLVM. */
+static noinline void check_true(char *arg)
+{
+ pr_info("%s is true\n", arg);
+}
+
+static noinline void check_false(char *arg)
+{
+ pr_info("%s is false\n", arg);
+}
+
+#define USE(x) \
+ do { \
+ if (x) \
+ check_true(#x); \
+ else \
+ check_false(#x); \
+ } while (0)
+
+#define EXPECTATION_ETYPE_FN(e, reason, fn) \
+ struct expect_report e = { \
+ .error_type = reason, \
+ .symbol = fn, \
+ }
+
+#define EXPECTATION_NO_REPORT(e) EXPECTATION_ETYPE_FN(e, NULL, NULL)
+#define EXPECTATION_UNINIT_VALUE_FN(e, fn) \
+ EXPECTATION_ETYPE_FN(e, "uninit-value", fn)
+#define EXPECTATION_UNINIT_VALUE(e) EXPECTATION_UNINIT_VALUE_FN(e, __func__)
+#define EXPECTATION_USE_AFTER_FREE(e) \
+ EXPECTATION_ETYPE_FN(e, "use-after-free", __func__)
+
+/* Test case: ensure that kmalloc() returns uninitialized memory. */
+static void test_uninit_kmalloc(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ int *ptr;
+
+ kunit_info(test, "uninitialized kmalloc test (UMR report)\n");
+ ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that kmalloc'ed memory becomes initialized after memset().
+ */
+static void test_init_kmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kmalloc test (no reports)\n");
+ ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
+ memset(ptr, 0, sizeof(*ptr));
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that kzalloc() returns initialized memory. */
+static void test_init_kzalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kzalloc test (no reports)\n");
+ ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables are uninitialized by default. */
+static void test_uninit_stack_var(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int cond;
+
+ kunit_info(test, "uninitialized stack variable (UMR report)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables with initializers are initialized. */
+static void test_init_stack_var(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ volatile int cond = 1;
+
+ kunit_info(test, "initialized stack variable (no reports)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static noinline void two_param_fn_2(int arg1, int arg2)
+{
+ USE(arg1);
+ USE(arg2);
+}
+
+static noinline void one_param_fn(int arg)
+{
+ two_param_fn_2(arg, arg);
+ USE(arg);
+}
+
+static noinline void two_param_fn(int arg1, int arg2)
+{
+ int init = 0;
+
+ one_param_fn(init);
+ USE(arg1);
+ USE(arg2);
+}
+
+static void test_params(struct kunit *test)
+{
+#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+ /*
+ * With eager param/retval checking enabled, KMSAN will report an error
+ * before the call to two_param_fn().
+ */
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_params");
+#else
+ EXPECTATION_UNINIT_VALUE_FN(expect, "two_param_fn");
+#endif
+ volatile int uninit, init = 1;
+
+ kunit_info(test,
+ "uninit passed through a function parameter (UMR report)\n");
+ two_param_fn(uninit, init);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static int signed_sum3(int a, int b, int c)
+{
+ return a + b + c;
+}
+
+/*
+ * Test case: ensure that uninitialized values are tracked through function
+ * arguments.
+ */
+static void test_uninit_multiple_params(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile char b = 3, c;
+ volatile int a;
+
+ kunit_info(test, "uninitialized local passed to fn (UMR report)\n");
+ USE(signed_sum3(a, b, c));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Helper function to make an array uninitialized. */
+static noinline void do_uninit_local_array(char *array, int start, int stop)
+{
+ volatile char uninit;
+
+ for (int i = start; i < stop; i++)
+ array[i] = uninit;
+}
+
+/*
+ * Test case: ensure kmsan_check_memory() reports an error when checking
+ * uninitialized memory.
+ */
+static void test_uninit_kmsan_check_memory(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_uninit_kmsan_check_memory");
+ volatile char local_array[8];
+
+ kunit_info(
+ test,
+ "kmsan_check_memory() called on uninit local (UMR report)\n");
+ do_uninit_local_array((char *)local_array, 5, 7);
+
+ kmsan_check_memory((char *)local_array, 8);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: check that a virtual memory range created with vmap() from
+ * initialized pages is still considered as initialized.
+ */
+static void test_init_kmsan_vmap_vunmap(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ const int npages = 2;
+ struct page **pages;
+ void *vbuf;
+
+ kunit_info(test, "pages initialized via vmap (no reports)\n");
+
+ pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
+ for (int i = 0; i < npages; i++)
+ pages[i] = alloc_page(GFP_KERNEL);
+ vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+ memset(vbuf, 0xfe, npages * PAGE_SIZE);
+ for (int i = 0; i < npages; i++)
+ kmsan_check_memory(page_address(pages[i]), PAGE_SIZE);
+
+ if (vbuf)
+ vunmap(vbuf);
+ for (int i = 0; i < npages; i++) {
+ if (pages[i])
+ __free_page(pages[i]);
+ }
+ kfree(pages);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memset() can initialize a buffer allocated via
+ * vmalloc().
+ */
+static void test_init_vmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int npages = 8;
+ char *buf;
+
+ kunit_info(test, "vmalloc buffer can be initialized (no reports)\n");
+ buf = vmalloc(PAGE_SIZE * npages);
+ buf[0] = 1;
+ memset(buf, 0xfe, PAGE_SIZE * npages);
+ USE(buf[0]);
+ for (int i = 0; i < npages; i++)
+ kmsan_check_memory(&buf[PAGE_SIZE * i], PAGE_SIZE);
+ vfree(buf);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that use-after-free reporting works. */
+static void test_uaf(struct kunit *test)
+{
+ EXPECTATION_USE_AFTER_FREE(expect);
+ volatile int value;
+ volatile int *var;
+
+ kunit_info(test, "use-after-free in kmalloc-ed buffer (UMR report)\n");
+ var = kmalloc(80, GFP_KERNEL);
+ var[3] = 0xfeedface;
+ kfree((int *)var);
+ /* Copy the invalid value before checking it. */
+ value = var[3];
+ USE(value);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that uninitialized values are propagated through per-CPU
+ * memory.
+ */
+static void test_percpu_propagate(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int uninit, check;
+
+ kunit_info(test,
+ "uninit local stored to per_cpu memory (UMR report)\n");
+
+ this_cpu_write(per_cpu_var, uninit);
+ check = this_cpu_read(per_cpu_var);
+ USE(check);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that passing uninitialized values to printk() leads to an
+ * error report.
+ */
+static void test_printk(struct kunit *test)
+{
+#ifdef CONFIG_KMSAN_CHECK_PARAM_RETVAL
+ /*
+ * With eager param/retval checking enabled, KMSAN will report an error
+ * before the call to pr_info().
+ */
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_printk");
+#else
+ EXPECTATION_UNINIT_VALUE_FN(expect, "number");
+#endif
+ volatile int uninit;
+
+ kunit_info(test, "uninit local passed to pr_info() (UMR report)\n");
+ pr_info("%px contains %d\n", &uninit, uninit);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and `dst`.
+ */
+static void test_memcpy_aligned_to_aligned(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_aligned");
+ volatile int uninit_src;
+ volatile int dst = 0;
+
+ kunit_info(
+ test,
+ "memcpy()ing aligned uninit src to aligned dst (UMR report)\n");
+ memcpy((void *)&dst, (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)&dst, sizeof(dst));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying aligned 4-byte value to an unaligned one leads to touching two
+ * aligned 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the first of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned");
+ volatile int uninit_src;
+ volatile char dst[8] = { 0 };
+
+ kunit_info(
+ test,
+ "memcpy()ing aligned uninit src to unaligned dst (UMR report)\n");
+ memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)dst, 4);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying aligned 4-byte value to an unaligned one leads to touching two
+ * aligned 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the second of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned2(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect,
+ "test_memcpy_aligned_to_unaligned2");
+ volatile int uninit_src;
+ volatile char dst[8] = { 0 };
+
+ kunit_info(
+ test,
+ "memcpy()ing aligned uninit src to unaligned dst - part 2 (UMR report)\n");
+ memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)&dst[4], sizeof(uninit_src));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static noinline void fibonacci(int *array, int size, int start) {
+ if (start < 2 || (start == size))
+ return;
+ array[start] = array[start - 1] + array[start - 2];
+ fibonacci(array, size, start + 1);
+}
+
+static void test_long_origin_chain(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect,
+ "test_long_origin_chain");
+ /* (KMSAN_MAX_ORIGIN_DEPTH * 2) recursive calls to fibonacci(). */
+ volatile int accum[KMSAN_MAX_ORIGIN_DEPTH * 2 + 2];
+ int last = ARRAY_SIZE(accum) - 1;
+
+ kunit_info(
+ test,
+ "origin chain exceeding KMSAN_MAX_ORIGIN_DEPTH (UMR report)\n");
+ /*
+ * We do not set accum[1] to 0, so the uninitializedness will be carried
+ * over to accum[2..last].
+ */
+ accum[0] = 1;
+ fibonacci((int *)accum, ARRAY_SIZE(accum), 2);
+ kmsan_check_memory((void *)&accum[last], sizeof(int));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static struct kunit_case kmsan_test_cases[] = {
+ KUNIT_CASE(test_uninit_kmalloc),
+ KUNIT_CASE(test_init_kmalloc),
+ KUNIT_CASE(test_init_kzalloc),
+ KUNIT_CASE(test_uninit_stack_var),
+ KUNIT_CASE(test_init_stack_var),
+ KUNIT_CASE(test_params),
+ KUNIT_CASE(test_uninit_multiple_params),
+ KUNIT_CASE(test_uninit_kmsan_check_memory),
+ KUNIT_CASE(test_init_kmsan_vmap_vunmap),
+ KUNIT_CASE(test_init_vmalloc),
+ KUNIT_CASE(test_uaf),
+ KUNIT_CASE(test_percpu_propagate),
+ KUNIT_CASE(test_printk),
+ KUNIT_CASE(test_memcpy_aligned_to_aligned),
+ KUNIT_CASE(test_memcpy_aligned_to_unaligned),
+ KUNIT_CASE(test_memcpy_aligned_to_unaligned2),
+ KUNIT_CASE(test_long_origin_chain),
+ {},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&observed.lock, flags);
+ observed.header[0] = '\0';
+ observed.ignore = false;
+ observed.available = false;
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+}
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ check_trace_callback_type_console(probe_console);
+ if (!strcmp(tp->name, "console"))
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ if (!strcmp(tp->name, "console"))
+ tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+static int kmsan_suite_init(struct kunit_suite *suite)
+{
+ /*
+ * Because we want to be able to build the test as a module, we need to
+ * iterate through all known tracepoints, since the static registration
+ * won't work here.
+ */
+ for_each_kernel_tracepoint(register_tracepoints, NULL);
+ return 0;
+}
+
+static void kmsan_suite_exit(struct kunit_suite *suite)
+{
+ for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+ tracepoint_synchronize_unregister();
+}
+
+static struct kunit_suite kmsan_test_suite = {
+ .name = "kmsan",
+ .test_cases = kmsan_test_cases,
+ .init = test_init,
+ .exit = test_exit,
+ .suite_init = kmsan_suite_init,
+ .suite_exit = kmsan_suite_exit,
+};
+kunit_test_suites(&kmsan_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Alexander Potapenko <gli...@google.com>");
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:48 AM
Disable the efficient word-at-a-time (8-byte) reading in strscpy() under
KMSAN to avoid false positives.
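
To illustrate why (the function and buffer contents below are made up):

  /* Illustrative only: the word-sized read covers bytes past the NUL. */
  static void demo_strscpy(void)
  {
          char src[16];
          char dst[16];

          src[0] = 'h';
          src[1] = 'i';
          src[2] = '\0';  /* src[3..15] stay uninitialized */
          strscpy(dst, src, sizeof(dst));
          /*
           * Without this change, read_word_at_a_time() loads src[0..7] as
           * one unsigned long, and has_zero() then operates on five
           * uninitialized bytes - KMSAN flags that even though the copy
           * itself is correct.
           */
  }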

Signed-off-by: Alexander Potapenko <gli...@google.com>

---

Link: https://linux-review.googlesource.com/id/Iffd8336965e88fce915db2e6a9d6524422975f69
---
lib/string.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 6f334420f6871..3371d26a0e390 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -197,6 +197,14 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
max = 0;
#endif

+ /*
+ * read_word_at_a_time() below may read uninitialized bytes after the
+ * trailing zero and use them in comparisons. Disable this optimization
+ * under KMSAN to prevent false positive reports.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ max = 0;
+
while (max >= sizeof(unsigned long)) {
unsigned long c, data;

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:52 AM
KMSAN is unable to understand when initialized values come from assembly.
Disable accelerated configs in KMSAN builds to prevent false positive
reports.

Signed-off-by: Alexander Potapenko <gli...@google.com>

---

Link: https://linux-review.googlesource.com/id/Idb2334bf3a1b68b31b399709baefaa763038cc50
---
crypto/Kconfig | 30 ++++++++++++++++++++++++++++++
drivers/net/Kconfig | 1 +
2 files changed, 31 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index bb427a835e44a..182fb817ebb52 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -319,6 +319,7 @@ config CRYPTO_CURVE25519
config CRYPTO_CURVE25519_X86
tristate "x86_64 accelerated Curve25519 scalar multiplication library"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519

@@ -367,11 +368,13 @@ config CRYPTO_AEGIS128
config CRYPTO_AEGIS128_SIMD
bool "Support SIMD acceleration for AEGIS-128"
depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
+ depends on !KMSAN # avoid false positives from assembly
default y

config CRYPTO_AEGIS128_AESNI_SSE2
tristate "AEGIS-128 AEAD algorithm (x86_64 AESNI+SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_SIMD
help
@@ -517,6 +520,7 @@ config CRYPTO_NHPOLY1305
config CRYPTO_NHPOLY1305_SSE2
tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
SSE2 optimized implementation of the hash function used by the
@@ -525,6 +529,7 @@ config CRYPTO_NHPOLY1305_SSE2
config CRYPTO_NHPOLY1305_AVX2
tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
AVX2 optimized implementation of the hash function used by the
@@ -649,6 +654,7 @@ config CRYPTO_CRC32C
config CRYPTO_CRC32C_INTEL
tristate "CRC32c INTEL hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
In Intel processor with SSE4.2 supported, the processor will
@@ -689,6 +695,7 @@ config CRYPTO_CRC32
config CRYPTO_CRC32_PCLMUL
tristate "CRC32 PCLMULQDQ hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
select CRC32
help
@@ -748,6 +755,7 @@ config CRYPTO_BLAKE2B
config CRYPTO_BLAKE2S_X86
bool "BLAKE2s digest algorithm (x86 accelerated version)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_BLAKE2S_GENERIC
select CRYPTO_ARCH_HAVE_LIB_BLAKE2S

@@ -762,6 +770,7 @@ config CRYPTO_CRCT10DIF
config CRYPTO_CRCT10DIF_PCLMUL
tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
depends on X86 && 64BIT && CRC_T10DIF
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
For x86_64 processors with SSE4.2 and PCLMULQDQ supported,
@@ -831,6 +840,7 @@ config CRYPTO_POLY1305
config CRYPTO_POLY1305_X86_64
tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_POLY1305_GENERIC
select CRYPTO_ARCH_HAVE_LIB_POLY1305
help
@@ -920,6 +930,7 @@ config CRYPTO_SHA1
config CRYPTO_SHA1_SSSE3
tristate "SHA1 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA1
select CRYPTO_HASH
help
@@ -931,6 +942,7 @@ config CRYPTO_SHA1_SSSE3
config CRYPTO_SHA256_SSSE3
tristate "SHA256 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA256
select CRYPTO_HASH
help
@@ -943,6 +955,7 @@ config CRYPTO_SHA256_SSSE3
config CRYPTO_SHA512_SSSE3
tristate "SHA512 digest algorithm (SSSE3/AVX/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA512
select CRYPTO_HASH
help
@@ -1168,6 +1181,7 @@ config CRYPTO_WP512
config CRYPTO_GHASH_CLMUL_NI_INTEL
tristate "GHASH hash function (CLMUL-NI accelerated)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CRYPTD
help
This is the x86_64 CLMUL-NI accelerated implementation of
@@ -1228,6 +1242,7 @@ config CRYPTO_AES_TI
config CRYPTO_AES_NI_INTEL
tristate "AES cipher algorithms (AES-NI)"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_LIB_AES
select CRYPTO_ALGAPI
@@ -1369,6 +1384,7 @@ config CRYPTO_BLOWFISH_COMMON
config CRYPTO_BLOWFISH_X86_64
tristate "Blowfish cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_BLOWFISH_COMMON
imply CRYPTO_CTR
@@ -1399,6 +1415,7 @@ config CRYPTO_CAMELLIA
config CRYPTO_CAMELLIA_X86_64
tristate "Camellia cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
imply CRYPTO_CTR
help
@@ -1415,6 +1432,7 @@ config CRYPTO_CAMELLIA_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAMELLIA_X86_64
select CRYPTO_SIMD
@@ -1433,6 +1451,7 @@ config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
help
Camellia cipher algorithm module (x86_64/AES-NI/AVX2).
@@ -1478,6 +1497,7 @@ config CRYPTO_CAST5
config CRYPTO_CAST5_AVX_X86_64
tristate "CAST5 (CAST-128) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST5
select CRYPTO_CAST_COMMON
@@ -1501,6 +1521,7 @@ config CRYPTO_CAST6
config CRYPTO_CAST6_AVX_X86_64
tristate "CAST6 (CAST-256) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST6
select CRYPTO_CAST_COMMON
@@ -1534,6 +1555,7 @@ config CRYPTO_DES_SPARC64
config CRYPTO_DES3_EDE_X86_64
tristate "Triple DES EDE cipher algorithm (x86-64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_DES
imply CRYPTO_CTR
@@ -1604,6 +1626,7 @@ config CRYPTO_CHACHA20
config CRYPTO_CHACHA20_X86_64
tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_CHACHA_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CHACHA
@@ -1674,6 +1697,7 @@ config CRYPTO_SERPENT
config CRYPTO_SERPENT_SSE2_X86_64
tristate "Serpent cipher algorithm (x86_64/SSE2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1693,6 +1717,7 @@ config CRYPTO_SERPENT_SSE2_X86_64
config CRYPTO_SERPENT_SSE2_586
tristate "Serpent cipher algorithm (i586/SSE2)"
depends on X86 && !64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1712,6 +1737,7 @@ config CRYPTO_SERPENT_SSE2_586
config CRYPTO_SERPENT_AVX_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1732,6 +1758,7 @@ config CRYPTO_SERPENT_AVX_X86_64
config CRYPTO_SERPENT_AVX2_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SERPENT_AVX_X86_64
help
Serpent cipher algorithm, by Anderson, Biham & Knudsen.
@@ -1876,6 +1903,7 @@ config CRYPTO_TWOFISH_586
config CRYPTO_TWOFISH_X86_64
tristate "Twofish cipher algorithm (x86_64)"
depends on (X86 || UML_X86) && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
imply CRYPTO_CTR
@@ -1893,6 +1921,7 @@ config CRYPTO_TWOFISH_X86_64
config CRYPTO_TWOFISH_X86_64_3WAY
tristate "Twofish cipher algorithm (x86_64, 3-way parallel)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_TWOFISH_COMMON
select CRYPTO_TWOFISH_X86_64
@@ -1913,6 +1942,7 @@ config CRYPTO_TWOFISH_X86_64_3WAY
config CRYPTO_TWOFISH_AVX_X86_64
tristate "Twofish cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SIMD
select CRYPTO_TWOFISH_COMMON
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 94c889802566a..2aaf02bfe6f7e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -76,6 +76,7 @@ config WIREGUARD
tristate "WireGuard secure network tunnel"
depends on NET && INET
depends on IPV6 || !IPV6
+ depends on !KMSAN # KMSAN doesn't support the crypto configs below
select NET_UDP_TUNNEL
select DST_CACHE
select CRYPTO
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:54 AM
KMSAN metadata for physically adjacent pages is not guaranteed to be
adjacent itself, so accessing such pages as a single contiguous buffer
may lead to metadata corruption. Disable merging of pages in biovec to
prevent such corruption.
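
For illustration (not part of this patch), the reason the metadata of two
physically adjacent pages cannot be assumed adjacent is that KMSAN attaches
shadow/origin pages to each struct page individually, so the lookup is
per-page rather than a linear offset from the physical address. A minimal
sketch, assuming the kmsan_shadow field introduced earlier in this series:

/* Hypothetical helper, for illustration only. */
static void *shadow_of(struct page *page, size_t offset)
{
	/* page N and page N + 1 may map to completely unrelated shadow pages */
	return page_address(page->kmsan_shadow) + offset;
}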

Signed-off-by: Alexander Potapenko <gli...@google.com>
---

Link: https://linux-review.googlesource.com/id/Iece16041be5ee47904fbc98121b105e5be5fea5c
---
block/blk.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index d7142c4d2fefb..af02b93c1dba5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -88,6 +88,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;

+ /*
+ * Merging adjacent physical pages may not work correctly under KMSAN
+ * if their metadata pages aren't adjacent. Just disable merging.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ return false;
+
if (addr1 + vec1->bv_len != addr2)
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:56 AM
KMSAN doesn't allow treating adjacent memory pages as a contiguous
buffer if they were allocated by different alloc_pages() calls.
The block layer, however, does exactly that: physically adjacent pages
end up being used together. To prevent this, make page_is_mergeable()
return false under KMSAN.

Suggested-by: Eric Biggers <ebig...@google.com>
Signed-off-by: Alexander Potapenko <gli...@google.com>

---

v4:
-- swap block: and kmsan: in the subject

v5:
-- address Marco Elver's comments

Link: https://linux-review.googlesource.com/id/Ie29cc2464c70032347c32ab2a22e1e7a0b37b905
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 3d3a2678fea25..106ef14f28c2a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -869,6 +869,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
if (*same_page)
return true;
+ else if (IS_ENABLED(CONFIG_KMSAN))
+ return false;
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
}

--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:05:59 AM
KMSAN does not instrument kernel/kcov.c for performance reasons (with
CONFIG_KCOV=y virtually every place in the kernel invokes kcov
instrumentation). Therefore the tool may miss writes from kcov.c that
initialize memory.

When CONFIG_DEBUG_LIST is enabled, list pointers from kernel/kcov.c are
passed to instrumented helpers in lib/list_debug.c, resulting in false
positives.

To work around these reports, we unpoison the contents of area->list after
initializing it.
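
For reference, the hint is free on regular builds: when CONFIG_KMSAN is
disabled, <linux/kmsan-checks.h> provides a no-op stub, roughly like the
following (simplified sketch, not the literal header):

static inline void kmsan_unpoison_memory(const void *address, size_t size)
{
	/* no-op: the call compiles away entirely without KMSAN */
}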

Signed-off-by: Alexander Potapenko <gli...@google.com>
---

v4:
-- change sizeof(type) to sizeof(*ptr)
-- swap kcov: and kmsan: in the subject

Link: https://linux-review.googlesource.com/id/Ie17f2ee47a7af58f5cdf716d585ebf0769348a5a
---
kernel/kcov.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index e19c84b02452e..e5cd09fd8a050 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/hashtable.h>
#include <linux/init.h>
+#include <linux/kmsan-checks.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/printk.h>
@@ -152,6 +153,12 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
INIT_LIST_HEAD(&area->list);
area->size = size;
list_add(&area->list, &kcov_remote_areas);
+ /*
+ * KMSAN doesn't instrument this file, so it may not know area->list
+ * is initialized. Unpoison it explicitly to avoid reports in
+ * kcov_remote_area_get().
+ */
+ kmsan_unpoison_memory(&area->list, sizeof(area->list));
}

static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:01 AM
Heap and stack initialization is great, but not when we are trying to
catch uses of uninitialized memory. When the kernel is built with KMSAN,
having kernel memory initialization enabled may introduce false
negatives.

We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
under CONFIG_KMSAN, making it impossible to auto-initialize stack
variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
auto-initialization.

We however still let the users enable heap auto-initialization at
boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case
a warning is printed.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
---
mm/page_alloc.c | 4 ++++
security/Kconfig.hardening | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b28093e3bb42a..e5eed276ee41d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -936,6 +936,10 @@ void init_mem_debugging_and_hardening(void)
else
static_branch_disable(&init_on_free);

+ if (IS_ENABLED(CONFIG_KMSAN) &&
+ (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
+ pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
+
#ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index bd2aabb2c60f9..2739a6776454e 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -106,6 +106,7 @@ choice
config INIT_STACK_ALL_PATTERN
bool "pattern-init everything (strongest)"
depends on CC_HAS_AUTO_VAR_INIT_PATTERN
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a specific debug value. This is intended to eliminate
@@ -124,6 +125,7 @@ choice
config INIT_STACK_ALL_ZERO
bool "zero-init everything (strongest and safest)"
depends on CC_HAS_AUTO_VAR_INIT_ZERO
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a zero value. This is intended to eliminate all
@@ -218,6 +220,7 @@ config STACKLEAK_RUNTIME_DISABLE

config INIT_ON_ALLOC_DEFAULT_ON
bool "Enable heap memory zeroing on allocation by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_alloc=1" on the kernel
command line. This can be disabled with "init_on_alloc=0".
@@ -230,6 +233,7 @@ config INIT_ON_ALLOC_DEFAULT_ON

config INIT_ON_FREE_DEFAULT_ON
bool "Enable heap memory zeroing on free by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_free=1" on the kernel
command line. This can be disabled with "init_on_free=0".
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:04 AM
KMSAN inserts API function calls in a lot of places (function entries
and exits, local variables, memory accesses), so they may get called
from the uaccess regions as well.

KMSAN API functions are used to update the metadata (shadow/origin pages)
for kernel memory accesses. The metadata pages for kernel pointers are
also located in the kernel memory, so touching them is not a problem.
For userspace pointers, no metadata is allocated.

If an API function is supposed to read or modify the metadata, it does so
for kernel pointers and ignores userspace pointers.
If an API function is supposed to return a pair of metadata pointers for
the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE()
functions do), it returns the allocated metadata for kernel pointers and
special dummy buffers residing in the kernel memory for userspace
pointers.

As a result, none of KMSAN API functions perform userspace accesses, but
since they might be called from UACCESS regions they use
user_access_save/restore().
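
As an illustration of that pattern (a simplified sketch, not the actual
runtime code - the real hooks live in mm/kmsan/), a KMSAN hook that may run
inside a uaccess region saves and restores the user-access state around its
metadata work:

/* Sketch only; the function name and body are hypothetical. */
void example_kmsan_hook(const void *kernel_ptr, size_t size)
{
	unsigned long ua_flags = user_access_save();

	/* ...update shadow/origin for kernel_ptr; userspace is never touched... */

	user_access_restore(ua_flags);
}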

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v3:
-- updated the patch description

v4:
-- add kmsan_unpoison_entry_regs()

Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
---
tools/objtool/check.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e55fdf952a3a1..7c048c11ce7da 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1062,6 +1062,26 @@ static const char *uaccess_safe_builtin[] = {
"__sanitizer_cov_trace_cmp4",
"__sanitizer_cov_trace_cmp8",
"__sanitizer_cov_trace_switch",
+ /* KMSAN */
+ "kmsan_copy_to_user",
+ "kmsan_report",
+ "kmsan_unpoison_entry_regs",
+ "kmsan_unpoison_memory",
+ "__msan_chain_origin",
+ "__msan_get_context_state",
+ "__msan_instrument_asm_store",
+ "__msan_metadata_ptr_for_load_1",
+ "__msan_metadata_ptr_for_load_2",
+ "__msan_metadata_ptr_for_load_4",
+ "__msan_metadata_ptr_for_load_8",
+ "__msan_metadata_ptr_for_load_n",
+ "__msan_metadata_ptr_for_store_1",
+ "__msan_metadata_ptr_for_store_2",
+ "__msan_metadata_ptr_for_store_4",
+ "__msan_metadata_ptr_for_store_8",
+ "__msan_metadata_ptr_for_store_n",
+ "__msan_poison_alloca",
+ "__msan_warning",
/* UBSAN */
"ubsan_type_mismatch_common",
"__ubsan_handle_type_mismatch",
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:07 AM
Instrumenting some files with KMSAN will result in the kernel being
unable to link or boot, or crashing at runtime, for various reasons
(e.g. infinite recursion caused by instrumentation hooks calling
instrumented code again).

Completely omit KMSAN instrumentation in the following places:
- arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386;
- arch/x86/entry/vdso, which isn't linked with KMSAN runtime;
- three files in arch/x86/kernel - boot problems;
- arch/x86/mm/cpu_entry_area.c - recursion.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v2:
-- moved the patch earlier in the series so that KMSAN can compile
-- split off the non-x86 part into a separate patch

v3:
-- added a comment to lib/Makefile

v5:
-- removed a comment belonging to another patch

Link: https://linux-review.googlesource.com/id/Id5e5c4a9f9d53c24a35ebb633b814c414628d81b
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 3 +++
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/mm/Makefile | 2 ++
arch/x86/realmode/rm/Makefile | 1 +
7 files changed, 11 insertions(+)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index ffec8bb01ba8c..9860ca5979f8a 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -12,6 +12,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Kernel does not boot with kcov instrumentation here.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 35ce1a64068b7..3a261abb6d158 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -20,6 +20,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 12f6c4d714cd6..ce4eb7e44e5b8 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -11,6 +11,9 @@ include $(srctree)/lib/vdso/Makefile

# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
+KMSAN_SANITIZE_vclock_gettime.o := n
+KMSAN_SANITIZE_vgetcpu.o := n
+
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a20a5ebfacd73..ac564c5d7b1f0 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -33,6 +33,8 @@ KASAN_SANITIZE_sev.o := n
# With some compiler versions the generated code results in boot hangs, caused
# by several compilation units. To be safe, disable all instrumentation.
KCSAN_SANITIZE := n
+KMSAN_SANITIZE_head$(BITS).o := n
+KMSAN_SANITIZE_nmi.o := n

# If instrumentation of this dir is enabled, boot hangs during first second.
# Probably could be more selective here, but note that files related to irqs,
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be5..f10a921ee7565 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -12,6 +12,7 @@ endif
# If these files are instrumented, boot hangs during the first second.
KCOV_INSTRUMENT_common.o := n
KCOV_INSTRUMENT_perf_event.o := n
+KMSAN_SANITIZE_common.o := n

# As above, instrumenting secondary CPU boot code causes boot hangs.
KCSAN_SANITIZE_common.o := n
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index f8220fd2c169a..39c0700c9955c 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -12,6 +12,8 @@ KASAN_SANITIZE_mem_encrypt_identity.o := n
# Disable KCSAN entirely, because otherwise we get warnings that some functions
# reference __initdata sections.
KCSAN_SANITIZE := n
+# Avoid recursion by not calling KMSAN hooks for CEA code.
+KMSAN_SANITIZE_cpu_entry_area.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_mem_encrypt.o = -pg
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..f614009d3e4e2 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -10,6 +10,7 @@
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:09 AM
When instrumenting functions, KMSAN obtains the per-task state (mostly
pointers to metadata for function arguments and return values) once per
function at its beginning, using the `current` pointer.

Every time the instrumented function calls another function, this state
(`struct kmsan_context_state`) is updated with shadow/origin data of the
passed and returned values.

When `current` changes in the low-level arch code, instrumented code
cannot notice that, and will still refer to the old state, possibly
corrupting it or using stale data. This may result in false positive
reports.

To deal with that, we need to apply __no_kmsan_checks to the functions
performing context switching - this will result in skipping all KMSAN
shadow checks and marking newly created values as initialized,
preventing all false positive reports in those functions. False
negatives are still possible, but we expect them to be rare and
transient.

Suggested-by: Marco Elver <el...@google.com>
Signed-off-by: Alexander Potapenko <gli...@google.com>

---
v2:
-- This patch was previously called "kmsan: skip shadow checks in files
doing context switches". Per Mark Rutland's suggestion, we now only
skip checks in low-level arch-specific code, as context switches in
common code should be invisible to KMSAN. We also apply the checks
to precisely the functions performing the context switch instead of
the whole file.

v5:
-- Replace KMSAN_ENABLE_CHECKS_process_64.o with __no_kmsan_checks

Link: https://linux-review.googlesource.com/id/I45e3ed9c5f66ee79b0409d1673d66ae419029bcb
---
arch/x86/kernel/process_64.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1962008fe7437..6b3418bff3261 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -553,6 +553,7 @@ void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp, bool x32)
* Kprobes not supported here. Set the probe on schedule instead.
* Function graph tracer not supported too.
*/
+__no_kmsan_checks
__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:12 AM
KMSAN cannot intercept memory accesses within asm() statements.
That's why we add kmsan_unpoison_memory() and kmsan_check_memory() calls
to tell it how to handle memory copied from/to I/O memory.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/Icb16bf17269087e475debf07a7fe7d4bebc3df23
---
arch/x86/lib/iomem.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index 3e2f33fc33de2..e0411a3774d49 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -1,6 +1,7 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#define movs(type,to,from) \
asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -37,6 +38,8 @@ static void string_memcpy_fromio(void *to, const volatile void __iomem *from, si
n-=2;
}
rep_movs(to, (const void *)from, n);
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(to, n);
}

static void string_memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
@@ -44,6 +47,8 @@ static void string_memcpy_toio(volatile void __iomem *to, const void *from, size
if (unlikely(!n))
return;

+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(from, n);
/* Align any unaligned destination IO */
if (unlikely(1 & (unsigned long)to)) {
movs("b", to, from);
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:15 AM
Unless stated otherwise (by explicitly calling __memcpy(), __memset() or
__memmove()) we want all string functions to call their __msan_ versions
(e.g. __msan_memcpy() instead of memcpy()), so that shadow and origin
values are updated accordingly.

The bootloader must still use the default string functions to avoid crashes.
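
A sketch of the effect on instrumented code (illustrative only, the function
below is hypothetical):

static void copy_example(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);	/* becomes __msan_memcpy(): data and shadow/origin copied */
	__memcpy(dst, src, n);	/* stays uninstrumented: data copied, metadata untouched */
}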

Signed-off-by: Alexander Potapenko <gli...@google.com>
---

Link: https://linux-review.googlesource.com/id/I7ca9bd6b4f5c9b9816404862ae87ca7984395f33
---
arch/x86/include/asm/string_64.h | 23 +++++++++++++++++++++--
include/linux/fortify-string.h | 2 ++
2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 6e450827f677a..3b87d889b6e16 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -11,11 +11,23 @@
function. */

#define __HAVE_ARCH_MEMCPY 1
+#if defined(__SANITIZE_MEMORY__)
+#undef memcpy
+void *__msan_memcpy(void *dst, const void *src, size_t size);
+#define memcpy __msan_memcpy
+#else
extern void *memcpy(void *to, const void *from, size_t len);
+#endif
extern void *__memcpy(void *to, const void *from, size_t len);

#define __HAVE_ARCH_MEMSET
+#if defined(__SANITIZE_MEMORY__)
+extern void *__msan_memset(void *s, int c, size_t n);
+#undef memset
+#define memset __msan_memset
+#else
void *memset(void *s, int c, size_t n);
+#endif
void *__memset(void *s, int c, size_t n);

#define __HAVE_ARCH_MEMSET16
@@ -55,7 +67,13 @@ static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
}

#define __HAVE_ARCH_MEMMOVE
+#if defined(__SANITIZE_MEMORY__)
+#undef memmove
+void *__msan_memmove(void *dest, const void *src, size_t len);
+#define memmove __msan_memmove
+#else
void *memmove(void *dest, const void *src, size_t count);
+#endif
void *__memmove(void *dest, const void *src, size_t count);

int memcmp(const void *cs, const void *ct, size_t count);
@@ -64,8 +82,7 @@ char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);
int strcmp(const char *cs, const char *ct);

-#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
-
+#if (defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__))
/*
* For files that not instrumented (e.g. mm/slub.c) we
* should use not instrumented version of mem* functions.
@@ -73,7 +90,9 @@ int strcmp(const char *cs, const char *ct);

#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#undef memmove
#define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
#define memset(s, c, n) __memset(s, c, n)

#ifndef __NO_FORTIFY
diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index 3b401fa0f3746..6c8a1a29d0b63 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -285,8 +285,10 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size,
* __builtin_object_size() must be captured here to avoid evaluating argument
* side-effects further into the macro layers.
*/
+#ifndef CONFIG_KMSAN
#define memset(p, c, s) __fortify_memset_chk(p, c, s, \
__builtin_object_size(p, 0), __builtin_object_size(p, 1))
+#endif

/*
* To make sure the compiler can enforce protection against buffer overflows,
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:18 AM
KMSAN assumes shadow and origin pages for every allocated page are
accessible. For pages between [VMALLOC_START, VMALLOC_END] those metadata
pages start at KMSAN_VMALLOC_SHADOW_START and
KMSAN_VMALLOC_ORIGIN_START, therefore we must sync a bigger memory
region.

Signed-off-by: Alexander Potapenko <gli...@google.com>

---

v2:
-- addressed reports from kernel test robot <l...@intel.com>

Link: https://linux-review.googlesource.com/id/Ia5bd541e54f1ecc11b86666c3ec87c62ac0bdfb8
---
arch/x86/mm/fault.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa71a5d12e872..d728791be8ace 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -260,7 +260,7 @@ static noinline int vmalloc_fault(unsigned long address)
}
NOKPROBE_SYMBOL(vmalloc_fault);

-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+static void __arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
unsigned long addr;

@@ -284,6 +284,27 @@ void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
}
}

+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+ __arch_sync_kernel_mappings(start, end);
+#ifdef CONFIG_KMSAN
+ /*
+ * KMSAN maintains two additional metadata page mappings for the
+ * [VMALLOC_START, VMALLOC_END) range. These mappings start at
+ * KMSAN_VMALLOC_SHADOW_START and KMSAN_VMALLOC_ORIGIN_START and
+ * have to be synced together with the vmalloc memory mapping.
+ */
+ if (start >= VMALLOC_START && end < VMALLOC_END) {
+ __arch_sync_kernel_mappings(
+ start - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START,
+ end - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START);
+ __arch_sync_kernel_mappings(
+ start - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START,
+ end - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START);
+ }
+#endif
+}
+
static bool low_pfn(unsigned long pfn)
{
return pfn < max_low_pfn;
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:20 AM
This is needed to allow memory tools like KASAN and KMSAN to see the
memory accesses from the checksum code. Without CONFIG_GENERIC_CSUM the
tools can't see memory accesses originating from handwritten assembly
code.
For KASAN it's a question of detecting more bugs, for KMSAN using the C
implementation also helps avoid false positives originating from
seemingly uninitialized checksum values.

Signed-off-by: Alexander Potapenko <gli...@google.com>

---

Link: https://linux-review.googlesource.com/id/I3e95247be55b1112af59dbba07e8cbf34e50a581
---
arch/x86/Kconfig | 4 ++++
arch/x86/include/asm/checksum.h | 16 ++++++++++------
arch/x86/lib/Makefile | 2 ++
3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8d..33f4d4baba079 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -324,6 +324,10 @@ config GENERIC_ISA_DMA
def_bool y
depends on ISA_DMA_API

+config GENERIC_CSUM
+ bool
+ default y if KMSAN || KASAN
+
config GENERIC_BUG
def_bool y
depends on BUG
diff --git a/arch/x86/include/asm/checksum.h b/arch/x86/include/asm/checksum.h
index bca625a60186c..6df6ece8a28ec 100644
--- a/arch/x86/include/asm/checksum.h
+++ b/arch/x86/include/asm/checksum.h
@@ -1,9 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
-#define HAVE_CSUM_COPY_USER
-#define _HAVE_ARCH_CSUM_AND_COPY
-#ifdef CONFIG_X86_32
-# include <asm/checksum_32.h>
+#ifdef CONFIG_GENERIC_CSUM
+# include <asm-generic/checksum.h>
#else
-# include <asm/checksum_64.h>
+# define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
+# define HAVE_CSUM_COPY_USER
+# define _HAVE_ARCH_CSUM_AND_COPY
+# ifdef CONFIG_X86_32
+# include <asm/checksum_32.h>
+# else
+# include <asm/checksum_64.h>
+# endif
#endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f76747862bd2e..7ba5f61d72735 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -65,7 +65,9 @@ ifneq ($(CONFIG_X86_CMPXCHG64),y)
endif
else
obj-y += iomap_copy_64.o
+ifneq ($(CONFIG_GENERIC_CSUM),y)
lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
+endif
lib-y += clear_page_64.o copy_page_64.o
lib-y += memmove_64.o memset_64.o
lib-y += copy_user_64.o
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:23 AM
dentry_string_cmp() calls read_word_at_a_time(), which might read
uninitialized bytes to optimize string comparisons.
Disabling CONFIG_DCACHE_WORD_ACCESS should prohibit this optimization,
as well as (probably) similar ones.
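
A small illustration of the problem (hypothetical code, not taken from the
kernel): word-sized loads may legitimately read bytes that were never
written, and KMSAN then treats the resulting word as partially uninitialized.

static unsigned long load_prefix(void)
{
	char buf[8];

	memcpy(buf, "abcd", 5);			/* only 5 of the 8 bytes are initialized */
	return read_word_at_a_time(buf);	/* loads all 8 bytes at once on x86_64 */
}

KMSAN cannot always prove that the uninitialized bytes do not influence the
result of a later comparison, so such loads can surface as false positives.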

Suggested-by: Andrey Konovalov <andre...@gmail.com>
Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/I4c0073224ac2897cafb8c037362c49dda9cfa133
---
arch/x86/Kconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 33f4d4baba079..697da8dae1418 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,7 +128,9 @@ config X86
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
- select DCACHE_WORD_ACCESS
+ # Word-size accesses may read uninitialized data past the trailing \0
+ # in strings and cause false KMSAN reports.
+ select DCACHE_WORD_ACCESS if !KMSAN
select DYNAMIC_SIGFRAME
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:26 AM
Upon function exit, KMSAN marks local variables as uninitialized.
Further function calls may result in the compiler creating the stack
frame where these local variables resided. This results in frame
pointers being marked as uninitialized data, which is normally correct,
because they are not stack-allocated.

However, stack unwinding functions are supposed to read and dereference
the frame pointers, in which case KMSAN might report uses of
uninitialized values.

To work around that, we mark update_stack_state(), unwind_next_frame()
and show_trace_log_lvl() with __no_kmsan_checks, preventing all KMSAN
reports inside those functions and making them return initialized
values.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/I6550563768fbb08aa60b2a96803675dcba93d802
---
arch/x86/kernel/dumpstack.c | 6 ++++++
arch/x86/kernel/unwind_frame.c | 11 +++++++++++
2 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index afae4dd774951..476eb504084e4 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -177,6 +177,12 @@ static void show_regs_if_on_stack(struct stack_info *info, struct pt_regs *regs,
}
}

+/*
+ * This function reads pointers from the stack and dereferences them. The
+ * pointers may not have their KMSAN shadow set up properly, which may result
+ * in false positive reports. Disable instrumentation to avoid those.
+ */
+__no_kmsan_checks
static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, const char *log_lvl)
{
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index 8e1c50c86e5db..d8ba93778ae32 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -183,6 +183,16 @@ static struct pt_regs *decode_frame_pointer(unsigned long *bp)
}
#endif

+/*
+ * While walking the stack, KMSAN may stomp on stale locals from other
+ * functions that were marked as uninitialized upon function exit, and
+ * now hold the call frame information for the current function (e.g. the frame
+ * pointer). Because KMSAN does not specifically mark call frames as
+ * initialized, false positive reports are possible. To prevent such reports,
+ * we mark the functions scanning the stack (here and below) with
+ * __no_kmsan_checks.
+ */
+__no_kmsan_checks
static bool update_stack_state(struct unwind_state *state,
unsigned long *next_bp)
{
@@ -250,6 +260,7 @@ static bool update_stack_state(struct unwind_state *state,
return true;
}

+__no_kmsan_checks
bool unwind_next_frame(struct unwind_state *state)
{
struct pt_regs *regs;
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:29 AM
struct pt_regs passed into IRQ entry code is set up by uninstrumented
asm functions, therefore KMSAN may not notice the registers are
initialized.

kmsan_unpoison_entry_regs() unpoisons the contents of struct pt_regs,
preventing potential false positives. Unlike kmsan_unpoison_memory(),
it can be called under kmsan_in_runtime(), which is often the case in
IRQ entry code.

Signed-off-by: Alexander Potapenko <gli...@google.com>

---
Link: https://linux-review.googlesource.com/id/Ibfd7018ac847fd8e5491681f508ba5d14e4669cf
---
include/linux/kmsan.h | 15 +++++++++++++++
kernel/entry/common.c | 5 +++++
mm/kmsan/hooks.c | 26 ++++++++++++++++++++++++++
3 files changed, 46 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index c473e0e21683c..e38ae3c346184 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -214,6 +214,17 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
*/
void kmsan_handle_urb(const struct urb *urb, bool is_out);

+/**
+ * kmsan_unpoison_entry_regs() - Handle pt_regs in low-level entry code.
+ * @regs: struct pt_regs pointer received from assembly code.
+ *
+ * KMSAN unpoisons the contents of the passed pt_regs, preventing potential
+ * false positive reports. Unlike kmsan_unpoison_memory(),
+ * kmsan_unpoison_entry_regs() can be called from the regions where
+ * kmsan_in_runtime() returns true, which is the case in early entry code.
+ */
+void kmsan_unpoison_entry_regs(const struct pt_regs *regs);
+
#else

static inline void kmsan_init_shadow(void)
@@ -310,6 +321,10 @@ static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
{
}

+static inline void kmsan_unpoison_entry_regs(const struct pt_regs *regs)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 063068a9ea9b3..846add8394c41 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -5,6 +5,7 @@
#include <linux/resume_user_mode.h>
#include <linux/highmem.h>
#include <linux/jump_label.h>
+#include <linux/kmsan.h>
#include <linux/livepatch.h>
#include <linux/audit.h>
#include <linux/tick.h>
@@ -24,6 +25,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
user_exit_irqoff();

instrumentation_begin();
+ kmsan_unpoison_entry_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();
}
@@ -352,6 +354,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
lockdep_hardirqs_off(CALLER_ADDR0);
ct_irq_enter();
instrumentation_begin();
+ kmsan_unpoison_entry_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();

@@ -367,6 +370,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
*/
lockdep_hardirqs_off(CALLER_ADDR0);
instrumentation_begin();
+ kmsan_unpoison_entry_regs(regs);
rcu_irq_enter_check_tick();
trace_hardirqs_off_finish();
instrumentation_end();
@@ -452,6 +456,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
ct_nmi_enter();

instrumentation_begin();
+ kmsan_unpoison_entry_regs(regs);
trace_hardirqs_off_finish();
ftrace_nmi_enter();
instrumentation_end();
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 79d7e73e2cfd8..35f6b6e6a908c 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -348,6 +348,32 @@ void kmsan_unpoison_memory(const void *address, size_t size)
}
EXPORT_SYMBOL(kmsan_unpoison_memory);

+/*
+ * Version of kmsan_unpoison_memory() that can be called from within the KMSAN
+ * runtime.
+ *
+ * Non-instrumented IRQ entry functions receive struct pt_regs from assembly
+ * code. Those regs need to be unpoisoned, otherwise using them will result in
+ * false positives.
+ * Using kmsan_unpoison_memory() is not an option in entry code, because the
+ * return value of in_task() is inconsistent - as a result, certain calls to
+ * kmsan_unpoison_memory() are ignored. kmsan_unpoison_entry_regs() ensures that
+ * the registers are unpoisoned even if kmsan_in_runtime() is true in the early
+ * entry code.
+ */
+void kmsan_unpoison_entry_regs(const struct pt_regs *regs)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled)
+ return;
+
+ ua_flags = user_access_save();
+ kmsan_internal_unpoison_memory((void *)regs, sizeof(*regs),
+ KMSAN_POISON_NOCHECK);
+ user_access_restore(ua_flags);
+}
+
void kmsan_check_memory(const void *addr, size_t size)
{
if (!kmsan_enabled)
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:31 AM
When executing BPF programs, certain registers may get passed
uninitialized to helper functions. E.g. when performing a JMP_CALL,
registers BPF_R1-BPF_R5 are always passed to the helper, no matter how
many of them are actually used.

Passing uninitialized values as function parameters is technically
undefined behavior, so we work around it by always initializing the
registers.
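
For context, a hypothetical analogue of the problem (not the actual
interpreter code): all five values are passed to the callee even when the
caller only ever wrote some of them.

static u64 call_helper_example(u64 (*helper)(u64, u64, u64, u64, u64))
{
	u64 regs[5];

	regs[0] = 1;
	regs[1] = 2;
	/* regs[2..4] stay uninitialized, yet are passed below */
	return helper(regs[0], regs[1], regs[2], regs[3], regs[4]);
}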

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/I8ef9dbe94724cee5ad1e3a162f2b805345bc0586
---
kernel/bpf/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3d9eb3ae334ce..21c74fac5131c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2002,7 +2002,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \
{ \
u64 stack[stack_size / sizeof(u64)]; \
- u64 regs[MAX_BPF_EXT_REG]; \
+ u64 regs[MAX_BPF_EXT_REG] = {}; \
\
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
ARG1 = (u64) (unsigned long) ctx; \
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:34 AM
Functions implementing the a_ops->write_end() interface accept the
`void *fsdata` parameter that is supposed to be initialized by the
corresponding a_ops->write_begin() (which accepts `void **fsdata`).

However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.

Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops
implementations.

This patch covers only the following cases found by running x86 KMSAN
under syzkaller:

- generic_perform_write()
- cont_expand_zero() and generic_cont_expand_simple()
- page_symlink()

Other cases of passing uninitialized fsdata may persist in the codebase.
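
For clarity, the calling pattern being hardened looks roughly like this
(simplified sketch, not the literal code of any of the functions above;
names are illustrative):

static int write_one_page_sketch(struct file *file, struct address_space *mapping,
				 loff_t pos, unsigned int len)
{
	const struct address_space_operations *aops = mapping->a_ops;
	void *fsdata = NULL;	/* the fix: never indeterminate at ->write_end() time */
	struct page *page;
	int err;

	err = aops->write_begin(file, mapping, pos, len, &page, &fsdata);
	if (err)
		return err;
	/* ...copy @len bytes into @page... */
	return aops->write_end(file, mapping, pos, len, len, page, fsdata);
}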

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
Link: https://linux-review.googlesource.com/id/Ie300c21bbe9dea69a730745bd3c6d2720953bf41
---
fs/buffer.c | 4 ++--
fs/namei.c | 2 +-
mm/filemap.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 55e762a58eb65..e1198f4b28c8f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2352,7 +2352,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
struct address_space *mapping = inode->i_mapping;
const struct address_space_operations *aops = mapping->a_ops;
struct page *page;
- void *fsdata;
+ void *fsdata = NULL;
int err;

err = inode_newsize_ok(inode, size);
@@ -2378,7 +2378,7 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
const struct address_space_operations *aops = mapping->a_ops;
unsigned int blocksize = i_blocksize(inode);
struct page *page;
- void *fsdata;
+ void *fsdata = NULL;
pgoff_t index, curidx;
loff_t curpos;
unsigned zerofrom, offset, len;
diff --git a/fs/namei.c b/fs/namei.c
index 53b4bc094db23..076ae96ca0b14 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5088,7 +5088,7 @@ int page_symlink(struct inode *inode, const char *symname, int len)
const struct address_space_operations *aops = mapping->a_ops;
bool nofs = !mapping_gfp_constraint(mapping, __GFP_FS);
struct page *page;
- void *fsdata;
+ void *fsdata = NULL;
int err;
unsigned int flags;

diff --git a/mm/filemap.c b/mm/filemap.c
index 15800334147b3..ada25b9f45ad1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3712,7 +3712,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
unsigned long offset; /* Offset into pagecache page */
unsigned long bytes; /* Bytes to write to page */
size_t copied; /* Bytes copied from user */
- void *fsdata;
+ void *fsdata = NULL;

offset = (pos & (PAGE_SIZE - 1));
bytes = min_t(unsigned long, PAGE_SIZE - offset,
--
2.37.2.789.g6183377224-goog

Alexander Potapenko

Sep 15, 2022, 11:06:37 AM
Make KMSAN usable by adding the necessary Kconfig bits.

Also declare x86-specific functions checking address validity
in arch/x86/include/asm/kmsan.h.

Signed-off-by: Alexander Potapenko <gli...@google.com>
---
v4:
-- per Marco Elver's request, create arch/x86/include/asm/kmsan.h
and move arch-specific inline functions there.

Link: https://linux-review.googlesource.com/id/I1d295ce8159ce15faa496d20089d953a919c125e
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/kmsan.h | 55 ++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)
create mode 100644 arch/x86/include/asm/kmsan.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 697da8dae1418..bd9436cd0f29b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -168,6 +168,7 @@ config X86
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if X86_64
select HAVE_ARCH_KFENCE
+ select HAVE_ARCH_KMSAN if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
diff --git a/arch/x86/include/asm/kmsan.h b/arch/x86/include/asm/kmsan.h
new file mode 100644
index 0000000000000..a790b865d0a68
--- /dev/null
+++ b/arch/x86/include/asm/kmsan.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * x86 KMSAN support.
+ *
+ * Copyright (C) 2022, Google LLC
+ * Author: Alexander Potapenko <gli...@google.com>
+ */
+
+#ifndef _ASM_X86_KMSAN_H
+#define _ASM_X86_KMSAN_H
+
+#ifndef MODULE
+
+#include <asm/processor.h>
+#include <linux/mmzone.h>
+
+/*
+ * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
+ */
+static inline bool kmsan_phys_addr_valid(unsigned long addr)
+{
+ if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+ return !(addr >> boot_cpu_data.x86_phys_bits);
+ else
+ return true;
+}
+
+/*
+ * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
+ */
+static inline bool kmsan_virt_addr_valid(void *addr)
+{
+ unsigned long x = (unsigned long)addr;
+ unsigned long y = x - __START_KERNEL_map;
+
+ /* use the carry flag to determine if x was < __START_KERNEL_map */
+ if (unlikely(x > y)) {
+ x = y + phys_base;
+
+ if (y >= KERNEL_IMAGE_SIZE)
+ return false;
+ } else {
+ x = y + (__START_KERNEL_map - PAGE_OFFSET);
+
+ /* carry flag will be set if starting x was >= PAGE_OFFSET */
+ if ((x > y) || !kmsan_phys_addr_valid(x))
+ return false;
+ }
+
+ return pfn_valid(x >> PAGE_SHIFT);
+}
+
+#endif /* !MODULE */
+
+#endif /* _ASM_X86_KMSAN_H */
--
2.37.2.789.g6183377224-goog

Andrew Morton

Sep 15, 2022, 4:58:43 PM
On Thu, 15 Sep 2022 17:04:01 +0200 Alexander Potapenko <gli...@google.com> wrote:

> KMSAN metadata for adjacent physical pages may not be adjacent,
> therefore accessing such pages together may lead to metadata
> corruption.
> We disable merging pages in biovec to prevent such corruptions.
>
> ...
>
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -88,6 +88,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
> phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
> phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
>
> + /*
> + * Merging adjacent physical pages may not work correctly under KMSAN
> + * if their metadata pages aren't adjacent. Just disable merging.
> + */
> + if (IS_ENABLED(CONFIG_KMSAN))
> + return false;
> +
> if (addr1 + vec1->bv_len != addr2)
> return false;
> if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))

What are the runtime effects of this? In other words, how much
slowdown is this likely to cause in a reasonable worst-case?

Andrew Morton

unread,
Sep 15, 2022, 5:05:56 PM9/15/22
to Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, 15 Sep 2022 17:03:34 +0200 Alexander Potapenko <gli...@google.com> wrote:

> Patchset v7 includes only minor changes to origin tracking that allowed
> us to drop "kmsan: unpoison @tlb in arch_tlb_gather_mmu()" from the
> series.
>
> For the following patches diff from v6 is non-trivial:
> - kmsan: add KMSAN runtime core
> - kmsan: add tests for KMSAN

I'm not sure this really merits a whole new patchbombing, but I'll do
it that way anyway.

For the curious, the major changes are:

For "kmsan: add KMSAN runtime core":

mm/kmsan/core.c | 28 ++++++++++------------------
mm/kmsan/kmsan.h | 1 +
mm/kmsan/report.c | 8 ++++++++
3 files changed, 19 insertions(+), 18 deletions(-)

--- a/mm/kmsan/core.c~kmsan-add-kmsan-runtime-core-v7
+++ a/mm/kmsan/core.c
@@ -29,13 +29,6 @@
#include "../slab.h"
#include "kmsan.h"

-/*
- * Avoid creating too long origin chains, these are unlikely to participate in
- * real reports.
- */
-#define MAX_CHAIN_DEPTH 7
-#define NUM_SKIPPED_TO_WARN 10000
-
bool kmsan_enabled __read_mostly;

/*
@@ -219,23 +212,22 @@ depot_stack_handle_t kmsan_internal_chai
* Make sure we have enough spare bits in @id to hold the UAF bit and
* the chain depth.
*/
- BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
+ BUILD_BUG_ON(
+ (1 << STACK_DEPOT_EXTRA_BITS) <= (KMSAN_MAX_ORIGIN_DEPTH << 1));

extra_bits = stack_depot_get_extra_bits(id);
depth = kmsan_depth_from_eb(extra_bits);
uaf = kmsan_uaf_from_eb(extra_bits);

- if (depth >= MAX_CHAIN_DEPTH) {
- static atomic_long_t kmsan_skipped_origins;
- long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
-
- if (skipped % NUM_SKIPPED_TO_WARN == 0) {
- pr_warn("not chained %ld origins\n", skipped);
- dump_stack();
- kmsan_print_origin(id);
- }
+ /*
+ * Stop chaining origins once the depth reached KMSAN_MAX_ORIGIN_DEPTH.
+ * This mostly happens in the case structures with uninitialized padding
+ * are copied around many times. Origin chains for such structures are
+ * usually periodic, and it does not make sense to fully store them.
+ */
+ if (depth == KMSAN_MAX_ORIGIN_DEPTH)
return id;
- }
+
depth++;
extra_bits = kmsan_extra_bits(depth, uaf);

--- a/mm/kmsan/kmsan.h~kmsan-add-kmsan-runtime-core-v7
+++ a/mm/kmsan/kmsan.h
@@ -27,6 +27,7 @@
#define KMSAN_POISON_FREE 0x2

#define KMSAN_ORIGIN_SIZE 4
+#define KMSAN_MAX_ORIGIN_DEPTH 7

#define KMSAN_STACK_DEPTH 64

--- a/mm/kmsan/report.c~kmsan-add-kmsan-runtime-core-v7
+++ a/mm/kmsan/report.c
@@ -89,12 +89,14 @@ void kmsan_print_origin(depot_stack_hand
depot_stack_handle_t head;
unsigned long magic;
char *descr = NULL;
+ unsigned int depth;

if (!origin)
return;

while (true) {
nr_entries = stack_depot_fetch(origin, &entries);
+ depth = kmsan_depth_from_eb(stack_depot_get_extra_bits(origin));
magic = nr_entries ? entries[0] : 0;
if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
descr = (char *)entries[1];
@@ -109,6 +111,12 @@ void kmsan_print_origin(depot_stack_hand
break;
}
if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
+ /*
+ * Origin chains deeper than KMSAN_MAX_ORIGIN_DEPTH are
+ * not stored, so the output may be incomplete.
+ */
+ if (depth == KMSAN_MAX_ORIGIN_DEPTH)
+ pr_err("<Zero or more stacks not recorded to save memory>\n\n");
head = entries[1];
origin = entries[2];
pr_err("Uninit was stored to memory at:\n");
_

and for "kmsan: add tests for KMSAN":

--- a/mm/kmsan/kmsan_test.c~kmsan-add-tests-for-kmsan-v7
+++ a/mm/kmsan/kmsan_test.c
@@ -469,6 +469,34 @@ static void test_memcpy_aligned_to_unali
KUNIT_EXPECT_TRUE(test, report_matches(&expect));
static struct kunit_case kmsan_test_cases[] = {
KUNIT_CASE(test_uninit_kmalloc),
KUNIT_CASE(test_init_kmalloc),
@@ -486,6 +514,7 @@ static struct kunit_case kmsan_test_case
KUNIT_CASE(test_memcpy_aligned_to_aligned),
KUNIT_CASE(test_memcpy_aligned_to_unaligned),
KUNIT_CASE(test_memcpy_aligned_to_unaligned2),
+ KUNIT_CASE(test_long_origin_chain),
{},
};

_

Andrew Morton

unread,
Sep 15, 2022, 5:07:52 PM9/15/22
to Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, 15 Sep 2022 14:05:51 -0700 Andrew Morton <ak...@linux-foundation.org> wrote:

>
> For "kmsan: add KMSAN runtime core":
>
> ...
>
> @@ -219,23 +212,22 @@ depot_stack_handle_t kmsan_internal_chai
> * Make sure we have enough spare bits in @id to hold the UAF bit and
> * the chain depth.
> */
> - BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
> + BUILD_BUG_ON(
> + (1 << STACK_DEPOT_EXTRA_BITS) <= (KMSAN_MAX_ORIGIN_DEPTH << 1));
>
> extra_bits = stack_depot_get_extra_bits(id);
> depth = kmsan_depth_from_eb(extra_bits);
> uaf = kmsan_uaf_from_eb(extra_bits);
>
> - if (depth >= MAX_CHAIN_DEPTH) {
> - static atomic_long_t kmsan_skipped_origins;
> - long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
> -
> - if (skipped % NUM_SKIPPED_TO_WARN == 0) {
> - pr_warn("not chained %ld origins\n", skipped);
> - dump_stack();
> - kmsan_print_origin(id);
> - }

Wouldn't it be neat if printk_ratelimited() returned true if it printed
something.

But you deleted this user of that neatness anyway ;)


Alexander Potapenko

unread,
Sep 16, 2022, 5:13:21 AM9/16/22
to Andrew Morton, Alexander Viro, Alexei Starovoitov, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasan-dev, Linux Memory Management List, Linux-Arch, LKML
To be honest, I have no idea. KMSAN already introduces a lot of
runtime overhead on every memory access, so it's unlikely that
disabling this merging optimization will add anything noticeable on
top of that.
Anyway, KMSAN is a debugging tool that is not supposed to be used in
production (there's a big boot-time warning about that now :) ).

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

youling 257

unread,
Oct 19, 2022, 1:37:49 PM10/19/22
to Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Marco Elver, Mark Rutland, Matthew Wilcox, "Michael S. Tsirkin", Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org


---------- Forwarded message ---------
From: youling257 <youli...@gmail.com>
Date: Thu, Oct 20, 2022, 1:36 AM
Subject: Re: [PATCH v7 18/43] instrumented.h: add KMSAN support
To: <gli...@google.com>
Cc: <youli...@gmail.com>


I am using Linux kernel 6.1-rc1 on Android, built with GCC 12; CONFIG_KMSAN is not set.
"instrumented.h: add KMSAN support" causes high CPU usage in Android Bluetooth.
Bisecting kernel 6.1-rc1 identifies "instrumented.h: add KMSAN support" as the bad commit on my Android system.

This is my kernel 6.1 tree; reverting the include/linux/instrumented.h change fixes the high CPU usage problem:
https://github.com/youling257/android-mainline/commits/6.1

Marco Elver

unread,
Oct 19, 2022, 1:59:00 PM10/19/22
to youling 257, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
What arch?
If x86, can you try to revert only the change to
instrument_get_user()? (I wonder if the u64 conversion is causing
issues.)

youling 257

unread,
Oct 19, 2022, 3:29:03 PM10/19/22
to Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
The arch is x86; this is my revert:
https://github.com/youling257/android-mainline/commit/401cbfa61cbfc20c87a5be8e2dda68ac5702389f
I tried different reverts; I have to remove kmsan_copy_to_user() to fix it.

Marco Elver

unread,
Oct 19, 2022, 4:00:12 PM10/19/22
to youling 257, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Oct 20, 2022 at 03:29AM +0800, youling 257 wrote:
[...]
> > What arch?
> > If x86, can you try to revert only the change to
> > instrument_get_user()? (I wonder if the u64 conversion is causing
> > issues.)
> >
> arch x86, this's my revert,
> https://github.com/youling257/android-mainline/commit/401cbfa61cbfc20c87a5be8e2dda68ac5702389f
> i tried different revert, have to remove kmsan_copy_to_user.

There you reverted only instrument_put_user() - does it fix the issue?

If not, can you try only something like this (only revert
instrument_get_user()):

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 501fa8486749..dbe3ec38d0e6 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -167,9 +167,6 @@ instrument_copy_from_user_after(const void *to, const void __user *from,
*/
#define instrument_get_user(to) \
({ \
- u64 __tmp = (u64)(to); \
- kmsan_unpoison_memory(&__tmp, sizeof(__tmp)); \
- to = __tmp; \
})


Once we know which one of these is the issue, we can figure out a proper
fix.

Thanks,

-- Marco

youling 257

unread,
Oct 19, 2022, 4:07:12 PM10/19/22
to Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
That is what I did; I already tested removing "u64 __tmp…kmsan_unpoison_memory", and it did not help.
Only removing kmsan_copy_to_user() fixes my issue.

Marco Elver

unread,
Oct 19, 2022, 5:37:05 PM10/19/22
to youling 257, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, Oct 20, 2022 at 04:07AM +0800, youling 257 wrote:
> That is i did,i already test, remove "u64 __tmp…kmsan_unpoison_memory", no help.
> i only remove kmsan_copy_to_user, fix my issue.

Ok - does only the below work (without the reverts)?

diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index c4cae333deec..eb05caa8f523 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -73,8 +73,8 @@ static inline void kmsan_unpoison_memory(const void *address, size_t size)
static inline void kmsan_check_memory(const void *address, size_t size)
{
}
-static inline void kmsan_copy_to_user(void __user *to, const void *from,
- size_t to_copy, size_t left)
+static __always_inline void kmsan_copy_to_user(void __user *to, const void *from,
+ size_t to_copy, size_t left)
{
}


... because when you say only removing kmsan_copy_to_user() (from
instrument_put_user()) works, it really doesn't make any sense. The only
explanation would be if the compiler inlining is broken.

Alexander Potapenko

unread,
Oct 19, 2022, 5:45:07 PM10/19/22
to youling 257, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Wed, Oct 19, 2022 at 1:07 PM youling 257 <youli...@gmail.com> wrote:
>
> That is i did,i already test, remove "u64 __tmp…kmsan_unpoison_memory", no help.
> i only remove kmsan_copy_to_user, fix my issue.

Do you have any profiling results that can help pinpoint functions
that are affected by this change?
Also, am I getting it right you are building x86 Android?

youling 257

unread,
Oct 20, 2022, 1:53:06 AM10/20/22
to Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
Without the revert, this include/linux/kmsan-checks.h patch does not help.
In busybox top, com.android.bluetooth CPU usage is 12%.
My kernel config: config-6.1.0-rc1-android-x86_64+

CPU: 3.6% usr 12.1% sys 0.0% nic 84.1% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 1.49 0.65 0.25 4/1336 4940
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1381 926 1002 S 1265m 16.1 5 12.4 {droid.bluetooth}
com.android.bluetooth
919 1 0 S< 187m 2.4 2 2.2 /system/bin/surfaceflinger
2220 926 10327 S 1882m 24.0 7 1.1 com.baidu.input
3163 926 10175 S 2153m 27.5 0 0.0 {rojekti.clipper}
fi.rojekti.clipper
1250 926 1000 S< 2140m 27.3 0 0.0 system_server
2548 926 10125 S 2078m 26.5 4 0.0 {chiztech.swapps}
com.schiztech.swapps
2516 926 10217 S 2058m 26.3 2 0.0 {ogle.android.gm}
com.google.android.gm
2925 926 10318 S 1876m 24.0 1 0.0 {onelli.juicessh}
com.sonelli.juicessh
926 1 0 S 1780m 22.7 7 0.0 {main} zygote
3374 3369 0 S 1541m 19.7 2 0.0 {main} com.topjohnwu.magisk:root
3079 926 10021 S 1477m 18.9 0 0.0 {ndroid.systemui}
com.android.systemui

Alexander Potapenko

unread,
Oct 20, 2022, 2:15:03 PM10/20/22
to Marco Elver, youling 257, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
If what Marco suggests does not help, could you post the output of `nm -S vmlinux` with and without your revert so that we can see which functions were affected by the change?

Unfortunately the top results are of no help; do you have the `perf` tool available on your system?
 

youling 257

unread,
Oct 21, 2022, 1:55:08 AM10/21/22
to Alexander Potapenko, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
How do I use the perf tool?
I used diff to compare the "nm -S vmlinux" output:
android_x86:/ $ su
android_x86:/ # diff /sdcard/with_vmlinux /sdcard/without_vmlinux
--- /sdcard/with_vmlinux
+++ /sdcard/without_vmlinux
@@ -12951,7 +12951,7 @@
ffffffff81338400 00000000000000dc T compat_only_sysfs_link_entry_to_kobj
ffffffff810884d0 00000000000002b7 T compat_ptrace_request
ffffffff812b4e80 0000000000000025 T compat_ptr_ioctl
-ffffffff811380d0 000000000000008d T compat_put_bitmap
+ffffffff811380d0 000000000000008a T compat_put_bitmap
ffffffff81a62180 000000000000000f t compat_raw_ioctl
ffffffff81afa060 000000000000000f t compat_rawv6_ioctl
ffffffff81090790 0000000000000018 T compat_restore_altstack
@@ -14260,7 +14260,7 @@
ffffffff8167d380 00000000000000dd T cppc_set_enable
ffffffff8167d460 0000000000000221 T cppc_set_perf
ffffffff818cdbe0 000000000000004f t cppc_update_perf
-ffffffff8102bba0 000000000000012b t cp_stat64
+ffffffff8102bba0 000000000000012a t cp_stat64
ffffffff81494810 0000000000000028 t cp_status_show
ffffffff812a33e0 00000000000001b2 t cp_statx
ffffffff82e1c934 0000000000000004 b cpu0_hotpluggable
@@ -26993,7 +26993,6 @@
ffffffff821b8000 0000000000000038 d CSWTCH.102
ffffffff821cd6c0 0000000000000048 d CSWTCH.107
ffffffff821d4b80 0000000000000018 d CSWTCH.11
-ffffffff8203dd88 000000000000000c d CSWTCH.111
ffffffff82152de0 00000000000000a0 d CSWTCH.111
ffffffff82105ea0 0000000000000020 d CSWTCH.1116
ffffffff82105e80 0000000000000020 d CSWTCH.1121
@@ -27002,6 +27001,7 @@
ffffffff821d4b60 0000000000000018 d CSWTCH.12
ffffffff821b2c60 0000000000000024 d CSWTCH.120
ffffffff82033fc0 0000000000000068 d CSWTCH.123
+ffffffff8203dd88 000000000000000c d CSWTCH.123
ffffffff82014b80 00000000000000a0 d CSWTCH.125
ffffffff82014b40 0000000000000028 d CSWTCH.128
ffffffff82033f40 0000000000000064 d CSWTCH.128
@@ -27026,7 +27026,6 @@
ffffffff821c9b80 0000000000000018 d CSWTCH.189
ffffffff821b39a0 0000000000000028 d CSWTCH.19
ffffffff820055c0 0000000000000024 d CSWTCH.195
-ffffffff82208de0 0000000000000004 d CSWTCH.199
ffffffff82146127 0000000000000006 d CSWTCH.2
ffffffff82146380 0000000000000014 d CSWTCH.2
ffffffff8219f770 0000000000000018 d CSWTCH.20
@@ -27033,5 +27032,6 @@
ffffffff821d9c90 0000000000000018 d CSWTCH.20
ffffffff821b7e60 0000000000000128 d CSWTCH.202
+ffffffff82208de0 0000000000000004 d CSWTCH.202
ffffffff821b7cc0 0000000000000188 d CSWTCH.204
ffffffff821b7c80 0000000000000038 d CSWTCH.206
ffffffff821a4ac0 0000000000000006 d CSWTCH.207
@@ -27043,8 +27043,8 @@
ffffffff821b7be0 0000000000000038 d CSWTCH.214
ffffffff821c9940 0000000000000030 d CSWTCH.217
ffffffff8219f750 0000000000000018 d CSWTCH.22
-ffffffff82033580 0000000000000023 d CSWTCH.221
ffffffff8213de80 0000000000000028 d CSWTCH.222
+ffffffff82033580 0000000000000023 d CSWTCH.223
ffffffff82216c10 0000000000000014 d CSWTCH.225
ffffffff82216bf0 0000000000000014 d CSWTCH.232
ffffffff82216be0 0000000000000010 d CSWTCH.234
@@ -27052,7 +27052,7 @@
ffffffff821b3800 0000000000000188 d CSWTCH.24
ffffffff821f3340 0000000000000058 d CSWTCH.242
ffffffff82158c60 0000000000000030 d CSWTCH.244
-ffffffff821a9868 000000000000000a d CSWTCH.255
+ffffffff821a9868 000000000000000a d CSWTCH.258
ffffffff82106a20 0000000000000010 d CSWTCH.261
ffffffff821a42a0 0000000000000004 d CSWTCH.261
ffffffff8203d9e0 0000000000000030 d CSWTCH.265
@@ -27059,12 +27059,12 @@
ffffffff82126780 0000000000000028 d CSWTCH.27
ffffffff8214d360 0000000000000040 d CSWTCH.278
ffffffff82136f80 0000000000000024 d CSWTCH.28
-ffffffff821b3000 0000000000000024 d CSWTCH.281
-ffffffff821b2fc0 0000000000000024 d CSWTCH.282
-ffffffff8210ed40 000000000000002c d CSWTCH.289
-ffffffff8210ed00 000000000000002b d CSWTCH.290
-ffffffff8210ecc0 000000000000002c d CSWTCH.292
+ffffffff821b3000 0000000000000024 d CSWTCH.285
+ffffffff821b2fc0 0000000000000024 d CSWTCH.286
+ffffffff8210ed40 000000000000002c d CSWTCH.292
ffffffff821b3ae0 0000000000000020 d CSWTCH.292
+ffffffff8210ed00 000000000000002b d CSWTCH.293
+ffffffff8210ecc0 000000000000002c d CSWTCH.295
ffffffff8214d340 0000000000000020 d CSWTCH.296
ffffffff82146121 0000000000000006 d CSWTCH.3
ffffffff821b7ba0 0000000000000020 d CSWTCH.3
@@ -27075,17 +27075,17 @@
ffffffff821d3c40 0000000000000020 d CSWTCH.31
ffffffff820ff780 0000000000000004 d CSWTCH.318
ffffffff821b5cc0 0000000000000038 d CSWTCH.33
-ffffffff8214efa0 0000000000000010 d CSWTCH.334
+ffffffff8214efa0 0000000000000010 d CSWTCH.346
ffffffff8214a8c0 0000000000000018 d CSWTCH.35
-ffffffff8210ecb0 000000000000000c d CSWTCH.359
-ffffffff8202cae0 000000000000002c d CSWTCH.360
+ffffffff8210ecb0 000000000000000c d CSWTCH.362
ffffffff82016300 0000000000000020 d CSWTCH.363
ffffffff821fce00 0000000000000040 d CSWTCH.38
-ffffffff82032780 0000000000000074 d CSWTCH.387
+ffffffff8202cae0 000000000000002c d CSWTCH.381
+ffffffff82032780 0000000000000074 d CSWTCH.390
ffffffff82118100 0000000000000018 d CSWTCH.40
ffffffff821598a0 0000000000000020 d CSWTCH.41
-ffffffff82032760 0000000000000020 d CSWTCH.428
ffffffff82139a20 0000000000000038 d CSWTCH.43
+ffffffff82032760 0000000000000020 d CSWTCH.431
ffffffff82015d40 0000000000000040 d CSWTCH.45
ffffffff821399e0 0000000000000040 d CSWTCH.45
ffffffff822194e0 000000000000000c d CSWTCH.450
@@ -27092,5 +27092,5 @@
ffffffff8214d2e0 0000000000000048 d CSWTCH.459
-ffffffff8210eca0 000000000000000c d CSWTCH.464
+ffffffff8210eca0 000000000000000c d CSWTCH.467
ffffffff821399c0 0000000000000020 d CSWTCH.47
ffffffff8214a8e0 00000000000000c8 d CSWTCH.48
ffffffff82028ac0 0000000000000018 d CSWTCH.49
@@ -27097,8 +27097,8 @@
ffffffff821399a0 0000000000000020 d CSWTCH.49
-ffffffff82032740 0000000000000018 d CSWTCH.507
-ffffffff82032730 000000000000000c d CSWTCH.508
-ffffffff82032720 000000000000000c d CSWTCH.509
ffffffff82139960 0000000000000030 d CSWTCH.51
+ffffffff82032740 0000000000000018 d CSWTCH.510
+ffffffff82032730 000000000000000c d CSWTCH.511
+ffffffff82032720 000000000000000c d CSWTCH.512
ffffffff821595c0 0000000000000100 d CSWTCH.52
ffffffff821b6240 0000000000000018 d CSWTCH.52
ffffffff821371c0 0000000000000024 d CSWTCH.523
@@ -27125,7 +27125,7 @@
ffffffff82139680 0000000000000058 d CSWTCH.72
ffffffff82139620 0000000000000048 d CSWTCH.74
ffffffff8213c740 0000000000000020 d CSWTCH.74
-ffffffff821f04a0 0000000000000018 d CSWTCH.753
+ffffffff821f04a0 0000000000000018 d CSWTCH.766
ffffffff8200e4e0 0000000000000020 d CSWTCH.77
ffffffff821bffc0 0000000000000018 d CSWTCH.78
ffffffff8200e4c0 0000000000000020 d CSWTCH.79
1|android_x86:/ #

Marco Elver

unread,
Oct 21, 2022, 2:17:26 AM10/21/22
to youling 257, Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, 20 Oct 2022 at 22:55, youling 257 <youli...@gmail.com> wrote:
>
> How to use perf tool?

The simplest would be to try just "perf top" and see which kernel
functions consume the most CPU cycles. I would suggest you compare
both kernels and see if you can spot a function with a higher cycles%
in the problematic kernel.
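
A rough sketch of that comparison (the exact options, and whether
"perf record" is usable in this Android userspace, are assumptions):

  perf record -a -g -- sleep 30   # record system-wide while the Bluetooth load is visible
  perf report --sort symbol       # compare the top kernel symbols between the two kernels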

youling 257

unread,
Oct 21, 2022, 2:39:47 AM10/21/22
to Marco Elver, Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
PerfTop: 8253 irqs/sec kernel:75.3% exact: 100.0% lost: 0/0 drop:
0/17899 [4000Hz cycles], (all, 8 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

14.87% [kernel] [k] 0xffffffff941d1f37
6.71% [kernel] [k] 0xffffffff942016cf

what is 0xffffffff941d1f37?

Marco Elver

unread,
Oct 21, 2022, 3:38:03 AM10/21/22
to youling 257, Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Thu, 20 Oct 2022 at 23:39, youling 257 <youli...@gmail.com> wrote:
>
> PerfTop: 8253 irqs/sec kernel:75.3% exact: 100.0% lost: 0/0 drop:
> 0/17899 [4000Hz cycles], (all, 8 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 14.87% [kernel] [k] 0xffffffff941d1f37
> 6.71% [kernel] [k] 0xffffffff942016cf
>
> what is 0xffffffff941d1f37?

You need to build with debug symbols:
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y

Then it'll show function names.

youling 257

unread,
Oct 21, 2022, 11:19:28 AM10/21/22
to Marco Elver, Alexander Potapenko, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
CONFIG_DEBUG_INFO=y
CONFIG_AS_HAS_NON_CONST_LEB128=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_BTF is not set
# CONFIG_GDB_SCRIPTS is not set

perf top still shows no function names.

12.90% [kernel] [k] 0xffffffff833dfa64
3.78% [kernel] [k] 0xffffffff8285b439
3.61% [kernel] [k] 0xffffffff83370254
2.32% [kernel] [k] 0xffffffff8337025b
1.88% bluetooth.default.so [.] 0x000000000000d09d

Alexander Potapenko

unread,
Oct 21, 2022, 1:02:43 PM10/21/22
to youling 257, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Fri, Oct 21, 2022 at 8:19 AM youling 257 <youli...@gmail.com> wrote:
CONFIG_DEBUG_INFO=y
CONFIG_AS_HAS_NON_CONST_LEB128=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_BTF is not set
# CONFIG_GDB_SCRIPTS is not set

perf top still no function name.
Will it help if you disable CONFIG_RANDOMIZE_BASE?
(if it doesn't show the symbols, at least we'll be able to figure out the offending function by running nm)
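
(A sketch of that fallback, assuming the matching vmlinux is at hand:
once the addresses line up, "addr2line -f -e vmlinux <hot address from
perf top>" should print the containing function and source line.)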
 

12.90%  [kernel]              [k] 0xffffffff833dfa64
     3.78%  [kernel]              [k] 0xffffffff8285b439
     3.61%  [kernel]              [k] 0xffffffff83370254
     2.32%  [kernel]              [k] 0xffffffff8337025b
     1.88%  bluetooth.default.so  [.] 0x000000000000d09d

2022-10-21 15:37 GMT+08:00, Marco Elver <el...@google.com>:
> On Thu, 20 Oct 2022 at 23:39, youling 257 <youli...@gmail.com> wrote:
>>
>> PerfTop:    8253 irqs/sec  kernel:75.3%  exact: 100.0% lost: 0/0 drop:
>> 0/17899 [4000Hz cycles],  (all, 8 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>     14.87%  [kernel]              [k] 0xffffffff941d1f37
>>      6.71%  [kernel]              [k] 0xffffffff942016cf
>>
>> what is 0xffffffff941d1f37?
>
> You need to build with debug symbols:
> CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
>
> Then it'll show function names.
>
>> 2022-10-21 14:16 GMT+08:00, Marco Elver <el...@google.com>:
>> > On Thu, 20 Oct 2022 at 22:55, youling 257 <youli...@gmail.com> wrote:
>> >>
>> >> How to use perf tool?
>> >
>> > The simplest would be to try just "perf top" - and see which kernel
>> > functions consume most CPU cycles. I would suggest you compare both
>> > kernels, and see if you can spot a function which uses more cycles% in
>> > the problematic kernel.
>> >
>

Kees Cook

unread,
Oct 21, 2022, 1:21:14 PM10/21/22
to Alexander Potapenko, youling 257, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On October 21, 2022 10:02:05 AM PDT, Alexander Potapenko <gli...@google.com> wrote:
>On Fri, Oct 21, 2022 at 8:19 AM youling 257 <youli...@gmail.com> wrote:
>
>> CONFIG_DEBUG_INFO=y
>> CONFIG_AS_HAS_NON_CONST_LEB128=y
>> # CONFIG_DEBUG_INFO_NONE is not set
>> CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
>> # CONFIG_DEBUG_INFO_DWARF4 is not set
>> # CONFIG_DEBUG_INFO_DWARF5 is not set
>> # CONFIG_DEBUG_INFO_REDUCED is not set
>> # CONFIG_DEBUG_INFO_COMPRESSED is not set
>> # CONFIG_DEBUG_INFO_SPLIT is not set
>> # CONFIG_DEBUG_INFO_BTF is not set
>> # CONFIG_GDB_SCRIPTS is not set
>>
>> perf top still no function name.
>>
>Will it help if you disable CONFIG_RANDOMIZE_BASE?
>(if it doesn't show the symbols, at least we'll be able to figure out the
>offending function by running nm)

Is KALLSYMS needed?
Kees Cook

Alexander Potapenko

unread,
Oct 21, 2022, 4:38:08 PM10/21/22
to youling 257, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
On Fri, Oct 21, 2022 at 8:19 AM youling 257 <youli...@gmail.com> wrote:
CONFIG_DEBUG_INFO=y
CONFIG_AS_HAS_NON_CONST_LEB128=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_BTF is not set
# CONFIG_GDB_SCRIPTS is not set

perf top still no function name.

12.90%  [kernel]              [k] 0xffffffff833dfa64

I think I know what's going on. The two functions that differ with and without the patch were passing an incremented pointer to unsafe_put_user(), which is a macro, e.g.:

   unsafe_put_user((compat_ulong_t)m, umask++, Efault);

Because that macro didn't evaluate its second parameter into a temporary, "umask++" ended up being expanded a second time inside the call to kmsan_copy_to_user(), which resulted in an extra increment of umask.
This probably violated some expectations of the userspace app, which in turn led to repetitive kernel calls.
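
To make the effect concrete, here is a minimal userspace sketch of the
same pitfall (not kernel code; BAD_PUT(), GOOD_PUT() and record() are
invented names for illustration):

/*
 * BAD_PUT() expands its pointer argument twice, so passing "p++"
 * advances the pointer twice per call -- the same effect the extra
 * kmsan_copy_to_user() call had in __put_user_size().
 */
#include <stdio.h>

static void record(unsigned long *p) { (void)p; } /* stand-in for kmsan_copy_to_user() */

#define BAD_PUT(x, p)  do { *(p) = (x); record(p); } while (0)
#define GOOD_PUT(x, p) \
	do { __typeof__(p) __p = (p); *__p = (x); record(__p); } while (0)

int main(void)
{
	unsigned long buf[4] = { 0 };
	unsigned long *q = buf;

	BAD_PUT(1, q++);                               /* q++ evaluated twice */
	printf("BAD_PUT:  q - buf = %td\n", q - buf);  /* prints 2 */

	q = buf;
	GOOD_PUT(1, q++);                              /* q++ evaluated once */
	printf("GOOD_PUT: q - buf = %td\n", q - buf);  /* prints 1 */
	return 0;
}

The fix below applies the same idea as GOOD_PUT(): evaluate ptr into a
local __ptr once and use only __ptr afterwards, including in
instrument_put_user().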

Could you please check if the patch below fixes the problem for you?

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 8bc614cfe21b9..1cc756eafa447 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -254,24 +254,25 @@ extern void __put_user_nocheck_8(void);
 #define __put_user_size(x, ptr, size, label)                           \
 do {                                                                   \
        __typeof__(*(ptr)) __x = (x); /* eval x once */                 \
-       __chk_user_ptr(ptr);                                            \
+       __typeof__(ptr) __ptr = (ptr); /* eval ptr once */              \
+       __chk_user_ptr(__ptr);                                          \
        switch (size) {                                                 \
        case 1:                                                         \
-               __put_user_goto(__x, ptr, "b", "iq", label);            \
+               __put_user_goto(__x, __ptr, "b", "iq", label);          \
                break;                                                  \
        case 2:                                                         \
-               __put_user_goto(__x, ptr, "w", "ir", label);            \
+               __put_user_goto(__x, __ptr, "w", "ir", label);          \
                break;                                                  \
        case 4:                                                         \
-               __put_user_goto(__x, ptr, "l", "ir", label);            \
+               __put_user_goto(__x, __ptr, "l", "ir", label);          \
                break;                                                  \
        case 8:                                                         \
-               __put_user_goto_u64(__x, ptr, label);                   \
+               __put_user_goto_u64(__x, __ptr, label);                 \
                break;                                                  \
        default:                                                        \
                __put_user_bad();                                       \
        }                                                               \
-       instrument_put_user(__x, ptr, size);                            \
+       instrument_put_user(__x, __ptr, size);                          \
 } while (0)
 
 #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT

youling 257

unread,
Oct 22, 2022, 2:24:07 AM10/22/22
to Alexander Potapenko, Marco Elver, Alexander Viro, Alexei Starovoitov, Andrew Morton, Andrey Konovalov, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christoph Hellwig, Christoph Lameter, David Rientjes, Dmitry Vyukov, Eric Biggers, Eric Dumazet, Greg Kroah-Hartman, Herbert Xu, Ilya Leoshkevich, Ingo Molnar, Jens Axboe, Joonsoo Kim, Kees Cook, Mark Rutland, Matthew Wilcox, Michael S. Tsirkin, Pekka Enberg, Peter Zijlstra, Petr Mladek, Stephen Rothwell, Steven Rostedt, Thomas Gleixner, Vasily Gorbik, Vegard Nossum, Vlastimil Babka, kasa...@googlegroups.com, linu...@kvack.org, linux...@vger.kernel.org, linux-...@vger.kernel.org
I tested this patch; it fixes my problem.