KMSAN stability

Kirill A. Shutemov

unread,

Jun 18, 2024, 12:42:12 AM6/18/24

to Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

Hi,

I attempted to use KMSAN, but I had difficulty getting the system to boot
with the feature enabled.

The kernel boots successfully in my x86-64 VM if I enable KMSAN on top of
defconfig. However, if I try to enable *any* of the following options, the
boot stops:

- CONFIG_DEBUG_VIRTUAL
- CONFIG_DEBUG_LOCK_ALLOC
- CONFIG_DEFERRED_STRUCT_PAGE_INIT

The kernel becomes stuck just after KMSAN is initialized. I do not
understand the internals of KMSAN and I do not see an obvious reason for
this failure.

I have a feeling that the list of problematic options is not exhaustive.

Any ideas?

I also noticed an instrumentation issue:

vmlinux.o: warning: objtool: handle_bug+0x4: call to kmsan_unpoison_entry_regs() leaves .noinstr.text section

--
Kiryl Shutsemau / Kirill A. Shutemov

Alexander Potapenko

unread,

Jun 20, 2024, 10:13:11 AM6/20/24

to Kirill A. Shutemov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

Hi Kirill,

KMSAN has limited support for non-default configs due to a lack of
extensive testing beyond the syzbot config.

CONFIG_DEFERRED_STRUCT_PAGE_INIT:
This option conflicts with KMSAN's assumption that all pages are
reclaimed simultaneously. A pending patch by Ilya Leoshkevich will
disable it when KMSAN is enabled.

CONFIG_DEBUG_VIRTUAL, CONFIG_DEBUG_LOCK_ALLOC: These options cause
virt_to_page_or_null() to call instrumented code, which should ideally
never happen.

KMSAN calls kmsan_get_metadata() for every memory access, and this
function is usually fast because it's mostly inlined and only calls
non-instrumented code from mm/kmsan/. However, debug configs can add
checks to virt_to_page_or_nul()l that call instrumented code from
lib/, potentially leading to deadlocks (if locks are involved) or
stack overflows (if memory is touched).

For CONFIG_DEBUG_VIRTUAL, I'm preparing a patch to selectively disable
instrumentation of arch/x86/mm/physaddr.c to address this.

The issue with CONFIG_DEBUG_LOCK_ALLOC is more complex and similar to
a previous issue where RCU calls caused infinite recursion
(https://lore.kernel.org/all/20240115184430....@google.com/T/#u).

Possible workarounds include:
- using kmsan_enter_runtime()/kmsan_leave_runtime() to avoid recursion
in kmsan_get_shadow_origin_ptr() (this _might_ lead to KMSAN false
positives in lockdep, as we will be ignoring writes to the global lock
map);
- using a separate per-CPU counter to detect nested calls of
pfn_valid(), which is already called with disabled preemption.
The latter option sort of works, but doesn't eliminate the impact of
lockdep checks being introduced to _every_ memory access. Given that,
I am not sure having CONFIG_DEBUG_LOCK_ALLOC together with
CONFIG_KMSAN is worth it.

I've filed these issues for tracking:

https://github.com/google/kmsan/issues/94
https://github.com/google/kmsan/issues/95

Additionally, I'll send a patch to fix the objtool warning about
missing instrumentation around the KASAN call in handle_bug().

HTH

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Kirill A. Shutemov

unread,

Jun 21, 2024, 11:18:57 AM6/21/24

to Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

On Thu, Jun 20, 2024 at 04:12:28PM +0200, Alexander Potapenko wrote:
>
> Hi Kirill,
>
> KMSAN has limited support for non-default configs due to a lack of
> extensive testing beyond the syzbot config.

Thanks for the patchset that addressing reported issues.

There's one more problematic option I've found: CONFIG_DEBUG_PREEMPT.

Dave Hansen

unread,

Jun 21, 2024, 12:04:08 PM6/21/24

to Kirill A. Shutemov, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

On 6/21/24 08:18, Kirill A. Shutemov wrote:
> On Thu, Jun 20, 2024 at 04:12:28PM +0200, Alexander Potapenko wrote:
>> Hi Kirill,
>>
>> KMSAN has limited support for non-default configs due to a lack of
>> extensive testing beyond the syzbot config.
> Thanks for the patchset that addressing reported issues.
>
> There's one more problematic option I've found: CONFIG_DEBUG_PREEMPT.

It seems like testing using clang as the compiler is a bit lacking. I'm
a bit surprised there are so many bugs here.

Alexander Potapenko

unread,

Jun 24, 2024, 7:36:16 AM6/24/24

to Dave Hansen, Kirill A. Shutemov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

Well, it's not Clang itself that is lacking testing: Clang builds with
all those debug configs are quite reliable.
The problem here is that KMSAN is doing pretty elaborate stuff on
every instrumented memory access, and when we enable any of the debug
configs on top of that, we often end up calling debug code from the
instrumentation code, and that debug code is instrumented, so there's
infinite recursion.

Because KMSAN is primarily used by syzbot, which already runs these
debug configs with faster tools (KASAN), they are indeed undertested,
and people keep running into incompatibilities like those Kirill
reported.

There are several possible approaches to addressing this, each having
its benefits and drawbacks.

1. Find all code that could be potentially called from
kmsan_virt_addr_valid(), mark it as noinstr, KMSAN_SANITIZE:=n,
__no_sanitize_memory etc.
+ Debug configs still work
- If this code is called from somewhere else, it won't be
instrumented, so we can miss KMSAN errors in it.
- Future debug configs may introduce more instrumented code.

2. Provide simplified versions of primitives needed by KMSAN without
debug checks (e.g. preempt_disable(), pfn_valid(), phys_addr()) that
won't be instrumented.
+ Covers all existing and future debug configs.
- Code duplication is bad, we'll need to keep both implementations in
sync. (We could refactor the existing primitives though, so that there
is a single version for which checks can be disabled).

3. Disable low-level debugging configs under KMSAN.
+ No more issues with known problematic configs.
- Some people may actually want to have these configs enabled.
- New debug configs may still break in the absence of testing.

4. Use reentrancy counters to stop KMSAN functions from recursively
calling each other.
+ Is config-agnostic.
- This is brittle: there are situations in which we want instrumented
code called from KMSAN runtime to correctly initialize the metadata,
not just bail out
(e.g. when allocating heap storage for stack traces saved by KMSAN in
the stackdepot, we'd better not ignore the stores to freelist pointers
to avoid false positives later on).

Dave Hansen

unread,

Jun 25, 2024, 11:31:05 AM6/25/24

to Alexander Potapenko, Kirill A. Shutemov, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, x...@kernel.org, Dave Hansen

On 6/24/24 04:35, Alexander Potapenko wrote:
...

> 2. Provide simplified versions of primitives needed by KMSAN without
> debug checks (e.g. preempt_disable(), pfn_valid(), phys_addr()) that
> won't be instrumented.
> + Covers all existing and future debug configs.
> - Code duplication is bad, we'll need to keep both implementations in
> sync. (We could refactor the existing primitives though, so that there
> is a single version for which checks can be disabled).

Creating refactored, single versions of these functions that KMSAN needs
would be my first preference for a solution.

But, honestly, anything is better than what we have at the moment.

Reply all

Reply to author

Forward