[PATCH] mm/kmsan: Fix kmsan kmalloc hook when no stack depots are allocated yet


Aleksei Nikiforov

Sep 30, 2025, 7:57:02 AM
to Alexander Potapenko, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich, Aleksei Nikiforov
If no stack depot has been allocated yet, kmsan called from kmalloc
cannot allocate one, because it masks out the __GFP_RECLAIM flags.
kmsan then fails to record the origin and does not report the issue.

Reusing the flags from kmalloc without modifying them should be safe for kmsan.
For example, the following call chain is possible:
test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
slab_alloc_node -> slab_post_alloc_hook ->
kmsan_slab_alloc -> kmsan_internal_poison_memory.

The __GFP_RECLAIM flags should be masked only when kmsan is called
from a context where no caller-supplied flags are available.

With this change all kmsan tests start working reliably.

Signed-off-by: Aleksei Nikiforov <aleksei....@linux.ibm.com>
---
mm/kmsan/core.c | 3 ---
mm/kmsan/hooks.c | 6 ++++--
mm/kmsan/shadow.c | 2 +-
3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 1ea711786c52..4d3042c1269c 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -72,9 +72,6 @@ depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,

nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);

- /* Don't sleep. */
- flags &= ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
-
handle = stack_depot_save(entries, nr_entries, flags);
return stack_depot_set_extra_bits(handle, extra);
}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f0..92ebc0f557d0 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -84,7 +84,8 @@ void kmsan_slab_free(struct kmem_cache *s, void *object)
if (s->ctor)
return;
kmsan_enter_runtime();
- kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ kmsan_internal_poison_memory(object, s->object_size,
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
@@ -114,7 +115,8 @@ void kmsan_kfree_large(const void *ptr)
kmsan_enter_runtime();
page = virt_to_head_page((void *)ptr);
KMSAN_WARN_ON(ptr != page_address(page));
- kmsan_internal_poison_memory((void *)ptr, page_size(page), GFP_KERNEL,
+ kmsan_internal_poison_memory((void *)ptr, page_size(page),
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 54f3c3c962f0..55fdea199aaf 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -208,7 +208,7 @@ void kmsan_free_page(struct page *page, unsigned int order)
return;
kmsan_enter_runtime();
kmsan_internal_poison_memory(page_address(page), page_size(page),
- GFP_KERNEL,
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
--
2.43.7
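A minimal userspace model of the failure described in the commit message may help. It assumes, in line with Alexei's later remark about gfpflags_allow_spinning(), that the stack depot refuses to allocate its first storage pool when the caller's flags contain neither reclaim bit. The `model_*` names and the flag bit values are hypothetical and do not match the kernel's GFP encoding; only the masking done by the old `kmsan_save_stack_with_flags()` versus the patched version is mirrored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model, not kernel code. Bit values are made up; only the
 * relationship "GFP_KERNEL contains both reclaim bits" is preserved.
 */
typedef uint32_t model_gfp_t;
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_IO 0x4u
#define MODEL_GFP_FS 0x8u
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

typedef uint32_t model_handle_t; /* 0 means "no stack trace recorded" */

static bool model_pool_allocated;          /* has the first depot pool been created? */
static model_handle_t model_next_handle = 1;

/* Simplified depot save: creating the first pool needs a reclaim bit. */
static model_handle_t model_depot_save(model_gfp_t flags)
{
	if (!model_pool_allocated) {
		if ((flags & MODEL_GFP_RECLAIM) == 0)
			return 0;                  /* no storage: the origin is lost */
		model_pool_allocated = true;
	}
	return model_next_handle++;
}

/* Pre-patch behaviour: reclaim bits always cleared ("Don't sleep."). */
static model_handle_t model_save_stack_old(model_gfp_t flags)
{
	flags &= ~MODEL_GFP_RECLAIM;
	return model_depot_save(flags);
}

/* Post-patch behaviour: the caller's flags are passed through unchanged. */
static model_handle_t model_save_stack_new(model_gfp_t flags)
{
	return model_depot_save(flags);
}
```

Called in that order on an empty depot, the old path returns handle 0 and leaves the depot empty, while the new path creates the first pool and returns a non-zero handle. The real stack depot has further conditions (preallocation, locking) that this sketch deliberately omits.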

Andrew Morton

Oct 8, 2025, 11:31:14 PM
to Aleksei Nikiforov, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich
On Tue, 30 Sep 2025 13:56:01 +0200 Aleksei Nikiforov <aleksei....@linux.ibm.com> wrote:

> If no stack depot is allocated yet,
> due to masking out __GFP_RECLAIM flags
> kmsan called from kmalloc cannot allocate stack depot.
> kmsan fails to record origin and report issues.
>
> Reusing flags from kmalloc without modifying them should be safe for kmsan.
> For example, such chain of calls is possible:
> test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
> slab_alloc_node -> slab_post_alloc_hook ->
> kmsan_slab_alloc -> kmsan_internal_poison_memory.
>
> Only when it is called in a context without flags present
> should __GFP_RECLAIM flags be masked.
>
> With this change all kmsan tests start working reliably.

I'm not seeing reports of "hey, kmsan is broken", so I assume this
failure only occurs under special circumstances?

Please explain how you're triggering this failure and whether you think
we should backport the fix into -stable kernels and if so, are you able
to identify a suitable Fixes: target?

Thanks.

Aleksei Nikiforov

Oct 10, 2025, 4:07:12 AM
to Andrew Morton, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich
On 10/9/25 05:31, Andrew Morton wrote:
> On Tue, 30 Sep 2025 13:56:01 +0200 Aleksei Nikiforov <aleksei....@linux.ibm.com> wrote:
>
>> If no stack depot is allocated yet,
>> due to masking out __GFP_RECLAIM flags
>> kmsan called from kmalloc cannot allocate stack depot.
>> kmsan fails to record origin and report issues.
>>
>> Reusing flags from kmalloc without modifying them should be safe for kmsan.
>> For example, such chain of calls is possible:
>> test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
>> slab_alloc_node -> slab_post_alloc_hook ->
>> kmsan_slab_alloc -> kmsan_internal_poison_memory.
>>
>> Only when it is called in a context without flags present
>> should __GFP_RECLAIM flags be masked.
>>
>> With this change all kmsan tests start working reliably.
>
> I'm not seeing reports of "hey, kmsan is broken", so I assume this
> failure only occurs under special circumstances?

Hi,

kmsan may report fewer issues than it detects, because it fails to
allocate stack depots and does not report issues that lack one. A lack
of reports can easily go unnoticed, which is why you haven't received
reports of kmsan being broken.
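Why missing depot storage turns into missing reports, rather than reports without an origin, can be sketched as follows. This assumes, per Aleksei's description, that the report path returns early when the origin handle is 0; `model_report()` and `model_detect()` are hypothetical stand-ins, not KMSAN's real functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t model_handle_t; /* 0 means "no origin recorded" */

static int model_reports_printed;

/* Hypothetical report path: bails out when there is no origin to print. */
static void model_report(model_handle_t origin)
{
	if (origin == 0)
		return;              /* detected, but silently dropped */
	model_reports_printed++;
}

/* Simulate `n` detected uninit-value uses that share the same origin. */
static void model_detect(int n, model_handle_t origin)
{
	for (int i = 0; i < n; i++)
		model_report(origin);
}
```

Five detections with a zero origin produce no output at all, while three with a valid origin produce three reports. Under this assumption the failure shows up as silence rather than as an error, which fits the observation that nobody reported kmsan as broken.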

I'm not sure what exactly causes me to hit this issue, but I can reproduce
it quite reliably on one s390x machine and two x86_64 machines. I
haven't tried any other machines yet.

Here's how I reproduce it on a Fedora 42 x86_64 machine using podman.

I have the following files in the same directory:

$ ls
busybox.init busybox.patch debug.config kmsan.config
kmsan.Dockerfile qemu.sh
$ cat busybox.init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys

cat <<!


Boot took $(cut -d' ' -f1 /proc/uptime) seconds

_ _ __ _
/\/\ (_)_ __ (_) / /(_)_ __ _ ___ __
/ \| | '_ \| | / / | | '_ \| | | \ \/ /
/ /\/\ \ | | | | | / /__| | | | | |_| |> <
\/ \/_|_| |_|_| \____/_|_| |_|\__,_/_/\_\


Welcome to mini_linux


!
exec /bin/sh
$ cat busybox.patch
diff --git a/libbb/appletlib.c b/libbb/appletlib.c
index d9cc48423..a0c502fde 100644
--- a/libbb/appletlib.c
+++ b/libbb/appletlib.c
@@ -718,8 +718,8 @@ static int find_script_by_name(const char *name)
return -1;
}

-int scripted_main(int argc UNUSED_PARAM, char **argv) MAIN_EXTERNALLY_VISIBLE;
-int scripted_main(int argc UNUSED_PARAM, char **argv)
+int scripted_main(int argc UNUSED_PARAM, char **argv) MAIN_EXTERNALLY_VISIBLE //;
+//int scripted_main(int argc UNUSED_PARAM, char **argv)
{
int script = find_script_by_name(applet_name);
if (script >= 0)
diff --git a/scripts/kconfig/lxdialog/check-lxdialog.sh b/scripts/kconfig/lxdialog/check-lxdialog.sh
index 5075ebf2d..c644d1d48 100755
--- a/scripts/kconfig/lxdialog/check-lxdialog.sh
+++ b/scripts/kconfig/lxdialog/check-lxdialog.sh
@@ -45,9 +45,9 @@ trap "rm -f $tmp" 0 1 2 3 15

# Check if we can link to ncurses
check() {
- $cc -x c - -o $tmp 2>/dev/null <<'EOF'
+ $cc -x c - -o $tmp <<'EOF'
#include CURSES_LOC
-main() {}
+int main() { return 0; }
EOF
if [ $? != 0 ]; then
echo " *** Unable to find the ncurses libraries or the" 1>&2
$ cat debug.config
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_KERNEL=y
CONFIG_GDB_SCRIPTS=y
$ cat kmsan.config
CONFIG_KUNIT=y
CONFIG_KMSAN=y
CONFIG_KMSAN_CHECK_PARAM_RETVAL=y
CONFIG_KMSAN_KUNIT_TEST=y
CONFIG_FRAME_WARN=4096
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_TRACE_PREEMPT_TOGGLE is not set
# CONFIG_DEBUG_VIRTUAL is not set
$ cat kmsan.Dockerfile
FROM fedora:42

RUN dnf update -y ; dnf install -y git bash-completion util-linux nano patch \
qemu qemu-kvm openssl openssl-devel ncurses-devel gcc gcc-c++ clang clang++ \
flex bison bc awk cpio gzip sudo elfutils-libelf-devel pod2html glibc-static

RUN useradd -m user ; echo "user ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

USER user
WORKDIR /home/user

RUN mkdir src ; cd src ; git clone --depth=1 --branch v6.17 https://github.com/torvalds/linux ; \
git clone --depth=1 https://github.com/mirror/busybox

COPY --chown=user:user busybox.patch /home/user/busybox.patch
COPY --chown=user:user qemu.sh /home/user/qemu.sh
COPY --chown=user:user kmsan.config /home/user/kmsan.config
COPY --chown=user:user debug.config /home/user/debug.config
COPY --chown=user:user busybox.init /home/user/busybox.init

RUN chmod +x qemu.sh ; cd src/linux ; make CC=clang defconfig ; \
cat ~/kmsan.config >> .config ; cat ~/debug.config >> .config ; \
make CC=clang -j8

RUN cd src/busybox ; patch -p1 < ~/busybox.patch ; make defconfig ; \
sed -i -e 's:CONFIG_TC=y:# CONFIG_TC is not set:' -e 's:CONFIG_FEATURE_TC_INGRESS=y:# CONFIG_FEATURE_TC_INGRESS is not set:' .config ; \
sed -i -e 's:# CONFIG_STATIC is not set:CONFIG_STATIC=y:' .config ; \
make -j8 ; make install

RUN mkdir src/initramfs ; cd src/initramfs ; mkdir -p bin sbin etc proc sys usr/bin usr/sbin ; \
cp -a ~/src/busybox/_install/* . ; cp ~/busybox.init ./init ; chmod +x init ; \
find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../initramfs.cpio.gz
$ cat qemu.sh
#!/bin/bash
exec qemu-system-x86_64 -m 2G -smp 4 \
    -kernel ~/src/linux/arch/x86/boot/bzImage \
    -initrd ~/src/initramfs.cpio.gz \
    -nographic -append "console=ttyS0" -enable-kvm "$@"
$

I build a podman image named "kmsan" as a non-root user:
$ podman build -f kmsan.Dockerfile -t kmsan .

Then I run it as the same non-root user in a privileged podman container:
$ podman run -it --rm --privileged kmsan

Inside the podman container, I execute the qemu.sh script:
$ ./qemu.sh

Here's the kmsan unit-test output I get:

[ 4.995020] KTAP version 1
[ 4.996924] # Subtest: kmsan
[ 4.998461] # module: kmsan_test
[ 4.998580] 1..25
[ 5.003992] # test_uninit_kmalloc: uninitialized kmalloc test (UMR report)
[ 5.006948] *ptr is true
[ 5.008519] # test_uninit_kmalloc: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:173
[ 5.008519] Expected report_matches(&expect) to be true, but is false
[ 5.016673] not ok 1 test_uninit_kmalloc
[ 5.019871] # test_init_kmalloc: initialized kmalloc test (no reports)
[ 5.022995] *ptr is false
[ 5.026736] ok 2 test_init_kmalloc
[ 5.029653] # test_init_kzalloc: initialized kzalloc test (no reports)
[ 5.033060] *ptr is false
[ 5.037952] ok 3 test_init_kzalloc
[ 5.040898] # test_uninit_stack_var: uninitialized stack variable (UMR report)
[ 5.044349] cond is false
[ 5.045465] # test_uninit_stack_var: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:211
[ 5.045465] Expected report_matches(&expect) to be true, but is false
[ 5.052473] not ok 4 test_uninit_stack_var
[ 5.054740] # test_init_stack_var: initialized stack variable (no reports)
[ 5.061026] cond is true
[ 5.064956] ok 5 test_init_stack_var
[ 5.067630] # test_params: uninit passed through a function parameter (UMR report)
[ 5.073602] arg1 is false
[ 5.074766] arg2 is false
[ 5.075939] arg is false
[ 5.077078] arg1 is false
[ 5.078317] arg2 is true
[ 5.080043] # test_params: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:262
[ 5.080043] Expected report_matches(&expect) to be true, but is false
[ 5.086057] not ok 6 test_params
[ 5.088155] # test_uninit_multiple_params: uninitialized local passed to fn (UMR report)
[ 5.093995] signed_sum3(a, b, c) is true
[ 5.096099] # test_uninit_multiple_params: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:282
[ 5.096099] Expected report_matches(&expect) to be true, but is false
[ 5.107367] not ok 7 test_uninit_multiple_params
[ 5.110155] # test_uninit_kmsan_check_memory: kmsan_check_memory() called on uninit local (UMR report)
[ 5.116984] # test_uninit_kmsan_check_memory: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:309
[ 5.116984] Expected report_matches(&expect) to be true, but is false
[ 5.126356] not ok 8 test_uninit_kmsan_check_memory
[ 5.128587] # test_init_kmsan_vmap_vunmap: pages initialized via vmap (no reports)
[ 5.137961] ok 9 test_init_kmsan_vmap_vunmap
[ 5.140564] # test_init_vmalloc: vmalloc buffer can be initialized (no reports)
[ 5.145685] buf[0] is true
[ 5.151173] ok 10 test_init_vmalloc
[ 5.154140] # test_uaf: use-after-free in kmalloc-ed buffer (UMR report)
[ 5.157541] value is true
[ 5.158726] # test_uaf: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:378
[ 5.158726] Expected report_matches(&expect) to be true, but is false
[ 5.165473] not ok 11 test_uaf
[ 5.167650] # test_percpu_propagate: uninit local stored to per_cpu memory (UMR report)
[ 5.173084] check is false
[ 5.174605] # test_percpu_propagate: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:396
[ 5.174605] Expected report_matches(&expect) to be true, but is false
[ 5.183281] not ok 12 test_percpu_propagate
[ 5.185632] # test_printk: uninit local passed to pr_info() (UMR report)
[ 5.191356] ffff9d1b00367cec contains 0
[ 5.193590] # test_printk: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:418
[ 5.193590] Expected report_matches(&expect) to be true, but is false
[ 5.200144] not ok 13 test_printk
[ 5.202139] # test_init_memcpy: memcpy()ing aligned initialized src to aligned dst (no reports)
[ 5.208531] ok 14 test_init_memcpy
[ 5.210437] # test_memcpy_aligned_to_aligned: memcpy()ing aligned uninit src to aligned dst (UMR report)
[ 5.216716] # test_memcpy_aligned_to_aligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:459
[ 5.216716] Expected report_matches(&expect) to be true, but is false
[ 5.225432] not ok 15 test_memcpy_aligned_to_aligned
[ 5.227044] # test_memcpy_aligned_to_unaligned: memcpy()ing aligned uninit src to unaligned dst (UMR report)
[ 5.231774] # test_memcpy_aligned_to_unaligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:483
[ 5.231774] Expected report_matches(&expect) to be true, but is false
[ 5.236286] # test_memcpy_aligned_to_unaligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:486
[ 5.236286] Expected report_matches(&expect) to be true, but is false
[ 5.242427] not ok 16 test_memcpy_aligned_to_unaligned
[ 5.244753] # test_memcpy_initialized_gap: unaligned 4-byte initialized value gets a nonzero origin after memcpy() - (2 UMR reports)
[ 5.248626] # test_memcpy_initialized_gap: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:532
[ 5.248626] Expected report_matches(&expect) to be true, but is false
[ 5.252339] # test_memcpy_initialized_gap: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:538
[ 5.252339] Expected report_matches(&expect) to be true, but is false
[ 5.258704] not ok 17 test_memcpy_initialized_gap
[ 5.261660] # test_memset16: memset16() should initialize memory
[ 5.268995] ok 18 test_memset16
[ 5.270905] # test_memset32: memset32() should initialize memory
[ 5.275684] ok 19 test_memset32
[ 5.278033] # test_memset64: memset64() should initialize memory
[ 5.283358] ok 20 test_memset64
[ 5.285848] # test_memset_on_guarded_buffer: memset() on ends of guarded buffer should not crash
[ 5.292876] ok 21 test_memset_on_guarded_buffer
[ 5.295048] # test_long_origin_chain: origin chain exceeding KMSAN_MAX_ORIGIN_DEPTH (UMR report)
[ 5.299320] # test_long_origin_chain: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:599
[ 5.299320] Expected report_matches(&expect) to be true, but is false
[ 5.306978] not ok 22 test_long_origin_chain
[ 5.310383] # test_stackdepot_roundtrip: testing stackdepot roundtrip (no reports)
[ 5.317344] kunit_try_run_case+0x19b/0xa00
[ 5.319610] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 5.322374] kthread+0x89f/0xb20
[ 5.324121] ret_from_fork+0x182/0x2a0
[ 5.326284] ret_from_fork_asm+0x1a/0x30
[ 5.330550] ok 23 test_stackdepot_roundtrip
[ 5.333135] # test_unpoison_memory: unpoisoning via the instrumentation vs. kmsan_unpoison_memory() (2 UMR reports)
[ 5.340187] =====================================================
[ 5.342896] BUG: KMSAN: uninit-value in test_unpoison_memory+0x146/0x3f0
[ 5.345803] test_unpoison_memory+0x146/0x3f0
[ 5.347698] kunit_try_run_case+0x19b/0xa00
[ 5.348883] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 5.350393] kthread+0x89f/0xb20
[ 5.351322] ret_from_fork+0x182/0x2a0
[ 5.352454] ret_from_fork_asm+0x1a/0x30
[ 5.353527]
[ 5.353917] Local variable a created at:
[ 5.354968] test_unpoison_memory+0x40/0x3f0
[ 5.356253]
[ 5.356716] Bytes 0-2 of 3 are uninitialized
[ 5.357896] Memory access of size 3 starts at ffff9d1b003f7ced
[ 5.359104]
[ 5.359473] CPU: 3 UID: 0 PID: 121 Comm: kunit_try_catch Tainted: G N 6.17.0 #1 PREEMPT(voluntary)
[ 5.361551] Tainted: [N]=TEST
[ 5.362147] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 5.363915] =====================================================
[ 5.365146] Disabling lock debugging due to kernel taint
[ 5.366264] =====================================================
[ 5.367559] BUG: KMSAN: uninit-value in test_unpoison_memory+0x23d/0x3f0
[ 5.368626] test_unpoison_memory+0x23d/0x3f0
[ 5.369292] kunit_try_run_case+0x19b/0xa00
[ 5.369938] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 5.370768] kthread+0x89f/0xb20
[ 5.371299] ret_from_fork+0x182/0x2a0
[ 5.371862] ret_from_fork_asm+0x1a/0x30
[ 5.372478]
[ 5.372695] Local variable b created at:
[ 5.373302] test_unpoison_memory+0x56/0x3f0
[ 5.373896]
[ 5.374097] Bytes 0-2 of 3 are uninitialized
[ 5.374714] Memory access of size 3 starts at ffff9d1b003f7ce9
[ 5.375536]
[ 5.375771] CPU: 3 UID: 0 PID: 121 Comm: kunit_try_catch Tainted: G B N 6.17.0 #1 PREEMPT(voluntary)
[ 5.377209] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 5.377771] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 5.378816] =====================================================
[ 5.382141] ok 24 test_unpoison_memory
[ 5.384615] # test_copy_from_kernel_nofault: testing copy_from_kernel_nofault with uninitialized memory
[ 5.389317] =====================================================
[ 5.391106] BUG: KMSAN: uninit-value in copy_from_kernel_nofault+0x216/0x4b0
[ 5.393125] copy_from_kernel_nofault+0x216/0x4b0
[ 5.394564] test_copy_from_kernel_nofault+0x146/0x2c0
[ 5.396107] kunit_try_run_case+0x19b/0xa00
[ 5.397331] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 5.398582] kthread+0x89f/0xb20
[ 5.399282] ret_from_fork+0x182/0x2a0
[ 5.400070] ret_from_fork_asm+0x1a/0x30
[ 5.400912]
[ 5.401260] Local variable src created at:
[ 5.402081] test_copy_from_kernel_nofault+0x56/0x2c0
[ 5.403139]
[ 5.403525] Bytes 0-3 of 4 are uninitialized
[ 5.404396] Memory access of size 4 starts at ffff9d1b00407ce8
[ 5.405579]
[ 5.405914] CPU: 0 UID: 0 PID: 123 Comm: kunit_try_catch Tainted: G B N 6.17.0 #1 PREEMPT(voluntary)
[ 5.407990] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 5.408620] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 5.409904] =====================================================
[ 5.410823] ret is false
[ 5.411962] ok 25 test_copy_from_kernel_nofault
[ 5.426479] # kmsan: pass:13 fail:12 skip:0 total:25
[ 5.427361] # Totals: pass:13 fail:12 skip:0 total:25
[ 5.428300] not ok 1 kmsan

I've debugged it, and as I previously wrote, the cause is stack depots
not being allocated when the kmsan kmalloc hook is called. The previously
sent patch fixes these unit-test failures for me.

>
> Please explain how you're triggering this failure and whether you think
> we should backport the fix into -stable kernels and if so, are you able
> to identify a suitable Fixes: target?
>
At the moment I don't think any backporting is needed.

> Thanks.

Kind regards,
Aleksei Nikiforov

Eric Biggers

Oct 21, 2025, 11:03:49 PM
to Aleksei Nikiforov, Andrew Morton, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich, Alexei Starovoitov
On Fri, Oct 10, 2025 at 10:07:04AM +0200, Aleksei Nikiforov wrote:
> On 10/9/25 05:31, Andrew Morton wrote:
> > On Tue, 30 Sep 2025 13:56:01 +0200 Aleksei Nikiforov <aleksei....@linux.ibm.com> wrote:
> >
> > > If no stack depot is allocated yet,
> > > due to masking out __GFP_RECLAIM flags
> > > kmsan called from kmalloc cannot allocate stack depot.
> > > kmsan fails to record origin and report issues.
> > >
> > > Reusing flags from kmalloc without modifying them should be safe for kmsan.
> > > For example, such chain of calls is possible:
> > > test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
> > > slab_alloc_node -> slab_post_alloc_hook ->
> > > kmsan_slab_alloc -> kmsan_internal_poison_memory.
> > >
> > > Only when it is called in a context without flags present
> > > should __GFP_RECLAIM flags be masked.
> > >
> > > With this change all kmsan tests start working reliably.
> >
> > I'm not seeing reports of "hey, kmsan is broken", so I assume this
> > failure only occurs under special circumstances?
>
> Hi,
>
> kmsan might report less issues than it detects due to not allocating stack
> depots and not reporting issues without stack depots. Lack of reports may go
> unnoticed, that's why you don't get reports of kmsan being broken.

Yes, KMSAN seems to be at least partially broken currently. Besides the
fact that the kmsan KUnit test is currently failing (which I reported at
https://lore.kernel.org/r/20250911175145.GA1376@sol), I've confirmed
that the poly1305 KUnit test causes a KMSAN warning with Aleksei's patch
applied but does not cause a warning without it. The warning did get
reached via syzbot somehow
(https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f...@kylinos.cn/),
so KMSAN must still work in some cases. But it didn't work for me.

(That particular warning in the architecture-optimized Poly1305 code is
actually a false positive due to memory being initialized by assembly
code. But that's beside the point. The point is that I should have
seen the warning earlier, but I didn't. And Aleksei's patch seems to
fix KMSAN to work reliably. It also fixes the kmsan KUnit test.)

I don't really know this code, but I can at least give:

Tested-by: Eric Biggers <ebig...@kernel.org>

If you want to add a Fixes commit I think it is either 97769a53f117e2 or
8c57b687e8331. Earlier I had confirmed that reverting those commits
fixed the kmsan test too
(https://lore.kernel.org/r/20250911192953.GG1376@sol).

- Eric

Alexander Potapenko

Oct 22, 2025, 5:44:01 AM
to Aleksei Nikiforov, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich
On Tue, Sep 30, 2025 at 1:56 PM Aleksei Nikiforov
<aleksei....@linux.ibm.com> wrote:
>
> If no stack depot is allocated yet,
> due to masking out __GFP_RECLAIM flags
> kmsan called from kmalloc cannot allocate stack depot.
> kmsan fails to record origin and report issues.
>
> Reusing flags from kmalloc without modifying them should be safe for kmsan.
> For example, such chain of calls is possible:
> test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
> slab_alloc_node -> slab_post_alloc_hook ->
> kmsan_slab_alloc -> kmsan_internal_poison_memory.
>
> Only when it is called in a context without flags present
> should __GFP_RECLAIM flags be masked.
>
> With this change all kmsan tests start working reliably.

I think this makes sense. The whole __GFP_RECLAIM filtering was mostly
for poisoning local variables, so we don't need it for allocation
hooks.

It is still possible to pass __GFP_RECLAIM to kmsan_poison_memory(), but:
- it is actually not used in the entire codebase;
- the documentation clearly states that kmsan_poison_memory() will be
allocating memory, so one should be mindful of passing wrong GFP
flags.
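The patch moves the reclaim masking out of kmsan_save_stack_with_flags() and into the three free hooks (kmsan_slab_free(), kmsan_kfree_large(), kmsan_free_page()), which have no caller-supplied flags and now pass `GFP_KERNEL & ~(__GFP_RECLAIM)` explicitly, while the allocation hooks reuse the allocator caller's flags. A userspace sketch of that split follows; the `model_*` names and the bit values are hypothetical and only illustrate where the masking now happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit values for illustration; not the kernel's encoding. */
typedef uint32_t model_gfp_t;
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_IO 0x4u
#define MODEL_GFP_FS 0x8u
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/* Records the flags that reached the modelled poisoning routine. */
static model_gfp_t model_last_flags;

/* After the patch, the poisoning path forwards flags without masking. */
static void model_poison_memory(model_gfp_t flags)
{
	model_last_flags = flags;
}

/* Allocation hook: reuses the allocator caller's flags unchanged. */
static void model_slab_alloc_hook(model_gfp_t caller_flags)
{
	model_poison_memory(caller_flags);
}

/* Free hook: no caller flags exist, so reclaim is stripped at the call site. */
static void model_slab_free_hook(void)
{
	model_poison_memory(MODEL_GFP_KERNEL & ~MODEL_GFP_RECLAIM);
}

static bool model_may_reclaim(model_gfp_t flags)
{
	return (flags & MODEL_GFP_RECLAIM) != 0;
}
```

With this split, a GFP_KERNEL allocation keeps its reclaim bits all the way down, whereas the free path still arrives without them, which matches Alexander's note that callers remain responsible for passing appropriate flags.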

> Signed-off-by: Aleksei Nikiforov <aleksei....@linux.ibm.com>

Reviewed-by: Alexander Potapenko <gli...@google.com>

Andrew Morton

Oct 22, 2025, 5:36:09 PM
to Eric Biggers, Aleksei Nikiforov, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Ilya Leoshkevich, Alexei Starovoitov
OK, thanks, I pasted the above para into the changelog to help people
understand the impact of this.

> (That particular warning in the architecture-optimized Poly1305 code is
> actually a false positive due to memory being initialized by assembly
> code. But that's beside the point. The point is that I should have
> seen the warning earlier, but I didn't. And Aleksei's patch seems to
> fix KMSAN to work reliably. It also fixes the kmsan KUnit test.)
>
> I don't really know this code, but I can at least give:
>
> Tested-by: Eric Biggers <ebig...@kernel.org>
>
> If you want to add a Fixes commit I think it is either 97769a53f117e2 or
> 8c57b687e8331. Earlier I had confirmed that reverting those commits
> fixed the kmsan test too
> (https://lore.kernel.org/r/20250911192953.GG1376@sol).

Both commits affect the same kernel version so either should be good
for a Fixes target.

I'll add a cc:stable to this and shall stage it for 6.18-rcX.

The current state is below - if people want to suggest alterations,
please go for it.



From: Aleksei Nikiforov <aleksei....@linux.ibm.com>
Subject: mm/kmsan: fix kmsan kmalloc hook when no stack depots are allocated yet
Date: Tue, 30 Sep 2025 13:56:01 +0200

If no stack depot is allocated yet, due to masking out __GFP_RECLAIM
flags kmsan called from kmalloc cannot allocate stack depot. kmsan
fails to record origin and report issues. This may result in KMSAN
failing to report issues.

Reusing flags from kmalloc without modifying them should be safe for kmsan.
For example, such chain of calls is possible:
test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
slab_alloc_node -> slab_post_alloc_hook ->
kmsan_slab_alloc -> kmsan_internal_poison_memory.

Only when it is called in a context without flags present should
__GFP_RECLAIM flags be masked.

With this change all kmsan tests start working reliably.

Eric reported:

: Yes, KMSAN seems to be at least partially broken currently. Besides the
: fact that the kmsan KUnit test is currently failing (which I reported at
: https://lore.kernel.org/r/20250911175145.GA1376@sol), I've confirmed that
: the poly1305 KUnit test causes a KMSAN warning with Aleksei's patch
: applied but does not cause a warning without it. The warning did get
: reached via syzbot somehow
: (https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f...@kylinos.cn/),
: so KMSAN must still work in some cases. But it didn't work for me.

Link: https://lkml.kernel.org/r/20250930115600.70977...@linux.ibm.com
Link: https://lkml.kernel.org/r/20251022030213.GA35717@sol
Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Signed-off-by: Aleksei Nikiforov <aleksei....@linux.ibm.com>
Reviewed-by: Alexander Potapenko <gli...@google.com>
Tested-by: Eric Biggers <ebig...@kernel.org>
Cc: Dmitriy Vyukov <dvy...@google.com>
Cc: Ilya Leoshkevich <i...@linux.ibm.com>
Cc: Marco Elver <el...@google.com>
Cc: <sta...@vger.kernel.org>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
---

mm/kmsan/core.c | 3 ---
mm/kmsan/hooks.c | 6 ++++--
mm/kmsan/shadow.c | 2 +-
3 files changed, 5 insertions(+), 6 deletions(-)

--- a/mm/kmsan/core.c~mm-kmsan-fix-kmsan-kmalloc-hook-when-no-stack-depots-are-allocated-yet
+++ a/mm/kmsan/core.c
@@ -72,9 +72,6 @@ depot_stack_handle_t kmsan_save_stack_wi

nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);

- /* Don't sleep. */
- flags &= ~(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM);
-
handle = stack_depot_save(entries, nr_entries, flags);
return stack_depot_set_extra_bits(handle, extra);
}
--- a/mm/kmsan/hooks.c~mm-kmsan-fix-kmsan-kmalloc-hook-when-no-stack-depots-are-allocated-yet
+++ a/mm/kmsan/hooks.c
@@ -84,7 +84,8 @@ void kmsan_slab_free(struct kmem_cache *
if (s->ctor)
return;
kmsan_enter_runtime();
- kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ kmsan_internal_poison_memory(object, s->object_size,
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
@@ -114,7 +115,8 @@ void kmsan_kfree_large(const void *ptr)
kmsan_enter_runtime();
page = virt_to_head_page((void *)ptr);
KMSAN_WARN_ON(ptr != page_address(page));
- kmsan_internal_poison_memory((void *)ptr, page_size(page), GFP_KERNEL,
+ kmsan_internal_poison_memory((void *)ptr, page_size(page),
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
--- a/mm/kmsan/shadow.c~mm-kmsan-fix-kmsan-kmalloc-hook-when-no-stack-depots-are-allocated-yet
+++ a/mm/kmsan/shadow.c
@@ -208,7 +208,7 @@ void kmsan_free_page(struct page *page,
return;
kmsan_enter_runtime();
kmsan_internal_poison_memory(page_address(page), page_size(page),
- GFP_KERNEL,
+ GFP_KERNEL & ~(__GFP_RECLAIM),
KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
kmsan_leave_runtime();
}
_

Alexei Starovoitov

Oct 22, 2025, 9:39:47 PM
to Andrew Morton, Vlastimil Babka, Harry Yoo, Michal Hocko, Shakeel Butt, Eric Biggers, Aleksei Nikiforov, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, LKML, Ilya Leoshkevich, Alexei Starovoitov
Thanks for cc-ing and for extra context.

>
>
> From: Aleksei Nikiforov <aleksei....@linux.ibm.com>
> Subject: mm/kmsan: fix kmsan kmalloc hook when no stack depots are allocated yet
> Date: Tue, 30 Sep 2025 13:56:01 +0200
>
> If no stack depot is allocated yet, due to masking out __GFP_RECLAIM
> flags kmsan called from kmalloc cannot allocate stack depot. kmsan
> fails to record origin and report issues. This may result in KMSAN
> failing to report issues.
>
> Reusing flags from kmalloc without modifying them should be safe for kmsan.
> For example, such chain of calls is possible:
> test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof ->
> slab_alloc_node -> slab_post_alloc_hook ->
> kmsan_slab_alloc -> kmsan_internal_poison_memory.
>
> Only when it is called in a context without flags present should
> __GFP_RECLAIM flags be masked.

I see. So this is a combination of gfpflags_allow_spinning()
and old kmsan code.
We hit this issue a few times already.
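To our understanding, gfpflags_allow_spinning() treats a flag set with neither __GFP_DIRECT_RECLAIM nor __GFP_KSWAPD_RECLAIM as the no-spin mode used by try_alloc_pages(), whereas GFP_NOWAIT-style flags keep __GFP_KSWAPD_RECLAIM. The old KMSAN mask cleared both bits, so even GFP_KERNEL ended up in the no-spin mode rather than in a mere "don't sleep" mode. A small model of that interaction, with made-up bit values and GFP_NOWAIT reduced to its kswapd bit (other bits omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit values for illustration; not the kernel's encoding. */
typedef uint32_t model_gfp_t;
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_IO 0x4u
#define MODEL_GFP_FS 0x8u
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
/* GFP_NOWAIT reduced to its kswapd bit for this model. */
#define MODEL_GFP_NOWAIT MODEL_GFP_KSWAPD_RECLAIM

/* Mirrors the logic of gfpflags_allow_spinning(): either reclaim bit set. */
static bool model_allow_spinning(model_gfp_t flags)
{
	return (flags & MODEL_GFP_RECLAIM) != 0;
}

/* The pre-patch mask from kmsan_save_stack_with_flags(). */
static model_gfp_t model_old_kmsan_mask(model_gfp_t flags)
{
	return flags & ~(model_gfp_t)(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM);
}
```

In this model both GFP_KERNEL and GFP_NOWAIT-style flags allow spinning on their own, but neither does after passing through the old mask.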

I feel the further we go the more a new __GFP_xxx flag could be justified,
but Michal is strongly against it.
This particular issue actually might tilt it in favor of Michal's position,
since fixing kmsan is the right thing to do.

The fix itself makes sense to me. No better ideas so far.

What's puzzling is that it took 9 months to discover it?!
and allegedly Eric is seeing it by running kmsan selftest,
but Alexander couldn't repro it initially?
Looks like there is a gap in kmsan test coverage.
People that care about kmsan should really step up.

Alexander Potapenko

Oct 31, 2025, 7:57:41 AM
to Alexei Starovoitov, Andrew Morton, Vlastimil Babka, Harry Yoo, Michal Hocko, Shakeel Butt, Eric Biggers, Aleksei Nikiforov, Marco Elver, Dmitry Vyukov, kasan-dev, linux-mm, LKML, Ilya Leoshkevich, Alexei Starovoitov
> What's puzzling is that it took 9 months to discover it?!
> and allegedly Eric is seeing it by running kmsan selftest,
> but Alexander couldn't repro it initially?

If I understand correctly, Eric was linking his tests into the kernel
(CONFIG_KMSAN_KUNIT_TEST=y was implicitly set because
CONFIG_MODULES=n), whereas I ran them as a module.
After the kernel booted up, the stack depot was already initialized,
so the tests behaved just fine.
KMSAN also continued to work normally on syzbot and report bugs (see
https://syzkaller.appspot.com/upstream/graph/found-bugs), so it wasn't
really obvious that something was broken.
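Alexander's explanation implies an ordering effect: once something has populated the depot using reclaim-capable flags, later saves with masked flags succeed, so tests loaded as a module after boot pass while built-in tests that run early fail. A minimal model under that assumption; the names are hypothetical and a single boolean stands in for the depot's pool state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t model_handle_t; /* 0 means "no stack trace recorded" */

static bool model_pool_allocated;          /* hypothetical depot pool state */
static model_handle_t model_next_handle = 1;

/*
 * Simplified depot save. `may_spin` stands in for
 * gfpflags_allow_spinning(flags); only the creation of the
 * first pool depends on it.
 */
static model_handle_t model_depot_save(bool may_spin)
{
	if (!model_pool_allocated) {
		if (!may_spin)
			return 0;
		model_pool_allocated = true;
	}
	return model_next_handle++;
}
```

A masked save issued before any pool exists (the built-in case) returns 0; after one reclaim-capable save has created the pool, the same masked save (the module case) returns a valid handle.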

> Looks like there is a gap in kmsan test coverage.
> People that care about kmsan should really step up.

You are right, we should add KMSAN KUnit tests to some CI (wonder if
there are KernelCI instances allowing that?)
I'll look into that.