[PATCH 0/3] riscv: kfence: Handle the spurious fault after kfence_unprotect()


Vivian Wang

Mar 1, 2026, 9:21:54 PM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Alexander Potapenko, Marco Elver, Dmitry Vyukov, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Palmer Dabbelt, Vivian Wang, sta...@vger.kernel.org, Yanko Kaneti
kfence_unprotect() on RISC-V doesn't flush TLBs, because we can't send
IPIs in some contexts where kfence objects are allocated. This leads to
spurious faults and kfence false positives.

Avoid these spurious faults using the same "new_vmalloc" mechanism,
which I have renamed new_valid_map_cpus to avoid confusion, since the
kfence pool comes from the linear mapping, not vmalloc.

Commit b3431a8bb336 ("riscv: Fix IPIs usage in kfence_protect_page()")
only seemed to consider false negatives, which are indeed tolerable.
False positives, on the other hand, are not okay: they waste developer
time (or just my time somehow?) and spam kmsg, making it harder to
diagnose other problems.

Patch 3 is the implementation, poking (what was called) new_vmalloc upon
kfence_unprotect(). Patches 1 and 2 are just refactoring. In particular,
patch 1 is a pure substitution job, to make reviewing easier.

How this was found
------------------

This came up after a user reported some nonsensical kfence
use-after-free reports relating to k1_emac on SpacemiT K1, like this:

[ 64.160199] ==================================================================
[ 64.164773] BUG: KFENCE: use-after-free read in sk_skb_reason_drop+0x22/0x1e8
[ 64.164773]
[ 64.173365] Use-after-free read at 0xffffffd77fecc0cc (in kfence-#101):
[ 64.179962] sk_skb_reason_drop+0x22/0x1e8
[ 64.179972] dev_kfree_skb_any_reason+0x32/0x3c

[...]

[ 64.181440] kfence-#101: 0xffffffd77fecc000-0xffffffd77fecc0cf, size=208, cache=skbuff_head_cache
[ 64.181440]
[ 64.181450] allocated by task 142 on cpu 1 at 63.665866s (0.515583s ago):
[ 64.181476] __alloc_skb+0x66/0x244
[ 64.181484] alloc_skb_with_frags+0x3a/0x1ac

[...]

[ 64.182917] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc1-dirty #34 PREEMPT_LAZY
[ 64.182926] Hardware name: Banana Pi BPI-F3 (DT)
[ 64.183111] ==================================================================

In particular, these supposed use-after-free accesses:

- Were never reported by KASAN despite being rather easy to reproduce
- Never contain a "freed by task" section
- Never happen on the same CPU as the "allocated by task" info
- And, most importantly, were not found to have been caused by the
object being freed by anyone at that point

An interesting corollary of this observation is that the SpacemiT X60
CPU *does* cache invalid PTEs, and for a significant amount of time, or
at least long enough to be observable in practice. Or maybe only during
a wfi, given that most of the reports I've seen had the faulting CPU in
an IRQ?

---
Vivian Wang (3):
riscv: mm: Rename new_vmalloc into new_valid_map_cpus
riscv: mm: Extract helper mark_new_valid_map()
riscv: kfence: Call mark_new_valid_map() for kfence_unprotect()

arch/riscv/include/asm/cacheflush.h | 27 +++++++++++++----------
arch/riscv/include/asm/kfence.h | 7 ++++--
arch/riscv/kernel/entry.S | 44 +++++++++++++++++++------------------
arch/riscv/mm/init.c | 2 +-
4 files changed, 44 insertions(+), 36 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260228-handle-kfence-protect-spurious-fault-62100afb9734

Best regards,
--
Vivian "dramforever" Wang

Vivian Wang

Mar 1, 2026, 9:21:59 PM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Alexander Potapenko, Marco Elver, Dmitry Vyukov, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Palmer Dabbelt, sta...@vger.kernel.org, Yanko Kaneti, Vivian Wang
In kfence_protect_page(), which kfence_unprotect() calls, we cannot send
IPIs to other CPUs to ask them to flush TLB. This may lead to those CPUs
spuriously faulting on a recently allocated kfence object despite it
being valid, leading to false positive use-after-free reports.

Fix this by calling mark_new_valid_map() so that the page fault handling
code path notices the spurious fault, flushes the TLB, and retries the
access.

Update the comment in handle_exception to indicate that
new_valid_map_cpus_check also handles kfence_unprotect() spurious
faults.

Note that kfence_protect() has the same stale TLB entries problem, but
that leads to false negatives, which is fine with kfence.

Cc: <sta...@vger.kernel.org>
Reported-by: Yanko Kaneti <yan...@declera.com>
Fixes: b3431a8bb336 ("riscv: Fix IPIs usage in kfence_protect_page()")
Signed-off-by: Vivian Wang <wangr...@iscas.ac.cn>
---
arch/riscv/include/asm/kfence.h | 7 +++++--
arch/riscv/kernel/entry.S | 6 ++++--
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kfence.h b/arch/riscv/include/asm/kfence.h
index d08bf7fb3aee..29cb3a6ee113 100644
--- a/arch/riscv/include/asm/kfence.h
+++ b/arch/riscv/include/asm/kfence.h
@@ -6,6 +6,7 @@
#include <linux/kfence.h>
#include <linux/pfn.h>
#include <asm-generic/pgalloc.h>
+#include <asm/cacheflush.h>
#include <asm/pgtable.h>

static inline bool arch_kfence_init_pool(void)
@@ -17,10 +18,12 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
{
pte_t *pte = virt_to_kpte(addr);

- if (protect)
+ if (protect) {
set_pte(pte, __pte(pte_val(ptep_get(pte)) & ~_PAGE_PRESENT));
- else
+ } else {
set_pte(pte, __pte(pte_val(ptep_get(pte)) | _PAGE_PRESENT));
+ mark_new_valid_map();
+ }

preempt_disable();
local_flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index e57a0f550860..9c6acfd09141 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -136,8 +136,10 @@ SYM_CODE_START(handle_exception)

#ifdef CONFIG_64BIT
/*
- * The RISC-V kernel does not eagerly emit a sfence.vma after each
- * new vmalloc mapping, which may result in exceptions:
+ * The RISC-V kernel does not flush TLBs on all CPUs after each new
+ * vmalloc mapping or kfence_unprotect(), which may result in
+ * exceptions:
+ *
* - if the uarch caches invalid entries, the new mapping would not be
* observed by the page table walker and an invalidation is needed.
* - if the uarch does not cache invalid entries, a reordered access

--
2.52.0

Vivian Wang

Mar 3, 2026, 12:30:49 AM
to Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Yunhui Cui, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Palmer Dabbelt, sta...@vger.kernel.org, Yanko Kaneti, Vivian Wang
index 60eb221296a6..ced7a2b160ce 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -136,8 +136,10 @@ SYM_CODE_START(handle_exception)

#ifdef CONFIG_64BIT
/*
- * The RISC-V kernel does not eagerly emit a sfence.vma after each
- * new vmalloc mapping, which may result in exceptions:
+ * The RISC-V kernel does not flush TLBs on all CPUs after each new
+ * vmalloc mapping or kfence_unprotect(), which may result in
+ * exceptions:
+ *
* - if the uarch caches invalid entries, the new mapping would not be
* observed by the page table walker and an invalidation is needed.
* - if the uarch does not cache invalid entries, a reordered access

--
2.53.0
