[PATCH] mm/kfence: add reboot notifier to disable KFENCE on shutdown


Breno Leitao

Nov 26, 2025, 12:46:35 PM
to Alexander Potapenko, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com, Breno Leitao
During system shutdown, KFENCE can cause IPI synchronization issues if
it remains active through the reboot process. To prevent this, register
a reboot notifier that disables KFENCE and cancels any pending timer
work early in the shutdown sequence.

This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as
this configuration sends IPIs that can interfere with shutdown. Without
static keys, no IPIs are generated and KFENCE can safely remain active.

The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts
down before other subsystems that might still depend on stable memory
allocation behavior.

This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
that is busy in an IRQ-disabled context printing characters to the
console.

Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]

Signed-off-by: Breno Leitao <lei...@debian.org>
---
mm/kfence/core.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 727c20c94ac5..162a026871ab 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -26,6 +26,7 @@
 #include <linux/panic_notifier.h>
 #include <linux/random.h>
 #include <linux/rcupdate.h>
+#include <linux/reboot.h>
 #include <linux/sched/clock.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
@@ -820,6 +821,25 @@ static struct notifier_block kfence_check_canary_notifier = {
 static struct delayed_work kfence_timer;

 #ifdef CONFIG_KFENCE_STATIC_KEYS
+static int kfence_reboot_callback(struct notifier_block *nb,
+				  unsigned long action, void *data)
+{
+	/*
+	 * Disable kfence to avoid static keys IPI synchronization during
+	 * late shutdown/kexec
+	 */
+	WRITE_ONCE(kfence_enabled, false);
+	/* Cancel any pending timer work */
+	cancel_delayed_work_sync(&kfence_timer);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block kfence_reboot_notifier = {
+	.notifier_call = kfence_reboot_callback,
+	.priority = INT_MAX, /* Run early to stop timers ASAP */
+};
+
 /* Wait queue to wake up allocation-gate timer task. */
 static DECLARE_WAIT_QUEUE_HEAD(allocation_wait);

@@ -901,6 +921,10 @@ static void kfence_init_enable(void)
 	if (kfence_check_on_panic)
 		atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier);

+#ifdef CONFIG_KFENCE_STATIC_KEYS
+	register_reboot_notifier(&kfence_reboot_notifier);
+#endif
+
 	WRITE_ONCE(kfence_enabled, true);
 	queue_delayed_work(system_unbound_wq, &kfence_timer, 0);


---
base-commit: ab084f0b8d6d2ee4b1c6a28f39a2a7430bdfa7f0
change-id: 20251126-kfence-42c93f9b3979

Best regards,
--
Breno Leitao <lei...@debian.org>
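The commit message relies on notifier priority to run the KFENCE callback before other reboot notifiers. As a rough illustration only, the following userspace sketch models a priority-sorted chain in which a higher priority value runs first. Every name in it (model_notifier, model_chain_register, model_reboot_order and so on) is hypothetical and is not part of the kernel API or of this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of a priority-sorted notifier chain. */
struct model_notifier {
	int (*call)(struct model_notifier *nb, unsigned long action, void *data);
	int priority;
	struct model_notifier *next;
};

/* Records the order in which callbacks ran. */
struct call_log {
	char order[8];
	size_t len;
};

/* Insert so that higher-priority entries sit closer to the head. */
static void model_chain_register(struct model_notifier **head,
				 struct model_notifier *nb)
{
	assert(nb != NULL);
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Walk the chain from the head, calling each entry in turn. */
static void model_chain_call(struct model_notifier *head,
			     unsigned long action, void *data)
{
	for (; head; head = head->next)
		head->call(head, action, data);
}

static void log_append(struct call_log *log, char tag)
{
	if (log->len < sizeof(log->order) - 1)
		log->order[log->len++] = tag;
}

/* Stand-in for the KFENCE reboot callback: logs 'K'. */
static int log_kfence(struct model_notifier *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	log_append(data, 'K');
	return 0;
}

/* Stand-in for some other subsystem's callback: logs 'O'. */
static int log_other(struct model_notifier *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	log_append(data, 'O');
	return 0;
}

/* Register the default-priority entry first, then the INT_MAX one. */
static void model_reboot_order(char out[3])
{
	struct call_log log = { .len = 0 };
	struct model_notifier other = { .call = log_other, .priority = 0 };
	struct model_notifier kfence = { .call = log_kfence, .priority = INT_MAX };
	struct model_notifier *head = NULL;

	model_chain_register(&head, &other);
	model_chain_register(&head, &kfence);
	model_chain_call(head, 0, &log);

	out[0] = log.order[0];
	out[1] = log.order[1];
	out[2] = '\0';
}
```

Calling model_reboot_order() fills the buffer with the call order. In this model the INT_MAX entry is called first even though it was registered last, which is the property the patch depends on.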

Marco Elver

Nov 26, 2025, 12:50:25 PM
to Breno Leitao, Alexander Potapenko, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com
On Wed, 26 Nov 2025 at 18:46, Breno Leitao <lei...@debian.org> wrote:
>
> During system shutdown, KFENCE can cause IPI synchronization issues if
> it remains active through the reboot process. To prevent this, register
> a reboot notifier that disables KFENCE and cancels any pending timer
> work early in the shutdown sequence.
>
> This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as
> this configuration sends IPIs that can interfere with shutdown. Without
> static keys, no IPIs are generated and KFENCE can safely remain active.
>
> The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts
> down before other subsystems that might still depend on stable memory
> allocation behavior.
>
> This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
> that is busy in an IRQ-disabled context printing characters to the
> console.
>
> Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]
>
> Signed-off-by: Breno Leitao <lei...@debian.org>

Looks good as discussed in [1]:

Reviewed-by: Marco Elver <el...@google.com>

Andrew Morton

Nov 26, 2025, 1:14:56 PM
to Breno Leitao, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com
On Wed, 26 Nov 2025 09:46:18 -0800 Breno Leitao <lei...@debian.org> wrote:

> During system shutdown, KFENCE can cause IPI synchronization issues if
> it remains active through the reboot process. To prevent this, register
> a reboot notifier that disables KFENCE and cancels any pending timer
> work early in the shutdown sequence.
>
> This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as
> this configuration sends IPIs that can interfere with shutdown. Without
> static keys, no IPIs are generated and KFENCE can safely remain active.
>
> The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts
> down before other subsystems that might still depend on stable memory
> allocation behavior.
>
> This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
> that is busy in an IRQ-disabled context printing characters to the
> console.
>
> Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]

6.13 kernels and earlier, so I assume we'll want a cc:stable on this.
And I assume there's really no identifiable Fixes: target.

Breno Leitao

Nov 27, 2025, 6:12:21 AM
to Andrew Morton, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com
This infrastructure showed up when kfence was created, so, a possible
Fixes: target would point to commit 0ce20dd84089 ("mm: add Kernel
Electric-Fence infrastructure")

Breno Leitao

Nov 27, 2025, 9:52:02 AM
to Alexander Potapenko, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com, sta...@vger.kernel.org, Breno Leitao
During system shutdown, KFENCE can cause IPI synchronization issues if
it remains active through the reboot process. To prevent this, register
a reboot notifier that disables KFENCE and cancels any pending timer
work early in the shutdown sequence.

This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as
this configuration sends IPIs that can interfere with shutdown. Without
static keys, no IPIs are generated and KFENCE can safely remain active.

The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts
down before other subsystems that might still depend on stable memory
allocation behavior.

This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
that is busy in an IRQ-disabled context printing characters to the
console.

Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]

Cc: sta...@vger.kernel.org
Signed-off-by: Breno Leitao <lei...@debian.org>
Reviewed-by: Marco Elver <el...@google.com>
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
---
Changes in v2:
- Adding Fixes: tag and CCing stable (akpm)
- Link to v1: https://patch.msgid.link/20251126-kfence-...@debian.org

Andrew Morton

Nov 27, 2025, 2:19:43 PM
to Breno Leitao, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com
On Thu, 27 Nov 2025 03:12:10 -0800 Breno Leitao <lei...@debian.org> wrote:

> > > This fixes a late kexec CSD lockup[1] when kfence is trying to IPI a CPU
> > > that is busy in an IRQ-disabled context printing characters to the
> > > console.
> > >
> > > Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/ [1]
> >
> > 6.13 kernels and earlier, so I assume we'll want a cc:stable on this.
> > And I assume there's really no identifiable Fixes: target.
>
> This infrastructure showed up when kfence was created, so, a possible
> Fixes: target would point to commit 0ce20dd84089 ("mm: add Kernel
> Electric-Fence infrastructure")

Great, thanks, I added that.

Chris Mason

Jan 13, 2026, 9:03:12 AM
to Breno Leitao, Chris Mason, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com, sta...@vger.kernel.org
On Thu, 27 Nov 2025 06:51:54 -0800 Breno Leitao <lei...@debian.org> wrote:

> During system shutdown, KFENCE can cause IPI synchronization issues if
> it remains active through the reboot process. To prevent this, register
> a reboot notifier that disables KFENCE and cancels any pending timer
> work early in the shutdown sequence.
>

Hi Breno,

I'm running v6.19 MM commits through the patch review automation and this
one was flagged. Is this deadlock possible?

Automated review output:

> @@ -820,6 +821,25 @@ static struct notifier_block kfence_check_canary_notifier = {
> static struct delayed_work kfence_timer;
>
> #ifdef CONFIG_KFENCE_STATIC_KEYS
> +static int kfence_reboot_callback(struct notifier_block *nb,
> +				  unsigned long action, void *data)
> +{
> +	/*
> +	 * Disable kfence to avoid static keys IPI synchronization during
> +	 * late shutdown/kexec
> +	 */
> +	WRITE_ONCE(kfence_enabled, false);
> +	/* Cancel any pending timer work */
> +	cancel_delayed_work_sync(&kfence_timer);
        ^^^^^^^^^^^^^^^

Can cancel_delayed_work_sync() deadlock here?

If toggle_allocation_gate() is currently executing and blocked inside
wait_event_idle() (waiting for kfence_allocation_gate > 0), then
cancel_delayed_work_sync() will block forever waiting for the work to
complete.

The wait_event_idle() condition depends only on allocations occurring
to increment kfence_allocation_gate - setting kfence_enabled to false
does not wake up this wait. During shutdown when allocations may have
stopped, the work item could remain blocked indefinitely, causing the
reboot notifier to hang.

The call chain is:
  kfence_reboot_callback()
    -> cancel_delayed_work_sync(&kfence_timer)
      -> __flush_work()
        -> wait_for_completion(&barr.done)
           // waits forever because...

  toggle_allocation_gate() [currently running]
    -> wait_event_idle(allocation_wait, kfence_allocation_gate > 0)
       // never wakes up if no allocations happen

Would it be safer to use cancel_delayed_work() (non-sync) here, or add
a mechanism to wake up the wait_event_idle() when kfence_enabled becomes
false?
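The scenario described above can be walked through with a small single-threaded model. This is only a sketch under stated assumptions: struct gate_model and all model_* functions are hypothetical stand-ins, the blocked work item is reduced to a flag, and the max_steps budget exists only so the model terminates (the real wait has no such bound).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names mirror the thread, not the kernel source. */
struct gate_model {
	bool enabled;         /* stands in for kfence_enabled */
	int allocation_gate;  /* stands in for kfence_allocation_gate */
	bool work_running;    /* the work item is blocked inside its wait */
};

/* An allocation is the only thing that raises the gate in this model. */
static void model_alloc(struct gate_model *m)
{
	m->allocation_gate++;
}

/* The blocked work finishes only once the gate is positive. */
static void model_work_step(struct gate_model *m)
{
	if (m->work_running && m->allocation_gate > 0)
		m->work_running = false;
}

/*
 * Model of a synchronous cancel: keep waiting for the running work.
 * Returns true if the work finished, false if the step budget ran out,
 * which stands in for "blocked forever".
 */
static bool model_cancel_sync(struct gate_model *m, int max_steps)
{
	assert(max_steps > 0);
	for (int i = 0; i < max_steps && m->work_running; i++)
		model_work_step(m);
	return !m->work_running;
}

/* The callback as posted: clear the flag, then cancel synchronously. */
static bool model_reboot_callback(struct gate_model *m, int max_steps)
{
	m->enabled = false;  /* the gate-only wait condition does not see this */
	return model_cancel_sync(m, max_steps);
}
```

With no allocation, model_reboot_callback() exhausts its budget because clearing enabled does not satisfy the wait condition. If model_alloc() is called first, the same callback completes.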


Breno Leitao

Jan 14, 2026, 10:21:37 AM
to Chris Mason, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Andrew Morton, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, kerne...@meta.com, sta...@vger.kernel.org
Hello Chris,
This is spot on. I think this is a real case if the following happens:


1) toggle_allocation_gate() has passed the kfence_enabled check and is
waiting for kfence_allocation_gate to be > 0.
a) kfence_allocation_gate is incremented at allocation time

2) There are no more kernel allocations, thus kfence_allocation_gate is
not incremented

3) cancel_delayed_work_sync() waits for the work item, which is waiting
for kfence_allocation_gate > 0, but given there are no more allocations,
this will never happen.

> Would it be safer to use cancel_delayed_work() (non-sync) here.

In this case the toggle_allocation_gate() task will stay idle, waiting
forever for kfence_allocation_gate to be > 0 unless we wake it up, but
it will not block the notifiers.

Is this a problem?

Maybe a more robust solution would include:

1) s/cancel_delayed_work_sync()/cancel_delayed_work()/
- This would unblock the notifier

and/or some of the following:

2) Return from wait_event_idle() if kfence_enabled gets disabled.
- Removes the waiters once kfence is disabled
- Cons: kfence_allocation_gate will continue to be negative

3) Wake up everyone on the allocation_wait wait queue
- This might not be necessary if we have 2, since they will wake
themselves once kfence_enabled goes to 0
- Cons: kfence_allocation_gate will continue to be negative

4) Bump kfence_allocation_gate > 1 in the notifier
- Avoids kfence allocations completely after it is disabled.
- Cons: it is unclear whether we can set kfence_allocation_gate = 1 from
the notifier.


Thanks for the report,
--breno
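To make options 1 and 2 above easier to compare, here is a single-threaded userspace sketch. It is not kernel code and not a proposed patch: struct fix_model and the fix_* functions are hypothetical, the step budget exists only so the model terminates, and the model re-checks the wait condition on every step, whereas a real waiter would presumably also need a wake-up on the wait queue before it re-evaluates the condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the timer work and the allocation gate. */
struct fix_model {
	bool enabled;         /* stands in for kfence_enabled */
	int allocation_gate;  /* stands in for kfence_allocation_gate */
	bool work_pending;    /* queued but not yet started */
	bool work_running;    /* started and blocked inside its wait */
};

/*
 * Option 1: a non-sync cancel only dequeues pending work and never
 * waits, so it cannot hang, but a running work item stays blocked.
 * Returns true if pending work was removed.
 */
static bool fix_cancel_nosync(struct fix_model *m)
{
	bool was_pending = m->work_pending;

	m->work_pending = false;
	return was_pending;
}

/* Option 2: the wait also ends once the enabled flag is cleared. */
static bool fix_wait_done(const struct fix_model *m)
{
	return m->allocation_gate > 0 || !m->enabled;
}

static void fix_work_step(struct fix_model *m)
{
	if (m->work_running && fix_wait_done(m))
		m->work_running = false;
}

/* Synchronous cancel: dequeue, then wait (bounded here) for running work. */
static bool fix_cancel_sync(struct fix_model *m, int max_steps)
{
	assert(max_steps > 0);
	m->work_pending = false;
	for (int i = 0; i < max_steps && m->work_running; i++)
		fix_work_step(m);
	return !m->work_running;
}

/* Callback with the amended wait condition: sync cancel now completes. */
static bool fix_reboot_callback(struct fix_model *m, int max_steps)
{
	m->enabled = false;
	return fix_cancel_sync(m, max_steps);
}
```

In this model the synchronous cancel returns even when no allocation ever happens, because the amended condition observes the cleared flag. The non-sync variant returns immediately in all cases but leaves a running work item parked, which matches the trade-off raised in the message.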