[PATCH] x86/kfence: Avoid writing L1TF-vulnerable PTEs


Andrew Cooper

Jan 6, 2026, 1:04:36 PM
to LKML, Andrew Cooper, Marco Elver, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
KFENCE protects and unprotects its pool pages by simply clearing and setting
_PAGE_PRESENT in the relevant PTE. For native, the choice of PTE is fine;
there's real memory backing the non-present PTE. However, for Xen PV, Xen
complains:

(XEN) d1 L1TF-vulnerable L1e 8010000018200066 - Shadowing

To explain, some background on Xen PV pagetables:

Xen PV guests control their own pagetables; they choose the new PTE
value, and use hypercalls to make changes so that Xen can audit them for
safety.

In addition to a regular reference count, Xen also maintains a type
reference count, e.g. SegDesc (referenced by vGDT/vLDT), Writable
(referenced with _PAGE_RW) or L{1..4} (referenced by vCR3 or a lower
pagetable level). This is to prevent, e.g., a page for which the guest has
a writable mapping from being inserted into the pagetables.

For non-present mappings, all other bits become software accessible, and
typically contain metadata rather than a real frame address. There is nothing
that a reference count could sensibly be tied to. As such, even if Xen
could recognise the address as currently safe, nothing would prevent that
frame from changing owner to another VM in the future.

When Xen detects a PV guest writing an L1TF-vulnerable PTE, it responds by activating
shadow paging. This is normally only used for the live phase of
migration, and comes with a non-trivial overhead.

KFENCE only cares about getting #PF to catch wild accesses; it doesn't care
about the value for non-present mappings. Use a fully inverted PTE to
avoid hitting the slow path when running under Xen.
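
Purely as an illustration (not part of the patch), the difference between
the two non-present encodings can be shown with the PTE from the Xen
message above. The snippet below is a standalone userspace sketch; the
flag bits and address are only an example:

#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT 0x1ULL

int main(void)
{
	/* NX | PFN 0x18200 | D, A, U/S, R/W, P - as in the Xen log above */
	uint64_t present_pte = 0x8010000018200067ULL;

	/*
	 * Old behaviour: only clear Present.  The PFN bits still name a
	 * real, cacheable machine address - exactly the L1TF-vulnerable
	 * pattern Xen objects to.
	 */
	uint64_t old_protected = present_pte & ~_PAGE_PRESENT;

	/*
	 * New behaviour: invert the whole PTE.  Still not Present, so
	 * KFENCE still gets its #PF, but the address bits no longer point
	 * at real memory.
	 */
	uint64_t new_protected = ~present_pte;

	printf("old: %#018llx\nnew: %#018llx\n",
	       (unsigned long long)old_protected,
	       (unsigned long long)new_protected);
	return 0;
}

The "old" value is the 8010000018200066 from the Xen message; the "new"
one has the high address bits set and is harmless.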

While adjusting the logic, take the opportunity to skip all actions if the
PTE is already in the right state, halve the number of PVOps callouts, and
skip TLB maintenance on a !P -> P transition, which benefits non-Xen cases
too.

Fixes: 1dc0da6e9ec0 ("x86, kfence: enable KFENCE for x86")
Tested-by: Marco Elver <el...@google.com>
Signed-off-by: Andrew Cooper <andrew....@citrix.com>
---
CC: Alexander Potapenko <gli...@google.com>
CC: Marco Elver <el...@google.com>
CC: Dmitry Vyukov <dvy...@google.com>
CC: Thomas Gleixner <tg...@linutronix.de>
CC: Ingo Molnar <mi...@redhat.com>
CC: Borislav Petkov <b...@alien8.de>
CC: Dave Hansen <dave....@linux.intel.com>
CC: x...@kernel.org
CC: "H. Peter Anvin" <h...@zytor.com>
CC: Andrew Morton <ak...@linux-foundation.org>
CC: Jann Horn <ja...@google.com>
CC: kasa...@googlegroups.com
CC: linux-...@vger.kernel.org

v1:
* First public posting. This went to security@ first just in case, and
then I got distracted with other things ahead of public posting.
---
arch/x86/include/asm/kfence.h | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index ff5c7134a37a..acf9ffa1a171 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -42,10 +42,34 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
 	unsigned int level;
 	pte_t *pte = lookup_address(addr, &level);
+	pteval_t val;

 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
 		return false;

+	val = pte_val(*pte);
+
+	/*
+	 * protect requires making the page not-present.  If the PTE is
+	 * already in the right state, there's nothing to do.
+	 */
+	if (protect != !!(val & _PAGE_PRESENT))
+		return true;
+
+	/*
+	 * Otherwise, invert the entire PTE.  This avoids writing out an
+	 * L1TF-vulnerable PTE (not present, without the high address bits
+	 * set).
+	 */
+	set_pte(pte, __pte(~val));
+
+	/*
+	 * If the page was protected (non-present) and we're making it
+	 * present, there is no need to flush the TLB at all.
+	 */
+	if (!protect)
+		return true;
+
 	/*
 	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
 	 * with interrupts disabled. Therefore, the below is best-effort, and
@@ -53,11 +77,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 	 * lazy fault handling takes care of faults after the page is PRESENT.
 	 */

-	if (protect)
-		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
-	else
-		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
-
 	/*
 	 * Flush this CPU's TLB, assuming whoever did the allocation/free is
 	 * likely to continue running on this CPU.

base-commit: 7f98ab9da046865d57c102fd3ca9669a29845f67
--
2.39.5

Alexander Potapenko

Jan 7, 2026, 6:32:01 AM
to Andrew Cooper, LKML, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
Reviewed-by: Alexander Potapenko <gli...@google.com>

> /*
> * We need to avoid IPIs, as we may get KFENCE allocations or faults
> * with interrupts disabled. Therefore, the below is best-effort, and
> @@ -53,11 +77,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
> * lazy fault handling takes care of faults after the page is PRESENT.
> */
Nit: should this comment be moved above before set_pte() or merged with
the following comment block?

Andrew Cooper

Jan 7, 2026, 7:02:08 AM
to Alexander Potapenko, Andrew Cooper, LKML, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
Thanks.

>
>> /*
>> * We need to avoid IPIs, as we may get KFENCE allocations or faults
>> * with interrupts disabled. Therefore, the below is best-effort, and
>> @@ -53,11 +77,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
>> * lazy fault handling takes care of faults after the page is PRESENT.
>> */
> Nit: should this comment be moved above before set_pte() or merged with
> the following comment block?

Hmm, probably merged as they're both about the TLB maintenance.  But the
end result is a far more messy diff:

@@ -42,23 +42,40 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
        unsigned int level;
        pte_t *pte = lookup_address(addr, &level);
+       pteval_t val;
 
        if (WARN_ON(!pte || level != PG_LEVEL_4K))
                return false;
 
+       val = pte_val(*pte);
+
        /*
-        * We need to avoid IPIs, as we may get KFENCE allocations or faults
-        * with interrupts disabled. Therefore, the below is best-effort, and
-        * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
-        * lazy fault handling takes care of faults after the page is PRESENT.
+        * protect requires making the page not-present.  If the PTE is
+        * already in the right state, there's nothing to do.
+        */
+       if (protect != !!(val & _PAGE_PRESENT))
+               return true;
+
+       /*
+        * Otherwise, invert the entire PTE.  This avoids writing out an
+        * L1TF-vulnerable PTE (not present, without the high address bits
+        * set).
         */
+       set_pte(pte, __pte(~val));
 
-       if (protect)
-               set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
-       else
-               set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
+       /*
+        * If the page was protected (non-present) and we're making it
+        * present, there is no need to flush the TLB at all.
+        */
+       if (!protect)
+               return true;
 
        /*
+        * We need to avoid IPIs, as we may get KFENCE allocations or faults
+        * with interrupts disabled. Therefore, the below is best-effort, and
+        * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
+        * lazy fault handling takes care of faults after the page is PRESENT.
+        *
         * Flush this CPU's TLB, assuming whoever did the allocation/free is
         * likely to continue running on this CPU.
         */



I need to resubmit anyway, because I've spotted one silly error in the
commit message.

I could submit two patches, with the second one stated as "to make the
previous patch legible".

Thoughts?

~Andrew

Andrew Morton

Jan 7, 2026, 6:17:03 PM
to Andrew Cooper, LKML, Marco Elver, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Seems that I sent 1dc0da6e9ec0 upstream so thanks, I'll grab this. If
an x86 person chooses to handle it then I'll drop the mm.git version.

I'll add a cc:stable to the mm.git copy, just to be sure.

> Tested-by: Marco Elver <el...@google.com>
> Signed-off-by: Andrew Cooper <andrew....@citrix.com>
> ---

That "^---$" tells tooling "changelog stops here".

> CC: Alexander Potapenko <gli...@google.com>
> CC: Marco Elver <el...@google.com>
> CC: Dmitry Vyukov <dvy...@google.com>
> CC: Thomas Gleixner <tg...@linutronix.de>
> CC: Ingo Molnar <mi...@redhat.com>
> CC: Borislav Petkov <b...@alien8.de>
> CC: Dave Hansen <dave....@linux.intel.com>
> CC: x...@kernel.org
> CC: "H. Peter Anvin" <h...@zytor.com>
> CC: Andrew Morton <ak...@linux-foundation.org>
> CC: Jann Horn <ja...@google.com>
> CC: kasa...@googlegroups.com
> CC: linux-...@vger.kernel.org
>
> v1:
> * First public posting. This went to security@ first just in case, and
> then I got distracted with other things ahead of public posting.
> ---

That "^---$" would be better placed above the versioning info.

>
> ...
>

Ryusuke Konishi

Jan 26, 2026, 2:07:24 PM
to Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Hi All,

I am reporting a boot regression in v6.19-rc7 on an x86_32
environment. The kernel hangs immediately after "Booting the kernel"
and does not produce any early console output.

A git bisect identified the following commit as the first bad commit:
b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")

Environment and Config:
- Guest Arch: x86_32 (one of my test VMs)
- Memory Config: # CONFIG_X86_PAE is not set
- KFENCE Config: CONFIG_KFENCE=y
- Host/Hypervisor: x86_64 host running KVM

The system fails to boot at a very early stage. I have confirmed that
reverting commit b505f1944535 on top of v6.19-rc7 completely resolves
the issue, and the kernel boots normally.

Could you please verify if this change is compatible with x86_32
(non-PAE) configurations?
I am happy to provide my full .config or test any potential fixes.

Best regards,
Ryusuke Konishi

Andrew Cooper

Jan 26, 2026, 2:40:01 PM
to Ryusuke Konishi, Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Hmm.  To start with, does this fix the crash?

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index acf9ffa1a171..2fe454722e54 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -67,8 +67,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
         * If the page was protected (non-present) and we're making it
         * present, there is no need to flush the TLB at all.
         */
-       if (!protect)
-               return true;
 
        /*
         * We need to avoid IPIs, as we may get KFENCE allocations or faults



Re-reading, I can't spot anything obvious.

Architecturally, x86 explicitly does not need a TLB flush when turning a
non-present mapping present, and it's strictly 4k leaf mappings we're
handling here.

I wonder if something else is missing a flush, and was being covered by
this.

~Andrew

Ryusuke Konishi

Jan 26, 2026, 2:52:50 PM
to Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
I tested this change, but unfortunately the boot hang still occurs.

Regards,
Ryusuke Konishi

Borislav Petkov

Jan 26, 2026, 2:54:59 PM
to Ryusuke Konishi, Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On Tue, Jan 27, 2026 at 04:07:04AM +0900, Ryusuke Konishi wrote:
> Hi All,
>
> I am reporting a boot regression in v6.19-rc7 on an x86_32
> environment. The kernel hangs immediately after "Booting the kernel"
> and does not produce any early console output.
>
> A git bisect identified the following commit as the first bad commit:
> b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")

I can confirm the same on my 32-bit laptop. The guest splat looks like
this:

[ 0.173437] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.175172] ------------[ cut here ]------------
[ 0.176066] kernel BUG at arch/x86/mm/physaddr.c:70!
[ 0.177037] Oops: invalid opcode: 0000 [#1] SMP
[ 0.177914] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc7+ #1 PREEMPT(full)
[ 0.179509] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 0.181363] EIP: __phys_addr+0x78/0x90
[ 0.182089] Code: 89 c8 5b 5d c3 2e 8d 74 26 00 0f 0b 8d b6 00 00 00 00 89 45 f8 e8 08 a4 1d 00 84 c0 8b 55 f8 74 b0 0f 0b 8d b4 26 00 00 00 00 <0f> 0b 8d b6 00 00 00 00 0f 0b 66 90 8d 74 26 00 2e 8d b4 26 00 00
[ 0.185723] EAX: ce383000 EBX: 00031c7c ECX: 31c7c000 EDX: 034ec000
[ 0.186972] ESI: c1ed3eec EDI: f21fd101 EBP: c2055f78 ESP: c2055f70
[ 0.188182] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
[ 0.189503] CR0: 80050033 CR2: ffd98000 CR3: 029cf000 CR4: 00000090
[ 0.191045] Call Trace:
[ 0.191518] kfence_init+0x3a/0x94
[ 0.192177] start_kernel+0x4ea/0x62c
[ 0.192894] i386_start_kernel+0x65/0x68
[ 0.193653] startup_32_smp+0x151/0x154
[ 0.194397] Modules linked in:
[ 0.194987] ---[ end trace 0000000000000000 ]---
[ 0.195879] EIP: __phys_addr+0x78/0x90
[ 0.196610] Code: 89 c8 5b 5d c3 2e 8d 74 26 00 0f 0b 8d b6 00 00 00 00 89 45 f8 e8 08 a4 1d 00 84 c0 8b 55 f8 74 b0 0f 0b 8d b4 26 00 00 00 00 <0f> 0b 8d b6 00 00 00 00 0f 0b 66 90 8d 74 26 00 2e 8d b4 26 00 00
[ 0.200231] EAX: ce383000 EBX: 00031c7c ECX: 31c7c000 EDX: 034ec000
[ 0.201452] ESI: c1ed3eec EDI: f21fd101 EBP: c2055f78 ESP: c2055f70
[ 0.202693] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
[ 0.204011] CR0: 80050033 CR2: ffd98000 CR3: 029cf000 CR4: 00000090
[ 0.205235] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.206897] ---[ end Kernel panic - not syncing: Attempted to: kill the idle task! ]---

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Andrew Cooper

Jan 26, 2026, 3:23:00 PM
to Borislav Petkov, Ryusuke Konishi, Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Ok, we're hitting a BUG, not a TLB flushing problem.  That's:

BUG_ON(slow_virt_to_phys((void *)x) != phys_addr);

so it's obviously to do with the inverted pte.  pgtable-2level.h has

/* No inverted PFNs on 2 level page tables */

and that was definitely an oversight on my part.  Sorry.

Does this help?

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index acf9ffa1a171..310e0193d731 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -42,7 +42,7 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
        unsigned int level;
        pte_t *pte = lookup_address(addr, &level);
-       pteval_t val;
+       pteval_t val, new;
 
        if (WARN_ON(!pte || level != PG_LEVEL_4K))
                return false;
@@ -61,7 +61,8 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
         * L1TF-vulnerable PTE (not present, without the high address bits
         * set).
         */
-       set_pte(pte, __pte(~val));
+       new = val ^ _PAGE_PRESENT;
+       set_pte(pte, __pte(flip_protnone_guard(val, new, PTE_PFN_MASK)));
 
        /*
         * If the page was protected (non-present) and we're making it



Only compile tested.  flip_protnone_guard() seems to be the helper which is
a nop on 2-level paging.
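
For reference, the helpers involved look roughly like the standalone
sketch below. This is a paraphrase of pgtable-invert.h / pgtable-2level.h
rather than the literal kernel source, with an illustrative PFN mask, so
check the real headers:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

#define _PAGE_PRESENT	0x1ULL
#define PTE_PFN_MASK	0x000ffffffffff000ULL	/* illustrative mask */

/* Non-zero but not Present => the PFN bits are stored inverted. */
static bool __pte_needs_invert(u64 val)
{
	return val && !(val & _PAGE_PRESENT);
}

/* pgtable-invert.h flavour: invert the PFN bits when crossing the
 * present <-> not-present boundary; pte_pfn() undoes it on read. */
static u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask)
{
	if (__pte_needs_invert(val) != __pte_needs_invert(oldval))
		val = (val & ~mask) | (~val & mask);
	return val;
}

/* pgtable-2level.h flavour: "No inverted PFNs on 2 level page tables". */
static u64 flip_protnone_guard_2level(u64 oldval, u64 val, u64 mask)
{
	(void)oldval;
	(void)mask;
	return val;
}

int main(void)
{
	u64 val = 0x8010000018200067ULL;	/* a present KFENCE pool PTE */
	u64 new = val ^ _PAGE_PRESENT;		/* what the fix above computes */

	printf("invert : %#018llx\n",
	       (unsigned long long)flip_protnone_guard(val, new, PTE_PFN_MASK));
	printf("2-level: %#018llx\n",
	       (unsigned long long)flip_protnone_guard_2level(val, new, PTE_PFN_MASK));
	return 0;
}

i.e. the invert-aware variant stores the PFN bits complemented, while the
2-level variant leaves the PTE alone, which is why the unconditional ~val
was wrong there.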

~Andrew

Andrew Morton

Jan 26, 2026, 3:24:45 PM
to Ryusuke Konishi, Andrew Cooper, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com, stable, Greg Kroah-Hartman
On Tue, 27 Jan 2026 04:07:04 +0900 Ryusuke Konishi <konishi...@gmail.com> wrote:

> Hi All,
>
> I am reporting a boot regression in v6.19-rc7 on an x86_32
> environment. The kernel hangs immediately after "Booting the kernel"
> and does not produce any early console output.
>
> A git bisect identified the following commit as the first bad commit:
> b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")

Thanks. b505f1944535 had cc:stable so let's add some cc's to alert
-stable maintainers.

I see that b505f1944535 prevented a Xen warning, but did it have any
other runtime effects? If not, a prompt revert may be the way to
proceed for now.

Dave Hansen

Jan 26, 2026, 3:25:28 PM
to Borislav Petkov, Ryusuke Konishi, Andrew Cooper, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On 1/26/26 11:54, Borislav Petkov wrote:
> [ 0.173437] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [ 0.175172] ------------[ cut here ]------------
> [ 0.176066] kernel BUG at arch/x86/mm/physaddr.c:70!

Take a look at kfence_init_pool_early(). It's riddled with __pa() which
calls down to __phys_addr() => slow_virt_to_phys().

The plain !present PTE is fine, but the inverted one trips up
slow_virt_to_phys(), I bet. The slow_virt_to_phys() only gets called on
when highmem is enabled (not when the memory is highmem) which is why
this is blowing up on 32-bit only.

The easiest hack/fix would be to just turn off kfence on 32-bit. I guess
the better fix would be to make kfence do its __pa() before it mucks
with the PTEs. The other option would be to either comprehend or ignore
those inverted PTEs.

Ugh.
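
To see the mismatch concretely, here is a toy userspace sketch of that
failure (addresses and flag bits are made up, and pte_pfn() is boiled down
to a masked shift): with 2-level paging nothing un-inverts the PFN, so
slow_virt_to_phys() disagrees with x - PAGE_OFFSET and the BUG_ON() fires.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_OFFSET	0xc0000000u	/* usual 32-bit lowmem base */
#define PAGE_MASK	(~0xfffu)

/* 2-level pte_pfn(): a plain masked shift, no notion of inversion. */
static uint32_t pte_pfn_2level(uint32_t pte)
{
	return (pte & PAGE_MASK) >> PAGE_SHIFT;
}

int main(void)
{
	uint32_t virt = 0xc1234000u;		/* made-up lowmem page */
	uint32_t expected_pfn = (virt - PAGE_OFFSET) >> PAGE_SHIFT;
	uint32_t pte = (expected_pfn << PAGE_SHIFT) | 0x63;	/* present */
	uint32_t inverted = ~pte;		/* what kfence_protect_page() wrote */

	/* Mirrors BUG_ON(slow_virt_to_phys(x) != phys_addr) in __phys_addr(). */
	printf("expected pfn %#x, pfn from inverted PTE %#x\n",
	       expected_pfn, pte_pfn_2level(inverted));
	return 0;
}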

Dave Hansen

Jan 26, 2026, 3:36:29 PM
to Andrew Morton, Ryusuke Konishi, Andrew Cooper, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com, stable, Greg Kroah-Hartman
On 1/26/26 12:24, Andrew Morton wrote:
> I see that b505f1944535 prevented a Xen warning, but did it have any
> other runtime effects? If not, a prompt revert may be the way to
> proceed for now.

Yeah, that's fine.

At the same time ... KFENCE folks: I wonder if you've been testing on
highmem and/or 32-bit x86 builds or if there's much value to keeping
KFENCE maintained there.


Ryusuke Konishi

Jan 26, 2026, 3:42:18 PM
to Andrew Cooper, Borislav Petkov, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Yes, after applying this, it started booting.
Leaving aside the discussion of the fix, I'll just share the test
result for now.

Regards,
Ryusuke Konishi

Andrew Cooper

Jan 26, 2026, 3:43:51 PM
to Ryusuke Konishi, Andrew Cooper, Borislav Petkov, Andrew Morton, Marco Elver, LKML, Alexander Potapenko, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, X86 ML, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
Thanks.  I'll put together a proper patch.

~Andrew

Andrew Cooper

Jan 26, 2026, 4:06:20 PM
to LKML, Andrew Cooper, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
The original patch inverted the PTE unconditionally to avoid
L1TF-vulnerable PTEs, but Linux doesn't make this adjustment in 2-level
paging.

Adjust the logic to use the flip_protnone_guard() helper, which is a nop on
2-level paging but inverts the address bits in all other paging modes.

This doesn't matter for the Xen aspect of the original change. Linux no
longer supports running 32bit PV under Xen, and Xen doesn't support running
any 32bit PV guests without using PAE paging.

Fixes: b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")
Reported-by: Ryusuke Konishi <konishi...@gmail.com>
Closes: https://lore.kernel.org/lkml/CAKFNMokwjw68ubYQM9WkzOuH...@mail.gmail.com/
Signed-off-by: Andrew Cooper <andrew....@citrix.com>
CC: Ryusuke Konishi <konishi...@gmail.com>
CC: Alexander Potapenko <gli...@google.com>
CC: Marco Elver <el...@google.com>
CC: Dmitry Vyukov <dvy...@google.com>
CC: Thomas Gleixner <tg...@linutronix.de>
CC: Ingo Molnar <mi...@redhat.com>
CC: Borislav Petkov <b...@alien8.de>
CC: Dave Hansen <dave....@linux.intel.com>
CC: x...@kernel.org
CC: "H. Peter Anvin" <h...@zytor.com>
CC: Andrew Morton <ak...@linux-foundation.org>
CC: Jann Horn <ja...@google.com>
CC: kasa...@googlegroups.com
CC: linux-...@vger.kernel.org
---
arch/x86/include/asm/kfence.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index acf9ffa1a171..40cf6a5d781d 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -42,7 +42,7 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
 	unsigned int level;
 	pte_t *pte = lookup_address(addr, &level);
-	pteval_t val;
+	pteval_t val, new;

 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
 		return false;
@@ -57,11 +57,12 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 		return true;

 	/*
-	 * Otherwise, invert the entire PTE.  This avoids writing out an
-	 * L1TF-vulnerable PTE (not present, without the high address bits
+	 * Otherwise, flip the Present bit, taking care to avoid writing an
+	 * L1TF-vulenrable PTE (not present, without the high address bits
 	 * set).
 	 */
-	set_pte(pte, __pte(~val));
+	new = val ^ _PAGE_PRESENT;
+	set_pte(pte, __pte(flip_protnone_guard(val, new, PTE_PFN_MASK)));

 	/*
 	 * If the page was protected (non-present) and we're making it

base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
--
2.39.5

Andrew Cooper

Jan 26, 2026, 4:08:44 PM
to LKML, Andrew Cooper, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
And I apparently can't spell.  I'll do a v2 immediately, seeing as this
is somewhat urgent.

~Andrew

Andrew Cooper

Jan 26, 2026, 4:10:54 PM
to LKML, Andrew Cooper, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Andrew Morton, Jann Horn, kasa...@googlegroups.com
v2:
* Fix a spelling mistake in the comment.
---
arch/x86/include/asm/kfence.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index acf9ffa1a171..dfd5c74ba41a 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -42,7 +42,7 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
 	unsigned int level;
 	pte_t *pte = lookup_address(addr, &level);
-	pteval_t val;
+	pteval_t val, new;

 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
 		return false;
@@ -57,11 +57,12 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 		return true;

 	/*
-	 * Otherwise, invert the entire PTE.  This avoids writing out an
+	 * Otherwise, flip the Present bit, taking care to avoid writing an
 	 * L1TF-vulnerable PTE (not present, without the high address bits
 	 * set).
 	 */
-	set_pte(pte, __pte(~val));
+	new = val ^ _PAGE_PRESENT;
+	set_pte(pte, __pte(flip_protnone_guard(val, new, PTE_PFN_MASK)));

 	/*
 	 * If the page was protected (non-present) and we're making it

base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
--
2.39.5

Andrew Morton

Jan 26, 2026, 4:24:55 PM
to Andrew Cooper, LKML, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x...@kernel.org, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On Mon, 26 Jan 2026 21:10:46 +0000 Andrew Cooper <andrew....@citrix.com> wrote:

> The original patch inverted the PTE unconditionally to avoid
> L1TF-vulnerable PTEs, but Linux doesn't make this adjustment in 2-level
> paging.
>
> Adjust the logic to use the flip_protnone_guard() helper, which is a nop on
> 2-level paging but inverts the address bits in all other paging modes.
>
> This doesn't matter for the Xen aspect of the original change. Linux no
> longer supports running 32bit PV under Xen, and Xen doesn't support running
> any 32bit PV guests without using PAE paging.

Great thanks. I'll add

Tested-by: Ryusuke Konishi <konishi...@gmail.com>

and, importantly,

Cc: <sta...@vger.kernel.org>

to help everything get threaded together correctly.


I'll queue this as a 6.19-rcX hotfix.

Borislav Petkov

Jan 26, 2026, 4:56:39 PM
to Andrew Morton, Andrew Cooper, LKML, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, x...@kernel.org, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On Mon, Jan 26, 2026 at 01:24:50PM -0800, Andrew Morton wrote:
> Great thanks. I'll add
>
> Tested-by: Ryusuke Konishi <konishi...@gmail.com>
>
> and, importantly,
>
> Cc: <sta...@vger.kernel.org>
>
> to help everything get threaded together correctly.
>
>
> I'll queue this as a 6.19-rcX hotfix.

You can also add

Tested-by: Borislav Petkov (AMD) <b...@alien8.de>

Works on real hw too.

Thx.

Andrew Cooper

Jan 26, 2026, 5:02:07 PM
to Borislav Petkov, Andrew Morton, Andrew Cooper, LKML, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, x...@kernel.org, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On 26/01/2026 9:56 pm, Borislav Petkov wrote:
> On Mon, Jan 26, 2026 at 01:24:50PM -0800, Andrew Morton wrote:
>> Great thanks. I'll add
>>
>> Tested-by: Ryusuke Konishi <konishi...@gmail.com>
>>
>> and, importantly,
>>
>> Cc: <sta...@vger.kernel.org>
>>
>> to help everything get threaded together correctly.
>>
>>
>> I'll queue this as a 6.19-rcX hotfix.
> You can add also
>
> Tested-by: Borislav Petkov (AMD) <b...@alien8.de>
>
> Works on a real hw too.

Thanks, and sorry for the breakage.

~Andrew

Borislav Petkov

Jan 26, 2026, 5:09:42 PM
to Andrew Cooper, Andrew Morton, LKML, Ryusuke Konishi, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Thomas Gleixner, Ingo Molnar, Dave Hansen, x...@kernel.org, H. Peter Anvin, Jann Horn, kasa...@googlegroups.com
On Mon, Jan 26, 2026 at 10:01:56PM +0000, Andrew Cooper wrote:
> Thanks, and sorry for the breakage.

Bah, no one cares about 32-bit. :-P