[PATCH] arm64: cpufeature: Don't cpu_enable_mte() when KASAN_GENERIC is active

Yunseong Kim

Oct 8, 2025, 5:13:21 PM
to Catalin Marinas, Will Deacon, James Morse, Yeoreum Yun, Vincenzo Frascino, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, Yunseong Kim
When a kernel built with CONFIG_KASAN_GENERIC=y is booted on MTE-capable
hardware, a kernel panic occurs early in the boot process. The crash
happens when the CPU feature detection logic attempts to enable the Memory
Tagging Extension (MTE) via cpu_enable_mte().

Because the kernel is instrumented by the software-only Generic KASAN,
the code within cpu_enable_mte() itself is instrumented. This leads to
a fatal memory access fault within KASAN's shadow memory region when
the MTE initialization is attempted. Currently, the only workaround is
to boot with the "arm64.nomte" kernel parameter.

This bug was discovered during work on supporting the Debian debug kernel
on the Arm v9.2 RADXA Orion O6 board:

https://salsa.debian.org/kernel-team/linux/-/merge_requests/1670

Related kernel configs:

CONFIG_ARM64_AS_HAS_MTE=y
CONFIG_ARM64_MTE=y

CONFIG_KASAN_SHADOW_OFFSET=0xdfff800000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y
CONFIG_HAVE_ARCH_KASAN_HW_TAGS=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_KASAN_SW_TAGS=y

CONFIG_KASAN=y
CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
CONFIG_KASAN_GENERIC=y

The panic log clearly shows the conflict:

[ 0.000000] kasan: KernelAddressSanitizer initialized (generic)
[ 0.000000] psci: probing for conduit method from ACPI.
[ 0.000000] psci: PSCIv1.1 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.2
[ 0.000000] percpu: Embedded 486 pages/cpu s1950104 r8192 d32360 u1990656
[ 0.000000] pcpu-alloc: s1950104 r8192 d32360 u1990656 alloc=486*4096
[ 0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0] 06 [0] 07
[ 0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Address authentication (architected QARMA3 algorithm)
[ 0.000000] CPU features: detected: GICv3 CPU interface
[ 0.000000] CPU features: detected: HCRX_EL2 register
[ 0.000000] CPU features: detected: Virtualization Host Extensions
[ 0.000000] CPU features: detected: Memory Tagging Extension
[ 0.000000] CPU features: detected: Asymmetric MTE Tag Check Fault
[ 0.000000] CPU features: detected: Spectre-v4
[ 0.000000] CPU features: detected: Spectre-BHB
[ 0.000000] CPU features: detected: SSBS not fully self-synchronizing
[ 0.000000] Unable to handle kernel paging request at virtual address dfff800000000005
[ 0.000000] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000005
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x05: level 1 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [dfff800000000005] address between user and kernel address ranges
[ 0.000000] Internal error: Oops: 0000000096000005 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17+unreleased-debug-arm64 #1 PREEMPTLAZY Debian 6.17-1~exp1
[ 0.000000] pstate: 800000c9 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : cpu_enable_mte+0x104/0x440
[ 0.000000] lr : cpu_enable_mte+0xf4/0x440
[ 0.000000] sp : ffff800084f67d80
[ 0.000000] x29: ffff800084f67d80 x28: 0000000000000043 x27: 0000000000000001
[ 0.000000] x26: 0000000000000001 x25: ffff800084204008 x24: ffff800084203da8
[ 0.000000] x23: ffff800084204000 x22: ffff800084203000 x21: ffff8000865a8000
[ 0.000000] x20: fffffffffffffffe x19: fffffdffddaa6a00 x18: 0000000000000011
[ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 0.000000] x14: 0000000000000000 x13: 0000000000000001 x12: ffff700010a04829
[ 0.000000] x11: 1ffff00010a04828 x10: ffff700010a04828 x9 : dfff800000000000
[ 0.000000] x8 : ffff800085024143 x7 : 0000000000000001 x6 : ffff700010a04828
[ 0.000000] x5 : ffff800084f9d200 x4 : 0000000000000000 x3 : ffff8000800794ac
[ 0.000000] x2 : 0000000000000005 x1 : dfff800000000000 x0 : 000000000000002e
[ 0.000000] Call trace:
[ 0.000000] cpu_enable_mte+0x104/0x440 (P)
[ 0.000000] enable_cpu_capabilities+0x188/0x208
[ 0.000000] setup_boot_cpu_features+0x44/0x60
[ 0.000000] smp_prepare_boot_cpu+0x9c/0xb8
[ 0.000000] start_kernel+0xc8/0x528
[ 0.000000] __primary_switched+0x8c/0xa0
[ 0.000000] Code: 9100c280 d2d00001 f2fbffe1 d343fc02 (38e16841)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Signed-off-by: Yunseong Kim <y...@kzalloc.com>
---
arch/arm64/kernel/cpufeature.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5ed401ff79e3..a0a9fa1b376d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2340,6 +2340,24 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)

 	kasan_init_hw_tags_cpu();
 }
+
+static bool has_usable_mte(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	if (!has_cpuid_feature(entry, scope))
+		return false;
+
+	/*
+	 * MTE and Generic KASAN are mutually exclusive. Generic KASAN is a
+	 * software-only mode that is incompatible with the MTE hardware.
+	 * Do not enable MTE if Generic KASAN is active.
+	 */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC) && kasan_enabled()) {
+		pr_warn_once("MTE capability disabled due to Generic KASAN conflict\n");
+		return false;
+	}
+
+	return true;
+}
 #endif /* CONFIG_ARM64_MTE */

 static void user_feature_fixup(void)
@@ -2850,7 +2868,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.desc = "Memory Tagging Extension",
 		.capability = ARM64_MTE,
 		.type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_usable_mte,
 		.cpu_enable = cpu_enable_mte,
 		ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, MTE, MTE2)
 	},
@@ -2858,21 +2876,21 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.desc = "Asymmetric MTE Tag Check Fault",
 		.capability = ARM64_MTE_ASYMM,
 		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_usable_mte,
 		ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, MTE, MTE3)
 	},
 	{
 		.desc = "FAR on MTE Tag Check Fault",
 		.capability = ARM64_MTE_FAR,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_usable_mte,
 		ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, MTEFAR, IMP)
 	},
 	{
 		.desc = "Store Only MTE Tag Check",
 		.capability = ARM64_MTE_STORE_ONLY,
 		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_usable_mte,
 		ARM64_CPUID_FIELDS(ID_AA64PFR2_EL1, MTESTOREONLY, IMP)
 	},
 #endif /* CONFIG_ARM64_MTE */
--
2.51.0

Andrey Konovalov

Oct 8, 2025, 5:36:31 PM
to Yunseong Kim, Catalin Marinas, Will Deacon, James Morse, Yeoreum Yun, Vincenzo Frascino, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
I do not understand this. Why is Generic KASAN incompatible with MTE?
Running Generic KASAN in the kernel while having MTE enabled (and e.g.
used in userspace) seems like a valid combination.

The crash log above looks like a NULL-ptr-deref. On which line of code
does it happen?

Yunseong Kim

Oct 8, 2025, 6:28:27 PM
to Andrey Konovalov, Catalin Marinas, Will Deacon, James Morse, Yeoreum Yun, Vincenzo Frascino, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Hi Andrey,

On 10/9/25 6:36 AM, Andrey Konovalov wrote:
> On Wed, Oct 8, 2025 at 11:13 PM Yunseong Kim <y...@kzalloc.com> wrote:
>> [...]
> I do not understand this. Why is Generic KASAN incompatible with MTE?

My board wouldn't boot with the Debian debug kernel, so I enabled
earlycon=pl011,0x40d0000 and checked via the UART console.

> Running Generic KASAN in the kernel while having MTE enabled (and e.g.
> used in userspace) seems like a valid combination.

Then it must be caused by something else. Thank you for letting me know.

It seems to be occurring in the call path as follows:

cpu_enable_mte()
-> try_page_mte_tagging(ZERO_PAGE(0))
-> VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));

https://elixir.bootlin.com/linux/v6.17/source/arch/arm64/include/asm/mte.h#L83

> The crash log above looks like a NULL-ptr-deref. On which line of code
> does it happen?

Decoded stack trace here:

[ 0.000000] Unable to handle kernel paging request at virtual address dfff800000000005
[ 0.000000] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000005
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x05: level 1 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [dfff800000000005] address between user and kernel address ranges
[ 0.000000] Internal error: Oops: 0000000096000005 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17+unreleased-debug-arm64 #1 PREEMPTLAZY Debian 6.17-1~exp1
[ 0.000000] pstate: 800000c9 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : cpu_enable_mte (debian/build/build_arm64_none_debug-arm64/include/linux/page-flags.h:1065 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/include/asm/mte.h:83 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:2419 (discriminator 1))
[ 0.000000] lr : cpu_enable_mte (debian/build/build_arm64_none_debug-arm64/include/linux/page-flags.h:1065 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/include/asm/mte.h:83 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:2419 (discriminator 1))
[ 0.000000] sp : ffff800084f67d80
[ 0.000000] x29: ffff800084f67d80 x28: 0000000000000043 x27: 0000000000000001
[ 0.000000] x26: 0000000000000001 x25: ffff800084204008 x24: ffff800084203da8
[ 0.000000] x23: ffff800084204000 x22: ffff800084203000 x21: ffff8000865a8000
[ 0.000000] x20: fffffffffffffffe x19: fffffdffddaa6a00 x18: 0000000000000011
[ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 0.000000] x14: 0000000000000000 x13: 0000000000000001 x12: ffff700010a04829
[ 0.000000] x11: 1ffff00010a04828 x10: ffff700010a04828 x9 : dfff800000000000
[ 0.000000] x8 : ffff800085024143 x7 : 0000000000000001 x6 : ffff700010a04828
[ 0.000000] x5 : ffff800084f9d200 x4 : 0000000000000000 x3 : ffff8000800794ac
[ 0.000000] x2 : 0000000000000005 x1 : dfff800000000000 x0 : 000000000000002e
[ 0.000000] Call trace:
[ 0.000000] cpu_enable_mte (debian/build/build_arm64_none_debug-arm64/include/linux/page-flags.h:1065 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/include/asm/mte.h:83 (discriminator 1) debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:2419 (discriminator 1)) (P)
[ 0.000000] enable_cpu_capabilities (debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:3561 (discriminator 2))
[ 0.000000] setup_boot_cpu_features (debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:3888 debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/cpufeature.c:3906)
[ 0.000000] smp_prepare_boot_cpu (debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/smp.c:466)
[ 0.000000] start_kernel (debian/build/build_arm64_none_debug-arm64/init/main.c:929)
[ 0.000000] __primary_switched (debian/build/build_arm64_none_debug-arm64/arch/arm64/kernel/head.S:247)
[ 0.000000] Code: 9100c280 d2d00001 f2fbffe1 d343fc02 (38e16841)
All code
========
0: 9100c280 add x0, x20, #0x30
4: d2d00001 mov x1, #0x800000000000 // #140737488355328
8: f2fbffe1 movk x1, #0xdfff, lsl #48
c: d343fc02 lsr x2, x0, #3
10:* 38e16841 ldrsb w1, [x2, x1] <-- trapping instruction

Code starting with the faulting instruction
===========================================
0: 38e16841 ldrsb w1, [x2, x1]
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
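
The faulting address can be cross-checked against the register dump using
Generic KASAN's shadow mapping. Below is a minimal sketch in plain C. It
assumes the usual Generic KASAN scale of 8 bytes of memory per shadow byte
and the CONFIG_KASAN_SHADOW_OFFSET value from the config quoted earlier in
this thread. The helper names mem_to_shadow() and shadow_to_mem() are
illustrative only and are not the kernel's own.

```c
#include <stdint.h>

/* CONFIG_KASAN_SHADOW_OFFSET as listed in the config in this thread. */
#define KASAN_SHADOW_OFFSET      0xdfff800000000000ULL
/* Assumption: Generic KASAN maps 8 bytes of memory to 1 shadow byte. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte that covers a given memory address. */
static inline uint64_t mem_to_shadow(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* First memory address covered by a given shadow byte. */
static inline uint64_t shadow_to_mem(uint64_t shadow)
{
    return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

With x20 = 0xfffffffffffffffe, x20 + 0x30 wraps around to 0x2e (the value
in x0). mem_to_shadow(0x2e) gives 0xdfff800000000005, which is the faulting
address, and shadow_to_mem() of that address gives 0x28, the start of the
range that KASAN reports. So the trapping ldrsb is the instrumentation's
shadow check for an access near address zero, not an access made by the MTE
setup itself.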


If there is anything else you'd like me to check, or any direction you can
suggest, please let me know.

Thank you!

Yunseong

Yunseong Kim

Oct 8, 2025, 7:11:01 PM
to Catalin Marinas, James Morse, Will Deacon, Yeoreum Yun, Vincenzo Frascino, Andrey Konovalov, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
To summarize my situation: I thought the boot panic might be due to an
incompatibility between MTE and Generic KASAN, so I sent this patch.

However, it seems that the problem is related to the call path involving
the zero page. Also, I am curious why it works correctly on other machines.

On 10/9/25 7:28 AM, Yunseong Kim wrote:
> Hi Andrey,
>
> On 10/9/25 6:36 AM, Andrey Konovalov wrote:
>> On Wed, Oct 8, 2025 at 11:13 PM Yunseong Kim <y...@kzalloc.com> wrote:
>>> [...]
>> I do not understand this. Why is Generic KASAN incompatible with MTE?
>
> My board wouldn't boot with the Debian debug kernel, so I enabled
> earlycon=pl011,0x40d0000 and checked via the UART console.
>
>> Running Generic KASAN in the kernel while having MTE enabled (and e.g.
>> used in userspace) seems like a valid combination.
>
> Then it must be caused by something else. Thank you for letting me know.
>
> It seems to be occurring in the call path as follows:
>
> cpu_enable_mte()
> -> try_page_mte_tagging(ZERO_PAGE(0))
> -> VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>
> https://elixir.bootlin.com/linux/v6.17/source/arch/arm64/include/asm/mte.h#L83

-> page_folio(ZERO_PAGE(0))
-> (struct folio *)_compound_head(ZERO_PAGE(0))

https://elixir.bootlin.com/linux/v6.17/source/include/linux/page-flags.h#L307
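
As a rough illustration of how this path could yield the pointer seen in
x20, here is a simplified sketch of the tail-page check that
_compound_head() performs. The struct page_sketch type and
compound_head_sketch() are hypothetical stand-ins, not kernel definitions,
and the sketch ignores the kernel's fake-head handling. That the
compound_head field read as all-ones is an assumption that this thread does
not confirm.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct page: only the two
 * leading words that matter for this illustration. */
struct page_sketch {
    uint64_t flags;
    uint64_t compound_head; /* bit 0 set: tail page, rest is the head pointer */
};

/* Simplified version of the check in _compound_head(): when bit 0 is
 * set, the head pointer is the stored value minus 1; otherwise the
 * page is treated as its own head. */
static inline uint64_t compound_head_sketch(const struct page_sketch *page)
{
    uint64_t head = page->compound_head;

    if (head & 1)
        return head - 1;
    return (uint64_t)(uintptr_t)page;
}
```

If compound_head holds 0xffffffffffffffff, the function returns
0xfffffffffffffffe, which matches x20 in the register dump. Adding the 0x30
offset from the disassembly then gives 0x2e, the address whose KASAN shadow
lookup faults.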
Best regards,
Yunseong

Will Deacon

Oct 10, 2025, 8:29:38 AM
to Yunseong Kim, Catalin Marinas, James Morse, Yeoreum Yun, Vincenzo Frascino, Andrey Konovalov, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On Thu, Oct 09, 2025 at 08:10:53AM +0900, Yunseong Kim wrote:
> To summarize my situation: I thought the boot panic might be due to an
> incompatibility between MTE and Generic KASAN, so I sent this patch.
>
> However, it seems that the problem is related to the call path involving
> the zero page. Also, I am curious why it works correctly on other machines.
>
> On 10/9/25 7:28 AM, Yunseong Kim wrote:
> > Hi Andrey,
> >
> > On 10/9/25 6:36 AM, Andrey Konovalov wrote:
> >> On Wed, Oct 8, 2025 at 11:13 PM Yunseong Kim <y...@kzalloc.com> wrote:
> >>> [...]
> >> I do not understand this. Why is Generic KASAN incompatible with MTE?
> >
> > My board wouldn't boot with the Debian debug kernel, so I enabled
> > earlycon=pl011,0x40d0000 and checked via the UART console.
> >
> >> Running Generic KASAN in the kernel while having MTE enabled (and e.g.
> >> used in userspace) seems like a valid combination.
> >
> > Then it must be caused by something else. Thank you for letting me know.
> >
> > It seems to be occurring in the call path as follows:
> >
> > cpu_enable_mte()
> > -> try_page_mte_tagging(ZERO_PAGE(0))
> > -> VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
> >
> > https://elixir.bootlin.com/linux/v6.17/source/arch/arm64/include/asm/mte.h#L83
>
> -> page_folio(ZERO_PAGE(0))
> -> (struct folio *)_compound_head(ZERO_PAGE(0))
>
> https://elixir.bootlin.com/linux/v6.17/source/include/linux/page-flags.h#L307

Do you have:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=f620d66af3165838bfa845dcf9f5f9b4089bf508

?

Will

Yunseong Kim

Oct 10, 2025, 10:56:43 AM
to Will Deacon, Catalin Marinas, James Morse, Yeoreum Yun, Vincenzo Frascino, Andrey Konovalov, Marc Zyngier, Mark Brown, Oliver Upton, Ard Biesheuvel, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Hi Will,

Oh, there was a recent patch! Thanks a lot for letting me know.

The current Debian kernel is based on v6.17, so this patch isn't applied yet.
I should also let the Debian kernel team know in advance.

I'll apply and test it. Thanks again, Will!

Best regards,
Yunseong
