System is broken in KASAN sw_tags mode during bootup


Baoquan He

Aug 18, 2025, 7:16:38 AM
to kasa...@googlegroups.com, ryabin...@gmail.com, gli...@google.com, andre...@gmail.com, dvy...@google.com, vincenzo...@arm.com, linu...@kvack.org
Hi,

The problem in the subject can be reproduced reliably on an hpe-apollo
arm64 system with the latest upstream kernel. I have the system at hand
now; the boot log and kernel config are attached for reference.

[ 89.257633] ==================================================================
[ 89.257646] BUG: KASAN: invalid-access in pcpu_alloc_noprof+0x42c/0x9a8
[ 89.257672] Write of size 528 at addr ddfffd7fbdc00000 by task systemd/1
[ 89.257685] Pointer tag: [dd], memory tag: [ca]
[ 89.257692]
[ 89.257703] CPU: 108 UID: 0 PID: 1 Comm: systemd Not tainted 6.17.0-rc2 #1 PREEMPT(voluntary)
[ 89.257719] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.16 07/29/2020
[ 89.257726] Call trace:
[ 89.257731] show_stack+0x30/0x90 (C)
[ 89.257753] dump_stack_lvl+0x7c/0xa0
[ 89.257769] print_address_description.isra.0+0x90/0x2b8
[ 89.257789] print_report+0x120/0x208
[ 89.257804] kasan_report+0xc8/0x110
[ 89.257823] kasan_check_range+0x7c/0xa0
[ 89.257835] __asan_memset+0x30/0x68
[ 89.257847] pcpu_alloc_noprof+0x42c/0x9a8
[ 89.257859] mem_cgroup_alloc+0x2bc/0x560
[ 89.257873] mem_cgroup_css_alloc+0x78/0x780
[ 89.257893] cgroup_apply_control_enable+0x230/0x578
[ 89.257914] cgroup_mkdir+0xf0/0x330
[ 89.257928] kernfs_iop_mkdir+0xb0/0x120
[ 89.257947] vfs_mkdir+0x250/0x380
[ 89.257965] do_mkdirat+0x254/0x298
[ 89.257979] __arm64_sys_mkdirat+0x80/0xc0
[ 89.257994] invoke_syscall.constprop.0+0x88/0x148
[ 89.258011] el0_svc_common.constprop.0+0x78/0x148
[ 89.258025] do_el0_svc+0x38/0x50
[ 89.258037] el0_svc+0x3c/0x168
[ 89.258050] el0t_64_sync_handler+0xa0/0xf0
[ 89.258063] el0t_64_sync+0x1b0/0x1b8
[ 89.258076]
[ 89.258080] The buggy address belongs to a 0-page vmalloc region starting at 0xcafffd7fbdc00000 allocated at pcpu_get_vm_areas+0x0/0x1da0
[ 89.258111] The buggy address belongs to the physical page:
[ 89.258117] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x881ddac
[ 89.258129] flags: 0xa5c00000000000(node=1|zone=2|kasantag=0x5c)
[ 89.258148] raw: 00a5c00000000000 0000000000000000 dead000000000122 0000000000000000
[ 89.258160] raw: 0000000000000000 f3ff000813efa600 00000001ffffffff 0000000000000000
[ 89.258168] raw: 00000000000fffff 0000000000000000
[ 89.258173] page dumped because: kasan: bad access detected
[ 89.258178]
[ 89.258181] Memory state around the buggy address:
[ 89.258192] Unable to handle kernel paging request at virtual address ffff7fd7fbdbffe0
[ 89.258199] KASAN: probably wild-memory-access in range [0xfffffd7fbdbffe00-0xfffffd7fbdbffe0f]
[ 89.258207] Mem abort info:
[ 89.258211] ESR = 0x0000000096000007
[ 89.258216] EC = 0x25: DABT (current EL), IL = 32 bits
[ 89.258223] SET = 0, FnV = 0
[ 89.258228] EA = 0, S1PTW = 0
[ 89.258232] FSC = 0x07: level 3 translation fault
[ 89.258238] Data abort info:
[ 89.258241] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 89.258246] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 89.258252] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 89.258260] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000008ff8b8f000
[ 89.258267] [ffff7fd7fbdbffe0] pgd=1000008ff0275403, p4d=1000008ff0275403, pud=1000008ff0274403, pmd=1000000899079403, pte=0000000000000000
[ 89.258296] Internal error: Oops: 0000000096000007 [#1] SMP
[ 89.540859] Modules linked in: i2c_dev
[ 89.544619] CPU: 108 UID: 0 PID: 1 Comm: systemd Not tainted 6.17.0-rc2 #1 PREEMPT(voluntary)
[ 89.553234] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.16 07/29/2020
[ 89.562970] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 89.569933] pc : __pi_memcpy_generic+0x24/0x230
[ 89.574472] lr : kasan_metadata_fetch_row+0x20/0x30
[ 89.579350] sp : ffff8000859d76c0
[ 89.582660] x29: ffff8000859d76c0 x28: 0000000000000100 x27: ffff008ec626d800
[ 89.589807] x26: 0000000000000210 x25: 0000000000000000 x24: fffffd7fbdbfff00
[ 89.596952] x23: ffff8000826cbeb8 x22: fffffd7fbdc00000 x21: 00000000fffffffe
[ 89.604097] x20: ffff800082682ee0 x19: fffffd7fbdbffe00 x18: 00000000049016ff
[ 89.611242] x17: 3030303030303030 x16: 2066666666666666 x15: 6631303030303030
[ 89.618386] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000001
[ 89.625530] x11: 687420646e756f72 x10: 0000000000000020 x9 : 0000000000000000
[ 89.632674] x8 : ffff78000859d766 x7 : 0000000000000000 x6 : 000000000000003a
[ 89.639818] x5 : ffff8000859d7728 x4 : ffff7fd7fbdbfff0 x3 : efff800000000000
[ 89.646963] x2 : 0000000000000010 x1 : ffff7fd7fbdbffe0 x0 : ffff8000859d7718
[ 89.654107] Call trace:
[ 89.656549] __pi_memcpy_generic+0x24/0x230 (P)
[ 89.661086] print_report+0x180/0x208
[ 89.664753] kasan_report+0xc8/0x110
[ 89.668333] kasan_check_range+0x7c/0xa0
[ 89.672258] __asan_memset+0x30/0x68
[ 89.675836] pcpu_alloc_noprof+0x42c/0x9a8
[ 89.679935] mem_cgroup_alloc+0x2bc/0x560
[ 89.683947] mem_cgroup_css_alloc+0x78/0x780
[ 89.688222] cgroup_apply_control_enable+0x230/0x578
[ 89.693191] cgroup_mkdir+0xf0/0x330
[ 89.696771] kernfs_iop_mkdir+0xb0/0x120
[ 89.700697] vfs_mkdir+0x250/0x380
[ 89.704103] do_mkdirat+0x254/0x298
[ 89.707596] __arm64_sys_mkdirat+0x80/0xc0
[ 89.711697] invoke_syscall.constprop.0+0x88/0x148
[ 89.716491] el0_svc_common.constprop.0+0x78/0x148
[ 89.721286] do_el0_svc+0x38/0x50
[ 89.724602] el0_svc+0x3c/0x168
[ 89.727746] el0t_64_sync_handler+0xa0/0xf0
[ 89.731933] el0t_64_sync+0x1b0/0x1b8
[ 89.735603] Code: f100805f 540003c8 f100405f 540000c3 (a9401c26)
[ 89.741695] ---[ end trace 0000000000000000 ]---
[ 89.746308] note: systemd[1] exi
=========================
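
For readers following the addresses in the log, here is a minimal sketch of the arithmetic, not kernel code. It assumes the usual sw_tags layout (tag in the top byte of the pointer, one shadow byte per 16 bytes of memory) and assumes that the value in x3 of the register dump is the shadow offset. The helper names are made up for illustration.

```python
# Values copied from the report above.
BUGGY_ADDR = 0xddfffd7fbdc00000    # "Write of size 528 at addr ..."
REGION_START = 0xcafffd7fbdc00000  # "vmalloc region starting at ..."
ROW_ADDR = 0xfffffd7fbdbffe00      # start of the "wild-memory-access" range
FAULT_ADDR = 0xffff7fd7fbdbffe0    # "Unable to handle kernel paging request at ..."

# Assumption: x3 in the register dump holds the shadow offset.
ASSUMED_SHADOW_OFFSET = 0xefff800000000000
SHADOW_SCALE_SHIFT = 4             # sw_tags: one shadow byte per 16 bytes
MASK64 = (1 << 64) - 1


def pointer_tag(addr: int) -> int:
    """Return the tag stored in the top byte of a tagged pointer."""
    return (addr >> 56) & 0xFF


def reset_tag(addr: int) -> int:
    """Set the top byte to 0xff, the tag of an untagged kernel pointer."""
    return addr | (0xFF << 56)


def mem_to_shadow(addr: int) -> int:
    """Map an untagged address to its shadow byte, wrapping at 64 bits."""
    return ((addr >> SHADOW_SCALE_SHIFT) + ASSUMED_SHADOW_OFFSET) & MASK64


if __name__ == "__main__":
    print(f"pointer tag:  {pointer_tag(BUGGY_ADDR):#x}")
    print(f"region tag:   {pointer_tag(REGION_START):#x}")
    print(f"untagged:     {reset_tag(BUGGY_ADDR):#x}")
    print(f"row shadow:   {mem_to_shadow(ROW_ADDR):#x}")
    print(f"below region: {ROW_ADDR < reset_tag(REGION_START)}")
```

Under these assumptions the pointer and the region start differ only in the tag byte (dd vs. ca), and the shadow address computed for the row at fffffd7fbdbffe00 equals the faulting address ffff7fd7fbdbffe0. That row lies below the start of the vmalloc region, which would be consistent with the second oops coming from the report code reading shadow memory for a neighbouring row that is not mapped.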


sw_tags-boot.log
sw_tags.config

Andrey Konovalov

Sep 6, 2025, 1:23:50 PM
to Baoquan He, kasa...@googlegroups.com, ryabin...@gmail.com, gli...@google.com, dvy...@google.com, vincenzo...@arm.com, linu...@kvack.org, Maciej Wieczor-Retman
Might be the same issue as the one being fixed by Maciej here:

https://lore.kernel.org/all/bcf18f220ef3b40e02f489fdb90fc7a5a153a3...@intel.com/
https://lore.kernel.org/all/3339d11e69c9127108fe8ef80a069b7b3bb071...@intel.com/

Perhaps it makes sense to split that fix out of the series and submit it
separately.

Baoquan He

Sep 13, 2025, 4:18:58 AM
to Andrey Konovalov, kasa...@googlegroups.com, ryabin...@gmail.com, gli...@google.com, dvy...@google.com, vincenzo...@arm.com, linu...@kvack.org, Maciej Wieczor-Retman
Thanks for the information. I finally got a machine to reproduce the
issue and test the patches. Oddly, the issue could not be reproduced at
first on the latest 6.17.0-rc5+; I am not sure whether I made a mistake
in my steps. Later I started over and could reproduce the problem
reliably, and I can confirm that Maciej's two patches fix it.

Will reply to Maciej's patches to add my Tested-by.

Thanks
Baoquan
