[syzbot] [mm?] WARNING in memory_failure


syzbot

Sep 23, 2025, 12:22:30 PM
to ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: b5db4add5e77 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=10edb8e2580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d2ae34a0711ff2f1
dashboard link: https://syzkaller.appspot.com/bug?extid=e6367ea2fdab6ed46056
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14160f12580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1361627c580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6eee2232d5c1/disk-b5db4add.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/a8b00f2f1234/vmlinux-b5db4add.xz
kernel image: https://storage.googleapis.com/syzbot-assets/fc0d466f156c/Image-b5db4add.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e6367e...@syzkaller.appspotmail.com

Injecting memory failure for pfn 0x104000 at process virtual address 0x20000000
------------[ cut here ]------------
WARNING: CPU: 1 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 1 UID: 0 PID: 6700 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc00000200868 x27: ffff700014828f20
x26: 1fffffbff8620001 x25: 05ffc0000020086d x24: 1fffffbff8620000
x23: fffffdffc3100008 x22: fffffdffc3100000 x21: fffffdffc3100000
x20: 0000000000000023 x19: dfff800000000000 x18: 1fffe00033793888
x17: ffff80008f7ee000 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff8620000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff8620001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080428910 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 1544
hardirqs last enabled at (1543): [<ffff80008b042cd0>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (1543): [<ffff80008b042cd0>] _raw_spin_unlock_irq+0x30/0x80 kernel/locking/spinlock.c:202
hardirqs last disabled at (1544): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (1528): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (1528): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (1397): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x104000: recovery action for huge page: Recovered
Injecting memory failure for pfn 0x131e00 at process virtual address 0x20200000
------------[ cut here ]------------
WARNING: CPU: 1 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 1 UID: 0 PID: 6700 Comm: syz.0.17 Tainted: G W syzkaller #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc00000200868 x27: ffff700014828f20
x26: 1fffffbff878f001 x25: 05ffc0000020086d x24: 1fffffbff878f000
x23: fffffdffc3c78008 x22: fffffdffc3c78000 x21: fffffdffc3c78000
x20: 0000000000000023 x19: dfff800000000000 x18: 00000000ffffffff
x17: ffff80009353a000 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff878f000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff878f001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080a549a8 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 2162
hardirqs last enabled at (2161): [<ffff800080ca8720>] __folio_split+0xf7c/0x1438 mm/huge_memory.c:3856
hardirqs last disabled at (2162): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (1726): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (1726): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (1547): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x131e00: recovery action for huge page: Recovered
Injecting memory failure for pfn 0x134200 at process virtual address 0x20400000
------------[ cut here ]------------
WARNING: CPU: 1 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 1 UID: 0 PID: 6700 Comm: syz.0.17 Tainted: G W syzkaller #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc00000200868 x27: ffff700014828f20
x26: 1fffffbff87a1001 x25: 05ffc0000020086d x24: 1fffffbff87a1000
x23: fffffdffc3d08008 x22: fffffdffc3d08000 x21: fffffdffc3d08000
x20: 0000000000000023 x19: dfff800000000000 x18: 1fffe00033793888
x17: 646461206c617574 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff87a1000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff87a1001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080a549a8 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 2768
hardirqs last enabled at (2767): [<ffff800080ca8720>] __folio_split+0xf7c/0x1438 mm/huge_memory.c:3856
hardirqs last disabled at (2768): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (2364): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (2364): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (2321): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x134200: recovery action for huge page: Recovered
Injecting memory failure for pfn 0x129000 at process virtual address 0x20600000
------------[ cut here ]------------
WARNING: CPU: 1 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 1 UID: 0 PID: 6700 Comm: syz.0.17 Tainted: G W syzkaller #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc00000200868 x27: ffff700014828f20
x26: 1fffffbff8748001 x25: 05ffc0000020086d x24: 1fffffbff8748000
x23: fffffdffc3a40008 x22: fffffdffc3a40000 x21: fffffdffc3a40000
x20: 0000000000000023 x19: dfff800000000000 x18: 1fffe00033793888
x17: 646461206c617574 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff8748000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff8748001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080a549a8 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 3024
hardirqs last enabled at (3023): [<ffff800080ca8720>] __folio_split+0xf7c/0x1438 mm/huge_memory.c:3856
hardirqs last disabled at (3024): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (2986): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (2986): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (2771): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x129000: recovery action for huge page: Recovered
Injecting memory failure for pfn 0x134600 at process virtual address 0x20800000
------------[ cut here ]------------
WARNING: CPU: 1 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 1 UID: 0 PID: 6700 Comm: syz.0.17 Tainted: G W syzkaller #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc0000020086c x27: ffff700014828f20
x26: 1fffffbff87a3001 x25: 05ffc0000020186d x24: 1fffffbff87a3000
x23: fffffdffc3d18008 x22: fffffdffc3d18000 x21: fffffdffc3d18000
x20: 0000000000000023 x19: dfff800000000000 x18: 1fffe00033793888
x17: ffff80009353a000 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff87a3000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff87a3001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080a549a8 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 3462
hardirqs last enabled at (3461): [<ffff800080ca8720>] __folio_split+0xf7c/0x1438 mm/huge_memory.c:3856
hardirqs last disabled at (3462): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (3064): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (3064): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (3027): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x134600: recovery action for huge page: Recovered
Injecting memory failure for pfn 0x134800 at process virtual address 0x20a00000
------------[ cut here ]------------
WARNING: CPU: 0 PID: 6700 at mm/memory-failure.c:2391 memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
Modules linked in:
CPU: 0 UID: 0 PID: 6700 Comm: syz.0.17 Tainted: G W syzkaller #0 PREEMPT
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
lr : memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391
sp : ffff8000a41478c0
x29: ffff8000a41479a0 x28: 05ffc0000020086c x27: ffff700014828f20
x26: 1fffffbff87a4001 x25: 05ffc0000020186d x24: 1fffffbff87a4000
x23: fffffdffc3d20008 x22: fffffdffc3d20000 x21: fffffdffc3d20000
x20: 0000000000000023 x19: dfff800000000000 x18: 1fffe0003378f088
x17: ffff80008f7ee000 x16: ffff80008052aa64 x15: 0000000000000001
x14: 1fffffbff87a4000 x13: 0000000000000000 x12: 0000000000000000
x11: ffff7fbff87a4001 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d7eedb80 x7 : ffff800080a549a8 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff800080cf5438
x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
Call trace:
memory_failure+0x18ec/0x1db4 mm/memory-failure.c:2391 (P)
madvise_inject_error mm/madvise.c:1475 [inline]
madvise_do_behavior+0x2c8/0x7c4 mm/madvise.c:1875
do_madvise+0x190/0x248 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__arm64_sys_madvise+0xa4/0xc0 mm/madvise.c:1985
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x254 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
irq event stamp: 3538
hardirqs last enabled at (3537): [<ffff800080ca8720>] __folio_split+0xf7c/0x1438 mm/huge_memory.c:3856
hardirqs last disabled at (3538): [<ffff80008b01a1ac>] el1_brk64+0x20/0x54 arch/arm64/kernel/entry-common.c:434
softirqs last enabled at (3500): [<ffff8000803da960>] softirq_handle_end kernel/softirq.c:425 [inline]
softirqs last enabled at (3500): [<ffff8000803da960>] handle_softirqs+0xaf8/0xc88 kernel/softirq.c:607
softirqs last disabled at (3465): [<ffff800080022028>] __do_softirq+0x14/0x20 kernel/softirq.c:613
---[ end trace 0000000000000000 ]---
Memory failure: 0x134800: recovery action for huge page: Recovered


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

David Hildenbrand

7:32 AM
to syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com, Zi Yan
We're running into the

WARN_ON(folio_test_large(folio));

in memory_failure().

Which is weird because we have the

if (folio_test_large(folio)) {
        /*
         * The flag must be set after the refcount is bumped
         * otherwise it may race with THP split.
         * And the flag can't be set in get_hwpoison_page() since
         * it is called by soft offline too and it is just called
         * for !MF_COUNT_INCREASED. So here seems to be the best
         * place.
         *
         * Don't need care about the above error handling paths for
         * get_hwpoison_page() since they handle either free page
         * or unhandlable page. The refcount is bumped iff the
         * page is a valid handlable page.
         */
        folio_set_has_hwpoisoned(folio);
        if (try_to_split_thp_page(p, false) < 0) {
                res = -EHWPOISON;
                kill_procs_now(p, pfn, flags, folio);
                put_page(p);
                action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
                goto unlock_mutex;
        }
        VM_BUG_ON_PAGE(!page_count(p), p);
        folio = page_folio(p);
}

before it.

But this is likely what I raised with Zi Yan recently: if try_to_split_thp_page()->split_huge_page()
silently decides to split to something that is not a small folio (the min_order_for_split() bit),
that changes the semantics of the function.

Likely split_huge_page() should have failed if the min_order prevents us from splitting to order-0,
or there would have to be some "parameter" that tells split_huge_page() what expectation (order) the
caller has.

We can check folio_test_large() after the split, but really, we should just not be splitting at
all if it doesn't serve our purpose.
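The failure mode described above can be modeled in a few lines. This is a toy sketch with made-up names (toy_folio, toy_split, warn_would_fire), not the kernel's actual types or API: a split that silently stops at the mapping's minimum order leaves a large folio behind, so the WARN_ON(folio_test_large(folio)) check in memory_failure() fires.

```c
#include <assert.h>

/* Toy stand-in for a folio; only the order matters here. */
struct toy_folio { int order; };

/*
 * Models the surprising behaviour discussed above: the split silently
 * stops at the mapping's minimum order instead of reaching order-0.
 * (Hypothetical helper, not the kernel's try_to_split_thp_page().)
 */
static void toy_split(struct toy_folio *folio, int mapping_min_order)
{
        folio->order = mapping_min_order;
}

/*
 * memory_failure() assumes a successful split leaves a small folio;
 * returns 1 when WARN_ON(folio_test_large(folio)) would fire.
 */
static int warn_would_fire(const struct toy_folio *folio)
{
        return folio->order > 0;
}
```

With mapping_min_order == 0 the split behaves as memory_failure() expects; with an LBS mapping (min order 2, say) the folio stays large and the warning triggers.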

--
Cheers

David / dhildenb

Zi Yan

11:03 AM
to David Hildenbrand, syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com
But LBS might want to split from a high order to fs min_order.

What I can think of is:
0. split code always does a split to allowed minimal order,
namely max(fs_min_order, order_from_caller);
1. if the split order cannot reach order_from_caller, it just returns failure,
so the caller will know about it;
2. for LBS code, when it sees a split failure, it should check the resulting
folio order against fs min_order. If the orders match, it regards it as
a success.

At least, most of the code does not need to be LBS aware. WDYT?
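Steps 0-2 above can be sketched minimally (hypothetical helper names; the real kernel API differs): the split targets max(fs_min_order, order_from_caller), fails when the filesystem minimum prevents reaching the caller's requested order, and an LBS-aware caller may still treat a folio already at fs_min_order as success.

```c
#include <assert.h>

/*
 * Steps 0/1: always target max(fs_min_order, order_from_caller), but
 * fail (negative return) when that target exceeds what the caller
 * asked for, so the caller learns about it. Hypothetical helper.
 */
static int split_target_order(int order_from_caller, int fs_min_order)
{
        int target = fs_min_order > order_from_caller ?
                        fs_min_order : order_from_caller;

        if (target != order_from_caller)
                return -1;      /* stand-in for an -EINVAL-style error */
        return target;
}

/*
 * Step 2: on a split failure, LBS code checks the resulting folio
 * order against fs_min_order and regards a match as success.
 */
static int lbs_split_succeeded(int split_ret, int folio_order,
                               int fs_min_order)
{
        return split_ret >= 0 || folio_order == fs_min_order;
}
```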

Best Regards,
Yan, Zi

David Hildenbrand

11:36 AM
to Zi Yan, syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com
Yes.

>
> What I can think of is:
> 0. split code always does a split to allowed minimal order,
> namely max(fs_min_order, order_from_caller);

Wouldn't max mean "allowed maximum order"?

I guess what you mean is "split to this order or smaller" -- min?

> 1. if split order cannot reach to order_from_caller, it just return fails,
> so most of the caller will know about it;

Yes, I think this would be the case here: if we cannot split to order-0,
we can just fail right away.

> 2. for LBS code, when it sees a split failure, it should check the resulting
> folio order against fs min_order. If the orders match, it regards it as
> a success.
>
> At least, most of the code does not need to be LBS aware. WDYT?

Is my understanding correct that the caller either wants to

(a) Split to order-0 -- no larger folio afterwards.

(b) Split to smallest order possible, which might be the mapping min order.

If so, we could keep the interface simpler than allowing callers to specify
arbitrary orders as a request.
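The two cases (a) and (b) could be captured in a deliberately small interface. A sketch with invented names (split_mode, pick_split_order), not a proposal of actual kernel identifiers: either the caller demands order-0 and the call fails when the mapping minimum makes that impossible, or the caller accepts the smallest order the mapping allows.

```c
#include <assert.h>

/* Invented names for the two caller intents described above. */
enum split_mode {
        SPLIT_TO_ORDER_0,       /* (a): no large folio may remain */
        SPLIT_TO_MIN_ORDER,     /* (b): smallest order possible,
                                   possibly the mapping min order */
};

/*
 * Returns the order the split would produce, or -1 when (a) cannot
 * be honoured because of a non-zero mapping minimum order.
 */
static int pick_split_order(enum split_mode mode, int mapping_min_order)
{
        if (mode == SPLIT_TO_ORDER_0)
                return mapping_min_order == 0 ? 0 : -1; /* fail early */
        return mapping_min_order;
}
```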

Zi Yan

12:33 PM
to David Hildenbrand, Luis Chamberlain, Pankaj Raghav (Samsung), syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com
But LBS imposes a fs_min_order that is not 0. When a caller asks
to split to 0, folio split code needs to use fs_min_order instead of 0.
Thus the max.

>
>> 1. if split order cannot reach to order_from_caller, it just return fails,
>> so most of the caller will know about it;
>
> Yes, I think this would be the case here: if we cannot split to order-0, we can just fail right away.
>
>> 2. for LBS code, when it sees a split failure, it should check the resulting
>> folio order against fs min_order. If the orders match, it regards it as
>> a success.
>>
>> At least, most of the code does not need to be LBS aware. WDYT?
>
> Is my understand correct that it's either that the caller wants to
>
> (a) Split to order-0 -- no larger folio afterwards.
>
> (b) Split to smallest order possible, which might be the mapping min order.

Right. IIRC, most callers are (a), since folio split was originally
called by code that cannot handle THPs (now large folios). For (b),
I actually wonder whether such a caller exists.

> If so, we could keep the interface simpler than allowing to specify arbitrary orders as request.

We might just need (a), since there is no caller of (b) in the kernel, except
that split_folio_to_order() is used for testing. There might be future uses
when the kernel wants to convert from THP to mTHP, but it seems we are
not there yet.



+Luis and Pankaj for their opinions on how LBS is going to use split folio
to any order.

Hi Luis and Pankaj,

It seems that bumping the split folio order from 0 to mapping_min_folio_order()
instead of simply failing the split call surprises some callers and causes
issues like the one reported in this email. I cannot think of any situation
where failing a folio split does not work. If LBS code wants to split, it
should supply mapping_min_folio_order(), right? Does such a caller exist?

Thanks.


Best Regards,
Yan, Zi

David Hildenbrand

1:06 PM
to Zi Yan, Luis Chamberlain, Pankaj Raghav (Samsung), syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com

>>
>>>
>>> What I can think of is:
>>> 0. split code always does a split to allowed minimal order,
>>> namely max(fs_min_order, order_from_caller);
>>
>> Wouldn't max mean "allowed maximum order" ?
>>
>> I guess what you mean is "split to this order or smaller" -- min?
>
> But LBS imposes a fs_min_order that is not 0. When a caller asks
> to split to 0, folio split code needs to use fs_min_order instead of 0.
> Thus the max.

I'd say, the point is that if someone wants to split to 0 but that is
impossible, then we should fail :)

>
>>
>>> 1. if split order cannot reach to order_from_caller, it just return fails,
>>> so most of the caller will know about it;
>>
>> Yes, I think this would be the case here: if we cannot split to order-0, we can just fail right away.
>>
>>> 2. for LBS code, when it sees a split failure, it should check the resulting
>>> folio order against fs min_order. If the orders match, it regards it as
>>> a success.
>>>
>>> At least, most of the code does not need to be LBS aware. WDYT?
>>
>> Is my understand correct that it's either that the caller wants to
>>
>> (a) Split to order-0 -- no larger folio afterwards.
>>
>> (b) Split to smallest order possible, which might be the mapping min order.
>
> Right. IIRC, most of callers are (a), since folio split was originally
> called by code that cannot handle THPs (now large folios). For (b),
> I actually wonder if there exists such a caller.
>
>> If so, we could keep the interface simpler than allowing to specify arbitrary orders as request.
>
> We might just need (a), since there is no caller of (b) in kernel, except
> split_folio_to_order() is used for testing. There might be future uses
> when kernel wants to convert from THP to mTHP, but it seems that we are
> not there yet.
>

Even better: maybe selected interfaces could just fail if the
min-order contradicts the request to split to a non-larger
(order-0) folio.

>
>
> +Luis and Pankaj for their opinions on how LBS is going to use split folio
> to any order.
>
> Hi Luis and Pankaj,
>
> It seems that bumping split folio order from 0 to mapping_min_folio_order()
> instead of simply failing the split folio call gives surprises to some
> callers and causes issues like the one reported by this email. I cannot think
> of any situation where failing a folio split does not work. If LBS code
> wants to split, it should supply mapping_min_folio_order(), right? Does
> such caller exist?
>
> Thanks.
>
>
> Best Regards,
> Yan, Zi
>


Zi Yan

1:52 PM
to David Hildenbrand, Luis Chamberlain, Pankaj Raghav (Samsung), syzbot, ak...@linux-foundation.org, linm...@huawei.com, linux-...@vger.kernel.org, linu...@kvack.org, nao.ho...@gmail.com, syzkall...@googlegroups.com
On 24 Sep 2025, at 13:05, David Hildenbrand wrote:

>>>
>>>>
>>>> What I can think of is:
>>>> 0. split code always does a split to allowed minimal order,
>>>> namely max(fs_min_order, order_from_caller);
>>>
>>> Wouldn't max mean "allowed maximum order" ?
>>>
>>> I guess what you mean is "split to this order or smaller" -- min?
>>
>> But LBS imposes a fs_min_order that is not 0. When a caller asks
>> to split to 0, folio split code needs to use fs_min_order instead of 0.
>> Thus the max.
>
> I'd say, the point is that if someone wants to split to 0 but that is impossible, then we should fail :)

I agree.

>
>>
>>>
>>>> 1. if split order cannot reach to order_from_caller, it just return fails,
>>>> so most of the caller will know about it;
>>>
>>> Yes, I think this would be the case here: if we cannot split to order-0, we can just fail right away.
>>>
>>>> 2. for LBS code, when it sees a split failure, it should check the resulting
>>>> folio order against fs min_order. If the orders match, it regards it as
>>>> a success.
>>>>
>>>> At least, most of the code does not need to be LBS aware. WDYT?
>>>
>>> Is my understand correct that it's either that the caller wants to
>>>
>>> (a) Split to order-0 -- no larger folio afterwards.
>>>
>>> (b) Split to smallest order possible, which might be the mapping min order.
>>
>> Right. IIRC, most of callers are (a), since folio split was originally
>> called by code that cannot handle THPs (now large folios). For (b),
>> I actually wonder if there exists such a caller.
>>
>>> If so, we could keep the interface simpler than allowing to specify arbitrary orders as request.
>>
>> We might just need (a), since there is no caller of (b) in kernel, except
>> split_folio_to_order() is used for testing. There might be future uses
>> when kernel wants to convert from THP to mTHP, but it seems that we are
>> not there yet.
>>
>
> Even better, then maybe selected interfaces could just fail if the min-order contradicts with the request to split to a non-larger (order-0) folio.

Yep. Let’s hear what Luis and Pankaj will say about this.

>
>>
>>
>> +Luis and Pankaj for their opinions on how LBS is going to use split folio
>> to any order.
>>
>> Hi Luis and Pankaj,
>>
>> It seems that bumping split folio order from 0 to mapping_min_folio_order()
>> instead of simply failing the split folio call gives surprises to some
>> callers and causes issues like the one reported by this email. I cannot think
>> of any situation where failing a folio split does not work. If LBS code
>> wants to split, it should supply mapping_min_folio_order(), right? Does
>> such caller exist?
>>
>> Thanks.
>>
>>
>> Best Regards,
>> Yan, Zi
>>
>
>
> --
> Cheers
>
> David / dhildenb


Best Regards,
Yan, Zi