[syzbot] [mm?] kernel BUG in collapse_scan_file


syzbot

Mar 19, 2026, 3:20:35 AM
to Liam.H...@oracle.com, ak...@linux-foundation.org, bao...@kernel.org, baoli...@linux.alibaba.com, da...@kernel.org, dev....@arm.com, lance...@linux.dev, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, npa...@redhat.com, ryan.r...@arm.com, syzkall...@googlegroups.com, z...@nvidia.com
Hello,

syzbot found the following issue on:

HEAD commit: 95c541ddfb08 Add linux-next specific files for 20260316
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15ccc216580000
kernel config: https://syzkaller.appspot.com/x/.config?x=ed431987028345c6
dashboard link: https://syzkaller.appspot.com/bug?extid=8961cb270ae74b4129fb
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f778da580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12cc006a580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c40f27ad73d8/disk-95c541dd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/bd811888f684/vmlinux-95c541dd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3b72363d7dbd/bzImage-95c541dd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8961cb...@syzkaller.appspotmail.com

node ffff88805d558b00 offset 0 parent ffff88805d558840 shift 0 count 3 values 0 array ffff88807a8195c0 list ffff88805d558b18 ffff88805d558b18 marks 0 0 0
------------[ cut here ]------------
kernel BUG at ./include/linux/xarray.h:1441!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6001 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
RIP: 0010:collapse_file mm/khugepaged.c:2055 [inline]
RIP: 0010:collapse_scan_file+0x4f98/0x5230 mm/khugepaged.c:2404
Code: ff 4c 89 e7 48 c7 c6 60 b2 dc 8b e8 82 62 f1 fe 90 0f 0b 48 85 db 0f 84 03 01 00 00 e8 71 e5 8f ff 48 89 df e8 a9 20 7b 09 90 <0f> 0b e8 61 e5 8f ff 48 89 df 48 c7 c6 60 b2 dc 8b e8 52 62 f1 fe
RSP: 0018:ffffc90003826e20 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88805d558b00 RCX: a13f20bd39c5a100
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc90003827130 R08: ffffc90003826ba7 R09: 1ffff92000704d74
R10: dffffc0000000000 R11: fffff52000704d75 R12: ffffea0001b678f0
R13: dffffc0000000000 R14: 0000000000000000 R15: ffffc90003827010
FS: 000055557e3c2500(0000) GS:ffff888125437000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000020000000b000 CR3: 000000007ac66000 CR4: 00000000003526f0
Call Trace:
<TASK>
collapse_single_pmd+0x22b/0x4510 mm/khugepaged.c:2437
madvise_collapse+0x34c/0x820 mm/khugepaged.c:2859
madvise_vma_behavior+0x1094/0x4460 mm/madvise.c:1362
madvise_walk_vmas+0x573/0xae0 mm/madvise.c:1711
madvise_do_behavior+0x386/0x540 mm/madvise.c:1927
do_madvise+0x1fa/0x2e0 mm/madvise.c:2020
__do_sys_madvise mm/madvise.c:2029 [inline]
__se_sys_madvise mm/madvise.c:2027 [inline]
__x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2027
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f90d419c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd50711398 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f90d4415fa0 RCX: 00007f90d419c799
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007f90d4232c99 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f90d4415fac R14: 00007f90d4415fa0 R15: 00007f90d4415fa0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
RIP: 0010:collapse_file mm/khugepaged.c:2055 [inline]
RIP: 0010:collapse_scan_file+0x4f98/0x5230 mm/khugepaged.c:2404
Code: ff 4c 89 e7 48 c7 c6 60 b2 dc 8b e8 82 62 f1 fe 90 0f 0b 48 85 db 0f 84 03 01 00 00 e8 71 e5 8f ff 48 89 df e8 a9 20 7b 09 90 <0f> 0b e8 61 e5 8f ff 48 89 df 48 c7 c6 60 b2 dc 8b e8 52 62 f1 fe
RSP: 0018:ffffc90003826e20 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88805d558b00 RCX: a13f20bd39c5a100
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc90003827130 R08: ffffc90003826ba7 R09: 1ffff92000704d74
R10: dffffc0000000000 R11: fffff52000704d75 R12: ffffea0001b678f0
R13: dffffc0000000000 R14: 0000000000000000 R15: ffffc90003827010
FS: 000055557e3c2500(0000) GS:ffff888125537000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8156602000 CR3: 000000007ac66000 CR4: 00000000003526f0


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

David Hildenbrand (Arm)

Mar 19, 2026, 3:22:55 AM
to syzbot, Liam.H...@oracle.com, ak...@linux-foundation.org, bao...@kernel.org, baoli...@linux.alibaba.com, dev....@arm.com, lance...@linux.dev, linux-...@vger.kernel.org, linu...@kvack.org, l...@kernel.org, npa...@redhat.com, ryan.r...@arm.com, syzkall...@googlegroups.com, z...@nvidia.com
On 3/19/26 08:20, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 95c541ddfb08 Add linux-next specific files for 20260316
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ccc216580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ed431987028345c6
> dashboard link: https://syzkaller.appspot.com/bug?extid=8961cb270ae74b4129fb
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f778da580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12cc006a580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/c40f27ad73d8/disk-95c541dd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/bd811888f684/vmlinux-95c541dd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/3b72363d7dbd/bzImage-95c541dd.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8961cb...@syzkaller.appspotmail.com

@Nico, maybe related to your changes?
--
Cheers,

David

Lance Yang

Mar 19, 2026, 4:05:53 AM
to syzbot, da...@kernel.org, l...@kernel.org, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
Ccing Willy

IIUC, this is a dup of the earlier report[1], which I looked into back
in January. The root cause is the same: collapse_file() calls
xas_lock_irq() without resetting the xas state first, tripping the
XAS_INVALID() assertion:

#define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)

static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}

Added by commit:

commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <wi...@infradead.org>
Date: Sun Dec 14 10:53:31 2025 -0500

XArray: Add extra debugging check to xas_lock and friends

While tracking down a recent bug, we discovered somewhere that had
forgotten to call xas_reset() before calling xas_lock(). Add a debug
check to be sure that doesn't happen in future and fix all the places in
the test suite which were carelessly doing just this.

Suggested-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <wi...@infradead.org>
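
The rule the new check enforces can be sketched as a small userspace C model. This is a hypothetical illustration, not kernel code: the model_* names, the simplified struct, and relock_after_unlock() are assumptions. It mirrors only two facts from include/linux/xarray.h: a reset state (XAS_RESTART) has low bits set in xa_node, and XAS_INVALID() complains when the state is still valid at lock time. The usual rationale is that a node cached before the lock was dropped may no longer be in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the rule enforced by XAS_INVALID().
 * The model_* names echo include/linux/xarray.h but are NOT the kernel API.
 */
struct model_node {
	unsigned long index;
};

/* Like XAS_RESTART: low bits set, so the state holds no usable node. */
#define MODEL_XAS_RESTART ((struct model_node *)3UL)

struct model_xa_state {
	struct model_node *xa_node; /* node cached by the last walk */
	bool locked;
};

/* Valid means xa_node is a real (aligned) node pointer or NULL. */
static bool model_xas_valid(const struct model_xa_state *xas)
{
	return ((uintptr_t)xas->xa_node & 3) == 0;
}

/* Forget the cached node so the next walk restarts from the root. */
static void model_xas_reset(struct model_xa_state *xas)
{
	xas->xa_node = MODEL_XAS_RESTART;
}

/*
 * Model of xas_lock_irq() with the new debug check. Where the kernel
 * would hit XA_NODE_BUG_ON(), this returns false so it can be tested.
 */
static bool model_xas_lock(struct model_xa_state *xas)
{
	if (model_xas_valid(xas))
		return false; /* a node cached before unlocking may be stale */
	xas->locked = true;
	return true;
}

static void model_xas_unlock(struct model_xa_state *xas)
{
	assert(xas->locked); /* only drop a lock we hold */
	xas->locked = false;
}

/* Take the lock, walk (caching a node), drop the lock, then relock. */
static bool relock_after_unlock(bool reset_first)
{
	static struct model_node node = { .index = 0 };
	struct model_xa_state xas = { .xa_node = MODEL_XAS_RESTART };

	model_xas_lock(&xas);        /* fresh state: always allowed */
	xas.xa_node = &node;         /* a walk leaves a real node cached */
	model_xas_unlock(&xas);      /* e.g. to sleep or allocate memory */
	if (reset_first)
		model_xas_reset(&xas);  /* what the commit message asks for */
	return model_xas_lock(&xas);
}
```

In this model, relock_after_unlock(false) is the pattern the check flags, and relock_after_unlock(true) follows the commit message's advice to call xas_reset() first. The model only captures the locking rule; it does not show whether the collapse_file() path is a real bug.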

I posted a HACK fix at the time[2], but David pointed out that Willy
had mentioned it likely needs more thought[3].

[1] https://lore.kernel.org/all/69757ea0.a00a022...@google.com/
[2] https://lore.kernel.org/all/20260125121001.3...@linux.dev/
[3] https://lore.kernel.org/all/7bce9231-714c-424a...@kernel.org/


Thanks,
Lance

Lorenzo Stoakes (Oracle)

Mar 19, 2026, 4:53:45 AM
to Lance Yang, syzbot, da...@kernel.org, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
On Thu, Mar 19, 2026 at 04:05:38PM +0800, Lance Yang wrote:
> Ccing Willy
>
> IIUC, this is a dup of the earlier report[1], which I looked into back
> in January. The root cause is the same: collapse_file() calls
> xas_lock_irq() without resetting the xas state first, tripping the
> XAS_INVALID() assertion:
>
> #define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)
>
> static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
> {
> 	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
> 	return xas;
> }
>
> Added by commit:
>
> commit 43b00759f21b10142094d1ae5ff65cbb368953a3
> Author: Matthew Wilcox (Oracle) <wi...@infradead.org>
> Date: Sun Dec 14 10:53:31 2025 -0500
>
> XArray: Add extra debugging check to xas_lock and friends
>
> While tracking down a recent bug, we discovered somewhere that had
> forgotten to call xas_reset() before calling xas_lock(). Add a debug
> check to be sure that doesn't happen in future and fix all the places in
> the test suite which were carelessly doing just this.
>
> Suggested-by: Linus Torvalds <torv...@linux-foundation.org>
> Signed-off-by: Matthew Wilcox (Oracle) <wi...@infradead.org>
>
> I posted a HACK fix at the time[2], but David pointed out that Willy
> had mentioned it likely needs more thought[3].

Hmm, we shouldn't leave this bug in place while working on a fancier fix??

Can we get _something_ going as an upstream fix? We can improve whatever we do
later right?

David, thoughts?
Cheers, Lorenzo

David Hildenbrand (Arm)

Mar 19, 2026, 5:00:12 AM
to Lorenzo Stoakes (Oracle), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
I recall Willy mentioning that the issue is likely a false positive.

IIUC, that commit is not upstream? So it only triggers in linux-next.

Which means:

1) If it's a false positive, upstream is not affected (no XA_NODE_BUG_ON)

2) If it's not a false positive, upstream is affected but does not
trigger the XA_NODE_BUG_ON

--
Cheers,

David

Lance Yang

Mar 19, 2026, 5:14:23 AM
to David Hildenbrand (Arm), Lorenzo Stoakes (Oracle), syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
Right. That commit does not appear to be upstream; I only see it in
linux-next :)

> Which means:
>
> 1) If it's a false positive, upstream is not affected (no XA_NODE_BUG_ON)
>
> 2) If it's not a false positive, upstream is affected but does not
> trigger the XA_NODE_BUG_ON

Yep. So this particular BUG_ON is not affecting upstream directly.

That said, syzbot will likely keep hitting it in linux-next and
generating noise for us until it is addressed there ...

David Hildenbrand (Arm)

Mar 19, 2026, 5:21:51 AM
to Lance Yang, Lorenzo Stoakes (Oracle), syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
Right, I assume this comes through Willy's tree, so Willy should
consider removing it for the time being.

--
Cheers,

David

Lorenzo Stoakes (Oracle)

Mar 19, 2026, 6:27:39 AM
to David Hildenbrand (Arm), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
OK guys, you've had your fun. Yes, I misspoke by saying upstream, and I apologise. No
need to go over that repeatedly.

However, linux-next ends up in upstream unless action is taken to fix patches
heading there.

We shouldn't, really, be just ignoring splatting kernels like that. At least
that's my personal point of view on it.

But anyway I really don't have the time or energy to try to track this down or
push on this further.

Thanks, Lorenzo

David Hildenbrand (Arm)

Mar 19, 2026, 7:00:05 AM
to Lorenzo Stoakes (Oracle), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
> We shouldn't, really, be just ignoring splatting kernels like that. At least
> that's my personal point of view on it.

Yes, we shouldn't.

>
> But anyway I really don't have the time or energy to try to track this down or
> push on this further.

The problem I'm having is that reverting the commit might hide a real
problem, and understanding whether there is a real problem requires ...
real work.

Willy said

"It *might* still be safe in this instance; I'll look
at this carefully for a bit and decide how best to fix it."

That likely didn't happen.

Later he said:

"But this is a long and complicated function (over 400 lines!) and I
don't know if fixing this one way or the other would serve to make other
bugs more likely or expose some future problem to the debugging code."

with

"This isn't the kind of thing where you can just jump in with a one line
patch and actually be helpful, sorry."

So I am clueless if there is a real problem there or whether it's just a
false positive. And whether (if it's a real problem) the simple fix
would actually be a good temporary fix ("make other bugs more likely").

I can try finding some time to look into that, but pagecache code is not
particularly the code I'm familiar with ...

--
Cheers,

David

Lorenzo Stoakes (Oracle)

Mar 19, 2026, 7:04:15 AM
to Vlastimil Babka, Lance Yang, syzbot, da...@kernel.org, wi...@infradead.org, Mark Brown, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
> That "needs more thought" was Jan 5. 2.5 months later, this is still
> messing up linux-next testing due to a known unfixed problem. Completely
> unacceptable. Willy, you need to drop the new bug check until the known
> problem is fixed.
>
> Mark, please drop https://git.infradead.org/users/willy/xarray.git from
> linux-next until it stops breaking linux-next. Thanks.

Thanks. Also, I don't see a Link: tag or any discussion of this patch anywhere
on-list (maybe I missed it?); the only thing a search on lore brings up is a bug
report about it from Jan 5th [0].

If this is heading for a Linus PR, could we have the patch actually posted to
lore somewhere so there can be some discussion?

And is there a way to ensure this doesn't land in the next merge window unless
it's fixed? Not sure through which tree it's going (Willy's?).

In general I'm very uncomfortable 'just leaving' splatting kernels in the
-next tree.

[0]: https://lore.kernel.org/all/aVvz3tYd...@mozart.vkv.me/

Thanks, Lorenzo

Vlastimil Babka

Mar 19, 2026, 7:06:12 AM
to Lance Yang, syzbot, da...@kernel.org, l...@kernel.org, wi...@infradead.org, Mark Brown, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
On 3/19/26 09:05, Lance Yang wrote:
That "needs more thought" was Jan 5. 2.5 months later, this is still
messing up linux-next testing due to a known unfixed problem. Completely
unacceptable. Willy, you need to drop the new bug check until the known
problem is fixed.

Mark, please drop https://git.infradead.org/users/willy/xarray.git from
linux-next until it stops breaking linux-next. Thanks.

Lorenzo Stoakes (Oracle)

Mar 19, 2026, 7:07:20 AM
to David Hildenbrand (Arm), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
On Thu, Mar 19, 2026 at 11:59:57AM +0100, David Hildenbrand (Arm) wrote:
> > We shouldn't, really, be just ignoring splatting kernels like that. At least
> > that's my personal point of view on it.
>
> Yes, we shouldn't.
>
> >
> > But anyway I really don't have the time or energy to try to track this down or
> > push on this further.
>
> The problem I'm having is that reverting the commit might hide a real
> problem, and understanding whether there is a real problem requires ...
> real work.
>
> Willy said
>
> "It *might* still be safe in this instance; I'll look
> at this carefully for a bit and decide how best to fix it."
>
> That likely didn't happen.

Yep, and in that case given the assert is suspect, the correct thing is to drop
the introduction of the potentially-buggy asserting logic rather than tear our
hair out as to whether there is something real or not.

>
> Later he said:
>
> "But this is a long and complicated function (over 400 lines!) and I
> don't know if fixing this one way or the other would serve to make other
> bugs more likely or expose some future problem to the debugging code."
>
> with
>
> "This isn't the kind of thing where you can just jump in with a one line
> patch and actually be helpful, sorry."


Yup, so again really I don't think the kernel should take the xarray debug patch
until it is discussed and verified as correct on-list.

See the discussion with Vlastimil.

>
> So I am clueless if there is a real problem there or whether it's just a
> false positive. And whether (if it's a real problem) the simple fix
> would actually be a good temporary fix ("make other bugs more likely").
>
> I can try finding some time to look into that, but pagecache code is not
> particularly the code I'm familiar with ...

I don't think we should waste our time on it; Willy's series should be yoinked
until it can prove itself for-sure well behaved.

>
> --
> Cheers,
>
> David

Thanks, Lorenzo

David Hildenbrand (Arm)

Mar 19, 2026, 7:10:43 AM
to Lorenzo Stoakes (Oracle), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
>> "This isn't the kind of thing where you can just jump in with a one line
>> patch and actually be helpful, sorry."
>
>
> Yup, so again really I don't think the kernel should take the xarray debug patch
> until it is discussed and verified as correct on-list.
>
> See the discussion with Vlastimil.

Agreed.

>
>>
>> So I am clueless if there is a real problem there or whether it's just a
>> false positive. And whether (if it's a real problem) the simple fix
>> would actually be a good temporary fix ("make other bugs more likely").
>>
>> I can try finding some time to look into that, but pagecache code is not
>> particularly the code I'm familiar with ...
>
> I don't think we should waste our time on it; Willy's series should be yoinked
> until it can prove itself for-sure well behaved.

I hear you. At the same time, it has "test case fails, so let's remove
the test case" vibes.

Well, arguably, it's worse than a test case that fails :(

I'm fine with dropping that asap. At the same time, I hope we won't lose
our reminder that something possibly bad is broken upstream.

--
Cheers,

David

Lorenzo Stoakes (Oracle)

Mar 19, 2026, 7:13:07 AM
to David Hildenbrand (Arm), Lance Yang, syzbot, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
I mean, hopefully once we have a _working_ version of this, it will get flagged
again in any case.

The uncertainty makes this worse than a bug report unfortunately because it's a
maybe/maybe not and there's literally nothing we can do with that :(

Mark Brown

Mar 19, 2026, 8:17:14 AM
to Vlastimil Babka, Lance Yang, syzbot, da...@kernel.org, l...@kernel.org, wi...@infradead.org, baoli...@linux.alibaba.com, npa...@redhat.com, linu...@kvack.org, bao...@kernel.org, ryan.r...@arm.com, syzkall...@googlegroups.com, dev....@arm.com, z...@nvidia.com, linux-...@vger.kernel.org, Liam.H...@oracle.com, ak...@linux-foundation.org
On Thu, Mar 19, 2026 at 11:56:21AM +0100, Vlastimil Babka wrote:
> On 3/19/26 09:05, Lance Yang wrote:

> > IIUC, this is a dup of the earlier report[1], which I looked into back
> > in January. The root cause is the same: collapse_file() calls
> > xas_lock_irq() without resetting the xas state first, tripping the
> > XAS_INVALID() assertion:
> >
> > #define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)
> >
> > static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
> > {
> > 	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
> > 	return xas;
> > }

...

> > I posted a HACK fix at the time[2], but David pointed out that Willy
> > had mentioned it likely needs more thought[3].

...

> That "needs more thought" was Jan 5. 2.5 months later, this is still
> messing up linux-next testing due to a known unfixed problem. Completely
> unacceptable. Willy, you need to drop the new bug check until the known
> problem is fixed.

> Mark, please drop https://git.infradead.org/users/willy/xarray.git from
> linux-next until it stops breaking linux-next. Thanks.

I only just saw this mail. I had already started running the merge beforehand and
don't 100% trust the scripts not to fall over if I make a change at this
point, so I can drop the tree from tomorrow if things aren't sorted by then. I see
the xarray tree hasn't been updated since before Christmas.