[syzbot] [mm?] kernel BUG in const_folio_flags

syzbot

Mar 21, 2024, 12:04:25 AM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3b9148...@syzkaller.appspotmail.com

veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
__rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:315!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
folio_test_hugetlb include/linux/page-flags.h:875 [inline]
PageHuge+0x219/0x2b0 mm/hugetlb.c:2174
isolate_migratepages_block+0x4a0/0x5110 mm/compaction.c:1004
isolate_migratepages mm/compaction.c:2182 [inline]
compact_zone+0x1a5c/0x4280 mm/compaction.c:2629
kcompactd_do_work+0x340/0x720 mm/compaction.c:3100
kcompactd+0x8d7/0xde0 mm/compaction.c:3199
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Muchun Song

Mar 21, 2024, 5:54:09 AM
to syzbot, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Andrew Morton, LKML, Linux-MM, syzkall...@googlegroups.com


> On Mar 21, 2024, at 12:04, syzbot <syzbot+3b9148...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
> dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3b9148...@syzkaller.appspotmail.com
>
> veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
> rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
> __rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
> rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
> rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
> ------------[ cut here ]------------
> kernel BUG at include/linux/page-flags.h:315!

There is some more page dump information in the console output:

[ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
[ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
[ 61.374455][ T42] page_type: 0xffffffff()
[ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
[ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000

Alright, the page is freed (with a refcount of 0).

> invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315

The RIP is in const_folio_flags() (called from folio_test_hugetlb()):

VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);

It is reasonable for this check to fire, because the page has been
freed (PG_head is not set in this case).

The comment on folio_test_hugetlb() says "Caller should have a
reference on the folio", so since commit 9c5ccf2db04b the caller of
PageHuge() should take a reference before calling
folio_test_hugetlb(). However, when the caller does not hold an extra
reference on @page, a true return from PageHuge(@page) does not
guarantee that @page is still a HugeTLB page. So hitting the WARN
seems acceptable; should we remove it? I am not sure. Cc'ing more experts.
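
To make the quoted condition concrete, here is a minimal userspace sketch of it. This is not kernel code: `struct chk_page`, `CHK_PG_HEAD`, and `chk_would_fire()` are hypothetical names, and the bit position is made up, since the real layout depends on the kernel configuration.

```c
#include <stdbool.h>

/* Hypothetical bit position, for illustration only. */
enum { CHK_PG_HEAD = 1 << 0 };

/* Stand-in for struct page, reduced to the flags word. */
struct chk_page {
	unsigned long flags;
};

/*
 * Mirrors the condition quoted above:
 *   n > 0 && !test_bit(PG_head, &page->flags)
 * It returns true when the assertion would trigger.
 */
static bool chk_would_fire(const struct chk_page *page, unsigned int n)
{
	return n > 0 && !(page->flags & CHK_PG_HEAD);
}
```

With the flags word of a freed page (head bit clear) and n = 1, `chk_would_fire()` returns true; with n = 0 it never does, which is why only lookups past the first page are affected in this model.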

Thanks.

David Hildenbrand

Mar 21, 2024, 5:58:08 AM
to Muchun Song, syzbot, Oscar Salvador, Matthew Wilcox, Andrew Morton, LKML, Linux-MM, syzkall...@googlegroups.com
Isn't this the problem Willy is fixing with the upcoming
folio_test_hugetlb() changes?

We cannot always grab a folio reference on hugetlb folios: free hugetlb
folios have a refcount of 0.
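
The constraint can be sketched in userspace with a "take a reference unless the count is zero" helper. This is only a model under stated assumptions: `struct ref_folio` and `ref_try_get()` are hypothetical names, and the kernel's own helpers (such as folio_try_get()) are implemented differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a folio, reduced to its reference count. */
struct ref_folio {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is currently non-zero.
 * A count of zero means "free", so the attempt must fail.
 */
static bool ref_try_get(struct ref_folio *folio)
{
	int old = atomic_load(&folio->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&folio->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model a free folio (count 0) can never be pinned, so a caller that must inspect it cannot rely on holding a reference first.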

--
Cheers,

David / dhildenb

Oscar Salvador

Mar 21, 2024, 7:00:41 AM
to Muchun Song, syzbot, David Hildenbrand, Matthew Wilcox, Andrew Morton, LKML, Linux-MM, syzkall...@googlegroups.com
On Thu, Mar 21, 2024 at 05:49:49PM +0800, Muchun Song wrote:
> There is some more page dump information in the console output:
>
> [ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
> [ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
> [ 61.374455][ T42] page_type: 0xffffffff()
> [ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
> [ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000
>
> Alright, the page is freed (with a refcount of 0).

Yes, basically the page changed between folio_test_large() (which
returned true for PG_head) and the call to const_folio_flags() (which
then found PG_head clear).
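
The window can be replayed deterministically in a single-threaded userspace model. Everything here is an assumption made for illustration: the `race_` names, the bit positions, and the choice to keep the hugetlb bit in the page after the head (suggested by the n > 0 condition in the trace). The `between` callback stands in for whatever another CPU does between the two reads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit positions, for illustration only. */
enum { RACE_PG_HEAD = 1 << 0, RACE_PG_HUGETLB = 1 << 1 };

enum race_outcome { RACE_NOT_LARGE, RACE_CHECK_FIRED, RACE_NOT_HUGETLB, RACE_IS_HUGETLB };

struct race_page {
	unsigned long flags;
};

/* page[0] models the head page, page[1] the page after it. */
static enum race_outcome race_test_hugetlb(struct race_page *page,
					   void (*between)(struct race_page *))
{
	/* First read: the folio_test_large()-style check of the head bit. */
	if (!(page[0].flags & RACE_PG_HEAD))
		return RACE_NOT_LARGE;

	/* Window in which the page may be freed concurrently. */
	if (between)
		between(page);

	/* Second read: the head bit is checked again before indexing. */
	if (!(page[0].flags & RACE_PG_HEAD))
		return RACE_CHECK_FIRED;

	return (page[1].flags & RACE_PG_HUGETLB) ? RACE_IS_HUGETLB : RACE_NOT_HUGETLB;
}

/* Stand-in for a concurrent free: clears the flags of both pages. */
static void race_free(struct race_page *page)
{
	page[0].flags = 0;
	page[1].flags = 0;
}
```

Calling `race_test_hugetlb(pages, race_free)` on a pair that starts out as a hugetlb head and tail yields `RACE_CHECK_FIRED`, the modelled equivalent of the BUG in the report; with no callback the same pair yields `RACE_IS_HUGETLB`.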

As David pointed out, Willy is working on making PageHuge() more
robust [1].


[1] https://lore.kernel.org/linux-mm/20240314012506....@infradead.org/

--
Oscar Salvador
SUSE Labs

Muchun Song

Mar 21, 2024, 11:24:58 PM
to Oscar Salvador, syzbot, David Hildenbrand, Matthew Wilcox, Andrew Morton, LKML, Linux-MM, syzkall...@googlegroups.com
Sorry, I am not on the CC list, so I didn't know this. But thank
you and David for this information, I think it could fix this problem.

Muchun,
Thanks.
