[syzbot] [mm?] general protection fault in dequeue_hugetlb_folio_nodemask (2)

syzbot

Jun 11, 2024, 6:34:27 AM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: d35b2284e966 Add linux-next specific files for 20240607
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=161352e2980000
kernel config: https://syzkaller.appspot.com/x/.config?x=d8bf5cd6bcca7343
dashboard link: https://syzkaller.appspot.com/bug?extid=569ed13f4054f271087b
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15eb5e86980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15db597e980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e0055a00a2cb/disk-d35b2284.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/192cbb8cf833/vmlinux-d35b2284.xz
kernel image: https://storage.googleapis.com/syzbot-assets/57804c9c9319/bzImage-d35b2284.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+569ed1...@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000489: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: probably user-memory-access in range [0x0000000000002448-0x000000000000244f]
CPU: 1 PID: 5095 Comm: syz-executor603 Not tainted 6.10.0-rc2-next-20240607-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:zonelist_zone_idx include/linux/mmzone.h:1613 [inline]
RIP: 0010:next_zones_zonelist include/linux/mmzone.h:1644 [inline]
RIP: 0010:first_zones_zonelist include/linux/mmzone.h:1670 [inline]
RIP: 0010:dequeue_hugetlb_folio_nodemask+0x193/0xe40 mm/hugetlb.c:1362
Code: 93 7a a0 ff c7 44 24 14 00 00 00 00 83 7c 24 40 00 0f 85 97 0c 00 00 48 83 7c 24 20 00 0f 85 45 09 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 28 84 c0 0f 85 58 09 00 00 44 8b 33 44 89 f7 8b 5c 24
RSP: 0018:ffffc900035bf720 EFLAGS: 00010002
RAX: 0000000000000489 RBX: 0000000000002448 RCX: ffff88807651bc00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc900035bf858 R08: ffffffff81f5e800 R09: fffff520006b7ee8
R10: dffffc0000000000 R11: fffff520006b7ee8 R12: 00000000ffffffff
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 000055558f377380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005fdeb8 CR3: 000000001cfda000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
alloc_hugetlb_folio_nodemask+0xae/0x3f0 mm/hugetlb.c:2603
memfd_alloc_folio+0x15e/0x390 mm/memfd.c:75
memfd_pin_folios+0x1066/0x1720 mm/gup.c:3864
udmabuf_create+0x658/0x11c0 drivers/dma-buf/udmabuf.c:353
udmabuf_ioctl_create drivers/dma-buf/udmabuf.c:420 [inline]
udmabuf_ioctl+0x304/0x4f0 drivers/dma-buf/udmabuf.c:451
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb1c16b4ab9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff21e63e48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb1c16b4ab9
RDX: 0000000020000000 RSI: 0000000040187542 RDI: 0000000000000003
RBP: 00007fb1c17275f0 R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:zonelist_zone_idx include/linux/mmzone.h:1613 [inline]
RIP: 0010:next_zones_zonelist include/linux/mmzone.h:1644 [inline]
RIP: 0010:first_zones_zonelist include/linux/mmzone.h:1670 [inline]
RIP: 0010:dequeue_hugetlb_folio_nodemask+0x193/0xe40 mm/hugetlb.c:1362
Code: 93 7a a0 ff c7 44 24 14 00 00 00 00 83 7c 24 40 00 0f 85 97 0c 00 00 48 83 7c 24 20 00 0f 85 45 09 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 28 84 c0 0f 85 58 09 00 00 44 8b 33 44 89 f7 8b 5c 24
RSP: 0018:ffffc900035bf720 EFLAGS: 00010002
RAX: 0000000000000489 RBX: 0000000000002448 RCX: ffff88807651bc00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc900035bf858 R08: ffffffff81f5e800 R09: fffff520006b7ee8
R10: dffffc0000000000 R11: fffff520006b7ee8 R12: 00000000ffffffff
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 000055558f377380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005fdeb8 CR3: 000000001cfda000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 93 xchg %eax,%ebx
1: 7a a0 jp 0xffffffa3
3: ff c7 inc %edi
5: 44 24 14 rex.R and $0x14,%al
8: 00 00 add %al,(%rax)
a: 00 00 add %al,(%rax)
c: 83 7c 24 40 00 cmpl $0x0,0x40(%rsp)
11: 0f 85 97 0c 00 00 jne 0xcae
17: 48 83 7c 24 20 00 cmpq $0x0,0x20(%rsp)
1d: 0f 85 45 09 00 00 jne 0x968
23: 48 89 d8 mov %rbx,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 42 0f b6 04 28 movzbl (%rax,%r13,1),%eax <-- trapping instruction
2f: 84 c0 test %al,%al
31: 0f 85 58 09 00 00 jne 0x98f
37: 44 8b 33 mov (%rbx),%r14d
3a: 44 89 f7 mov %r14d,%edi
3d: 8b .byte 0x8b
3e: 5c pop %rsp
3f: 24 .byte 0x24
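
The faulting address can be cross-checked against the register dump. Below is a minimal sketch, assuming the usual x86-64 generic KASAN layout (one shadow byte per 8 bytes, shadow offset 0xdffffc0000000000, which matches R13 above). The helper names kasan_shadow_addr() and check_report_values() are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 KASAN shadow offset; the same value sits in R13/R10 in the dump. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory, hence "shr $0x3,%rax". */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: address of the shadow byte checked before an access. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical self-check using only values printed in the report. */
static void check_report_values(void)
{
	uint64_t rbx = 0x2448;	/* pointer about to be dereferenced */

	/* RAX after the shift, as shown in the register dump. */
	assert((rbx >> KASAN_SHADOW_SCALE_SHIFT) == 0x489);
	/* The non-canonical address named in the Oops line. */
	assert(kasan_shadow_addr(rbx) == 0xdffffc0000000489ULL);
}
```

So the trapping movzbl is the KASAN shadow check for a read through RBX = 0x2448, and the reported range [0x2448-0x244f] is the 8-byte granule that shadow byte covers. A pointer that small is consistent with a small structure offset added to a bogus, near-NULL base.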


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Andrew Morton

Jun 11, 2024, 1:30:09 PM
to syzbot, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com, Vivek Kasireddy
On Tue, 11 Jun 2024 03:34:25 -0700 syzbot <syzbot+569ed1...@syzkaller.appspotmail.com> wrote:

> Hello,
>
> syzbot found the following issue on:

Thanks.

> Call Trace:
> <TASK>
> alloc_hugetlb_folio_nodemask+0xae/0x3f0 mm/hugetlb.c:2603
> memfd_alloc_folio+0x15e/0x390 mm/memfd.c:75
> memfd_pin_folios+0x1066/0x1720 mm/gup.c:3864
> udmabuf_create+0x658/0x11c0 drivers/dma-buf/udmabuf.c:353
> udmabuf_ioctl_create drivers/dma-buf/udmabuf.c:420 [inline]
> udmabuf_ioctl+0x304/0x4f0 drivers/dma-buf/udmabuf.c:451
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:907 [inline]
> __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f

I think we can pretty confidently point at the series "mm/gup:
Introduce memfd_pin_folios() for pinning memfd folios". I'll drop the
v14 series.

Oscar Salvador

Jun 11, 2024, 1:46:40 PM
to Andrew Morton, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com, Vivek Kasireddy
jfyi: I am trying to reproduce this locally.


--
Oscar Salvador
SUSE Labs

Oscar Salvador

Jun 11, 2024, 1:52:12 PM
to Andrew Morton, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com, Vivek Kasireddy
Actually, shouldn't memfd_alloc_folio() pass htlb_alloc_mask() instead
of GFP_USER to alloc_hugetlb_folio_nodemask()? Or at least use
GFP_HIGHUSER.

Oscar Salvador

Jun 12, 2024, 1:12:04 AM
to Andrew Morton, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com, Vivek Kasireddy
OK, I spotted the issue.
memfd_alloc_folio() was calling alloc_hugetlb_folio_nodemask() with
preferred_nid set to NUMA_NO_NODE, which is bad because
dequeue_hugetlb_folio_nodemask() will do:

zonelist = node_zonelist(nid, gfp_mask)

which will try to get node_zonelists for nid, but since nid is -1, the
lookup lands outside the node array, heh.

The patch below fixes the issue for me, but I think the right place
to fix this would be alloc_hugetlb_folio_nodemask(), so that it falls
back to numa_node_id() when preferred_nid == NUMA_NO_NODE, as a safety
net.
This way we catch it before exploding in case the caller was not careful
enough.

I will cook up a patch shortly.

Another thing is why memfd_alloc_folio() uses GFP_USER instead of
GFP_HIGHUSER, but that may be because I see that memfd_pin_folios() is
used by some DMA driver which might not have access to HIGH_MEMORY.

diff --git a/mm/memfd.c b/mm/memfd.c
index 8035c6325e3c..2692f0298adc 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -68,12 +68,13 @@ static void memfd_tag_pins(struct xa_state *xas)
 struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
 {
 #ifdef CONFIG_HUGETLB_PAGE
+	int nid = numa_node_id();
 	struct folio *folio;
 	int err;
 
 	if (is_file_hugepages(memfd)) {
 		folio = alloc_hugetlb_folio_nodemask(hstate_file(memfd),
-						     NUMA_NO_NODE,
+						     nid,
 						     NULL,
 						     GFP_USER,
 						     false);
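
For illustration only, the safety net described above (resolving NUMA_NO_NODE inside the allocator rather than in each caller) could be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the patch that was eventually sent: every model_* name and MODEL_MAX_NODES is hypothetical, and model_numa_node_id() simply pretends the current CPU sits on node 0.

```c
#include <assert.h>

#define NUMA_NO_NODE    (-1)
#define MODEL_MAX_NODES 4	/* hypothetical node count for the model */

/* Hypothetical stand-ins for the per-node data and its zonelist. */
struct model_zonelist { int first_zone_idx; };
struct model_pgdat { struct model_zonelist zonelist; };

static struct model_pgdat model_node_data[MODEL_MAX_NODES];

/* Stand-in for numa_node_id(); assumes we are running on node 0. */
static int model_numa_node_id(void)
{
	return 0;
}

/* The safety net: turn "no preference" into a real node id. */
static int model_resolve_nid(int preferred_nid)
{
	if (preferred_nid == NUMA_NO_NODE)
		preferred_nid = model_numa_node_id();
	return preferred_nid;
}

/* Models node_zonelist(): indexes the node array, so nid must be valid. */
static struct model_zonelist *model_node_zonelist(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return &model_node_data[nid].zonelist;
}

/* Models the allocation path with the guard applied before the lookup. */
static struct model_zonelist *model_alloc_path(int preferred_nid)
{
	return model_node_zonelist(model_resolve_nid(preferred_nid));
}
```

With the guard in place, model_alloc_path(NUMA_NO_NODE) returns node 0's zonelist instead of indexing the array at -1, and an explicit node id passes through unchanged.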

Kasireddy, Vivek

Jun 12, 2024, 1:55:32 AM
to Oscar Salvador, Andrew Morton, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com
Hi Oscar,
Thank you for fixing this issue!

>
> Another thing is why memfd_alloc_folio() uses GFP_USER instead of
> GFP_HIGHUSER, but that may be because I see that memfd_pin_folios() is
> used by some DMA driver which might not have access to HIGH_MEMORY.
Right, memfd_pin_folios() is used by the udmabuf driver for DMA, but
GFP_USER was chosen because I was following this code in gup.c:

	struct migration_target_control mtc = {
		.nid = NUMA_NO_NODE,
		.gfp_mask = GFP_USER | __GFP_NOWARN,
		.reason = MR_LONGTERM_PIN,
	};

	if (migrate_pages(movable_folio_list, alloc_migration_target,
			  NULL, (unsigned long)&mtc, MIGRATE_SYNC,
			  MR_LONGTERM_PIN, NULL)) {

where alloc_migration_target() does the following to allocate a hugetlb folio:

	if (folio_test_hugetlb(src)) {
		struct hstate *h = folio_hstate(src);

		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
		return alloc_hugetlb_folio_nodemask(h, nid,
				mtc->nmask, gfp_mask,
				htlb_allow_alloc_fallback(mtc->reason));

but I somehow missed the early check in alloc_migration_target(), where it does:

	if (nid == NUMA_NO_NODE)
		nid = folio_nid(src);

Thanks,
Vivek

Oscar Salvador

Jun 12, 2024, 3:46:42 AM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, syzkall...@googlegroups.com
...
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzk...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.

#syz test: https://github.com/leberus/linux hugetlb-dequeue-numa

syzbot

Jun 12, 2024, 4:16:05 AM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, osal...@suse.de, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+569ed1...@syzkaller.appspotmail.com

Tested on:

commit: f2a50aed mm/hugetlb: Guard dequeue_hugetlb_folio_nodem..
git tree: https://github.com/leberus/linux hugetlb-dequeue-numa
console output: https://syzkaller.appspot.com/x/log.txt?x=1089406c980000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa0ce06dcc735711
dashboard link: https://syzkaller.appspot.com/bug?extid=569ed13f4054f271087b
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.
Note: testing is done by a robot and is best-effort only.

syzbot

Jun 12, 2024, 7:58:08 AM
to air...@redhat.com, ak...@linux-foundation.org, kra...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, muchu...@linux.dev, osal...@suse.de, syzkall...@googlegroups.com, vivek.k...@intel.com
syzbot has bisected this issue to:

commit 265a5cde9462d3a816b18c6cf4f0a231f1c29d1b
Author: Vivek Kasireddy <vivek.k...@intel.com>
Date: Thu Apr 11 06:59:43 2024 +0000

udmabuf: pin the pages using memfd_pin_folios() API

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1617ab6a980000
start commit: d35b2284e966 Add linux-next specific files for 20240607
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=1517ab6a980000
console output: https://syzkaller.appspot.com/x/log.txt?x=1117ab6a980000
Reported-by: syzbot+569ed1...@syzkaller.appspotmail.com
Fixes: 265a5cde9462 ("udmabuf: pin the pages using memfd_pin_folios() API")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection