[syzbot] WARNING in alloc_charge_hpage

9 views
Skip to first unread message

syzbot

unread,
Oct 28, 2022, 3:49:46 AM10/28/22
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: a70385240892 Merge tag 'perf_urgent_for_v6.1_rc2' of git:/..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=13f4726a880000
kernel config: https://syzkaller.appspot.com/x/.config?x=ea03ca45176080bc
dashboard link: https://syzkaller.appspot.com/bug?extid=0044b22d177870ee974f
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13726f72880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b57436880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/5e17f1e83cf3/disk-a7038524.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/f8ef729877f7/vmlinux-a7038524.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0044b2...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Modules linked in:
CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156
madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611
madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066
madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240
do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419
do_madvise mm/madvise.c:1432 [inline]
__do_sys_madvise mm/madvise.c:1432 [inline]
__se_sys_madvise mm/madvise.c:1430 [inline]
__x64_sys_madvise+0x113/0x150 mm/madvise.c:1430
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6b48a4eef9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4
R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Oct 28, 2022, 5:49:33 AM10/28/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 28 Oct 2022 00:49:45 -0700
> syzbot found the following issue on:
>
> HEAD commit: a70385240892 Merge tag 'perf_urgent_for_v6.1_rc2' of git:/..
> git tree: upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=13f4726a880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ea03ca45176080bc
> dashboard link: https://syzkaller.appspot.com/bug?extid=0044b22d177870ee974f
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13726f72880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b57436880000

Check node online to quiesce the warning.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a70385240892

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -957,9 +957,12 @@ static int alloc_charge_hpage(struct pag
{
/* Only allocate from the target node */
gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
- GFP_TRANSHUGE) | __GFP_THISNODE;
+ GFP_TRANSHUGE);
int node = hpage_collapse_find_target_node(cc);

+ if (node_online(node))
+ gfp |= __GFP_THISNODE;
+
if (!hpage_collapse_alloc_page(hpage, gfp, node))
return SCAN_ALLOC_HUGE_PAGE_FAIL;
if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
--

Yang Shi

unread,
Oct 28, 2022, 1:37:14 PM10/28/22
to syzbot, Zach O'Keefe, Mel Gorman, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Fri, Oct 28, 2022 at 12:49 AM syzbot
<syzbot+0044b2...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: a70385240892 Merge tag 'perf_urgent_for_v6.1_rc2' of git:/..
> git tree: upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=13f4726a880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ea03ca45176080bc
> dashboard link: https://syzkaller.appspot.com/bug?extid=0044b22d177870ee974f
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13726f72880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b57436880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/5e17f1e83cf3/disk-a7038524.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/f8ef729877f7/vmlinux-a7038524.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+0044b2...@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
> WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
> WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963

It looks like the node requested by the allocation is offlined, and
the allocation is using __GFP_THISNODE. It doesn't sound like
khugepaged specific problem IIUC, I don't see other callsites do
anything to prevent the node from being offlined.

Shall we remove that warn? The page allocation call will return NULL
anyway. Or we could just ignore the warn?

syzbot

unread,
Oct 28, 2022, 4:37:23 PM10/28/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4090 } 2674 jiffies s: 2637 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: a7038524 Merge tag 'perf_urgent_for_v6.1_rc2' of git:/..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1543d716880000
kernel config: https://syzkaller.appspot.com/x/.config?x=ea03ca45176080bc
dashboard link: https://syzkaller.appspot.com/bug?extid=0044b22d177870ee974f
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1401516e880000

Zach O'Keefe

unread,
Oct 28, 2022, 5:44:53 PM10/28/22
to Yang Shi, syzbot, Mel Gorman, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Thanks for taking a look, Yang.

As for the warn -- I don't see the value in the warning at present. As you
mention, I don't think it's a khugepaged-thing (as, at least from skimming,
other callsites don't take care to expect that the node is online). The
allocation might fail -- for a number of reasons -- and I don't think
__GFP_THISNODE + offline node is so special to deserve a warn.

A different question, is if this particular callsite wants to defend against
that possibility. My understanding, which I state without data, is that the
performance of accessing remote memory outweighs the benefit of pmd-mapping the
memory -- and so I think we rightly fail here.

So, my vote is to just remove the warn.

Thanks,
Zach

Yang Shi

unread,
Oct 31, 2022, 1:30:56 PM10/31/22
to Zach O'Keefe, Michal Hocko, syzbot, Mel Gorman, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Thanks, Zach.

Did some archeology, it seems the warning was even stronger before
commit 8addc2d00fe17 ("mm: do not warn on offline nodes unless the
specific node is explicitly requested"). That commit softened the
warning for __GFP_THISNODE.

This warning seems not quite useful TBH because:
* There is no guarantee the node is online for __GFP_THISNODE context
* Kernel just fails the allocation regardless the warning, and it
looks all callsites handle the allocation failure gracefully.

So I think we'd better remove this warning.

Zach O'Keefe

unread,
Oct 31, 2022, 5:14:19 PM10/31/22
to Yang Shi, Michal Hocko, syzbot, Mel Gorman, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Hey Yang,

Yep, I did a similar lookup and came to the same conclusion.

Will a look at your patch (thank you for sending it out)

Best,
Zach
Reply all
Reply to author
Forward
0 new messages