linux-next test error: BUG: using __this_cpu_read() in preemptible code in __mod_memcg_state


syzbot

Mar 7, 2020, 4:05:11 PM
to ak...@linux-foundation.org, cgr...@vger.kernel.org, han...@cmpxchg.org, linux-...@vger.kernel.org, linu...@kvack.org, mho...@kernel.org, syzkall...@googlegroups.com, vdavyd...@gmail.com
Hello,

syzbot found the following crash on:

HEAD commit: b86a6a24 Add linux-next specific files for 20200306
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1766b731e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9c79dccc623ccc6f
dashboard link: https://syzkaller.appspot.com/bug?extid=826543256ed3b8c37f62
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+826543...@syzkaller.appspotmail.com

check_preemption_disabled: 3 callbacks suppressed
BUG: using __this_cpu_read() in preemptible [00000000] code: syz-fuzzer/9432
caller is __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
check_preemption_disabled lib/smp_processor_id.c:47 [inline]
__this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
__mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
__split_huge_page mm/huge_memory.c:2575 [inline]
split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
split_huge_page include/linux/huge_mm.h:167 [inline]
madvise_free_huge_pmd+0x873/0xb90 mm/huge_memory.c:1766
madvise_free_pte_range+0x6ff/0x2650 mm/madvise.c:584
walk_pmd_range mm/pagewalk.c:89 [inline]
walk_pud_range mm/pagewalk.c:160 [inline]
walk_p4d_range mm/pagewalk.c:193 [inline]
walk_pgd_range mm/pagewalk.c:229 [inline]
__walk_page_range+0xcfb/0x2070 mm/pagewalk.c:331
walk_page_range+0x1bd/0x3a0 mm/pagewalk.c:427
madvise_free_single_vma+0x384/0x550 mm/madvise.c:731
madvise_dontneed_free mm/madvise.c:819 [inline]
madvise_vma mm/madvise.c:958 [inline]
do_madvise mm/madvise.c:1161 [inline]
do_madvise+0x5ba/0x1b80 mm/madvise.c:1084
__do_sys_madvise mm/madvise.c:1189 [inline]
__se_sys_madvise mm/madvise.c:1187 [inline]
__x64_sys_madvise+0xae/0x120 mm/madvise.c:1187
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x460bf7
Code: 8b 24 24 48 8b 6c 24 10 48 83 c4 18 c3 cc cc cc cc cc cc 48 8b 7c 24 08 48 8b 74 24 10 8b 54 24 18 48 c7 c0 1c 00 00 00 0f 05 <89> 44 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14
RSP: 002b:00007ffd6e086670 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 0000000000460bf7
RDX: 0000000000000008 RSI: 000000000000a000 RDI: 000000c00029a000
RBP: 00007ffd6e0866b0 R08: 000000c000200000 R09: 000000c0002a4000
R10: 00007fffffffffff R11: 0000000000000246 R12: 0000000000000007
R13: 00007f30cae546d0 R14: 0000000000000080 R15: 00000000000000fa
BUG: using __this_cpu_add() in preemptible [00000000] code: syz-fuzzer/9432
caller is __mod_memcg_state+0xca/0x1a0 mm/memcontrol.c:697
CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
check_preemption_disabled lib/smp_processor_id.c:47 [inline]
__this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
__mod_memcg_state+0xca/0x1a0 mm/memcontrol.c:697
__split_huge_page mm/huge_memory.c:2575 [inline]
split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
split_huge_page include/linux/huge_mm.h:167 [inline]
madvise_free_huge_pmd+0x873/0xb90 mm/huge_memory.c:1766
madvise_free_pte_range+0x6ff/0x2650 mm/madvise.c:584
walk_pmd_range mm/pagewalk.c:89 [inline]
walk_pud_range mm/pagewalk.c:160 [inline]
walk_p4d_range mm/pagewalk.c:193 [inline]
walk_pgd_range mm/pagewalk.c:229 [inline]
__walk_page_range+0xcfb/0x2070 mm/pagewalk.c:331
walk_page_range+0x1bd/0x3a0 mm/pagewalk.c:427
madvise_free_single_vma+0x384/0x550 mm/madvise.c:731
madvise_dontneed_free mm/madvise.c:819 [inline]
madvise_vma mm/madvise.c:958 [inline]
do_madvise mm/madvise.c:1161 [inline]
do_madvise+0x5ba/0x1b80 mm/madvise.c:1084
__do_sys_madvise mm/madvise.c:1189 [inline]
__se_sys_madvise mm/madvise.c:1187 [inline]
__x64_sys_madvise+0xae/0x120 mm/madvise.c:1187
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x460bf7
Code: 8b 24 24 48 8b 6c 24 10 48 83 c4 18 c3 cc cc cc cc cc cc 48 8b 7c 24 08 48 8b 74 24 10 8b 54 24 18 48 c7 c0 1c 00 00 00 0f 05 <89> 44 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14
RSP: 002b:00007ffd6e086670 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 0000000000460bf7
RDX: 0000000000000008 RSI: 000000000000a000 RDI: 000000c00029a000
RBP: 00007ffd6e0866b0 R08: 000000c000200000 R09: 000000c0002a4000
R10: 00007fffffffffff R11: 0000000000000246 R12: 0000000000000007
R13: 00007f30cae546d0 R14: 0000000000000080 R15: 00000000000000fa
BUG: using __this_cpu_write() in preemptible [00000000] code: syz-fuzzer/9432
caller is __mod_memcg_state+0x87/0x1a0 mm/memcontrol.c:702
CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
check_preemption_disabled lib/smp_processor_id.c:47 [inline]
__this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
__mod_memcg_state+0x87/0x1a0 mm/memcontrol.c:702
__split_huge_page mm/huge_memory.c:2575 [inline]
split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
split_huge_page include/linux/huge_mm.h:167 [inline]
madvise_free_huge_pmd+0x873/0xb90 mm/huge_memory.c:1766
madvise_free_pte_range+0x6ff/0x2650 mm/madvise.c:584
walk_pmd_range mm/pagewalk.c:89 [inline]
walk_pud_range mm/pagewalk.c:160 [inline]
walk_p4d_range mm/pagewalk.c:193 [inline]
walk_pgd_range mm/pagewalk.c:229 [inline]
__walk_page_range+0xcfb/0x2070 mm/pagewalk.c:331
walk_page_range+0x1bd/0x3a0 mm/pagewalk.c:427
madvise_free_single_vma+0x384/0x550 mm/madvise.c:731
madvise_dontneed_free mm/madvise.c:819 [inline]
madvise_vma mm/madvise.c:958 [inline]
do_madvise mm/madvise.c:1161 [inline]
do_madvise+0x5ba/0x1b80 mm/madvise.c:1084
__do_sys_madvise mm/madvise.c:1189 [inline]
__se_sys_madvise mm/madvise.c:1187 [inline]
__x64_sys_madvise+0xae/0x120 mm/madvise.c:1187
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x460bf7
Code: 8b 24 24 48 8b 6c 24 10 48 83 c4 18 c3 cc cc cc cc cc cc 48 8b 7c 24 08 48 8b 74 24 10 8b 54 24 18 48 c7 c0 1c 00 00 00 0f 05 <89> 44 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14
RSP: 002b:00007ffd6e086670 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 0000000000460bf7
RDX: 0000000000000008 RSI: 000000000000a000 RDI: 000000c00029a000
RBP: 00007ffd6e0866b0 R08: 000000c000200000 R09: 000000c0002a4000
R10: 00007fffffffffff R11: 0000000000000246 R12: 0000000000000007
R13: 00007f30cae546d0 R14: 0000000000000080 R15: 00000000000000fa


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
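For context on this class of report: the __this_cpu_*() accessors assume that the caller cannot be preempted and migrated to another CPU in the middle of the access. With CONFIG_DEBUG_PREEMPT, the kernel checks that assumption and prints "BUG: using __this_cpu_xxx() in preemptible code" when it fails. The C below is a simplified user-space model, not kernel code. Every model_* name is hypothetical. The real check_preemption_disabled() in lib/smp_processor_id.c also accepts other cases, such as a task pinned to a single CPU. This model keeps only the two conditions relevant here: preemption disabled, or interrupts disabled.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical user-space model (not kernel APIs) of the debug check
 * behind "BUG: using __this_cpu_read() in preemptible code".
 */
static int model_preempt_count; /* stands in for preempt_count() */
static int model_irqs_off;      /* stands in for irqs_disabled() */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/*
 * Returns 1 and prints a report if a __this_cpu_*-style access would be
 * flagged. This happens when the task is preemptible, so it could migrate
 * to another CPU between reading and writing "its" per-CPU counter.
 */
static int model_this_cpu_check(const char *op)
{
    if (model_preempt_count > 0 || model_irqs_off)
        return 0;
    printf("BUG: using %s() in preemptible code\n", op);
    return 1;
}

/* Usage: the same access is flagged when preemptible and silent otherwise. */
static void model_selftest(void)
{
    assert(model_this_cpu_check("__this_cpu_read") == 1);

    model_preempt_disable();
    assert(model_this_cpu_check("__this_cpu_read") == 0);
    model_preempt_enable();

    model_irqs_off = 1;
    assert(model_this_cpu_check("__this_cpu_add") == 0);
    model_irqs_off = 0;
}
```

The three reports in the log (read, add, write) correspond to the three per-CPU accesses inside a single __mod_memcg_state() call. All three fire because that call ran with neither preemption nor interrupts disabled.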

Kirill A. Shutemov

Mar 9, 2020, 5:24:26 AM
to syzbot, Alex Shi, ak...@linux-foundation.org, cgr...@vger.kernel.org, han...@cmpxchg.org, linux-...@vger.kernel.org, linu...@kvack.org, mho...@kernel.org, syzkall...@googlegroups.com, vdavyd...@gmail.com
It looks like a regression due to c8cba0cc2a80 ("mm/thp: narrow lru
locking").

Alex?

> madvise_free_huge_pmd+0x873/0xb90 mm/huge_memory.c:1766
> madvise_free_pte_range+0x6ff/0x2650 mm/madvise.c:584
> walk_pmd_range mm/pagewalk.c:89 [inline]
> walk_pud_range mm/pagewalk.c:160 [inline]
> walk_p4d_range mm/pagewalk.c:193 [inline]
> walk_pgd_range mm/pagewalk.c:229 [inline]
> __walk_page_range+0xcfb/0x2070 mm/pagewalk.c:331
> walk_page_range+0x1bd/0x3a0 mm/pagewalk.c:427
> madvise_free_single_vma+0x384/0x550 mm/madvise.c:731
> madvise_dontneed_free mm/madvise.c:819 [inline]
> madvise_vma mm/madvise.c:958 [inline]
> do_madvise mm/madvise.c:1161 [inline]
> do_madvise+0x5ba/0x1b80 mm/madvise.c:1084
> __do_sys_madvise mm/madvise.c:1189 [inline]
> __se_sys_madvise mm/madvise.c:1187 [inline]
> __x64_sys_madvise+0xae/0x120 mm/madvise.c:1187
> do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x460bf7
--
Kirill A. Shutemov

Alex Shi

Mar 9, 2020, 5:56:11 AM
to Kirill A. Shutemov, syzbot, ak...@linux-foundation.org, cgr...@vger.kernel.org, han...@cmpxchg.org, linux-...@vger.kernel.org, linu...@kvack.org, mho...@kernel.org, syzkall...@googlegroups.com, vdavyd...@gmail.com


On 2020/3/9 5:24 PM, Kirill A. Shutemov wrote:
>> check_preemption_disabled: 3 callbacks suppressed
>> BUG: using __this_cpu_read() in preemptible [00000000] code: syz-fuzzer/9432
>> caller is __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>> CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:77 [inline]
>> dump_stack+0x188/0x20d lib/dump_stack.c:118
>> check_preemption_disabled lib/smp_processor_id.c:47 [inline]
>> __this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
>> __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>> __split_huge_page mm/huge_memory.c:2575 [inline]
>> split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
>> split_huge_page include/linux/huge_mm.h:167 [inline]
> It looks like a regression due to c8cba0cc2a80 ("mm/thp: narrow lru
> locking").

Yes, I guess so.

In this patch, I boldly moved the lru unlock from just before
'remap_page(head);' up to just before 'ClearPageCompound(head);', which
is often checked under lru_lock. I wanted to find out which parts really
need to stay under lru_lock.

So reverting this patch, moving the unlock back, or moving it after
ClearPageCompound should fix this problem.

Over the weekend and today I tried hard to reproduce this bug on my two
machines, but still can't. :~(

Many thanks for giving it a try!

Thanks
Alex

Line 2605 of mm/huge_memory.c:

	spin_unlock_irqrestore(&pgdat->lru_lock, flags);

	ClearPageCompound(head);

	split_page_owner(head, HPAGE_PMD_ORDER);

	/* See comment in __split_huge_page_tail() */
	if (PageAnon(head)) {
		/* Additional pin to swap cache */
		if (PageSwapCache(head)) {
			page_ref_add(head, 2);
			xa_unlock(&swap_cache->i_pages);
		} else {
			page_ref_inc(head);
		}
	} else {
		/* Additional pin to page cache */
		page_ref_add(head, 2);
		xa_unlock(&head->mapping->i_pages);
	}

	remap_page(head);
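The ordering problem discussed in this thread can be illustrated with a small hypothetical model. spin_unlock_irqrestore() restores the saved interrupt state, which was enabled on entry, and it also re-enables preemption. Any non-atomic per-CPU statistics update that is moved below the unlock therefore runs in preemptible context. The sketch below is user-space C, not kernel code. All model_* names are assumptions made up for illustration. The "narrowed" function stands in for the ordering the thread describes, and the "original" function stands in for the ordering that a revert restores.

```c
#include <assert.h>

/* Hypothetical model state, not kernel APIs. */
static int model_irqs_off;      /* stands in for irqs_disabled() */
static long model_percpu_stat;  /* stands in for a per-CPU memcg counter */
static int model_reports;       /* count of "in preemptible code" reports */

/* Like spin_lock_irqsave(): save the irq state, then disable irqs. */
static void model_lock_irqsave(int *flags)
{
    *flags = model_irqs_off;
    model_irqs_off = 1;
}

/* Like spin_unlock_irqrestore(): restore the saved irq state. */
static void model_unlock_irqrestore(int flags)
{
    model_irqs_off = flags;
}

/*
 * Stands in for a __mod_memcg_state()-style update: a non-atomic per-CPU
 * read-modify-write that is only safe with irqs (or preemption) disabled.
 */
static void model_mod_stat(long delta)
{
    if (!model_irqs_off)
        model_reports++;
    model_percpu_stat += delta;
}

/* Narrowed locking: unlock first, update the statistic afterwards. */
static void model_split_narrowed(void)
{
    int flags;

    model_lock_irqsave(&flags);
    /* ... work that genuinely needs the lock ... */
    model_unlock_irqrestore(flags);
    model_mod_stat(-1);          /* irqs back on: this gets reported */
}

/* Original ordering: update while the irq-disabling lock is still held. */
static void model_split_original(void)
{
    int flags;

    model_lock_irqsave(&flags);
    model_mod_stat(-1);          /* irqs off: safe */
    model_unlock_irqrestore(flags);
}

/* Usage: only the narrowed ordering trips the check. */
static void model_order_selftest(void)
{
    model_split_original();
    assert(model_reports == 0);
    model_split_narrowed();
    assert(model_reports == 1);
}
```

A caller that must update such a counter outside the lock would have to disable interrupts or preemption itself around the update. Reverting the patch avoids the issue by keeping the update inside the locked region.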

Alex Shi

Mar 9, 2020, 9:27:06 AM
to Kirill A. Shutemov, syzbot, ak...@linux-foundation.org, cgr...@vger.kernel.org, han...@cmpxchg.org, linux-...@vger.kernel.org, linu...@kvack.org, mho...@kernel.org, syzkall...@googlegroups.com, vdavyd...@gmail.com


On 2020/3/9 5:56 PM, Alex Shi wrote:
>
>
> On 2020/3/9 5:24 PM, Kirill A. Shutemov wrote:
>>> check_preemption_disabled: 3 callbacks suppressed
>>> BUG: using __this_cpu_read() in preemptible [00000000] code: syz-fuzzer/9432
>>> caller is __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>>> CPU: 1 PID: 9432 Comm: syz-fuzzer Not tainted 5.6.0-rc4-next-20200306-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> Call Trace:
>>> __dump_stack lib/dump_stack.c:77 [inline]
>>> dump_stack+0x188/0x20d lib/dump_stack.c:118
>>> check_preemption_disabled lib/smp_processor_id.c:47 [inline]
>>> __this_cpu_preempt_check.cold+0x84/0x90 lib/smp_processor_id.c:64
>>> __mod_memcg_state+0x27/0x1a0 mm/memcontrol.c:689
>>> __split_huge_page mm/huge_memory.c:2575 [inline]
>>> split_huge_page_to_list+0x124b/0x3380 mm/huge_memory.c:2862
>>> split_huge_page include/linux/huge_mm.h:167 [inline]
>> It looks like a regression due to c8cba0cc2a80 ("mm/thp: narrow lru
>> locking").
>
> yes, I guess so.

Yes, it was a stupid mistake to move __mod_memcg_state out of the lock;
it needs to be called with the lock held.

Reverting this patch should be fine, since ClearPageCompound and the later
page_ref_inc may be related to an lru_list validity issue in release_pages.


Sorry for the disaster!

Alex

Dmitry Vyukov

Apr 18, 2020, 3:04:51 AM
to Alex Shi, Linux-Next Mailing List, Stephen Rothwell, Kirill A. Shutemov, syzbot, Andrew Morton, Cgroups, Johannes Weiner, LKML, Linux-MM, Michal Hocko, syzkaller-bugs, Vladimir Davydov
+linux-next, and Stephen for the currently open linux-next build/boot failure

Hi Alex,

What's the status of this? Was the guilty patch reverted? If so,
please mark it as invalid for syzbot; otherwise it still shows up as an
open bug.

Stephen Rothwell

Apr 18, 2020, 3:44:07 AM
to Dmitry Vyukov, Alex Shi, Linux-Next Mailing List, Kirill A. Shutemov, syzbot, Andrew Morton, Cgroups, Johannes Weiner, LKML, Linux-MM, Michal Hocko, syzkaller-bugs, Vladimir Davydov
Hi Dmitry,
The patch was removed from Andrew's tree in March and never made it to
Linus' tree. I can't figure out how to tell syzbot that the patch went away ...

--
Cheers,
Stephen Rothwell

Stephen Rothwell

Apr 18, 2020, 3:51:09 AM
to Dmitry Vyukov, Alex Shi, Linux-Next Mailing List, Kirill A. Shutemov, syzbot, Andrew Morton, Cgroups, Johannes Weiner, LKML, Linux-MM, Michal Hocko, syzkaller-bugs, Vladimir Davydov
Hi Stephen,
Let's try:

#syz invalid

--
Cheers,
Stephen Rothwell

Dmitry Vyukov

Apr 18, 2020, 4:02:50 AM
to Stephen Rothwell, Alex Shi, Linux-Next Mailing List, Kirill A. Shutemov, syzbot, Andrew Morton, Cgroups, Johannes Weiner, LKML, Linux-MM, Michal Hocko, syzkaller-bugs, Vladimir Davydov
This is correct, thanks!

You may now see "Status: closed as invalid on 2020/04/18 07:51" at:
https://syzkaller.appspot.com/bug?extid=826543256ed3b8c37f62

It no longer shows up as "open", and if this happens again syzbot
will report it (rather than assume it's still the old bug happening).

Stephen Rothwell

Apr 18, 2020, 4:14:08 AM
to Dmitry Vyukov, Alex Shi, Linux-Next Mailing List, Kirill A. Shutemov, syzbot, Andrew Morton, Cgroups, Johannes Weiner, LKML, Linux-MM, Michal Hocko, syzkaller-bugs, Vladimir Davydov
Hi Dmitry,

On Sat, 18 Apr 2020 10:02:36 +0200 Dmitry Vyukov <dvy...@google.com> wrote:
> >
> > #syz invalid
>
> This is correct, thanks!
>
> You may now see "Status: closed as invalid on 2020/04/18 07:51" at:
> https://syzkaller.appspot.com/bug?extid=826543256ed3b8c37f62
>
> It does not show up as "open" and if this will happen again syzbot
> will report it (rather than assume it's still the old bug happening).

OK, good, thanks.

--
Cheers,
Stephen Rothwell