mm: BUG in pgtable_pmd_page_dtor

Dmitry Vyukov

Nov 18, 2016, 5:19:51 AM
to Andrew Morton, Kirill A. Shutemov, Michal Hocko, Vlastimil Babka, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
Hello,

I've got the following BUG while running syzkaller on
a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (4.9-rc5). Unfortunately it's
not reproducible.

kernel BUG at ./include/linux/mm.h:1743!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 4049 Comm: syz-fuzzer Not tainted 4.9.0-rc5+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006ad028c0 task.stack: ffff8800667e0000
RIP: 0010:[<ffffffff8130e2ab>] [< inline >] pgtable_pmd_page_dtor include/linux/mm.h:1743
RIP: 0010:[<ffffffff8130e2ab>] [<ffffffff8130e2ab>] ___pmd_free_tlb+0x3db/0x5a0 arch/x86/mm/pgtable.c:74
RSP: 0018:ffff8800667e6908 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 1ffff1000ccfcd25 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffed000ccfcd10
RBP: ffff8800667e6a70 R08: 0000000000000001 R09: 0000000000000000
R10: dffffc0000000000 R11: 0000000000000001 R12: ffff8800667e6ef8
R13: ffff8800667e6a48 R14: ffffea0000e196c0 R15: 000000000003865b
FS: 00007f152a530700(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1514bff9d0 CR3: 0000000009821000 CR4: 00000000000006e0
DR0: 0000000000000400 DR1: 0000000000000400 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
0000000000000000 ffff88006ad030e0 ffff88006ad030b8 dffffc0000000000
0000000041b58ab3 ffffffff894db568 ffffffff8130ded0 ffffffff8156b2a0
0000000000000082 ffff88006ad030e0 1ffff1000ccfcd30 1ffff1000ccfcd38
Call Trace:
[< inline >] __pmd_free_tlb arch/x86/include/asm/pgalloc.h:110
[< inline >] free_pmd_range mm/memory.c:443
[< inline >] free_pud_range mm/memory.c:461
[<ffffffff81946458>] free_pgd_range+0xb98/0x1270 mm/memory.c:537
[<ffffffff81946da5>] free_pgtables+0x275/0x340 mm/memory.c:569
[<ffffffff81972761>] exit_mmap+0x281/0x4e0 mm/mmap.c:2942
[< inline >] __mmput kernel/fork.c:866
[<ffffffff813f24ce>] mmput+0x20e/0x4c0 kernel/fork.c:888
[< inline >] exit_mm kernel/exit.c:512
[<ffffffff814119a0>] do_exit+0x960/0x2640 kernel/exit.c:815
[<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931
[<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307
[<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
[<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[<ffffffff881479a6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Code: 10 00 00 4c 89 e7 e8 25 6c 63 00 e9 9b fd ff ff e8 0b 9e 3d 00
0f 0b e8 04 9e 3d 00 48 c7 c6 00 ac 27 88 4c 89 f7 e8 85 9e 62 00 <0f>
0b e8 9e 2d 6e 00 e9 1a fe ff ff 48 89 cf 48 89 8d b0 fe ff
RIP [< inline >] pgtable_pmd_page_dtor include/linux/mm.h:1743
RIP [<ffffffff8130e2ab>] ___pmd_free_tlb+0x3db/0x5a0 arch/x86/mm/pgtable.c:74
RSP <ffff8800667e6908>
---[ end trace 4ef4b70d88f62f8a ]---

Kirill A. Shutemov

Nov 18, 2016, 5:52:42 AM
to Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Vlastimil Babka, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
On Fri, Nov 18, 2016 at 11:19:30AM +0100, Dmitry Vyukov wrote:
> Hello,
>
> I've got the following BUG while running syzkaller on
> a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (4.9-rc5). Unfortunately it's
> not reproducible.

I don't think there's enough info to track it down :(

Let me know if you see this again.

--
Kirill A. Shutemov

Vlastimil Babka

Nov 24, 2016, 8:49:41 AM
to Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
On 11/18/2016 11:19 AM, Dmitry Vyukov wrote:
> Hello,
>
> I've got the following BUG while running syzkaller on
> a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (4.9-rc5). Unfortunately it's
> not reproducible.
>
> kernel BUG at ./include/linux/mm.h:1743!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN

Shouldn't there be also dump_page() output? Since you've hit this:
VM_BUG_ON_PAGE(page->pmd_huge_pte, page);

Anyway the output wouldn't contain the value of pmd_huge_pte or stuff
that's in union with it. I'd suggest adding a local patch that prints
this in the error case, in case the fuzzer hits it again.

Heck, it might even make sense to print raw contents of struct page in
dump_page() as a catch-all solution? Should I send a patch?
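
To illustrate the point about unions, here is a minimal user-space sketch, not kernel code. `struct fake_page`, `format_summary()` and `dtor_check_would_fire()` are hypothetical names invented for this example, and the layout is far simpler than the real struct page. It shows how a summary that prints only named fields can stay silent about the union word that the failing check actually tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct fake_page {
	unsigned long flags;
	union {
		/* Two interpretations of the same storage. */
		void *pmd_huge_pte;
		void *other_user;
	};
};

/* Summary in the spirit of dump_page(): named fields only. */
static int format_summary(const struct fake_page *p, char *buf, size_t len)
{
	assert(p != NULL && buf != NULL);
	return snprintf(buf, len, "flags: %#lx", p->flags);
}

/* Mirrors the condition in VM_BUG_ON_PAGE(page->pmd_huge_pte, page). */
static int dtor_check_would_fire(const struct fake_page *p)
{
	assert(p != NULL);
	return p->pmd_huge_pte != NULL;
}
```

If some other user of the union leaves a stale value behind, the check fires, yet the summary line still shows only the flags. A raw dump of every word would expose the stale value.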

Dmitry Vyukov

Nov 24, 2016, 9:23:47 AM
to Vlastimil Babka, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
On Thu, Nov 24, 2016 at 2:49 PM, Vlastimil Babka <vba...@suse.cz> wrote:
> On 11/18/2016 11:19 AM, Dmitry Vyukov wrote:
>>
>> Hello,
>>
>> I've got the following BUG while running syzkaller on
>> a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (4.9-rc5). Unfortunately it's
>> not reproducible.
>>
>> kernel BUG at ./include/linux/mm.h:1743!
>> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>
>
> Shouldn't there be also dump_page() output? Since you've hit this:
> VM_BUG_ON_PAGE(page->pmd_huge_pte, page);

Here it is:

[ 250.326131] page:ffffea0000e196c0 count:1 mapcount:0 mapping: (null) index:0x0
[ 250.343393] flags: 0x1fffc0000000000()
[ 250.345328] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[ 250.346780] ------------[ cut here ]------------
[ 250.347742] kernel BUG at ./include/linux/mm.h:1743!


> Anyway the output wouldn't contain the value of pmd_huge_pte or stuff that's
> in union with it. I'd suggest adding a local patch that prints this in the
> error case, in case the fuzzer hits it again.
>
> Heck, it might even make sense to print raw contents of struct page in
> dump_page() as a catch-all solution? Should I send a patch?

Yes, please send.
We are moving towards continuous build without local patches.

Vlastimil Babka

Nov 25, 2016, 3:42:11 AM
to Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
On 11/24/2016 03:23 PM, Dmitry Vyukov wrote:
> On Thu, Nov 24, 2016 at 2:49 PM, Vlastimil Babka <vba...@suse.cz> wrote:
>> On 11/18/2016 11:19 AM, Dmitry Vyukov wrote:
>>>
>>> Hello,
>>>
>>> I've got the following BUG while running syzkaller on
>>> a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (4.9-rc5). Unfortunately it's
>>> not reproducible.
>>>
>>> kernel BUG at ./include/linux/mm.h:1743!
>>> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>>
>>
>> Shouldn't there be also dump_page() output? Since you've hit this:
>> VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
>
> Here it is:
>
> [ 250.326131] page:ffffea0000e196c0 count:1 mapcount:0 mapping: (null) index:0x0
> [ 250.343393] flags: 0x1fffc0000000000()
> [ 250.345328] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
> [ 250.346780] ------------[ cut here ]------------
> [ 250.347742] kernel BUG at ./include/linux/mm.h:1743!

Yeah, as expected, not very useful for this particular BUG_ON :/

>> Anyway the output wouldn't contain the value of pmd_huge_pte or stuff that's
>> in union with it. I'd suggest adding a local patch that prints this in the
>> error case, in case the fuzzer hits it again.
>>
>> Heck, it might even make sense to print raw contents of struct page in
>> dump_page() as a catch-all solution? Should I send a patch?
>
> Yes, please send.
> We are moving towards continuous build without local patches.

Something like this?
-------8<-------
From 2ac2c9b83d7c4c8be076c24246865a2ed01f9032 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vba...@suse.cz>
Date: Fri, 25 Nov 2016 09:08:05 +0100
Subject: [PATCH] mm, debug: print raw struct page data in __dump_page()

The __dump_page() function is used when a page metadata inconsistency is
detected, either by standard runtime checks, or extra checks in CONFIG_DEBUG_VM
builds. It prints some of the relevant metadata, but not the whole struct page,
which is based on unions and whose interpretation depends on the context.

This means that sometimes, e.g., a VM_BUG_ON_PAGE() checks a certain field that
is not printed by __dump_page(), and the resulting bug report may then
lack clues that could help in determining the root cause. This patch solves
the problem by simply printing the whole struct page word by word, so no part
is missing, but the interpretation of the data is left to developers. This is
similar to e.g. x86_64 raw stack dumps.

Example output:

page:ffffea00000475c0 count:1 mapcount:0 mapping: (null) index:0x0
flags: 0x100000000000400(reserved)
raw struct page data:
0100000000000400 0000000000000000 0000000000000000 00000001ffffffff
ffffea00000475e0 ffffea00000475e0 0000000000000000 0000000000000000
page dumped because: VM_BUG_ON_PAGE(1)

Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/debug.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9feb699c5d25..9f67ad74d036 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason)
 	 * encode own info.
 	 */
 	int mapcount = PageSlab(page) ? 0 : page_mapcount(page);
+	int i;
+	const int words_per_line = (sizeof(unsigned long) == 8) ? 4 : 8;

 	pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
 		  page, page_ref_count(page), mapcount,
@@ -59,6 +61,21 @@ void __dump_page(struct page *page, const char *reason)

 	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);

+	pr_alert("raw struct page data:");
+	for (i = 0; i < sizeof(struct page) / sizeof(unsigned long); i++) {
+		unsigned long *word_ptr;
+
+		word_ptr = ((unsigned long *) page) + i;
+
+		if ((i % words_per_line) == 0) {
+			pr_cont("\n");
+			pr_alert(" %016lx", *word_ptr);
+		} else {
+			pr_cont(" %016lx", *word_ptr);
+		}
+	}
+	pr_cont("\n");
+
 	if (reason)
 		pr_alert("page dumped because: %s\n", reason);

--
2.10.2


Kirill A. Shutemov

Nov 25, 2016, 5:48:32 AM
to Vlastimil Babka, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, Andrey Ryabinin, syzkaller
> +	pr_alert("raw struct page data:");

Do we really need this line? I would like to keep dump_page() output as
compact as possible.

> +	for (i = 0; i < sizeof(struct page) / sizeof(unsigned long); i++) {
> +		unsigned long *word_ptr;
> +
> +		word_ptr = ((unsigned long *) page) + i;
> +
> +		if ((i % words_per_line) == 0) {
> +			pr_cont("\n");
> +			pr_alert(" %016lx", *word_ptr);
> +		} else {
> +			pr_cont(" %016lx", *word_ptr);

16 digits is a waste on a 32-bit system, and it will produce overly long lines.

Maybe print an 'unsigned long long' at a time?

> +		}
> +	}
> +	pr_cont("\n");
> +
>  	if (reason)
>  		pr_alert("page dumped because: %s\n", reason);
>
> --
> 2.10.2
>
>

--
Kirill A. Shutemov
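
One way to avoid hard-coding 16 digits can be sketched in user space, assuming the desired width is two hex digits per byte of `unsigned long`. This is only an illustration of the width concern raised above, not Kirill's `unsigned long long` suggestion and not what the thread settled on. `format_word()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one word zero-padded to two hex digits per byte, so the
 * field is 8 characters where unsigned long is 32 bits and 16 where
 * it is 64 bits. Returns the snprintf() result.
 */
static int format_word(char *buf, size_t len, unsigned long word)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%0*lx",
			(int)(2 * sizeof(unsigned long)), word);
}
```

The `*` in the format string takes the field width from an argument, so the same call adapts to the word size of the target.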

Andrey Ryabinin

Nov 25, 2016, 6:41:23 AM
to Vlastimil Babka, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller


On 11/25/2016 11:42 AM, Vlastimil Babka wrote:

>  	pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
>  		  page, page_ref_count(page), mapcount,
> @@ -59,6 +61,21 @@ void __dump_page(struct page *page, const char *reason)
>
>  	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);
>
> +	pr_alert("raw struct page data:");
> +	for (i = 0; i < sizeof(struct page) / sizeof(unsigned long); i++) {
> +		unsigned long *word_ptr;
> +
> +		word_ptr = ((unsigned long *) page) + i;
> +
> +		if ((i % words_per_line) == 0) {
> +			pr_cont("\n");
> +			pr_alert(" %016lx", *word_ptr);
> +		} else {
> +			pr_cont(" %016lx", *word_ptr);
> +		}
> +	}
> +	pr_cont("\n");
> +

Single call to print_hex_dump() could replace this loop.
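
For readers unfamiliar with the helper, the following user-space sketch approximates the output layout of such a call: rows of 32 bytes, grouped into `unsigned long`-sized words, each row carrying a prefix. `format_raw_row()`, `dump_raw()` and `ROW_BYTES` are hypothetical names, and the kernel helper's exact formatting rules are not reproduced here; treat it as an illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROW_BYTES 32	/* bytes per output line, as in the proposed call */

/*
 * Format one row as "<prefix><word> <word> ..." into dst.
 * Returns the string length, or -1 if dst is too small.
 */
static int format_raw_row(char *dst, size_t dstlen, const char *prefix,
			  const void *buf, size_t nbytes)
{
	const unsigned char *bytes = buf;
	size_t off;
	int n;

	assert(dst != NULL && prefix != NULL && buf != NULL);

	n = snprintf(dst, dstlen, "%s", prefix);
	if (n < 0 || (size_t)n >= dstlen)
		return -1;
	off = (size_t)n;

	for (size_t i = 0; i + sizeof(unsigned long) <= nbytes;
	     i += sizeof(unsigned long)) {
		unsigned long word;

		/* memcpy sidesteps alignment and aliasing concerns. */
		memcpy(&word, bytes + i, sizeof(word));
		n = snprintf(dst + off, dstlen - off, "%s%0*lx",
			     i ? " " : "", (int)(2 * sizeof(word)), word);
		if (n < 0 || (size_t)n >= dstlen - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}

/* Dump len bytes, ROW_BYTES per line, each line starting with prefix. */
static void dump_raw(FILE *out, const char *prefix, const void *buf, size_t len)
{
	const unsigned char *bytes = buf;
	char line[256];

	for (size_t i = 0; i < len; i += ROW_BYTES) {
		size_t chunk = len - i < ROW_BYTES ? len - i : ROW_BYTES;

		if (format_raw_row(line, sizeof(line), prefix,
				   bytes + i, chunk) >= 0)
			fprintf(out, "%s\n", line);
	}
}
```

Calling `dump_raw(stdout, "raw: ", &obj, sizeof(obj))` on any object prints its words in native byte order, one prefixed row per 32 bytes.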

Vlastimil Babka

Nov 25, 2016, 7:59:02 AM
to Andrey Ryabinin, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller
Ah, didn't know about that one, thanks!

This also addresses Kirill's comment:

-----8<-----
From 417467521d0a68fb70dc2d5bd151524bf0c79437 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vba...@suse.cz>
Date: Fri, 25 Nov 2016 09:08:05 +0100
Subject: [PATCH] mm, debug: print raw struct page data in __dump_page()

The __dump_page() function is used when a page metadata inconsistency is
detected, either by standard runtime checks, or extra checks in CONFIG_DEBUG_VM
builds. It prints some of the relevant metadata, but not the whole struct page,
which is based on unions and whose interpretation depends on the context.

This means that sometimes, e.g., a VM_BUG_ON_PAGE() checks a certain field that
is not printed by __dump_page(), and the resulting bug report may then
lack clues that could help in determining the root cause. This patch solves
the problem by simply printing the whole struct page word by word, so no part
is missing, but the interpretation of the data is left to developers. This is
similar to e.g. x86_64 raw stack dumps.

Example output:

page:ffffea00000475c0 count:1 mapcount:0 mapping: (null) index:0x0
flags: 0x100000000000400(reserved)
raw: 0100000000000400 0000000000000000 0000000000000000 00000001ffffffff
raw: ffffea00000475e0 ffffea00000475e0 0000000000000000 0000000000000000
page dumped because: VM_BUG_ON_PAGE(1)

[arya...@virtuozzo.com: suggested print_hex_dump()]
Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/debug.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9feb699c5d25..185c19bda078 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -59,6 +59,10 @@ void __dump_page(struct page *page, const char *reason)

 	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);

+	print_hex_dump(KERN_ALERT, "raw: ", DUMP_PREFIX_NONE,
+			32, (sizeof(unsigned long) == 8) ? 8 : 4,
+			page, sizeof(struct page), false);
+
 	if (reason)
 		pr_alert("page dumped because: %s\n", reason);

--
2.10.2



Kirill A. Shutemov

Nov 25, 2016, 8:08:00 AM
to Vlastimil Babka, Andrey Ryabinin, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller
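> +	print_hex_dump(KERN_ALERT, "raw: ", DUMP_PREFIX_NONE,
> +			32, (sizeof(unsigned long) == 8) ? 8 : 4,

That's a very fancy way to write sizeof(unsigned long) ;)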

--
Kirill A. Shutemov
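
The equivalence behind this remark can be checked at compile time. The sketch below assumes `unsigned long` is either 4 or 8 bytes wide; `groupsize_v2()` and `groupsize_simplified()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Group size as spelled in the second revision of the patch. */
static size_t groupsize_v2(void)
{
	return (sizeof(unsigned long) == 8) ? 8 : 4;
}

/* The simpler spelling pointed out in the review. */
static size_t groupsize_simplified(void)
{
	return sizeof(unsigned long);
}

/* Compile-time check; holds wherever unsigned long is 4 or 8 bytes. */
static_assert((size_t)((sizeof(unsigned long) == 8) ? 8 : 4) ==
	      sizeof(unsigned long),
	      "ternary and sizeof(unsigned long) disagree");
```

On a platform where `unsigned long` had some other width the assertion would fail to compile, which is exactly the case where the two spellings would differ.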

Vlastimil Babka

Nov 25, 2016, 9:08:16 AM
to Kirill A. Shutemov, Andrey Ryabinin, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller
On 11/25/2016 02:07 PM, Kirill A. Shutemov wrote:
>> --- a/mm/debug.c
>> +++ b/mm/debug.c
>> @@ -59,6 +59,10 @@ void __dump_page(struct page *page, const char *reason)
>>
>>  	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);
>>
>> +	print_hex_dump(KERN_ALERT, "raw: ", DUMP_PREFIX_NONE,
>> +			32, (sizeof(unsigned long) == 8) ? 8 : 4,
>
> That's a very fancy way to write sizeof(unsigned long) ;)

Ah, damnit, thanks.

----8<----
From 08d2ee803567c13e3de7ce7e19338fe5286cc6b8 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vba...@suse.cz>
Date: Fri, 25 Nov 2016 09:08:05 +0100
Subject: [PATCH v3] mm, debug: print raw struct page data in __dump_page()

The __dump_page() function is used when a page metadata inconsistency is
detected, either by standard runtime checks, or extra checks in CONFIG_DEBUG_VM
builds. It prints some of the relevant metadata, but not the whole struct page,
which is based on unions and whose interpretation depends on the context.

This means that sometimes, e.g., a VM_BUG_ON_PAGE() checks a certain field that
is not printed by __dump_page(), and the resulting bug report may then
lack clues that could help in determining the root cause. This patch solves
the problem by simply printing the whole struct page word by word, so no part
is missing, but the interpretation of the data is left to developers. This is
similar to e.g. x86_64 raw stack dumps.

Example output:

page:ffffea00000475c0 count:1 mapcount:0 mapping: (null) index:0x0
flags: 0x100000000000400(reserved)
raw: 0100000000000400 0000000000000000 0000000000000000 00000001ffffffff
raw: ffffea00000475e0 ffffea00000475e0 0000000000000000 0000000000000000
page dumped because: VM_BUG_ON_PAGE(1)

[arya...@virtuozzo.com: suggested print_hex_dump()]
Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
mm/debug.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9feb699c5d25..db1cd26d8752 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -59,6 +59,10 @@ void __dump_page(struct page *page, const char *reason)

 	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);

+	print_hex_dump(KERN_ALERT, "raw: ", DUMP_PREFIX_NONE, 32,
+		sizeof(unsigned long), page,
+		sizeof(struct page), false);

Kirill A. Shutemov

Nov 25, 2016, 9:15:59 AM
to Vlastimil Babka, Andrey Ryabinin, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller
Acked-by: Kirill A. Shutemov <kirill....@linux.intel.com>

--
Kirill A. Shutemov

Andrey Ryabinin

Nov 25, 2016, 11:02:45 AM
to Vlastimil Babka, Kirill A. Shutemov, Dmitry Vyukov, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Ingo Molnar, Joonsoo Kim, linu...@kvack.org, LKML, syzkaller
Acked-by: Andrey Ryabinin <arya...@virtuozzo.com>