mm: GPF in __insert_vmap_area


Dmitry Vyukov

Sep 3, 2016, 8:15:36 AM
to Andrew Morton, David Rientjes, Vladimir Davydov, ziju...@zoho.com, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller
Hello,

While running the syzkaller fuzzer I got the following GPF:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 4268 Comm: syz-executor Not tainted 4.8.0-rc3-next-20160825+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006a6527c0 task.stack: ffff880052630000
RIP: 0010:[<ffffffff82e1ccd6>] [<ffffffff82e1ccd6>]
__list_add_valid+0x26/0xd0 lib/list_debug.c:23
RSP: 0018:ffff880052637a18 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90001c87000
RDX: 0000000000000001 RSI: ffff88001344cdb0 RDI: 0000000000000008
RBP: ffff880052637a30 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff8a5deee0 R12: ffff88006cc47230
R13: ffff88001344cdb0 R14: ffff88006cc47230 R15: 0000000000000000
FS: 00007fbacc97e700(0000) GS:ffff88006d200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020de7000 CR3: 000000003c4d2000 CR4: 00000000000006e0
DR0: 000000000000001e DR1: 000000000000001e DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
ffff88006cc47200 ffff88001344cd98 ffff88006cc47200 ffff880052637a78
ffffffff817bc6d1 ffff88006cc47208 ffffed000d988e41 ffff88006cc47208
ffff88006cc3e680 ffffc900035b7000 ffffc900035a7000 ffff88006cc47200
Call Trace:
[< inline >] __list_add_rcu include/linux/rculist.h:51
[< inline >] list_add_rcu include/linux/rculist.h:78
[<ffffffff817bc6d1>] __insert_vmap_area+0x1c1/0x3c0 mm/vmalloc.c:340
[<ffffffff817bf544>] alloc_vmap_area+0x614/0x890 mm/vmalloc.c:458
[<ffffffff817bf8a8>] __get_vm_area_node+0xe8/0x340 mm/vmalloc.c:1377
[<ffffffff817c332a>] __vmalloc_node_range+0xaa/0x6d0 mm/vmalloc.c:1687
[< inline >] __vmalloc_node mm/vmalloc.c:1736
[<ffffffff817c39ab>] __vmalloc+0x5b/0x70 mm/vmalloc.c:1742
[<ffffffff8166ae9c>] bpf_prog_alloc+0x3c/0x190 kernel/bpf/core.c:82
[<ffffffff85c40ba9>] bpf_prog_create_from_user+0xa9/0x2c0
net/core/filter.c:1132
[< inline >] seccomp_prepare_filter kernel/seccomp.c:373
[< inline >] seccomp_prepare_user_filter kernel/seccomp.c:408
[< inline >] seccomp_set_mode_filter kernel/seccomp.c:737
[<ffffffff815d7687>] do_seccomp+0x317/0x1800 kernel/seccomp.c:787
[<ffffffff815d8f84>] prctl_set_seccomp+0x34/0x60 kernel/seccomp.c:830
[< inline >] SYSC_prctl kernel/sys.c:2157
[<ffffffff813ccf8f>] SyS_prctl+0x82f/0xc80 kernel/sys.c:2075
[<ffffffff86e10700>] entry_SYSCALL_64_fastpath+0x23/0xc1
Code: 00 00 00 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54
49 89 fc 48 8d 7a 08 53 48 89 d3 48 89 fa 48 83 ec 08 48 c1 ea 03 <80>
3c 02 00 75 7c 48 8b 53 08 48 39 f2 75 37 48 89 f2 48 b8 00
RIP [<ffffffff82e1ccd6>] __list_add_valid+0x26/0xd0 lib/list_debug.c:23
RSP <ffff880052637a18>
---[ end trace 983e625f02f00d9f ]---
Kernel panic - not syncing: Fatal exception

On commit 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next.
Unfortunately it is not reproducible.
The crashing line is:
CHECK_DATA_CORRUPTION(next->prev != prev,

It crashed on the KASAN check at (%rax, %rdx); the address corresponds to
next = 0x8, so next was ~NULL.
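
To illustrate the fault address (a minimal userspace sketch, assuming the
x86_64 layout of the kernel's struct list_head; not from the original
report): prev is the second 8-byte field of the struct, so a next->prev
access through a NULL next dereferences offsetof(struct list_head, prev)
== 0x8, the near-NULL address the KASAN check tripped over.

#include <stdio.h>
#include <stddef.h>

/* Userspace mirror of the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

int main(void)
{
	/* prev sits 8 bytes into the struct on x86_64, so a next->prev
	 * access with next == NULL dereferences address 0x8 -- the same
	 * near-NULL address decoded from the KASAN check above. */
	printf("offsetof(struct list_head, prev) = %#zx\n",
	       offsetof(struct list_head, prev));
	return 0;
}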

zijun_hu

Sep 4, 2016, 7:31:14 PM
to Dmitry Vyukov, Andrew Morton, David Rientjes, Vladimir Davydov, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, ziju...@htc.com, syzkaller
Hi Dmitry,
I am not sure of the real cause of the GPF panic.
I have been browsing mm/vmalloc.c recently and have found what may be several bugs.
I hope this helps you.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 91f44e7..1429ee6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -321,10 +321,10 @@ static void __insert_vmap_area(struct vmap_area *va)
 
 		parent = *p;
 		tmp_va = rb_entry(parent, struct vmap_area, rb_node);
-		if (va->va_start < tmp_va->va_end)
-			p = &(*p)->rb_left;
-		else if (va->va_end > tmp_va->va_start)
-			p = &(*p)->rb_right;
+		if (va->va_end <= tmp_va->va_start)
+			p = &parent->rb_left;
+		else if (va->va_start >= tmp_va->va_end)
+			p = &parent->rb_right;
 		else
 			BUG();
 	}
@@ -432,6 +432,9 @@ nocache:
 
 		if (!first)
 			goto found;
+
+		if (addr > first->va_end)
+			first = list_next_entry(first, list);
 	}
 
 	/* from the starting point, walk areas until a suitable hole is found */
@@ -594,7 +597,9 @@ static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;
 
-	log = fls(num_online_cpus());
+	log = num_online_cpus();
+	if (log > 1)
+		log = (unsigned int)get_count_order(log);
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
 }
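
For context on the last hunk (a hedged userspace sketch with stand-in
implementations of the kernel helpers; not part of the patch): fls(n)
returns the 1-based position of the most significant set bit, so for a
power-of-two CPU count it gives log2(n) + 1, while get_count_order(n)
gives ceil(log2(n)). With 4 online CPUs the old code therefore scales
lazy_max_pages() by 3, the patched code by 2.

#include <stdio.h>

/* Approximation of the kernel's fls(): 1-based position of the highest
 * set bit, 0 for an input of 0. */
static int fls_approx(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Approximation of the kernel's get_count_order(): ceil(log2(x)). */
static int get_count_order_approx(unsigned int x)
{
	int order = fls_approx(x) - 1;

	if (x & (x - 1))	/* not a power of two: round up */
		order++;
	return order;
}

int main(void)
{
	for (unsigned int cpus = 1; cpus <= 8; cpus++)
		printf("cpus=%u  fls=%d  get_count_order=%d\n",
		       cpus, fls_approx(cpus), get_count_order_approx(cpus));
	return 0;
}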
--
1.9.1

Dmitry Vyukov

Sep 5, 2016, 4:37:58 AM
to zijun_hu, Andrew Morton, David Rientjes, Vladimir Davydov, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, ziju...@htc.com, syzkaller
Hi,

Unfortunately the bug is not reproducible, so I can't test your patches.
However, if you see bugs in mm code, please mail these patches for review.

Kees Cook

Sep 6, 2016, 5:03:43 PM
to Daniel Borkmann, Paul McKenney, Dmitry Vyukov, Andrew Morton, David Rientjes, Vladimir Davydov, ziju...@zoho.com, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller
Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
IIRC, yes? I'd love to rule out some kind of race condition between
the removal and add code for the checking.

Daniel, IIRC there was some talk about RCU and BPF? Am I remembering
that correctly? I'm having a hard time imagining how a list add could
fail (maybe a race between two adds)?

Hmmm

-Kees

--
Kees Cook
Nexus Security

Daniel Borkmann

Sep 6, 2016, 7:12:13 PM
to Kees Cook, Paul McKenney, Dmitry Vyukov, Andrew Morton, David Rientjes, Vladimir Davydov, ziju...@zoho.com, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, alexei.st...@gmail.com
Can you elaborate? You hit this only once and then never again
in this or some other, similar call-trace form, right?

>> The crashing line is:
>> CHECK_DATA_CORRUPTION(next->prev != prev,
>>
>> It crashed on the KASAN check at (%rax, %rdx); the address corresponds
>> to next = 0x8, so next was ~NULL.
>
> Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
> IIRC, yes? I'd love to rule out some kind of race condition between
> the removal and add code for the checking.
>
> Daniel, IIRC there was some talk about RCU and BPF? Am I remembering

--verbose, what talk specifically? There were some fixes a while ago,
but related to eBPF, not cBPF, and even there I currently don't see how
it could be related to a va->list corruption triggered in
__insert_vmap_area().

> that correctly? I'm having a hard time imagining how a list add could
> fail (maybe a race between two adds)?

Or some use-after-free that could have corrupted that memory? Right now
I would think that the path via bpf_prog_alloc() could have triggered,
but not necessarily caused, the issue, hmm.

Paul E. McKenney

Sep 7, 2016, 3:29:36 AM
to Kees Cook, Daniel Borkmann, Dmitry Vyukov, Andrew Morton, David Rientjes, Vladimir Davydov, ziju...@zoho.com, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller
Indeed they did. But of course rcutorture is about the RCU implementation,
and less about uses of RCU.

Thanx, Paul

Kees Cook

Sep 7, 2016, 1:23:48 PM
to Daniel Borkmann, Paul McKenney, Dmitry Vyukov, Andrew Morton, David Rientjes, Vladimir Davydov, ziju...@zoho.com, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Alexei Starovoitov
That's my understanding.

>>> The crashing line is:
>>> CHECK_DATA_CORRUPTION(next->prev != prev,
>>>
>>> It crashed on the KASAN check at (%rax, %rdx); the address corresponds
>>> to next = 0x8, so next was ~NULL.
>>
>>
>> Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
>> IIRC, yes? I'd love to rule out some kind of race condition between
>> the removal and add code for the checking.
>>
>> Daniel, IIRC there was some talk about RCU and BPF? Am I remembering
>
>
> --verbose, what talk specifically? There were some fixes a while ago,
> but related to eBPF, not cBPF, and even there I currently don't see how
> it could be related to a va->list corruption triggered in
> __insert_vmap_area().

I meant "discussion", but digging around I think I was remembering this:

5a5abb1fa3b05dd6aa821525832644c1e7d2905f
tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter

But that doesn't appear to relate at all. Hmm.

>> that correctly? I'm having a hard time imagining how a list add could
>> fail (maybe a race between two adds)?
>
> Or some use-after-free that could have corrupted that memory? Right now
> I would think that the path via bpf_prog_alloc() could have triggered,
> but not necessarily caused, the issue, hmm.

Yeah, I'm really baffled about this.

Dmitry Vyukov

Sep 8, 2016, 11:06:15 AM
to Daniel Borkmann, Kees Cook, Paul McKenney, Andrew Morton, David Rientjes, Vladimir Davydov, zijun_hu, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Alexei Starovoitov
Correct.

>>> The crashing line is:
>>> CHECK_DATA_CORRUPTION(next->prev != prev,
>>>
>>> It crashed on the KASAN check at (%rax, %rdx); the address corresponds
>>> to next = 0x8, so next was ~NULL.
>>
>>
>> Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
>> IIRC, yes? I'd love to rule out some kind of race condition between
>> the removal and add code for the checking.
>>
>> Daniel, IIRC there was some talk about RCU and BPF? Am I remembering
>
>
> --verbose, what talk specifically? There were some fixes a while ago,
> but related to eBPF, not cBPF, and even there I currently don't see how
> it could be related to a va->list corruption triggered in
> __insert_vmap_area().
>
>> that correctly? I'm having a hard time imagining how a list add could
>> fail (maybe a race between two adds)?
>
>
> Or some use-after-free that could have corrupted that memory? Right now
> I would think that the path via bpf_prog_alloc() could have triggered,
> but not necessarily caused, the issue, hmm.

That's highly likely.

Kees Cook

Oct 22, 2016, 2:27:25 AM
to Paul McKenney, Dmitry Vyukov, Daniel Borkmann, Andrew Morton, David Rientjes, zijun_hu, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Alexei Starovoitov
I've spent some more time looking at this, and I'm satisfied that this
isn't a problem introduced by my CONFIG_DEBUG_LIST changes. Thoughts
below...

>>
>>
>> Can you elaborate? You hit this only once and then never again
>> in this or some other, similar call-trace form, right?
>
> Correct.
>
>>>> The crashing line is:
>>>> CHECK_DATA_CORRUPTION(next->prev != prev,
>>>>
>>>> It crashed on the KASAN check at (%rax, %rdx); the address corresponds
>>>> to next = 0x8, so next was ~NULL.

The main issue is that the "next" argument coming into the check is
NULL to begin with. That indicates serious problems somewhere else. If
CONFIG_DEBUG_LIST wasn't set, the crash would still have happened
during "next->prev = new;" (since next is NULL):

static inline void __list_add_rcu(struct list_head *new,
		struct list_head *prev, struct list_head *next)
{
	if (!__list_add_valid(new, prev, next))
		return;

	new->next = next;
	new->prev = prev;
	rcu_assign_pointer(list_next_rcu(prev), new);
	next->prev = new;	/* with next == NULL, this write faults at 0x8 */
}

>>> Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
>>> IIRC, yes? I'd love to rule out some kind of race condition between
>>> the removal and add code for the checking.
>>>
>>> Daniel, IIRC there was some talk about RCU and BPF? Am I remembering
>>
>>
>> --verbose, what talk specifically? There were some fixes a while ago,
>> but related to eBPF, not cBPF, and even there I currently don't see how
>> it could be related to a va->list corruption triggered in
>> __insert_vmap_area().
>>
>>> that correctly? I'm having a hard time imagining how a list add could
>>> fail (maybe a race between two adds)?
>>
>>
>> Or some use-after-free that could have corrupted that memory? Right now
>> I would think that the path via bpf_prog_alloc() could have triggered,
>> but not necessarily caused, the issue, hmm.
>
> That's highly likely.

So, this appears to be a bug somewhere else that happens to manifest
as a crash in the check (and would have crashed in the same place
under CONFIG_DEBUG_LIST before my changes too).