Kees Cook
Oct 22, 2016, 2:27:25 AM
to Paul McKenney, Dmitry Vyukov, Daniel Borkmann, Andrew Morton, David Rientjes, zijun_hu, Joonsoo Kim, linu...@kvack.org, LKML, Kirill A. Shutemov, Andrey Ryabinin, Konstantin Khlebnikov, syzkaller, Alexei Starovoitov
I've spent some more time looking at this, and I'm satisfied that this
isn't a problem introduced by my CONFIG_DEBUG_LIST changes. Thoughts
below...
>>
>>
>> Can you elaborate? You hit this only once and then never again
>> in this or some other, similar call-trace form, right?
>
> Correct.
>
>>>> The crashing line is:
>>>> CHECK_DATA_CORRUPTION(next->prev != prev,
>>>>
>>>> It crashed on KASAN check at (%rax, %rdx), this address corresponds to
>>>> next address = 0x8. So next was ~NULL.
The main issue is that the "next" argument coming into the check
is NULL to begin with. That indicates serious problems somewhere else.
If CONFIG_DEBUG_LIST wasn't set, the crash would still have happened,
just later, at "next->prev = new;" (since next is NULL):
static inline void __list_add_rcu(struct list_head *new,
                struct list_head *prev, struct list_head *next)
{
        if (!__list_add_valid(new, prev, next))
                return;

        new->next = next;
        new->prev = prev;
        rcu_assign_pointer(list_next_rcu(prev), new);
        next->prev = new;
}
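
For reference, the check that fired lives in lib/list_debug.c. A rough
sketch of __list_add_valid() after the CONFIG_DEBUG_LIST rework
(paraphrased; the exact message strings are from memory):

bool __list_add_valid(struct list_head *new, struct list_head *prev,
                      struct list_head *next)
{
        /*
         * The first check reads next->prev. With next == NULL that is a
         * load from address 8 (offsetof(struct list_head, prev)), which
         * matches the faulting address in the KASAN report above.
         */
        if (CHECK_DATA_CORRUPTION(next->prev != prev,
                        "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
                        prev, next->prev, next) ||
            CHECK_DATA_CORRUPTION(prev->next != next,
                        "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
                        next, prev->next, prev) ||
            CHECK_DATA_CORRUPTION(new == prev || new == next,
                        "list_add double add: new=%p, prev=%p, next=%p.\n",
                        new, prev, next))
                return false;

        return true;
}

So with CONFIG_DEBUG_LIST we fault while validating, and without it we
fault two statements later; either way a NULL "next" was handed to us.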
>>> Paul, the RCU torture tests passed with the CONFIG_DEBUG_LIST changes,
>>> IIRC, yes? I'd love to rule out some kind of race condition between
>>> the removal and add code for the checking.
>>>
>>> Daniel, IIRC there was some talk about RCU and BPF? Am I remembering
>>
>>
>> --verbose, what talk specifically? There were some fixes a while
>> ago, but related to eBPF, not cBPF, and even there I don't
>> currently see how they could be related to a va->list corruption
>> triggered in __insert_vmap_area().
>>
>>> that correctly? I'm having a hard time imagining how a list add could
>>> fail (maybe a race between two adds)?
>>
>>
>> Or some use-after-free that corrupted that memory? Right now I
>> would think the path via bpf_prog_alloc() could have triggered
>> the crash, but not necessarily caused the issue, hmm.
>
> That's highly likely.
So, this appears to be a bug somewhere else that happens to manifest
as a crash in the check (and would have crashed in the same place
under CONFIG_DEBUG_LIST before my changes too).
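
For completeness, the add that the report points at is the
address-sorted list insertion in mm/vmalloc.c's __insert_vmap_area().
A rough sketch of the tail of that function as it looked around this
time (paraphrased from memory):

        struct rb_node *tmp;
        ...
        /*
         * Address-sort the list: hook the new vmap_area in after its
         * rbtree predecessor, or at the list head if it has none.
         */
        tmp = rb_prev(&va->rb_node);
        if (tmp) {
                struct vmap_area *prev;
                prev = rb_entry(tmp, struct vmap_area, rb_node);
                list_add_rcu(&va->list, &prev->list);
        } else
                list_add_rcu(&va->list, &vmap_area_list);

Since list_add_rcu(new, head) expands to __list_add_rcu(new, head,
head->next), "next" can only arrive as NULL here if the neighbouring
vmap_area's list.next (or vmap_area_list.next) had already been
scribbled on before this add, which again points at a use-after-free
or similar corruption elsewhere rather than at the list code itself.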