KASAN: stack-out-of-bounds Read in timerqueue_add


syzbot

Jul 4, 2018, 12:29:02 PM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following crash on:

HEAD commit: fc36def997cf mm: teach dump_page() to correctly output poi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=167e3b92400000
kernel config: https://syzkaller.appspot.com/x/.config?x=f62553dc846b0692
dashboard link: https://syzkaller.appspot.com/bug?extid=b680e42077a0d7c9a0c4
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=1030a858400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1167aaa4400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b680e4...@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
IPVS: ftp: loaded support on port[0] = 21
==================================================================
BUG: KASAN: stack-out-of-bounds in timerqueue_add+0x249/0x2b0 lib/timerqueue.c:52
Read of size 8 at addr ffff8801af537cf8 by task syz-executor591/7178

CPU: 0 PID: 7178 Comm: syz-executor591 Not tainted 4.18.0-rc3+ #130
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
timerqueue_add+0x249/0x2b0 lib/timerqueue.c:52
enqueue_hrtimer+0x18e/0x540 kernel/time/hrtimer.c:960
__run_hrtimer kernel/time/hrtimer.c:1413 [inline]
__hrtimer_run_queues+0xc07/0x10c0 kernel/time/hrtimer.c:1460
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x165/0x730 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>

The buggy address belongs to the page:
page:ffffea0006bd4dc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 ffffffff06bd0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801af537b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8801af537c00: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2
> ffff8801af537c80: 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2
^
ffff8801af537d00: f8 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 00 00 00 00
ffff8801af537d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
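For readers decoding the "Memory state" block above: each row lists 16 shadow bytes, and each shadow byte describes 8 bytes of memory, so one row covers 0x80 bytes. The bad address ffff8801af537cf8 is 0x78 bytes into the row starting at ffff8801af537c80, i.e. granule 15, whose shadow byte is f2. The helper below is a minimal sketch of that arithmetic; `decode_row` is a hypothetical name, not a kernel or syzkaller API, and the poison constants are my reading of mm/kasan/kasan.h from around v4.18 (worth double-checking against the tree in question).

```python
# Shadow byte values for stack poisoning, as defined in mm/kasan/kasan.h
# around v4.18 (assumption: only the values relevant here are listed).
SHADOW_SCALE = 8  # one shadow byte describes 8 bytes of memory
STACK_POISON = {
    0xF1: "KASAN_STACK_LEFT",       # redzone at the start of a stack frame
    0xF2: "KASAN_STACK_MID",        # redzone between stack variables
    0xF3: "KASAN_STACK_RIGHT",      # redzone at the end of a stack frame
    0xF8: "KASAN_USE_AFTER_SCOPE",  # variable whose scope has ended
}


def decode_row(row_addr, shadow_hex, bad_addr):
    """Locate bad_addr in one 'Memory state' row (16 shadow bytes = 128 bytes
    of memory) and return (granule index, shadow byte, meaning)."""
    shadow = [int(b, 16) for b in shadow_hex.split()]
    offset = bad_addr - row_addr
    if not 0 <= offset < len(shadow) * SHADOW_SCALE:
        raise ValueError("address is not covered by this row")
    idx = offset // SHADOW_SCALE
    value = shadow[idx]
    if value == 0:
        meaning = "fully addressable"
    elif value < SHADOW_SCALE:
        meaning = f"first {value} bytes addressable"
    else:
        meaning = STACK_POISON.get(value, "other poison value")
    return idx, value, meaning


ROW = "00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2"  # row ffff8801af537c80
print(decode_row(0xFFFF8801AF537C80, ROW, 0xFFFF8801AF537CF8))
# (15, 242, 'KASAN_STACK_MID')
```

Read this way, the 8-byte load in timerqueue_add hit a redzone between two stack variables, which is consistent with the "stack corruption" pattern discussed later in the thread; the sketch says nothing about how the bad pointer got there.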


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Dmitry Vyukov

Jul 4, 2018, 12:35:14 PM
to syzbot, LKML, syzkaller-bugs, Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann, netdev
On Wed, Jul 4, 2018 at 6:29 PM, syzbot
<syzbot+b680e4...@syzkaller.appspotmail.com> wrote:
> [...]

+bpf maintainers, since the repro seems to deal with bpf maps

We've got a splash of crashes today; all of them seem to suggest some kind of
stack corruption/overflow. See the last 6 bugs here:
https://syzkaller.appspot.com/

Alexei Starovoitov

Jul 4, 2018, 12:59:46 PM
to Dmitry Vyukov, John Fastabend, syzbot, LKML, syzkaller-bugs, Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann, netdev
On Wed, Jul 4, 2018 at 9:34 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Wed, Jul 4, 2018 at 6:29 PM, syzbot
> <syzbot+b680e4...@syzkaller.appspotmail.com> wrote:
>> [...]
>
> +bpf maintainers, since the repro seems to deal with bpf maps
>
> We've got a splash of crashes today, all seem to suggest some kind of
> stack corruption/overflow, see the last 6 bugs here:
> https://syzkaller.appspot.com/

John, this is sockhash map related. Please take a look asap.

syzbot

Jul 4, 2018, 5:20:18 PM
to john fastabend, john.fa...@gmail.com, syzkall...@googlegroups.com
> #syz test git://github.com/cilium/linux.git test-pointers-fix

unknown command "test"

John Fastabend

Jul 4, 2018, 7:29:24 PM
to syzbot, syzkaller-bugs, Alexei Starovoitov, Daniel Borkmann

syzbot

Jul 4, 2018, 7:50:02 PM
to a...@kernel.org, dan...@iogearbox.net, john.fa...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+b680e4...@syzkaller.appspotmail.com

Tested on:

commit: 3f86eb1920e6 bpf: sockmap, convert bpf_compute_data_pointe..
git tree: git://github.com/cilium/linux.git/test-pointers-fix
kernel config: https://syzkaller.appspot.com/x/.config?x=a63be0c83e84d370
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

Note: testing is done by a robot and is best-effort only.

Dmitry Vyukov

Jul 5, 2018, 1:42:11 AM
to syzbot, Alexei Starovoitov, Daniel Borkmann, John Fastabend, syzkaller-bugs
On Thu, Jul 5, 2018 at 1:50 AM, syzbot
<syzbot+b680e4...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:

Thanks for the quick fix. Please push it to bpf-next asap.

We now have about 20 bugs that all have similar symptoms; some of them
are in the moderation queue and have not been mailed to the kernel lists:
https://syzkaller.appspot.com/#upstream-moderation2
I'm afraid we will see 40 new ones per day until this is fixed.

Daniel Borkmann

Jul 5, 2018, 3:38:41 AM
to Dmitry Vyukov, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
On 07/05/2018 07:41 AM, Dmitry Vyukov wrote:
> On Thu, Jul 5, 2018 at 1:50 AM, syzbot
> <syzbot+b680e4...@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger
>> crash:
>
> Thanks for quick fix. Please push it to bpf-next asap.
>
> We now have about 20 bugs that all have similar symptoms, part is in
> moderation queue and is not mailed to kernel lists:
> https://syzkaller.appspot.com/#upstream-moderation2
> I afraid we will see 40 new per day until this is fixed.

It might take a few days before this lands in bpf-next: once the fixes
are merged into the bpf tree today, I will route them asap to net; from
there DaveM will send a PR to Linus and fast-forward his tree to
net-next, from which I can eventually fast-forward bpf-next.

Sometimes I wonder whether it would be more suitable to have bpf
instead of bpf-next in syzkaller, or perhaps both. Without the bpf tree,
syzkaller only finds bugs in it once they have made their way from
bpf -> net -> linus, whereas it would be nice to catch these things
earlier. On the other hand, if we don't include bpf-next, then bugs will
potentially hit us all at once during the merge window, which is also
not nice, as we want to fix things as early as possible.

If there are around 40 new ones per day in the queue, it would be good to
semi-automate duplicate detection. The good thing is that in all the new
ones the syz repro is rather small, and they all sort of point to the same
map type (0x12), so they are likely related ...

r0 = bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0x12, 0x0, 0x4, 0x1, 0x0, 0x1}, 0x2c)
r0 = bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0x12, 0x9, 0x4, 0x1}, 0x34d)
r0 = bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0x12, 0x9, 0x4, 0x1}, 0x34d)

... anyway, let's get this fixed asap.
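To see why these three repros look related, it helps to decode the `bpf$MAP_CREATE` struct literal: its leading values are the first members of the BPF_MAP_CREATE part of `union bpf_attr` (map_type, key_size, value_size, max_entries, map_flags, inner_map_fd), and map type 0x12 = 18 is `BPF_MAP_TYPE_SOCKHASH` in `enum bpf_map_type`, which matches Alexei's "sockhash map related" comment. The sketch below is a hypothetical helper (`decode_map_create` is not a syzkaller tool) that assumes the literal contains only plain hex integers, as in the lines quoted here.

```python
import re

# Leading members of the BPF_MAP_CREATE part of union bpf_attr, in
# declaration order (include/uapi/linux/bpf.h).
MAP_CREATE_FIELDS = ["map_type", "key_size", "value_size",
                     "max_entries", "map_flags", "inner_map_fd"]

# Two entries of enum bpf_map_type; 18 == 0x12.
MAP_TYPES = {15: "BPF_MAP_TYPE_SOCKMAP", 18: "BPF_MAP_TYPE_SOCKHASH"}


def decode_map_create(line):
    """Decode the {...} struct literal of a syz `bpf$MAP_CREATE` call into
    named bpf_attr fields. Fields not spelled out in the literal are simply
    absent from the result."""
    body = re.search(r"=\{([^}]*)\}", line).group(1)
    values = [int(v, 16) for v in body.split(",")]  # syz prints hex literals
    attr = dict(zip(MAP_CREATE_FIELDS, values))
    attr["map_type_name"] = MAP_TYPES.get(attr["map_type"], "other")
    return attr


repros = [
    "r0 = bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0x12, 0x0, 0x4, 0x1, 0x0, 0x1}, 0x2c)",
    "r0 = bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0x12, 0x9, 0x4, 0x1}, 0x34d)",
]
for r in repros:
    print(decode_map_create(r)["map_type_name"])  # BPF_MAP_TYPE_SOCKHASH
```

Grouping reports by the decoded `map_type_name` is one crude way to put these candidates side by side; it cannot tell whether they are the same bug.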


Thanks,
Daniel

Dmitry Vyukov

Jul 10, 2018, 5:33:34 AM
to Daniel Borkmann, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
On Thu, Jul 5, 2018 at 9:38 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 07/05/2018 07:41 AM, Dmitry Vyukov wrote:
>> On Thu, Jul 5, 2018 at 1:50 AM, syzbot
>> <syzbot+b680e4...@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot has tested the proposed patch and the reproducer did not trigger
>>> crash:
>>
>> Thanks for quick fix. Please push it to bpf-next asap.
>>
>> We now have about 20 bugs that all have similar symptoms, part is in
>> moderation queue and is not mailed to kernel lists:
>> https://syzkaller.appspot.com/#upstream-moderation2
>> I afraid we will see 40 new per day until this is fixed.
>
> Before this lands in bpf-next, it might take few days, say when the
> fixes are merged today in bpf tree, then I will route them asap to
> net, from there DaveM will send PR to Linus, and fast-forward his tree
> to net-next, from which I can fast-forward to bpf-next eventually.
>
> Sometimes I'm wondering whether it would be more suitable to have bpf
> instead of bpf-next in syzkaller, or perhaps both. In case of bpf,
> syzkaller will trigger if it finds bugs only once they made their way
> from bpf -> net -> linus, whereas it would be nice to catch these
> things earlier. Otoh, if we don't include bpf-next, then it will
> potentially hit us all at once during the merge window which is also
> not nice as we want to fix things as early as possible.


I've added bpf tree to syzbot (and net while I was there).


> If there's around 40 new per day in queue, it would be good to semi
> automate duplicate detection.

But how? The similarity between them is quite subtle (e.g. recent
crashes, usually related to the stack, usually not making sense per se,
and for some reason with lots of split lines in the report). It's
reasonably simple to spot them manually, but I have no idea how to
automate this with high precision. And at the time they need to be
sorted into buckets, we almost never have reproducers.

The flow seems to be coming to an end now.

Daniel Borkmann

Jul 11, 2018, 5:14:45 AM
to Dmitry Vyukov, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
That is awesome, thanks so much; this will really help us!

>> If there's around 40 new per day in queue, it would be good to semi
>> automate duplicate detection.
>
> But like how? The similarity between them is quite subtle (like
> "recent crashes related to stack [usually], not making sense per se
> [usually], and for some reason there are lots of split lines in report
> (?)). It's reasonably simple to spot them manually, but I have no idea
> how to automate this with high precision. And at the time they need to
> sorted to buckets, we almost never have reproducers.
>
> The flow seems to be coming to the end now.

Yeah, agreed, it's very tricky, and the differences can be subtle as well,
to the point where they end up being different bugs. I don't think it's
possible to automate this, but in cases where syzkaller has a fairly small
reproducer, perhaps there could be a heuristic on how similar they are,
so that they could get presorted into the same report on the dashboard,
though admittedly I have no idea about the internal workflow that is done
before we get the report on the list. In any case, once we fix such a bug,
we'll try it with all the reproducers to make sure nothing gets left out.
The findings syzkaller comes up with are impressive, though; thanks a lot
for working on it!
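A minimal sketch of the kind of similarity heuristic Daniel suggests, assuming the only input is the text of two small syz programs: compare the sets of syscall names (including syz `$` variants) with a Jaccard score and presort pairs above a threshold. The function names and the 0.8 threshold are hypothetical; this is not how syzbot buckets reports, and, as Dmitry notes, reproducers are usually not available at bucketing time.

```python
import re

# Matches the call name at the start of a syz program line, with or without
# a result assignment, e.g. "r0 = bpf$MAP_CREATE(" or "socket$inet6(".
CALL_RE = re.compile(r"^(?:r\d+\s*=\s*)?([\w$]+)\(")


def call_names(program):
    """Set of syscall names (with syz $-variants) used by a syz program."""
    names = set()
    for line in program.strip().splitlines():
        m = CALL_RE.match(line.strip())
        if m:
            names.add(m.group(1))
    return names


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two name sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_bucket(prog_a, prog_b, threshold=0.8):
    """Hypothetical presorting rule: treat two repros as candidate
    duplicates when their call-name sets overlap above `threshold`."""
    return jaccard(call_names(prog_a), call_names(prog_b)) >= threshold
```

Because it ignores arguments entirely, this rule would put all three `bpf$MAP_CREATE` repros quoted earlier in one bucket, but it would equally merge distinct bugs that happen to use the same calls, which is exactly the subtlety Daniel points out.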

Thanks,
Daniel

Dmitry Vyukov

Jul 12, 2018, 5:09:19 AM
to Daniel Borkmann, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
Thanks.

One practical aspect that would be useful to figure out: how did this
bug cause such a devastating effect, and can we catch such bugs
reliably at the point of occurrence?
Most of the bugs that KASAN catches could otherwise cause the same, but
KASAN catches them red-handed before the actual corruption happens.
This gives reliable, deduplicatable, easy-to-debug bug reports.
We had a similar problem with invalid frees (freeing a pointer into the
middle of a heap object), but we added detection for this to KASAN and
it started catching them reliably. So it would be very useful to
understand what exactly causes the corruptions in this case and catch them
red-handed. From the patch it looks like a typical race; I don't see
what's so special about it. Could you shed some light on the mechanism
of the corruption?
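To make the invalid-free case concrete, here is a toy model of the rule that catches it: a free is accepted only if the pointer is exactly the start of a live object, so a pointer into the middle of an object is reported at the free call rather than corrupting allocator state later. `ToyHeap` and its result strings are made up for illustration; this is not KASAN's implementation, which works on real slab metadata.

```python
import bisect


class ToyHeap:
    """Toy allocator that validates frees: a free is accepted only if the
    pointer is the start of a live object. Illustration only, not KASAN."""

    def __init__(self):
        self.starts = []   # sorted start addresses of all objects
        self.objs = {}     # start address -> [size, live]

    def alloc(self, addr, size):
        """Record an object of `size` bytes at `addr` (no overlap checks)."""
        bisect.insort(self.starts, addr)
        self.objs[addr] = [size, True]

    def free(self, ptr):
        """Classify a free(ptr) call and apply it if it is valid."""
        i = bisect.bisect_right(self.starts, ptr) - 1
        if i < 0:
            return "wild-free"          # below every known object
        start = self.starts[i]
        size, live = self.objs[start]
        if ptr >= start + size:
            return "wild-free"          # past the end of the nearest object
        if ptr != start:
            return "invalid-free"       # points into the middle of an object
        if not live:
            return "double-free"
        self.objs[start][1] = False
        return "ok"


heap = ToyHeap()
heap.alloc(0x1000, 64)
print(heap.free(0x1010))  # invalid-free
print(heap.free(0x1000))  # ok
print(heap.free(0x1000))  # double-free
```

The point of the model is that the bad pointer is caught at the one operation that consumes it; Dmitry's question is whether the corruption in this thread has a similarly well-defined consuming operation.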

Daniel Borkmann

Jul 13, 2018, 11:57:22 AM
to Dmitry Vyukov, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
Probably best for John to answer. Does KASAN reliably detect out-of-bounds
accesses when the offset is, say, far away? I've seen a couple of instances
that I fixed in the past which had syzbot reports sometimes as slab oob and
other times as use-after-free, where the alloc/free trace was completely
unrelated, likely because the pointer landed in a different slab obj.
Perhaps it could be detected that it's solely a case of oob by figuring out
that the originally allocated slab obj changed after pointer arithmetic
(mainly +/- ops in this case)?

Thanks,
Daniel

Dmitry Vyukov

Jul 16, 2018, 6:55:34 AM
to Daniel Borkmann, syzbot, Alexei Starovoitov, John Fastabend, syzkaller-bugs
KASAN does _not_ reliably detect arbitrary OOBs. However, for wild
offsets we usually also see OOBs and UAFs detected at the actual bad
access stack. Some percentage of them are also detected fully correctly,
i.e. as an OOB on the correct object. However, I don't remember _any_
reports in the involved bpf stacks. Also, all the crashes seem to
somehow corrupt the stack and nothing else. So this case looks somewhat
special.
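A toy model of why a wild offset gets such inconsistent reports: a shadow-memory checker knows whether each byte is addressable, not which object a pointer was derived from. So the verdict depends on where the access lands: a redzone gives an out-of-bounds report, a live neighbour gives no report at all, and a freed neighbour gives a use-after-free report whose alloc/free stacks belong to the unrelated object, as Daniel describes. `SLAB` and `classify` are hypothetical; real slab layouts and redzone sizes differ.

```python
# Toy slab: (start, size, state). Gaps between objects act as redzones.
SLAB = [
    (0x000, 0x40, "live"),   # object A, the one the buggy pointer came from
    (0x080, 0x40, "live"),   # object B
    (0x100, 0x40, "freed"),  # object C, already freed
]


def classify(addr, objects):
    """Return what a shadow-memory checker would conclude for a read at addr.
    It only knows whether the byte is addressable, not which object the
    pointer was derived from."""
    for start, size, state in objects:
        if start <= addr < start + size:
            if state == "freed":
                return ("use-after-free", start)
            return ("no report", start)
    return ("out-of-bounds", None)   # landed in a redzone


a = 0x000  # pointer to object A
for off in (0x48, 0x90, 0x110):
    print(hex(off), classify(a + off, SLAB))
# 0x48 ('out-of-bounds', None)
# 0x90 ('no report', 128)
# 0x110 ('use-after-free', 256)
```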

> Perhaps that could be
> detected that it's solely a case of oob by figuring out that the original
> allocated slab obj changed after pointer arithmetic (mainly +/- ops in this
> case)?

In our experience, a naive implementation of such an approach (detecting
arithmetic that leads to an OOB pointer value) has false positives. And a
complete implementation (so-called fat pointers that encode the original
object bounds) changes the ABI and generally does not work in practice
(especially for code as complex as the kernel).
So we don't have a realistic plan for this limitation. But if there is
something special about this particular case, then maybe we could address
just this specific narrow case.
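A toy illustration of the false positives mentioned above, assuming a naive rule that flags any arithmetic result outside [base, base + size). The bounds-check idiom common in packet parsing, C code of the form `if (p + need > end) return -EINVAL;`, computes an out-of-object pointer only to compare it, and one-past-the-end pointers are legal loop bounds; the naive rule flags both, while a fat pointer (here a hypothetical `FatPtr` that carries its object's bounds, which is what makes real fat pointers wider and ABI-incompatible) can defer the check to the actual dereference. All names are illustrative, not a real tool.

```python
class FatPtr:
    """Toy fat pointer: remembers the bounds of the object it came from.
    Arithmetic is unrestricted; bounds matter only on dereference."""

    def __init__(self, base, size, off=0):
        self.base, self.size, self.off = base, size, off

    def __add__(self, n):
        return FatPtr(self.base, self.size, self.off + n)

    def deref_ok(self, width=1):
        """True if reading `width` bytes at this pointer stays in bounds."""
        return 0 <= self.off and self.off + width <= self.size


def naive_arith_ok(p):
    """Hypothetical naive rule: every arithmetic result must point into
    [base, base + size)."""
    return 0 <= p.off < p.size


def parse(buf, need):
    """Models `if (p + need > end) return -EINVAL;`.
    Returns (result, whether the naive rule fired)."""
    probe = buf + need                # may point past the object...
    flagged = not naive_arith_ok(probe)
    if probe.off > buf.size:          # ...but is only compared, never read
        return "rejected", flagged
    return "parsed", flagged


buf = FatPtr(base=0x1000, size=16)
print(parse(buf, 32))  # ('rejected', True): naive rule fires, yet nothing is read
```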