net: use-after-free in neigh_timer_handler/sock_wfree

Dmitry Vyukov

Mar 1, 2017, 2:27:46 PM
to David Miller, netdev, LKML, Cong Wang, Eric Dumazet, syzkaller, Alexey Kuznetsov, James Morris
Hello,

I am seeing the following use-after-free report while running the
syzkaller fuzzer on
linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:

==================================================================
BUG: KASAN: use-after-free in constant_test_bit arch/x86/include/asm/bitops.h:324 [inline] at addr ffff8801c56d5460
BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789 [inline] at addr ffff8801c56d5460
BUG: KASAN: use-after-free in sock_wfree+0x118/0x120 net/core/sock.c:1630 at addr ffff8801c56d5460
Read of size 8 by task syz-fuzzer/3261
CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
sock_flag include/net/sock.h:789 [inline]
sock_wfree+0x118/0x120 net/core/sock.c:1630
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
skb_release_all+0x15/0x60 net/core/skbuff.c:667
__kfree_skb+0x15/0x20 net/core/skbuff.c:683
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
expire_timers kernel/time/timer.c:1305 [inline]
__run_timers+0x960/0xcf0 kernel/time/timer.c:1599
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
__do_softirq+0x31f/0xbe7 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:658 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0033:0x46a7c3
RSP: 002b:000000c83e2d5180 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000000 RBX: 000000000046a7b0 RCX: 000000c820471200
RDX: 0000000000000020 RSI: 000000c839e1bba0 RDI: 000000c83e2d5190
RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000073
R10: 000000c839a31b03 R11: 000000c839e1bbf8 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000010 R15: 0000000001263e90
</IRQ>
Object at ffff8801c56d5400, in cache RAWv6 size: 1480
Allocated:
PID = 12540
kmem_cache_alloc+0x102/0x680 mm/slab.c:3568
sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1332
sk_alloc+0x8c/0x470 net/core/sock.c:1394
inet6_create+0x44d/0x1140 net/ipv6/af_inet6.c:183
__sock_create+0x4e4/0x870 net/socket.c:1197
sock_create net/socket.c:1237 [inline]
SYSC_socket net/socket.c:1267 [inline]
SyS_socket+0xf9/0x230 net/socket.c:1247
entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 12572
kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:580
__cache_free mm/slab.c:3510 [inline]
kmem_cache_free+0x71/0x240 mm/slab.c:3770
sk_prot_free net/core/sock.c:1375 [inline]
__sk_destruct+0x487/0x6b0 net/core/sock.c:1450
sk_destruct+0x47/0x80 net/core/sock.c:1458
__sk_free+0x57/0x230 net/core/sock.c:1466
sk_free+0x23/0x30 net/core/sock.c:1477
sock_put include/net/sock.h:1644 [inline]
sk_common_release+0x3bf/0x5e0 net/core/sock.c:2781
rawv6_close+0x4c/0x80 net/ipv6/raw.c:1218
inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
sock_release+0x8d/0x1e0 net/socket.c:597
sock_close+0x16/0x20 net/socket.c:1061
__fput+0x332/0x7f0 fs/file_table.c:208
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x18a/0x260 kernel/task_work.c:116
exit_task_work include/linux/task_work.h:21 [inline]
do_exit+0x1956/0x2900 kernel/exit.c:873
do_group_exit+0x149/0x420 kernel/exit.c:977
get_signal+0x7e0/0x1820 kernel/signal.c:2313
do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807
exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:156
prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
entry_SYSCALL_64_fastpath+0xc0/0xc2

Cong Wang

Mar 1, 2017, 4:25:00 PM
to Dmitry Vyukov, David Miller, netdev, LKML, Eric Dumazet, syzkaller, Alexey Kuznetsov, James Morris
This one looks very similar to a previous one:
https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ

Both happen on raw v6 sockets.

For me, it seems the sk refcnt is not correct, skb should still hold
a refcnt so it should not be freed before kfree_skb() in a timer
handler...

Cong Wang

Mar 1, 2017, 4:44:05 PM
to Dmitry Vyukov, David Miller, netdev, LKML, Eric Dumazet, syzkaller, Alexey Kuznetsov, James Morris
More precisely, after this commit:

commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet <eric.d...@gmail.com>
Date: Thu Jun 11 02:55:43 2009 -0700

net: No more expensive sock_hold()/sock_put() on each tx

we don't take (old) refcnt any more on TX path, sk_wmem_alloc
is the new refcnt. ;)
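
The accounting scheme referred to above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: `toy_sock`, `toy_skb` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's types or functions, and the real counter is atomic. The idea modelled is that the counter starts with a bias of one, each transmitted buffer adds its truesize instead of taking a separate reference, and whoever brings the counter to zero releases the socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct sock and struct sk_buff; not the kernel types. */
struct toy_sock {
    long wmem_alloc;   /* models sk->sk_wmem_alloc */
    bool freed;        /* set when the toy socket would be released */
};

struct toy_skb {
    struct toy_sock *sk;
    long truesize;
};

static void toy_sk_init(struct toy_sock *sk)
{
    sk->wmem_alloc = 1;   /* bias of one, held until the socket is closed */
    sk->freed = false;
}

/* Charge the buffer to the socket; no separate reference is taken. */
static void toy_set_owner_w(struct toy_skb *skb, struct toy_sock *sk)
{
    skb->sk = sk;
    sk->wmem_alloc += skb->truesize;
}

/* Destructor run when the buffer is freed: uncharge, release sk on zero. */
static void toy_wfree(struct toy_skb *skb)
{
    struct toy_sock *sk = skb->sk;

    assert(!sk->freed);   /* touching a released socket is the use-after-free */
    sk->wmem_alloc -= skb->truesize;
    if (sk->wmem_alloc == 0)
        sk->freed = true;
    skb->sk = NULL;
}

/* Close path: drop the initial bias, release sk on zero. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (--sk->wmem_alloc == 0)
        sk->freed = true;
}
```

In this model, closing a socket that still has a charged buffer in flight leaves the counter non-zero, so the socket survives until `toy_wfree()` runs for that buffer. That only holds as long as the amount subtracted equals the amount that was added.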

Eric Dumazet

Mar 1, 2017, 4:54:28 PM
to Cong Wang, Dmitry Vyukov, David Miller, netdev, LKML, syzkaller, Alexey Kuznetsov, James Morris
On Wed, Mar 1, 2017 at 1:43 PM, Cong Wang <xiyou.w...@gmail.com> wrote:
>>
>> This one looks very similar to a previous one:
>> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>>
>> Both happen on raw v6 sockets.
>>
>> For me, it seems the sk refcnt is not correct, skb should still hold
>> a refcnt so it should not be freed before kfree_skb() in a timer
>> handler...
>
> More precisely, after this commit:
>
> commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
> Author: Eric Dumazet <eric.d...@gmail.com>
> Date: Thu Jun 11 02:55:43 2009 -0700
>
> net: No more expensive sock_hold()/sock_put() on each tx
>
> we don't take (old) refcnt any more on TX path, sk_wmem_alloc
> is the new refcnt. ;)

So the bug is that skb->truesize is mangled by the reassembly unit,
while skb->sk is tracking sk_wmem_alloc changes in order
to decide when it is safe to free sk.

This is why we need to call skb_orphan(), as we did for IPv4 in
8282f27449bf15548
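
One way such a mismatch could lead to a premature free, and how orphaning first avoids it, can be sketched in a toy userspace model. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, the sizes are arbitrary, and the sequence of events is not claimed to be the exact one behind the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock { long wmem_alloc; bool freed; };
struct toy_skb  { struct toy_sock *sk; long truesize; };

static void toy_charge(struct toy_skb *skb, struct toy_sock *sk)
{
    skb->sk = sk;
    sk->wmem_alloc += skb->truesize;
}

/* Models the write-side destructor: uncharge by the *current* truesize. */
static void toy_uncharge(struct toy_skb *skb)
{
    struct toy_sock *sk = skb->sk;

    if (!sk)
        return;               /* already orphaned, nothing to do */
    assert(!sk->freed);
    sk->wmem_alloc -= skb->truesize;
    if (sk->wmem_alloc == 0)
        sk->freed = true;
    skb->sk = NULL;
}

/* Returns true if the socket is released while buffer b still points at it. */
static bool toy_premature_free(bool orphan_before_reasm)
{
    struct toy_sock sk = { .wmem_alloc = 1, .freed = false };
    struct toy_skb a = { NULL, 1000 }, b = { NULL, 500 };

    toy_charge(&a, &sk);
    toy_charge(&b, &sk);      /* b stays queued, e.g. waiting on a neighbour */

    if (orphan_before_reasm)
        toy_uncharge(&a);     /* orphan: run the destructor with the original size */
    a.truesize += 500;        /* reassembly changes truesize without telling sk */
    toy_uncharge(&a);         /* a is freed; a no-op if it was orphaned earlier */

    if (--sk.wmem_alloc == 0) /* socket is closed: drop the initial bias */
        sk.freed = true;

    return sk.freed && b.sk == &sk;
}
```

With `orphan_before_reasm` set to false, buffer a subtracts 500 more than it added, the close path then reaches zero, and b is left holding a pointer to a released socket. With it set to true, the charge is returned before truesize changes, so the counter stays consistent.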

Cong Wang

Mar 1, 2017, 6:09:33 PM
to Eric Dumazet, Dmitry Vyukov, David Miller, netdev, LKML, syzkaller, Alexey Kuznetsov, James Morris
That is my suspicion as well: skb->truesize is updated somewhere
but sk->sk_wmem_alloc isn't, which leads to this bug.

>
> This is why we need to call skb_orphan(), as we did for IPv4 in
> 8282f27449bf15548


But I doubt skb_orphan() is the solution here, shouldn't we just
update sk->sk_wmem_alloc with skb->truesize changes?
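
The alternative raised here, keeping the socket's counter in step with every truesize change, could look roughly like the following toy userspace sketch. `toy_adjust_truesize()` is a hypothetical helper invented for this illustration, as are the other `toy_*` names; nothing here is an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock { long wmem_alloc; bool freed; };
struct toy_skb  { struct toy_sock *sk; long truesize; };

/* Hypothetical helper: change truesize and keep the owner's charge in step. */
static void toy_adjust_truesize(struct toy_skb *skb, long delta)
{
    assert(skb->truesize + delta > 0);   /* a buffer cannot shrink to nothing */
    skb->truesize += delta;
    if (skb->sk)
        skb->sk->wmem_alloc += delta;
}

/* Uncharge on free; the socket is released when the counter hits zero. */
static void toy_uncharge(struct toy_skb *skb)
{
    struct toy_sock *sk = skb->sk;

    if (!sk)
        return;
    sk->wmem_alloc -= skb->truesize;
    if (sk->wmem_alloc == 0)
        sk->freed = true;
    skb->sk = NULL;
}

/* Counter value after one charged buffer is resized by delta and then freed. */
static long toy_counter_after_free(long initial, long delta)
{
    struct toy_sock sk = { .wmem_alloc = 1 + initial, .freed = false };
    struct toy_skb skb = { &sk, initial };

    toy_adjust_truesize(&skb, delta);
    toy_uncharge(&skb);
    return sk.wmem_alloc;
}
```

In this model the counter always returns to the initial bias of one, whatever the delta, so the close path remains the only place that can release the socket. The cost is that every site that modifies truesize would have to go through such a helper.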

Eric Dumazet

Mar 1, 2017, 6:15:56 PM
to Cong Wang, Dmitry Vyukov, David Miller, netdev, LKML, syzkaller, Alexey Kuznetsov, James Morris
On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang <xiyou.w...@gmail.com> wrote:

>
> But I doubt skb_orphan() is the solution here, shouldn't we just
> update sk->sk_wmem_alloc with skb->truesize changes?

Is it worth it? Apart from syzkaller, I mean...

We started with something that had a real impact on real workloads.

158f323b9868b59967ad96957c4ca388161be321 "net: adjust skb->truesize in
pskb_expand_head()"

Note that auditing the stack took me a while.

Cong Wang

Mar 2, 2017, 12:26:15 AM
to Eric Dumazet, Dmitry Vyukov, David Miller, netdev, LKML, syzkaller, Alexey Kuznetsov, James Morris
I don't know how the sk refcnt could work correctly without keeping
sk_wmem_alloc correct. We certainly could just call skb_orphan()
if we don't need skb->sk any more, probably like the frag case.
But for this case, the neigh one, the skbs sitting in neigh->arp_queue
are not going to be freed except in the failure case, so skb->sk
should not be orphaned so early.

Eric Dumazet

Mar 2, 2017, 12:36:09 AM
to Cong Wang, Dmitry Vyukov, David Miller, netdev, LKML, syzkaller, Alexey Kuznetsov, James Morris
There is absolutely no issue in arp/nd case.
Many skbs can sit there and it is fine.
Same with skbs sitting a long time in a qdisc.

Of course we try to not call skb_orphan() unless really needed.

tcp_gso_segment() tries very hard to propagate skb ownership to the segments,
but even something apparently easy like that took some patches before
being done right.

(for details: 0d08c42cf9a71530fef5ebcfe368f38f2dd0476f "tcp: gso: fix
truesize tracking")

conntrack reasm is mostly used in forwarding workloads, where skb->sk
is already NULL.

Are you thinking of a real workload where skb->sk _needs_ to be kept
in IPv6 reasm?