net: use-after-free in tw_timer_handler

546 views
Skip to first unread message

Dmitry Vyukov

unread,
Jan 23, 2017, 5:20:20 AM1/23/17
to David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller
Hello,

While running syzkaller fuzzer I started seeing use-after-frees in
tw_timer_handler. It happens with very low frequency, so far I've seen
22 of them. But all reports look consistent, so I would assume that it
is real, just requires a very tricky race to happen. I've stared
seeing it around Jan 17, however I did not update kernels for some
time before that so potentially the issues was introduced somewhat
earlier. Or maybe fuzzer just figured how to trigger it, and the bug
is actually old. I am seeing it on all of torvalds/linux-next/mmotm,
some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36,
5cf7a0f3442b2312326c39f571d637669a478235,
c497f8d17246720afe680ea1a8fa6e48e75af852.
Majority of reports points to net_drop_ns as the offending free, but
it may be red herring. Since the access happens in timer, it can
happen long after free and the memory could have been reused. I've
also seen few where the access in tw_timer_handler is reported as
out-of-bounds on task_struct and on struct filename.


BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cb58c398
Read of size 8 by task syz-executor0/24691
CPU: 0 PID: 24691 Comm: syz-executor0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
ffff8801dc007328 ffffffff8234530f ffffffff00000000 1ffff1003b800df8
ffffed003b800df0 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
ffff8801d8ad8f60 ffff8801d8ad8f68 ffff8801d8ad8740 000000000000002e
Call Trace:
[<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40
mm/kasan/report.c:329
[<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149
[<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
[<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
[<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
[<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
[<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
[<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
[<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
[<ffffffff843723ee>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
[<ffffffff843723ee>] smp_trace_apic_timer_interrupt+0x13e/0x6a8
arch/x86/kernel/apic/apic.c:981
[<ffffffff843713dc>] trace_apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:709
<EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? arch_local_irq_enable
arch/x86/include/asm/paravirt.h:777 [inline]
<EOI> [ 2916.083183] [<ffffffff8436ebe6>] ? __raw_spin_unlock_irq
include/linux/spinlock_api_smp.h:170 [inline]
<EOI> [ 2916.083183] [<ffffffff8436ebe6>] ?
_raw_spin_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:199
[<ffffffff814cbff2>] finish_lock_switch kernel/sched/sched.h:1157 [inline]
[<ffffffff814cbff2>] finish_task_switch+0x1c2/0x710 kernel/sched/core.c:2769
[<ffffffff84356654>] context_switch kernel/sched/core.c:2902 [inline]
[<ffffffff84356654>] __schedule+0x724/0x1e90 kernel/sched/core.c:3402
[<ffffffff84357ec8>] schedule+0x108/0x440 kernel/sched/core.c:3457
[<ffffffff8100790f>] exit_to_usermode_loop+0x20f/0x2a0
arch/x86/entry/common.c:149
[<ffffffff81009413>] prepare_exit_to_usermode
arch/x86/entry/common.c:190 [inline]
[<ffffffff81009413>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
[<ffffffff8436fa22>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Object at ffff8801cb58c1c0, in cache net_namespace size: 6656
Allocated:
PID = 3183
[ 2916.342108] [<ffffffff819dcd92>] kasan_slab_alloc+0x12/0x20
mm/kasan/kasan.c:537
[ 2916.349322] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 2916.356776] [<ffffffff83549a86>] kmem_cache_zalloc
include/linux/slab.h:626 [inline]
[ 2916.356776] [<ffffffff83549a86>] net_alloc
net/core/net_namespace.c:339 [inline]
[ 2916.356776] [<ffffffff83549a86>] copy_net_ns+0x196/0x480
net/core/net_namespace.c:379
[ 2916.363783] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860
kernel/nsproxy.c:106
[ 2916.371605] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420
kernel/nsproxy.c:164
[ 2916.379042] [<ffffffff814197f1>]
copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 2916.387013] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 2916.387013] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 2916.393730] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 2916.393730] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 2916.400376] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930
arch/x86/entry/common.c:280
[ 2916.407563] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 15107
[ 2916.441170] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 2916.441170] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 2916.448408] [<ffffffff83548e3e>] net_free
net/core/net_namespace.c:355 [inline]
[ 2916.448408] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140
net/core/net_namespace.c:362
[ 2916.455370] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90
net/core/net_namespace.c:472
[ 2916.462331] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10
kernel/workqueue.c:2096
[ 2916.469877] [<ffffffff81493bc3>] worker_thread+0x223/0x1990
kernel/workqueue.c:2230
[ 2916.477155] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 2916.483831] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
ffff8801cb58c280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801cb58c300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cb58c380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801cb58c400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801cb58c480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149 at addr ffff8801cd4ec298
Read of size 8 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
ffff8801dc107468 ffffffff8234530f ffffffff00000001 1ffff1003b820e20
ffffed003b820e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
1ffff1003b820e17 ffff8801daf0e2c0 0000000041b58ab3 ffffffff84af4170
Call Trace:
[<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40
mm/kasan/report.c:329
[<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149
[<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
[<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
[<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
[<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
[<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
[<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
[<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
[<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
[<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0
arch/x86/kernel/apic/apic.c:960
[<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:709
<EOI> [ 1412.821824] [<ffffffff8436dbb6>] ?
native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
[<ffffffff8436d08f>] arch_safe_halt
arch/x86/include/asm/paravirt.h:103 [inline]
[<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
[<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
[<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
[<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
[<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
[<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
[<ffffffff812e47ac>] start_secondary+0x36c/0x460 arch/x86/kernel/smpboot.c:263
Object at ffff8801cd4ec0c0, in cache net_namespace size: 6656
Allocated:
PID = 3131
[ 1412.940699] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1412.948084] [<ffffffff83549a86>] kmem_cache_zalloc
include/linux/slab.h:626 [inline]
[ 1412.948084] [<ffffffff83549a86>] net_alloc
net/core/net_namespace.c:339 [inline]
[ 1412.948084] [<ffffffff83549a86>] copy_net_ns+0x196/0x480
net/core/net_namespace.c:379
[ 1412.955019] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860
kernel/nsproxy.c:106
[ 1412.962817] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420
kernel/nsproxy.c:164
[ 1412.970094] [<ffffffff814197f1>]
copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1412.978004] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1412.978004] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1412.984677] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1412.984677] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1412.991276] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930
arch/x86/entry/common.c:280
[ 1412.998394] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 9846
[ 1413.031603] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1413.031603] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1413.038796] [<ffffffff83548e3e>] net_free
net/core/net_namespace.c:355 [inline]
[ 1413.038796] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140
net/core/net_namespace.c:362
[ 1413.045734] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90
net/core/net_namespace.c:472
[ 1413.052667] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10
kernel/workqueue.c:2096
[ 1413.060120] [<ffffffff81493bc3>] worker_thread+0x223/0x1990
kernel/workqueue.c:2230
[ 1413.067357] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1413.073944] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
ffff8801cd4ec180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801cd4ec200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801cd4ec280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801cd4ec300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801cd4ec380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149 at addr ffff8801b7b50358
Read of size 8 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0 #3
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
ffff8801dc007468 ffffffff8234530f ffffffff00000000 1ffff1003b800e20
ffffed003b800e18 0000000041b58ab3 ffffffff84b379b8 ffffffff82345021
ffffffff84e2bba0 ffffffff84e2bba8 ffffffff84e2b380 000000000000002e
Call Trace:
[<ffffffff819dd8fe>] __asan_report_load8_noabort+0x3e/0x40
mm/kasan/report.c:329
[<ffffffff8374fd93>] tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149
[<ffffffff815f5b21>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
[<ffffffff815f84b7>] expire_timers kernel/time/timer.c:1348 [inline]
[<ffffffff815f84b7>] __run_timers+0x9e7/0xe90 kernel/time/timer.c:1641
[<ffffffff815f8981>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
[<ffffffff84372c7f>] __do_softirq+0x31f/0xbcd kernel/softirq.c:284
[<ffffffff8143c18c>] invoke_softirq kernel/softirq.c:364 [inline]
[<ffffffff8143c18c>] irq_exit+0x1cc/0x200 kernel/softirq.c:405
[<ffffffff8437228b>] exiting_irq arch/x86/include/asm/apic.h:659 [inline]
[<ffffffff8437228b>] smp_apic_timer_interrupt+0x7b/0xa0
arch/x86/kernel/apic/apic.c:960
[<ffffffff8437133c>] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:709
<EOI> [ 1965.936792] [<ffffffff8436dbb6>] ?
native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
[<ffffffff8436d08f>] arch_safe_halt
arch/x86/include/asm/paravirt.h:103 [inline]
[<ffffffff8436d08f>] default_idle+0xbf/0x440 arch/x86/kernel/process.c:308
[<ffffffff8128a5ca>] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299
[<ffffffff8436e0d6>] default_idle_call+0x36/0x90 kernel/sched/idle.c:96
[<ffffffff815549a7>] cpuidle_idle_call kernel/sched/idle.c:154 [inline]
[<ffffffff815549a7>] cpu_idle_loop kernel/sched/idle.c:247 [inline]
[<ffffffff815549a7>] cpu_startup_entry+0x327/0x4b0 kernel/sched/idle.c:302
[<ffffffff8434f05d>] rest_init+0x18d/0x1a0 init/main.c:408
[<ffffffff85481b16>] start_kernel+0x7a0/0x7d2 init/main.c:660
[<ffffffff854802e6>] x86_64_start_reservations+0x2a/0x2c
arch/x86/kernel/head64.c:195
[<ffffffff85480424>] x86_64_start_kernel+0x13c/0x149
arch/x86/kernel/head64.c:176
Object at ffff8801b7b50180, in cache net_namespace size: 6656
Allocated:
PID = 3169
[ 1966.129951] [<ffffffff819d83e2>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3565
[ 1966.137357] [<ffffffff83549a86>] kmem_cache_zalloc
include/linux/slab.h:626 [inline]
[ 1966.137357] [<ffffffff83549a86>] net_alloc
net/core/net_namespace.c:339 [inline]
[ 1966.137357] [<ffffffff83549a86>] copy_net_ns+0x196/0x480
net/core/net_namespace.c:379
[ 1966.144350] [<ffffffff814b1349>] create_new_namespaces+0x409/0x860
kernel/nsproxy.c:106
[ 1966.152254] [<ffffffff814b1aed>] copy_namespaces+0x34d/0x420
kernel/nsproxy.c:164
[ 1966.159567] [<ffffffff814197f1>]
copy_process.part.40+0x2281/0x4d30 kernel/fork.c:1659
[ 1966.167484] [<ffffffff8141c7e0>] copy_process kernel/fork.c:1483 [inline]
[ 1966.167484] [<ffffffff8141c7e0>] _do_fork+0x200/0xff0 kernel/fork.c:1937
[ 1966.174207] [<ffffffff8141d6a7>] SYSC_clone kernel/fork.c:2047 [inline]
[ 1966.174207] [<ffffffff8141d6a7>] SyS_clone+0x37/0x50 kernel/fork.c:2041
[ 1966.180832] [<ffffffff81009798>] do_syscall_64+0x2e8/0x930
arch/x86/entry/common.c:280
[ 1966.187973] [<ffffffff8436fa49>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 8938
[ 1966.221347] [<ffffffff819da1b1>] __cache_free mm/slab.c:3507 [inline]
[ 1966.221347] [<ffffffff819da1b1>] kmem_cache_free+0x71/0x240 mm/slab.c:3767
[ 1966.228568] [<ffffffff83548e3e>] net_free
net/core/net_namespace.c:355 [inline]
[ 1966.228568] [<ffffffff83548e3e>] net_drop_ns+0x11e/0x140
net/core/net_namespace.c:362
[ 1966.235564] [<ffffffff83549652>] cleanup_net+0x7f2/0xa90
net/core/net_namespace.c:472
[ 1966.242517] [<ffffffff81492960>] process_one_work+0xbd0/0x1c10
kernel/workqueue.c:2096
[ 1966.249995] [<ffffffff81493bc3>] worker_thread+0x223/0x1990
kernel/workqueue.c:2230
[ 1966.257258] [<ffffffff814abb33>] kthread+0x323/0x3e0 kernel/kthread.c:209
[ 1966.263879] [<ffffffff8436fbea>] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
ffff8801b7b50200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801b7b50280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801b7b50300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801b7b50380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801b7b50400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

BUG: KASAN: slab-out-of-bounds in tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149 at addr ffff8801c98f43a0
Read of size 8 by task syz-executor8/3423
CPU: 0 PID: 3423 Comm: syz-executor8 Not tainted 4.10.0-rc5 #19
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:161
print_address_description mm/kasan/report.c:199 [inline]
kasan_report_error+0x1d1/0x4d0 mm/kasan/report.c:288
kasan_report mm/kasan/report.c:308 [inline]
__asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:329
tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
call_timer_fn+0x241/0x820 kernel/time/timer.c:1308
expire_timers kernel/time/timer.c:1348 [inline]
__run_timers+0x9e7/0xe90 kernel/time/timer.c:1642
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655
__do_softirq+0x31f/0xbe7 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:658 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
RIP: 0010:arch_local_irq_save arch/x86/include/asm/paravirt.h:784 [inline]
RIP: 0010:lock_is_held_type+0x124/0x310 kernel/locking/lockdep.c:3787
RSP: 0018:ffff8801c946f558 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000286 RBX: 1ffff1003928deac RCX: 1ffff1003928deb0
RDX: 1ffffffff0a18984 RSI: 00000000ffffffff RDI: ffffffff850c4c20
RBP: ffff8801c946f6a8 R08: 0000000000000002 R09: 0000000000000001
R10: 000000000000000a R11: 0000000000000000 R12: ffff8801c946f680
R13: ffff8801c9492640 R14: ffffffff85130ec0 R15: 0000000000000bff
</IRQ>
lock_is_held include/linux/lockdep.h:348 [inline]
___might_sleep+0x5b3/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
cache_alloc_debugcheck_before mm/slab.c:3071 [inline]
slab_alloc mm/slab.c:3386 [inline]
kmem_cache_alloc+0x273/0x680 mm/slab.c:3558
shmem_alloc_inode+0x1b/0x40 mm/shmem.c:3647
alloc_inode+0x61/0x180 fs/inode.c:207
new_inode_pseudo+0x69/0x170 fs/inode.c:889
new_inode+0x1c/0x40 fs/inode.c:918
shmem_get_inode+0xd1/0x8a0 mm/shmem.c:2120
shmem_mknod+0x58/0x1b0 mm/shmem.c:2824
shmem_mkdir+0x29/0x50 mm/shmem.c:2875
vfs_mkdir+0x3be/0x600 fs/namei.c:3738
SYSC_mkdirat fs/namei.c:3761 [inline]
SyS_mkdirat fs/namei.c:3745 [inline]
SYSC_mkdir fs/namei.c:3772 [inline]
SyS_mkdir+0x16e/0x290 fs/namei.c:3770
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44ec87
RSP: 002b:0000000001a2fe40 EFLAGS: 00000212 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 000000000044ec87
RDX: 0000000001a2fe6e RSI: 00000000000001ff RDI: 0000000001a2fe68
RBP: 00000000000019ec R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000064 R11: 0000000000000212 R12: 0000000001ef390c
R13: 0000000000000000 R14: 00000000000a43b5 R15: 00000000000019ec
Object at ffff8801c98f44c0, in cache task_struct size: 5696
Allocated:
PID = 3452
[<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
[<ffffffff819f71da>] set_track mm/kasan/kasan.c:514 [inline]
[<ffffffff819f71da>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
[<ffffffff819f77d2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
[<ffffffff819f1652>] kmem_cache_alloc_node+0x122/0x690 mm/slab.c:3650
[<ffffffff81421fe2>] alloc_task_struct_node kernel/fork.c:142 [inline]
[<ffffffff81421fe2>] dup_task_struct kernel/fork.c:482 [inline]
[<ffffffff81421fe2>] copy_process.part.42+0x1a32/0x5fd0 kernel/fork.c:1515
[<ffffffff81426ac0>] copy_process kernel/fork.c:1486 [inline]
[<ffffffff81426ac0>] _do_fork+0x200/0xff0 kernel/fork.c:1942
[<ffffffff81427987>] SYSC_clone kernel/fork.c:2052 [inline]
[<ffffffff81427987>] SyS_clone+0x37/0x50 kernel/fork.c:2046
[<ffffffff81009798>] do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
[<ffffffff8440fb09>] return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 29885
[<ffffffff8129f656>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[<ffffffff819f6f53>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
[<ffffffff819f784f>] set_track mm/kasan/kasan.c:514 [inline]
[<ffffffff819f784f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
[<ffffffff819f4bf1>] __cache_free mm/slab.c:3502 [inline]
[<ffffffff819f4bf1>] kmem_cache_free+0x71/0x240 mm/slab.c:3762
[<ffffffff8141f041>] free_task_struct kernel/fork.c:147 [inline]
[<ffffffff8141f041>] free_task+0x151/0x1d0 kernel/fork.c:359
[<ffffffff8141f30b>] __put_task_struct+0x24b/0x5f0 kernel/fork.c:396
[<ffffffff81435baa>] put_task_struct include/linux/sched.h:2257 [inline]
[<ffffffff81435baa>] delayed_put_task_struct+0xca/0x3f0 kernel/exit.c:173
[<ffffffff815ef250>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
[<ffffffff815ef250>] rcu_do_batch.isra.70+0x9e0/0xdf0 kernel/rcu/tree.c:2780
[<ffffffff815efad2>] invoke_rcu_callbacks kernel/rcu/tree.c:3043 [inline]
[<ffffffff815efad2>] __rcu_process_callbacks kernel/rcu/tree.c:3010 [inline]
[<ffffffff815efad2>] rcu_process_callbacks+0x472/0xc70 kernel/rcu/tree.c:3027
[<ffffffff84412d7f>] __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
Memory state around the buggy address:
ffff8801c98f4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801c98f4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801c98f4380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8801c98f4400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801c98f4480: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
==================================================================

Dmitry Vyukov

unread,
Jan 23, 2017, 5:24:09 AM1/23/17
to David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller
On Mon, Jan 23, 2017 at 11:19 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> Hello,
>
> While running syzkaller fuzzer I started seeing use-after-frees in
> tw_timer_handler. It happens with very low frequency, so far I've seen
> 22 of them. But all reports look consistent, so I would assume that it
> is real, just requires a very tricky race to happen. I've stared
> seeing it around Jan 17, however I did not update kernels for some
> time before that so potentially the issues was introduced somewhat
> earlier. Or maybe fuzzer just figured how to trigger it, and the bug
> is actually old. I am seeing it on all of torvalds/linux-next/mmotm,
> some commits if it matters: 7a308bb3016f57e5be11a677d15b821536419d36,
> 5cf7a0f3442b2312326c39f571d637669a478235,
> c497f8d17246720afe680ea1a8fa6e48e75af852.
> Majority of reports points to net_drop_ns as the offending free, but
> it may be red herring. Since the access happens in timer, it can
> happen long after free and the memory could have been reused. I've
> also seen few where the access in tw_timer_handler is reported as
> out-of-bounds on task_struct and on struct filename.



I've briefly skimmed through the code. Assuming that it requires a
very tricky race to be triggered, the most suspicious looks
inet_twsk_deschedule_put vs __inet_twsk_schedule:

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
if (del_timer_sync(&tw->tw_timer))
inet_twsk_kill(tw);
inet_twsk_put(tw);
}

void __inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo, bool rearm)
{
tw->tw_kill = timeo <= 4*HZ;
if (!rearm) {
BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
atomic_inc(&tw->tw_dr->tw_count);
} else {
mod_timer_pending(&tw->tw_timer, jiffies + timeo);
}
}

Can't it somehow end up rearming already deleted timer? Or maybe the
first mod_timer happens after del_timer_sync?

Eric Dumazet

unread,
Jan 24, 2017, 9:29:01 AM1/24/17
to Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller
This code was changed a long time ago :

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054

So I suspect a recent patch broke the logic.

You might start a bisection :

I would check if 4.7 and 4.8 trigger the issue you noticed.

Thanks.


Dmitry Vyukov

unread,
Jan 24, 2017, 10:06:23 AM1/24/17
to Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet, syzkaller
It happens with too low rate for bisecting (few times per day). I
could add some additional checks into code, but I don't know what
checks could be useful.

Eric Dumazet

unread,
Jan 24, 2017, 10:52:55 AM1/24/17
to Dmitry Vyukov, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>>
>> This code was changed a long time ago :
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054
>>
>> So I suspect a recent patch broke the logic.
>>
>> You might start a bisection :
>>
>> I would check if 4.7 and 4.8 trigger the issue you noticed.
>
>
> It happens with too low rate for bisecting (few times per day). I
> could add some additional checks into code, but I don't know what
> checks could be useful.

If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure
we are able to help.

Dmitry Vyukov

unread,
Feb 8, 2017, 12:36:29 PM2/8/17
to Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
There are also chances that the problem is older.

Looking at the code, this part of inet_twsk_purge looks fishy:

285 if (unlikely((tw->tw_family != family) ||
286 atomic_read(&twsk_net(tw)->count))) {

It uses net->count == 0 check to find the right sockets. But what if
there are several nets with count == 0 in flight, can't there be
several inet_twsk_purge calls running concurrently freeing each other
sockets? If so it looks like inet_twsk_purge can call
inet_twsk_deschedule_put twice for a socket. Namely, two calls for
different nets discover the socket, check that net->count==0 and both
call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge
net that it needs to purge?

The second issue that I noticed is that tw_refcnt is set to 4 _after_
we schedule the timer. The timer will probably won't fire before we
set tw_refcnt, but if it somehow does it will corrupt the ref count. I
don't think that it's what I am seeing, though. More likely it's the
first issues (if it's real).

Does it make any sense?

Eric Dumazet

unread,
Feb 8, 2017, 12:58:38 PM2/8/17
to Dmitry Vyukov, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
Yes, atomic_read() is not a proper sync point.

> The second issue that I noticed is that tw_refcnt is set to 4 _after_
> we schedule the timer. The timer will probably won't fire before we
> set tw_refcnt, but if it somehow does it will corrupt the ref count. I
> don't think that it's what I am seeing, though. More likely it's the
> first issues (if it's real).
>

Timer is pinned, it cannot fire under us on this cpu.


Dmitry Vyukov

unread,
Feb 8, 2017, 1:56:18 PM2/8/17
to Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
Do you mean that it does not include read barrier?
I more mean that we can call inet_twsk_deschedule_put twice for the same socket.


>> The second issue that I noticed is that tw_refcnt is set to 4 _after_
>> we schedule the timer. The timer will probably won't fire before we
>> set tw_refcnt, but if it somehow does it will corrupt the ref count. I
>> don't think that it's what I am seeing, though. More likely it's the
>> first issues (if it's real).
>>
>
> Timer is pinned, it cannot fire under us on this cpu.

Ack

Eric Dumazet

unread,
Feb 8, 2017, 2:17:44 PM2/8/17
to Dmitry Vyukov, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
I meant that this code assumed RTNL being held.

This might not be the case now, after some old change.


Dmitry Vyukov

unread,
Feb 8, 2017, 2:32:59 PM2/8/17
to Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
cleanup_net releases rtnl lock right before calling these callbacks.

Dmitry Vyukov

unread,
Feb 14, 2017, 2:38:52 PM2/14/17
to Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller, Andrey Ryabinin
+Andrey, do you know somebody on your side interested in stability of
network namespace?
This use-after-free seems to be related to net namespace. For context,
full thread is here:
https://groups.google.com/forum/#!msg/syzkaller/p1tn-_Kc6l4/smuL_FMAAgAJ

Cong Wang

unread,
Feb 17, 2017, 1:51:58 PM2/17/17
to Dmitry Vyukov, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
I don't think this could happen, because cleanup_net() is called in a
work struct, and this work can't be scheduled twice, so there should
not be any parallel cleanup_net().

Also, inet_twsk_deschedule_put() already waits for the flying timer,
net->count==0 at this point and all sockets in this netns are already
gone, so I don't know how a timer could be still fired after this.

Dmitry Vyukov

unread,
Feb 17, 2017, 3:37:19 PM2/17/17
to Cong Wang, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
One thing that I noticed is that use-after-free always happens in
tw_timer_handler, but never in timer code. This suggests that:
1. Whoever frees the struct waits for the timer mutex unlock.
2. Free happens almost at the same time as timer fires (otherwise the
timer would probably be cancelled).

That could happen if there is a race in timer code during cancellation
or rearming. I understand that it's heavily stressed code, but who
knows...

I still wonder if twsk_net(tw)->count==0 is the right condition. This
net may not yet have been scheduled for destruction, but another task
could pick it up and start destroying...

Cong, do you have any ideas as to what debugging checks I could add to
help track this down?

Cong Wang

unread,
Feb 17, 2017, 5:30:31 PM2/17/17
to Dmitry Vyukov, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
Good point! I think this could happen since timer is deactivated before
unlinking the tw sock, so another CPU could find it and rearm the timer
during such window?

>
> I still wonder if twsk_net(tw)->count==0 is the right condition. This
> net may not yet have been scheduled for destruction, but another task
> could pick it up and start destroying...

Good question. net_exit_list is just a snapshot of the cleanup_list, so
it is true that inet_twsk_purge() could purge more netns'es than in
net_exit_list, but I don't see any problem with this, the work later can't
find the same twsck again since it is unlinked from hashtable.
Do you see otherwise?

>
> Cong, do you have any ideas as to what debugging checks I could add to
> help track this down?

I don't have any theory for the cause of this bug. Your point above actually
makes sense for me. Maybe you can add some code in between the following
code:

if (del_timer_sync(&tw->tw_timer))
inet_twsk_kill(tw);

to verify is the timer is rearmed or not.

Dmitry Vyukov

unread,
Feb 21, 2017, 4:47:01 AM2/21/17
to Cong Wang, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
No, I don't see otherwise. But it's just something suspicious in the
context of an elusive race that happens under load.

>> Cong, do you have any ideas as to what debugging checks I could add to
>> help track this down?
>
> I don't have any theory for the cause of this bug. Your point above actually
> makes sense for me. Maybe you can add some code in between the following
> code:
>
> if (del_timer_sync(&tw->tw_timer))
> inet_twsk_kill(tw);
>
> to verify is the timer is rearmed or not.

Now looking at the reports more closely, I think that debug info is
off-by-one, and the use-after-free happens on the net object on this
line: __NET_INC_STATS(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED).

And this report is actually all correct:

BUG: KASAN: use-after-free in tw_timer_handler+0xc3/0xd0
net/ipv4/inet_timewait_sock.c:149 at addr ffff88018063a460
Read of size 8 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-rc8+ #65
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
<IRQ>
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
tw_timer_handler+0xc3/0xd0 net/ipv4/inet_timewait_sock.c:149
call_timer_fn+0x241/0x820 kernel/time/timer.c:1308
expire_timers kernel/time/timer.c:1348 [inline]
__run_timers+0x9e7/0xe90 kernel/time/timer.c:1642
run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655
__do_softirq+0x31f/0xbe7 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364 [inline]
irq_exit+0x1cc/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:658 [inline]
smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:961
apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707

Object at ffff88018063a240, in cache net_namespace size: 6784

Allocated:
PID = 3270
kmem_cache_zalloc include/linux/slab.h:626 [inline]
net_alloc net/core/net_namespace.c:339 [inline]
copy_net_ns+0x196/0x530 net/core/net_namespace.c:379
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
copy_process.part.42+0x20c6/0x5fd0 kernel/fork.c:1664
copy_process kernel/fork.c:1486 [inline]
_do_fork+0x200/0xff0 kernel/fork.c:1942
SYSC_clone kernel/fork.c:2052 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2046
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
return_from_SYSCALL_64+0x0/0x7a

Freed:
PID = 8056
kmem_cache_free+0x71/0x240 mm/slab.c:3762
net_free+0xd7/0x110 net/core/net_namespace.c:355
net_drop_ns+0x31/0x40 net/core/net_namespace.c:362
cleanup_net+0x7f2/0xa90 net/core/net_namespace.c:479
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098
worker_thread+0x223/0x1990 kernel/workqueue.c:2232
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

The read is of size 8, but read of tw->tw_kill should be 4.
We could also double check offset within the object and check if it
matches what __NET_INC_STATS(twsk_net(tw)) reads.

So the net is somehow freed while the timer is active.

Dmitry Vyukov

unread,
Feb 21, 2017, 5:40:45 AM2/21/17
to Cong Wang, Eric Dumazet, Eric Dumazet, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, syzkaller
Suspecting a race between inet_twsk_deschedule_put and
inet_twsk_reschedule, but don't see any bugs.

Can inet_twsk_deschedule_put run concurrently with inet_twsk_schedule
(not the reschedule)? That might explain the bug, but that's probably
not possible, right?

Starring at the code, this part looks wrong:

960 static inline int
961 __mod_timer(struct timer_list *timer, unsigned long expires, bool
pending_only)
962 {
...
975 if (timer_pending(timer)) {
976 if (timer->expires == expires)
977 return 1;
...
985 base = lock_timer_base(timer, &flags);
987 clk = base->clk;
988 idx = calc_wheel_index(expires, clk);
995 if (idx == timer_get_idx(timer)) {
996 timer->expires = expires;
997 ret = 1;
998 goto out_unlock;
999 }


We check timer_pending before taking the lock, and not re-check it
after taking the lock. So we can merely update timer->expires for a
non-pending timer and fail to requeue it. It's fine for pending_only
(as it has already fired), but not for !pending_only. Am I missing
something? But it still does not explain the original bug.

Andrey Ryabinin

unread,
Feb 21, 2017, 6:26:56 AM2/21/17
to Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org, Andrey Ryabinin
DCCP doesn't purge timewait sockets on network namespace shutdown.
So, after net namespace destroyed we could still have an active timer
which will trigger use after free in tw_timer_handler():

BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10
Read of size 8 by task swapper/1/0
Call Trace:
__asan_load8+0x54/0x90
tw_timer_handler+0x4a/0xa0
call_timer_fn+0x127/0x480
expire_timers+0x1db/0x2e0
run_timer_softirq+0x12f/0x2a0
__do_softirq+0x105/0x5b4
irq_exit+0xdd/0xf0
smp_apic_timer_interrupt+0x57/0x70
apic_timer_interrupt+0x90/0xa0

Object at ffff88010e0d1bc0, in cache net_namespace size: 6848
Allocated:
save_stack_trace+0x1b/0x20
kasan_kmalloc+0xee/0x180
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x134/0x310
copy_net_ns+0x8d/0x280
create_new_namespaces+0x23f/0x340
unshare_nsproxy_namespaces+0x75/0xf0
SyS_unshare+0x299/0x4f0
entry_SYSCALL_64_fastpath+0x18/0xad
Freed:
save_stack_trace+0x1b/0x20
kasan_slab_free+0xae/0x180
kmem_cache_free+0xb4/0x350
net_drop_ns+0x3f/0x50
cleanup_net+0x3df/0x450
process_one_work+0x419/0xbb0
worker_thread+0x92/0x850
kthread+0x192/0x1e0
ret_from_fork+0x2e/0x40

Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
timewait sockets on net namespace destruction and prevent above issue.

Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>
---
net/dccp/ipv4.c | 6 ++++++
net/dccp/ipv6.c | 6 ++++++
2 files changed, 12 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d859a5c..da7cb16 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -1018,9 +1018,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net)
inet_ctl_sock_destroy(net->dccp.v4_ctl_sk);
}

+static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list)
+{
+ inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET);
+}
+
static struct pernet_operations dccp_v4_ops = {
.init = dccp_v4_init_net,
.exit = dccp_v4_exit_net,
+ .exit_batch = dccp_v4_exit_batch,
};

static int __init dccp_v4_init(void)
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index c4e879c..f3d8f92 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1077,9 +1077,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net)
inet_ctl_sock_destroy(net->dccp.v6_ctl_sk);
}

+static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list)
+{
+ inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET6);
+}
+
static struct pernet_operations dccp_v6_ops = {
.init = dccp_v6_init_net,
.exit = dccp_v6_exit_net,
+ .exit_batch = dccp_v6_exit_batch,
};

static int __init dccp_v6_init(void)
--
2.10.2

Dmitry Vyukov

unread,
Feb 21, 2017, 6:57:20 AM2/21/17
to Andrey Ryabinin, Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML
Aha! Thanks for tracking this down!

Queued the patch on syzkaller bots, should have the verdict in several hours.

Arnaldo Carvalho de Melo

unread,
Feb 21, 2017, 8:43:39 AM2/21/17
to Andrey Ryabinin, Eric W. Biederman, Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu:
> DCCP doesn't purge timewait sockets on network namespace shutdown.
> So, after net namespace destroyed we could still have an active timer
> which will trigger use after free in tw_timer_handler():
>
>
> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
> timewait sockets on net namespace destruction and prevent above issue.

Please add this, to help stable kernels to pick this up

Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge")
Cc: Eric W. Biederman <ebie...@xmission.com>

[acme@jouet linux]$ git describe b099ce2602d8
v2.6.32-rc8-1977-gb099ce2602d8

This one added the pernet operations related to network namespaces, but
then the one above got missed.

commit 72a2d6138224298a576bcdc33d7d0004de604856
Author: Pavel Emelyanov <xe...@openvz.org>
Date: Sun Apr 13 22:29:13 2008 -0700

[NETNS][DCCPV4]: Add dummy per-net operations.

----------------------------------

It looks ok, so please consider adding my:

Acked-by: Arnaldo Carvalho de Melo <ac...@redhat.com>

- Arnaldo
> --
> To unsubscribe from this list: send the line "unsubscribe dccp" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Eric Dumazet

unread,
Feb 21, 2017, 8:53:15 AM2/21/17
to Arnaldo Carvalho de Melo, Andrey Ryabinin, Eric W. Biederman, Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML
On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo
<ac...@kernel.org> wrote:
>
> Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu:
> > DCCP doesn't purge timewait sockets on network namespace shutdown.
> > So, after net namespace destroyed we could still have an active timer
> > which will trigger use after free in tw_timer_handler():
> >
> >
> > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
> > timewait sockets on net namespace destruction and prevent above issue.
>
> Please add this, to help stable kernels to pick this up
>
> Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge")
> Cc: Eric W. Biederman <ebie...@xmission.com>


This patch has nothing to do with this bug really.

Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857
("netns : fix kernel panic in timewait socket destruction")

Back in 2008, nobody spotted that DCCP was using the same infra.

When can we get rid of DCCP in linux so that syszkaller team no longer
spend time on it ?

Thanks.

David Miller

unread,
Feb 21, 2017, 1:23:05 PM2/21/17
to edum...@google.com, ac...@kernel.org, arya...@virtuozzo.com, ebie...@xmission.com, ger...@erg.abdn.ac.uk, dc...@vger.kernel.org, dvy...@google.com, xiyou.w...@gmail.com, kuz...@ms2.inr.ac.ru, yosh...@linux-ipv6.org, ka...@trash.net, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
From: Eric Dumazet <edum...@google.com>
Date: Tue, 21 Feb 2017 05:53:13 -0800

> On Tue, Feb 21, 2017 at 5:43 AM, Arnaldo Carvalho de Melo
> <ac...@kernel.org> wrote:
>>
>> Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu:
>> > DCCP doesn't purge timewait sockets on network namespace shutdown.
>> > So, after net namespace destroyed we could still have an active timer
>> > which will trigger use after free in tw_timer_handler():
>> >
>> >
>> > Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
>> > timewait sockets on net namespace destruction and prevent above issue.
>>
>> Please add this, to help stable kernels to pick this up
>>
>> Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge")
>> Cc: Eric W. Biederman <ebie...@xmission.com>
>
>
> This patch has nothing to do with this bug really.
>
> Look at commit d315492b1a6ba29da0fa2860759505ae1b2db857
> ("netns : fix kernel panic in timewait socket destruction")
>
> Back in 2008, nobody spotted that DCCP was using the same infra.

So, let me get this straight, dccp is buggy because it tried as hard as
possible to share and use common pieces of infrastructure instead of
duplicating all of said logic?

Now I've heard everything.

I know it has been a pain in the rear fixing all of these dccp bugs,
but removing it from the tree or even pushing it into staging is
simply not an option. So we better come up with a better plan based
upon reality rather than fantasy. :-)

David Miller

unread,
Feb 21, 2017, 1:23:54 PM2/21/17
to arya...@virtuozzo.com, ger...@erg.abdn.ac.uk, dc...@vger.kernel.org, dvy...@google.com, edum...@google.com, xiyou.w...@gmail.com, kuz...@ms2.inr.ac.ru, yosh...@linux-ipv6.org, ka...@trash.net, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
From: Andrey Ryabinin <arya...@virtuozzo.com>
Date: Tue, 21 Feb 2017 14:27:40 +0300

> DCCP doesn't purge timewait sockets on network namespace shutdown.
> So, after net namespace destroyed we could still have an active timer
> which will trigger use after free in tw_timer_handler():
...
> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
> timewait sockets on net namespace destruction and prevent above issue.
>
> Reported-by: Dmitry Vyukov <dvy...@google.com>
> Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>

Applied and queued up for -stable, thanks.

David Miller

unread,
Feb 21, 2017, 1:24:51 PM2/21/17
to arya...@virtuozzo.com, ger...@erg.abdn.ac.uk, dc...@vger.kernel.org, dvy...@google.com, edum...@google.com, xiyou.w...@gmail.com, kuz...@ms2.inr.ac.ru, yosh...@linux-ipv6.org, ka...@trash.net, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
From: David Miller <da...@davemloft.net>
Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST)
Actually, this doesn't even compile. Please fix this up and resubmit:

net/dccp/ipv4.c: In function ‘dccp_v4_exit_batch’:
net/dccp/ipv4.c:1022:34: warning: passing argument 2 of ‘inet_twsk_purge’ makes integer from pointer without a cast [-Wint-conversion]
inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET);
^
In file included from ./include/linux/dccp.h:14:0,
from net/dccp/ipv4.c:13:
./include/net/inet_timewait_sock.h:118:6: note: expected ‘int’ but argument is of type ‘struct inet_timewait_death_row *’
void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family);
^
net/dccp/ipv4.c:1022:2: error: too many arguments to function ‘inet_twsk_purge’
inet_twsk_purge(&dccp_hashinfo, &dccp_death_row, AF_INET);
^

Dmitry Vyukov

unread,
Feb 22, 2017, 1:49:20 AM2/22/17
to Andrey Ryabinin, Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML
I do not see the crash happening after applying the patch.

Andrey Ryabinin

unread,
Feb 22, 2017, 3:58:04 AM2/22/17
to Arnaldo Carvalho de Melo, Eric W. Biederman, Gerrit Renker, David S. Miller, dc...@vger.kernel.org, Dmitry Vyukov, Eric Dumazet, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
On 02/21/2017 04:43 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 21, 2017 at 02:27:40PM +0300, Andrey Ryabinin escreveu:
>> DCCP doesn't purge timewait sockets on network namespace shutdown.
>> So, after net namespace destroyed we could still have an active timer
>> which will trigger use after free in tw_timer_handler():
>>
>>
>> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
>> timewait sockets on net namespace destruction and prevent above issue.
>
> Please add this, to help stable kernels to pick this up
>
> Fixes: b099ce2602d8 ("net: Batch inet_twsk_purge")
> Cc: Eric W. Biederman <ebie...@xmission.com>
>

Fixes tag should blame commit f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH").
It introduced use of net namespace in the timer callback.

Andrey Ryabinin

unread,
Feb 22, 2017, 4:34:23 AM2/22/17
to David Miller, ger...@erg.abdn.ac.uk, dc...@vger.kernel.org, dvy...@google.com, edum...@google.com, xiyou.w...@gmail.com, kuz...@ms2.inr.ac.ru, yosh...@linux-ipv6.org, ka...@trash.net, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
On 02/21/2017 09:24 PM, David Miller wrote:
> From: David Miller <da...@davemloft.net>
> Date: Tue, 21 Feb 2017 13:23:51 -0500 (EST)
>
>> From: Andrey Ryabinin <arya...@virtuozzo.com>
>> Date: Tue, 21 Feb 2017 14:27:40 +0300
>>
>>> DCCP doesn't purge timewait sockets on network namespace shutdown.
>>> So, after net namespace destroyed we could still have an active timer
>>> which will trigger use after free in tw_timer_handler():
>> ...
>>> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
>>> timewait sockets on net namespace destruction and prevent above issue.
>>>
>>> Reported-by: Dmitry Vyukov <dvy...@google.com>
>>> Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>
>>
>> Applied and queued up for -stable, thanks.
>
> Actually, this doesn't even compile. Please fix this up and resubmit:
>

Right, I tested this on top of the Linus' tree. Rebased on -next now.

Andrey Ryabinin

unread,
Feb 22, 2017, 4:34:33 AM2/22/17
to Gerrit Renker, David S. Miller, Andrey Ryabinin, Eric Dumazet, Arnaldo Carvalho de Melo, dc...@vger.kernel.org, Dmitry Vyukov, Cong Wang, Alexey Kuznetsov, Hideaki YOSHIFUJI, Patrick McHardy, syzkaller, netdev, LKML
Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH")
Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>
Acked-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
---

Changes since v1:
- Rebased on top the -next
- Added Fixes/Acked tags.

net/dccp/ipv4.c | 6 ++++++
net/dccp/ipv6.c | 6 ++++++
2 files changed, 12 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index b043ec8..409d0cf 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -1017,9 +1017,15 @@ static void __net_exit dccp_v4_exit_net(struct net *net)
inet_ctl_sock_destroy(net->dccp.v4_ctl_sk);
}

+static void __net_exit dccp_v4_exit_batch(struct list_head *net_exit_list)
+{
+ inet_twsk_purge(&dccp_hashinfo, AF_INET);
+}
+
static struct pernet_operations dccp_v4_ops = {
.init = dccp_v4_init_net,
.exit = dccp_v4_exit_net,
+ .exit_batch = dccp_v4_exit_batch,
};

static int __init dccp_v4_init(void)
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index cef60a4..233b573 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1075,9 +1075,15 @@ static void __net_exit dccp_v6_exit_net(struct net *net)
inet_ctl_sock_destroy(net->dccp.v6_ctl_sk);
}

+static void __net_exit dccp_v6_exit_batch(struct list_head *net_exit_list)
+{
+ inet_twsk_purge(&dccp_hashinfo, AF_INET6);

David Miller

unread,
Feb 22, 2017, 4:15:41 PM2/22/17
to arya...@virtuozzo.com, ger...@erg.abdn.ac.uk, edum...@google.com, ac...@kernel.org, dc...@vger.kernel.org, dvy...@google.com, xiyou.w...@gmail.com, kuz...@ms2.inr.ac.ru, yosh...@linux-ipv6.org, ka...@trash.net, syzk...@googlegroups.com, net...@vger.kernel.org, linux-...@vger.kernel.org
From: Andrey Ryabinin <arya...@virtuozzo.com>
Date: Wed, 22 Feb 2017 12:35:27 +0300

> DCCP doesn't purge timewait sockets on network namespace shutdown.
> So, after net namespace destroyed we could still have an active timer
> which will trigger use after free in tw_timer_handler():
...
> Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
> timewait sockets on net namespace destruction and prevent above issue.
>
> Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH")
> Reported-by: Dmitry Vyukov <dvy...@google.com>
> Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>
> Acked-by: Arnaldo Carvalho de Melo <ac...@redhat.com>

Applied and queued up for -sable, thanks.

nick...@gmail.com

unread,
Feb 26, 2017, 6:23:58 AM2/26/17
to syzkaller, da...@davemloft.net, kuz...@ms2.inr.ac.ru, jmo...@namei.org, yosh...@linux-ipv6.org, ka...@trash.net, net...@vger.kernel.org, linux-...@vger.kernel.org, edum...@google.com
I notice that this is not fixed in the just released 4.4.52 and 4.9.13 releases. Should it have been?

Regards,

Nick
Reply all
Reply to author
Forward
0 new messages