net: deadlock on genl_mutex


Dmitry Vyukov

Nov 26, 2016, 12:04:34 PM
to David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Eric Dumazet, r...@redhat.com, syzkaller
Hello,

The following program triggers deadlock warnings on genl_mutex:

https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88003ec06420 ffffffff834c2e39 ffffffff00000000 1ffff10007d80c17
ffffed0007d80c0f 0000000041b58ab3 ffffffff89575550 ffffffff834c2b4b
ffffffff8baab1a0 dffffc0000000000 0000000000000000 ffff880068f794e0
Call Trace:
<IRQ> [ 287.394552] [< inline >] __dump_stack lib/dump_stack.c:15
<IRQ> [ 287.394552] [<ffffffff834c2e39>] dump_stack+0x2ee/0x3f5
lib/dump_stack.c:51
[<ffffffff814b6ac3>] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
[<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
[<ffffffff88139aaa>] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
[< inline >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
[<ffffffff86ca5458>] netlink_sock_destruct+0xf8/0x400
net/netlink/af_netlink.c:331
[<ffffffff86a7b234>] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
[<ffffffff86a87d6c>] sk_destruct+0x4c/0x80 net/core/sock.c:1453
[<ffffffff86a87dfc>] __sk_free+0x5c/0x230 net/core/sock.c:1461
[<ffffffff86a87ff8>] sk_free+0x28/0x30 net/core/sock.c:1472
[< inline >] sock_put include/net/sock.h:1591
[<ffffffff86ca6cd1>] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
[< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
[<ffffffff815cbc9d>] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
[< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
[< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
[<ffffffff815cc55c>] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
[<ffffffff8814d53b>] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
[< inline >] invoke_softirq kernel/softirq.c:364
[<ffffffff8141a941>] irq_exit+0x1d1/0x210 kernel/softirq.c:405
[< inline >] exiting_irq arch/x86/include/asm/apic.h:659
[<ffffffff8814ca30>] smp_apic_timer_interrupt+0x80/0xa0
arch/x86/kernel/apic/apic.c:960
[<ffffffff8814badc>] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:489
<EOI> [ 287.403717] [<ffffffff8155c987>] ? lock_is_held+0x247/0x310
[<ffffffff814b6bde>] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
[<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
[<ffffffff88142d08>] down_read+0x78/0x160 kernel/locking/rwsem.c:21
[< inline >] anon_vma_lock_read include/linux/rmap.h:127
[<ffffffff81968295>] validate_mm+0xe5/0x880 mm/mmap.c:347
[<ffffffff8196bf0b>] vma_link+0x11b/0x180 mm/mmap.c:605
[<ffffffff81977f46>] mmap_region+0x1076/0x1880 mm/mmap.c:1692
[<ffffffff81978e4f>] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
[< inline >] do_mmap_pgoff include/linux/mm.h:2039
[<ffffffff818fd527>] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
[< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
[<ffffffff8196f961>] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
[< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
[<ffffffff8124bf4b>] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
[<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6

=================================
[ INFO: inconsistent lock state ]
4.9.0-rc5+ #54 Tainted: G W
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
([ 287.580014] genl_mutex
[< inline >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
{SOFTIRQ-ON-W} state was registered at:
[ 287.580014] [< inline >] mark_irqflags
kernel/locking/lockdep.c:2938
[ 287.580014] [<ffffffff81567ad7>] __lock_acquire+0x6e7/0x3380
kernel/locking/lockdep.c:3292
[ 287.580014] [<ffffffff8156b642>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3746
[ 287.580014] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 287.580014] [<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20
kernel/locking/mutex.c:621
[ 287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
[ 287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
[ 287.580014] [<ffffffff86cba52e>]
__genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
[ 287.580014] [< inline >]
_genl_register_family_with_ops_grps include/net/genetlink.h:173
[ 287.580014] [<ffffffff8ab90c02>] genl_init+0x11d/0x185
net/netlink/genetlink.c:1084
[ 287.580014] [<ffffffff8100244b>] do_one_initcall+0xfb/0x3f0 init/main.c:778
[ 287.580014] [< inline >] do_initcall_level init/main.c:844
[ 287.580014] [< inline >] do_initcalls init/main.c:852
[ 287.580014] [< inline >] do_basic_setup init/main.c:870
[ 287.580014] [<ffffffff8aa3d03d>] kernel_init_freeable+0x5c4/0x69e
init/main.c:1017
[ 287.580014] [<ffffffff88129c88>] kernel_init+0x18/0x180 init/main.c:943
[ 287.580014] [<ffffffff8814a05a>] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433

[ 78.258919] [ INFO: inconsistent lock state ]
[ 78.258919] 4.9.0-rc5+ #54 Tainted: G W
[ 78.258919] ---------------------------------
[ 78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 78.258919] syz-fuzzer/5211 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 78.258919] (genl_mutex){+.?.+.}, at: [<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0
[ 78.258919] {SOFTIRQ-ON-W} state was registered at:
[ 78.258919] [<ffffffff81567ad7>] __lock_acquire+0x6e7/0x3380
[ 78.258919] [<ffffffff8156b642>] lock_acquire+0x2a2/0x790
[ 78.258919] [<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20
[ 78.258919] [<ffffffff86cba52e>] __genl_register_family+0x2ce/0x1870
[ 78.258919] [<ffffffff8ab90c02>] genl_init+0x11d/0x185
[ 78.258919] [<ffffffff8100244b>] do_one_initcall+0xfb/0x3f0
[ 78.258919] [<ffffffff8aa3d03d>] kernel_init_freeable+0x5c4/0x69e
[ 78.258919] [<ffffffff88129c88>] kernel_init+0x18/0x180
[ 78.258919] [<ffffffff8814a05a>] ret_from_fork+0x2a/0x40
[ 78.258919] irq event stamp: 149484
[ 78.258919] hardirqs last enabled at (149484): [<ffffffff8814a7df>] restore_regs_and_iret+0x0/0x1d
[ 78.258919] hardirqs last disabled at (149483): [<ffffffff8814bad7>] apic_timer_interrupt+0x87/0xa0
[ 78.258919] softirqs last enabled at (149302): [<ffffffff8814da39>] __do_softirq+0x829/0xca8
[ 78.258919] softirqs last disabled at (149437): [<ffffffff8141a941>] irq_exit+0x1d1/0x210

[ 78.258919]
[ 78.258919] other info that might help us debug this:
[ 78.258919] Possible unsafe locking scenario:
[ 78.258919]
[ 78.258919]        CPU0
[ 78.258919]        ----
[ 78.258919]   lock(genl_mutex);
[ 78.258919]   <Interrupt>
[ 78.258919]     lock(genl_mutex);
[ 78.258919]
[ 78.258919] *** DEADLOCK ***
[ 78.258919]
[ 78.258919] 1 lock held by syz-fuzzer/5211:
[ 78.258919] #0: (rcu_callback){......}, at: [<ffffffff815cbc43>] rcu_do_batch.isra.70+0x993/0xe20
[ 78.258919]
[ 78.258919] stack backtrace:

CPU: 0 PID: 32289 Comm: syz-executor Tainted: G W 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88003ec05db8 ffffffff834c2e39 ffffffff00000000 1ffff10007d80b4a
ffffed0007d80b42 0000000041b58ab3 ffffffff89575550 ffffffff834c2b4b
ffff88003948a340 ffff88003ec22cc0 ffff8800384dd280 0000000041b58ab3
Call Trace:
<IRQ> [ 287.580014] [< inline >] __dump_stack lib/dump_stack.c:15
<IRQ> [ 287.580014] [<ffffffff834c2e39>] dump_stack+0x2ee/0x3f5
lib/dump_stack.c:51
[<ffffffff815648df>] print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2388
[< inline >] valid_state kernel/locking/lockdep.c:2401
[< inline >] mark_lock_irq kernel/locking/lockdep.c:2599
[<ffffffff81565870>] mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3062
[< inline >] mark_irqflags kernel/locking/lockdep.c:2920
[<ffffffff8156811e>] __lock_acquire+0xd2e/0x3380 kernel/locking/lockdep.c:3292
[<ffffffff8156b642>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff88139aff>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[< inline >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cb5a11>] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
[<ffffffff86ca5458>] netlink_sock_destruct+0xf8/0x400
net/netlink/af_netlink.c:331
[<ffffffff86a7b234>] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
[<ffffffff86a87d6c>] sk_destruct+0x4c/0x80 net/core/sock.c:1453
[<ffffffff86a87dfc>] __sk_free+0x5c/0x230 net/core/sock.c:1461
[<ffffffff86a87ff8>] sk_free+0x28/0x30 net/core/sock.c:1472
[< inline >] sock_put include/net/sock.h:1591
[<ffffffff86ca6cd1>] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
[< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
[<ffffffff815cbc9d>] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
[< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
[< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
[<ffffffff815cc55c>] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
[<ffffffff8814d53b>] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
[< inline >] invoke_softirq kernel/softirq.c:364
[<ffffffff8141a941>] irq_exit+0x1d1/0x210 kernel/softirq.c:405
[< inline >] exiting_irq arch/x86/include/asm/apic.h:659
[<ffffffff8814ca30>] smp_apic_timer_interrupt+0x80/0xa0
arch/x86/kernel/apic/apic.c:960
[<ffffffff8814badc>] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:489
<EOI> [ 287.580014] [<ffffffff8155c987>] ? lock_is_held+0x247/0x310
[<ffffffff814b6bde>] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
[<ffffffff814b6d3a>] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
[<ffffffff88142d08>] down_read+0x78/0x160 kernel/locking/rwsem.c:21
[< inline >] anon_vma_lock_read include/linux/rmap.h:127
[<ffffffff81968295>] validate_mm+0xe5/0x880 mm/mmap.c:347
[<ffffffff8196bf0b>] vma_link+0x11b/0x180 mm/mmap.c:605
[<ffffffff81977f46>] mmap_region+0x1076/0x1880 mm/mmap.c:1692
[<ffffffff81978e4f>] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
[< inline >] do_mmap_pgoff include/linux/mm.h:2039
[<ffffffff818fd527>] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
[< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
[<ffffffff8196f961>] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
[< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
[<ffffffff8124bf4b>] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
[<ffffffff88149dc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
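
For readers unfamiliar with the "{SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W}" notation in the report above: genl_mutex was first taken in process context with softirqs enabled (the genl_init path), and is now being taken from inside a softirq (the RCU callback that ends in genl_lock_done). A mutex may sleep, which is not allowed in softirq context, and the softirq could also interrupt a holder of the same lock on the same CPU. The following is only a toy Python model of that bookkeeping idea; the class and method names are hypothetical and do not correspond to lockdep's real data structures.

```python
# Toy model of softirq usage tracking (hypothetical names, not kernel code).
from collections import defaultdict

SOFTIRQ_ON_W = "SOFTIRQ-ON-W"   # acquired while softirqs were enabled
IN_SOFTIRQ_W = "IN-SOFTIRQ-W"   # acquired from inside a softirq handler


class UsageTracker:
    def __init__(self):
        # lock class name -> set of usage states observed so far
        self.usage = defaultdict(set)

    def acquire(self, lock, in_softirq, softirqs_enabled):
        """Record one acquisition; return a warning string if states conflict."""
        if in_softirq:
            new_state = IN_SOFTIRQ_W
        elif softirqs_enabled:
            new_state = SOFTIRQ_ON_W
        else:
            # Process context with softirqs disabled: nothing to record here.
            return None
        seen = self.usage[lock]
        # The state that conflicts with the one being recorded now.
        conflicting = ({SOFTIRQ_ON_W, IN_SOFTIRQ_W} - {new_state}) & seen
        warning = None
        if conflicting:
            old_state = conflicting.pop()
            warning = f"inconsistent {{{old_state}}} -> {{{new_state}}} usage: {lock}"
        seen.add(new_state)
        return warning


tracker = UsageTracker()
# genl_init() path: process context, softirqs enabled.
first = tracker.acquire("genl_mutex", in_softirq=False, softirqs_enabled=True)
# deferred_put_nlk_sk() path: genl_lock_done() runs from an RCU softirq.
second = tracker.acquire("genl_mutex", in_softirq=True, softirqs_enabled=False)
print(first)
print(second)
```

In this model the first acquisition is accepted silently and the second one produces the warning, which mirrors the order of events in the report.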

Eric Dumazet

Nov 26, 2016, 12:12:33 PM
to Dmitry Vyukov, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, syzkaller
Issue was reported yesterday and is under investigation.


http://marc.info/?l=linux-netdev&m=148014004331663&w=2


Thanks !

suba...@codeaurora.org

Nov 29, 2016, 12:59:29 AM
to Eric Dumazet, Dmitry Vyukov, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, syzkaller, netdev...@vger.kernel.org
>
> Issue was reported yesterday and is under investigation.
>
>
> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>
>
> Thanks !

Hi Dmitry

Can you try the patch below with your reproducer? I haven't seen similar
crashes reported after this (or even with Eric's patch).

https://patchwork.ozlabs.org/patch/699937/

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project

Eric Dumazet

Nov 29, 2016, 1:06:02 AM
to suba...@codeaurora.org, Eric Dumazet, Dmitry Vyukov, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, syzkaller, netdev...@vger.kernel.org
On Mon, 2016-11-28 at 22:59 -0700, suba...@codeaurora.org wrote:
> >
> > Issue was reported yesterday and is under investigation.
> >
> >
> > http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> >
> >
> > Thanks !
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).
>
> https://patchwork.ozlabs.org/patch/699937/

Yeah, I will post my patch on top of this one.



Dmitry Vyukov

Dec 8, 2016, 11:16:28 AM
to syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
On Tue, Nov 29, 2016 at 6:59 AM, <suba...@codeaurora.org> wrote:
>>
>> Issue was reported yesterday and is under investigation.
>>
>>
>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>
>>
>> Thanks !
>
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).

I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
_not_ see this report happening anymore.
Thanks.

Dmitry Vyukov

Dec 8, 2016, 12:16:54 PM
to syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
But now I am seeing "possible deadlock" warnings involving genl_lock:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor7/18794 is trying to acquire lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
but task is already holding lock:
(genl_mutex){+.+.+.}, at: [< inline >] genl_lock
net/netlink/genetlink.c:31
(genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

[ 315.403815] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 315.403815] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 315.403815] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 315.403815] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 315.403815] [< inline >] genl_lock net/netlink/genetlink.c:31
[ 315.403815] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
[ 315.403815] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
[ 315.403815] [<ffffffff86cb7b6a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
[ 315.403815] [<ffffffff86cc2319>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
[ 315.403815] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 315.403815] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 315.403815] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 315.403815] [< inline >] new_sync_write fs/read_write.c:499
[ 315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 315.403815] [< inline >] SYSC_write fs/read_write.c:607
[ 315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 315.403815] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 315.403815] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 315.403815] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 315.403815] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 315.403815] [<ffffffff86cb7779>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
[ 315.403815] [< inline >] netlink_dump_start
include/linux/netlink.h:165
[ 315.403815] [<ffffffff86d14d48>]
ctnetlink_stat_ct_cpu+0x198/0x1e0
net/netfilter/nf_conntrack_netlink.c:2045
[ 315.403815] [<ffffffff86cd313e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
[ 315.403815] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 315.403815] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
[ 315.403815] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 315.403815] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 315.403815] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 315.403815] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 315.403815] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 315.403815] [< inline >] new_sync_write fs/read_write.c:499
[ 315.403815] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 315.403815] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 315.403815] [< inline >] SYSC_write fs/read_write.c:607
[ 315.403815] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 315.403815] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 315.403815] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 315.403815] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 315.403815] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 315.403815] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 315.403815] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 315.403815] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
[ 315.403815] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
[ 315.403815] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
[ 315.403815] [< inline >] __raw_notifier_call_chain
kernel/notifier.c:394
[ 315.403815] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
[ 315.403815] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
[ 315.403815] [< inline >] call_netdevice_notifiers
net/core/dev.c:1661
[ 315.403815] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
[ 315.403815] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
[ 315.403815] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
[ 315.403815] [< inline >] unregister_netdevice
include/linux/netdevice.h:2455
[ 315.403815] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
[ 315.808015] [< inline >] tun_detach drivers/net/tun.c:578
[ 315.808015] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
[ 315.808015] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
[ 315.808015] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
[ 315.808015] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
[ 315.808015] [< inline >] exit_task_work
include/linux/task_work.h:21
[ 315.808015] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
[ 315.808015] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
[ 315.808015] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
[ 315.808015] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
[ 315.808015] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
[ 315.808015] [< inline >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
[ 315.808015] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 315.808015] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6

[ 315.808015] [< inline >] check_prev_add
kernel/locking/lockdep.c:1828
[ 315.808015] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[ 315.808015] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 315.808015] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 315.808015] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 315.808015] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 315.808015] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 315.808015] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
[ 315.808015] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
[ 315.808015] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
[ 315.808015] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 315.808015] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 315.808015] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 315.808015] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 315.808015] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 315.808015] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 315.808015] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 315.808015] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 315.808015] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 315.808015] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
[ 315.808015] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
[ 315.808015] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
[ 315.808015] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
[ 315.808015] [< inline >] SYSC_writev fs/read_write.c:1017
[ 315.808015] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
[ 315.808015] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

Chain exists of:
Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(genl_mutex);
                               lock(nlk->cb_mutex);
                               lock(genl_mutex);
  lock(rtnl_mutex);

*** DEADLOCK ***

2 locks held by syz-executor7/18794:
#0: (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
#1: (genl_mutex){+.+.+.}, at: [< inline >] genl_lock
net/netlink/genetlink.c:31
#1: (genl_mutex){+.+.+.}, at: [<ffffffff86cc27c9>]
genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658

stack backtrace:
CPU: 0 PID: 18794 Comm: syz-executor7 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
ffff88004add6468 ffffffff834c44f9 ffffffff00000000 1ffff100095bac20
ffffed00095bac18 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
[< inline >] check_prev_add kernel/locking/lockdep.c:1828
[<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[< inline >] validate_chain kernel/locking/lockdep.c:2265
[<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[<ffffffff86b4682c>] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70
[<ffffffff87b5cdf9>] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
[<ffffffff86cc1cd0>] genl_family_rcv_msg+0x780/0x1070
net/netlink/genetlink.c:631
[<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
[<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
[<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
[< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
[<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
[<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
[< inline >] sock_sendmsg_nosec net/socket.c:621
[<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
[<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
[<ffffffff81a6f9a3>] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
[<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
[<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0 fs/read_write.c:911
[<ffffffff81a73075>] do_writev+0x115/0x2d0 fs/read_write.c:944
[< inline >] SYSC_writev fs/read_write.c:1017
[<ffffffff81a7682c>] SyS_writev+0x2c/0x40 fs/read_write.c:1014
[<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6
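
The four stack traces in the dependency chain above can be read as edges in a lock-order graph, where "A -> B" means B was acquired while A was held. The Python sketch below is a simplified illustration of how such a graph yields a cycle, not how lockdep is implemented. The edges are inferred from the traces (the held lock in each trace is an assumption based on the call path), and "nfnl_mutex" is a label chosen here for the lock taken by nfnl_lock().

```python
# Lock-order graph sketch; edge (A, B) means "B was acquired while A was held".
def find_cycle(edges):
    """Return one cycle as a list of lock names (first == last), or None."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)
        graph.setdefault(acquired, [])

    done = set()  # nodes fully explored without finding a cycle

    def visit(node, path):
        if node in path:
            # Reached a lock that is already on the current path: a cycle.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        for nxt in graph[node]:
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Edges inferred from the report (assumptions, see the paragraph above).
edges = [
    ("nlk->cb_mutex", "genl_mutex"),  # netlink_dump -> genl_lock_dumpit
    ("nfnl_mutex", "nlk->cb_mutex"),  # nfnetlink_rcv_msg -> __netlink_dump_start
    ("rtnl_mutex", "nfnl_mutex"),     # netdev notifier -> nf_tables_netdev_event
    ("genl_mutex", "rtnl_mutex"),     # genl_rcv_msg -> nl80211_pre_doit -> rtnl_lock
]
cycle = find_cycle(edges)
print(" -> ".join(cycle))
```

Dropping the last edge (the new genl_mutex -> rtnl_mutex dependency that lockdep is complaining about) leaves the graph acyclic in this model, which is why the warning fires only when that acquisition is attempted.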

Dmitry Vyukov

Dec 8, 2016, 1:02:40 PM
to syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
Probably a related one:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
-------------------------------------------------------
syz-executor5/5777 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [< inline >] genl_lock
net/netlink/genetlink.c:31
(genl_mutex){+.+.+.}, at: [<ffffffff86cc0c26>]
genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
but task is already holding lock:
(nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

[ 158.966653] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 158.966653] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 158.966653] [<ffffffff86cb7779>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
[ 158.966653] [< inline >] netlink_dump_start
include/linux/netlink.h:165
[ 158.966653] [<ffffffff86d1395f>]
ctnetlink_get_ct_unconfirmed+0x17f/0x220
net/netfilter/nf_conntrack_netlink.c:1369
[ 158.966653] [<ffffffff86cd313e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
[ 158.966653] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 158.966653] [<ffffffff86cd1b71>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
[ 158.966653] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 158.966653] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 158.966653] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 158.966653] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 158.966653] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 158.966653] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 158.966653] [< inline >] new_sync_write fs/read_write.c:499
[ 158.966653] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 158.966653] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 158.966653] [< inline >] SYSC_write fs/read_write.c:607
[ 158.966653] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 158.966653] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 158.966653] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 158.966653] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 158.966653] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 158.966653] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 158.966653] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 158.966653] [<ffffffff86cd083d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
[ 158.966653] [<ffffffff86d7c5b1>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
[ 158.966653] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
[ 158.966653] [< inline >] __raw_notifier_call_chain
kernel/notifier.c:394
[ 158.966653] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
[ 158.966653] [<ffffffff86ae4af6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
[ 158.966653] [< inline >] call_netdevice_notifiers
net/core/dev.c:1661
[ 158.966653] [<ffffffff86af898d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
[ 158.966653] [<ffffffff86af8e9e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
[ 158.966653] [<ffffffff86af8f76>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
[ 158.966653] [< inline >] unregister_netdevice
include/linux/netdevice.h:2455
[ 158.966653] [<ffffffff84912be6>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
[ 158.966653] [< inline >] tun_detach drivers/net/tun.c:578
[ 158.966653] [<ffffffff84912e69>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
[ 158.966653] [<ffffffff81a77f7e>] __fput+0x34e/0x910
fs/file_table.c:208
[ 158.966653] [<ffffffff81a785ca>] ____fput+0x1a/0x20
fs/file_table.c:244
[ 158.966653] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
[ 158.966653] [< inline >] exit_task_work
include/linux/task_work.h:21
[ 158.966653] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
[ 158.966653] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
[ 159.308048] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
[ 159.308048] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
[ 159.308048] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
[ 159.308048] [< inline >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
[ 159.308048] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 159.308048] [<ffffffff881a6026>] entry_SYSCALL_64_fastpath+0xc4/0xc6

[ 159.308048] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 159.308048] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 159.308048] [<ffffffff86b4682c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
[ 159.308048] [<ffffffff87b5cdf9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
[ 159.308048] [<ffffffff86cc1cd0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
[ 159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 159.308048] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 159.308048] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 159.308048] [<ffffffff81a6f9a3>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
[ 159.308048] [<ffffffff81a723f1>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
[ 159.308048] [<ffffffff81a72f2c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
[ 159.308048] [<ffffffff81a73075>] do_writev+0x115/0x2d0
fs/read_write.c:944
[ 159.308048] [< inline >] SYSC_writev fs/read_write.c:1017
[ 159.308048] [<ffffffff81a7682c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
[ 159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 159.308048] [< inline >] check_prev_add
kernel/locking/lockdep.c:1828
[ 159.308048] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[ 159.308048] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 159.308048] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 159.308048] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 159.308048] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 159.308048] [<ffffffff88195bcf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 159.308048] [< inline >] genl_lock net/netlink/genetlink.c:31
[ 159.308048] [<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
[ 159.308048] [<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
[ 159.308048] [<ffffffff86cb7b6a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
[ 159.308048] [<ffffffff86cc2319>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
[ 159.308048] [<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 159.308048] [<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 159.308048] [<ffffffff86cc153d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 159.308048] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 159.308048] [<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 159.308048] [<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 159.308048] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 159.308048] [<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 159.308048] [<ffffffff86a764fb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 159.308048] [< inline >] new_sync_write fs/read_write.c:499
[ 159.308048] [<ffffffff81a701ae>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 159.308048] [<ffffffff81a71c55>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 159.308048] [< inline >] SYSC_write fs/read_write.c:607
[ 159.308048] [<ffffffff81a760e0>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 159.308048] [<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

Chain exists of:
Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nlk->cb_mutex);
                               lock(&table[i].mutex);
                               lock(nlk->cb_mutex);
  lock(genl_mutex);

*** DEADLOCK ***

2 locks held by syz-executor5/5777:
#0: (cb_lock){++++++}, at: [<ffffffff86cc152e>] genl_rcv+0x1e/0x40
net/netlink/genetlink.c:670
#1: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff86cb2f08>]
netlink_dump+0xd8/0xd70 net/netlink/af_netlink.c:2084

stack backtrace:
CPU: 1 PID: 5777 Comm: syz-executor5 Not tainted 4.9.0-rc8+ #77
Hardware name: Google Google/Google, BIOS Google 01/01/2011
ffff88005fe363e8 ffffffff834c44f9 ffffffff00000001 1ffff1000bfc6c10
ffffed000bfc6c08 0000000041b58ab3 ffffffff895816f0 ffffffff834c420b
0000000000000000 0000000000000000 0000000000000000 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff834c44f9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
[< inline >] check_prev_add kernel/locking/lockdep.c:1828
[<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[< inline >] validate_chain kernel/locking/lockdep.c:2265
[<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff88195bcf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[< inline >] genl_lock net/netlink/genetlink.c:31
[<ffffffff86cc0c26>] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518
[<ffffffff86cb33ac>] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127
[<ffffffff86cb7b6a>] __netlink_dump_start+0x4ea/0x760
net/netlink/af_netlink.c:2217
[<ffffffff86cc2319>] genl_family_rcv_msg+0xdc9/0x1070
net/netlink/genetlink.c:586
[<ffffffff86cc2770>] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660
[<ffffffff86cc034c>] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298
[<ffffffff86cc153d>] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671
[< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231
[<ffffffff86cbeb6a>] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257
[<ffffffff86cbf834>] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803
[< inline >] sock_sendmsg_nosec net/socket.c:621
[<ffffffff86a7618f>] sock_sendmsg+0xcf/0x110 net/socket.c:631
[<ffffffff86a764fb>] sock_write_iter+0x32b/0x620 net/socket.c:829
[< inline >] new_sync_write fs/read_write.c:499
[<ffffffff81a701ae>] __vfs_write+0x4fe/0x830 fs/read_write.c:512
[<ffffffff81a71c55>] vfs_write+0x175/0x4e0 fs/read_write.c:560
[< inline >] SYSC_write fs/read_write.c:607
[<ffffffff81a760e0>] SyS_write+0x100/0x240 fs/read_write.c:599
[<ffffffff881a5f85>] entry_SYSCALL_64_fastpath+0x23/0xc6

Cong Wang

Dec 8, 2016, 7:14:11 PM12/8/16
to Dmitry Vyukov, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
On Thu, Dec 8, 2016 at 10:02 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(nlk->cb_mutex);
>                                lock(&table[i].mutex);
>                                lock(nlk->cb_mutex);
>   lock(genl_mutex);

Similar to the unix bindlock, this one looks like a false positive to me too.

Cong Wang

Dec 8, 2016, 7:32:31 PM12/8/16
to Dmitry Vyukov, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
> *** DEADLOCK ***

This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
Let me think about it.

Cong Wang

Dec 9, 2016, 12:09:08 AM12/9/16
to Dmitry Vyukov, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
Never mind. Actually both reports in this thread are legitimate.

I know what happened now, the lock chain is so long, 4 locks are involved
to form a chain!!!

Let me think about how to break the chain.

Dmitry Vyukov

Dec 11, 2016, 4:41:15 AM12/11/16
to Cong Wang, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
Seems to be a related one, now on nfnl_lock:



[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #82 Not tainted
-------------------------------------------------------
syz-executor3/10151 is trying to acquire lock:
(&table[i].mutex){+.+.+.}, at: [<ffffffff86c96f1d>]
nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61
but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff86b0cf0c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

[ 231.942041] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 231.942041] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
floppy0: disk absent or changed during operation
floppy0: disk absent or changed during operation
[ 231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 231.950342] [<ffffffff86b0cf0c>] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
[ 231.950342] [<ffffffff87b234e9>]
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
[ 231.950342] [<ffffffff86c883b0>]
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
[ 231.950342] [<ffffffff86c88e50>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 231.950342] [<ffffffff86c87c1d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 231.950342] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 231.950342] [< inline >] new_sync_write fs/read_write.c:499
[ 231.950342] [<ffffffff81a7021e>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 231.950342] [<ffffffff81a71cc5>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 231.950342] [< inline >] SYSC_write fs/read_write.c:607
[ 231.950342] [<ffffffff81a76150>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 231.950342] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 231.950342] [< inline >] genl_lock net/netlink/genetlink.c:31
[ 231.950342] [<ffffffff86c87306>] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
[ 231.950342] [<ffffffff86c79a8c>] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
[ 231.950342] [<ffffffff86c7e24a>]
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
[ 231.950342] [<ffffffff86c889f9>]
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
[ 231.950342] [<ffffffff86c88e50>] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
[ 231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 231.950342] [<ffffffff86c87c1d>] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
[ 231.950342] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 231.950342] [<ffffffff81a6fa13>]
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
[ 231.950342] [<ffffffff81a72461>] do_readv_writev+0x431/0x9b0
fs/read_write.c:872
[ 231.950342] [<ffffffff81a72f9c>] vfs_writev+0x8c/0xc0
fs/read_write.c:911
[ 231.950342] [<ffffffff81a730e5>] do_writev+0x115/0x2d0
fs/read_write.c:944
[ 231.950342] [< inline >] SYSC_writev fs/read_write.c:1017
[ 231.950342] [<ffffffff81a7689c>] SyS_writev+0x2c/0x40
fs/read_write.c:1014
[ 231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 231.950342] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 231.950342] [<ffffffff86c7de59>]
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
[ 231.950342] [< inline >] netlink_dump_start
include/linux/netlink.h:165
[ 231.950342] [<ffffffff86d9d964>] ip_set_dump+0x204/0x2b0
net/netfilter/ipset/ip_set_core.c:1447
[ 231.950342] [<ffffffff86c9981e>]
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
[ 231.950342] [<ffffffff86c86a2c>] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
[ 231.950342] [<ffffffff86c98251>] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
[ 231.950342] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
[ 231.950342] [<ffffffff86c8524a>] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
[ 231.950342] [<ffffffff86c85f14>] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
[ 231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621
[ 231.950342] [<ffffffff86a3c86f>] sock_sendmsg+0xcf/0x110
net/socket.c:631
[ 231.950342] [<ffffffff86a3cbdb>] sock_write_iter+0x32b/0x620
net/socket.c:829
[ 231.950342] [< inline >] new_sync_write fs/read_write.c:499
[ 231.950342] [<ffffffff81a7021e>] __vfs_write+0x4fe/0x830
fs/read_write.c:512
[ 231.950342] [<ffffffff81a71cc5>] vfs_write+0x175/0x4e0
fs/read_write.c:560
[ 231.950342] [< inline >] SYSC_write fs/read_write.c:607
[ 231.950342] [<ffffffff81a76150>] SyS_write+0x100/0x240
fs/read_write.c:599
[ 231.950342] [<ffffffff8816c685>] entry_SYSCALL_64_fastpath+0x23/0xc6

[ 231.950342] [< inline >] check_prev_add
kernel/locking/lockdep.c:1828
[ 231.950342] [<ffffffff8156309b>]
check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[ 231.950342] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
[ 231.950342] [<ffffffff81569576>]
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[ 231.950342] [<ffffffff8156b672>] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
[ 231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
[ 231.950342] [<ffffffff8815c2bf>]
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[ 231.950342] [<ffffffff86c96f1d>] nfnl_lock+0x2d/0x30
net/netfilter/nfnetlink.c:61
[ 231.950342] [<ffffffff86d42c91>]
nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
[ 231.950342] [<ffffffff8149095a>]
notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
[ 231.950342] [< inline >] __raw_notifier_call_chain
kernel/notifier.c:394
[ 231.950342] [<ffffffff81490b82>]
raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
[ 231.950342] [<ffffffff86aab1d6>]
call_netdevice_notifiers_info+0x56/0x90 net/core/dev.c:1645
[ 231.950342] [< inline >] call_netdevice_notifiers
net/core/dev.c:1661
[ 231.950342] [<ffffffff86abf06d>]
rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
[ 231.950342] [<ffffffff86abf57e>]
rollback_registered+0xae/0x100 net/core/dev.c:6800
[ 231.950342] [<ffffffff86abf656>]
unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
[ 231.950342] [< inline >] unregister_netdevice
include/linux/netdevice.h:2455
[ 231.950342] [<ffffffff848d9296>] __tun_detach+0xc66/0xea0
drivers/net/tun.c:567
[ 231.950342] [< inline >] tun_detach drivers/net/tun.c:578
[ 231.950342] [<ffffffff848d9519>] tun_chr_close+0x49/0x60
drivers/net/tun.c:2350
[ 231.950342] [<ffffffff81a77fee>] __fput+0x34e/0x910
fs/file_table.c:208
[ 231.950342] [<ffffffff81a7863a>] ____fput+0x1a/0x20
fs/file_table.c:244
[ 231.950342] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
[ 231.950342] [< inline >] exit_task_work
include/linux/task_work.h:21
[ 231.950342] [<ffffffff814129e2>] do_exit+0x1842/0x2650
kernel/exit.c:828
[ 231.950342] [<ffffffff814139ae>] do_group_exit+0x14e/0x420
kernel/exit.c:932
[ 231.950342] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
[ 231.950342] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
[ 231.950342] [<ffffffff8100666a>]
exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156
[ 231.950342] [< inline >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
[ 231.950342] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 231.950342] [<ffffffff8816c726>] entry_SYSCALL_64_fastpath+0xc4/0xc6

other info that might help us debug this:

Chain exists of:
Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(&table[i].mutex);

*** DEADLOCK ***

1 lock held by syz-executor3/10151:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff86b0cf0c>]
rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 2 PID: 10151 Comm: syz-executor3 Not tainted 4.9.0-rc8+ #82
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff8800311057f8 ffffffff8348fc59 ffffffff00000002 1ffff10006220a92
ffffed0006220a8a 0000000041b58ab3 ffffffff8957cf18 ffffffff8348f96b
ffffffff894eb258 ffffffff81564970 ffffffff8b565c30 ffffffff8b8e5020
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff8348fc59>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[<ffffffff81560cb0>] print_circular_bug+0x310/0x3c0
kernel/locking/lockdep.c:1202
[< inline >] check_prev_add kernel/locking/lockdep.c:1828
[<ffffffff8156309b>] check_prevs_add+0xaab/0x1c20 kernel/locking/lockdep.c:1938
[< inline >] validate_chain kernel/locking/lockdep.c:2265
[<ffffffff81569576>] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
[<ffffffff8156b672>] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff8815c2bf>] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
[<ffffffff86c96f1d>] nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61
[<ffffffff86d42c91>] nf_tables_netdev_event+0x1f1/0x720
net/netfilter/nf_tables_netdev.c:122
[<ffffffff8149095a>] notifier_call_chain+0x14a/0x2f0 kernel/notifier.c:93
[< inline >] __raw_notifier_call_chain kernel/notifier.c:394
[<ffffffff81490b82>] raw_notifier_call_chain+0x32/0x40 kernel/notifier.c:401
[<ffffffff86aab1d6>] call_netdevice_notifiers_info+0x56/0x90
net/core/dev.c:1645
[< inline >] call_netdevice_notifiers net/core/dev.c:1661
[<ffffffff86abf06d>] rollback_registered_many+0x73d/0xba0 net/core/dev.c:6759
[<ffffffff86abf57e>] rollback_registered+0xae/0x100 net/core/dev.c:6800
[<ffffffff86abf656>] unregister_netdevice_queue+0x86/0x140 net/core/dev.c:7787
[< inline >] unregister_netdevice include/linux/netdevice.h:2455
[<ffffffff848d9296>] __tun_detach+0xc66/0xea0 drivers/net/tun.c:567
[< inline >] tun_detach drivers/net/tun.c:578
[<ffffffff848d9519>] tun_chr_close+0x49/0x60 drivers/net/tun.c:2350
[<ffffffff81a77fee>] __fput+0x34e/0x910 fs/file_table.c:208
[<ffffffff81a7863a>] ____fput+0x1a/0x20 fs/file_table.c:244
[<ffffffff81483c20>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
[< inline >] exit_task_work include/linux/task_work.h:21
[<ffffffff814129e2>] do_exit+0x1842/0x2650 kernel/exit.c:828
[<ffffffff814139ae>] do_group_exit+0x14e/0x420 kernel/exit.c:932
[<ffffffff81442b43>] get_signal+0x663/0x1880 kernel/signal.c:2307
[<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
[<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
[<ffffffff8816c726>] entry_SYSCALL_64_fastpath+0xc4/0xc6

Dmitry Vyukov

Jan 29, 2017, 5:11:37 AM1/29/17
to Cong Wang, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.w...@gmail.com> wrote:
>>> Chain exists of:
>>> Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>> *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.


Cong, any success with breaking the chain?

Still happening on f0ad17712b9f71c24e2b8b9725230ef57232377f. Or is it
a different one?


[ INFO: possible circular locking dependency detected ]
4.10.0-rc3+ #4 Not tainted
-------------------------------------------------------
syz-executor9/2705 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>] genl_lock
net/netlink/genetlink.c:32 [inline]
(genl_mutex){+.+.+.}, at: [<ffffffff836f58fe>]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:

[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836416e7>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
[<ffffffff83fd5e9e>] nl80211_pre_doit+0x2fe/0x570 net/wireless/nl80211.c:11847
[<ffffffff836f52b0>] genl_family_rcv_msg+0x760/0x1040
net/netlink/genetlink.c:591
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff8357557a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
[<ffffffff83578138>] __sys_sendmsg+0x138/0x300 net/socket.c:2019
[<ffffffff8357832d>] SYSC_sendmsg net/socket.c:2030 [inline]
[<ffffffff8357832d>] SyS_sendmsg+0x2d/0x50 net/socket.c:2026
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (genl_mutex){+.+.+.}:

[<ffffffff8157847f>] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[<ffffffff8157847f>] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
[<ffffffff8157e729>] validate_chain kernel/locking/lockdep.c:2265 [inline]
[<ffffffff8157e729>] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[<ffffffff815808b1>] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[<ffffffff843f9de0>] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[<ffffffff843f9de0>] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[<ffffffff836f58fe>] genl_lock net/netlink/genetlink.c:32 [inline]
[<ffffffff836f58fe>] genl_family_rcv_msg+0xdae/0x1040
net/netlink/genetlink.c:547
[<ffffffff836f807a>] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[<ffffffff836f36cb>] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[<ffffffff836f4b38>] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[<ffffffff836f1f14>] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[<ffffffff836f1f14>] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[<ffffffff836f2bcf>] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[<ffffffff83572d3a>] sock_sendmsg_nosec net/socket.c:635 [inline]
[<ffffffff83572d3a>] sock_sendmsg+0xca/0x110 net/socket.c:645
[<ffffffff835730a6>] sock_write_iter+0x326/0x600 net/socket.c:848
[<ffffffff81a3c493>] new_sync_write fs/read_write.c:499 [inline]
[<ffffffff81a3c493>] __vfs_write+0x483/0x740 fs/read_write.c:512
[<ffffffff81a42227>] vfs_write+0x187/0x530 fs/read_write.c:560
[<ffffffff81a4675b>] SYSC_write fs/read_write.c:607 [inline]
[<ffffffff81a4675b>] SyS_write+0xfb/0x230 fs/read_write.c:599
[<ffffffff8440e7c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

*** DEADLOCK ***

2 locks held by syz-executor9/2705:
#0: (cb_lock){++++++}, at: [<ffffffff836f4b29>] genl_rcv+0x19/0x40
net/netlink/genetlink.c:630
#1: (rtnl_mutex){+.+.+.}, at: [<ffffffff836416e7>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 2705 Comm: syz-executor9 Not tainted 4.10.0-rc3+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
check_prev_add kernel/locking/lockdep.c:1828 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
validate_chain kernel/locking/lockdep.c:2265 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
__mutex_lock_common kernel/locking/mutex.c:639 [inline]
mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
genl_lock net/netlink/genetlink.c:32 [inline]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
new_sync_write fs/read_write.c:499 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:512
vfs_write+0x187/0x530 fs/read_write.c:560
SYSC_write fs/read_write.c:607 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x44f5e9
RSP: 002b:00007fdba138cb58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000020000fdc RCX: 000000000044f5e9
RDX: 0000000000000024 RSI: 0000000020000fdc RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000700000
R13: 0000000000000002 R14: 0000000000000010 R15: 0000000000000000

Cong Wang

Feb 6, 2017, 1:32:45 AM2/6/17
to Dmitry Vyukov, syzkaller, Eric Dumazet, David Miller, Matti Vaittinen, Tycho Andersen, Florian Westphal, stephen hemminger, Tom Herbert, netdev, LKML, Richard Guy Briggs, netdev...@vger.kernel.org
On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.w...@gmail.com> wrote:
>>>> Chain exists of:
>>>> Possible unsafe locking scenario:
>>>>
>>>>        CPU0                    CPU1
>>>>        ----                    ----
>>>>   lock(genl_mutex);
>>>>                                lock(nlk->cb_mutex);
>>>>                                lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>> *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which
one could be reordered. :-/
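
For anyone following along, the four-lock chain can be written down as a lock-order graph. What follows is a minimal Python sketch, not kernel code. The edge list is one reading of the dependency stacks in the nfnl_lock report above (which lock is held and which one is acquired at each call site), so treat it as an assumption. The names `EDGES`, `build_graph` and `find_cycle` are made up for illustration. It only shows that the four dependencies close a loop, which is the condition lockdep is reporting.

```python
# Lock-order edges as read from the lockdep dependency stacks in this thread.
# Each entry is (lock already held, lock being acquired, call site in the trace).
# This is an interpretation of the traces, not something taken from kernel source.
EDGES = [
    ("rtnl_mutex", "&table[i].mutex", "nf_tables_netdev_event -> nfnl_lock"),
    ("&table[i].mutex", "nlk->cb_mutex", "ip_set_dump -> __netlink_dump_start"),
    ("nlk->cb_mutex", "genl_mutex", "netlink_dump -> genl_lock_dumpit"),
    ("genl_mutex", "rtnl_mutex", "nl80211_pre_doit -> rtnl_lock"),
]


def build_graph(edges):
    """Build an adjacency list: held lock -> locks acquired while holding it."""
    graph = {}
    for held, wanted, _site in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])
    return graph


def find_cycle(graph):
    """Return one cycle as a list of lock names (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge to a lock on the current path: the order is circular.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


cycle = find_cycle(build_graph(EDGES))
print(" => ".join(cycle))
```

In this sketch, dropping any one of the four edges makes `find_cycle` return `None`, so breaking the chain comes down to reordering or removing one of these four acquisitions.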