crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex

Dmitry Vyukov

Mar 5, 2017, 10:09:00 AM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
Hello,

I am getting the following deadlock reports while running the syzkaller
fuzzer on net-next/8d70eeb84ab277377c017af6a21d0a337025dede:

======================================================
[ INFO: possible circular locking dependency detected ]
4.10.0+ #5 Not tainted
-------------------------------------------------------
syz-executor6/6143 is trying to acquire lock:
(nlk->cb_mutex){+.+.+.}, at: [<ffffffff837df634>]
__netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187

but task is already holding lock:
(crypto_alg_sem){+++++.}, at: [<ffffffff821cd1f6>]
crypto_user_rcv_msg+0x136/0x4f0 crypto/crypto_user.c:507

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (crypto_alg_sem){+++++.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
down_read+0x9b/0x150 kernel/locking/rwsem.c:23
crypto_alg_lookup+0x23/0x50 crypto/api.c:199
crypto_larval_lookup.part.10+0x9a/0x3b0 crypto/api.c:217
crypto_larval_lookup crypto/api.c:211 [inline]
crypto_alg_mod_lookup+0x77/0x1b0 crypto/api.c:270
crypto_alloc_base+0x50/0x1e0 crypto/api.c:416
crypto_alloc_cipher include/linux/crypto.h:1407 [inline]
tcp_fastopen_reset_cipher+0xc2/0x2e0 net/ipv4/tcp_fastopen.c:48
tcp_fastopen_init_key_once+0x114/0x120 net/ipv4/tcp_fastopen.c:29
do_tcp_setsockopt.isra.36+0x140a/0x20a0 net/ipv4/tcp.c:2684
tcp_setsockopt+0xb0/0xd0 net/ipv4/tcp.c:2733
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #3 (sk_lock-AF_INET){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
lock_sock include/net/sock.h:1460 [inline]
rds_tcp_listen_stop+0x57/0x140 net/rds/tcp_listen.c:284
rds_tcp_kill_sock net/rds/tcp.c:529 [inline]
rds_tcp_dev_event+0x383/0xc50 net/rds/tcp.c:568
notifier_call_chain+0x1b5/0x2b0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394 [inline]
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1646
call_netdevice_notifiers net/core/dev.c:1662 [inline]
netdev_run_todo+0x3b2/0xa30 net/core/dev.c:7530
rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:104
default_device_exit_batch+0x504/0x620 net/core/dev.c:8334
ops_exit_list.isra.6+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x551/0xa90 net/core/net_namespace.c:463
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
worker_thread+0x223/0x1990 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:229
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

-> #2 (rtnl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
tipc_nl_bearer_dump+0x3ef/0x720 net/tipc/bearer.c:774
genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #1 (genl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
genl_lock net/netlink/genetlink.c:32 [inline]
genl_lock_dumpit+0x41/0x90 net/netlink/genetlink.c:478
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (nlk->cb_mutex){+.+.+.}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
__netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187
netlink_dump_start include/linux/netlink.h:165 [inline]
crypto_user_rcv_msg+0x2ad/0x4f0 crypto/crypto_user.c:517
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
crypto_netlink_rcv+0x2a/0x40 crypto/crypto_user.c:538
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Chain exists of:
nlk->cb_mutex --> sk_lock-AF_INET --> crypto_alg_sem

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(crypto_alg_sem);
lock(sk_lock-AF_INET);
lock(crypto_alg_sem);
lock(nlk->cb_mutex);

*** DEADLOCK ***

2 locks held by syz-executor6/6143:
#0: (crypto_cfg_mutex){+.+...}, at: [<ffffffff821cad9b>]
crypto_netlink_rcv+0x1b/0x40 crypto/crypto_user.c:537
#1: (crypto_alg_sem){+++++.}, at: [<ffffffff821cd1f6>]
crypto_user_rcv_msg+0x136/0x4f0 crypto/crypto_user.c:507

stack backtrace:
CPU: 0 PID: 6143 Comm: syz-executor6 Not tainted 4.10.0+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
__netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187
netlink_dump_start include/linux/netlink.h:165 [inline]
crypto_user_rcv_msg+0x2ad/0x4f0 crypto/crypto_user.c:517
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
crypto_netlink_rcv+0x2a/0x40 crypto/crypto_user.c:538
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

Dmitry Vyukov

Mar 5, 2017, 12:36:33 PM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
Another one involving tcp_md5sig_mutex:


======================================================
[ INFO: possible circular locking dependency detected ]
SELinux: unrecognized netlink message: protocol=9 nlmsg_type=2
sclass=netlink_audit_socket pig=4033 comm=syz-executor4
4.10.0+ #5 Not tainted
-------------------------------------------------------
syz-executor8/4018 is trying to acquire lock:
(crypto_alg_sem){++++++}, at: [<ffffffff82193473>]
crypto_alg_lookup+0x23/0x50 crypto/api.c:199

but task is already holding lock:
(tcp_md5sig_mutex){+.+...}, at: [<ffffffff838e4efa>]
tcp_alloc_md5sig_pool+0x4a/0x470 net/ipv4/tcp.c:3196

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #5 (tcp_md5sig_mutex){+.+...}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
tcp_alloc_md5sig_pool+0x4a/0x470 net/ipv4/tcp.c:3196
tcp_md5_do_add+0x1d8/0x5d0 net/ipv4/tcp_ipv4.c:969
tcp_v4_parse_md5_keys+0x1c7/0x2b0 net/ipv4/tcp_ipv4.c:1037
do_tcp_setsockopt.isra.36+0x657/0x20a0 net/ipv4/tcp.c:2668
tcp_setsockopt+0xb0/0xd0 net/ipv4/tcp.c:2733
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

-> #4 (sk_lock-AF_INET){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
lock_sock include/net/sock.h:1460 [inline]
do_ip_setsockopt.isra.12+0x301/0x3760 net/ipv4/ip_sockglue.c:653
ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1265
tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2731
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #3 (rtnl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
nl80211_prepare_wdev_dump.isra.37+0x2c/0x5d0 net/wireless/nl80211.c:548
nl80211_dump_station+0x178/0xd80 net/wireless/nl80211.c:4455
genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #2 (genl_mutex){+.+.+.}:
-> #1 (nlk->cb_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
__netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187
netlink_dump_start include/linux/netlink.h:165 [inline]
crypto_user_rcv_msg+0x2ad/0x4f0 crypto/crypto_user.c:517
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
crypto_netlink_rcv+0x2a/0x40 crypto/crypto_user.c:538
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (crypto_alg_sem){++++++}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
down_read+0x9b/0x150 kernel/locking/rwsem.c:23
crypto_alg_lookup+0x23/0x50 crypto/api.c:199
crypto_larval_lookup.part.10+0x9a/0x3b0 crypto/api.c:217
crypto_larval_lookup crypto/api.c:211 [inline]
crypto_alg_mod_lookup+0x77/0x1b0 crypto/api.c:270
crypto_find_alg crypto/api.c:500 [inline]
crypto_alloc_tfm+0x101/0x2e0 crypto/api.c:533
crypto_alloc_ahash+0x2c/0x40 crypto/ahash.c:525
__tcp_alloc_md5sig_pool net/ipv4/tcp.c:3158 [inline]
tcp_alloc_md5sig_pool+0x85/0x470 net/ipv4/tcp.c:3199
tcp_md5_do_add+0x1d8/0x5d0 net/ipv4/tcp_ipv4.c:969
tcp_v4_parse_md5_keys+0x1c7/0x2b0 net/ipv4/tcp_ipv4.c:1037
do_tcp_setsockopt.isra.36+0x657/0x20a0 net/ipv4/tcp.c:2668
tcp_setsockopt+0xb0/0xd0 net/ipv4/tcp.c:2733
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

other info that might help us debug this:

Chain exists of:
crypto_alg_sem --> sk_lock-AF_INET --> tcp_md5sig_mutex

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(tcp_md5sig_mutex);
lock(sk_lock-AF_INET);
lock(tcp_md5sig_mutex);
lock(crypto_alg_sem);

*** DEADLOCK ***

2 locks held by syz-executor8/4018:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff838e7a26>] lock_sock
include/net/sock.h:1460 [inline]
#0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff838e7a26>]
do_tcp_setsockopt.isra.36+0x216/0x20a0 net/ipv4/tcp.c:2466
#1: (tcp_md5sig_mutex){+.+...}, at: [<ffffffff838e4efa>]
tcp_alloc_md5sig_pool+0x4a/0x470 net/ipv4/tcp.c:3196

stack backtrace:
CPU: 0 PID: 4018 Comm: syz-executor8 Not tainted 4.10.0+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
down_read+0x9b/0x150 kernel/locking/rwsem.c:23
crypto_alg_lookup+0x23/0x50 crypto/api.c:199
crypto_larval_lookup.part.10+0x9a/0x3b0 crypto/api.c:217
crypto_larval_lookup crypto/api.c:211 [inline]
crypto_alg_mod_lookup+0x77/0x1b0 crypto/api.c:270
crypto_find_alg crypto/api.c:500 [inline]
crypto_alloc_tfm+0x101/0x2e0 crypto/api.c:533
crypto_alloc_ahash+0x2c/0x40 crypto/ahash.c:525
__tcp_alloc_md5sig_pool net/ipv4/tcp.c:3158 [inline]
tcp_alloc_md5sig_pool+0x85/0x470 net/ipv4/tcp.c:3199
tcp_md5_do_add+0x1d8/0x5d0 net/ipv4/tcp_ipv4.c:969
tcp_v4_parse_md5_keys+0x1c7/0x2b0 net/ipv4/tcp_ipv4.c:1037
do_tcp_setsockopt.isra.36+0x657/0x20a0 net/ipv4/tcp.c:2668
tcp_setsockopt+0xb0/0xd0 net/ipv4/tcp.c:2733
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281

Dmitry Vyukov

Mar 6, 2017, 4:37:05 AM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
Another one:

======================================================
[ INFO: possible circular locking dependency detected ]
4.10.0+ #6 Not tainted
-------------------------------------------------------
syz-executor8/3613 is trying to acquire lock:
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83b72eb9>] lock_sock
include/net/sock.h:1460 [inline]
(sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83b72eb9>]
do_ipv6_setsockopt.isra.11+0x229/0x36e0 net/ipv6/ipv6_sockglue.c:167

but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff8370a197>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (rtnl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
tipc_nl_node_dump_monitor+0x260/0x510 net/tipc/node.c:2128
genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #3 (genl_mutex){+.+.+.}:
-> #2 (nlk->cb_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
__netlink_dump_start+0xf4/0x760 net/netlink/af_netlink.c:2187
netlink_dump_start include/linux/netlink.h:165 [inline]
crypto_user_rcv_msg+0x2ad/0x4f0 crypto/crypto_user.c:517
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
crypto_netlink_rcv+0x2a/0x40 crypto/crypto_user.c:538
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #1 (crypto_alg_sem){++++++}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
down_read+0x9b/0x150 kernel/locking/rwsem.c:23
crypto_alg_lookup+0x23/0x50 crypto/api.c:199
crypto_larval_lookup.part.10+0x9a/0x3b0 crypto/api.c:217
crypto_larval_lookup crypto/api.c:211 [inline]
crypto_alg_mod_lookup+0x77/0x1b0 crypto/api.c:270
crypto_find_alg crypto/api.c:500 [inline]
crypto_alloc_tfm+0x101/0x2e0 crypto/api.c:533
crypto_alloc_shash+0x2c/0x40 crypto/shash.c:433
sctp_listen_start net/sctp/socket.c:6969 [inline]
sctp_inet_listen+0x5b7/0x7e0 net/sctp/socket.c:7054
SYSC_listen net/socket.c:1440 [inline]
SyS_listen+0x2c9/0x390 net/socket.c:1426
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (sk_lock-AF_INET6){+.+.+.}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
lock_sock include/net/sock.h:1460 [inline]
do_ipv6_setsockopt.isra.11+0x229/0x36e0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0x9b/0x140 net/ipv6/ipv6_sockglue.c:919
tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2731
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2731
SYSC_setsockopt net/socket.c:1786 [inline]
SyS_setsockopt+0x25c/0x390 net/socket.c:1765
entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Chain exists of:
sk_lock-AF_INET6 --> genl_mutex --> rtnl_mutex

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(genl_mutex);
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);

*** DEADLOCK ***

1 lock held by syz-executor8/3613:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8370a197>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 3613 Comm: syz-executor8 Not tainted 4.10.0+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
lock_sock include/net/sock.h:1460 [inline]
do_ipv6_setsockopt.isra.11+0x229/0x36e0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0x9b/0x140 net/ipv6/ipv6_sockglue.c:919

Dmitry Vyukov

Mar 14, 2017, 4:15:02 AM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, santosh....@oracle.com, rds-...@oss.oracle.com, syzkaller
Another one, now involving rds_tcp_listen_stop (on net-next
3e3eec09311a48c64104cafa193984cc807ab9e0):

[ INFO: possible circular locking dependency detected ]
4.10.0+ #26 Not tainted
-------------------------------------------------------
kworker/u4:1/19 is trying to acquire lock:
(sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8409a6ec>] lock_sock
include/net/sock.h:1460 [inline]
(sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8409a6ec>]
rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288

but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff8370b057>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (rtnl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
nl80211_prepare_vendor_dump net/wireless/nl80211.c:11511 [inline]
nl80211_vendor_cmd_dump+0xda/0x1ab0 net/wireless/nl80211.c:11616
genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #3 (genl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
genl_lock net/netlink/genetlink.c:32 [inline]
genl_lock_dumpit+0x41/0x90 net/netlink/genetlink.c:478
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2127
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2217
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
entry_SYSCALL_64_fastpath+0x1f/0xc2

sctp_listen_start net/sctp/socket.c:7050 [inline]
sctp_inet_listen+0x5b7/0x7e0 net/sctp/socket.c:7135
SYSC_listen net/socket.c:1440 [inline]
SyS_listen+0x2c9/0x390 net/socket.c:1426
entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (sk_lock-AF_INET){+.+.+.}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2596
lock_sock include/net/sock.h:1460 [inline]
rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288
rds_tcp_kill_sock net/rds/tcp.c:532 [inline]
rds_tcp_dev_event+0x38e/0xc20 net/rds/tcp.c:573
notifier_call_chain+0x1b5/0x2b0 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394 [inline]
raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1646
call_netdevice_notifiers net/core/dev.c:1662 [inline]
netdev_run_todo+0x3b2/0xa30 net/core/dev.c:7530
rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:104
default_device_exit_batch+0x504/0x620 net/core/dev.c:8334
ops_exit_list.isra.6+0x100/0x150 net/core/net_namespace.c:144
cleanup_net+0x551/0xa90 net/core/net_namespace.c:463
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
worker_thread+0x223/0x1990 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:229
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

other info that might help us debug this:

Chain exists of:
sk_lock-AF_INET --> genl_mutex --> rtnl_mutex

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(genl_mutex);
lock(rtnl_mutex);
lock(sk_lock-AF_INET);

*** DEADLOCK ***

4 locks held by kworker/u4:1/19:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>]
__write_once_size include/linux/compiler.h:283 [inline]
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>] atomic64_set
arch/x86/include/asm/atomic64_64.h:33 [inline]
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>] atomic_long_set
include/asm-generic/atomic-long.h:56 [inline]
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>] set_work_data
kernel/workqueue.c:617 [inline]
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>]
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff81497943>]
process_one_work+0xab3/0x1c10 kernel/workqueue.c:2089
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff81497997>]
process_one_work+0xb07/0x1c10 kernel/workqueue.c:2093
#2: (net_mutex){+.+.+.}, at: [<ffffffff836965cb>]
cleanup_net+0x22b/0xa90 net/core/net_namespace.c:429
#3: (rtnl_mutex){+.+.+.}, at: [<ffffffff8370b057>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 0 PID: 19 Comm: kworker/u4:1 Not tainted 4.10.0+ #26
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
lock_sock_nested+0xcb/0x120 net/core/sock.c:2596
lock_sock include/net/sock.h:1460 [inline]
rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288
rds_tcp_kill_sock net/rds/tcp.c:532 [inline]
rds_tcp_dev_event+0x38e/0xc20 net/rds/tcp.c:573

Herbert Xu

Mar 14, 2017, 5:17:07 AM
to Dmitry Vyukov, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
This looks like a false positive. The cb_mutex in #1 is not the
same as the cb_mutex in #0. The cb_mutex in #0 is obtained by
crypto_user, which uses straight netlink. The cb_mutex in #1
belongs to a genl netlink socket.

I'll have a look to see if we can annotate this.

Cheers,
--
Email: Herbert Xu <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
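
The false positive described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: lockdep records
ordering between lock classes rather than individual locks, so if every
nlk->cb_mutex shares one class, the genl chain and the crypto_user
acquisition from the first report join into a cycle. All names below
(dep_graph, cb_mutex_class, first_report_cycles) are hypothetical, and the
protocol numbers 16 and 21 are assumed to correspond to NETLINK_GENERIC
and NETLINK_CRYPTO.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 40   /* 4 fixed classes + up to 32 per-protocol classes */

/* Hypothetical lock-class ids modelling the locks in the first report. */
enum { CRYPTO_ALG_SEM, SK_LOCK, RTNL_MUTEX, GENL_MUTEX, CB_MUTEX_BASE };

/* Assumed to mirror NETLINK_GENERIC and NETLINK_CRYPTO. */
enum { PROTO_GENERIC = 16, PROTO_CRYPTO = 21 };

/* edge[a][b] means "b was acquired while a was held". */
struct dep_graph {
    bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void dep_add(struct dep_graph *g, int held, int acquired)
{
    assert(held >= 0 && held < MAX_CLASSES);
    assert(acquired >= 0 && acquired < MAX_CLASSES);
    g->edge[held][acquired] = true;
}

/* Depth-first search: can `to` be reached from `from`? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (g->edge[from][next] && !seen[next] && reachable(g, next, to, seen))
            return true;
    return false;
}

/* Taking `acquired` while holding `held` closes a cycle if `held`
 * is already reachable from `acquired`. */
static bool would_deadlock(const struct dep_graph *g, int held, int acquired)
{
    bool seen[MAX_CLASSES] = { false };
    return reachable(g, acquired, held, seen);
}

/* One shared class for all cb_mutexes, or one class per protocol. */
static int cb_mutex_class(int protocol, bool per_protocol)
{
    assert(protocol >= 0 && protocol < 32);
    return per_protocol ? CB_MUTEX_BASE + protocol : CB_MUTEX_BASE;
}

/* Replays chain entries #4..#1 of the first report, then checks #0. */
bool first_report_cycles(bool per_protocol)
{
    struct dep_graph g;

    memset(&g, 0, sizeof(g));
    dep_add(&g, SK_LOCK, CRYPTO_ALG_SEM);   /* #4: setsockopt, fastopen key */
    dep_add(&g, RTNL_MUTEX, SK_LOCK);       /* #3: rds_tcp_listen_stop      */
    dep_add(&g, GENL_MUTEX, RTNL_MUTEX);    /* #2: tipc_nl_bearer_dump      */
    dep_add(&g, cb_mutex_class(PROTO_GENERIC, per_protocol),
            GENL_MUTEX);                    /* #1: genl_lock_dumpit         */

    /* #0: crypto_user takes its own cb_mutex while holding crypto_alg_sem. */
    return would_deadlock(&g, CRYPTO_ALG_SEM,
                          cb_mutex_class(PROTO_CRYPTO, per_protocol));
}
```

In this model, first_report_cycles(false) finds a cycle because the genl
and crypto_user cb_mutexes collapse into one node, while
first_report_cycles(true) finds none because the crypto_user cb_mutex
gets its own node with no outgoing edges.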

Herbert Xu

Mar 14, 2017, 5:18:02 AM
to Dmitry Vyukov, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
Ditto. Please disregard any reports involving genl_mutex and
cb_mutex where the latter comes from crypto_user.

Thanks,

Dmitry Vyukov

Mar 14, 2017, 5:44:32 AM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
Yes, please.
Disregarding some reports is not a good way long term.

Herbert Xu

Mar 14, 2017, 6:26:14 AM
to Dmitry Vyukov, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
On Tue, Mar 14, 2017 at 10:44:10AM +0100, Dmitry Vyukov wrote:
>
> Yes, please.
> Disregarding some reports is not a good way long term.

Please try this patch.

---8<---
Subject: netlink: Annotate nlk cb_mutex by protocol

Currently all occurrences of nlk->cb_mutex are annotated by lockdep
as a single class. This causes a false lockdep cycle involving
genl and crypto_user.

This patch fixes it by dividing cb_mutex into individual classes
based on the netlink protocol. As genl and crypto_user do not
use the same netlink protocol this breaks the false dependency
loop.

Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7b73c7c..596eaff 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -96,6 +96,44 @@ static inline int netlink_is_kernel(struct sock *sk)

 static DECLARE_WAIT_QUEUE_HEAD(nl_table_wait);

+static struct lock_class_key nlk_cb_mutex_keys[MAX_LINKS];
+
+static const char *const nlk_cb_mutex_key_strings[MAX_LINKS + 1] = {
+	"nlk_cb_mutex-ROUTE",
+	"nlk_cb_mutex-1",
+	"nlk_cb_mutex-USERSOCK",
+	"nlk_cb_mutex-FIREWALL",
+	"nlk_cb_mutex-SOCK_DIAG",
+	"nlk_cb_mutex-NFLOG",
+	"nlk_cb_mutex-XFRM",
+	"nlk_cb_mutex-SELINUX",
+	"nlk_cb_mutex-ISCSI",
+	"nlk_cb_mutex-AUDIT",
+	"nlk_cb_mutex-FIB_LOOKUP",
+	"nlk_cb_mutex-CONNECTOR",
+	"nlk_cb_mutex-NETFILTER",
+	"nlk_cb_mutex-IP6_FW",
+	"nlk_cb_mutex-DNRTMSG",
+	"nlk_cb_mutex-KOBJECT_UEVENT",
+	"nlk_cb_mutex-GENERIC",
+	"nlk_cb_mutex-17",
+	"nlk_cb_mutex-SCSITRANSPORT",
+	"nlk_cb_mutex-ECRYPTFS",
+	"nlk_cb_mutex-RDMA",
+	"nlk_cb_mutex-CRYPTO",
+	"nlk_cb_mutex-SMC",
+	"nlk_cb_mutex-23",
+	"nlk_cb_mutex-24",
+	"nlk_cb_mutex-25",
+	"nlk_cb_mutex-26",
+	"nlk_cb_mutex-27",
+	"nlk_cb_mutex-28",
+	"nlk_cb_mutex-29",
+	"nlk_cb_mutex-30",
+	"nlk_cb_mutex-31",
+	"nlk_cb_mutex-MAX_LINKS"
+};
+
 static int netlink_dump(struct sock *sk);
 static void netlink_skb_destructor(struct sk_buff *skb);

@@ -585,6 +623,9 @@ static int __netlink_create(struct net *net, struct socket *sock,
 	} else {
 		nlk->cb_mutex = &nlk->cb_def_mutex;
 		mutex_init(nlk->cb_mutex);
+		lockdep_set_class_and_name(nlk->cb_mutex,
+					   nlk_cb_mutex_keys + protocol,
+					   nlk_cb_mutex_key_strings[protocol]);
 	}
 	init_waitqueue_head(&nlk->wait);
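
The effect of splitting cb_mutex into per-protocol classes can be sketched
with a toy userspace model of a lock-class graph. This is not how lockdep is
implemented. The edge list is an assumed simplification of the dependency
chain in the first report of this thread, and the helpers (class_id, add_dep,
has_cycle, chain_has_cycle) are hypothetical names used only for illustration.
Only the class names "nlk_cb_mutex-GENERIC" and "nlk_cb_mutex-CRYPTO" come
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 16

/* Toy lock-class graph: edge[a][b] means "b was acquired while a was held". */
struct lock_graph {
	const char *name[MAX_CLASSES];
	bool edge[MAX_CLASSES][MAX_CLASSES];
	int nr;
};

/* Look up a class by name, registering it on first use. */
static int class_id(struct lock_graph *g, const char *name)
{
	for (int i = 0; i < g->nr; i++)
		if (strcmp(g->name[i], name) == 0)
			return i;
	assert(g->nr < MAX_CLASSES);
	g->name[g->nr] = name;
	return g->nr++;
}

/* Record that "acquired" was taken while "held" was held. */
static void add_dep(struct lock_graph *g, const char *held, const char *acquired)
{
	int h = class_id(g, held);
	int a = class_id(g, acquired);

	g->edge[h][a] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on the stack, 2 = done. */
static bool dfs(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < g->nr; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: a cycle exists */
		if (state[next] == 0 && dfs(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(const struct lock_graph *g)
{
	int state[MAX_CLASSES] = {0};

	for (int i = 0; i < g->nr; i++)
		if (state[i] == 0 && dfs(g, i, state))
			return true;
	return false;
}

/*
 * Build the assumed chain, parameterised by the class name used for the
 * genl socket's cb_mutex and for the crypto_user socket's cb_mutex.
 */
static bool chain_has_cycle(const char *genl_cb, const char *crypto_cb)
{
	struct lock_graph g = {0};

	add_dep(&g, genl_cb, "genl_mutex");
	add_dep(&g, "genl_mutex", "rtnl_mutex");
	add_dep(&g, "rtnl_mutex", "sk_lock-AF_INET");
	add_dep(&g, "sk_lock-AF_INET", "crypto_alg_sem");
	add_dep(&g, "crypto_alg_sem", crypto_cb);
	return has_cycle(&g);
}
```

In this model, chain_has_cycle("nlk_cb_mutex", "nlk_cb_mutex") finds a loop
because both sockets share one class, while
chain_has_cycle("nlk_cb_mutex-GENERIC", "nlk_cb_mutex-CRYPTO") does not.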

Dmitry Vyukov

Mar 14, 2017, 6:31:26 AM
to Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, syzkaller
On Tue, Mar 14, 2017 at 11:25 AM, Herbert Xu
<her...@gondor.apana.org.au> wrote:
> On Tue, Mar 14, 2017 at 10:44:10AM +0100, Dmitry Vyukov wrote:
>>
>> Yes, please.
>> Disregarding some reports is not a good way long term.
>
> Please try this patch.

Applied on bots. I should have a conclusion within a day.
Thanks!

Sowmini Varadhan

Mar 14, 2017, 11:25:41 AM
to Dmitry Vyukov, Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, santosh....@oracle.com, rds-...@oss.oracle.com, syzkaller
On (03/14/17 09:14), Dmitry Vyukov wrote:
> Another one now involving rds_tcp_listen_stop
:
> kworker/u4:1/19 is trying to acquire lock:
> (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8409a6ec>] lock_sock
> include/net/sock.h:1460 [inline]
> (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8409a6ec>]
> rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288
>
> but task is already holding lock:
> (rtnl_mutex){+.+.+.}, at: [<ffffffff8370b057>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:70

Is this also a false positive?

genl_lock_dumpit takes the genl_lock and then waits on the rtnl_lock
(e.g., out of tipc_nl_bearer_dump).

netdev_run_todo takes the rtnl_lock and then wants lock_sock()
for the TCP/IPv4 socket.

Why is lockdep seeing a circular dependency here? Same pattern
seems to be happening for
http://www.spinics.net/lists/netdev/msg423368.html
and maybe also http://www.spinics.net/lists/netdev/msg423323.html?

--Sowmini

Dmitry Vyukov

Mar 15, 2017, 5:08:42 AM
to Sowmini Varadhan, Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, santosh....@oracle.com, rds-...@oss.oracle.com, syzkaller
After I applied the patch these reports stopped happening, and I
have not seen any other reports that look relevant.
However, there was one, but it looks like a different issue and it
was probably masked by the massive amount of original deadlock reports:


[ INFO: possible circular locking dependency detected ]
4.10.0+ #29 Not tainted
-------------------------------------------------------
syz-executor5/29222 is trying to acquire lock:
(genl_mutex){+.+.+.}, at: [<ffffffff837ea67e>] genl_lock
net/netlink/genetlink.c:32 [inline]
(genl_mutex){+.+.+.}, at: [<ffffffff837ea67e>]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
(rtnl_mutex){+.+.+.}, at: [<ffffffff8370a057>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
nl80211_dump_wiphy+0x45/0x6d0 net/wireless/nl80211.c:1946
genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2168
__netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2258
genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2339
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1298
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1844
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
__sys_sendmsg+0x138/0x300 net/socket.c:2019
SYSC_sendmsg net/socket.c:2030 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2026
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

-> #0 (genl_mutex){+.+.+.}:
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
genl_lock net/netlink/genetlink.c:32 [inline]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2339
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1298
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1844
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

*** DEADLOCK ***

2 locks held by syz-executor5/29222:
#0: (cb_lock){++++++}, at: [<ffffffff837e98a9>] genl_rcv+0x19/0x40
net/netlink/genetlink.c:630
#1: (rtnl_mutex){+.+.+.}, at: [<ffffffff8370a057>]
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 29222 Comm: syz-executor5 Not tainted 4.10.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
check_prev_add kernel/locking/lockdep.c:1830 [inline]
check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
validate_chain kernel/locking/lockdep.c:2267 [inline]
__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
genl_lock net/netlink/genetlink.c:32 [inline]
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2339
genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1298
netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1844
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x326/0x600 net/socket.c:846
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x740 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
entry_SYSCALL64_slow_path+0x25/0x25

Sowmini Varadhan

Mar 15, 2017, 7:30:32 AM
to Dmitry Vyukov, Herbert Xu, David Miller, linux-...@vger.kernel.org, LKML, Eric Dumazet, Cong Wang, netdev, santosh....@oracle.com, rds-...@oss.oracle.com, syzkaller
On (03/15/17 10:08), Dmitry Vyukov wrote:
> After I applied the patch these reports stopped happening, and I
> have not seen any other reports that look relevant.
> However, there was one, but it looks like a different issue and it
> was probably masked by the massive amount of original deadlock reports:

Yes, this looks like a valid deadlock.

I think there may be some ->dumpit callbacks that take the rtnl_lock
and do not unlock it before return, e.g., I see nl80211_dump_interface()
doing this at

2612 rtnl_lock();
2613 if (!cb->args[2]) {
:
2619 ret = nl80211_dump_wiphy_parse(skb, cb, &state);
2620 if (ret)
2621 return ret;

afaict, nl80211_dump_wiphy_parse does not itself do rtnl_unlock on error.


If that's the case then we'd run into the circular locking dependency
flagged by lockdep.

Disclaimer: I did not check every single ->dumpit, there may be more
than one of these..
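
The error-path pattern described above can be illustrated with a small
userspace sketch. It is not the nl80211 code: a pthread mutex stands in for
rtnl_mutex, and fake_parse, dump_leaky, dump_balanced and rtnl_is_free are
hypothetical names. It only shows the general shape of a callback that
returns early with the lock held, next to one that routes every exit through
a single unlock label.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for rtnl_mutex (assumption: a plain, non-recursive mutex). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical parse step that fails on request. */
static int fake_parse(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

/* Leaky shape: the early return skips the unlock. */
static int dump_leaky(int should_fail)
{
	int ret;

	pthread_mutex_lock(&fake_rtnl);
	ret = fake_parse(should_fail);
	if (ret)
		return ret;	/* lock is still held here */
	pthread_mutex_unlock(&fake_rtnl);
	return 0;
}

/* Balanced shape: every exit goes through the same unlock label. */
static int dump_balanced(int should_fail)
{
	int ret;

	pthread_mutex_lock(&fake_rtnl);
	ret = fake_parse(should_fail);
	if (ret)
		goto out_unlock;
	/* ... the actual dump work would go here ... */
out_unlock:
	pthread_mutex_unlock(&fake_rtnl);
	return ret;
}

/* Returns 1 if fake_rtnl is currently free, 0 if it is held. */
static int rtnl_is_free(void)
{
	if (pthread_mutex_trylock(&fake_rtnl) != 0)
		return 0;
	pthread_mutex_unlock(&fake_rtnl);
	return 1;
}
```

After dump_balanced(1) the mutex is free again, whereas after dump_leaky(1)
it stays held, so any later path that takes it blocks.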



