[syzbot] [net?] possible deadlock in rtnl_lock (8)

syzbot

Aug 18, 2024, 11:49:27 PM
to da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 1fb918967b56 Merge tag 'for-6.11-rc3-tag' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129dd7d9980000
kernel config: https://syzkaller.appspot.com/x/.config?x=804764788c03071f
dashboard link: https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-1fb91896.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7b8fac7b5b8b/vmlinux-1fb91896.xz
kernel image: https://storage.googleapis.com/syzbot-assets/676950a147e6/Image-1fb91896.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+51cf7c...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.11.0-rc3-syzkaller-00066-g1fb918967b56 #0 Not tainted
------------------------------------------------------
syz.0.5481/17612 is trying to acquire lock:
ffff8000880033a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x1c/0x28 net/core/rtnetlink.c:79

but task is already holding lock:
ffff000010332b50 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0xd8/0xcec net/smc/af_smc.c:3064

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&smc->clcsock_release_lock){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x134/0x840 kernel/locking/mutex.c:752
mutex_lock_nested+0x24/0x30 kernel/locking/mutex.c:804
smc_switch_to_fallback+0x34/0x80c net/smc/af_smc.c:902
smc_sendmsg+0xe4/0x8f8 net/smc/af_smc.c:2779
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xc8/0x168 net/socket.c:745
__sys_sendto+0x1a8/0x254 net/socket.c:2204
__do_sys_sendto net/socket.c:2216 [inline]
__se_sys_sendto net/socket.c:2212 [inline]
__arm64_sys_sendto+0xc0/0x134 net/socket.c:2212
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x6c/0x258 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0xac/0x230 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x40/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x50/0x180 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
lock_sock_nested+0x38/0xe8 net/core/sock.c:3543
lock_sock include/net/sock.h:1607 [inline]
sockopt_lock_sock net/core/sock.c:1061 [inline]
sockopt_lock_sock+0x58/0x74 net/core/sock.c:1052
do_ip_setsockopt+0xe0/0x2358 net/ipv4/ip_sockglue.c:1078
ip_setsockopt+0x34/0x9c net/ipv4/ip_sockglue.c:1417
raw_setsockopt+0x7c/0x2e0 net/ipv4/raw.c:845
sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
__sys_setsockopt+0xdc/0x178 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xa4/0x100 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x6c/0x258 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0xac/0x230 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x40/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x50/0x180 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #0 (rtnl_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x2aa4/0x6340 kernel/locking/lockdep.c:5142
lock_acquire kernel/locking/lockdep.c:5759 [inline]
lock_acquire+0x48c/0x7a4 kernel/locking/lockdep.c:5724
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x134/0x840 kernel/locking/mutex.c:752
mutex_lock_nested+0x24/0x30 kernel/locking/mutex.c:804
rtnl_lock+0x1c/0x28 net/core/rtnetlink.c:79
do_ipv6_setsockopt+0x1a04/0x3814 net/ipv6/ipv6_sockglue.c:566
ipv6_setsockopt+0xc8/0x140 net/ipv6/ipv6_sockglue.c:993
tcp_setsockopt+0x90/0xcc net/ipv4/tcp.c:3768
sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
smc_setsockopt+0x150/0xcec net/smc/af_smc.c:3072
do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
__sys_setsockopt+0xdc/0x178 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xa4/0x100 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x6c/0x258 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0xac/0x230 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x40/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x50/0x180 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

other info that might help us debug this:

Chain exists of:
rtnl_mutex --> sk_lock-AF_INET --> &smc->clcsock_release_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&smc->clcsock_release_lock);
                               lock(sk_lock-AF_INET);
                               lock(&smc->clcsock_release_lock);
  lock(rtnl_mutex);

*** DEADLOCK ***

1 lock held by syz.0.5481/17612:
#0: ffff000010332b50 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0xd8/0xcec net/smc/af_smc.c:3064

stack backtrace:
CPU: 1 UID: 0 PID: 17612 Comm: syz.0.5481 Not tainted 6.11.0-rc3-syzkaller-00066-g1fb918967b56 #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x9c/0x11c arch/arm64/kernel/stacktrace.c:317
show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:324
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0xa4/0xf4 lib/dump_stack.c:119
dump_stack+0x1c/0x28 lib/dump_stack.c:128
print_circular_bug+0x420/0x6f8 kernel/locking/lockdep.c:2059
check_noncircular+0x2dc/0x364 kernel/locking/lockdep.c:2186
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x2aa4/0x6340 kernel/locking/lockdep.c:5142
lock_acquire kernel/locking/lockdep.c:5759 [inline]
lock_acquire+0x48c/0x7a4 kernel/locking/lockdep.c:5724
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x134/0x840 kernel/locking/mutex.c:752
mutex_lock_nested+0x24/0x30 kernel/locking/mutex.c:804
rtnl_lock+0x1c/0x28 net/core/rtnetlink.c:79
do_ipv6_setsockopt+0x1a04/0x3814 net/ipv6/ipv6_sockglue.c:566
ipv6_setsockopt+0xc8/0x140 net/ipv6/ipv6_sockglue.c:993
tcp_setsockopt+0x90/0xcc net/ipv4/tcp.c:3768
sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
smc_setsockopt+0x150/0xcec net/smc/af_smc.c:3072
do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
__sys_setsockopt+0xdc/0x178 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xa4/0x100 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x6c/0x258 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0xac/0x230 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x40/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x50/0x180 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Sep 8, 2024, 4:12:27 AM
to da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: df54f4a16f82 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=12bdabc7980000
kernel config: https://syzkaller.appspot.com/x/.config?x=dde5a5ba8d41ee9e
dashboard link: https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1798589f980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10a30e00580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/aa2eb06e0aea/disk-df54f4a1.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/14728733d385/vmlinux-df54f4a1.xz
kernel image: https://storage.googleapis.com/syzbot-assets/99816271407d/Image-df54f4a1.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+51cf7c...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.11.0-rc5-syzkaller-gdf54f4a16f82 #0 Not tainted
------------------------------------------------------
syz-executor272/6388 is trying to acquire lock:
ffff8000923b6ce8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79

but task is already holding lock:
ffff0000dc408a50 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0x178/0x10fc net/smc/af_smc.c:3064

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&smc->clcsock_release_lock){+.+.}-{3:3}:
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
smc_switch_to_fallback+0x48/0xa80 net/smc/af_smc.c:902
smc_sendmsg+0xfc/0x9f8 net/smc/af_smc.c:2779
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x374/0x4f4 net/socket.c:2204
__do_sys_sendto net/socket.c:2216 [inline]
__se_sys_sendto net/socket.c:2212 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
lock_sock_nested net/core/sock.c:3543 [inline]
lock_sock include/net/sock.h:1607 [inline]
sockopt_lock_sock+0x88/0x148 net/core/sock.c:1061
do_ip_setsockopt+0x1438/0x346c net/ipv4/ip_sockglue.c:1078
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
raw_setsockopt+0x100/0x294 net/ipv4/raw.c:845
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #0 (rtnl_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

other info that might help us debug this:

Chain exists of:
rtnl_mutex --> sk_lock-AF_INET --> &smc->clcsock_release_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&smc->clcsock_release_lock);
                               lock(sk_lock-AF_INET);
                               lock(&smc->clcsock_release_lock);
  lock(rtnl_mutex);

*** DEADLOCK ***

1 lock held by syz-executor272/6388:
#0: ffff0000dc408a50 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0x178/0x10fc net/smc/af_smc.c:3064

stack backtrace:
CPU: 1 UID: 0 PID: 6388 Comm: syz-executor272 Not tainted 6.11.0-rc5-syzkaller-gdf54f4a16f82 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:317
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119
dump_stack+0x1c/0x28 lib/dump_stack.c:128
print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2059
check_noncircular+0x310/0x404 kernel/locking/lockdep.c:2186
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

Eric Dumazet

Sep 9, 2024, 4:03:14 AM
to syzbot, D. Wythe, Wenjia Zhang, Dust Li, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Please SMC folks, can you take a look ?

Wenjia Zhang

Sep 9, 2024, 7:52:33 AM
to Eric Dumazet, syzbot, D. Wythe, Dust Li, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hi Eric,

Thank you for the reminder! We'll look into it ASAP!

Thanks,
Wenjia

Eric Dumazet

Sep 10, 2024, 2:37:11 AM
to D. Wythe, Wenjia Zhang, syzbot, Dust Li, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
On Tue, Sep 10, 2024 at 7:55 AM D. Wythe <ali...@linux.alibaba.com> wrote:
> I have been aware of this issue for a while, but I doubt it can actually
> happen. If I understand correctly, the following deadlock is reported here:
>
> #2
> lock_sock_smc
> {
> clcsock_release_lock --- deadlock
> {
>
> }
> }
>
> #1
> rtnl_mutex
> {
> lock_sock_smc
> {
>
> }
> }
>
> #0
> clcsock_release_lock
> {
> rtnl_mutex --deadlock
> {
>
> }
> }
>
> This is of course a deadlock, but #1 is suspicious.
>
> How would this happen to a smc sock?
>
> #1 ->
> lock_sock_nested+0x38/0xe8 net/core/sock.c:3543
> lock_sock include/net/sock.h:1607 [inline]
> sockopt_lock_sock net/core/sock.c:1061 [inline]
> sockopt_lock_sock+0x58/0x74 net/core/sock.c:1052
> do_ip_setsockopt+0xe0/0x2358 net/ipv4/ip_sockglue.c:1078
> ip_setsockopt+0x34/0x9c net/ipv4/ip_sockglue.c:1417
> raw_setsockopt+0x7c/0x2e0 net/ipv4/raw.c:845
> sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
> do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
>
> As a comparison, the correct calling chain should be:
>
> sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
> smc_setsockopt+0x150/0xcec net/smc/af_smc.c:3072
> do_sock_setsockopt+0x17c/0x354 net/socket.c:2324
>
>
> That is to say, any setting of SOL_IP options on an smc_sock goes
> through smc_setsockopt, which tries to lock clcsock_release_lock first.
>
> Anyway, if anyone can explain #1, then we can see how to solve this problem,
> otherwise I think this problem doesn't exist. (Just my opinion)

Then SMC lacks some lockdep annotations.

Please take a look at sock_lock_init_class_and_name() callers.
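Eric's point can be illustrated with a toy model. Lockdep records ordering between lock classes rather than individual lock instances, so the raw socket locked in stack #1 and the socket locked by smc_sendmsg() in stack #2 (assumed here to be the IPPROTO_SMC socket) both appear as sk_lock-AF_INET, exactly as the report's "Chain exists of" line shows. The userspace C sketch below is not lockdep's algorithm; the graph helpers, `replay_report()`, and the class name `"sk_lock-AF_INET-SMC"` are hypothetical and only illustrate why a shared class produces the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/*
 * Toy lock-order graph keyed by lock *class* name (illustration only,
 * not the kernel's lockdep). edge[a][b] means "class b was acquired
 * while class a was held".
 */
struct lock_graph {
	const char *names[MAX_CLASSES];
	int nclasses;
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

/* Look up a class by name, registering it on first use. */
static int class_id(struct lock_graph *g, const char *name)
{
	for (int i = 0; i < g->nclasses; i++)
		if (strcmp(g->names[i], name) == 0)
			return i;
	assert(g->nclasses < MAX_CLASSES);
	g->names[g->nclasses] = name;
	return g->nclasses++;
}

/* Depth-first search: can class 'to' be reached from class 'from'? */
static bool reachable(const struct lock_graph *g, int from, int to,
		      bool seen[MAX_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < g->nclasses; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'next' while holding 'held'". Returns false, without
 * recording the edge, if 'held' is already reachable from 'next', i.e.
 * the new edge would close a cycle.
 */
static bool add_dependency(struct lock_graph *g, const char *held,
			   const char *next)
{
	int h = class_id(g, held);
	int n = class_id(g, next);
	bool seen[MAX_CLASSES] = { false };

	if (reachable(g, n, h, seen))
		return false;
	g->edge[h][n] = true;
	return true;
}

/*
 * Replay the three stacks from the report. 'smc_class' is the class given
 * to the socket lock held in stack #2; the raw socket in stack #1 always
 * uses "sk_lock-AF_INET". Returns false if the last edge closes a cycle.
 */
static bool replay_report(const char *smc_class)
{
	struct lock_graph g = { .nclasses = 0 };

	/* #1: do_ip_setsockopt() on a raw socket: rtnl_lock(), then lock_sock() */
	add_dependency(&g, "rtnl_mutex", "sk_lock-AF_INET");
	/* #2: smc_sendmsg() holds the sock lock, then smc_switch_to_fallback() */
	add_dependency(&g, smc_class, "&smc->clcsock_release_lock");
	/* #0: smc_setsockopt() holds clcsock_release_lock, then rtnl_lock() */
	return add_dependency(&g, "&smc->clcsock_release_lock", "rtnl_mutex");
}
```

`replay_report("sk_lock-AF_INET")` returns false because the third dependency closes the cycle printed in the report. With a separate (hypothetical) class for the lock held in stack #2, the raw-socket edge no longer connects to the SMC edges and it returns true. The model does not decide which lock a real fix should reclassify.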

D. Wythe

Sep 10, 2024, 4:06:26 AM
to Wenjia Zhang, Eric Dumazet, syzbot, Dust Li, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com


On 9/9/24 7:44 PM, Wenjia Zhang wrote:
>
>
I have been aware of this issue for a while, but I doubt it can actually
happen. If I understand correctly, the following deadlock is reported here:

#2
lock_sock_smc
{
    clcsock_release_lock            --- deadlock
    {

    }
}

#1
rtnl_mutex
{
    lock_sock_smc
    {

    }
}

#0
clcsock_release_lock
{
    rtnl_mutex                      --deadlock
    {

    }
}

This is of course a deadlock, but #1 is suspicious.

How would this happen to a smc sock?

#1 ->
       lock_sock_nested+0x38/0xe8 net/core/sock.c:3543
       lock_sock include/net/sock.h:1607 [inline]
       sockopt_lock_sock net/core/sock.c:1061 [inline]
       sockopt_lock_sock+0x58/0x74 net/core/sock.c:1052
       do_ip_setsockopt+0xe0/0x2358 net/ipv4/ip_sockglue.c:1078
       ip_setsockopt+0x34/0x9c net/ipv4/ip_sockglue.c:1417
       raw_setsockopt+0x7c/0x2e0 net/ipv4/raw.c:845
       sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
       do_sock_setsockopt+0x17c/0x354 net/socket.c:2324

As a comparison, the correct calling chain should be:

       sock_common_setsockopt+0x70/0xe0 net/core/sock.c:3735
       smc_setsockopt+0x150/0xcec net/smc/af_smc.c:3072
       do_sock_setsockopt+0x17c/0x354 net/socket.c:2324


That is to say, any setting of SOL_IP options on an smc_sock goes
through smc_setsockopt, which tries to lock clcsock_release_lock first.

Anyway, if anyone can explain #1, then we can see how to solve this problem,
otherwise I think this problem doesn't exist. (Just my opinion)

Best wishes,
D. Wythe




D. Wythe

Sep 10, 2024, 4:06:26 AM
to Eric Dumazet, Wenjia Zhang, syzbot, Dust Li, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
It seems so, which also explains why it wasn't reported with an AF_SMC sock.
I'll try to fix it ASAP.

D. Wythe

syzbot

Sep 11, 2024, 4:52:02 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
For archival purposes, forwarding an incoming command email to
linux-...@vger.kernel.org, syzkall...@googlegroups.com.

***

Subject: Re: [syzbot] [net?] possible deadlock in rtnl_lock (8)
Author: ali...@linux.alibaba.com
#syz test

diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c
index bece346..281f0450 100644
--- a/net/smc/smc_inet.c
+++ b/net/smc/smc_inet.c
@@ -102,14 +102,29 @@
 };
 #endif /* CONFIG_IPV6 */

+static struct lock_class_key smc_clcsk_slock_keys[2];
+static struct lock_class_key smc_clcsk_keys[2];
+
 static int smc_inet_init_sock(struct sock *sk)
 {
+   bool is_ipv6 = sk->sk_family == AF_INET6;
    struct net *net = sock_net(sk);
+   int rc;

    /* init common smc sock */
    smc_sk_init(net, sk, IPPROTO_SMC);
    /* create clcsock */
-   return smc_create_clcsk(net, sk, sk->sk_family);
+   rc = smc_create_clcsk(net, sk, sk->sk_family);
+   if (rc)
+       return rc;
+
+   sock_lock_init_class_and_name(smc_sk(sk)->clcsk,
+                     is_ipv6 ? "slock-AF_INET6-SMC-CLCSK" : "slock-AF_INET-SMC-CLCSK",
+                     &smc_clcsk_slock_keys[is_ipv6],
+                     is_ipv6 ? "sk_lock-AF_INET6-SMC-CLCSK" : "sk_lock-AF_INET-SMC-CLCSK",
+                     &smc_clcsk_keys[is_ipv6]);
+
+   return 0;
 }

 int __init smc_inet_init(void)
--
1.8.3.1
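In the patch, `is_ipv6` is a bool used directly as an array index, which C converts to 0 or 1, so a single family check selects both the lock name and the matching `lock_class_key`. The userspace sketch below mimics only that selection; `fake_lock_class_key`, `clcsk_lock_class`, and `clcsk_class()` are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct lock_class_key. */
struct fake_lock_class_key {
	int unused;
};

/* One key per family: index 0 for AF_INET, index 1 for AF_INET6. */
static struct fake_lock_class_key clcsk_keys[2];

struct clcsk_lock_class {
	const char *name;
	struct fake_lock_class_key *key;
};

/* Mirrors the patch: the bool picks the name and indexes the key array. */
static struct clcsk_lock_class clcsk_class(bool is_ipv6)
{
	struct clcsk_lock_class c = {
		.name = is_ipv6 ? "sk_lock-AF_INET6-SMC-CLCSK"
				: "sk_lock-AF_INET-SMC-CLCSK",
		.key = &clcsk_keys[is_ipv6],
	};
	return c;
}
```

Lockdep identifies a class by the address of its key rather than by its name string, which is why the patch keeps one static key per address family instead of only varying the name.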


syzbot

Sep 11, 2024, 5:34:05 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file net/smc/smc_inet.c
patch: **** unexpected end of file in patch



Tested on:

commit: 7e3e2c7f Merge branch 'for-next/core' into for-kernelci
kernel config: https://syzkaller.appspot.com/x/.config?x=dde5a5ba8d41ee9e
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=148a1807980000

syzbot

Sep 11, 2024, 5:42:26 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
For archival purposes, forwarding an incoming command email to
linux-...@vger.kernel.org, syzkall...@googlegroups.com.

***

Subject: Re: [syzbot] [net?] possible deadlock in rtnl_lock (8)
Author: ali...@linux.alibaba.com



On 8/19/24 11:49 AM, syzbot wrote:
#syz test

Make Lockdep happy with IPPROTO_SMC

---
 net/smc/smc_inet.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

syzbot

Sep 11, 2024, 5:44:04 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file net/smc/smc_inet.c
patch: **** unexpected end of file in patch



Tested on:

commit: 7e3e2c7f Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
kernel config: https://syzkaller.appspot.com/x/.config?x=dde5a5ba8d41ee9e
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=11a9a477980000

syzbot

Sep 11, 2024, 5:48:01 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com
For archival purposes, forwarding an incoming command email to
linux-...@vger.kernel.org, syzkall...@googlegroups.com.

***

Subject: Re: [syzbot] [net?] possible deadlock in rtnl_lock (8)
Author: ali...@linux.alibaba.com

syzbot

Sep 11, 2024, 6:00:04 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 6:15:20 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 6:22:31 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 6:24:05 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 6:34:04 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

net/smc/smc_inet.c:127:44: error: no member named 'clcsk' in 'struct smc_sock'


Tested on:

commit: 7e3e2c7f Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
kernel config: https://syzkaller.appspot.com/x/.config?x=dde5a5ba8d41ee9e
dashboard link: https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=11cda477980000

syzbot

Sep 11, 2024, 7:33:39 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 8:07:05 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in rtnl_lock

======================================================
WARNING: possible circular locking dependency detected
6.11.0-rc7-syzkaller-g7e3e2c7f05cd-dirty #0 Not tainted
------------------------------------------------------
syz.0.15/7317 is trying to acquire lock:
ffff8000923b7ea8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79

but task is already holding lock:
ffff0000d4798a58 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0x178/0x10fc net/smc/af_smc.c:3064

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&smc->clcsock_release_lock){+.+.}-{3:3}:
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
smc_switch_to_fallback+0x48/0xa80 net/smc/af_smc.c:902
smc_sendmsg+0xfc/0x9f8 net/smc/af_smc.c:2779
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x374/0x4f4 net/socket.c:2204
__do_sys_sendto net/socket.c:2216 [inline]
__se_sys_sendto net/socket.c:2212 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
lock_sock_nested net/core/sock.c:3543 [inline]
lock_sock include/net/sock.h:1607 [inline]
sockopt_lock_sock+0x88/0x148 net/core/sock.c:1061
do_ip_setsockopt+0x1438/0x346c net/ipv4/ip_sockglue.c:1078
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
raw_setsockopt+0x100/0x294 net/ipv4/raw.c:845
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #0 (rtnl_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

other info that might help us debug this:

Chain exists of:
rtnl_mutex --> sk_lock-AF_INET --> &smc->clcsock_release_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&smc->clcsock_release_lock);
                               lock(sk_lock-AF_INET);
                               lock(&smc->clcsock_release_lock);
  lock(rtnl_mutex);

*** DEADLOCK ***

1 lock held by syz.0.15/7317:
#0: ffff0000d4798a58 (&smc->clcsock_release_lock){+.+.}-{3:3}, at: smc_setsockopt+0x178/0x10fc net/smc/af_smc.c:3064

stack backtrace:
CPU: 1 UID: 0 PID: 7317 Comm: syz.0.15 Not tainted 6.11.0-rc7-syzkaller-g7e3e2c7f05cd-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119
dump_stack+0x1c/0x28 lib/dump_stack.c:128
print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2059
check_noncircular+0x310/0x404 kernel/locking/lockdep.c:2186
check_prev_add kernel/locking/lockdep.c:3133 [inline]
check_prevs_add kernel/locking/lockdep.c:3252 [inline]
validate_chain kernel/locking/lockdep.c:3868 [inline]
__lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
__mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
__mutex_lock kernel/locking/mutex.c:752 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
__sys_setsockopt+0x128/0x1a8 net/socket.c:2347
__do_sys_setsockopt net/socket.c:2356 [inline]
__se_sys_setsockopt net/socket.c:2353 [inline]
__arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598


Tested on:

commit: 7e3e2c7f Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=13b56100580000
kernel config: https://syzkaller.appspot.com/x/.config?x=921accd5d8340211
dashboard link: https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=16856100580000

syzbot

Sep 11, 2024, 8:27:57 AM
to linux-...@vger.kernel.org, syzkall...@googlegroups.com

syzbot

Sep 11, 2024, 9:04:04 AM
to ali...@linux.alibaba.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+51cf7c...@syzkaller.appspotmail.com
Tested-by: syzbot+51cf7c...@syzkaller.appspotmail.com

Tested on:

commit: 7e3e2c7f Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=16f9a49f980000
kernel config: https://syzkaller.appspot.com/x/.config?x=921accd5d8340211
dashboard link: https://syzkaller.appspot.com/bug?extid=51cf7cc5f9ffc1006ef2
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=142d6100580000

Note: testing is done by a robot and is best-effort only.