[syzbot] possible deadlock in j1939_sk_queue_drop_all


syzbot

Sep 8, 2021, 5:41:26 AM
to da...@davemloft.net, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 29ce8f970107 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1549d3f5300000
kernel config: https://syzkaller.appspot.com/x/.config?x=d2f9d4c9ff8c5ae7
dashboard link: https://syzkaller.appspot.com/bug?extid=3bd970a1887812621b4c
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3bd970...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc7-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.2/24182 is trying to acquire lock:
ffff88802d66f578 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
ffff88802d66f578 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139

but task is already holding lock:
ffff88807b54d0d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
ffff88807b54d0d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&priv->j1939_socks_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:359 [inline]
j1939_sk_errqueue+0x9f/0x1a0 net/can/j1939/socket.c:1078
__j1939_session_cancel+0x3b9/0x460 net/can/j1939/transport.c:1124
j1939_tp_rxtimer+0x2a8/0x36b net/can/j1939/transport.c:1250
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x16e/0x1c0 kernel/softirq.c:636
irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
lock_acquire+0x1ef/0x510 kernel/locking/lockdep.c:5593
__might_fault mm/memory.c:5261 [inline]
__might_fault+0x106/0x180 mm/memory.c:5246
_copy_from_user+0x27/0x180 lib/usercopy.c:13
copy_from_user include/linux/uaccess.h:192 [inline]
__copy_msghdr_from_user+0x91/0x4b0 net/socket.c:2288
copy_msghdr_from_user net/socket.c:2339 [inline]
sendmsg_copy_msghdr+0xa1/0x160 net/socket.c:2437
___sys_sendmsg+0xc6/0x170 net/socket.c:2456
__sys_sendmmsg+0x195/0x470 net/socket.c:2546
__do_sys_sendmmsg net/socket.c:2575 [inline]
__se_sys_sendmmsg net/socket.c:2572 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2572
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (&priv->active_session_list_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:359 [inline]
j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
j1939_session_activate+0x43/0x4b0 net/can/j1939/transport.c:1554
j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline]
j1939_sk_queue_activate_next+0x29b/0x460 net/can/j1939/socket.c:205
j1939_session_deactivate_activate_next+0x2e/0x35 net/can/j1939/transport.c:1101
j1939_xtp_rx_abort_one.cold+0x20b/0x33c net/can/j1939/transport.c:1341
j1939_xtp_rx_abort net/can/j1939/transport.c:1352 [inline]
j1939_tp_cmd_recv net/can/j1939/transport.c:2085 [inline]
j1939_tp_recv+0x8f4/0xb40 net/can/j1939/transport.c:2118
j1939_can_recv+0x6d7/0x930 net/can/j1939/main.c:101
deliver net/can/af_can.c:574 [inline]
can_rcv_filter+0x5d4/0x8d0 net/can/af_can.c:608
can_receive+0x31d/0x580 net/can/af_can.c:665
can_rcv+0x120/0x1c0 net/can/af_can.c:696
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5436
__netif_receive_skb+0x24/0x1b0 net/core/dev.c:5550
process_backlog+0x2a5/0x6c0 net/core/dev.c:6427
__napi_poll+0xaf/0x440 net/core/dev.c:6982
napi_poll net/core/dev.c:7049 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7136
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
run_ksoftirqd kernel/softirq.c:920 [inline]
run_ksoftirqd+0x2d/0x60 kernel/softirq.c:912
smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #0 (&jsk->sk_session_queue_lock){+.-.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3051 [inline]
check_prevs_add kernel/locking/lockdep.c:3174 [inline]
validate_chain kernel/locking/lockdep.c:3789 [inline]
__lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015
lock_acquire kernel/locking/lockdep.c:5625 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:359 [inline]
j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0x7b/0x160 net/can/j1939/socket.c:1272
j1939_netdev_notify+0x199/0x1d0 net/can/j1939/main.c:362
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1996
call_netdevice_notifiers_extack net/core/dev.c:2008 [inline]
call_netdevice_notifiers net/core/dev.c:2022 [inline]
dev_close_many+0x2ff/0x620 net/core/dev.c:1597
dev_close net/core/dev.c:1619 [inline]
dev_close net/core/dev.c:1613 [inline]
__dev_change_net_namespace+0xd4a/0x1360 net/core/dev.c:11164
do_setlink+0x275/0x3970 net/core/rtnetlink.c:2624
__rtnl_newlink+0xde6/0x1750 net/core/rtnetlink.c:3391
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2406
___sys_sendmsg+0xf3/0x170 net/socket.c:2460
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2489
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
&jsk->sk_session_queue_lock --> &priv->active_session_list_lock --> &priv->j1939_socks_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&priv->j1939_socks_lock);
                               lock(&priv->active_session_list_lock);
                               lock(&priv->j1939_socks_lock);
  lock(&jsk->sk_session_queue_lock);

*** DEADLOCK ***

2 locks held by syz-executor.2/24182:
#0: ffffffff8d0cd7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:72 [inline]
#0: ffffffff8d0cd7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3be/0xb80 net/core/rtnetlink.c:5569
#1: ffff88807b54d0d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
#1: ffff88807b54d0d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

stack backtrace:
CPU: 1 PID: 24182 Comm: syz-executor.2 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2131
check_prev_add kernel/locking/lockdep.c:3051 [inline]
check_prevs_add kernel/locking/lockdep.c:3174 [inline]
validate_chain kernel/locking/lockdep.c:3789 [inline]
__lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015
lock_acquire kernel/locking/lockdep.c:5625 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:359 [inline]
j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0x7b/0x160 net/can/j1939/socket.c:1272
j1939_netdev_notify+0x199/0x1d0 net/can/j1939/main.c:362
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1996
call_netdevice_notifiers_extack net/core/dev.c:2008 [inline]
call_netdevice_notifiers net/core/dev.c:2022 [inline]
dev_close_many+0x2ff/0x620 net/core/dev.c:1597
dev_close net/core/dev.c:1619 [inline]
dev_close net/core/dev.c:1613 [inline]
__dev_change_net_namespace+0xd4a/0x1360 net/core/dev.c:11164
do_setlink+0x275/0x3970 net/core/rtnetlink.c:2624
__rtnl_newlink+0xde6/0x1750 net/core/rtnetlink.c:3391
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2406
___sys_sendmsg+0xf3/0x170 net/socket.c:2460
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2489
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7926a42188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665f9
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffcf598798f R14: 00007f7926a42300 R15: 0000000000022000
device vcan0 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
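
The dependency chain in the report above can be read as a small directed graph: an edge A -> B means some code path took B while already holding A. The following is a minimal userspace sketch of that reading, not lockdep's actual algorithm. The enum values, struct lock_graph, closes_cycle() and report_graph() are hypothetical names chosen for illustration; only the lock names come from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the report. */
enum lock_class {
	SK_SESSION_QUEUE,
	ACTIVE_SESSION_LIST,
	J1939_SOCKS,
	NR_LOCKS
};

/* edge[a][b] is true when some path took b while already holding a. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Taking `want` while holding `held` adds the edge held -> want.
 * That edge closes a cycle if `want` can already reach `held`.
 */
static bool closes_cycle(const struct lock_graph *g, int held, int want)
{
	bool seen[NR_LOCKS] = { false };

	return reaches(g, want, held, seen);
}

/* The two dependencies listed as chain entries #1 and #2 in the report. */
static struct lock_graph report_graph(void)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	g.edge[SK_SESSION_QUEUE][ACTIVE_SESSION_LIST] = true;	/* entry #1 */
	g.edge[ACTIVE_SESSION_LIST][J1939_SOCKS] = true;	/* entry #2 */
	return g;
}
```

With this graph, closes_cycle(&g, J1939_SOCKS, SK_SESSION_QUEUE) is true, which corresponds to entry #0: j1939_sk_queue_drop_all() takes sk_session_queue_lock while j1939_socks_lock is held.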


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

Jun 3, 2022, 3:14:22 PM
to da...@davemloft.net, edum...@google.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: 50fd82b3a9a9 Merge tag 'docs-5.19-2' of git://git.lwn.net/..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=102da9cdf00000
kernel config: https://syzkaller.appspot.com/x/.config?x=fc5a30a131480a80
dashboard link: https://syzkaller.appspot.com/bug?extid=3bd970a1887812621b4c
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=146bed83f00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1365ecd3f00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3bd970...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
5.18.0-syzkaller-12234-g50fd82b3a9a9 #0 Not tainted
------------------------------------------------------
syz-executor143/3611 is trying to acquire lock:
ffff888026e4d5c8 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
ffff888026e4d5c8 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139

but task is already holding lock:
ffff888073ce10d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
ffff888073ce10d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&priv->j1939_socks_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_sk_errqueue+0x9f/0x1a0 net/can/j1939/socket.c:1078
__j1939_session_cancel+0x3b9/0x460 net/can/j1939/transport.c:1124
j1939_tp_rxtimer.cold+0x1f6/0x24f net/can/j1939/transport.c:1249
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1749
hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1766
__do_softirq+0x29b/0x9c2 kernel/softirq.c:571
invoke_softirq kernel/softirq.c:445 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1106
asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:649
native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
acpi_idle_do_entry+0x1c6/0x250 drivers/acpi/processor_idle.c:554
acpi_idle_enter+0x369/0x510 drivers/acpi/processor_idle.c:691
cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
call_cpuidle kernel/sched/idle.c:155 [inline]
cpuidle_idle_call kernel/sched/idle.c:236 [inline]
do_idle+0x3e8/0x590 kernel/sched/idle.c:303
cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:400
start_secondary+0x21d/0x2b0 arch/x86/kernel/smpboot.c:266
secondary_startup_64_no_verify+0xce/0xdb

-> #1 (&priv->active_session_list_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
j1939_session_activate+0x43/0x4b0 net/can/j1939/transport.c:1553
j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline]
j1939_sk_queue_activate_next+0x29b/0x460 net/can/j1939/socket.c:205
j1939_session_deactivate_activate_next net/can/j1939/transport.c:1101 [inline]
j1939_session_completed+0x19a/0x1f0 net/can/j1939/transport.c:1214
j1939_xtp_rx_eoma_one net/can/j1939/transport.c:1384 [inline]
j1939_xtp_rx_eoma+0x2a6/0x5f0 net/can/j1939/transport.c:1399
j1939_tp_cmd_recv net/can/j1939/transport.c:2088 [inline]
j1939_tp_recv+0x930/0xcb0 net/can/j1939/transport.c:2133
j1939_can_recv+0x6ff/0x9a0 net/can/j1939/main.c:108
deliver net/can/af_can.c:574 [inline]
can_rcv_filter+0x5d4/0x8d0 net/can/af_can.c:608
can_receive+0x31d/0x580 net/can/af_can.c:665
can_rcv+0x120/0x1c0 net/can/af_can.c:696
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5478
__netif_receive_skb+0x24/0x1b0 net/core/dev.c:5592
process_backlog+0x3a0/0x7c0 net/core/dev.c:5920
__napi_poll+0xb3/0x6e0 net/core/dev.c:6486
napi_poll net/core/dev.c:6553 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6664
__do_softirq+0x29b/0x9c2 kernel/softirq.c:571
run_ksoftirqd kernel/softirq.c:934 [inline]
run_ksoftirqd+0x2d/0x60 kernel/softirq.c:926
smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302

-> #0 (&jsk->sk_session_queue_lock){+.-.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0x7b/0x160 net/can/j1939/socket.c:1272
j1939_netdev_notify+0x199/0x1d0 net/can/j1939/main.c:372
notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1943
call_netdevice_notifiers_extack net/core/dev.c:1981 [inline]
call_netdevice_notifiers net/core/dev.c:1995 [inline]
__dev_notify_flags+0x1da/0x2b0 net/core/dev.c:8571
dev_change_flags+0x112/0x170 net/core/dev.c:8607
do_setlink+0x961/0x3bb0 net/core/rtnetlink.c:2780
__rtnl_newlink+0xd6a/0x17e0 net/core/rtnetlink.c:3546
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x43a/0xc90 net/core/rtnetlink.c:6089
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
___sys_sendmsg+0xf3/0x170 net/socket.c:2546
__sys_sendmsg net/socket.c:2575 [inline]
__do_sys_sendmsg net/socket.c:2584 [inline]
__se_sys_sendmsg net/socket.c:2582 [inline]
__x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

other info that might help us debug this:

Chain exists of:
&jsk->sk_session_queue_lock --> &priv->active_session_list_lock --> &priv->j1939_socks_lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&priv->j1939_socks_lock);
                               lock(&priv->active_session_list_lock);
                               lock(&priv->j1939_socks_lock);
  lock(&jsk->sk_session_queue_lock);

*** DEADLOCK ***

2 locks held by syz-executor143/3611:
#0: ffffffff8d5937e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:74 [inline]
#0: ffffffff8d5937e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3e5/0xc90 net/core/rtnetlink.c:6086
#1: ffff888073ce10d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
#1: ffff888073ce10d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

stack backtrace:
CPU: 1 PID: 3611 Comm: syz-executor143 Not tainted 5.18.0-syzkaller-12234-g50fd82b3a9a9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x2abe/0x5660 kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0x7b/0x160 net/can/j1939/socket.c:1272
j1939_netdev_notify+0x199/0x1d0 net/can/j1939/main.c:372
notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1943
call_netdevice_notifiers_extack net/core/dev.c:1981 [inline]
call_netdevice_notifiers net/core/dev.c:1995 [inline]
__dev_notify_flags+0x1da/0x2b0 net/core/dev.c:8571
dev_change_flags+0x112/0x170 net/core/dev.c:8607
do_setlink+0x961/0x3bb0 net/core/rtnetlink.c:2780
__rtnl_newlink+0xd6a/0x17e0 net/core/rtnetlink.c:3546
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x43a/0xc90 net/core/rtnetlink.c:6089
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
___sys_sendmsg+0xf3/0x170 net/socket.c:2546
__sys_sendmsg net/socket.c:2575 [inline]
__do_sys_sendmsg net/socket.c:2584 [inline]
__se_sys_sendmsg net/socket.c:2582 [inline]
__x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fe42bbf0e89
Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd26802168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffd26802178 RCX: 00007fe42bbf0e89
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 0000000000000003 R08: bb1414ac00000000 R09: bb1414ac00000000
R10: bb1414ac00000000 R11: 0000000000000246 R12: 00007ffd26802180
R13: 00007ffd26802174 R14: 0000000000000003 R15: 0000000000000000
</TASK>
A link change request failed with some changes committed already. Interface vxcan0 may have been left with an inconsistent configuration, please check.

Hillf Danton

Jun 4, 2022, 1:35:18 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, 03 Jun 2022 12:14:21 -0700
To fix the deadlock, split __j1939_session_cancel() into two parts and move
the second part, which sends notifications to all sockets subscribed to
this session, to the call site.
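
The general pattern behind this split can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: pthread mutexes stand in for the spinlocks, and struct session, notify_sockets() and cancel_session() are hypothetical names. The state change happens under the session list lock, the decision is recorded in a local flag, and the notification (which needs the sockets lock) runs only after the first lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two locks involved. */
static pthread_mutex_t session_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t socks_lock = PTHREAD_MUTEX_INITIALIZER;

struct session {
	bool active;
	int notified;	/* counts notifications delivered */
};

/* Part two: walks the sockets, so it needs socks_lock. */
static void notify_sockets(struct session *s)
{
	pthread_mutex_lock(&socks_lock);
	s->notified++;
	pthread_mutex_unlock(&socks_lock);
}

/* Part one changes state under session_list_lock and records the outcome. */
static bool cancel_session(struct session *s)
{
	bool canceled = false;

	pthread_mutex_lock(&session_list_lock);
	if (s->active) {
		s->active = false;
		canceled = true;
	}
	pthread_mutex_unlock(&session_list_lock);

	/* socks_lock is taken only after session_list_lock is released. */
	if (canceled)
		notify_sockets(s);

	return canceled;
}
```

In this sketch the two locks are never nested, so no session_list_lock -> socks_lock dependency is created on this path.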

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 50fd82b3a9a9

--- y/net/can/j1939/transport.c
+++ x/net/can/j1939/transport.c
@@ -1117,25 +1117,30 @@ static void __j1939_session_cancel(struc
 				   !session->transmission,
 				   err, session->skcb.addr.pgn);
 	}
-
-	if (session->sk)
-		j1939_sk_send_loop_abort(session->sk, session->err);
-	else
-		j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_ABORT);
 }

 static void j1939_session_cancel(struct j1939_session *session,
 				 enum j1939_xtp_abort err)
 {
+	int canceled = 0;
+
 	j1939_session_list_lock(session->priv);

 	if (session->state >= J1939_SESSION_ACTIVE &&
 	    session->state < J1939_SESSION_WAITING_ABORT) {
 		j1939_tp_set_rxtimeout(session, J1939_XTP_ABORT_TIMEOUT_MS);
 		__j1939_session_cancel(session, err);
+		canceled = 1;
 	}

 	j1939_session_list_unlock(session->priv);
+
+	if (canceled) {
+		if (session->sk)
+			j1939_sk_send_loop_abort(session->sk, session->err);
+		else
+			j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_ABORT);
+	}
 }

 static enum hrtimer_restart j1939_tp_txtimer(struct hrtimer *hrtimer)
@@ -1237,6 +1242,7 @@ static enum hrtimer_restart j1939_tp_rxt
 		session->err = -ETIME;
 		j1939_session_deactivate(session);
 	} else {
+		int canceled = 0;
 		j1939_session_list_lock(session->priv);
 		if (session->state >= J1939_SESSION_ACTIVE &&
 		    session->state < J1939_SESSION_ACTIVE_MAX) {
@@ -1247,8 +1253,16 @@ static enum hrtimer_restart j1939_tp_rxt
 				      ms_to_ktime(J1939_XTP_ABORT_TIMEOUT_MS),
 				      HRTIMER_MODE_REL_SOFT);
 			__j1939_session_cancel(session, J1939_XTP_ABORT_TIMEOUT);
+			canceled = 1;
 		}
 		j1939_session_list_unlock(session->priv);
+
+		if (canceled) {
+			if (session->sk)
+				j1939_sk_send_loop_abort(session->sk, session->err);
+			else
+				j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_ABORT);
+		}
 	}

 	j1939_session_put(session);
--

syzbot

Jun 4, 2022, 1:51:17 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in j1939_sk_queue_drop_all

======================================================
WARNING: possible circular locking dependency detected
5.18.0-syzkaller-12234-g50fd82b3a9a9-dirty #0 Not tainted
------------------------------------------------------
syz-executor.0/4079 is trying to acquire lock:
ffff88807462f5c8 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
ffff88807462f5c8 (&jsk->sk_session_queue_lock){+.-.}-{2:2}, at: j1939_sk_queue_drop_all+0x40/0x2f0 net/can/j1939/socket.c:139

but task is already holding lock:
ffff888022b210d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
ffff888022b210d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&priv->j1939_socks_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_sk_errqueue+0x9f/0x1a0 net/can/j1939/socket.c:1078
j1939_session_destroy+0x301/0x3d0 net/can/j1939/transport.c:269
__j1939_session_release net/can/j1939/transport.c:288 [inline]
kref_put include/linux/kref.h:65 [inline]
j1939_session_put net/can/j1939/transport.c:293 [inline]
j1939_session_deactivate_locked net/can/j1939/transport.c:1075 [inline]
j1939_session_deactivate_locked+0x293/0x340 net/can/j1939/transport.c:1063
j1939_cancel_active_session+0x184/0x370 net/can/j1939/transport.c:2197
j1939_netdev_notify+0x191/0x1d0 net/can/j1939/main.c:371

-> #1 (&priv->active_session_list_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:354 [inline]
j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
j1939_session_activate+0x43/0x4b0 net/can/j1939/transport.c:1567
j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline]
j1939_sk_queue_activate_next+0x29b/0x460 net/can/j1939/socket.c:205
j1939_session_deactivate_activate_next net/can/j1939/transport.c:1101 [inline]
j1939_session_completed+0x19a/0x1f0 net/can/j1939/transport.c:1219
j1939_xtp_rx_eoma_one net/can/j1939/transport.c:1398 [inline]
j1939_xtp_rx_eoma+0x2a6/0x5f0 net/can/j1939/transport.c:1413
j1939_tp_cmd_recv net/can/j1939/transport.c:2102 [inline]
j1939_tp_recv+0x930/0xcb0 net/can/j1939/transport.c:2147
&jsk->sk_session_queue_lock);

*** DEADLOCK ***

2 locks held by syz-executor.0/4079:
#0: ffffffff8d5937e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:74 [inline]
#0: ffffffff8d5937e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3e5/0xc90 net/core/rtnetlink.c:6086
#1: ffff888022b210d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:354 [inline]
#1: ffff888022b210d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_netdev_event_netdown+0x28/0x160 net/can/j1939/socket.c:1266

stack backtrace:
CPU: 0 PID: 4079 Comm: syz-executor.0 Not tainted 5.18.0-syzkaller-12234-g50fd82b3a9a9-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2175
RIP: 0033:0x7f106a489109
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f106b644168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f106a59c030 RCX: 00007f106a489109
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f106a4e308d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffcb00ebfaf R14: 00007f106b644300 R15: 0000000000022000
</TASK>
A link change request failed with some changes committed already. Interface vxcan0 may have been left with an inconsistent configuration, please check.


Tested on:

commit: 50fd82b3 Merge tag 'docs-5.19-2' of git://git.lwn.net/..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=14f3f82bf00000
kernel config: https://syzkaller.appspot.com/x/.config?x=fc5a30a131480a80
dashboard link: https://syzkaller.appspot.com/bug?extid=3bd970a1887812621b4c
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15ee5177f00000

Hillf Danton

Jun 4, 2022, 4:25:35 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, 03 Jun 2022 12:14:21 -0700
v1: To fix the deadlock, split __j1939_session_cancel() into two parts and move
the second part, which sends notifications to all sockets subscribed to
this session, to the call site.

v2: Move session destruction to a workqueue to break the lock chain.
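
The idea of deferring destruction can be modeled in a few lines of userspace C. This is a single-threaded sketch under assumptions, not the kernel workqueue API: a plain pending list plays the role of the workqueue, run_pending() plays the role of the worker, and list_lock_held is a flag that models the session list lock. All names here are hypothetical. Dropping the last reference only queues the session, so the teardown (which in the report ends up taking j1939_socks_lock) never runs in the context that holds the list lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct session {
	int refcount;
	struct session *next_pending;	/* link in the deferred-destroy list */
};

static bool list_lock_held;		/* models the session list lock */
static struct session *pending;		/* sessions waiting to be destroyed */
static int destroyed;			/* number of completed teardowns */

/* The teardown itself: must never run while list_lock_held is set. */
static void session_destroy(struct session *s)
{
	assert(!list_lock_held);
	free(s);
	destroyed++;
}

/* Dropping the last reference only queues the session for later. */
static void session_put(struct session *s)
{
	if (--s->refcount == 0) {
		s->next_pending = pending;
		pending = s;
	}
}

/* Plays the role of the worker: runs later, with no lock held. */
static void run_pending(void)
{
	while (pending) {
		struct session *s = pending;

		pending = s->next_pending;
		session_destroy(s);
	}
}

/* Allocate a session holding one reference; returns NULL on failure. */
static struct session *session_new(void)
{
	struct session *s = calloc(1, sizeof(*s));

	if (s)
		s->refcount = 1;
	return s;
}
```

A caller can set list_lock_held, call session_put() on the last reference, clear the flag, and then call run_pending(); the assertion in session_destroy() checks that the teardown was indeed deferred.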

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 50fd82b3a9a9

--- y/net/can/j1939/transport.c
+++ x/net/can/j1939/transport.c
@@ -280,12 +280,20 @@ static void j1939_session_destroy(struct
 	kfree(session);
 }

+static void j1939_session_destroy_workfn(struct work_struct *work)
+{
+	struct j1939_session *session = container_of(work, struct j1939_session,
+						     destroy_work);
+	j1939_session_destroy(session);
+}
+
 static void __j1939_session_release(struct kref *kref)
 {
 	struct j1939_session *session = container_of(kref, struct j1939_session,
 						     kref);

-	j1939_session_destroy(session);
+	INIT_WORK(&session->destroy_work, j1939_session_destroy_workfn);
+	queue_work(system_unbound_wq, &session->destroy_work);
 }

 void j1939_session_put(struct j1939_session *session)
@@ -1117,25 +1125,30 @@ static void __j1939_session_cancel(struc
@@ -1237,6 +1250,7 @@ static enum hrtimer_restart j1939_tp_rxt
 		session->err = -ETIME;
 		j1939_session_deactivate(session);
 	} else {
+		int canceled = 0;
 		j1939_session_list_lock(session->priv);
 		if (session->state >= J1939_SESSION_ACTIVE &&
 		    session->state < J1939_SESSION_ACTIVE_MAX) {
@@ -1247,8 +1261,16 @@ static enum hrtimer_restart j1939_tp_rxt
 				      ms_to_ktime(J1939_XTP_ABORT_TIMEOUT_MS),
 				      HRTIMER_MODE_REL_SOFT);
 			__j1939_session_cancel(session, J1939_XTP_ABORT_TIMEOUT);
+			canceled = 1;
 		}
 		j1939_session_list_unlock(session->priv);
+
+		if (canceled) {
+			if (session->sk)
+				j1939_sk_send_loop_abort(session->sk, session->err);
+			else
+				j1939_sk_errqueue(session, J1939_ERRQUEUE_RX_ABORT);
+		}
 	}

 	j1939_session_put(session);
--- y/net/can/j1939/j1939-priv.h
+++ x/net/can/j1939/j1939-priv.h
@@ -286,6 +286,7 @@ struct j1939_session {
 		unsigned int dpo;
 	} pkt;
 	struct hrtimer txtimer, rxtimer;
+	struct work_struct destroy_work;
 };

 struct j1939_sock {
--

syzbot

Jun 4, 2022, 6:44:08 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3bd970...@syzkaller.appspotmail.com

Tested on:

commit: 50fd82b3 Merge tag 'docs-5.19-2' of git://git.lwn.net/..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel config: https://syzkaller.appspot.com/x/.config?x=fc5a30a131480a80
dashboard link: https://syzkaller.appspot.com/bug?extid=3bd970a1887812621b4c
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1785e693f00000

Note: testing is done by a robot and is best-effort only.

syzbot

Mar 13, 2024, 8:23:06 AM
to astr...@yahoo.com, da...@davemloft.net, edum...@google.com, hda...@sina.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com
syzbot suspects this issue was fixed by commit:

commit 6cdedc18ba7b9dacc36466e27e3267d201948c8d
Author: Ziqi Zhao <astr...@yahoo.com>
Date: Fri Jul 21 16:22:26 2023 +0000

can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10e4d371180000
start commit: dd72f9c7e512 Merge tag 'spi-fix-v6-6-rc4' of git://git.ker..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=12abf4cc4f802b24
dashboard link: https://syzkaller.appspot.com/bug?extid=3bd970a1887812621b4c
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17602089680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13398a9d680000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Oleksij Rempel

Mar 13, 2024, 9:59:38 AM
to syzbot, astr...@yahoo.com, da...@davemloft.net, edum...@google.com, hda...@sina.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com
#syz fix: can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |