[syzbot] [can?] possible deadlock in j1939_sk_errqueue (2)

16 views
Skip to first unread message

syzbot

unread,
Jul 4, 2023, 2:19:56 AM7/4/23
to da...@davemloft.net, edum...@google.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: ae230642190a Merge branch 'af_unix-followup-fixes-for-so_p..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1771bf67280000
kernel config: https://syzkaller.appspot.com/x/.config?x=c9bf1936936ca698
dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/8c060db03f09/disk-ae230642.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1b9b937ece91/vmlinux-ae230642.xz
kernel image: https://storage.googleapis.com/syzbot-assets/0c7eb1c82bf0/bzImage-ae230642.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+159146...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc7-syzkaller-01948-gae230642190a #0 Not tainted
------------------------------------------------------
syz-executor.2/11224 is trying to acquire lock:
ffff88803bee50d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
ffff88803bee50d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081

but task is already holding lock:
ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x41/0x360 net/can/j1939/transport.c:2183

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&priv->active_session_list_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
j1939_session_activate+0x47/0x4b0 net/can/j1939/transport.c:1564
j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline]
j1939_sk_queue_activate_next+0x2bf/0x4d0 net/can/j1939/socket.c:208
j1939_session_deactivate_activate_next net/can/j1939/transport.c:1108 [inline]
j1939_xtp_rx_abort_one+0x3c0/0x5b0 net/can/j1939/transport.c:1351
j1939_xtp_rx_abort net/can/j1939/transport.c:1362 [inline]
j1939_tp_cmd_recv net/can/j1939/transport.c:2111 [inline]
j1939_tp_recv+0xd98/0xf50 net/can/j1939/transport.c:2144
j1939_can_recv net/can/j1939/main.c:112 [inline]
j1939_can_recv+0x78e/0xa80 net/can/j1939/main.c:38
deliver net/can/af_can.c:572 [inline]
can_rcv_filter+0x5d4/0x8d0 net/can/af_can.c:606
can_receive+0x31d/0x5c0 net/can/af_can.c:663
can_rcv+0x1e1/0x280 net/can/af_can.c:687
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5452
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5566
process_backlog+0x101/0x670 net/core/dev.c:5894
__napi_poll+0xb7/0x6f0 net/core/dev.c:6460
napi_poll net/core/dev.c:6527 [inline]
net_rx_action+0x8a9/0xcb0 net/core/dev.c:6660
__do_softirq+0x1d4/0x905 kernel/softirq.c:571
run_ksoftirqd kernel/softirq.c:939 [inline]
run_ksoftirqd+0x31/0x60 kernel/softirq.c:931
smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164
kthread+0x344/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

-> #1 (&jsk->sk_session_queue_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_queue_drop_all+0x3b/0x2f0 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0x7f/0x160 net/can/j1939/socket.c:1280
j1939_netdev_notify+0x19f/0x1d0 net/can/j1939/main.c:381
notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93
call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962
call_netdevice_notifiers_extack net/core/dev.c:2000 [inline]
call_netdevice_notifiers net/core/dev.c:2014 [inline]
__dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571
dev_change_flags+0x11b/0x170 net/core/dev.c:8607
do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867
__rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655
rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702
rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x733/0x920 net/socket.c:2493
___sys_sendmsg+0x110/0x1b0 net/socket.c:2547
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2576
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (&priv->j1939_socks_lock){+.-.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3113 [inline]
check_prevs_add kernel/locking/lockdep.c:3232 [inline]
validate_chain kernel/locking/lockdep.c:3847 [inline]
__lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088
lock_acquire kernel/locking/lockdep.c:5705 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081
j1939_session_destroy+0x26c/0x4e0 net/can/j1939/transport.c:271
__j1939_session_release net/can/j1939/transport.c:294 [inline]
kref_put include/linux/kref.h:65 [inline]
j1939_session_put net/can/j1939/transport.c:299 [inline]
j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline]
j1939_session_deactivate_locked+0x293/0x340 net/can/j1939/transport.c:1074
j1939_cancel_active_session+0x183/0x360 net/can/j1939/transport.c:2194
j1939_netdev_notify+0x197/0x1d0 net/can/j1939/main.c:380
notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93
call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962
call_netdevice_notifiers_extack net/core/dev.c:2000 [inline]
call_netdevice_notifiers net/core/dev.c:2014 [inline]
__dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571
dev_change_flags+0x11b/0x170 net/core/dev.c:8607
do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867
__rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655
rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702
rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x733/0x920 net/socket.c:2493
___sys_sendmsg+0x110/0x1b0 net/socket.c:2547
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2576
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Chain exists of:
&priv->j1939_socks_lock --> &jsk->sk_session_queue_lock --> &priv->active_session_list_lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&priv->active_session_list_lock);
lock(&jsk->sk_session_queue_lock);
lock(&priv->active_session_list_lock);
lock(&priv->j1939_socks_lock);

*** DEADLOCK ***

2 locks held by syz-executor.2/11224:
#0: ffffffff8e1194a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:78 [inline]
#0: ffffffff8e1194a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3e8/0xd50 net/core/rtnetlink.c:6421
#1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
#1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
#1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x41/0x360 net/can/j1939/transport.c:2183

stack backtrace:
CPU: 1 PID: 11224 Comm: syz-executor.2 Not tainted 6.4.0-rc7-syzkaller-01948-gae230642190a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2188
check_prev_add kernel/locking/lockdep.c:3113 [inline]
check_prevs_add kernel/locking/lockdep.c:3232 [inline]
validate_chain kernel/locking/lockdep.c:3847 [inline]
__lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088
lock_acquire kernel/locking/lockdep.c:5705 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081
j1939_session_destroy+0x26c/0x4e0 net/can/j1939/transport.c:271
__j1939_session_release net/can/j1939/transport.c:294 [inline]
kref_put include/linux/kref.h:65 [inline]
j1939_session_put net/can/j1939/transport.c:299 [inline]
j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline]
j1939_session_deactivate_locked+0x293/0x340 net/can/j1939/transport.c:1074
j1939_cancel_active_session+0x183/0x360 net/can/j1939/transport.c:2194
j1939_netdev_notify+0x197/0x1d0 net/can/j1939/main.c:380
notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93
call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962
call_netdevice_notifiers_extack net/core/dev.c:2000 [inline]
call_netdevice_notifiers net/core/dev.c:2014 [inline]
__dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571
dev_change_flags+0x11b/0x170 net/core/dev.c:8607
do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867
__rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655
rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702
rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x733/0x920 net/socket.c:2493
___sys_sendmsg+0x110/0x1b0 net/socket.c:2547
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2576
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fdb2bc8c389
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdb2cab9168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fdb2bdabf80 RCX: 00007fdb2bc8c389
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000006
RBP: 00007fdb2bcd7493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd70d89bbf R14: 00007fdb2cab9300 R15: 0000000000022000
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Ziqi Zhao

unread,
Jul 4, 2023, 2:47:27 AM7/4/23
to syzbot+159146...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sock...@hartkopp.net, syzkall...@googlegroups.com, sk...@linuxfoundation.org, ivan.or...@gmail.com, Ziqi Zhao
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:

- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock

A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.

NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.

#syz test:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Signed-off-by: Ziqi Zhao <astr...@yahoo.com>
---
net/can/j1939/j1939-priv.h | 2 +-
net/can/j1939/main.c | 2 +-
net/can/j1939/socket.c | 25 +++++++++++++------------
3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h
index 16af1a7f80f6..74f15592d170 100644
--- a/net/can/j1939/j1939-priv.h
+++ b/net/can/j1939/j1939-priv.h
@@ -86,7 +86,7 @@ struct j1939_priv {
unsigned int tp_max_packet_size;

/* lock for j1939_socks list */
- spinlock_t j1939_socks_lock;
+ rwlock_t j1939_socks_lock;
struct list_head j1939_socks;

struct kref rx_kref;
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index ecff1c947d68..a6fb89fa6278 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
return ERR_PTR(-ENOMEM);

j1939_tp_init(priv);
- spin_lock_init(&priv->j1939_socks_lock);
+ rwlock_init(&priv->j1939_socks_lock);
INIT_LIST_HEAD(&priv->j1939_socks);

mutex_lock(&j1939_netdev_lock);
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index feaec4ad6d16..a8b981dc2065 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk)
jsk->state |= J1939_SOCK_BOUND;
j1939_priv_get(priv);

- spin_lock_bh(&priv->j1939_socks_lock);
+ write_lock_bh(&priv->j1939_socks_lock);
list_add_tail(&jsk->list, &priv->j1939_socks);
- spin_unlock_bh(&priv->j1939_socks_lock);
+ write_unlock_bh(&priv->j1939_socks_lock);
}

static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk)
{
- spin_lock_bh(&priv->j1939_socks_lock);
+ write_lock_bh(&priv->j1939_socks_lock);
list_del_init(&jsk->list);
- spin_unlock_bh(&priv->j1939_socks_lock);
+ write_unlock_bh(&priv->j1939_socks_lock);

j1939_priv_put(priv);
jsk->state &= ~J1939_SOCK_BOUND;
@@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb)
struct j1939_sock *jsk;
bool match = false;

- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
match = j1939_sk_recv_match_one(jsk, skcb);
if (match)
break;
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);

return match;
}
@@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb)
{
struct j1939_sock *jsk;

- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
j1939_sk_recv_one(jsk, skb);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
}

static void j1939_sk_sock_destruct(struct sock *sk)
@@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len)

priv = j1939_netdev_start(ndev);
dev_put(ndev);
+
if (IS_ERR(priv)) {
ret = PTR_ERR(priv);
goto out_release_sock;
@@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session,
}

/* spread RX notifications to all sockets subscribed to this session */
- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
if (j1939_sk_recv_match_one(jsk, &session->skcb))
__j1939_sk_errqueue(session, &jsk->sk, type);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
};

void j1939_sk_send_loop_abort(struct sock *sk, int err)
@@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)
struct j1939_sock *jsk;
int error_code = ENETDOWN;

- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
jsk->sk.sk_err = error_code;
if (!sock_flag(&jsk->sk, SOCK_DEAD))
@@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)

j1939_sk_queue_drop_all(priv, jsk, error_code);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
}

static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd,
--
2.34.1

syzbot

unread,
Jul 4, 2023, 2:47:28 AM7/4/23
to astr...@yahoo.com, astr...@yahoo.com, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com
> The following 3 locks would race against each other, causing the
> deadlock situation in the Syzbot bug report:
>
> - j1939_socks_lock
> - active_session_list_lock
> - sk_session_queue_lock
>
> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> the rare situations where a write lock is required for the linked list
> that j1939_socks_lock is protecting, the code does not attempt to
> acquire any more locks. This would break the circular lock dependency,
> where, for example, the current thread already locks j1939_socks_lock
> and attempts to acquire sk_session_queue_lock, and at the same time,
> another thread attempts to acquire j1939_socks_lock while holding
> sk_session_queue_lock.
>
> NOTE: This patch along does not fix the unregister_netdevice bug
> reported by Syzbot; instead, it solves a deadlock situation to prepare
> for one or more further patches to actually fix the Syzbot bug, which
> appears to be a reference counting problem within the j1939 codebase.
>
> #syz test:

This crash does not have a reproducer. I cannot test it.

Oleksij Rempel

unread,
Jul 4, 2023, 4:28:51 AM7/4/23
to syzbot, astr...@yahoo.com, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com
On Mon, Jul 03, 2023 at 11:47:26PM -0700, syzbot wrote:
> > The following 3 locks would race against each other, causing the
> > deadlock situation in the Syzbot bug report:
> >
> > - j1939_socks_lock
> > - active_session_list_lock
> > - sk_session_queue_lock
> >
> > A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> > the rare situations where a write lock is required for the linked list
> > that j1939_socks_lock is protecting, the code does not attempt to
> > acquire any more locks. This would break the circular lock dependency,
> > where, for example, the current thread already locks j1939_socks_lock
> > and attempts to acquire sk_session_queue_lock, and at the same time,
> > another thread attempts to acquire j1939_socks_lock while holding
> > sk_session_queue_lock.
> >
> > NOTE: This patch along does not fix the unregister_netdevice bug
> > reported by Syzbot; instead, it solves a deadlock situation to prepare
> > for one or more further patches to actually fix the Syzbot bug, which
> > appears to be a reference counting problem within the j1939 codebase.
> >
> > #syz test:
>
> This crash does not have a reproducer. I cannot test it.
>

To stress this code path, the socket should be configured with err queue
enabled. For example like this:

value = 1;
setsockopt(priv->sock, SOL_CAN_J1939, SO_J1939_ERRQUEUE, &value,
sizeof(value));

sock_opt = SOF_TIMESTAMPING_SOFTWARE |
SOF_TIMESTAMPING_OPT_CMSG |
SOF_TIMESTAMPING_TX_ACK |
SOF_TIMESTAMPING_TX_SCHED |
SOF_TIMESTAMPING_OPT_STATS | SOF_TIMESTAMPING_OPT_TSONLY |
SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_RX_SOFTWARE;

setsockopt(priv->sock, SOL_SOCKET, SO_TIMESTAMPING,
(char *) &sock_opt, sizeof(sock_opt));


I hope it will help to create the reproducer

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

Astra Joan

unread,
Jul 4, 2023, 1:56:06 PM7/4/23
to o.re...@pengutronix.de, Astra Joan, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Hi Oleksij,

Thank you for providing help with the bug fix! The patch was created
when I was working on another bug:

https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84

But the patch was not a direct fix of the problem reported in the
unregister_netdevice function call. Instead, it suppresses potential
deadlock information to guarantee the real bug would show up. Since I
have verified that the patch resolved a deadlock situation involving
the exact same locks, I'm pretty confident it would be a proper fix for
the current bug in this thread.

I'm not sure, though, about how I could instruct Syzbot to create a
reproducer to properly test this patch. Could you or anyone here help
me find the next step? Thank you so much!

Best regards,
Ziqi

Oleksij Rempel

unread,
Jul 5, 2023, 12:40:24 AM7/5/23
to Astra Joan, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Sorry, I'm not syzbot expert. I hope someone else can help here.

Dmitry Vyukov

unread,
Jul 5, 2023, 12:50:58 AM7/5/23
to Oleksij Rempel, Astra Joan, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, syzkaller
On Wed, 5 Jul 2023 at 06:40, Oleksij Rempel <o.re...@pengutronix.de> wrote:
>
> On Tue, Jul 04, 2023 at 10:55:47AM -0700, Astra Joan wrote:
> > Hi Oleksij,
> >
> > Thank you for providing help with the bug fix! The patch was created
> > when I was working on another bug:
> >
> > https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
> >
> > But the patch was not a direct fix of the problem reported in the
> > unregister_netdevice function call. Instead, it suppresses potential
> > deadlock information to guarantee the real bug would show up. Since I
> > have verified that the patch resolved a deadlock situation involving
> > the exact same locks, I'm pretty confident it would be a proper fix for
> > the current bug in this thread.
> >
> > I'm not sure, though, about how I could instruct Syzbot to create a
> > reproducer to properly test this patch. Could you or anyone here help
> > me find the next step? Thank you so much!
>
> Sorry, I'm not syzbot expert. I hope someone else can help here.

+syzkaller mailing list

Hi Astra,

You mean you have a reproducer and you want syzbot to run it with your patch?
No such feature exists at the moment.

Presumably you already run it locally, so I am not sure syzbot can add
much value here.

syzbot

unread,
Jul 10, 2023, 1:53:59 PM7/10/23
to astr...@yahoo.com, da...@davemloft.net, dvy...@google.com, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com, syzk...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: e40939bbfc68 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=17ce67d8a80000
kernel config: https://syzkaller.appspot.com/x/.config?x=c84f463eb74eab24
dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1580fc5ca80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=178f78d4a80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/257596b75aaf/disk-e40939bb.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9c75b8d61081/vmlinux-e40939bb.xz
kernel image: https://storage.googleapis.com/syzbot-assets/8f0233129f4f/Image-e40939bb.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+159146...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc7-syzkaller-ge40939bbfc68 #0 Not tainted
------------------------------------------------------
syz-executor375/6045 is trying to acquire lock:
ffff0000d2e690d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
ffff0000d2e690d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081

but task is already holding lock:
ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x54/0x414 net/can/j1939/transport.c:2183

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&priv->active_session_list_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
j1939_session_activate+0x60/0x378 net/can/j1939/transport.c:1564
j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline]
j1939_sk_queue_activate_next+0x230/0x3b4 net/can/j1939/socket.c:208
j1939_session_deactivate_activate_next net/can/j1939/transport.c:1108 [inline]
j1939_session_completed net/can/j1939/transport.c:1222 [inline]
j1939_xtp_rx_eoma_one net/can/j1939/transport.c:1395 [inline]
j1939_xtp_rx_eoma+0x2c0/0x4c0 net/can/j1939/transport.c:1410
j1939_tp_cmd_recv net/can/j1939/transport.c:2099 [inline]
j1939_tp_recv+0x714/0xe14 net/can/j1939/transport.c:2144
j1939_can_recv+0x5bc/0x930 net/can/j1939/main.c:112
deliver net/can/af_can.c:572 [inline]
can_rcv_filter+0x308/0x714 net/can/af_can.c:606
can_receive+0x338/0x498 net/can/af_can.c:663
can_rcv+0x128/0x23c net/can/af_can.c:687
__netif_receive_skb_one_core net/core/dev.c:5493 [inline]
__netif_receive_skb+0x18c/0x400 net/core/dev.c:5607
process_backlog+0x3c0/0x70c net/core/dev.c:5935
__napi_poll+0xb4/0x648 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x5e4/0xdc4 net/core/dev.c:6698
__do_softirq+0x2d0/0xd54 kernel/softirq.c:571
run_ksoftirqd+0x6c/0x158 kernel/softirq.c:939
smpboot_thread_fn+0x4b0/0x920 kernel/smpboot.c:164
kthread+0x288/0x310 kernel/kthread.c:379
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:853

-> #1 (&jsk->sk_session_queue_lock){+.-.}-{2:2}:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_queue_drop_all+0x4c/0x200 net/can/j1939/socket.c:139
j1939_sk_netdev_event_netdown+0xe0/0x144 net/can/j1939/socket.c:1280
j1939_netdev_notify+0xf0/0x144 net/can/j1939/main.c:381
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
__dev_notify_flags+0x2bc/0x544
dev_change_flags+0xd0/0x15c net/core/dev.c:8645
do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867
__rtnl_newlink net/core/rtnetlink.c:3648 [inline]
rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695
rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x568/0x81c net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

-> #0 (&priv->j1939_socks_lock){+.-.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3113 [inline]
check_prevs_add kernel/locking/lockdep.c:3232 [inline]
validate_chain kernel/locking/lockdep.c:3847 [inline]
__lock_acquire+0x3308/0x7604 kernel/locking/lockdep.c:5088
lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5705
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081
__j1939_session_release net/can/j1939/transport.c:294 [inline]
kref_put include/linux/kref.h:65 [inline]
j1939_session_put+0xf0/0x4b4 net/can/j1939/transport.c:299
j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline]
j1939_cancel_active_session+0x2ec/0x414 net/can/j1939/transport.c:2194
j1939_netdev_notify+0xe8/0x144 net/can/j1939/main.c:380
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
__dev_notify_flags+0x2bc/0x544
dev_change_flags+0xd0/0x15c net/core/dev.c:8645
do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867
__rtnl_newlink net/core/rtnetlink.c:3648 [inline]
rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695
rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x568/0x81c net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

other info that might help us debug this:

Chain exists of:
&priv->j1939_socks_lock --> &jsk->sk_session_queue_lock --> &priv->active_session_list_lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&priv->active_session_list_lock);
lock(&jsk->sk_session_queue_lock);
lock(&priv->active_session_list_lock);
lock(&priv->j1939_socks_lock);

*** DEADLOCK ***

2 locks held by syz-executor375/6045:
#0: ffff80009080db68 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:78 [inline]
#0: ffff80009080db68 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x700/0xdb8 net/core/rtnetlink.c:6414
#1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline]
#1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline]
#1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x54/0x414 net/can/j1939/transport.c:2183

stack backtrace:
CPU: 1 PID: 6045 Comm: syz-executor375 Not tainted 6.4.0-rc7-syzkaller-ge40939bbfc68 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
dump_stack+0x1c/0x28 lib/dump_stack.c:113
print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2066
check_noncircular+0x2cc/0x378 kernel/locking/lockdep.c:2188
check_prev_add kernel/locking/lockdep.c:3113 [inline]
check_prevs_add kernel/locking/lockdep.c:3232 [inline]
validate_chain kernel/locking/lockdep.c:3847 [inline]
__lock_acquire+0x3308/0x7604 kernel/locking/lockdep.c:5088
lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5705
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:355 [inline]
j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081
__j1939_session_release net/can/j1939/transport.c:294 [inline]
kref_put include/linux/kref.h:65 [inline]
j1939_session_put+0xf0/0x4b4 net/can/j1939/transport.c:299
j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline]
j1939_cancel_active_session+0x2ec/0x414 net/can/j1939/transport.c:2194
j1939_netdev_notify+0xe8/0x144 net/can/j1939/main.c:380
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
__dev_notify_flags+0x2bc/0x544
dev_change_flags+0xd0/0x15c net/core/dev.c:8645
do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867
__rtnl_newlink net/core/rtnetlink.c:3648 [inline]
rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695
rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x568/0x81c net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

syzbot

unread,
Jul 11, 2023, 9:20:16 AM7/11/23
to hda...@sina.com, hda...@sina.com, linux-...@vger.kernel.org, syzk...@googlegroups.com, syzkall...@googlegroups.com
> On Mon, 10 Jul 2023 10:53:57 -0700
>> HEAD commit: e40939bbfc68 Merge branch 'for-next/core' into for-kernelci
>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=178f78d4a80000
>
> Put session without the session list lock held.
>
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git e40939bbfc68

Your commands are accepted, but please keep syzkall...@googlegroups.com mailing list in CC next time. It serves as a history of what happened with each bug report. Thank you.

>
> --- x/net/can/j1939/transport.c
> +++ y/net/can/j1939/transport.c
> @@ -1083,7 +1083,6 @@ static bool j1939_session_deactivate_loc
>
> list_del_init(&session->active_session_list_entry);
> session->state = J1939_SESSION_DONE;
> - j1939_session_put(session);
> }
>
> return active;
> @@ -1098,6 +1097,9 @@ static bool j1939_session_deactivate(str
> active = j1939_session_deactivate_locked(session);
> j1939_session_list_unlock(priv);
>
> + if (active)
> + j1939_session_put(session);
> +
> return active;
> }
>
> @@ -2178,6 +2180,7 @@ void j1939_simple_recv(struct j1939_priv
> int j1939_cancel_active_session(struct j1939_priv *priv, struct sock *sk)
> {
> struct j1939_session *session, *saved;
> + LIST_HEAD(active);
>
> netdev_dbg(priv->ndev, "%s, sk: %p\n", __func__, sk);
> j1939_session_list_lock(priv);
> @@ -2191,10 +2194,16 @@ int j1939_cancel_active_session(struct j
> j1939_session_put(session);
>
> session->err = ESHUTDOWN;
> - j1939_session_deactivate_locked(session);
> + if (j1939_session_deactivate_locked(session))
> + list_move(&session->active_session_list_entry, &active);
> }
> }
> j1939_session_list_unlock(priv);
> +
> + list_for_each_entry_safe(session, saved, &active, active_session_list_entry) {
> + list_del_init(&session->active_session_list_entry);
> + j1939_session_put(session);
> + }
> return NOTIFY_DONE;
> }
>
> --

syzbot

unread,
Jul 11, 2023, 9:47:27 AM7/11/23
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com, syzk...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+159146...@syzkaller.appspotmail.com

Tested on:

commit: e40939bb Merge branch 'for-next/core' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12a24702a80000
kernel config: https://syzkaller.appspot.com/x/.config?x=c84f463eb74eab24
dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=1772e474a80000

Note: testing is done by a robot and is best-effort only.

Ziqi Zhao

unread,
Jul 11, 2023, 8:47:58 PM7/11/23
to syzbot+159146...@syzkaller.appspotmail.com, astr...@yahoo.com, da...@davemloft.net, dvy...@google.com, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com, syzk...@googlegroups.com
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:

- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock

A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.

NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.

#syz test:

syzbot

unread,
Jul 11, 2023, 9:16:27 PM7/11/23
to astr...@yahoo.com, da...@davemloft.net, dvy...@google.com, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com, syzk...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+159146...@syzkaller.appspotmail.com

Tested on:

commit: 3f01e9fe Merge tag 'linux-watchdog-6.5-rc2' of git://w..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=130a98a2a80000
kernel config: https://syzkaller.appspot.com/x/.config?x=4c2acb092ca90577
dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=1380a782a80000

Astra Joan

unread,
Jul 11, 2023, 9:20:01 PM7/11/23
to dvy...@google.com, Astra Joan, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, syzk...@googlegroups.com
Hi Dmitry,

Yes, I initially wanted to test my patch against this bug, but only
later realized there should be a reproducer in order to use syz test.

Also, I believe my local testing should suffice to show the patch's
ability to mitigate the deadlock bug. In particular, the attached error
log could be avoided if my patch was applied to the upstream. Could
anyone in the mailing list review the patch given this context?

Best regards,
Ziqi

decode_netdevice.txt

Astra Joan

unread,
Jul 11, 2023, 9:39:54 PM7/11/23
to o.re...@pengutronix.de, Dmitry Vyukov, Astra Joan, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Hi Oleksij and Dmitry,

Just found out the reproducer was added to this bug. I re-submitted the
Syzbot test request, and the result seems okay. Please take a look at
the patch whenever it’s convenient for you. Thanks again, and sorry for
the repeated emails -- I was trying to keep the correspondence
up-to-date.

Best regards,
Ziqi

Astra Joan

unread,
Jul 14, 2023, 1:59:00 AM7/14/23
to ste...@networkplumber.org, Astra Joan, da...@davemloft.net, Dmitry Vyukov, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzbot+159146...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, syzk...@googlegroups.com
Hi Stephen,

Thank you for your reply!

Changing the lock sequence may be difficult here since the function

- j1939_sk_errqueue

has may call sites. However, this function attempts to hold the

- j1939_socks_lock

whose hierchy is pretty high in the given code logic. On the other hand,
this function only reads from the

- j1939_socks_list

which the above lock protects against. Therefore, it seems appropriate to
lock the above list inside the function for read access only.

The RCU approach makes sense here, but I probably need to use RCU in
conjunction with other spinlocks and rwlocks in the codebase. Would that
be okay? If not, should I be looking into replacing all the locks with
RCU? I'm actually a mentee for the bug fixing mentorship this summer.
So please bear with me if some of these questions seem a bit naive :D
Thank you so much for your time!

Best regards,
Ziqi

Stephen Hemminger

unread,
Jul 14, 2023, 7:09:44 AM7/14/23
to Ziqi Zhao, syzbot+159146...@syzkaller.appspotmail.com, da...@davemloft.net, dvy...@google.com, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, net...@vger.kernel.org, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, syzkall...@googlegroups.com, syzk...@googlegroups.com
On Tue, 11 Jul 2023 17:47:50 -0700
Ziqi Zhao <astr...@yahoo.com> wrote:

> The following 3 locks would race against each other, causing the
> deadlock situation in the Syzbot bug report:
>
> - j1939_socks_lock
> - active_session_list_lock
> - sk_session_queue_lock
>
> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> the rare situations where a write lock is required for the linked list
> that j1939_socks_lock is protecting, the code does not attempt to
> acquire any more locks. This would break the circular lock dependency,
> where, for example, the current thread already locks j1939_socks_lock
> and attempts to acquire sk_session_queue_lock, and at the same time,
> another thread attempts to acquire j1939_socks_lock while holding
> sk_session_queue_lock.
>
> NOTE: This patch along does not fix the unregister_netdevice bug
> reported by Syzbot; instead, it solves a deadlock situation to prepare
> for one or more further patches to actually fix the Syzbot bug, which
> appears to be a reference counting problem within the j1939 codebase.
>
> #syz test:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> Signed-off-by: Ziqi Zhao <astr...@yahoo.com>
> ---

Reader-writer locks are not the best way to fix a lock hierarchy problem.
Instead either fix the lock ordering, or use RCU.

Other devices don't have this problem, so perhaps the unique locking
in this device is the problem.

Ziqi Zhao

unread,
Jul 21, 2023, 12:22:43 PM7/21/23
to astr...@yahoo.com, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, li...@rempel-privat.de, linu...@vger.kernel.org, m...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, ar...@arndb.de, bri...@lists.linux-foundation.org, linux-...@vger.kernel.org, mudongl...@gmail.com, net...@vger.kernel.org, nik...@nvidia.com, ro...@nvidia.com, syzbot+881d65...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, syzbot+159146...@syzkaller.appspotmail.com
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:

- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock

A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.

NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.

Reported-by: syzbot+159146...@syzkaller.appspotmail.com

Oleksij Rempel

unread,
Jul 23, 2023, 11:41:53 AM7/23/23
to Ziqi Zhao, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, li...@rempel-privat.de, linu...@vger.kernel.org, m...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, ar...@arndb.de, bri...@lists.linux-foundation.org, linux-...@vger.kernel.org, mudongl...@gmail.com, net...@vger.kernel.org, nik...@nvidia.com, ro...@nvidia.com, syzbot+881d65...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, syzbot+159146...@syzkaller.appspotmail.com
Hi,

Thank you for you patch. Right now I'm on vacation, I'll to take a look
on it as soon as possible. If i do not response for more then 3 weeks,
please ping me.

Oleksij Rempel

unread,
Aug 7, 2023, 12:46:57 AM8/7/23
to Ziqi Zhao, da...@davemloft.net, edum...@google.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, li...@rempel-privat.de, linu...@vger.kernel.org, m...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, sk...@linuxfoundation.org, sock...@hartkopp.net, ar...@arndb.de, net...@vger.kernel.org, bri...@lists.linux-foundation.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org, mudongl...@gmail.com, nik...@nvidia.com, syzbot+159146...@syzkaller.appspotmail.com, ro...@nvidia.com, syzbot+881d65...@syzkaller.appspotmail.com
On Fri, Jul 21, 2023 at 09:22:26AM -0700, Ziqi Zhao wrote:
> The following 3 locks would race against each other, causing the
> deadlock situation in the Syzbot bug report:
>
> - j1939_socks_lock
> - active_session_list_lock
> - sk_session_queue_lock
>
> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> the rare situations where a write lock is required for the linked list
> that j1939_socks_lock is protecting, the code does not attempt to
> acquire any more locks. This would break the circular lock dependency,
> where, for example, the current thread already locks j1939_socks_lock
> and attempts to acquire sk_session_queue_lock, and at the same time,
> another thread attempts to acquire j1939_socks_lock while holding
> sk_session_queue_lock.
>
> NOTE: This patch along does not fix the unregister_netdevice bug
> reported by Syzbot; instead, it solves a deadlock situation to prepare
> for one or more further patches to actually fix the Syzbot bug, which
> appears to be a reference counting problem within the j1939 codebase.
>
> Reported-by: syzbot+159146...@syzkaller.appspotmail.com
> Signed-off-by: Ziqi Zhao <astr...@yahoo.com>

Acked-by: Oleksij Rempel <o.re...@pengutronix.de>

Thank you!

syzbot

unread,
Nov 14, 2023, 10:54:07 PM11/14/23
to ar...@arndb.de, astr...@yahoo.com, bri...@lists.linux-foundation.org, da...@davemloft.net, dvy...@google.com, edum...@google.com, hda...@sina.com, ivan.or...@gmail.com, ker...@pengutronix.de, ku...@kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, li...@rempel-privat.de, m...@pengutronix.de, mudongl...@gmail.com, net...@vger.kernel.org, nik...@nvidia.com, o.re...@pengutronix.de, pab...@redhat.com, ro...@protonic.nl, ro...@nvidia.com, sk...@linuxfoundation.org, sock...@hartkopp.net, ste...@networkplumber.org, syzkall...@googlegroups.com, syzk...@googlegroups.com
syzbot has bisected this issue to:

commit 2030043e616cab40f510299f09b636285e0a3678
Author: Oleksij Rempel <o.re...@pengutronix.de>
Date: Fri May 21 11:57:20 2021 +0000

can: j1939: fix Use-after-Free, hold skb ref while in use

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1438c947680000
start commit: 1b907d050735 Merge tag '6.7-rc-smb3-client-fixes-part2' of..
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=1638c947680000
console output: https://syzkaller.appspot.com/x/log.txt?x=1238c947680000
kernel config: https://syzkaller.appspot.com/x/.config?x=88e7ba51eecd9cd6
dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17fea8fb680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1633dc70e80000

Reported-by: syzbot+159146...@syzkaller.appspotmail.com
Fixes: 2030043e616c ("can: j1939: fix Use-after-Free, hold skb ref while in use")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Oleksij Rempel

unread,
Nov 17, 2023, 3:10:23 AM11/17/23
to Ziqi Zhao, ivan.or...@gmail.com, edum...@google.com, syzbot+881d65...@syzkaller.appspotmail.com, sock...@hartkopp.net, bri...@lists.linux-foundation.org, nik...@nvidia.com, syzbot+159146...@syzkaller.appspotmail.com, ro...@nvidia.com, ku...@kernel.org, pab...@redhat.com, ar...@arndb.de, syzkall...@googlegroups.com, mudongl...@gmail.com, linu...@vger.kernel.org, m...@pengutronix.de, sk...@linuxfoundation.org, ro...@protonic.nl, linux-...@vger.kernel.org, li...@rempel-privat.de, ker...@pengutronix.de, net...@vger.kernel.org, da...@davemloft.net
On Mon, Aug 07, 2023 at 06:46:34AM +0200, Oleksij Rempel wrote:
> On Fri, Jul 21, 2023 at 09:22:26AM -0700, Ziqi Zhao wrote:
> > The following 3 locks would race against each other, causing the
> > deadlock situation in the Syzbot bug report:
> >
> > - j1939_socks_lock
> > - active_session_list_lock
> > - sk_session_queue_lock
> >
> > A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> > the rare situations where a write lock is required for the linked list
> > that j1939_socks_lock is protecting, the code does not attempt to
> > acquire any more locks. This would break the circular lock dependency,
> > where, for example, the current thread already locks j1939_socks_lock
> > and attempts to acquire sk_session_queue_lock, and at the same time,
> > another thread attempts to acquire j1939_socks_lock while holding
> > sk_session_queue_lock.
> >
> > NOTE: This patch along does not fix the unregister_netdevice bug
> > reported by Syzbot; instead, it solves a deadlock situation to prepare
> > for one or more further patches to actually fix the Syzbot bug, which
> > appears to be a reference counting problem within the j1939 codebase.
> >
> > Reported-by: syzbot+159146...@syzkaller.appspotmail.com
> > Signed-off-by: Ziqi Zhao <astr...@yahoo.com>

Reviewed-by: Oleksij Rempel <o.re...@pengutronix.de>
Reply all
Reply to author
Forward
0 new messages