[syzbot] [net?] possible deadlock in rlb_choose_channel (2)


syzbot

May 13, 2026, 4:46:35 AM
to andrew...@lunn.ch, da...@davemloft.net, edum...@google.com, j...@jvosburgh.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: c21b90f77687 x86/CPU/AMD: Prevent improper isolation of sh..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10ec7dba580000
kernel config: https://syzkaller.appspot.com/x/.config?x=4caf64b1ee83dac0
dashboard link: https://syzkaller.appspot.com/bug?extid=1db58dbbccbf93c65c83
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/2f3edabe3b67/disk-c21b90f7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/539b63753e79/vmlinux-c21b90f7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/48e6e7cbc4ca/bzImage-c21b90f7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1db58d...@syzkaller.appspotmail.com

ip6_tunnel: ip6tnl1 xmit: Local address not yet configured!
ip6_tunnel: ip6tnl1 xmit: Local address not yet configured!
============================================
WARNING: possible recursive locking detected
syzkaller #0 Tainted: G L
--------------------------------------------
kworker/u8:3/47 is trying to acquire lock:
ffff88807a618e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
ffff88807a618e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_choose_channel+0x37/0x19a0 drivers/net/bonding/bond_alb.c:562

but task is already holding lock:
ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_update_rx_clients drivers/net/bonding/bond_alb.c:466 [inline]
ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: bond_alb_monitor+0xe8a/0x17e0 drivers/net/bonding/bond_alb.c:1618

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&bond->mode_lock);
  lock(&bond->mode_lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by kworker/u8:3/47:
#0: ffff8880516b7140 ((wq_completion)bond5#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3277 [inline]
#0: ffff8880516b7140 ((wq_completion)bond5#2){+.+.}-{0:0}, at: process_scheduled_works+0xa35/0x1860 kernel/workqueue.c:3385
#1: ffffc90000b77c40 ((work_completion)(&(&bond->alb_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3278 [inline]
#1: ffffc90000b77c40 ((work_completion)(&(&bond->alb_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0xa70/0x1860 kernel/workqueue.c:3385
#2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#2: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: bond_alb_monitor+0xf8/0x17e0 drivers/net/bonding/bond_alb.c:1546
#3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:348 [inline]
#3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: rlb_update_rx_clients drivers/net/bonding/bond_alb.c:466 [inline]
#3: ffff88807ffa0e98 (&bond->mode_lock){+.-.}-{3:3}, at: bond_alb_monitor+0xe8a/0x17e0 drivers/net/bonding/bond_alb.c:1618
#4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#4: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: arp_xmit+0x23/0x270 net/ipv4/arp.c:663
#5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline]
#5: ffffffff8e95cdc0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x2b6/0x3950 net/core/dev.c:4791
#6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#6: ffffffff8e95cd60 (rcu_read_lock){....}-{1:3}, at: bond_start_xmit+0xb4/0x1900 drivers/net/bonding/bond_main.c:5591

stack backtrace:
CPU: 0 UID: 0 PID: 47 Comm: kworker/u8:3 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: bond5 bond_alb_monitor
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_deadlock_bug+0x279/0x290 kernel/locking/lockdep.c:3041
check_deadlock kernel/locking/lockdep.c:3093 [inline]
validate_chain kernel/locking/lockdep.c:3895 [inline]
__lock_acquire+0x253f/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
__raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:158
spin_lock include/linux/spinlock.h:342 [inline]
rlb_choose_channel+0x37/0x19a0 drivers/net/bonding/bond_alb.c:562
rlb_arp_xmit drivers/net/bonding/bond_alb.c:680 [inline]
bond_xmit_alb_slave_get+0x1071/0x20a0 drivers/net/bonding/bond_alb.c:1493
bond_alb_xmit+0x24/0x40 drivers/net/bonding/bond_alb.c:1528
__bond_start_xmit drivers/net/bonding/bond_main.c:5569 [inline]
bond_start_xmit+0x6a2/0x1900 drivers/net/bonding/bond_main.c:5593
__netdev_start_xmit include/linux/netdevice.h:5368 [inline]
netdev_start_xmit include/linux/netdevice.h:5377 [inline]
xmit_one net/core/dev.c:3888 [inline]
dev_hard_start_xmit+0x2cd/0x830 net/core/dev.c:3904
__dev_queue_xmit+0x14d9/0x3950 net/core/dev.c:4870
NF_HOOK+0x33a/0x3c0 include/linux/netfilter.h:-1
arp_xmit+0x16c/0x270 net/ipv4/arp.c:665
rlb_update_client+0x2a8/0x6b0 drivers/net/bonding/bond_alb.c:455
rlb_update_rx_clients drivers/net/bonding/bond_alb.c:473 [inline]
bond_alb_monitor+0xf6a/0x17e0 drivers/net/bonding/bond_alb.c:1618
process_one_work kernel/workqueue.c:3302 [inline]
process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385
worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Jay Vosburgh

May 13, 2026, 10:41:34 AM
to syzbot, andrew...@lunn.ch, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Just looking at the stack, I suspect that this is either a false
positive, or the NF_HOOK action (a netfilter rule) is reinjecting the
ARP packet into the same bond that created it.

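If the packet is being reinjected into the same interface that
generated it in rlb_update_client, then I believe the report above would
be the expected behavior: rlb_choose_channel would try to take the same
bond->mode_lock that rlb_update_rx_clients already holds.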

On the other hand, if the network configuration has nested bonds,
then the rlb_arp_xmit -> rlb_choose_channel call path above would be
operating on a different instance of bond->mode_lock, and would not
actually deadlock.
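
To make the difference between the two cases concrete, here is a minimal
userspace sketch (not kernel code) of a validator that, like lockdep,
keys on the lock class rather than on the lock instance. Every name in
it (lock_class, model_lock, held_locks, check_acquire, push_held) is
hypothetical and exists only for illustration; the real lockdep logic is
considerably more involved. The sketch also separates "same instance"
from "same class, different instance", which the report itself does not
do. For what it's worth, the report prints ffff88807a618e98 for the lock
being acquired and ffff88807ffa0e98 for the lock already held, which
would fit the second case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct lock_class {
    const char *name;               /* e.g. "&bond->mode_lock" */
};

struct model_lock {
    const struct lock_class *cls;   /* shared by every lock of this type */
};

#define MAX_HELD 8

/* Locks currently held by one task, in acquisition order. */
struct held_locks {
    const struct model_lock *stack[MAX_HELD];
    size_t depth;
};

enum verdict {
    ACQUIRE_OK,           /* no held lock shares the class */
    WARN_SAME_CLASS,      /* class already held, but a different instance */
    REAL_SELF_DEADLOCK    /* this exact instance is already held */
};

/* Record that the task now holds 'lock'. */
static void push_held(struct held_locks *held, const struct model_lock *lock)
{
    assert(held->depth < MAX_HELD);
    held->stack[held->depth++] = lock;
}

/*
 * Classify an acquisition attempt. A purely class-based check would
 * complain in both non-OK cases; only REAL_SELF_DEADLOCK would actually
 * hang on a non-recursive spinlock.
 */
static enum verdict check_acquire(const struct held_locks *held,
                                  const struct model_lock *next)
{
    enum verdict v = ACQUIRE_OK;

    for (size_t i = 0; i < held->depth; i++) {
        if (held->stack[i] == next)
            return REAL_SELF_DEADLOCK;
        if (held->stack[i]->cls == next->cls)
            v = WARN_SAME_CLASS;
    }
    return v;
}
```

With one class shared by an outer and an inner bond, check_acquire
returns WARN_SAME_CLASS when the outer lock is held and the inner one is
requested, and REAL_SELF_DEADLOCK only when the outer lock is requested
again.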

-J

---
-Jay Vosburgh, j...@jvosburgh.net