[v5.15] possible deadlock in sch_direct_xmit


syzbot

May 11, 2023, 8:30:01 AM
to syzkaller...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 8a7f2a5c5aa1 Linux 5.15.110
git tree: linux-5.15.y
console output: https://syzkaller.appspot.com/x/log.txt?x=12105566280000
kernel config: https://syzkaller.appspot.com/x/.config?x=ba8d5c9d6c5289f
dashboard link: https://syzkaller.appspot.com/bug?extid=df49c4b28f24568bf8cc
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/fc04f54c047f/disk-8a7f2a5c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6b4ba4cb1191/vmlinux-8a7f2a5c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d927dc3f9670/bzImage-8a7f2a5c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+df49c4...@syzkaller.appspotmail.com

============================================
WARNING: possible recursive locking detected
5.15.110-syzkaller #0 Not tainted
--------------------------------------------
sed/10883 is trying to acquire lock:
ffff888147800218 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:363 [inline]
ffff888147800218 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4429 [inline]
ffff888147800218 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340

but task is already holding lock:
ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:363 [inline]
ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4429 [inline]
ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(_xmit_ETHER#2);
lock(_xmit_ETHER#2);

*** DEADLOCK ***

May be due to missing lock nesting notation

8 locks held by sed/10883:
#0: ffffc90000dd0be0 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: lockdep_copy_map include/linux/lockdep.h:45 [inline]
#0: ffffc90000dd0be0 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0xbe/0x560 kernel/time/timer.c:1411
#1: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269
#2: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269
#3: ffff8880243f0258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:373 [inline]
#3: ffff8880243f0258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:173 [inline]
#3: ffff8880243f0258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3806 [inline]
#3: ffff8880243f0258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x11f2/0x3230 net/core/dev.c:4188
#4: ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:363 [inline]
#4: ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4429 [inline]
#4: ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340
#5: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269
#6: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269
#7: ffff88801b1ab258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:373 [inline]
#7: ffff88801b1ab258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:173 [inline]
#7: ffff88801b1ab258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3806 [inline]
#7: ffff88801b1ab258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x11f2/0x3230 net/core/dev.c:4188

stack backtrace:
CPU: 1 PID: 10883 Comm: sed Not tainted 5.15.110-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2946 [inline]
check_deadlock kernel/locking/lockdep.c:2989 [inline]
validate_chain+0x46cf/0x58b0 kernel/locking/lockdep.c:3774
__lock_acquire+0x1295/0x1ff0 kernel/locking/lockdep.c:5011
lock_acquire+0x1db/0x4f0 kernel/locking/lockdep.c:5622
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:363 [inline]
__netif_tx_lock include/linux/netdevice.h:4429 [inline]
sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340
__dev_xmit_skb net/core/dev.c:3819 [inline]
__dev_queue_xmit+0x18ee/0x3230 net/core/dev.c:4188
neigh_output include/net/neighbour.h:516 [inline]
ip_finish_output2+0xdbe/0x1270 net/ipv4/ip_output.c:228
iptunnel_xmit+0x50c/0x960 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1c55/0x24a0 net/ipv4/ip_tunnel.c:810
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
erspan_xmit+0xa9c/0x1530 net/ipv4/ip_gre.c:715
__netdev_start_xmit include/linux/netdevice.h:5019 [inline]
netdev_start_xmit include/linux/netdevice.h:5033 [inline]
xmit_one net/core/dev.c:3592 [inline]
dev_hard_start_xmit+0x298/0x7a0 net/core/dev.c:3608
sch_direct_xmit+0x2b2/0x5e0 net/sched/sch_generic.c:342
__dev_xmit_skb net/core/dev.c:3819 [inline]
__dev_queue_xmit+0x18ee/0x3230 net/core/dev.c:4188
neigh_output include/net/neighbour.h:516 [inline]
ip_finish_output2+0xdbe/0x1270 net/ipv4/ip_output.c:228
igmpv3_send_cr net/ipv4/igmp.c:720 [inline]
igmp_ifc_timer_expire+0xaa3/0xf60 net/ipv4/igmp.c:810
call_timer_fn+0x16d/0x560 kernel/time/timer.c:1421
expire_timers kernel/time/timer.c:1466 [inline]
__run_timers+0x67c/0x890 kernel/time/timer.c:1737
run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1750
__do_softirq+0x3b3/0x93a kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x155/0x240 kernel/softirq.c:636
irq_exit_rcu+0x5/0x20 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194
Code: 9c 8f 44 24 20 42 80 3c 23 00 74 08 4c 89 f7 e8 02 b8 a5 f7 f6 44 24 21 02 75 4e 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> e7 92 33 f7 65 8b 05 f8 ee de 75 85 c0 74 3f 48 c7 04 24 0e 36
RSP: 0018:ffffc900052cfcc0 EFLAGS: 00000206
RAX: 110dd6943c429900 RBX: 1ffff92000a59f9c RCX: ffffffff913be003
RDX: dffffc0000000000 RSI: ffffffff8a8afb60 RDI: 0000000000000001
RBP: ffffc900052cfd50 R08: ffffffff81864ef0 R09: fffffbfff22afbc0
R10: 0000000000000000 R11: dffffc0000000001 R12: dffffc0000000000
R13: 1ffff92000a59f98 R14: ffffc900052cfce0 R15: 0000000000000246
debug_rcu_head_queue kernel/rcu/rcu.h:177 [inline]
__call_rcu kernel/rcu/tree.c:2977 [inline]
call_rcu+0xb1/0xa70 kernel/rcu/tree.c:3073
task_work_run+0x129/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop+0x106/0x130 kernel/entry/common.c:175
exit_to_user_mode_prepare+0xb1/0x140 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x5d/0x250 kernel/entry/common.c:301
do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f33b155ea46
Code: 10 00 00 00 44 8b 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 44 89 c2 4c 89 ce bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 0c f7 d8 89 05 0a 48 01 00 48 83 c8 ff c3 31
RSP: 002b:00007ffcd8563a08 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
RAX: fffffffffffffffe RBX: 00007ffcd8563c68 RCX: 00007f33b155ea46
RDX: 0000000000080000 RSI: 00007ffcd8563a80 RDI: 00000000ffffff9c
RBP: 00007ffcd8563a70 R08: 0000000000080000 R09: 00007ffcd8563a80
R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffcd8563a80
R13: 0000000000000009 R14: 00007ffcd8563c4f R15: 00000000ffffffff
</TASK>
----------------
Code disassembly (best guess):
0: 9c pushfq
1: 8f 44 24 20 popq 0x20(%rsp)
5: 42 80 3c 23 00 cmpb $0x0,(%rbx,%r12,1)
a: 74 08 je 0x14
c: 4c 89 f7 mov %r14,%rdi
f: e8 02 b8 a5 f7 callq 0xf7a5b816
14: f6 44 24 21 02 testb $0x2,0x21(%rsp)
19: 75 4e jne 0x69
1b: 41 f7 c7 00 02 00 00 test $0x200,%r15d
22: 74 01 je 0x25
24: fb sti
25: bf 01 00 00 00 mov $0x1,%edi
* 2a: e8 e7 92 33 f7 callq 0xf7339316 <-- trapping instruction
2f: 65 8b 05 f8 ee de 75 mov %gs:0x75deeef8(%rip),%eax # 0x75deef2e
36: 85 c0 test %eax,%eax
38: 74 3f je 0x79
3a: 48 rex.W
3b: c7 .byte 0xc7
3c: 04 24 add $0x24,%al
3e: 0e (bad)
3f: 36 ss
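
As a reading aid for the report above, here is a minimal sketch (not part of syzbot's output, and not how lockdep works internally) that groups lock lines from a lockdep splat by class name and lists every class that appears at more than one address. In this report it would show `_xmit_ETHER#2` at both ffff888147800218 and ffff888141789098, i.e. two different TX queue locks of the same lock class taken on one call path (erspan_xmit re-entering __dev_queue_xmit), which matches the "missing lock nesting notation" hint. The function name `same_class_instances`, the `SAMPLE` list, and the regular expression are illustrative assumptions based only on the line format visible in this log.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from this report only:
#   [#N: ]<16 hex digits> (<class name>){<usage>}-{<a>:<b>}, at: <location>
LOCK_LINE = re.compile(
    r"^(?:#\d+: )?([0-9a-f]{16}) \((.+)\)\{[^}]*\}-\{\d+:\d+\}, at:"
)

def same_class_instances(lines):
    """Return {class_name: [addresses]} for classes seen at 2+ distinct addresses."""
    seen = defaultdict(list)
    for line in lines:
        m = LOCK_LINE.match(line.strip())
        if m is None:
            continue  # not a lock line; ignore
        addr, cls = m.groups()
        if addr not in seen[cls]:
            seen[cls].append(addr)
    return {cls: addrs for cls, addrs in seen.items() if len(addrs) > 1}

# A few lines copied from the report: the lock being acquired,
# two RCU entries (same address), and the lock already held.
SAMPLE = [
    "ffff888147800218 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340",
    "#1: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269",
    "#2: ffffffff8c91c540 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire+0x9/0x30 include/linux/rcupdate.h:269",
    "#4: ffff888141789098 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c0/0x5e0 net/sched/sch_generic.c:340",
]

if __name__ == "__main__":
    print(same_class_instances(SAMPLE))
```

The RCU entries are not flagged because they share one address. Two distinct addresses under one class name suggest stacked devices whose locks lockdep cannot tell apart, rather than a single lock taken twice.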


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change the bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Aug 23, 2023, 5:09:35 AM
to syzkaller...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.