INFO: task hung in vhost_net_stop_vq

23 views
Skip to first unread message

syzbot

unread,
Nov 30, 2018, 5:32:03 PM11/30/18
to jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org
Hello,

syzbot found the following crash on:

HEAD commit: ef78e5ec9214 ia64: export node_distance function
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175c2415400000
kernel config: https://syzkaller.appspot.com/x/.config?x=c94f9f0c0363db4b
dashboard link: https://syzkaller.appspot.com/bug?extid=d21e6e297322a900c128
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141db34d400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=108ef293400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d21e6e...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
INFO: task syz-executor251:6230 blocked for more than 140 seconds.
Not tainted 4.20.0-rc4+ #132
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor251 D21080 6230 6229 0x80000002
Call Trace:
context_switch kernel/sched/core.c:2831 [inline]
__schedule+0x8cf/0x21d0 kernel/sched/core.c:3472
schedule+0xfe/0x460 kernel/sched/core.c:3516
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3574
__mutex_lock_common kernel/locking/mutex.c:1002 [inline]
__mutex_lock+0xaff/0x16f0 kernel/locking/mutex.c:1072
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
vhost_net_stop_vq+0x2d/0x120 drivers/vhost/net.c:1306
vhost_net_stop drivers/vhost/net.c:1320 [inline]
vhost_net_release+0x5b/0x1d0 drivers/vhost/net.c:1352
__fput+0x385/0xa30 fs/file_table.c:278
____fput+0x15/0x20 fs/file_table.c:309
task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0xf86/0x26d0 kernel/exit.c:867
do_group_exit+0x177/0x440 kernel/exit.c:970
__do_sys_exit_group kernel/exit.c:981 [inline]
__se_sys_exit_group kernel/exit.c:979 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445338
Code: 72 6e 20 69 73 20 25 64 0a 00 72 65 67 65 78 3a 20 65 6e 64 20 73 65
61 72 63 68 2c 20 66 6f 75 6e 64 20 25 64 0a 00 23 25 33 <2e> 33 64 00 5f
00 5f 2e 00 31 30 30 00 31 30 31 00 31 30 32 00 31
RSP: 002b:00007ffd228e1cb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000445338
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 00000000004cd650 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00007ffd228e1d00 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006e1320 R14: 000000000000000a R15: 0000000000000000

Showing all locks held in the system:
1 lock held by khungtaskd/1019:
#0: 00000000da98dfa9 (rcu_read_lock){....}, at:
debug_show_all_locks+0xd0/0x424 kernel/locking/lockdep.c:4379
1 lock held by rsyslogd/6077:
#0: 00000000a7881970 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200
fs/file.c:766
2 locks held by getty/6199:
#0: 0000000051a5383e (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 00000000e292e313 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6200:
#0: 000000000155ff5f (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 000000001a79dc08 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6201:
#0: 0000000050da3a04 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 000000000068a427 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6202:
#0: 000000004bd50c6d (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 000000007dde2138 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6203:
#0: 0000000054dc4f51 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 0000000087e81f0c (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6204:
#0: 000000004f97fe78 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 000000009af2a5dd (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
2 locks held by getty/6205:
#0: 00000000cb15d6f6 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
#1: 00000000392a1db6 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154
1 lock held by syz-executor251/6230:
#0: 000000009c6ae67f (&vq->mutex){+.+.}, at: vhost_net_stop_vq+0x2d/0x120
drivers/vhost/net.c:1306
1 lock held by vhost-6470/6471:

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1019 Comm: khungtaskd Not tainted 4.20.0-rc4+ #132
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x244/0x39d lib/dump_stack.c:113
nmi_cpu_backtrace.cold.2+0x5c/0xa1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1e8/0x22a lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0xb51/0x1060 kernel/hung_task.c:289
kthread+0x35a/0x440 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 6471 Comm: vhost-6470 Not tainted 4.20.0-rc4+ #132
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:vhost_umem_interval_tree_iter_first drivers/vhost/vhost.c:53
[inline]
RIP: 0010:iotlb_access_ok+0x4a3/0x600 drivers/vhost/vhost.c:1207
Code: ac 8d fb 4d 39 ec 0f 82 a1 00 00 00 e8 96 ab 8d fb 48 8b 45 b8 4c 8b
60 08 49 8d 7c 24 28 48 89 f8 48 c1 e8 03 42 80 3c 38 00 <0f> 85 ef 00 00
00 4d 8b 74 24 28 4d 89 ec 48 8b 7d c8 4c 89 f6 e8
RSP: 0018:ffff8881c263f950 EFLAGS: 00000246
RAX: 1ffff1103af365b5 RBX: ffff8881d79b2d80 RCX: ffffffff85f1df5c
RDX: 0000000000000000 RSI: ffffffff85f1df6a RDI: ffff8881d79b2da8
RBP: ffff8881c263f9c0 R08: ffff8881c024c500 R09: ffffed1034e2406d
R10: ffff8881c263f9f8 R11: ffff8881a712036f R12: ffff8881d79b2d80
R13: 0000000000000000 R14: 0000000000000010 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8881dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001d753a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
vq_iotlb_prefetch+0x10e/0x230 drivers/vhost/vhost.c:1240
handle_rx+0x292/0x1df0 drivers/vhost/net.c:1093
handle_rx_net+0x19/0x20 drivers/vhost/net.c:1234
vhost_worker+0x2ac/0x4c0 drivers/vhost/vhost.c:365
kthread+0x35a/0x440 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

unread,
Dec 9, 2018, 10:45:04 PM12/9/18
to Jason Wang, jaso...@redhat.com, syzkall...@googlegroups.com
> #syz test

unknown command "test"

syzbot

unread,
Mar 20, 2019, 1:21:01 AM3/20/19
to jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org, weiy...@gmail.com
syzbot has bisected this bug to:

commit 6b1e6cc7855b09a0a9bfa1d9f30172ba366f161c
Author: Jason Wang <jaso...@redhat.com>
Date: Thu Jun 23 06:04:32 2016 +0000

vhost: new device IOTLB API

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1486ad27200000
start commit: 6b1e6cc7 vhost: new device IOTLB API
git tree: upstream
final crash: https://syzkaller.appspot.com/x/report.txt?x=1686ad27200000
console output: https://syzkaller.appspot.com/x/log.txt?x=1286ad27200000
Reported-by: syzbot+d21e6e...@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")

Michael S. Tsirkin

unread,
Mar 25, 2019, 10:02:52 AM3/25/19
to syzbot, jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org, weiy...@gmail.com
Looks like more iotlb locking mess?

Jason Wang

unread,
Mar 26, 2019, 6:17:34 AM3/26/19
to Michael S. Tsirkin, syzbot, k...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org, weiy...@gmail.com

On 2019/3/25 下午10:02, Michael S. Tsirkin wrote:
> Looks like more iotlb locking mess?


Looking at the calltrace:

[ 221.743675] =============================================
[ 221.744297] [ INFO: possible recursive locking detected ]
[ 221.744944] 4.7.0+ #1 Not tainted
[ 221.745326] ---------------------------------------------
[ 221.746128] syz-executor1/6823 is trying to acquire lock:
[ 221.746737] (&vq->mutex){+.+...}, at: [<ffffffff84484b70>] vhost_process_iotlb_msg+0xe0/0x9e0
[ 221.747789]
[ 221.747789] but task is already holding lock:
[ 221.748470] (&vq->mutex){+.+...}, at: [<ffffffff84484b70>] vhost_process_iotlb_msg+0xe0/0x9e0
[ 221.749535]
[ 221.749535] other info that might help us debug this:
[ 221.750280] Possible unsafe locking scenario:
[ 221.750280]
[ 221.750946] CPU0
[ 221.751232] ----
[ 221.751523] lock(&vq->mutex);
[ 221.751922] lock(&vq->mutex);
[ 221.752339]
[ 221.752339] *** DEADLOCK ***
[ 221.752339]

I could not think of a path that can hit this. And I could not reproduce with the reproducer in the link in net-next.

Thanks

Dmitry Vyukov

unread,
Mar 26, 2019, 6:28:30 AM3/26/19
to Jason Wang, Michael S. Tsirkin, syzbot, KVM list, LKML, netdev, syzkaller-bugs, virtual...@lists.linux-foundation.org, weiy...@gmail.com
Looking at the bisection log, syzbot is able to reproduce this
super-reliably on multiple kernel revisions. Are you sure you are
using the right config/revision? What else can be in play? syzbot uses
VMs. The image is available.


> Thanks
>
>
> >
> > On Tue, Mar 19, 2019 at 10:21:00PM -0700, syzbot wrote:
> >> syzbot has bisected this bug to:
> >>
> >> commit 6b1e6cc7855b09a0a9bfa1d9f30172ba366f161c
> >> Author: Jason Wang <jaso...@redhat.com>
> >> Date: Thu Jun 23 06:04:32 2016 +0000
> >>
> >> vhost: new device IOTLB API
> >>
> >> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1486ad27200000
> >> start commit: 6b1e6cc7 vhost: new device IOTLB API
> >> git tree: upstream
> >> final crash: https://syzkaller.appspot.com/x/report.txt?x=1686ad27200000
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1286ad27200000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=c94f9f0c0363db4b
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=d21e6e297322a900c128
> >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=141db34d400000
> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=108ef293400000
> >>
> >> Reported-by: syzbot+d21e6e...@syzkaller.appspotmail.com
> >> Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/df4f2cf6-8469-f894-8f45-7c48a6a1801f%40redhat.com.
> For more options, visit https://groups.google.com/d/optout.

Jason Wang

unread,
Apr 8, 2019, 11:31:52 PM4/8/19
to Dmitry Vyukov, Michael S. Tsirkin, syzbot, KVM list, LKML, netdev, syzkaller-bugs, virtual...@lists.linux-foundation.org, weiy...@gmail.com
Yes, looks like the reason is vhost accept zero size iova range which
lead a infinite loop when trying to translate iova. Will post a patch to
fix this.

Thanks

Reply all
Reply to author
Forward
0 new messages