[syzbot] kernel BUG in vhost_get_vq_desc

syzbot

Feb 12, 2022, 5:47:24 PM
to jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org
Hello,

syzbot found the following issue on:

HEAD commit: 83e396641110 Merge tag 'soc-fixes-5.17-1' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1282df74700000
kernel config: https://syzkaller.appspot.com/x/.config?x=5707221760c00a20
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3140b1...@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:2335!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 9449 Comm: vhost-9447 Not tainted 5.17.0-rc3-syzkaller-00247-g83e396641110 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 00 ac 9c 8a 48 c7 c7 28 27 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 77 23 29 fd e9 74 ff ff ff e8 bd 3f a3 fa <0f> 0b e8 b6 3f a3 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc9000f527b88 EFLAGS: 00010212

RAX: 0000000000000133 RBX: 0000000000000001 RCX: ffffc9000ef65000
RDX: 0000000000040000 RSI: ffffffff86d46e33 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d45f2c R11: 0000000000000000 R12: ffff88802bac4d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88802bac4bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6c74f8a718 CR3: 000000002bb11000 CR4: 00000000003526e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 00 ac 9c 8a 48 c7 c7 28 27 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 77 23 29 fd e9 74 ff ff ff e8 bd 3f a3 fa <0f> 0b e8 b6 3f a3 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc9000f527b88 EFLAGS: 00010212

RAX: 0000000000000133 RBX: 0000000000000001 RCX: ffffc9000ef65000
RDX: 0000000000040000 RSI: ffffffff86d46e33 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d45f2c R11: 0000000000000000 R12: ffff88802bac4d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88802bac4bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6c7679a1b8 CR3: 000000002bb11000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

Feb 17, 2022, 8:21:20 PM
to jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, m...@redhat.com, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org
syzbot has found a reproducer for the following issue on:

HEAD commit: f71077a4d84b Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=104c04ca700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1362e232700000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11373a6c700000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3140b1...@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:2335!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 3597 Comm: vhost-3596 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 b7 59 28 fd e9 74 ff ff ff e8 5d c8 a1 fa <0f> 0b e8 56 c8 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc90001d1fb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880234b0000 RSI: ffffffff86d715c3 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d706bc R11: 0000000000000000 R12: ffff888072c24d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888072c24bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000007902c000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 b7 59 28 fd e9 74 ff ff ff e8 5d c8 a1 fa <0f> 0b e8 56 c8 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc90001d1fb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880234b0000 RSI: ffffffff86d715c3 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d706bc R11: 0000000000000000 R12: ffff888072c24d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888072c24bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000007902c000 CR4: 00000000003506e0

Michael S. Tsirkin

Feb 18, 2022, 6:37:36 AM
to syzbot, jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org
I don't see how this can trigger normally, so I'm assuming this is
another case of use after free.

Hillf Danton

Feb 19, 2022, 6:49:50 AM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Thu, 17 Feb 2022 17:21:20 -0800
Debug the report by checking the notify flag once more, and trigger the BUG
if the two checks do not match.

Hillf

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b

--- x/drivers/vhost/vhost.c
+++ y/drivers/vhost/vhost.c
@@ -2207,6 +2207,7 @@ int vhost_get_vq_desc(struct vhost_virtq
__virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
+ bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);

/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;
@@ -2332,7 +2333,7 @@ int vhost_get_vq_desc(struct vhost_virtq

/* Assume notifications from guest are disabled at this point,
* if they aren't we would need to update avail_event index. */
- BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
+ BUG_ON(!!(vq->used_flags & VRING_USED_F_NO_NOTIFY) != was_set);
return head;
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
--
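
For readers following along: the debug patch above changes what the assertion means. The original BUG_ON fires whenever the flag is clear on exit, while the debug version fires only if the flag changed between entry and exit. Below is a minimal userspace sketch of the two predicates. struct fake_vq and both helper names are hypothetical, made up for illustration; only the flag name and its value come from the virtio ring UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as in the virtio ring UAPI header. */
#define VRING_USED_F_NO_NOTIFY 1

/* Hypothetical stand-in for the single vhost_virtqueue field used here. */
struct fake_vq {
	uint16_t used_flags;
};

/* Original check: notifications must be disabled when the call finishes. */
static bool original_check_fires(const struct fake_vq *vq)
{
	return !(vq->used_flags & VRING_USED_F_NO_NOTIFY);
}

/* Debug check: the flag must read the same at entry and at exit. */
static bool debug_check_fires(const struct fake_vq *vq, bool was_set)
{
	return !!(vq->used_flags & VRING_USED_F_NO_NOTIFY) != was_set;
}
```

If the flag is clear for the whole call, the original check fires and the debug check stays quiet. If the flag flips while the call is in progress, the debug check fires regardless of direction.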

syzbot

Feb 19, 2022, 7:00:08 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in vhost_dev_cleanup

------------[ cut here ]------------
WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Modules linked in:
CPU: 1 PID: 4052 Comm: syz-executor213 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Code: c7 85 90 01 00 00 00 00 00 00 e8 83 6e a2 fa 48 89 ef 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e 41 5f e9 7d d6 ff ff e8 68 6e a2 fa <0f> 0b e9 46 ff ff ff 48 8b 7c 24 10 e8 b7 00 ea fa e9 75 f7 ff ff
RSP: 0018:ffffc90001d2fca8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: ffff8880229e8000 RSI: ffffffff86d66fb8 RDI: ffff8880794300b0
RBP: ffff888079430000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff817f1e08 R11: 0000000000000000 R12: ffff8880794300d0
R13: ffff888079430120 R14: ffff8880794300d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 0000000019a2f000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_dev_release+0x36e/0x4b0 drivers/vhost/vsock.c:771
__fput+0x286/0x9f0 fs/file_table.c:313
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xb29/0x2a30 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
__do_sys_exit_group kernel/exit.c:946 [inline]
__se_sys_exit_group kernel/exit.c:944 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2b623eaba9
Code: Unable to access opcode bytes at RIP 0x7f2b623eab7f.
RSP: 002b:00007ffd86806ac8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f2b6245f330 RCX: 00007f2b623eaba9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007ffd86806cb8
R10: 00007ffd86806cb8 R11: 0000000000000246 R12: 00007f2b6245f330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=11ece422700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=12f7f94c700000

Hillf Danton

Feb 19, 2022, 7:51:15 AM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sat, 19 Feb 2022 04:00:07 -0800
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> WARNING in vhost_dev_cleanup

The BUG_ON disappears.
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715

This was also Reported-by: syzbot+1e3ea6...@syzkaller.appspotmail.com
Debug the warning by making the worker handle pending work even after receiving
the stop signal, and check for pending work after the worker is stopped.

Hillf

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b

--- x/drivers/vhost/vhost.c
+++ y/drivers/vhost/vhost.c
@@ -353,14 +353,16 @@ static int vhost_worker(void *data)
/* mb paired w/ kthread_stop */
set_current_state(TASK_INTERRUPTIBLE);

- if (kthread_should_stop()) {
- __set_current_state(TASK_RUNNING);
- break;
- }
-
node = llist_del_all(&dev->work_list);
- if (!node)
+ if (!node) {
+ if (kthread_should_stop()) {
+ __set_current_state(TASK_RUNNING);
+ break;
+ }
+
schedule();
+ continue;
+ }

node = llist_reverse_order(node);
/* make sure flag is seen after deletion */
@@ -712,12 +714,12 @@ void vhost_dev_cleanup(struct vhost_dev
dev->iotlb = NULL;
vhost_clear_msg(dev);
wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
- WARN_ON(!llist_empty(&dev->work_list));
if (dev->worker) {
kthread_stop(dev->worker);
dev->worker = NULL;
dev->kcov_handle = 0;
}
+ WARN_ON(!llist_empty(&dev->work_list));
vhost_detach_mm(dev);
}
EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
@@ -2207,6 +2209,7 @@ int vhost_get_vq_desc(struct vhost_virtq
__virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
+ bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);

/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;
@@ -2332,7 +2335,7 @@ int vhost_get_vq_desc(struct vhost_virtq
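
For readers following along: the vhost_worker() hunk above only changes the order of two checks. It may help to see that effect in isolation. The sketch below is a single-threaded userspace model, not kernel code. struct fake_worker and both function names are hypothetical; 'stop' stands in for kthread_should_stop() and 'pending' for the contents of dev->work_list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the worker state. */
struct fake_worker {
	int pending;  /* queued work items */
	bool stop;    /* stands in for kthread_should_stop() */
	int handled;  /* work items that were executed */
};

/* Pre-patch order: the stop check comes first, so queued work can be left behind. */
static int run_stop_first(struct fake_worker *w)
{
	for (;;) {
		if (w->stop)
			break;
		if (w->pending == 0)
			break; /* the real loop would schedule() here */
		w->handled += w->pending;
		w->pending = 0;
	}
	return w->pending;
}

/* Debug-patch order: drain first, and look at the stop request only when idle. */
static int run_drain_first(struct fake_worker *w)
{
	for (;;) {
		if (w->pending == 0)
			break; /* real loop: exit if stop was requested, else schedule() */
		w->handled += w->pending;
		w->pending = 0;
	}
	return w->pending;
}
```

With three items queued and a stop request already pending, the first variant returns 3 (nothing handled) and the second returns 0. This matches the intent stated in the patch description and says nothing about where the stray work comes from.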

syzbot

Feb 19, 2022, 8:01:11 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in vhost_get_vq_desc

------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:2338!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 4071 Comm: vhost-4070 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000001d077000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x2e9/0x3e0 drivers/vhost/vhost.c:374
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vhost_get_vq_desc+0x1dc5/0x2350 drivers/vhost/vhost.c:2338
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 25 59 28 fd e9 74 ff ff ff e8 cb c7 a1 fa <0f> 0b e8 c4 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc900028bfb78 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88801cbd1d00 RSI: ffffffff86d71655 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d7072d R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88806ffc4bb0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc7293991d0 CR3: 000000001d077000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=11e82d7a700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=11857326700000

Hillf Danton

Feb 19, 2022, 8:47:28 PM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sat, 19 Feb 2022 05:01:10 -0800
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> kernel BUG in vhost_get_vq_desc

The WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 got quiesced.
>
> ------------[ cut here ]------------
> kernel BUG at drivers/vhost/vhost.c:2338!

Given the mutex_lock(&vq->mutex) in vhost_vsock_handle_tx_kick(), this
report proves that the bug is bogus.
The attempted fix is to bail out if anything eerie is detected in the
notify flag.
@@ -2207,7 +2209,10 @@ int vhost_get_vq_desc(struct vhost_virtq
__virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
+ bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);

+ if (!was_set)
+ return -EINVAL;
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;

@@ -2327,12 +2332,14 @@ int vhost_get_vq_desc(struct vhost_virtq
}
} while ((i = next_desc(vq, &desc)) != -1);

+ /* Assume notifications from guest are disabled at this point,
+ * if they aren't we would need to update avail_event index. */
+ if (!!(vq->used_flags & VRING_USED_F_NO_NOTIFY) != was_set)
+ return -EINVAL;
+
/* On success, increment avail index. */
vq->last_avail_idx++;

- /* Assume notifications from guest are disabled at this point,
- * if they aren't we would need to update avail_event index. */
- BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
return head;
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
--

syzbot

Feb 19, 2022, 9:10:09 PM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com

Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=143dc0d4700000

Note: testing is done by a robot and is best-effort only.

Michael S. Tsirkin

Feb 20, 2022, 5:08:42 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
I mean this will fix the warning for sure, but do we understand how
it might have triggered?

Hillf Danton

Feb 20, 2022, 6:09:56 AM
to Michael S. Tsirkin, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello Mike,

Thanks for taking a look at it.

> I mean this will fix the warning for sure, but do we understand how
> it might have triggered?

Based on what's fed to BUG_ON in the hunk below, it was the update of
used_flags behind our back that pulled the trigger.

The bigger pain is that, given the mutex_lock(&vq->mutex) in
vhost_vsock_handle_tx_kick(), I find nothing to do about it now, after
scratching my scalp for twenty minutes, other than detecting the update.

@@ -2332,7 +2335,7 @@ int vhost_get_vq_desc(struct vhost_virtq

/* Assume notifications from guest are disabled at this point,
* if they aren't we would need to update avail_event index. */
- BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
+ BUG_ON(!!(vq->used_flags & VRING_USED_F_NO_NOTIFY) != was_set);
return head;
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);

Michael S. Tsirkin

Feb 20, 2022, 7:16:35 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Right. I think it's highly likely a use after free.
How about poisoning the vq struct with some value before freeing
so we can catch that?
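
For readers following along: the poisoning idea can be sketched as follows. This is a userspace illustration under stated assumptions, not a proposed patch. struct fake_vq, FAKE_POISON_BYTE and both helpers are hypothetical names, and the object is deliberately not freed, because reading freed memory in userspace would be undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAKE_POISON_BYTE 0x6b /* value chosen for this sketch only */

/* Hypothetical stand-in for a virtqueue structure. */
struct fake_vq {
	uint16_t used_flags;
	uint16_t last_avail_idx;
};

/* Overwrite the object at teardown so that any later use is recognizable. */
static void fake_vq_poison(struct fake_vq *vq)
{
	memset(vq, FAKE_POISON_BYTE, sizeof(*vq));
}

/* True if every byte of the object still carries the poison pattern. */
static bool fake_vq_is_poisoned(const struct fake_vq *vq)
{
	const unsigned char *p = (const unsigned char *)vq;

	for (size_t i = 0; i < sizeof(*vq); i++)
		if (p[i] != FAKE_POISON_BYTE)
			return false;
	return true;
}
```

A code path that later reads used_flags from a poisoned object would see 0x6b6b instead of a plausible flag value, which is what would make a stale access stand out in a report.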

Dmitry Vyukov

Feb 20, 2022, 7:31:15 AM
to Michael S. Tsirkin, Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
syzbot config enables KASAN, which catches most use-after-frees. So
unless there is something very special about this code, I wouldn't
assume this is a use-after-free.
Some racy use-after-frees may be caught as both use-after-frees and
other types of bugs with lower probability. I see 8 bugs on the syzbot
dashboard that mention "vhost" but none of them are use-after-frees.

Michael S. Tsirkin

Feb 20, 2022, 8:10:18 AM
to Dmitry Vyukov, Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hmm okay.
Well we also have the (non reproducible)
WARN_ON(!llist_empty(&dev->work_list));

trigger.


So I think what happens is that there's some worker still running
when we call vhost_vq_reset.

Here's what is supposed to stop it:

vhost_vsock_stop(vsock);
vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);

after this point, there should be no new work.

However, I wonder why we flush before we stop everything.
Maybe this is what it's about.
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..b31c3a78dbff 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -754,8 +754,8 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);

vhost_vsock_stop(vsock);
- vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
+ vhost_vsock_flush(vsock);

spin_lock_bh(&vsock->send_pkt_list_lock);
while (!list_empty(&vsock->send_pkt_list)) {
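
For readers following along: the ordering hypothesis above can be modelled in a few lines. This is a toy model of the hypothesis only, not a confirmed diagnosis. struct fake_dev and every function name below are hypothetical, and the model assumes that the stop step is what prevents new work from being queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device with a work queue. */
struct fake_dev {
	bool accepting; /* cleared by the stop step */
	int pending;    /* queued work that has not run yet */
};

static void fake_kick(struct fake_dev *d)  { if (d->accepting) d->pending++; }
static void fake_stop(struct fake_dev *d)  { d->accepting = false; }
static void fake_flush(struct fake_dev *d) { d->pending = 0; }

/* Flush first: a kick that arrives before the stop step queues new work. */
static int teardown_flush_then_stop(struct fake_dev *d)
{
	fake_flush(d);
	fake_kick(d); /* late kick between the two steps */
	fake_stop(d);
	return d->pending;
}

/* Stop first: the late kick is rejected, then the flush drains the rest. */
static int teardown_stop_then_flush(struct fake_dev *d)
{
	fake_stop(d);
	fake_kick(d); /* late kick between the two steps */
	fake_flush(d);
	return d->pending;
}
```

Starting from one queued item, the first ordering ends with one item still pending and the second ends with none.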

syzbot

Feb 20, 2022, 8:20:08 AM
to dvy...@google.com, hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in vhost_get_vq_desc

------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:2335!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 4048 Comm: vhost-4047 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 b7 59 28 fd e9 74 ff ff ff e8 5d c8 a1 fa <0f> 0b e8 56 c8 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc90001affb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88801c9c5700 RSI: ffffffff86d715c3 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d706bc R11: 0000000000000000 R12: ffff888073b44d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888073b44bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 0000000079bfe000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 b7 59 28 fd e9 74 ff ff ff e8 5d c8 a1 fa <0f> 0b e8 56 c8 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc90001affb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88801c9c5700 RSI: ffffffff86d715c3 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d706bc R11: 0000000000000000 R12: ffff888073b44d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888073b44bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005619d349f018 CR3: 0000000079bfe000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=161cf916700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=13500f0e700000

Michael S. Tsirkin

Feb 20, 2022, 8:29:25 AM
to Dmitry Vyukov, Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sun, Feb 20, 2022 at 01:31:02PM +0100, Dmitry Vyukov wrote:
Okay, for starters let's try to make sure whether what we are seeing is
actually accessing a vsock that is being released.
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..2dbc64f072e8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -58,6 +58,7 @@ struct vhost_vsock {

u32 guest_cid;
bool seqpacket_allow;
+ bool dead;
};

static u32 vhost_transport_get_local_cid(void)
@@ -106,6 +107,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,

/* Avoid further vmexits, we're already processing the virtqueue */
vhost_disable_notify(&vsock->dev, vq);
+ WARN_ON(vsock->dead);

do {
struct virtio_vsock_pkt *pkt;
@@ -128,6 +130,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
list_del_init(&pkt->list);
spin_unlock_bh(&vsock->send_pkt_list_lock);

+ WARN_ON(vsock->dead);
head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
&out, &in, NULL, NULL);
if (head < 0) {
@@ -510,6 +513,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto out;

vhost_disable_notify(&vsock->dev, vq);
+ WARN_ON(vsock->dead);
do {
if (!vhost_vsock_more_replies(vsock)) {
/* Stop tx until the device processes already
@@ -519,6 +523,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto no_more_replies;
}

+ WARN_ON(vsock->dead);
head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
&out, &in, NULL, NULL);
if (head < 0)
@@ -678,6 +683,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
}

vsock->guest_cid = 0; /* no CID assigned yet */
+ vsock->dead = false;

atomic_set(&vsock->queued_replies, 0);

@@ -754,8 +760,9 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);

vhost_vsock_stop(vsock);
- vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
+ vhost_vsock_flush(vsock);
+ vsock->dead = true;
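
For readers following along: the "dead" flag in the patch above acts as a tripwire. Release marks the object, and every hot path warns if it still runs afterwards. A minimal userspace sketch of that pattern follows. struct fake_vsock and the helpers are hypothetical, and WARN_ON is modelled as a counter because the real macro is kernel-only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vhost_vsock. */
struct fake_vsock {
	bool dead;     /* set once release has run */
	int warnings;  /* counts how often the tripwire fired */
};

/* Userspace stand-in for WARN_ON(): record the event instead of logging it. */
static void fake_warn_on(struct fake_vsock *v, bool cond)
{
	if (cond)
		v->warnings++;
}

/* Models a work handler that checks the tripwire before touching the vq. */
static void fake_handle_tx_kick(struct fake_vsock *v)
{
	fake_warn_on(v, v->dead);
}

/* Models the release path marking the object as dead. */
static void fake_release(struct fake_vsock *v)
{
	v->dead = true;
}
```

A handler call before fake_release() leaves the counter at 0, and a call after it raises the counter to 1. In the kernel, that second case is what would show up as a WARNING if the worker really did outlive the release.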

Hillf Danton

Feb 20, 2022, 9:12:21 PM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sat, 19 Feb 2022 05:01:10 -0800
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> kernel BUG in vhost_get_vq_desc

The WARNING: CPU: 1 PID: 4052 at drivers/vhost/vhost.c:715 got quiesced.
>
> ------------[ cut here ]------------
> kernel BUG at drivers/vhost/vhost.c:2338!

Given the mutex_lock(&vq->mutex) in vhost_vsock_handle_tx_kick(), this
report proves that the reason to BUG_ON there is bogus.
Attempted fix: quiesce the worker before vq reset by flushing pending work.

Hillf

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b

--- x/drivers/vhost/vhost.c
+++ y/drivers/vhost/vhost.c
@@ -692,6 +692,9 @@ void vhost_dev_cleanup(struct vhost_dev
{
int i;

+ wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
+ vhost_work_dev_flush(dev);
+
for (i = 0; i < dev->nvqs; ++i) {
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
@@ -711,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev
vhost_iotlb_free(dev->iotlb);
dev->iotlb = NULL;
vhost_clear_msg(dev);
- wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
WARN_ON(!llist_empty(&dev->work_list));
if (dev->worker) {
kthread_stop(dev->worker);
--

syzbot

Feb 20, 2022, 9:26:07 PM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in vhost_get_vq_desc

------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:2337!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 4061 Comm: vhost-4060 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2337
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 57 59 28 fd e9 74 ff ff ff e8 fd c7 a1 fa <0f> 0b e8 f6 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc9000204fb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff888077138000 RSI: ffffffff86d71623 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d7071c R11: 0000000000000000 R12: ffff888079664d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888079664bb0
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcc525c41d0 CR3: 000000001816c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2337
Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 57 59 28 fd e9 74 ff ff ff e8 fd c7 a1 fa <0f> 0b e8 f6 c7 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
RSP: 0018:ffffc9000204fb88 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff888077138000 RSI: ffffffff86d71623 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff86d7071c R11: 0000000000000000 R12: ffff888079664d68
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888079664bb0
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000001816c000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=128be8ea700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1651c3d2700000

Hillf Danton

Feb 20, 2022, 11:07:58 PM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sun, 20 Feb 2022 18:26:06 -0800
The re-trigger of the BUG_ON sends us to the start point and looks like it
could not be solved without a mind refresh.

Add a flag to vsock and set it before the work flush on release, so that no
more work is queued once it is set.
--- x/drivers/vhost/vsock.c
+++ y/drivers/vhost/vsock.c
@@ -55,6 +55,7 @@ struct vhost_vsock {
struct list_head send_pkt_list; /* host->guest pending packets */

atomic_t queued_replies;
+ int cleanup;

u32 guest_cid;
bool seqpacket_allow;
@@ -262,6 +263,9 @@ vhost_transport_do_send_pkt(struct vhost
out:
mutex_unlock(&vq->mutex);

+ if (vsock->cleanup)
+ return;
+
if (restart_tx)
vhost_poll_queue(&tx_vq->poll);
}
@@ -678,6 +682,7 @@ static int vhost_vsock_dev_open(struct i
}

vsock->guest_cid = 0; /* no CID assigned yet */
+ vsock->cleanup = 0;

atomic_set(&vsock->queued_replies, 0);

@@ -741,6 +746,8 @@ static int vhost_vsock_dev_release(struc
{
struct vhost_vsock *vsock = file->private_data;

+ vsock->cleanup = 1;
+
mutex_lock(&vhost_vsock_mutex);
if (vsock->guest_cid)
hash_del_rcu(&vsock->hash);
--
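
A minimal user-space sketch of the idea behind the patch above, not kernel code. It assumes a handler that always wants to requeue itself (as the restart_tx path may do) and models a flush as one pass over whatever was pending on entry, which is a simplification of the real vhost flush. All toy_* names are hypothetical; only the cleanup flag mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 8

struct toy_dev;

struct toy_work {
    struct toy_dev *dev;
    int runs;                       /* how many times the handler ran */
};

struct toy_dev {
    struct toy_work *pending[TOY_QUEUE_CAP];
    size_t npending;
    bool cleanup;                   /* stands in for vsock->cleanup */
};

void toy_queue(struct toy_dev *dev, struct toy_work *w)
{
    assert(dev->npending < TOY_QUEUE_CAP);
    dev->pending[dev->npending++] = w;
}

/* Handler that requeues itself unless teardown has started. */
void toy_handler(struct toy_work *w)
{
    w->runs++;
    if (w->dev->cleanup)
        return;                     /* same early return as in the patch */
    toy_queue(w->dev, w);
}

/* One flush pass: run what was pending on entry, report what is left. */
size_t toy_flush_once(struct toy_dev *dev)
{
    struct toy_work *batch[TOY_QUEUE_CAP];
    size_t n = dev->npending;
    size_t i;

    for (i = 0; i < n; i++)
        batch[i] = dev->pending[i];
    dev->npending = 0;
    for (i = 0; i < n; i++)
        toy_handler(batch[i]);
    return dev->npending;
}

/* Model of release: optionally set the flag, then flush once. */
size_t toy_release(bool set_flag)
{
    struct toy_dev dev = { .npending = 0, .cleanup = false };
    struct toy_work w = { .dev = &dev, .runs = 0 };

    toy_queue(&dev, &w);
    dev.cleanup = set_flag;
    return toy_flush_once(&dev);
}
```

In this model toy_release(false) leaves one work item pending after the flush, while toy_release(true) leaves none. Whether the real race is closed this way depends on ordering and locking that the sketch does not model.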

syzbot

Feb 20, 2022, 11:18:09 PM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in vhost_dev_cleanup

------------[ cut here ]------------
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Modules linked in:
CPU: 0 PID: 4069 Comm: syz-executor422 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Code: c7 85 90 01 00 00 00 00 00 00 e8 a3 6d a2 fa 48 89 ef 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e 41 5f e9 7d d6 ff ff e8 88 6d a2 fa <0f> 0b e9 46 ff ff ff 48 8b 7c 24 10 e8 d7 ff e9 fa e9 75 f7 ff ff
RSP: 0018:ffffc9000280fca8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: ffff88801cadd700 RSI: ffffffff86d67098 RDI: ffff88807b1d00b0
RBP: ffff88807b1d0000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff817f1e08 R11: 0000000000000000 R12: ffff88807b1d00d0
R13: ffff88807b1d0120 R14: ffff88807b1d00d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561d17c43600 CR3: 0000000073741000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_dev_release+0x3a4/0x4f0 drivers/vhost/vsock.c:778
__fput+0x286/0x9f0 fs/file_table.c:313
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xb29/0x2a30 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
__do_sys_exit_group kernel/exit.c:946 [inline]
__se_sys_exit_group kernel/exit.c:944 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f43a65e8ba9
Code: Unable to access opcode bytes at RIP 0x7f43a65e8b7f.
RSP: 002b:00007ffdf78cba98 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f43a665d330 RCX: 00007f43a65e8ba9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007ffdf78cbc88
R10: 00007ffdf78cbc88 R11: 0000000000000246 R12: 00007f43a665d330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=16da8df2700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1682e422700000

Hillf Danton

Feb 21, 2022, 12:41:28 AM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sun, 20 Feb 2022 20:18:08 -0800
Another round of attempts to quiesce the
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
BUG at drivers/vhost/vhost.c:2337 went home.

Flush works before vq reset.

syzbot

Feb 21, 2022, 12:51:10 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in vhost_dev_cleanup

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4098 at drivers/vhost/vhost.c:717 vhost_dev_cleanup+0x8f8/0xc20 drivers/vhost/vhost.c:717
Modules linked in:
CPU: 1 PID: 4098 Comm: syz-executor375 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_dev_cleanup+0x8f8/0xc20 drivers/vhost/vhost.c:717
Code: c7 85 90 01 00 00 00 00 00 00 e8 43 4b a2 fa 48 89 ef 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f e9 1d b4 ff ff e8 28 4b a2 fa <0f> 0b e9 49 ff ff ff 48 8b 7c 24 10 e8 77 dd e9 fa e9 93 f7 ff ff
RSP: 0018:ffffc9000296fca0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: ffff88807b86d700 RSI: ffffffff86d692f8 RDI: ffff888077fd00b0
RBP: ffff888077fd0000 R08: 0000000000000000 R09: ffff888077fd00d3
R10: ffffed100effa01a R11: 0000000000000001 R12: ffff888077fd00d0
R13: ffff888077fd0120 R14: ffff888077fd00d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f118cbe2130 CR3: 0000000020703000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_dev_release+0x3a4/0x4f0 drivers/vhost/vsock.c:778
__fput+0x286/0x9f0 fs/file_table.c:313
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xb29/0x2a30 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
__do_sys_exit_group kernel/exit.c:946 [inline]
__se_sys_exit_group kernel/exit.c:944 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f118cb6fba9
Code: Unable to access opcode bytes at RIP 0x7f118cb6fb7f.
RSP: 002b:00007ffcb8cb7868 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f118cbe4330 RCX: 00007f118cb6fba9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007ffcb8cb7a58
R10: 00007ffcb8cb7a58 R11: 0000000000000246 R12: 00007f118cbe4330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=14df1346700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=108c4b4a700000

Hillf Danton

Feb 21, 2022, 3:52:39 AM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Sun, 20 Feb 2022 20:18:08 -0800
Another round of attempts to quiesce the
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
BUG at drivers/vhost/vhost.c:2337 went home.

V1: Flush works before vq reset.
--- x/drivers/vhost/vhost.c
+++ y/drivers/vhost/vhost.c
@@ -692,6 +692,10 @@ void vhost_dev_cleanup(struct vhost_dev
{
int i;

+ wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
+ vhost_dev_stop(dev);
+ vhost_work_dev_flush(dev);
+
for (i = 0; i < dev->nvqs; ++i) {
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
@@ -711,7 +715,6 @@ void vhost_dev_cleanup(struct vhost_dev

Michael S. Tsirkin

Feb 21, 2022, 4:17:09 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, Feb 21, 2022 at 04:52:27PM +0800, Hillf Danton wrote:
> Another round of attempts to quiesce the
> WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
> BUG at drivers/vhost/vhost.c:2337 went home.

Could you please clarify what you mean by "went home" here?

Thanks,

--
MST

Hillf Danton

Feb 21, 2022, 5:15:52 AM
to Michael S. Tsirkin, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
The reproducer failed to trigger it.

Hillf

Michael S. Tsirkin

Feb 21, 2022, 5:48:55 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
You mean this patch?

@@ -2207,7 +2209,10 @@ int vhost_get_vq_desc(struct vhost_virtq
__virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
+ bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);

+ if (!was_set)
+ return -EINVAL;
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;


However, I do not understand how we enter vhost_get_vq_desc
with vq->used_flags & VRING_USED_F_NO_NOTIFY being clear.
Do you?
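
For readers following along, a small user-space sketch of the check in the hunk quoted above: return -EINVAL when the NO_NOTIFY bit is clear instead of proceeding. VRING_USED_F_NO_NOTIFY is 1 in the virtio ring header; everything else (struct toy_vq and the toy_* helpers) is hypothetical and only loosely modelled on the vhost disable/enable notify paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1    /* same value as in the virtio ring header */

struct toy_vq {
    uint16_t used_flags;
};

/* Host asks the guest not to kick: set the bit. */
void toy_disable_notify(struct toy_vq *vq)
{
    vq->used_flags |= VRING_USED_F_NO_NOTIFY;
    assert(vq->used_flags & VRING_USED_F_NO_NOTIFY);
}

/* Host wants kicks again: clear the bit. */
void toy_enable_notify(struct toy_vq *vq)
{
    vq->used_flags &= (uint16_t)~VRING_USED_F_NO_NOTIFY;
}

/* Mirrors the quoted hunk: refuse to fetch a descriptor if the bit is clear. */
int toy_get_vq_desc(const struct toy_vq *vq)
{
    bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);

    if (!was_set)
        return -EINVAL;
    return 0;                       /* the real function would walk the ring here */
}
```

The sketch only shows when the early return fires. It does not answer the question of how the flag ends up clear on entry.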

syzbot

Feb 21, 2022, 7:46:11 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in vhost_dev_cleanup

------------[ cut here ]------------
WARNING: CPU: 1 PID: 4073 at drivers/vhost/vhost.c:718 vhost_dev_cleanup+0x900/0xc20 drivers/vhost/vhost.c:718
Modules linked in:
CPU: 1 PID: 4073 Comm: syz-executor336 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_dev_cleanup+0x900/0xc20 drivers/vhost/vhost.c:718
Code: c7 85 90 01 00 00 00 00 00 00 e8 5b 48 a2 fa 48 89 ef 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f e9 35 b1 ff ff e8 40 48 a2 fa <0f> 0b e9 49 ff ff ff 48 8b 7c 24 10 e8 8f da e9 fa e9 93 f7 ff ff
RSP: 0018:ffffc90001fa7ca0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: ffff88807cadd700 RSI: ffffffff86d695e0 RDI: ffff8880764c00b0
RBP: ffff8880764c0000 R08: 0000000000000000 R09: ffff8880764c00d3
R10: ffffed100ec9801a R11: 0000000000000001 R12: ffff8880764c00d0
R13: ffff8880764c0120 R14: ffff8880764c00d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000b88e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vhost_vsock_dev_release+0x3a4/0x4f0 drivers/vhost/vsock.c:778
__fput+0x286/0x9f0 fs/file_table.c:313
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xb29/0x2a30 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
__do_sys_exit_group kernel/exit.c:946 [inline]
__se_sys_exit_group kernel/exit.c:944 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd6d7a48ba9
Code: Unable to access opcode bytes at RIP 0x7fd6d7a48b7f.
RSP: 002b:00007ffcc430a878 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007fd6d7abd330 RCX: 00007fd6d7a48ba9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 00007ffcc430aa68
R10: 00007ffcc430aa68 R11: 0000000000000246 R12: 00007fd6d7abd330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=11dc90ea700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=14afd0b6700000

Hillf Danton

Feb 21, 2022, 8:00:34 AM
to Michael S. Tsirkin, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 21 Feb 2022 05:48:48 -0500 Michael S. Tsirkin wrote:
> On Mon, Feb 21, 2022 at 06:15:38PM +0800, Hillf Danton wrote:
> > On Mon, 21 Feb 2022 04:17:02 -0500 Michael S. Tsirkin wrote:
> > > On Mon, Feb 21, 2022 at 04:52:27PM +0800, Hillf Danton wrote:
> > > > Another round of attempts to quiesce the
> > > > WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
> > > > BUG at drivers/vhost/vhost.c:2337 went home.
> > >
> > > Could you please clarify what you mean by "went home" here?
> >
> > The reproducer failed to trigger it.
> >
> > Hillf
>
> You mean this patch?

No, it is part of the first round.
>
> @@ -2207,7 +2209,10 @@ int vhost_get_vq_desc(struct vhost_virtq
> __virtio16 avail_idx;
> __virtio16 ring_head;
> int ret, access;
> + bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);
>
> + if (!was_set)
> + return -EINVAL;
> /* Check it isn't doing very strange things with descriptor numbers. */
> last_avail_idx = vq->last_avail_idx;
>
>
> However, I do not understand how we enter vhost_get_vq_desc
> with vq->used_flags & VRING_USED_F_NO_NOTIFY being clear.
> Do you?

The diff below turned the BUG into a WARNING, and you can see it in one of the
mails in your inbox, as you are on the Cc list.

Hillf
---<---

The re-trigger of the BUG_ON sends us to the start point and looks like it
could not be solved without a mind refresh.

Add a flag to vsock and set it before the work flush on release, so that no
more work is queued once it is set.

syzbot

Feb 21, 2022, 8:03:20 AM
to sgar...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com

Tested on:

commit: 4951112b vhost/vsock: don't check owner in vhost_vsock..
git tree: https://github.com/stefano-garzarella/linux.git vsock-fix-stop
kernel config: https://syzkaller.appspot.com/x/.config?x=96b2c57ab158898c
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.

Stefano Garzarella

Feb 21, 2022, 8:09:34 AM
to syzbot, hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
It seems that this patch [1] should also fix this issue (syzbot seems
happy).

I think that because we didn't set the backend to NULL, the worker kept
running and messing things up.

Stefano

[1]
https://lore.kernel.org/virtualization/20220221114916.1...@redhat.com/T/#u
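
A hedged user-space sketch of the pattern described above: a kick handler that does nothing once the backend pointer has been cleared, so clearing it on stop is what quiesces the worker. The toy_* names are hypothetical; the backend field stands in for what vhost_vq_get_backend() returns, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vq {
    void *backend;                  /* NULL means the queue is stopped */
    int processed;                  /* kicks that actually did work */
};

void toy_start(struct toy_vq *vq, void *backend)
{
    assert(backend != NULL);
    vq->backend = backend;
}

/* Stop path: clearing the backend is what makes later kicks no-ops. */
void toy_stop(struct toy_vq *vq)
{
    vq->backend = NULL;
}

/* Kick handler: returns 1 if it processed the queue, 0 if it bailed out. */
int toy_handle_kick(struct toy_vq *vq)
{
    if (!vq->backend)
        return 0;
    vq->processed++;
    return 1;
}
```

If the stop path returns early without clearing the backend, every later kick in this model still processes the queue, which matches the failure mode suggested above.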

Hillf Danton

Feb 21, 2022, 8:36:59 AM
to Stefano Garzarella, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hey Stefano,

On Mon, 21 Feb 2022 14:09:26 +0100 Stefano Garzarella wrote:
> It seems that this patch [1] should also fix this issue (syzbot seems
> happy).

What do you mean by happy?
Why not feed it to syzbot if it is a good fix, given that a Tested-by tag can
speak for itself?

Hillf

Stefano Garzarella

Feb 21, 2022, 8:45:25 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Mon, Feb 21, 2022 at 09:36:46PM +0800, Hillf Danton wrote:
>Hey Stefano,
>
>On Mon, 21 Feb 2022 14:09:26 +0100 Stefano Garzarella wrote:
>> It seems that this patch [1] should also fix this issue (syzbot seems
>> happy).
>
>What do you mean by happy?
>Why not feed it to syzbot if it is a good fix, given that a Tested-by tag can
>speak for itself?

Because I sent the patch this morning for another report:
https://syzkaller.appspot.com/bug?extid=1e3ea63db39f2b4440e0

Then I asked syzbot for this report to test my branch with that patch
applied and the result is OK.

Is there any way to ask syzbot to test a patch already posted to the
mailing list (instead of sending it to syzbot again)?

Stefano

Michael S. Tsirkin

Feb 21, 2022, 8:58:56 AM
to Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Right. So it's not a fix, it's just a workaround, and we still need to
understand how we can get into this state.

> Hillf
> ---<---
>
> The re-trigger of the BUG_ON sends us to the start point and looks like it
> could not be solved without a mind refresh.

I don't understand this sentence btw. How does BUG_ON send us to the
start point? What is the start point? And what is a mind refresh?

Michael S. Tsirkin

Feb 21, 2022, 9:00:10 AM
to Stefano Garzarella, Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
I don't know of a way, but hey, sending it back isn't too bad,
just mention this in the mail text.

Stefano Garzarella

Feb 21, 2022, 9:05:02 AM
to Michael S. Tsirkin, Hillf Danton, syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Okay, I'll also do that for the other report.

Thanks,
Stefano

Hillf Danton

Feb 21, 2022, 9:06:11 AM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Mon, 21 Feb 2022 04:46:09 -0800
Another round of attempts to quiesce the
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
BUG at drivers/vhost/vhost.c:2337 went home.

V2: Flush works before vq reset.
--- x/drivers/vhost/vhost.c
+++ y/drivers/vhost/vhost.c
@@ -692,6 +692,16 @@ void vhost_dev_cleanup(struct vhost_dev
{
int i;

+ wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
+ /* open code vhost_vsock_flush() */
+ for (i = 0; i < dev->nvqs; ++i) {
+ if (dev->vqs[i]->handle_kick) {
+ vhost_poll_stop(&dev->vqs[i]->poll);
+ vhost_poll_flush(&dev->vqs[i]->poll);
+ }
+ }
+ vhost_work_dev_flush(dev);
+
for (i = 0; i < dev->nvqs; ++i) {
if (dev->vqs[i]->error_ctx)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
@@ -711,7 +721,6 @@ void vhost_dev_cleanup(struct vhost_dev

Stefano Garzarella

Feb 21, 2022, 9:10:00 AM
to syzbot, Hillf Danton, Jason Wang, kernel list, Michael Tsirkin, syzkall...@googlegroups.com
Patch sent upstream:
https://lore.kernel.org/virtualization/20220221114916.1...@redhat.com/T/#u

On Sun, Feb 20, 2022 at 3:11 AM syzbot
<syzbot+3140b1...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com
>
> Tested on:
>
> commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
> git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
> dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> patch: https://syzkaller.appspot.com/x/patch.diff?x=143dc0d4700000
0001-vhost-vsock-don-t-check-owner-in-vhost_vsock_stop-wh.patch

syzbot

Feb 21, 2022, 9:14:10 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to create VM pool: failed to create GCE image: create image operation failed: &{Code:PERMISSIONS_ERROR Location: Message:Required 'read' permission for 'disks/ci-upstream-kasan-gce-test-job-test-job-image.tar.gz' ForceSendFields:[] NullFields:[]}.


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1296ea64700000

syzbot

Feb 21, 2022, 9:25:08 AM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, sgar...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com

Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=123f7296700000

Hillf Danton

Feb 21, 2022, 7:15:07 PM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Mon, 21 Feb 2022 04:46:09 -0800
Another round of attempts to quiesce the
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
BUG at drivers/vhost/vhost.c:2337 went home.

V3: Before flushing works, tell vsock not to requeue works and tell vhost to
process works synchronously instead of using the worker, in a bid to ensure
that no works are pending after the flush.

Hillf
--- x/drivers/vhost/vsock.c
+++ y/drivers/vhost/vsock.c
@@ -55,6 +55,7 @@ struct vhost_vsock {
struct list_head send_pkt_list; /* host->guest pending packets */

atomic_t queued_replies;
+ int cleanup;

u32 guest_cid;
bool seqpacket_allow;
@@ -262,6 +263,9 @@ vhost_transport_do_send_pkt(struct vhost
out:
mutex_unlock(&vq->mutex);

+ if (vsock->cleanup)
+ return;
+
if (restart_tx)
vhost_poll_queue(&tx_vq->poll);
}
@@ -678,6 +682,7 @@ static int vhost_vsock_dev_open(struct i
}

vsock->guest_cid = 0; /* no CID assigned yet */
+ vsock->cleanup = 0;

atomic_set(&vsock->queued_replies, 0);

@@ -741,6 +746,9 @@ static int vhost_vsock_dev_release(struc
{
struct vhost_vsock *vsock = file->private_data;

+ vsock->cleanup = 1;
+ vsock->dev.use_worker = false;

syzbot

Feb 21, 2022, 7:26:22 PM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: sleeping function called from invalid context in vhost_vsock_handle_tx_kick

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 4050, name: vhost-4049
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
2 locks held by vhost-4049/4050:
#0: ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
#1: ffff88806ee92f20 (&ctx->wqh){....}-{2:2}, at: eventfd_signal+0x77/0x1c0 fs/eventfd.c:75
irq event stamp: 158
hardirqs last enabled at (157): [<ffffffff81ad847c>] lockless_pages_from_mm mm/gup.c:2851 [inline]
hardirqs last enabled at (157): [<ffffffff81ad847c>] internal_get_user_pages_fast+0x17cc/0x2510 mm/gup.c:2893
hardirqs last disabled at (158): [<ffffffff8950a9ce>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (158): [<ffffffff8950a9ce>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last enabled at (0): [<ffffffff8145328c>] copy_process+0x1eec/0x7300 kernel/fork.c:2109
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 4050 Comm: vhost-4049 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9577
__mutex_lock_common kernel/locking/mutex.c:577 [inline]
__mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
vhost_poll_wakeup+0xd5/0x130 drivers/vhost/vhost.c:174
__wake_up_common+0x147/0x650 kernel/sched/wait.c:108
eventfd_signal+0x129/0x1c0 fs/eventfd.c:81
vhost_update_used_flags drivers/vhost/vhost.c:1979 [inline]
vhost_update_used_flags+0x34c/0x3d0 drivers/vhost/vhost.c:1966
vhost_disable_notify drivers/vhost/vhost.c:2560 [inline]
vhost_disable_notify+0xbe/0x190 drivers/vhost/vhost.c:2552
vhost_vsock_handle_tx_kick+0x187/0xa10 drivers/vhost/vsock.c:516
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>

=============================
[ BUG: Invalid wait context ]
5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0 Tainted: G W
-----------------------------
vhost-4049/4050 is trying to lock:
ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
other info that might help us debug this:
context-{4:4}
2 locks held by vhost-4049/4050:
#0: ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
#1: ffff88806ee92f20 (&ctx->wqh){....}-{2:2}, at: eventfd_signal+0x77/0x1c0 fs/eventfd.c:75
stack backtrace:
CPU: 1 PID: 4050 Comm: vhost-4049 Tainted: G W 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_lock_invalid_wait_context kernel/locking/lockdep.c:4678 [inline]
check_wait_context kernel/locking/lockdep.c:4739 [inline]
__lock_acquire.cold+0xc5/0x3a9 kernel/locking/lockdep.c:4977
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
vhost_poll_wakeup+0xd5/0x130 drivers/vhost/vhost.c:174
__wake_up_common+0x147/0x650 kernel/sched/wait.c:108
eventfd_signal+0x129/0x1c0 fs/eventfd.c:81
vhost_update_used_flags drivers/vhost/vhost.c:1979 [inline]
vhost_update_used_flags+0x34c/0x3d0 drivers/vhost/vhost.c:1966
vhost_disable_notify drivers/vhost/vhost.c:2560 [inline]
vhost_disable_notify+0xbe/0x190 drivers/vhost/vhost.c:2552
vhost_vsock_handle_tx_kick+0x187/0xa10 drivers/vhost/vsock.c:516
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
BUG: scheduling while atomic: vhost-4049/4050/0x00000002
INFO: lockdep is turned off.
Modules linked in:
irq event stamp: 158
hardirqs last enabled at (157): [<ffffffff81ad847c>] lockless_pages_from_mm mm/gup.c:2851 [inline]
hardirqs last enabled at (157): [<ffffffff81ad847c>] internal_get_user_pages_fast+0x17cc/0x2510 mm/gup.c:2893
hardirqs last disabled at (158): [<ffffffff8950a9ce>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (158): [<ffffffff8950a9ce>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last enabled at (0): [<ffffffff8145328c>] copy_process+0x1eec/0x7300 kernel/fork.c:2109
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0


Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
console output: https://syzkaller.appspot.com/x/log.txt?x=12c557bc700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1651ba96700000

Hillf Danton

Feb 21, 2022, 10:11:43 PM
to syzbot, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Mon, 21 Feb 2022 16:26:20 -0800
This is the evidence that vhost_work is requeued, and it cripples work
flush.
Another round of attempts to quiesce the
WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
BUG at drivers/vhost/vhost.c:2337 went home.

V4: Before flushing works, tell vsock not to requeue works, in a bid to ensure
that no works are pending after the flush.

Hillf

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b

--- x/drivers/vhost/vsock.c
+++ y/drivers/vhost/vsock.c
@@ -55,6 +55,7 @@ struct vhost_vsock {
 	struct list_head send_pkt_list; /* host->guest pending packets */

 	atomic_t queued_replies;
+	int cleanup;

 	u32 guest_cid;
 	bool seqpacket_allow;
@@ -262,6 +263,9 @@ vhost_transport_do_send_pkt(struct vhost
 out:
 	mutex_unlock(&vq->mutex);

+	if (vsock->cleanup)
+		return;
+
 	if (restart_tx)
 		vhost_poll_queue(&tx_vq->poll);
 }
@@ -501,6 +505,9 @@ static void vhost_vsock_handle_tx_kick(s
 	unsigned int out, in;
 	bool added = false;

+	if (vsock->cleanup)
+		if (test_and_set_bit(VHOST_WORK_QUEUED, &work->flags))
+			return;
 	mutex_lock(&vq->mutex);

 	if (!vhost_vq_get_backend(vq))
@@ -678,6 +685,7 @@ static int vhost_vsock_dev_open(struct i
 	}

 	vsock->guest_cid = 0; /* no CID assigned yet */
+	vsock->cleanup = 0;

 	atomic_set(&vsock->queued_replies, 0);

@@ -741,6 +749,7 @@ static int vhost_vsock_dev_release(struc
 {
 	struct vhost_vsock *vsock = file->private_data;

+	vsock->cleanup = 1;

syzbot

Feb 21, 2022, 11:07:06 PM
to hda...@sina.com, jaso...@redhat.com, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com

Tested on:

commit: f71077a4 Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=11c604ca700000

Hillf Danton

Feb 22, 2022, 12:41:40 AM
to syzbot, jaso...@redhat.com, Stefano Garzarella, linux-...@vger.kernel.org, m...@redhat.com, syzkall...@googlegroups.com
On Mon, 21 Feb 2022 20:07:06 -0800
So far, then, we know that
1/ the BUG_ON has nothing to do with use after free,
2/ the requeue of vhost work is the culprit for both warning and BUG below,
3/ the reasons for adding both

WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 and
BUG at drivers/vhost/vhost.c:2337

are insane, because requeueing work cripples the work flush by definition.

Hillf
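
To illustrate the point about requeueing, here is a minimal single-threaded
sketch in plain C. It is not kernel code: every toy_* name is hypothetical,
`queued` only loosely stands in for the VHOST_WORK_QUEUED bit, and `cleanup`
for the flag added in the V4 patch. It also assumes a flush that runs only the
work pending at the moment the flush starts, which is an assumption of this
model rather than a description of vhost's actual flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model; none of these names are kernel APIs. */
struct toy_work;
typedef void (*toy_fn)(struct toy_work *w);

struct toy_work {
	toy_fn fn;
	bool queued;	/* loosely models the VHOST_WORK_QUEUED bit */
	bool cleanup;	/* loosely models the V4 patch's vsock->cleanup */
	int runs;	/* how many times the handler has executed */
};

static void toy_queue(struct toy_work *w)
{
	w->queued = true;
}

/* Handler that requeues itself unless teardown has been announced. */
static void toy_handler(struct toy_work *w)
{
	w->runs++;
	if (w->cleanup)
		return;
	toy_queue(w);
}

/* "Flush": run the work that was pending when the flush began, once. */
static void toy_flush(struct toy_work *w)
{
	if (w->queued) {
		w->queued = false;
		w->fn(w);
	}
}

/* Returns true if work is still pending after one flush. */
bool toy_pending_after_flush(bool cleanup)
{
	struct toy_work w = { .fn = toy_handler, .cleanup = cleanup };

	toy_queue(&w);
	toy_flush(&w);
	return w.queued;
}
```

With cleanup unset, toy_pending_after_flush() reports that work is still
queued after the flush; with it set, the handler declines to requeue and
nothing is left pending.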

Lee Jones

Mar 2, 2022, 3:29:45 AM
to Michael S. Tsirkin, syzbot, jaso...@redhat.com, k...@vger.kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, virtual...@lists.linux-foundation.org
On Fri, 18 Feb 2022, Michael S. Tsirkin wrote:

> On Thu, Feb 17, 2022 at 05:21:20PM -0800, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: f71077a4d84b Merge tag 'mmc-v5.17-rc1-2' of git://git.kern..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=104c04ca700000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1362e232700000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11373a6c700000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+3140b1...@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > kernel BUG at drivers/vhost/vhost.c:2335!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 1 PID: 3597 Comm: vhost-3596 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:vhost_get_vq_desc+0x1d43/0x22c0 drivers/vhost/vhost.c:2335
> > Code: 00 00 00 48 c7 c6 20 2c 9d 8a 48 c7 c7 98 a6 8e 8d 48 89 ca 48 c1 e1 04 48 01 d9 e8 b7 59 28 fd e9 74 ff ff ff e8 5d c8 a1 fa <0f> 0b e8 56 c8 a1 fa 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df
> > RSP: 0018:ffffc90001d1fb88 EFLAGS: 00010293
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> > RDX: ffff8880234b0000 RSI: ffffffff86d715c3 RDI: 0000000000000003
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> > R10: ffffffff86d706bc R11: 0000000000000000 R12: ffff888072c24d68
> > R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888072c24bb0
> > FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000002 CR3: 000000007902c000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
> > vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>
> I don't see how this can trigger normally so I'm assuming
> another case of use after free.

Yes, exactly.

I patched it. Please see:

https://lore.kernel.org/all/20220302075421.21...@linaro.org/T/#t

--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

syzbot

Mar 2, 2022, 4:10:17 AM
to sgar...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3140b1...@syzkaller.appspotmail.com

Tested on:

commit: 4951112b vhost/vsock: don't check owner in vhost_vsock..
git tree: https://github.com/stefano-garzarella/linux.git vsock-fix-stop
kernel config: https://syzkaller.appspot.com/x/.config?x=96b2c57ab158898c
dashboard link: https://syzkaller.appspot.com/bug?extid=3140b17cb44a7b174008
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.

Stefano Garzarella

Mar 2, 2022, 4:18:21 AM
to Lee Jones, Michael S. Tsirkin, k...@vger.kernel.org, syzbot, net...@vger.kernel.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org, virtual...@lists.linux-foundation.org
I think this issue is related to the one fixed by this patch, which was
merged upstream a few days ago:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
I'm not sure that patch avoids the issue. I'll reply to it.

Thanks,
Stefano

Stefano Garzarella

Mar 2, 2022, 4:23:30 AM
to Lee Jones, Michael S. Tsirkin, k...@vger.kernel.org, syzbot, net...@vger.kernel.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org, virtual...@lists.linux-foundation.org
My bad, I think it should be fine, because vhost_vq_reset() sets
vq->private_data to NULL and prevents the worker from running.

Thanks,
Stefano
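
The reasoning above can be sketched as a small userspace C model. The toy_*
names are hypothetical and this is not the kernel's implementation; it only
assumes, as the message says, that a reset clears private_data and that the
handler checks for a backend before touching the ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; not the kernel's struct vhost_virtqueue. */
struct toy_vq {
	void *private_data;	/* backend pointer; NULL means "stopped" */
	int descriptors_read;	/* counts how often the ring was touched */
};

/* Models a reset that detaches the backend. */
void toy_vq_reset(struct toy_vq *vq)
{
	vq->private_data = NULL;
}

/*
 * Models a kick handler: once the backend is gone it returns early,
 * so the descriptor ring is never read. Returns 1 if work was done.
 */
int toy_handle_kick(struct toy_vq *vq)
{
	if (!vq->private_data)
		return 0;
	vq->descriptors_read++;
	return 1;
}
```

In this model a kick that arrives after toy_vq_reset() is a no-op, which is
the property the message relies on.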
