Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:

BUG: sleeping function called from invalid context in vhost_vsock_handle_tx_kick
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 4050, name: vhost-4049
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
2 locks held by vhost-4049/4050:
#0: ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
#1: ffff88806ee92f20 (&ctx->wqh){....}-{2:2}, at: eventfd_signal+0x77/0x1c0 fs/eventfd.c:75
irq event stamp: 158
hardirqs last enabled at (157): [<ffffffff81ad847c>] lockless_pages_from_mm mm/gup.c:2851 [inline]
hardirqs last enabled at (157): [<ffffffff81ad847c>] internal_get_user_pages_fast+0x17cc/0x2510 mm/gup.c:2893
hardirqs last disabled at (158): [<ffffffff8950a9ce>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (158): [<ffffffff8950a9ce>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last enabled at (0): [<ffffffff8145328c>] copy_process+0x1eec/0x7300 kernel/fork.c:2109
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 4050 Comm: vhost-4049 Not tainted 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9577
__mutex_lock_common kernel/locking/mutex.c:577 [inline]
__mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
vhost_poll_wakeup+0xd5/0x130 drivers/vhost/vhost.c:174
__wake_up_common+0x147/0x650 kernel/sched/wait.c:108
eventfd_signal+0x129/0x1c0 fs/eventfd.c:81
vhost_update_used_flags drivers/vhost/vhost.c:1979 [inline]
vhost_update_used_flags+0x34c/0x3d0 drivers/vhost/vhost.c:1966
vhost_disable_notify drivers/vhost/vhost.c:2560 [inline]
vhost_disable_notify+0xbe/0x190 drivers/vhost/vhost.c:2552
vhost_vsock_handle_tx_kick+0x187/0xa10 drivers/vhost/vsock.c:516
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
=============================
[ BUG: Invalid wait context ]
5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0 Tainted: G W
-----------------------------
vhost-4049/4050 is trying to lock:
ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
other info that might help us debug this:
context-{4:4}
2 locks held by vhost-4049/4050:
#0: ffff88806f3e4c20 (&vq->mutex){+.+.}-{3:3}, at: vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
#1: ffff88806ee92f20 (&ctx->wqh){....}-{2:2}, at: eventfd_signal+0x77/0x1c0 fs/eventfd.c:75
stack backtrace:
CPU: 1 PID: 4050 Comm: vhost-4049 Tainted: G W 5.17.0-rc4-syzkaller-00054-gf71077a4d84b-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_lock_invalid_wait_context kernel/locking/lockdep.c:4678 [inline]
check_wait_context kernel/locking/lockdep.c:4739 [inline]
__lock_acquire.cold+0xc5/0x3a9 kernel/locking/lockdep.c:4977
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
vhost_vsock_handle_tx_kick+0xbf/0xa10 drivers/vhost/vsock.c:508
vhost_poll_wakeup+0xd5/0x130 drivers/vhost/vhost.c:174
__wake_up_common+0x147/0x650 kernel/sched/wait.c:108
eventfd_signal+0x129/0x1c0 fs/eventfd.c:81
vhost_update_used_flags drivers/vhost/vhost.c:1979 [inline]
vhost_update_used_flags+0x34c/0x3d0 drivers/vhost/vhost.c:1966
vhost_disable_notify drivers/vhost/vhost.c:2560 [inline]
vhost_disable_notify+0xbe/0x190 drivers/vhost/vhost.c:2552
vhost_vsock_handle_tx_kick+0x187/0xa10 drivers/vhost/vsock.c:516
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
BUG: scheduling while atomic: vhost-4049/4050/0x00000002
INFO: lockdep is turned off.
Modules linked in:
irq event stamp: 158
hardirqs last enabled at (157): [<ffffffff81ad847c>] lockless_pages_from_mm mm/gup.c:2851 [inline]
hardirqs last enabled at (157): [<ffffffff81ad847c>] internal_get_user_pages_fast+0x17cc/0x2510 mm/gup.c:2893
hardirqs last disabled at (158): [<ffffffff8950a9ce>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (158): [<ffffffff8950a9ce>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last enabled at (0): [<ffffffff8145328c>] copy_process+0x1eec/0x7300 kernel/fork.c:2109
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<0000000000000000>] 0x0
console output: https://syzkaller.appspot.com/x/log.txt?x=12c557bc700000
patch: https://syzkaller.appspot.com/x/patch.diff?x=1651ba96700000