[syzbot] [kernel?] WARNING in __vhost_task_wake

syzbot

Sep 17, 2025, 1:14:29 PM
to linux-...@vger.kernel.org, lu...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following issue on:

HEAD commit: ae2d20002576 Add linux-next specific files for 20250917
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=149ec534580000
kernel config: https://syzkaller.appspot.com/x/.config?x=7dcbc33245a844f3
dashboard link: https://syzkaller.appspot.com/bug?extid=a1a3cefd6148c781117c
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7c34033bfd08/disk-ae2d2000.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5b001294bb15/vmlinux-ae2d2000.xz
kernel image: https://storage.googleapis.com/syzbot-assets/83d50ef44860/bzImage-ae2d2000.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a1a3ce...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: kernel/vhost_task.c:97 at __vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97, CPU#1: syz.3.28/6112
Modules linked in:
CPU: 1 UID: 0 PID: 6112 Comm: syz.3.28 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:__vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97
Code: 38 00 74 08 48 89 df e8 93 81 95 00 48 8b 3b 5b 41 5e 41 5f e9 a6 45 01 00 e8 31 ef 30 00 90 0f 0b 90 eb 8b e8 26 ef 30 00 90 <0f> 0b 90 5b 41 5e 41 5f e9 18 47 f7 09 cc 0f 1f 80 00 00 00 00 90
RSP: 0018:ffffc9000b127a20 EFLAGS: 00010293
RAX: ffffffff818eed7a RBX: ffff8880563ff400 RCX: ffff88807e13bc80
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffc9000b127af0 R08: ffff8880563ff477 R09: 1ffff1100ac7fe8e
R10: dffffc0000000000 R11: ffffed100ac7fe8f R12: 1ffff92001624f4c
R13: dffffc0000000000 R14: 0000000000000002 R15: dffffc0000000000
FS: 000055556f6ab500(0000) GS:ffff888125ae1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9324bff000 CR3: 0000000028336000 CR4: 00000000003526f0
Call Trace:
<TASK>
vhost_worker_queue+0x194/0x260 drivers/vhost/vhost.c:253
__vhost_worker_flush+0x134/0x1e0 drivers/vhost/vhost.c:290
vhost_worker_flush drivers/vhost/vhost.c:303 [inline]
vhost_dev_flush drivers/vhost/vhost.c:313 [inline]
vhost_dev_stop+0x282/0x320 drivers/vhost/vhost.c:1178
vhost_vsock_dev_release+0x203/0x3f0 drivers/vhost/vsock.c:751
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6e0278eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe017861a8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00007f6e029d7da0 RCX: 00007f6e0278eba9
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007f6e029d7da0 R08: 0000000000013868 R09: 0000001a0178649f
R10: 00007f6e029d7cb0 R11: 0000000000000246 R12: 000000000001c379
R13: 00007f6e029d6090 R14: ffffffffffffffff R15: 00007ffe017862c0
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Sep 17, 2025, 9:46:32 PM
to linux-...@vger.kernel.org, lu...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
syzbot has found a reproducer for the following issue on:

HEAD commit: ae2d20002576 Add linux-next specific files for 20250917
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11678f62580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d737cfaddae0058c
dashboard link: https://syzkaller.appspot.com/bug?extid=a1a3cefd6148c781117c
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1790ef62580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10242534580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/96197382e3c0/disk-ae2d2000.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/55a8a6ba3307/vmlinux-ae2d2000.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c1b4ed5d6e2c/bzImage-ae2d2000.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a1a3ce...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: kernel/vhost_task.c:97 at __vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97, CPU#0: syz.0.174/6507
Modules linked in:
CPU: 0 UID: 0 PID: 6507 Comm: syz.0.174 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:__vhost_task_wake+0xbb/0xd0 kernel/vhost_task.c:97
Code: 38 00 74 08 48 89 df e8 93 81 95 00 48 8b 3b 5b 41 5e 41 5f e9 a6 45 01 00 e8 31 ef 30 00 90 0f 0b 90 eb 8b e8 26 ef 30 00 90 <0f> 0b 90 5b 41 5e 41 5f e9 18 c7 ff 09 cc 0f 1f 80 00 00 00 00 90
RSP: 0018:ffffc90003b7f680 EFLAGS: 00010293
RAX: ffffffff818f2d7a RBX: ffff888033c7c400 RCX: ffff88802bc85ac0
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffc90003b7f750 R08: ffff888033c7c477 R09: 1ffff1100678f88e
R10: dffffc0000000000 R11: ffffed100678f88f R12: 1ffff9200076fed8
R13: dffffc0000000000 R14: 0000000000000002 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff88812579c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9942aa0f98 CR3: 0000000027bfc000 CR4: 00000000003526f0
Call Trace:
<TASK>
vhost_worker_queue+0x194/0x260 drivers/vhost/vhost.c:253
__vhost_worker_flush+0x134/0x1e0 drivers/vhost/vhost.c:290
vhost_worker_flush drivers/vhost/vhost.c:303 [inline]
vhost_dev_flush+0xb2/0x130 drivers/vhost/vhost.c:313
vhost_vsock_flush drivers/vhost/vsock.c:698 [inline]
vhost_vsock_dev_release+0x1fb/0x3f0 drivers/vhost/vsock.c:750
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x6b5/0x2300 kernel/exit.c:966
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
get_signal+0x1285/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0xa0/0x790 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x72/0x130 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f9941b8eba9
Code: Unable to access opcode bytes at 0x7f9941b8eb7f.
RSP: 002b:00007f9942aa10e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f9941dd5fa8 RCX: 00007f9941b8eba9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f9941dd5fa8
RBP: 00007f9941dd5fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f9941dd6038 R14: 00007ffd2989c130 R15: 00007ffd2989c218
</TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

Sean Christopherson

Sep 18, 2025, 10:56:52 AM
to syzbot, linux-...@vger.kernel.org, lu...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, Michael S. Tsirkin
+Michael

Michael, this is the VHOST_TASK_FLAGS_KILLED WARN that was added[*] to detect
violations similar to KVM.

	/*
	 * Checking VHOST_TASK_FLAGS_KILLED can race with signal delivery, but
	 * a race can only result in false negatives and this is just a sanity
	 * check, i.e. if KILLED is set, the caller is buggy no matter what.
	 */
	if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)))
		return;

I haven't been able to repro the splat, but after much staring I think the issue
is that vhost_task_fn() marks the task KILLED before invoking handle_sigkill().
If vhost_worker_flush() already holds worker->mutex before vhost_worker_killed()
runs, then it could wake a (not yet dead) task that already has KILLED set.

Assuming that waiting to set KILLED until after handle_sigkill() resolves the issue
(fingers crossed), the two options I see would be to apply the below as a fixup,
or to simply drop the sanity check for 6.17 and add it back in 6.18 in conjunction
with the below (again, assuming it actually resolves the issue).

[*] https://lore.kernel.org/all/20250827194107....@google.com
#syz test

diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 01bf7b0e2c5b..6cb3b8b26768 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -58,9 +58,15 @@ static int vhost_task_fn(void *data)
 	 * new work and flushed.
 	 */
 	if (!test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
-		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
 		if (vtsk->handle_sigkill)
 			vtsk->handle_sigkill(vtsk->data);
+
+		/*
+		 * Mark the task KILLED *after* giving the owner the chance to
+		 * handle SIGKILL to avoid false positives on the sanity check
+		 * in __vhost_task_wake().
+		 */
+		set_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags);
 	}
 	mutex_unlock(&vtsk->exit_mutex);
 	complete(&vtsk->exited);
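
For readers following the thread, the suspected ordering problem can be replayed in a small user-space model. This is only a sketch of the interleaving described above, not kernel code: `struct model`, `model_task_wake()` and `simulate_flush_race()` are hypothetical names, the mutex is a plain boolean, and the assumption that the SIGKILL handler must wait for the flusher's mutex is taken from the description in this message rather than verified against the vhost sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model {
	bool mutex_held;  /* stands in for worker->mutex */
	bool killed_flag; /* stands in for VHOST_TASK_FLAGS_KILLED */
	bool warned;      /* set if the sanity check would have fired */
};

/* Models the sanity check quoted from __vhost_task_wake(). */
static void model_task_wake(struct model *m)
{
	if (m->killed_flag)
		m->warned = true;
}

/*
 * Replays one fixed interleaving of a flusher and a worker task that has
 * just received SIGKILL. Returns true if the modeled WARN fires.
 *
 * set_killed_before_handler == true  models the ordering before the patch,
 * set_killed_before_handler == false models the ordering after the patch.
 */
bool simulate_flush_race(bool set_killed_before_handler)
{
	struct model m = { false, false, false };

	/* Step 1: the flusher takes the mutex before the handler runs. */
	m.mutex_held = true;

	/* Step 2: the worker task notices SIGKILL. */
	if (set_killed_before_handler)
		m.killed_flag = true;

	/* Assumption: the handler needs the mutex, so it cannot run yet. */
	assert(m.mutex_held);

	/* Step 3: the flusher queues work and wakes the (still live) task. */
	model_task_wake(&m);

	/* Step 4: the flusher drops the mutex. */
	m.mutex_held = false;

	/* Step 5: the handler runs; the patched ordering sets the flag now. */
	assert(!m.mutex_held);
	if (!set_killed_before_handler)
		m.killed_flag = true;

	return m.warned;
}
```

Calling `simulate_flush_race(true)` reports that the modeled check fires, while `simulate_flush_race(false)` does not, which matches the intent of moving `set_bit()` after `handle_sigkill()`. The model covers only this single interleaving and says nothing about other paths that might reach the check.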

Michael S. Tsirkin

Sep 18, 2025, 11:02:30 AM
to Sean Christopherson, syzbot, linux-...@vger.kernel.org, lu...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
On Thu, Sep 18, 2025 at 07:56:48AM -0700, Sean Christopherson wrote:
> +Michael
>
> Michael, this is the VHOST_TASK_FLAGS_KILLED WARN that was added[*] to detect
> violations similar to KVM.
>
> /*
> * Checking VHOST_TASK_FLAGS_KILLED can race with signal delivery, but
> * a race can only result in false negatives and this is just a sanity
> * check, i.e. if KILLED is set, the caller is buggy no matter what.
> */
> if (WARN_ON_ONCE(test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags)))
> return;
>
> I haven't been able to repro the splat, but after much staring I think the issue
> is that vhost_task_fn() marks the task KILLED before invoking handle_sigkill().
> If vhost_worker_flush() already holds worker->mutex, before vhost_worker_killed()
> runs, then it could wake a (not yet dead) task that has KILLED set.
>
> Assuming waiting to set KILLED until after handle_sigkill() resolves the issue
> (fingers crossed), the two options I see would be to apply the below as fixup,
> or simply drop the sanity check for the 6.17 and add it back in 6.18 in conjunction
> with the below (again, assuming it actually resolves the issue).
>
> [*] https://lore.kernel.org/all/20250827194107....@google.com

I just sent this one to Linus. Enough?

syzbot

Sep 18, 2025, 11:52:08 AM
to linux-...@vger.kernel.org, lu...@kernel.org, m...@redhat.com, pet...@infradead.org, sea...@google.com, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+a1a3ce...@syzkaller.appspotmail.com
Tested-by: syzbot+a1a3ce...@syzkaller.appspotmail.com

Tested on:

commit: ae2d2000 Add linux-next specific files for 20250917
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1070cf62580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d737cfaddae0058c
dashboard link: https://syzkaller.appspot.com/bug?extid=a1a3cefd6148c781117c
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
patch: https://syzkaller.appspot.com/x/patch.diff?x=14bca534580000

Note: testing is done by a robot and is best-effort only.