[syzbot] [kvm?] WARNING in vmx_handle_exit (2)

syzbot

Dec 11, 2024, 8:12:27 AM
to b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, sea...@google.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: b5f217084ab3 Merge tag 'bpf-fixes' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1226b330580000
kernel config: https://syzkaller.appspot.com/x/.config?x=9d99f0bff41614d0
dashboard link: https://syzkaller.appspot.com/bug?extid=ac0bc3a70282b4d586cc
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17d10820580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-b5f21708.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/4a2037d50b27/vmlinux-b5f21708.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e9e9c9c88191/bzImage-b5f21708.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac0bc3...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 3 PID: 6336 at arch/x86/kvm/vmx/vmx.c:6480 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6480 [inline]
WARNING: CPU: 3 PID: 6336 at arch/x86/kvm/vmx/vmx.c:6480 vmx_handle_exit+0x40f/0x1f70 arch/x86/kvm/vmx/vmx.c:6637
Modules linked in:
CPU: 3 UID: 0 PID: 6336 Comm: syz.0.73 Not tainted 6.13.0-rc1-syzkaller-00316-gb5f217084ab3 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6480 [inline]
RIP: 0010:vmx_handle_exit+0x40f/0x1f70 arch/x86/kvm/vmx/vmx.c:6637
Code: 07 38 d0 7f 08 84 c0 0f 85 b1 11 00 00 44 0f b6 a5 49 99 00 00 31 ff 44 89 e6 e8 8c 73 68 00 45 84 e4 75 52 e8 a2 71 68 00 90 <0f> 0b 90 48 8d bd 4a 99 00 00 c6 85 49 99 00 00 01 48 b8 00 00 00
RSP: 0018:ffffc90003a57a58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88803fa10000 RCX: ffffffff81319494
RDX: ffff888021152440 RSI: ffffffff8131949e RDI: 0000000000000001
RBP: ffffc900066bf000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000080000021 R14: ffff88803fa102d8 R15: dffffc0000000000
FS: 00007f5d3ac1e6c0(0000) GS:ffff88806a900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000001200e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vcpu_enter_guest arch/x86/kvm/x86.c:11081 [inline]
vcpu_run+0x3047/0x4f50 arch/x86/kvm/x86.c:11242
kvm_arch_vcpu_ioctl_run+0x44a/0x1740 arch/x86/kvm/x86.c:11560
kvm_vcpu_ioctl+0x6ce/0x1520 virt/kvm/kvm_main.c:4340
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x190/0x200 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5d39d7fed9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5d3ac1e058 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f5d39f46080 RCX: 00007f5d39d7fed9
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 00007f5d39df3cc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f5d39f46080 R15: 00007ffdd579bc48
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

syzbot

Dec 22, 2024, 6:10:28 PM
to b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, sea...@google.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
syzbot has found a reproducer for the following issue on:

HEAD commit: bcde95ce32b6 Merge tag 'devicetree-fixes-for-6.13-1' of gi..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10635fe8580000
kernel config: https://syzkaller.appspot.com/x/.config?x=4f1586bab1323870
dashboard link: https://syzkaller.appspot.com/bug?extid=ac0bc3a70282b4d586cc
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=129c58c4580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=134e5f30580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-bcde95ce.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d1b2e8d294e3/vmlinux-bcde95ce.xz
kernel image: https://storage.googleapis.com/syzbot-assets/593ff4631acc/bzImage-bcde95ce.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac0bc3...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6008 at arch/x86/kvm/vmx/vmx.c:6480 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6480 [inline]
WARNING: CPU: 1 PID: 6008 at arch/x86/kvm/vmx/vmx.c:6480 vmx_handle_exit+0x40f/0x1f70 arch/x86/kvm/vmx/vmx.c:6637
Modules linked in:
CPU: 1 UID: 0 PID: 6008 Comm: syz-executor324 Not tainted 6.13.0-rc3-syzkaller-00301-gbcde95ce32b6 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6480 [inline]
RIP: 0010:vmx_handle_exit+0x40f/0x1f70 arch/x86/kvm/vmx/vmx.c:6637
Code: 07 38 d0 7f 08 84 c0 0f 85 b1 11 00 00 44 0f b6 a5 49 99 00 00 31 ff 44 89 e6 e8 4c 86 68 00 45 84 e4 75 52 e8 62 84 68 00 90 <0f> 0b 90 48 8d bd 4a 99 00 00 c6 85 49 99 00 00 01 48 b8 00 00 00
RSP: 0018:ffffc90003d17a58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888031ba0000 RCX: ffffffff81319144
RDX: ffff8880230ea440 RSI: ffffffff8131914e RDI: 0000000000000001
RBP: ffffc9000428c000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000080000021 R14: ffff888031ba02d8 R15: dffffc0000000000
FS: 00007f4811d6b6c0(0000) GS:ffff88806a700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4811d4ad58 CR3: 0000000030878000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
vcpu_enter_guest arch/x86/kvm/x86.c:11081 [inline]
vcpu_run+0x3047/0x4f50 arch/x86/kvm/x86.c:11242
kvm_arch_vcpu_ioctl_run+0x44a/0x1740 arch/x86/kvm/x86.c:11560
kvm_vcpu_ioctl+0x6ce/0x1520 virt/kvm/kvm_main.c:4340
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x190/0x200 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4811dbb649
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 1c 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4811d6b168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f4811e3d348 RCX: 00007f4811dbb649
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
RBP: 00007f4811e3d340 R08: 00007f4811d6b6c0 R09: 0000000000000000
R10: 00007f4811d6b6c0 R11: 0000000000000246 R12: 00007f4811e3d34c
R13: 0000000000000000 R14: 00007ffea79b5e30 R15: 00007ffea79b5f18
</TASK>


---

Sean Christopherson

Feb 12, 2025, 5:50:54 PM
to James Houghton, syzbot+ac0bc3...@syzkaller.appspotmail.com, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
On Wed, Feb 12, 2025, James Houghton wrote:
> Here's what I think is going on (with the C repro anyway):
>
> 1. KVM_RUN a nested VM, and eventually we end up with
> nested_run_pending=1.
> 2. Exit KVM_RUN with EINTR (or any reason really, but I see EINTR in
> repro attempts).
> 3. KVM_SET_REGS to set rflags to 0x1ac585, which has X86_EFLAGS_VM,
> flipping it and setting vmx->emulation_required = true.
> 4. KVM_RUN again. vmx->emulation_required will stop KVM from clearing
> nested_run_pending, and then we hit the
> KVM_BUG_ON(nested_run_pending) in __vmx_handle_exit().
>
> So I guess the KVM_BUG_ON() is a little bit too conservative, but this
> is nonsensical VMM behavior. So I'm not really sure what the best
> solution is. Sean, any thoughts?
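
As a quick check of step 3 above, the following minimal sketch confirms that the RFLAGS value from the reproducer has the virtual-8086 bit set. EFLAGS.VM is architecturally bit 17; the helper name `rflags_has_vm` is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS.VM (virtual-8086 mode) is bit 17 of RFLAGS. */
#define X86_EFLAGS_VM (UINT64_C(1) << 17)

/* Hypothetical helper: true if the RFLAGS value requests virtual-8086 mode. */
static bool rflags_has_vm(uint64_t rflags)
{
	return (rflags & X86_EFLAGS_VM) != 0;
}
```

Calling `rflags_has_vm(0x1ac585)` returns true, since 0x1ac585 includes 0x20000.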

Heh, deja vu. This is essentially the same thing that was fixed by commit
fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending
exception"), just with a different WARN.

This should fix it. Checking nested_run_pending in handle_invalid_guest_state()
is overkill, but it can't possibly do any harm, and the weirdness can be addressed
with a comment.

---
arch/x86/kvm/vmx/vmx.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f72835e85b6d..8c9428244cc6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5869,11 +5869,17 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
 	return 1;
 }

-static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu)
+static bool vmx_unhandleable_emulation_required(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

-	return vmx->emulation_required && !vmx->rmode.vm86_active &&
+	if (!vmx->emulation_required)
+		return false;
+
+	if (vmx->nested.nested_run_pending)
+		return true;
+
+	return !vmx->rmode.vm86_active &&
 	       (kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected);
 }

@@ -5896,7 +5902,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 		if (!kvm_emulate_instruction(vcpu, 0))
 			return 0;

-		if (vmx_emulation_required_with_pending_exception(vcpu)) {
+		if (vmx_unhandleable_emulation_required(vcpu)) {
 			kvm_prepare_emulation_failure_exit(vcpu);
 			return 0;
 		}
@@ -5920,7 +5926,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)

 int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
-	if (vmx_emulation_required_with_pending_exception(vcpu)) {
+	if (vmx_unhandleable_emulation_required(vcpu)) {
 		kvm_prepare_emulation_failure_exit(vcpu);
 		return 0;
 	}

base-commit: b1da62b213ed5f01d7ead4d14e9d51b48b6256e4
--

James Houghton

Feb 13, 2025, 4:20:29 AM
to syzbot+ac0bc3...@syzkaller.appspotmail.com, sea...@google.com, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org

James Houghton

Feb 13, 2025, 4:20:47 AM
to Sean Christopherson, syzbot+ac0bc3...@syzkaller.appspotmail.com, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
On Wed, Feb 12, 2025 at 2:50 PM Sean Christopherson <sea...@google.com> wrote:
>
> On Wed, Feb 12, 2025, James Houghton wrote:
> > Here's what I think is going on (with the C repro anyway):
> >
> > 1. KVM_RUN a nested VM, and eventually we end up with
> > nested_run_pending=1.
> > 2. Exit KVM_RUN with EINTR (or any reason really, but I see EINTR in
> > repro attempts).
> > 3. KVM_SET_REGS to set rflags to 0x1ac585, which has X86_EFLAGS_VM,
> > flipping it and setting vmx->emulation_required = true.
> > 4. KVM_RUN again. vmx->emulation_required will stop KVM from clearing
> > nested_run_pending, and then we hit the
> > KVM_BUG_ON(nested_run_pending) in __vmx_handle_exit().
> >
> > So I guess the KVM_BUG_ON() is a little bit too conservative, but this
> > is nonsensical VMM behavior. So I'm not really sure what the best
> > solution is. Sean, any thoughts?
>
> Heh, deja vu. This is essentially the same thing that was fixed by commit
> fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending
> exception"), just with a different WARN.
>
> This should fix it. Checking nested_run_pending in handle_invalid_guest_state()
> is overkill, but it can't possibly do any harm, and the weirdness can be addressed
> with a comment.

Thanks Sean! This works, feel free to add:

Tested-by: James Houghton <jthou...@google.com>

I understand this fix as "KVM cannot emulate a nested vm-enter, so if
emulation is required and we have a pending vm-enter, exit to
userspace." (This doesn't seem overkill to me... perhaps this
explanation is wrong.)
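
For reference, the decision made by the patched check can be modeled in plain userspace C. This is only a sketch: `struct vcpu_model` and `model_unhandleable_emulation_required` are hypothetical stand-ins for the kernel's `struct vcpu_vmx` fields and the renamed helper, and the two exception conditions from the patch are collapsed into a single flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few vCPU fields the check reads. */
struct vcpu_model {
	bool emulation_required;  /* guest state is invalid for VM-Enter */
	bool nested_run_pending;  /* nested VMLAUNCH/VMRESUME not yet completed */
	bool vm86_active;         /* models vmx->rmode.vm86_active */
	bool exception_pending;   /* pending or already-injected exception */
};

/* Mirrors the branch order of the patched helper in the diff. */
static bool model_unhandleable_emulation_required(const struct vcpu_model *v)
{
	if (!v->emulation_required)
		return false;

	/* New case: a pending nested VM-Enter cannot be emulated. */
	if (v->nested_run_pending)
		return true;

	/* Pre-existing case: exception pending outside of vm86 mode. */
	return !v->vm86_active && v->exception_pending;
}
```

With `emulation_required` and `nested_run_pending` both set, the model returns true regardless of the exception state, which corresponds to exiting to userspace with an emulation failure.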

Sean Christopherson

Feb 13, 2025, 10:47:39 AM
to James Houghton, syzbot+ac0bc3...@syzkaller.appspotmail.com, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, k...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pbon...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
Sort of. It's a horribly convoluted scenario that exists only because early Intel
CPUs supported a half-baked version of VMX.

Emulation is "required" if and only if guest state is invalid, and VMRESUME/VMLAUNCH
VM-Fail (architecturally) if guest state is invalid. Thus the only way for emulation
to be required when a nested VM-Enter is pending, i.e. after nested VMRESUME/VMLAUNCH
has succeeded but before KVM has entered L2 to complete emulation, is if KVM misses a
VM-Fail consistency check, or as is the case here, if userspace stuffs invalid state
while KVM is partway through VMRESUME/VMLAUNCH emulation.

Stuffing state from userspace doesn't put KVM in harm's way, but KVM can't emulate
the impossible state, and more importantly, it trips KVM's sanity check that detects
missed consistency checks. The KVM_BUG_ON() could also be suppressed by moving the
nested_run_pending check below the emulation_required checks (see below), but that
would largely defeat the purpose of the sanity check.
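
The interaction between the pre-run check and the sanity check can be sketched as a small userspace model. Everything here is a simplification under stated assumptions: `struct run_model`, `enum run_outcome`, and `model_kvm_run` are hypothetical names, pending exceptions are ignored, and the `pre_run_checks_nested` flag merely selects whether the pre-run check also considers a pending nested VM-Enter.

```c
#include <stdbool.h>

/* Hypothetical outcomes of one modeled KVM_RUN. */
enum run_outcome {
	RUN_EXIT_EMULATION_FAILURE, /* rejected before entry, reported to userspace */
	RUN_BUG_NESTED_RUN_PENDING, /* the sanity check fired */
	RUN_EMULATE_INVALID_STATE,  /* invalid state handled by emulation */
	RUN_ENTERED_GUEST,          /* hardware VM-Enter happened */
};

struct run_model {
	bool emulation_required;
	bool nested_run_pending;
};

static enum run_outcome model_kvm_run(struct run_model *v,
				      bool pre_run_checks_nested)
{
	/* Pre-run check: bail to userspace on the unemulatable combination. */
	if (pre_run_checks_nested && v->emulation_required &&
	    v->nested_run_pending)
		return RUN_EXIT_EMULATION_FAILURE;

	/* A real VM-Enter completes the nested run and clears the flag. */
	if (!v->emulation_required) {
		v->nested_run_pending = false;
		return RUN_ENTERED_GUEST;
	}

	/* Entry was skipped, so a stale pending flag trips the sanity check. */
	if (v->nested_run_pending)
		return RUN_BUG_NESTED_RUN_PENDING;

	return RUN_EMULATE_INVALID_STATE;
}
```

In this model, the state userspace stuffed (both flags set) reaches `RUN_BUG_NESTED_RUN_PENDING` only when the pre-run check ignores the pending nested VM-Enter; with the check enabled, the run is rejected first and the sanity check keeps its value for detecting missed consistency checks.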

Just out of sight in the below diff is related handling for the case where userspace,
or the guest itself via modifying SMRAM before RSM, stuffs bad state. I.e. it's
the same scenario this syzkaller program hit, minus hitting the nested_run_pending=true
window.

		/*
		 * Synthesize a triple fault if L2 state is invalid. In normal
		 * operation, nested VM-Enter rejects any attempt to enter L2
		 * with invalid state. However, those checks are skipped if
		 * state is being stuffed via RSM or KVM_SET_NESTED_STATE. If
		 * L2 state is invalid, it means either L1 modified SMRAM state
		 * or userspace provided bad state. Synthesize TRIPLE_FAULT as
		 * doing so is architecturally allowed in the RSM case, and is
		 * the least awful solution for the userspace case without
		 * risking false positives.
		 */
		if (vmx->emulation_required) {
			nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
			return 1;
		}

The extra wrinkle in all of this is that emulation_required is only ever set if
the vCPU lacks Unrestricted Guest (URG). All CPUs since Westmere support URG, and
while KVM does allow disabling URG via module param, AFAIK syzbot doesn't run in
environments with enable_unrestricted_guest=0 (other people do run syzkaller in
such setups, but syzbot does not).

And so the only way for guest state to be invalid (i.e. for emulation_required to be set),
is if L1 is running L2 with URG disabled. I.e. KVM _could_ simply run L2, but
doing so would violate the VMX architecture from L1's perspective.

static inline bool vmx_guest_state_valid(struct kvm_vcpu *vcpu)
{
	return is_unrestricted_guest(vcpu) || __vmx_guest_state_valid(vcpu);
}

static inline bool is_unrestricted_guest(struct kvm_vcpu *vcpu)
{
	return enable_unrestricted_guest && (!is_guest_mode(vcpu) ||
	    (secondary_exec_controls_get(to_vmx(vcpu)) &
	    SECONDARY_EXEC_UNRESTRICTED_GUEST));
}
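
To make the consequence of those two helpers concrete, here is a minimal userspace model. It is a sketch under assumptions: `struct urg_model` and both functions are hypothetical, the result of the strict state checks is reduced to a single boolean, and emulation is treated as required exactly when the modeled guest state is invalid.

```c
#include <stdbool.h>

/* Hypothetical inputs to the guest-state validity decision. */
struct urg_model {
	bool module_param_urg; /* models enable_unrestricted_guest */
	bool guest_mode;       /* vCPU is running L2 */
	bool l1_enabled_urg;   /* L1 set SECONDARY_EXEC_UNRESTRICTED_GUEST for L2 */
	bool state_valid;      /* outcome of the strict guest-state checks */
};

static bool model_is_unrestricted_guest(const struct urg_model *m)
{
	return m->module_param_urg && (!m->guest_mode || m->l1_enabled_urg);
}

/* Assumption: emulation is required iff guest state is considered invalid. */
static bool model_emulation_required(const struct urg_model *m)
{
	return !(model_is_unrestricted_guest(m) || m->state_valid);
}
```

With the module param left enabled, the model only reports that emulation is required for an L2 vCPU whose L1 disabled URG and whose state fails the strict checks.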

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f72835e85b6d..42bee8f2cffb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6492,15 +6492,6 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (enable_pml && !is_guest_mode(vcpu))
 		vmx_flush_pml_buffer(vcpu);

-	/*
-	 * KVM should never reach this point with a pending nested VM-Enter.
-	 * More specifically, short-circuiting VM-Entry to emulate L2 due to
-	 * invalid guest state should never happen as that means KVM knowingly
-	 * allowed a nested VM-Enter with an invalid vmcs12. More below.
-	 */
-	if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
-		return -EIO;
-
 	if (is_guest_mode(vcpu)) {
 		/*
 		 * PML is never enabled when running L2, bail immediately if a
@@ -6538,10 +6529,16 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 			return 1;
 		}

+		if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
+			return -EIO;
+
 		if (nested_vmx_reflect_vmexit(vcpu))
 			return 1;
 	}

+	if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
+		return -EIO;
+
 	/* If guest state is invalid, start emulating. L2 is handled above. */
 	if (vmx->emulation_required)
 		return handle_invalid_guest_state(vcpu);
