KVM Nested L2 guest startup problems

Hu Yaohui

May 1, 2014, 9:50:02 PM

Hi all,
I have a problem running the latest version of KVM in a nested configuration.
I used to run kernel 3.2.2 on both L0 and L1, which worked perfectly.
After changing L0 to kernel 3.10.36 and L1 to kernel 3.12.10, I get the
following error from qemu-kvm when I start an L2 guest in L1:
<log>
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <2e> 66 83 3e 24 d3 00 0f 85 e2 e5 31 c0 8e d0 66 bc 00 70 00 00 66 ba 4c 28 0f 00 e9 07 e5
</log>

In L0, the kernel logs an error like this:
<log>
[310681.735709] nested_vmx_exit_handled failed vm entry 7
</log>
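
The trailing number in this message appears to be the VMX VM-instruction error reported to L0. Assuming that reading is right, the Intel SDM lists error 7 as "VM entry with invalid control field(s)". The sketch below decodes a few of these numbers; the function name is hypothetical and is not part of KVM.

```c
/*
 * Illustrative lookup of a few VM-instruction error numbers as listed in
 * the Intel SDM. vm_instruction_error_name() is a hypothetical helper,
 * not a KVM function, and the table is deliberately incomplete.
 */
static const char *vm_instruction_error_name(unsigned int err)
{
	switch (err) {
	case 4:
		return "VMLAUNCH with non-clear VMCS";
	case 5:
		return "VMRESUME with non-launched VMCS";
	case 7:
		return "VM entry with invalid control field(s)";
	case 8:
		return "VM entry with invalid host-state field(s)";
	default:
		return "other (see the Intel SDM)";
	}
}
```

Passing 7 returns the "invalid control field(s)" string, which points at the VM-execution controls rather than at guest or host state.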

I am wondering whether anyone has run into a similar problem.

Best Wishes,
Yaohui
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Paolo Bonzini

May 2, 2014, 6:00:01 AM

On 02/05/2014 03:43, Hu Yaohui wrote:
> Hi all,
> I have a problem running the latest version of kvm with nested configuration.
> I used to run it with kernel 3.2.2 both for L0 and L1, which works perfectly.
> When I change my L0 to kernel 3.10.36, L1 to kernel 3.12.10.
> When I start L2 guest in L1 with qemu-kvm. I get the following error
> from the qemu-kvm.

Try upgrading L0 to a more recent kernel; 3.13 should be enough.

Paolo

Hu Yaohui

May 2, 2014, 11:20:02 AM

Hi Paolo,
I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
L1 QEMU qemu-1.7.0
L2 QEMU qemu-1.7.0.
I still get the same error when running qemu in L1 guest.

<log>
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 83 c8 ff eb 03 66 89 c8 66 5b 66 5e 66 5f 66 5d 66 c3 90 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
</log>

The L0 KVM kernel module logs the same error:
<log>
May 2 11:09:51 o46 kernel: [37940.019338] kvm: zapping shadow pages for mmio generation wraparound
May 2 11:12:32 o46 kernel: [38100.616392] nested_vmx_exit_handled failed vm entry 7
</log>

Thanks,
Yaohui

Paolo Bonzini

May 2, 2014, 12:00:03 PM

On 02/05/2014 17:17, Hu Yaohui wrote:
> Hi Paolo,
> I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
> L1 QEMU qemu-1.7.0
> L2 QEMU qemu-1.7.0.

Do you mean L0 and L1?

What is your QEMU command line, and what is the processor? Also, what
guest are you running?

Paolo

> I still get the same error when running qemu in L1 guest.

Hu Yaohui

May 2, 2014, 1:10:03 PM

On Fri, May 2, 2014 at 11:52 AM, Paolo Bonzini <pbon...@redhat.com> wrote:
> On 02/05/2014 17:17, Hu Yaohui wrote:
>
>> Hi Paolo,
>> I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
>> L1 QEMU qemu-1.7.0
>> L2 QEMU qemu-1.7.0.
>
>
> Do you mean L0 and L1?

Yes.

>
> What is your QEMU command line, and what is the processor? Also, what guest
> you are running?
>

L0 host
- Debian 7 with linux kernel 3.14.2
- 24 pCPU, 120G pMEM
- CPU model: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
- QEMU command line of L1
$ sudo qemu-system-x86_64 -machine accel=kvm \
    -drive file=vdisk.img,if=virtio -m 4096 -smp 10 \
    -net nic,model=virtio,macaddr=52:54:00:12:34:80 -cpu kvm64,+vmx \
    -net tap,ifname=qtap0,script=no,downscript=no -vnc :2

L1 guest
- Ubuntu 10.04 with linux kernel 3.14.2
- QEMU command line of L2
$ qemu-system-x86_64 -machine accel=kvm -smp 2 -boot c \
    -drive file=/home/nested/vmdisks/vdisk1-virtnet.img,if=virtio \
    -m 2048 -vnc :4 \
    -net nic,model=virtio,macaddr=52:54:00:12:34:90 \
    -net tap,ifname=qtap0,script=no,downscript=no

L2 guest
- Ubuntu 10.04 with linux kernel 2.6.32

Bandan Das

May 2, 2014, 2:40:01 PM

Hu Yaohui <loki...@gmail.com> writes:

> On Fri, May 2, 2014 at 11:52 AM, Paolo Bonzini <pbon...@redhat.com> wrote:
>> On 02/05/2014 17:17, Hu Yaohui wrote:
>>
>>> Hi Paolo,
>>> I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
>>> L1 QEMU qemu-1.7.0
>>> L2 QEMU qemu-1.7.0.
>>
>>
>> Do you mean L0 and L1?
> Yes.
>>
>> What is your QEMU command line, and what is the processor? Also, what guest
>> you are running?
>>
> L0 host
> - Debian 7 with linux kernel 3.14.2
> - 24 pCPU, 120G pMEM
> - cpu mode: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

Ivy Bridge-EP? This looks similar to
https://bugzilla.kernel.org/show_bug.cgi?id=73331

Just out of curiosity, is there any difference if you run with ept=0?

> - QEMU command line of L1
> $ sudo qemu-system-x86_64 -machine accel=kvm -drive
> file=vdisk.img,if=virtio -m 4096 -smp 10 -net
> nic,model=virtio,macaddr=52:54:00:12:34:80 -cpu kvm64,+vmx -net
> tap,ifname=qtap0,script=no,downscript=no -vnc :2
>
> L1 guest
> - Ubuntu 10.04 with linux kernel 3.14.2
> - QEMU command line of L2
> $qemu-system-x86_64 -machine accel=kvm -smp 2 -boot c -drive
> file=/home/nested/vmdisks/vdisk1-virtnet.img,if=virtio -m 2048 -vnc :4
> -net nic,model=virtio,macaddr=52:54:00:12:34:90 -net
> tap,ifname=qtap0,script=no,downscript=no
>
> L2 guest
> - Ubuntu 10.04 with linux kernel 2.6.32
>> Paolo
>>
>>
>>> I still get the same error when running qemu in L1 guest.
>>
>>

Hu Yaohui

May 2, 2014, 4:20:02 PM

On Fri, May 2, 2014 at 2:39 PM, Bandan Das <b...@redhat.com> wrote:
> Hu Yaohui <loki...@gmail.com> writes:
>
>> On Fri, May 2, 2014 at 11:52 AM, Paolo Bonzini <pbon...@redhat.com> wrote:
>>> On 02/05/2014 17:17, Hu Yaohui wrote:
>>>
>>>> Hi Paolo,
>>>> I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
>>>> L1 QEMU qemu-1.7.0
>>>> L2 QEMU qemu-1.7.0.
>>>
>>>
>>> Do you mean L0 and L1?
>> Yes.
>>>
>>> What is your QEMU command line, and what is the processor? Also, what guest
>>> you are running?
>>>
>> L0 host
>> - Debian 7 with linux kernel 3.14.2
>> - 24 pCPU, 120G pMEM
>> - cpu mode: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>
> Ivy Bridge-EP ? Looks similar to
> https://bugzilla.kernel.org/show_bug.cgi?id=73331
>
> Just out of curiosity, any difference if you run with ept=0 ?

I have tried it. I get the same error with L0 kvm ept=1 and L1 kvm ept=0.
Do you have any idea how the Ivy Bridge-EP problem was solved?

Abel Gordon

May 4, 2014, 11:00:02 AM

On Fri, May 2, 2014 at 11:11 PM, Hu Yaohui <loki...@gmail.com> wrote:
>
> On Fri, May 2, 2014 at 2:39 PM, Bandan Das <b...@redhat.com> wrote:
> > Hu Yaohui <loki...@gmail.com> writes:
> >
> >> On Fri, May 2, 2014 at 11:52 AM, Paolo Bonzini <pbon...@redhat.com> wrote:
> >>> On 02/05/2014 17:17, Hu Yaohui wrote:
> >>>
> >>>> Hi Paolo,
> >>>> I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
> >>>> L1 QEMU qemu-1.7.0
> >>>> L2 QEMU qemu-1.7.0.
> >>>
> >>>
> >>> Do you mean L0 and L1?
> >> Yes.
> >>>
> >>> What is your QEMU command line, and what is the processor? Also, what guest
> >>> you are running?
> >>>
> >> L0 host
> >> - Debian 7 with linux kernel 3.14.2
> >> - 24 pCPU, 120G pMEM
> >> - cpu mode: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
> >
> > Ivy Bridge-EP ? Looks similar to
> > https://bugzilla.kernel.org/show_bug.cgi?id=73331
> >
> > Just out of curiosity, any difference if you run with ept=0 ?
> I have tried it. The same error with L0 kvm ept=1 and L1 kvm ept=0
> Do you have any idea how the Ivy Bridge-EP problem is solved?

I experienced a similar problem that was caused by bugs in the nested
code related to apicv and other new VMX features.

For example, the code enabled posted interrupts to run L2 even when the
feature was not exposed to L1 and L1 didn't use it.

Try changing prepare_vmcs02 to force-disable posted interrupts. The
code should look like this:

...
exec_control = vmcs12->pin_based_vm_exec_control;
exec_control |= vmcs_config.pin_based_exec_ctrl;
/* never run L2 with the preemption timer or posted interrupts set */
exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER |
                  PIN_BASED_POSTED_INTR);
vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
...

and also

...
exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
                  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
                  SECONDARY_EXEC_APIC_REGISTER_VIRT |
                  SECONDARY_EXEC_PAUSE_LOOP_EXITING);
...

We also experienced issues using apicv for L1 while running an L2 guest
with no apicv, so also load kvm_intel with enable_apicv=0.

Hope this solves your problem...
You are welcome to upstream the changes if it does :)
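
The pin-based masking suggested above can be tried outside the kernel as plain bit arithmetic. This is a minimal sketch, not the actual prepare_vmcs02 code: the helper name is hypothetical, and the bit values are assumed to match the PIN_BASED_* definitions in the kernel's asm/vmx.h.

```c
#include <stdint.h>

/* Bit values assumed to match arch/x86/include/asm/vmx.h. */
#define PIN_BASED_EXT_INTR_MASK        0x00000001u
#define PIN_BASED_NMI_EXITING          0x00000008u
#define PIN_BASED_VMX_PREEMPTION_TIMER 0x00000040u
#define PIN_BASED_POSTED_INTR          0x00000080u

/*
 * Hypothetical helper: merge the controls L1 requested (vmcs12) with the
 * controls L0 uses itself, then clear the two features that must not be
 * active while L2 runs.
 */
static uint32_t vmcs02_pin_controls(uint32_t vmcs12_ctl, uint32_t host_ctl)
{
	uint32_t exec_control = vmcs12_ctl | host_ctl;

	exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER |
			  PIN_BASED_POSTED_INTR);
	return exec_control;
}
```

With a host value that has posted interrupts set (for example 0x89) and an L1 request of 0x01, the helper yields 0x09: the external-interrupt and NMI exiting bits survive, and the posted-interrupt bit is dropped.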

Hu Yaohui

May 4, 2014, 12:40:05 PM

Hi Abel,
Thanks a lot! It works now.

Best Wishes,
Yaohui

Paolo Bonzini

May 7, 2014, 5:00:04 AM

On 04/05/2014 18:33, Hu Yaohui wrote:
>> I experienced a similar problem that was related to nested code
>> having some bugs related to apicv and other new vmx features.
>>
>> For example, the code enabled posted interrupts to run L2 even when the
>> feature was not exposed to L1 and L1 didn't use it.
>>
>> Try changing prepare_vmcs02 to force disabling posted_interrupts,
>> code should looks like:
>>
>> ....
>> ....
>> exec_control = vmcs12->pin_based_vm_exec_control;
>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>> exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER|PIN_BASED_POSTED_INTR);
>> vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
>> ....
>> ...
>>
>> and also
>>
>> ...
>> ...
>> exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>> SECONDARY_EXEC_APIC_REGISTER_VIRT |
>> SECONDARY_EXEC_PAUSE_LOOP_EXITING);

PLE should be left enabled, I think.

Apart from that, I'll change the suggestion into a patch.

Thanks!

Paolo

Abel Gordon

May 7, 2014, 7:20:02 AM

On Wed, May 7, 2014 at 11:58 AM, Paolo Bonzini <pbon...@redhat.com> wrote:
> On 04/05/2014 18:33, Hu Yaohui wrote:
>
>>> I experienced a similar problem that was related to nested code
>>> having some bugs related to apicv and other new vmx features.
>>>
>>> For example, the code enabled posted interrupts to run L2 even when the
>>> feature was not exposed to L1 and L1 didn't use it.
>>>
>>> Try changing prepare_vmcs02 to force disabling posted_interrupts,
>>> code should looks like:
>>>
>>> ....
>>> ....
>>> exec_control = vmcs12->pin_based_vm_exec_control;
>>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>>> exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER|PIN_BASED_POSTED_INTR);
>>> vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
>>> ....
>>> ...
>>>
>>> and also
>>>
>>> ...
>>> ...
>>> exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>> SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>> SECONDARY_EXEC_PAUSE_LOOP_EXITING);
>
>
> PLE should be left enabled, I think.

Well... the PLE settings L0 uses to run L1 (vmcs01) may be different
than the PLE settings L1 configured to run L2 (vmcs12).
For example, L0 can use a ple_gap to run L1 that is bigger than the
ple_gap L1 configured to run L2. Or L0 can use a ple_window to run L1
that is smaller than the ple_window L1 configured to run L2.

So it seems that either PLE should never be exposed to L1, or appropriate
nested handling needs to be implemented. Note that the handling may become
complex, because in some cases a PLE exit from L2 should be handled directly
by L0 and not passed to L1... remember nested preemption timer support? :)

>
> Apart from that, I'll change the suggestion into a patch.

Great!

Paolo Bonzini

May 7, 2014, 7:40:02 AM

On 07/05/2014 13:16, Abel Gordon wrote:
>> > PLE should be left enabled, I think.
> Well... the PLE settings L0 uses to run L1 (vmcs01) may be different
> than the PLE settings L1 configured to run L2 (vmcs12).
> For example, L0 can use a ple_gap to run L1 that is bigger than the
> ple_gap L1 configured to run L2. Or L0 can use a ple_window to run L1
> that is smaller than the ple_window L1 configured to run L2.

That's correct. We should leave PLE enabled while running L2, but hide
the feature altogether from L1.

Paolo

> So seems PLE should never be exposed to L1 or an appropriate nested
> handling needs to be implemented. Note the handling may become complex
> because in some cases a PLE exit from L2 should be handled directly by
> L0 and not passed to L1... remember nested preemption timer support :)
> ?

Paolo Bonzini

May 7, 2014, 7:50:03 AM

On 07/05/2014 13:37, Paolo Bonzini wrote:
> On 07/05/2014 13:16, Abel Gordon wrote:
>>> > PLE should be left enabled, I think.
>> Well... the PLE settings L0 uses to run L1 (vmcs01) may be different
>> than the PLE settings L1 configured to run L2 (vmcs12).
>> For example, L0 can use a ple_gap to run L1 that is bigger than the
>> ple_gap L1 configured to run L2. Or L0 can use a ple_window to run L1
>> that is smaller than the ple_window L1 configured to run L2.
>
> That's correct. We should leave PLE enabled while running L2, but hide
> the feature altogether from L1.

... which we already do. The only secondary execution controls we allow
are APIC page, unrestricted guest, WBINVD exits, and of course EPT.

Paolo

Abel Gordon

May 7, 2014, 11:40:03 AM

On Wed, May 7, 2014 at 2:40 PM, Paolo Bonzini <pbon...@redhat.com> wrote:
> On 07/05/2014 13:37, Paolo Bonzini wrote:
>
>> On 07/05/2014 13:16, Abel Gordon wrote:
>>>>
>>>> > PLE should be left enabled, I think.
>>>
>>> Well... the PLE settings L0 uses to run L1 (vmcs01) may be different
>>> than the PLE settings L1 configured to run L2 (vmcs12).
>>> For example, L0 can use a ple_gap to run L1 that is bigger than the
>>> ple_gap L1 configured to run L2. Or L0 can use a ple_window to run L1
>>> that is smaller than the ple_window L1 configured to run L2.
>>
>>
>> That's correct. We should leave PLE enabled while running L2, but hide
>> the feature altogether from L1.
>
>
> ... which we already do. The only secondary execution controls we allow are
> APIC page, unrestricted guest, WBINVD exits, and of course EPT.

But we don't verify whether L1 tries to enable the feature for L2 (even if
it's not exposed)... Or do we?

Paolo Bonzini

May 7, 2014, 11:50:04 AM

On 07/05/2014 17:30, Abel Gordon wrote:
> > ... which we already do. The only secondary execution controls we allow are
> > APIC page, unrestricted guest, WBINVD exits, and of course EPT.
>
> But we don't verify if L1 tries to enable the feature for L1 (even if
> it's not exposed)... Or do we ?

Yes, we do:

if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
        nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high) ||
    !vmx_control_verify(vmcs12->secondary_vm_exec_control,
        nested_vmx_secondary_ctls_low, nested_vmx_secondary_ctls_high) ||
    !vmx_control_verify(vmcs12->pin_based_vm_exec_control,
        nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high) ||
    !vmx_control_verify(vmcs12->vm_exit_controls,
        nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high) ||
    !vmx_control_verify(vmcs12->vm_entry_controls,
        nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high))
{
        nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
        return 1;
}

Paolo
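
To see why a check of this shape rejects PLE when the feature is hidden from L1, here is a minimal user-space sketch. It assumes the verification amounts to "every bit in `low` must be set and no bit outside `high` may be set"; the helper names are hypothetical, the bit values are assumed to match the SECONDARY_EXEC_* definitions in asm/vmx.h, and the allowed mask simply mirrors the four controls named earlier in the thread (in the real kernel it also depends on host capabilities).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values assumed to match arch/x86/include/asm/vmx.h. */
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001u
#define SECONDARY_EXEC_ENABLE_EPT               0x00000002u
#define SECONDARY_EXEC_WBINVD_EXITING           0x00000040u
#define SECONDARY_EXEC_UNRESTRICTED_GUEST       0x00000080u
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400u

/*
 * Sketch of a low/high control check: bits set in `low` are mandatory,
 * bits clear in `high` are forbidden.
 */
static bool control_verify(uint32_t control, uint32_t low, uint32_t high)
{
	return ((control & high) | low) == control;
}

/* Hypothetical "allowed" mask built from the controls listed in the thread. */
static uint32_t example_secondary_high(void)
{
	return SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
	       SECONDARY_EXEC_ENABLE_EPT |
	       SECONDARY_EXEC_WBINVD_EXITING |
	       SECONDARY_EXEC_UNRESTRICTED_GUEST;
}
```

Under these assumptions, a vmcs12 that enables only EPT passes the check, while one that also sets SECONDARY_EXEC_PAUSE_LOOP_EXITING fails it, and the VM entry is then refused with VMXERR_ENTRY_INVALID_CONTROL_FIELD.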