Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

56 views
Skip to first unread message

Wanpeng Li

unread,
Jul 30, 2014, 8:10:02 AM7/30/14
to
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

TPR shadow/threshold feature is important to speed up the Windows guest.
Besides, it is a must feature for certain VMM.

We map virtual APIC page address and TPR threshold from L1 VMCS. If
TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested
in, we inject it into L1 VMM for handling.

Signed-off-by: Wanpeng Li <wanpe...@linux.intel.com>
---
arch/x86/kvm/vmx.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a3845b8..f60846c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2331,7 +2331,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
- CPU_BASED_PAUSE_EXITING |
+ CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
/*
* We can allow some features even when not supported by the
@@ -6937,7 +6937,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_MCE_DURING_VMENTRY:
return 0;
case EXIT_REASON_TPR_BELOW_THRESHOLD:
- return 1;
+ return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
case EXIT_REASON_APIC_ACCESS:
return nested_cpu_has2(vmcs12,
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7058,6 +7058,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
+ if (is_guest_mode(vcpu))
+ return;
+
if (irr == -1 || tpr < irr) {
vmcs_write32(TPR_THRESHOLD, 0);
return;
@@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (!vmx->rdtscp_enabled)
exec_control &= ~SECONDARY_EXEC_RDTSCP;
/* Take the following fields only from vmcs12 */
- exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
- SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+ exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_APIC_REGISTER_VIRT);
if (nested_cpu_has(vmcs12,
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
exec_control |= vmcs12->secondary_vm_exec_control;

if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+ struct page *virtual_apic_page;
/*
* Translate L1 physical address to host physical
* address for vmcs02. Keep the page pinned, so this
@@ -7992,6 +7995,15 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
else
vmcs_write64(APIC_ACCESS_ADDR,
page_to_phys(vmx->nested.apic_access_page));
+
+ virtual_apic_page = nested_get_page(vcpu,
+ vmcs12->virtual_apic_page_addr);
+ if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
+ page_to_phys(virtual_apic_page))
+ vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+ page_to_phys(virtual_apic_page));
+ nested_release_page(virtual_apic_page);
+
} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
exec_control |=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
@@ -8002,6 +8014,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
}

+ if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
+ vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);

/*
* Set host-state according to L0's settings (vmcs12 is irrelevant here)
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Paolo Bonzini

unread,
Jul 30, 2014, 11:30:02 AM7/30/14
to
Il 30/07/2014 14:04, Wanpeng Li ha scritto:
> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> if (!vmx->rdtscp_enabled)
> exec_control &= ~SECONDARY_EXEC_RDTSCP;
> /* Take the following fields only from vmcs12 */
> - exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> + exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> SECONDARY_EXEC_APIC_REGISTER_VIRT);

This change is wrong. You don't have to take L0's "virtualize APIC
accesses" setting into account, because while running L2 you cannot
modify L1's CR8 (only the virtual nested one).

> +
> + virtual_apic_page = nested_get_page(vcpu,
> + vmcs12->virtual_apic_page_addr);
> + if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
> + page_to_phys(virtual_apic_page))
> + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
> + page_to_phys(virtual_apic_page));
> + nested_release_page(virtual_apic_page);
> +

You cannot release this page here. You need to the exactly the same
thing that is done for apic_access_page.

One thing:

> + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);

I think you can just do this write unconditionally, since most
hypervisors will enable this. Also, you probably can add the tpr
threshold field to the read-write fields for shadow VMCS.

Paolo

Wanpeng Li

unread,
Jul 31, 2014, 4:10:01 AM7/31/14
to
Hi Paolo,
On Wed, Jul 30, 2014 at 05:20:58PM +0200, Paolo Bonzini wrote:
>Il 30/07/2014 14:04, Wanpeng Li ha scritto:
>> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> if (!vmx->rdtscp_enabled)
>> exec_control &= ~SECONDARY_EXEC_RDTSCP;
>> /* Take the following fields only from vmcs12 */
>> - exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>> - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>> + exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>> SECONDARY_EXEC_APIC_REGISTER_VIRT);
>
>This change is wrong. You don't have to take L0's "virtualize APIC
>accesses" setting into account, because while running L2 you cannot
>modify L1's CR8 (only the virtual nested one).
>

Agreed.

>> +
>> + virtual_apic_page = nested_get_page(vcpu,
>> + vmcs12->virtual_apic_page_addr);
>> + if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
>> + page_to_phys(virtual_apic_page))
>> + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
>> + page_to_phys(virtual_apic_page));
>> + nested_release_page(virtual_apic_page);
>> +
>
>You cannot release this page here. You need to the exactly the same
>thing that is done for apic_access_page.
>

Agreed.

>One thing:
>
>> + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>
>I think you can just do this write unconditionally, since most
>hypervisors will enable this. Also, you probably can add the tpr

What will happen if a hypervisor doesn't enable it? I make it more
cleaner in version two.

>threshold field to the read-write fields for shadow VMCS.
>

Agreed.

Regards,
Wanpeng Li

>Paolo
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in

Paolo Bonzini

unread,
Jul 31, 2014, 5:30:01 AM7/31/14
to
Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>> One thing:
>>
>>> + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>
>> I think you can just do this write unconditionally, since most
>> hypervisors will enable this. Also, you probably can add the tpr
>
> What will happen if a hypervisor doesn't enable it? I make it more
> cleaner in version two.

TPR_THRESHOLD will be likely written as zero, but the processor will
never use it anyway. It's just a small optimization because
nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Paolo

>> threshold field to the read-write fields for shadow VMCS.
>
> Agreed.
>
> Regards,
> Wanpeng Li

Zhang, Yang Z

unread,
Jul 31, 2014, 9:00:02 PM7/31/14
to
Paolo Bonzini wrote on 2014-07-31:
> Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>>> One thing:
>>>
>>>> + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>>
>>> I think you can just do this write unconditionally, since most
>>> hypervisors will enable this. Also, you probably can add the tpr
>>
>> What will happen if a hypervisor doesn't enable it? I make it more
>> cleaner in version two.
>
> TPR_THRESHOLD will be likely written as zero, but the processor will
> never use it anyway. It's just a small optimization because
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Theoretically, you are right. But we should not expect all VMMs follow it. It is not worth to violate the SDM just for saving two or three instructions' cost.

>
> Paolo
>
>>> threshold field to the read-write fields for shadow VMCS.
>>
>> Agreed.
>>
>> Regards,
>> Wanpeng Li


Best regards,
Yang

Paolo Bonzini

unread,
Aug 1, 2014, 2:40:02 AM8/1/14
to
Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
> > TPR_THRESHOLD will be likely written as zero, but the processor will
> > never use it anyway. It's just a small optimization because
> > nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.
>
> Theoretically, you are right. But we should not expect all VMMs
> follow it. It is not worth to violate the SDM just for saving two or
> three instructions' cost.

Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
vmcs_write32. But still, checking nested_cpu_has is not strictly
necessary. Right now they both are a single AND, but I have plans to
change all of the cpu_has_*() checks to static keys.

Paolo

Zhang, Yang Z

unread,
Aug 1, 2014, 2:50:02 AM8/1/14
to
Paolo Bonzini wrote on 2014-08-01:
> Il 01/08/2014 02:57, Zhang, Yang Z ha scritto:
>>> TPR_THRESHOLD will be likely written as zero, but the processor
>>> will never use it anyway. It's just a small optimization because
>>> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always
> be true.
>>
>> Theoretically, you are right. But we should not expect all VMMs
>> follow it. It is not worth to violate the SDM just for saving two or
>> three instructions' cost.
>
> Yes, you do need an "if (cpu_has_vmx_tpr_shadow())" around the
> vmcs_write32. But still, checking nested_cpu_has is not strictly necessary.
> Right now they both are a single AND, but I have plans to change all
> of the
> cpu_has_*() checks to static keys.

See v2 patch. It isn't a problem anymore.

>
> Paolo


Best regards,
Yang

Wanpeng Li

unread,
Aug 1, 2014, 4:10:02 AM8/1/14
to
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

TPR shadow/threshold feature is important to speed up the Windows guest.
Besides, it is a must feature for certain VMM.

We map virtual APIC page address and TPR threshold from L1 VMCS. If
TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested
in, we inject it into L1 VMM for handling.

Signed-off-by: Wanpeng Li <wanpe...@linux.intel.com>
---
v1 -> v2:
* don't take L0's "virtualize APIC accesses" setting into account
* virtual_apic_page do exactly the same thing that is done for apic_access_page
* add the tpr threshold field to the read-write fields for shadow VMCS

arch/x86/kvm/vmx.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a3845b8..0e6e95e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,7 @@ struct nested_vmx {
* we must keep them pinned while L2 runs.
*/
struct page *apic_access_page;
+ struct page *virtual_apic_page;
u64 msr_ia32_feature_control;

struct hrtimer preemption_timer;
@@ -533,6 +534,7 @@ static int max_shadow_read_only_fields =
ARRAY_SIZE(shadow_read_only_fields);

static unsigned long shadow_read_write_fields[] = {
+ TPR_THRESHOLD,
GUEST_RIP,
GUEST_RSP,
GUEST_CR0,
@@ -2331,7 +2333,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
- CPU_BASED_PAUSE_EXITING |
+ CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
/*
* We can allow some features even when not supported by the
@@ -6149,6 +6151,10 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_page(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = 0;
}
+ if (vmx->nested.virtual_apic_page) {
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page = 0;
+ }

nested_free_all_saved_vmcss(vmx);
}
@@ -6937,7 +6943,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_MCE_DURING_VMENTRY:
return 0;
case EXIT_REASON_TPR_BELOW_THRESHOLD:
- return 1;
+ return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
case EXIT_REASON_APIC_ACCESS:
return nested_cpu_has2(vmcs12,
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7058,6 +7064,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
+ if (is_guest_mode(vcpu))
+ return;
+
if (irr == -1 || tpr < irr) {
vmcs_write32(TPR_THRESHOLD, 0);
return;
@@ -8025,6 +8034,22 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
+
+ if (exec_control & CPU_BASED_TPR_SHADOW) {
+ if (vmx->nested.virtual_apic_page)
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page =
+ nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+ if (!vmx->nested.virtual_apic_page)
+ exec_control &=
+ ~CPU_BASED_TPR_SHADOW;
+ else
+ vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+ page_to_phys(vmx->nested.virtual_apic_page));
+
+ vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
+ }
+
/*
* Merging of IO and MSR bitmaps not currently supported.
* Rather, exit every time.
@@ -8793,6 +8818,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
nested_release_page(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = 0;
}
+ if (vmx->nested.virtual_apic_page) {
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page = 0;
+ }

/*
* Exiting from L2 to L1, we're now back to L1 which thinks it just
--
1.9.1

Paolo Bonzini

unread,
Aug 1, 2014, 5:10:01 AM8/1/14
to
This will cause L1 to miss exits when L2 writes to CR8. I think the
only sensible thing to do if this happens is fail the vmentry.

The problem is that while the APIC access page field is used to trap
reads/writes to the APIC access page itself, here the processor will
read/write the virtual APIC page when L2 does CR8 accesses.

Paolo
> + else
> + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
> + page_to_phys(vmx->nested.virtual_apic_page));
> +
> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
> + }
> +
> /*
> * Merging of IO and MSR bitmaps not currently supported.
> * Rather, exit every time.
> @@ -8793,6 +8818,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
> nested_release_page(vmx->nested.apic_access_page);
> vmx->nested.apic_access_page = 0;
> }
> + if (vmx->nested.virtual_apic_page) {
> + nested_release_page(vmx->nested.virtual_apic_page);
> + vmx->nested.virtual_apic_page = 0;
> + }
>
> /*
> * Exiting from L2 to L1, we're now back to L1 which thinks it just
>

--

Paolo Bonzini

unread,
Aug 4, 2014, 6:20:02 AM8/4/14
to
Il 04/08/2014 12:11, Wanpeng Li ha scritto:
> Hi Paolo,
> How about add this:
>
> + if (!(exec_control & CPU_BASED_TPR_SHADOW) &&
> + !((exec_control & CPU_BASED_CR8_LOAD_EXITING) &&
> + (exec_control & CPU_BASED_CR8_STORE_EXITING)))
> + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

Yes, this is not architecturally correct, but I don't see what else we
can do.

Paolo

>
> Regards,
> Wanpeng Li

Wanpeng Li

unread,
Aug 4, 2014, 6:20:02 AM8/4/14
to
Hi Paolo,
On Fri, Aug 01, 2014 at 11:05:13AM +0200, Paolo Bonzini wrote:
How about add this:

+ if (!(exec_control & CPU_BASED_TPR_SHADOW) &&
+ !((exec_control & CPU_BASED_CR8_LOAD_EXITING) &&
+ (exec_control & CPU_BASED_CR8_STORE_EXITING)))
+ nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+

Regards,
Wanpeng Li

Wanpeng Li

unread,
Aug 4, 2014, 7:00:02 AM8/4/14
to
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

TPR shadow/threshold feature is important to speed up the Windows guest.
Besides, it is a must feature for certain VMM.

We map virtual APIC page address and TPR threshold from L1 VMCS. If
TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested
in, we inject it into L1 VMM for handling.

Signed-off-by: Wanpeng Li <wanpe...@linux.intel.com>
---
v2 -> v3:
* nested vm entry failure if both tpr shadow and cr8 exiting bits are not set
v1 -> v2:
* don't take L0's "virtualize APIC accesses" setting into account
* virtual_apic_page do exactly the same thing that is done for apic_access_page
* add the tpr threshold field to the read-write fields for shadow VMCS

arch/x86/kvm/vmx.c | 38 ++++++++++++++++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c604f3c..7a56e2c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,7 @@ struct nested_vmx {
* we must keep them pinned while L2 runs.
*/
struct page *apic_access_page;
+ struct page *virtual_apic_page;
u64 msr_ia32_feature_control;

struct hrtimer preemption_timer;
@@ -533,6 +534,7 @@ static int max_shadow_read_only_fields =
ARRAY_SIZE(shadow_read_only_fields);

static unsigned long shadow_read_write_fields[] = {
+ TPR_THRESHOLD,
GUEST_RIP,
GUEST_RSP,
GUEST_CR0,
@@ -2330,7 +2332,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
- CPU_BASED_PAUSE_EXITING |
+ CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
/*
* We can allow some features even when not supported by the
@@ -6148,6 +6150,10 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_page(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = 0;
}
+ if (vmx->nested.virtual_apic_page) {
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page = 0;
+ }

nested_free_all_saved_vmcss(vmx);
}
@@ -6936,7 +6942,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_MCE_DURING_VMENTRY:
return 0;
case EXIT_REASON_TPR_BELOW_THRESHOLD:
- return 1;
+ return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
case EXIT_REASON_APIC_ACCESS:
return nested_cpu_has2(vmcs12,
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7057,6 +7063,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
+ if (is_guest_mode(vcpu))
+ return;
+
if (irr == -1 || tpr < irr) {
vmcs_write32(TPR_THRESHOLD, 0);
return;
@@ -8024,6 +8033,27 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
+
+ if (exec_control & CPU_BASED_TPR_SHADOW) {
+ if (vmx->nested.virtual_apic_page)
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page =
+ nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+ if (!vmx->nested.virtual_apic_page)
+ exec_control &=
+ ~CPU_BASED_TPR_SHADOW;
+ else
+ vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+ page_to_phys(vmx->nested.virtual_apic_page));
+
+ if (!(exec_control & CPU_BASED_TPR_SHADOW) &&
+ !((exec_control & CPU_BASED_CR8_LOAD_EXITING) &&
+ (exec_control & CPU_BASED_CR8_STORE_EXITING)))
+ nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+
+ vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
+ }
+
/*
* Merging of IO and MSR bitmaps not currently supported.
* Rather, exit every time.
@@ -8802,6 +8832,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
nested_release_page(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = 0;
}
+ if (vmx->nested.virtual_apic_page) {
+ nested_release_page(vmx->nested.virtual_apic_page);
+ vmx->nested.virtual_apic_page = 0;
+ }

/*
* Exiting from L2 to L1, we're now back to L1 which thinks it just
--
1.9.1

Wanpeng Li

unread,
Aug 4, 2014, 7:10:01 AM8/4/14
to
Ok, just send out the new version.

Regards,
Wanpeng Li

Paolo Bonzini

unread,
Aug 4, 2014, 9:00:02 AM8/4/14
to
Reviewed-by: Paolo Bonzini <pbon...@redhat.com>

I will apply this for 3.18. Thanks,

Paolo

Zhang, Yang Z

unread,
Aug 5, 2014, 4:10:01 AM8/5/14
to
I think this is not correct. The vmx->nested.virtual_apic_page may not valid due to two reasons:
1. The virtual_apic_page_addr is not a valid gfn. In this case, the vmx failure must be injected to L1 unconditionally regardless of the setting of CR8 load/store exiting.
2. The virtual_apic_page is swapped by L0. In this case, we should not inject failure to L1.

> +
> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
> + }

Miss else here:
If L2 owns the APIC and doesn't use TPR_SHADOW, we need to setup the vmcs02 based on vmcs01. For example, if vmcs01 is using TPR_SHADOW, then vmcs02 must set it. Also, we need to use vmcs01's virtual_apic_page and TPR_THRESHOLD for vmcs02.

> +
> /*
> * Merging of IO and MSR bitmaps not currently supported.
> * Rather, exit every time.
> @@ -8802,6 +8832,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu
> *vcpu, u32 exit_reason,
> nested_release_page(vmx->nested.apic_access_page);
> vmx->nested.apic_access_page = 0;
> }
> + if (vmx->nested.virtual_apic_page) {
> + nested_release_page(vmx->nested.virtual_apic_page);
> + vmx->nested.virtual_apic_page = 0;
> + }
>
> /*
> * Exiting from L2 to L1, we're now back to L1 which thinks it just
> --
> 1.9.1

Best regards,
Yang

Paolo Bonzini

unread,
Aug 5, 2014, 7:10:02 AM8/5/14
to
You're right that accesses to the APIC-access page may also end up
writing to the virtual-APIC page. Hence, if the "virtualize APIC
accesses" setting is 1 in the secondary exec controls, you also have to
fail the vmentry.

Doing it unconditionally is not correct, but it is the simplest thing to
do and it would be okay with a comment, I think.

> 2. The virtual_apic_page is swapped by L0. In this case, we should not inject failure to L1.

This cannot happen, nested_get_page ends up calling hva_to_pfn with
atomic==false, so the page would be swapped in and pinned.

>> +
>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>> + }
>
> Miss else here:
> If L2 owns the APIC and doesn't use TPR_SHADOW, we need to setup the
> vmcs02 based on vmcs01. For example, if vmcs01 is using TPR_SHADOW, then
> vmcs02 must set it. Also, we need to use vmcs01's virtual_apic_page and
> TPR_THRESHOLD for vmcs02.

What do you mean by "owns the APIC"?

Paolo

Zhang, Yang Z

unread,
Aug 6, 2014, 2:40:01 AM8/6/14
to
Why not correct?

> to do and it would be okay with a comment, I think.
>
>> 2. The virtual_apic_page is swapped by L0. In this case, we should
>> not inject
> failure to L1.
>
> This cannot happen, nested_get_page ends up calling hva_to_pfn with
> atomic==false, so the page would be swapped in and pinned.
>
>>> +
>>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>> + }
>>
>> Miss else here:
>> If L2 owns the APIC and doesn't use TPR_SHADOW, we need to setup the
>> vmcs02 based on vmcs01. For example, if vmcs01 is using TPR_SHADOW,
>> then
>> vmcs02 must set it. Also, we need to use vmcs01's virtual_apic_page
>> and TPR_THRESHOLD for vmcs02.
>
> What do you mean by "owns the APIC"?

Means if L2 touch L1's APIC directly, for example, if L2 not using TPR_SHADOW, then move to/from CR8 will touch L1's TPR directly. And in this case, L0 should aware of L1's TRP change.

>
> Paolo


Best regards,
Yang

Wanpeng Li

unread,
Aug 7, 2014, 2:30:02 AM8/7/14
to
Hi Paolo,
On Wed, Aug 06, 2014 at 06:38:06AM +0000, Zhang, Yang Z wrote:
[...]
What's your opinion?

>
>> to do and it would be okay with a comment, I think.
>>
>>> 2. The virtual_apic_page is swapped by L0. In this case, we should
>>> not inject
>> failure to L1.
>>
>> This cannot happen, nested_get_page ends up calling hva_to_pfn with
>> atomic==false, so the page would be swapped in and pinned.
>>
>>>> +
>>>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>>> + }
>>>
>>> Miss else here:
>>> If L2 owns the APIC and doesn't use TPR_SHADOW, we need to setup the
>>> vmcs02 based on vmcs01. For example, if vmcs01 is using TPR_SHADOW,
>>> then
>>> vmcs02 must set it. Also, we need to use vmcs01's virtual_apic_page
>>> and TPR_THRESHOLD for vmcs02.
>>
>> What do you mean by "owns the APIC"?
>
>Means if L2 touch L1's APIC directly, for example, if L2 not using TPR_SHADOW, then move to/from CR8 will touch L1's TPR directly. And in this case, L0 should aware of L1's TRP change.
>

Ditto.

Regards,
Wanpeng Li

Paolo Bonzini

unread,
Aug 7, 2014, 10:10:02 AM8/7/14
to
On real hardware you could point the virtual-APIC page to an invalid
address. If CR8 load exits are enabled, CR8 store exits are enabled,
and virtualize APIC access is disabled, the processor would never notice.

But it's okay with a comment.

>>>> +
>>>> + vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>>> + }
>>>
>>> Miss else here:
>>> If L2 owns the APIC and doesn't use TPR_SHADOW, we need to setup the
>>> vmcs02 based on vmcs01. For example, if vmcs01 is using TPR_SHADOW,
>>> then
>>> vmcs02 must set it. Also, we need to use vmcs01's virtual_apic_page
>>> and TPR_THRESHOLD for vmcs02.
>>
>> What do you mean by "owns the APIC"?
>
> Means if L2 touch L1's APIC directly, for example, if L2 not using
> TPR_SHADOW, then move to/from CR8 will touch L1's TPR directly. And in
> this case, L0 should aware of L1's TRP change.

You're right.

Paolo
0 new messages