Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 1/3] KVM: MMU: Fix 32 bit legacy paging with NPT

3 views
Skip to first unread message

Joerg Roedel

unread,
Sep 2, 2010, 11:40:02 AM9/2/10
to
This patch fixes 32 bit legacy paging with NPT enabled. The
mmu_check_root call on the top-level of the loop causes
root_gfn to take values (in the tdp_enabled path) which are
outside of guest memory. So the mmu_check_root call fails at
some point in the loop interation causing the guest to
tiple-fault.
This patch changes the mmu_check_root calls to the places
where they are really necessary. As a side-effect it
introduces a check for the root of a pae page table too.

Signed-off-by: Joerg Roedel <joerg....@amd.com>
---
arch/x86/kvm/mmu.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d2dad65..b2136f9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2387,6 +2387,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
return 0;
}
direct = !is_paging(vcpu);
+
+ if (mmu_check_root(vcpu, root_gfn))
+ return 1;
+
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];

@@ -2398,10 +2402,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
continue;
}
root_gfn = pdptr >> PAGE_SHIFT;
+ if (mmu_check_root(vcpu, root_gfn))
+ return 1;
} else if (vcpu->arch.mmu.root_level == 0)
root_gfn = 0;
- if (mmu_check_root(vcpu, root_gfn))
- return 1;
if (tdp_enabled) {
direct = 1;
root_gfn = i << 30;
--
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Joerg Roedel

unread,
Sep 2, 2010, 11:40:02 AM9/2/10
to
This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.

Signed-off-by: Joerg Roedel <joerg....@amd.com>
---

arch/x86/kvm/svm.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ecd4e58..1643f30 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2069,7 +2069,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
return false;
}

- trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, vmcb_gpa,
+ trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
nested_vmcb->save.rip,
nested_vmcb->control.int_ctl,
nested_vmcb->control.event_inj,
@@ -2270,8 +2270,8 @@ static int vmrun_interception(struct vcpu_svm *svm)
if (nested_svm_check_permissions(svm))
return 1;

- svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
- skip_emulated_instruction(&svm->vcpu);
+ /* Save rip after vmrun instruction */
+ kvm_rip_write(&svm->vcpu, kvm_rip_read(&svm->vcpu) + 3);

if (!nested_svm_vmrun(svm))
return 1;

Avi Kivity

unread,
Sep 2, 2010, 12:00:03 PM9/2/10
to
On 09/02/2010 06:29 PM, Joerg Roedel wrote:
> This patch fixes 32 bit legacy paging with NPT enabled. The
> mmu_check_root call on the top-level of the loop causes
> root_gfn to take values (in the tdp_enabled path) which are
> outside of guest memory. So the mmu_check_root call fails at
> some point in the loop interation causing the guest to
> tiple-fault.
> This patch changes the mmu_check_root calls to the places
> where they are really necessary. As a side-effect it
> introduces a check for the root of a pae page table too.
>
>
> @@ -2387,6 +2387,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
> return 0;
> }
> direct = !is_paging(vcpu);
> +
> + if (mmu_check_root(vcpu, root_gfn))
> + return 1;
> +
> for (i = 0; i< 4; ++i) {
> hpa_t root = vcpu->arch.mmu.pae_root[i];
>
> @@ -2398,10 +2402,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
> continue;
> }
> root_gfn = pdptr>> PAGE_SHIFT;
> + if (mmu_check_root(vcpu, root_gfn))
> + return 1;
> } else if (vcpu->arch.mmu.root_level == 0)
> root_gfn = 0;
> - if (mmu_check_root(vcpu, root_gfn))
> - return 1;
> if (tdp_enabled) {
> direct = 1;
> root_gfn = i<< 30;

The overloading of root_gfn is pretty bad. Also, we don't really need
to check root_gfn for the direct case (the guest can easily switch cr3
later to one that would fail the check).

However, I'll apply the patch since it fixes the direct problem. More
involved fixes can come later (esp. after the nnpt patches land).


--
error compiling committee.c: too many arguments to function

Roedel, Joerg

unread,
Sep 2, 2010, 12:40:02 PM9/2/10
to

Ok, great. I will rebase my nnpt patches on-top of this and do some more
final testing before I post them.

Thanks,

Joerg

--
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

Roedel, Joerg

unread,
Sep 3, 2010, 8:30:02 AM9/3/10
to

Argh, in my interactive commit I forgot one part of this patch. Please
apply the attached one instead.


From 42450df2b72c23538d61616834dbdf1b53deafd7 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <joerg....@amd.com>
Date: Thu, 2 Sep 2010 17:12:18 +0200
Subject: [PATCH 3/3] KVM: SVM: Clean up rip handling in vmrun emulation

This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.

Signed-off-by: Joerg Roedel <joerg....@amd.com>
---

arch/x86/kvm/svm.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ecd4e58..6808f64 100644


--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2069,7 +2069,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
return false;
}

- trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, vmcb_gpa,
+ trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
nested_vmcb->save.rip,
nested_vmcb->control.int_ctl,
nested_vmcb->control.event_inj,

@@ -2098,7 +2098,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
hsave->save.cr0 = kvm_read_cr0(&svm->vcpu);
hsave->save.cr4 = svm->vcpu.arch.cr4;
hsave->save.rflags = vmcb->save.rflags;
- hsave->save.rip = svm->next_rip;
+ hsave->save.rip = kvm_rip_read(&svm->vcpu);
hsave->save.rsp = vmcb->save.rsp;
hsave->save.rax = vmcb->save.rax;
if (npt_enabled)


@@ -2270,8 +2270,8 @@ static int vmrun_interception(struct vcpu_svm *svm)
if (nested_svm_check_permissions(svm))
return 1;

- svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
- skip_emulated_instruction(&svm->vcpu);
+ /* Save rip after vmrun instruction */
+ kvm_rip_write(&svm->vcpu, kvm_rip_read(&svm->vcpu) + 3);

if (!nested_svm_vmrun(svm))
return 1;
--
1.7.0.4


--

AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--

Alexander Graf

unread,
Sep 3, 2010, 5:30:02 PM9/3/10
to


Any reason we can't use the next_rip information here? A hypervisor could potentially do badness and put a prefix here, thus break all the logic, right?

(yes, I know, I wrote that code, but still ...)


Alex

Joerg Roedel

unread,
Sep 4, 2010, 3:40:01 PM9/4/10
to
On Fri, Sep 03, 2010 at 11:29:11PM +0200, Alexander Graf wrote:
> Any reason we can't use the next_rip information here? A hypervisor
> could potentially do badness and put a prefix here, thus break all the
> logic, right?

Next_rip is not available on older hardware. Yes, this problem exists in
theory (as it does with rdmsr/wrmsr, cpuid, ... too). But a fix for this
on non-next-rip capable hardware would involve the instruction emulator
and kills all performance there. So we use this stupid and fast solution
here which works for all hypervisors I tested :-)

> (yes, I know, I wrote that code, but still ...)

When you wrote that code next_rip capable hardware was not available, so
don't worry :-)


Joerg

0 new messages