[PATCH 01/20] x86: Drop some unneeded local variables from SVM functions

Jan Kiszka

Apr 15, 2015, 1:26:07 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
No need to maintain cpu_data or even vmcb as local variables if they are
only used once.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 0299cc1..33ded8b 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -640,9 +640,7 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)

void vcpu_skip_emulated_instruction(unsigned int inst_len)
{
- struct per_cpu *cpu_data = this_cpu_data();
- struct vmcb *vmcb = &cpu_data->vmcb;
- vmcb->rip += inst_len;
+ this_cpu_data()->vmcb.rip += inst_len;
}

static void update_efer(struct per_cpu *cpu_data)
@@ -665,8 +663,7 @@ static void update_efer(struct per_cpu *cpu_data)

bool vcpu_get_guest_paging_structs(struct guest_paging_structures *pg_structs)
{
- struct per_cpu *cpu_data = this_cpu_data();
- struct vmcb *vmcb = &cpu_data->vmcb;
+ struct vmcb *vmcb = &this_cpu_data()->vmcb;

if (vmcb->efer & EFER_LMA) {
pg_structs->root_paging = x86_64_paging;
@@ -1021,8 +1018,7 @@ void vcpu_nmi_handler(void)

void vcpu_tlb_flush(void)
{
- struct per_cpu *cpu_data = this_cpu_data();
- struct vmcb *vmcb = &cpu_data->vmcb;
+ struct vmcb *vmcb = &this_cpu_data()->vmcb;

if (has_flush_by_asid)
vmcb->tlb_control = SVM_TLB_FLUSH_GUEST;
@@ -1033,8 +1029,7 @@ void vcpu_tlb_flush(void)
const u8 *vcpu_get_inst_bytes(const struct guest_paging_structures *pg_structs,
unsigned long pc, unsigned int *size)
{
- struct per_cpu *cpu_data = this_cpu_data();
- struct vmcb *vmcb = &cpu_data->vmcb;
+ struct vmcb *vmcb = &this_cpu_data()->vmcb;
unsigned long start;

if (has_assists) {
--
2.1.4

Jan Kiszka

Apr 15, 2015, 1:26:07 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
The last round of pending next patches. This one focuses on SVM with a
few side effects on VMX. It is mostly about making the SVM code simpler
and/or more consistent. And it improves the vmexit/vmentry path on AMD
by avoiding vmsave/vmload as discussed earlier on this list.

Jan

Jan Kiszka (20):
x86: Drop some unneeded local variables from SVM functions
x86: Remove traces of cpuid interception from SVM
x86: Remove guest registers parameter from svm_handle_cr
x86: Cache vmcb instead of cpu_data in SVM's
vcpu_vendor_get_execution_state
x86: Rename x86_parse_mov_to_cr to svm_parse_mov_to_cr
x86: Pass vmcb instead of cpu_data to some internal SVM functions
x86: Remove guest registers parameter from svm_handle_msr_write
x86: Drop local guest_regs variable from SVM version of
vcpu_handle_exit
x86: Drop PERCPU_VMCB and VMCB_RAX
x86: Simplify set_svm_segment_from_segment
x86: Simplify set_svm_segment_from_dtr
x86: Simplify descriptor reset in svm_vcpu_reset
x86: Drop constant return values from SVM functions
x86: Simplify error exit of svm_parse_mov_to_cr and svm_handle_cr
x86: Refactor SVM version of vcpu_activate_vmm
x86: Remove unneeded MSR restoring from SVM's vcpu_deactivate_vmm
x86: Make SYSENTER MSR restoration VMX-specific
x86: Remove write-only linux_sysenter_* fields
x86: Make FS_BASE MSR restoration VMX-specific
x86: Do not call vmload/vmsave on every VM exit

hypervisor/arch/x86/asm-defines.c | 3 +-
hypervisor/arch/x86/include/asm/percpu.h | 3 -
hypervisor/arch/x86/include/asm/svm.h | 2 -
hypervisor/arch/x86/setup.c | 9 --
hypervisor/arch/x86/svm-vmexit.S | 14 +-
hypervisor/arch/x86/svm.c | 260 +++++++++++--------------------
hypervisor/arch/x86/vmx.c | 9 +-
7 files changed, 101 insertions(+), 199 deletions(-)

--
2.1.4

Jan Kiszka

Apr 15, 2015, 1:26:07 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
There is no foreseeable need to intercept cpuid on AMD. On Intel, we
are not asked if we want to, so we have to execute it on behalf of the
cell. But here we can simply let it happen.
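
For illustration only - not part of the patch, all names below are
invented: on Intel the CPUID exit is unconditional, so a handler has to
run the instruction with the guest's inputs, write the results back and
advance the guest RIP, roughly along these lines:

struct guest_regs { unsigned long rax, rbx, rcx, rdx; };

static void emulate_cpuid(struct guest_regs *r)
{
	unsigned int eax = r->rax, ebx, ecx = r->rcx, edx;

	/* execute CPUID with the guest's leaf/subleaf selection */
	asm volatile("cpuid"
		     : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));

	r->rax = eax;
	r->rbx = ebx;
	r->rcx = ecx;
	r->rdx = edx;
	/* the caller then skips the 2-byte CPUID instruction */
}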

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 33ded8b..565b6a0 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -218,8 +218,6 @@ static int vmcb_setup(struct per_cpu *cpu_data)

vmcb->general1_intercepts |= GENERAL1_INTERCEPT_NMI;
vmcb->general1_intercepts |= GENERAL1_INTERCEPT_CR0_SEL_WRITE;
- /* TODO: Do we need this for SVM ? */
- /* vmcb->general1_intercepts |= GENERAL1_INTERCEPT_CPUID; */
vmcb->general1_intercepts |= GENERAL1_INTERCEPT_IOIO_PROT;
vmcb->general1_intercepts |= GENERAL1_INTERCEPT_MSR_PROT;
vmcb->general1_intercepts |= GENERAL1_INTERCEPT_SHUTDOWN_EVT;
@@ -945,9 +943,6 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
}
iommu_check_pending_faults(cpu_data);
return;
- case VMEXIT_CPUID:
- /* FIXME: We are not intercepting CPUID now */
- return;
case VMEXIT_VMMCALL:
vcpu_handle_hypercall();
return;
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:08 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
Easier to read.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 55ad981..8cf88ad 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -1050,12 +1050,12 @@ void vcpu_vendor_get_cell_io_bitmap(struct cell *cell,

void vcpu_vendor_get_execution_state(struct vcpu_execution_state *x_state)
{
- struct per_cpu *cpu_data = this_cpu_data();
+ struct vmcb *vmcb = &this_cpu_data()->vmcb;

- x_state->efer = cpu_data->vmcb.efer;
- x_state->rflags = cpu_data->vmcb.rflags;
- x_state->cs = cpu_data->vmcb.cs.selector;
- x_state->rip = cpu_data->vmcb.rip;
+ x_state->efer = vmcb->efer;
+ x_state->rflags = vmcb->rflags;
+ x_state->cs = vmcb->cs.selector;
+ x_state->rip = vmcb->rip;
}

/* GIF must be set for interrupts to be delivered (APMv2, Sect. 15.17) */
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:08 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
We can retrieve them from the per-cpu data structure now.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 349f9b6..69a0799 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -810,15 +810,14 @@ out:
return ok;
}

-static bool svm_handle_msr_write(union registers *guest_regs,
- struct per_cpu *cpu_data)
+static bool svm_handle_msr_write(struct per_cpu *cpu_data)
{
struct vmcb *vmcb = &cpu_data->vmcb;
unsigned long efer;

- if (guest_regs->rcx == MSR_EFER) {
+ if (cpu_data->guest_regs.rcx == MSR_EFER) {
/* Never let a guest to disable SVME; see APMv2, Sect. 3.1.7 */
- efer = get_wrmsr_value(guest_regs) | EFER_SVME;
+ efer = get_wrmsr_value(&cpu_data->guest_regs) | EFER_SVME;
/* Flush TLB on LME/NXE change: See APMv2, Sect. 15.16 */
if ((efer ^ vmcb->efer) & (EFER_LME | EFER_NXE))
vcpu_tlb_flush();
@@ -950,7 +949,7 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
if (!vmcb->exitinfo1)
res = vcpu_handle_msr_read();
else
- res = svm_handle_msr_write(guest_regs, cpu_data);
+ res = svm_handle_msr_write(cpu_data);
if (res)
return;
break;
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:07 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
We can retrieve them from the per-cpu data structure now.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 565b6a0..55ad981 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -772,8 +772,7 @@ out:
* result in no more than VMEXIT_INVALID. Maybe we can get along without it
* altogether?
*/
-static bool svm_handle_cr(union registers *guest_regs,
- struct per_cpu *cpu_data)
+static bool svm_handle_cr(struct per_cpu *cpu_data)
{
struct vmcb *vmcb = &cpu_data->vmcb;
/* Workaround GCC 4.8 warning on uninitialized variable 'reg' */
@@ -798,7 +797,7 @@ static bool svm_handle_cr(union registers *guest_regs,
if (reg == 4)
val = vmcb->rsp;
else
- val = guest_regs->by_index[15 - reg];
+ val = cpu_data->guest_regs.by_index[15 - reg];

vcpu_skip_emulated_instruction(X86_INST_LEN_MOV_TO_CR);
/* Flush TLB on PG/WP/CD/NW change: See APMv2, Sect. 15.16 */
@@ -948,7 +947,7 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
return;
case VMEXIT_CR0_SEL_WRITE:
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_CR]++;
- if (svm_handle_cr(guest_regs, cpu_data))
+ if (svm_handle_cr(cpu_data))
return;
break;
case VMEXIT_MSR:
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:09 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
We can calculate PERCPU_VMCB_RAX directly and save the two intermediate
steps.
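
Side note, with simplified structures that are not the actual Jailhouse
definitions: offsetof() accepts nested member designators, which is what
allows emitting the combined offset in a single OFFSET() line:

#include <stddef.h>
#include <stdio.h>

struct vmcb_demo { unsigned long rax; };
struct per_cpu_demo { char other[64]; struct vmcb_demo vmcb; };

int main(void)
{
	/* one-step nested offset, equivalent to adding the two
	 * intermediate offsets by hand */
	printf("%zu\n", offsetof(struct per_cpu_demo, vmcb.rax));
	return 0;
}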

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/asm-defines.c | 3 +--
hypervisor/arch/x86/svm-vmexit.S | 2 --
2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/hypervisor/arch/x86/asm-defines.c b/hypervisor/arch/x86/asm-defines.c
index 7dfae6e..f53d95e 100644
--- a/hypervisor/arch/x86/asm-defines.c
+++ b/hypervisor/arch/x86/asm-defines.c
@@ -22,8 +22,7 @@ void common(void)
OFFSET(PERCPU_LINUX_SP, per_cpu, linux_sp);
BLANK();

- OFFSET(PERCPU_VMCB, per_cpu, vmcb);
- OFFSET(VMCB_RAX, vmcb, rax);
+ OFFSET(PERCPU_VMCB_RAX, per_cpu, vmcb.rax);
BLANK();

/* GCC evaluates constant expressions involving built-ins
diff --git a/hypervisor/arch/x86/svm-vmexit.S b/hypervisor/arch/x86/svm-vmexit.S
index f374b37..77c884d 100644
--- a/hypervisor/arch/x86/svm-vmexit.S
+++ b/hypervisor/arch/x86/svm-vmexit.S
@@ -12,8 +12,6 @@

#include <asm/asm-defines.h>

-#define PERCPU_VMCB_RAX (PERCPU_VMCB + VMCB_RAX)
-
/* SVM VM-exit handling */
.globl svm_vmexit
svm_vmexit:
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:08 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
This function is SVM-specific.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 8cf88ad..6c2f87a 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -722,7 +722,7 @@ static bool ctx_advance(struct parse_context *ctx,
return true;
}

-static bool x86_parse_mov_to_cr(struct per_cpu *cpu_data,
+static bool svm_parse_mov_to_cr(struct per_cpu *cpu_data,
unsigned long pc,
unsigned char reg,
unsigned long *gpr)
@@ -787,7 +787,7 @@ static bool svm_handle_cr(struct per_cpu *cpu_data)
}
reg = vmcb->exitinfo1 & 0x07;
} else {
- if (!x86_parse_mov_to_cr(cpu_data, vmcb->rip, 0, &reg)) {
+ if (!svm_parse_mov_to_cr(cpu_data, vmcb->rip, 0, &reg)) {
panic_printk("FATAL: Unable to parse MOV-to-CR instruction\n");
ok = false;
goto out;
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:08 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
No need to cache it. It can be derived from cpu_data now.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 69a0799..bafd2fb 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -903,7 +903,6 @@ void vcpu_vendor_get_mmio_intercept(struct vcpu_mmio_intercept *mmio)

void vcpu_handle_exit(struct per_cpu *cpu_data)
{
- union registers *guest_regs = &cpu_data->guest_regs;
struct vmcb *vmcb = &cpu_data->vmcb;
bool res = false;
int sipi_vector;
@@ -987,7 +986,7 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
"exitinfo1 %p exitinfo2 %p\n",
vmcb->exitcode, vmcb->exitinfo1, vmcb->exitinfo2);
}
- dump_guest_regs(guest_regs, vmcb);
+ dump_guest_regs(&cpu_data->guest_regs, vmcb);
panic_park();
}

--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:09 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
No need to complain: segment.access_rights is generic as it simply holds
bits 8..23 of the second descriptor dword. The additional invalid bit
used only by VMX can be ignored by SVM - and it already is, even when
leaving out the explicit test.
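
For illustration - not part of the patch, the sample value is just an
example: the conversion moves the descriptor flags from bits 12..15 of
the generic access_rights value down to bits 8..11 of the SVM attrib
field, right above the access byte:

#include <stdint.h>
#include <stdio.h>

static uint16_t svm_attrib_from_access_rights(uint32_t ar)
{
	return ((ar & 0xf000) >> 4) | (ar & 0x00ff);
}

int main(void)
{
	/* e.g. a present 64-bit code segment: type 0xb, S=1, DPL=0,
	 * P=1, L=1, G=1 -> 0xa09b becomes 0x0a9b in SVM format */
	printf("%04x\n", svm_attrib_from_access_rights(0xa09b));
	return 0;
}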

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index bafd2fb..5ac56d3 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -139,22 +139,12 @@ static void set_svm_segment_from_dtr(struct svm_segment *svm_segment,
*svm_segment = tmp;
}

-/* TODO: struct segment needs to be x86 generic, not VMX-specific one here */
static void set_svm_segment_from_segment(struct svm_segment *svm_segment,
const struct segment *segment)
{
- u32 ar;
-
svm_segment->selector = segment->selector;
-
- if (segment->access_rights == 0x10000) {
- svm_segment->access_rights = 0;
- } else {
- ar = segment->access_rights;
- svm_segment->access_rights =
- ((ar & 0xf000) >> 4) | (ar & 0x00ff);
- }
-
+ svm_segment->access_rights = ((segment->access_rights & 0xf000) >> 4) |
+ (segment->access_rights & 0x00ff);
svm_segment->limit = segment->limit;
svm_segment->base = segment->base;
}
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:09 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
update_efer, svm_parse_mov_to_cr and svm_handle_apic_access have no use
for cpu_data other than converting it into a vmcb reference. So pass the
vmcb directly instead and save some statements.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 6c2f87a..349f9b6 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -641,9 +641,8 @@ void vcpu_skip_emulated_instruction(unsigned int inst_len)
this_cpu_data()->vmcb.rip += inst_len;
}

-static void update_efer(struct per_cpu *cpu_data)
+static void update_efer(struct vmcb *vmcb)
{
- struct vmcb *vmcb = &cpu_data->vmcb;
unsigned long efer = vmcb->efer;

if ((efer & (EFER_LME | EFER_LMA)) != EFER_LME)
@@ -722,13 +721,10 @@ static bool ctx_advance(struct parse_context *ctx,
return true;
}

-static bool svm_parse_mov_to_cr(struct per_cpu *cpu_data,
- unsigned long pc,
- unsigned char reg,
- unsigned long *gpr)
+static bool svm_parse_mov_to_cr(struct vmcb *vmcb, unsigned long pc,
+ unsigned char reg, unsigned long *gpr)
{
struct guest_paging_structures pg_structs;
- struct vmcb *vmcb = &cpu_data->vmcb;
struct parse_context ctx = {};
/* No prefixes are supported yet */
u8 opcodes[] = {0x0f, 0x22}, modrm;
@@ -787,7 +783,7 @@ static bool svm_handle_cr(struct per_cpu *cpu_data)
}
reg = vmcb->exitinfo1 & 0x07;
} else {
- if (!svm_parse_mov_to_cr(cpu_data, vmcb->rip, 0, &reg)) {
+ if (!svm_parse_mov_to_cr(vmcb, vmcb->rip, 0, &reg)) {
panic_printk("FATAL: Unable to parse MOV-to-CR instruction\n");
ok = false;
goto out;
@@ -807,7 +803,7 @@ static bool svm_handle_cr(struct per_cpu *cpu_data)
/* TODO: better check for #GP reasons */
vmcb->cr0 = val & SVM_CR0_ALLOWED_BITS;
if (val & X86_CR0_PG)
- update_efer(cpu_data);
+ update_efer(vmcb);
vmcb->clean_bits &= ~CLEAN_BITS_CRX;

out:
@@ -839,9 +835,8 @@ static bool svm_handle_msr_write(union registers *guest_regs,
* TODO: This handles unaccelerated (non-AVIC) access. AVIC should
* be treated separately in svm_handle_avic_access().
*/
-static bool svm_handle_apic_access(struct per_cpu *cpu_data)
+static bool svm_handle_apic_access(struct vmcb *vmcb)
{
- struct vmcb *vmcb = &cpu_data->vmcb;
struct guest_paging_structures pg_structs;
unsigned int inst_len, offset;
bool is_write;
@@ -965,7 +960,7 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
vmcb->exitinfo2 < XAPIC_BASE + PAGE_SIZE) {
/* APIC access in non-AVIC mode */
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_XAPIC]++;
- if (svm_handle_apic_access(cpu_data))
+ if (svm_handle_apic_access(vmcb))
return;
} else {
/* General MMIO (IOAPIC, PCI etc) */
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
By using set_svm_segment_from_segment for ldtr, we can remove the
condition from set_svm_segment_from_dtr.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 5ac56d3..c187382 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -129,14 +129,8 @@ static int svm_check_features(void)
static void set_svm_segment_from_dtr(struct svm_segment *svm_segment,
const struct desc_table_reg *dtr)
{
- struct svm_segment tmp = { 0 };
-
- if (dtr) {
- tmp.base = dtr->base;
- tmp.limit = dtr->limit & 0xffff;
- }
-
- *svm_segment = tmp;
+ svm_segment->base = dtr->base;
+ svm_segment->limit = dtr->limit & 0xffff;
}

static void set_svm_segment_from_segment(struct svm_segment *svm_segment,
@@ -175,8 +169,8 @@ static int vmcb_setup(struct per_cpu *cpu_data)
set_svm_segment_from_segment(&vmcb->gs, &cpu_data->linux_gs);
set_svm_segment_from_segment(&vmcb->ss, &invalid_seg);
set_svm_segment_from_segment(&vmcb->tr, &cpu_data->linux_tss);
+ set_svm_segment_from_segment(&vmcb->ldtr, &invalid_seg);

- set_svm_segment_from_dtr(&vmcb->ldtr, NULL);
set_svm_segment_from_dtr(&vmcb->gdtr, &cpu_data->linux_gdtr);
set_svm_segment_from_dtr(&vmcb->idtr, &cpu_data->linux_idtr);

--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
No need to maintain a return code variable when we can simply return
false directly.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 33 ++++++++++++---------------------
1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 80caba4..5804f70 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -687,38 +687,33 @@ static bool svm_parse_mov_to_cr(struct vmcb *vmcb, unsigned long pc,
struct parse_context ctx = {};
/* No prefixes are supported yet */
u8 opcodes[] = {0x0f, 0x22}, modrm;
- bool ok = false;
int n;

ctx.remaining = ARRAY_SIZE(opcodes);
if (!vcpu_get_guest_paging_structs(&pg_structs))
- goto out;
+ return false;
ctx.cs_base = (vmcb->efer & EFER_LMA) ? 0 : vmcb->cs.base;

if (!ctx_advance(&ctx, &pc, &pg_structs))
- goto out;
+ return false;

- for (n = 0; n < ARRAY_SIZE(opcodes); n++, ctx.inst++) {
- if (*(ctx.inst) != opcodes[n])
- goto out;
- if (!ctx_advance(&ctx, &pc, &pg_structs))
- goto out;
- }
+ for (n = 0; n < ARRAY_SIZE(opcodes); n++, ctx.inst++)
+ if (*(ctx.inst) != opcodes[n] ||
+ !ctx_advance(&ctx, &pc, &pg_structs))
+ return false;

if (!ctx_advance(&ctx, &pc, &pg_structs))
- goto out;
+ return false;

modrm = *(ctx.inst);

if (((modrm & 0x38) >> 3) != reg)
- goto out;
+ return false;

if (gpr)
*gpr = (modrm & 0x7);

- ok = true;
-out:
- return ok;
+ return true;
}

/*
@@ -732,20 +727,17 @@ static bool svm_handle_cr(struct per_cpu *cpu_data)
struct vmcb *vmcb = &cpu_data->vmcb;
/* Workaround GCC 4.8 warning on uninitialized variable 'reg' */
unsigned long reg = -1, val, bits;
- bool ok = true;

if (has_assists) {
if (!(vmcb->exitinfo1 & (1UL << 63))) {
panic_printk("FATAL: Unsupported CR access (LMSW or CLTS)\n");
- ok = false;
- goto out;
+ return false;
}
reg = vmcb->exitinfo1 & 0x07;
} else {
if (!svm_parse_mov_to_cr(vmcb, vmcb->rip, 0, &reg)) {
panic_printk("FATAL: Unable to parse MOV-to-CR instruction\n");
- ok = false;
- goto out;
+ return false;
}
};

@@ -765,8 +757,7 @@ static bool svm_handle_cr(struct per_cpu *cpu_data)
update_efer(vmcb);
vmcb->clean_bits &= ~CLEAN_BITS_CRX;

-out:
- return ok;
+ return true;
}

static bool svm_handle_msr_write(struct per_cpu *cpu_data)
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
None of these MSRs is modified by Jailhouse after VM exit, thus they
still contain the state the Linux guest left behind.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 13 -------------
1 file changed, 13 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 7355aab..54be21d 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -446,19 +446,6 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)
unsigned long *stack = (unsigned long *)vmcb->rsp;
unsigned long linux_ip = vmcb->rip;

- /*
- * Restore the MSRs.
- *
- * XXX: One could argue this is better to be done in
- * arch_cpu_restore(), however, it would require changes
- * to cpu_data to store STAR and friends.
- */
- write_msr(MSR_STAR, vmcb->star);
- write_msr(MSR_LSTAR, vmcb->lstar);
- write_msr(MSR_CSTAR, vmcb->cstar);
- write_msr(MSR_SFMASK, vmcb->sfmask);
- write_msr(MSR_KERNGS_BASE, vmcb->kerngsbase);
-
cpu_data->linux_cr0 = vmcb->cr0;
cpu_data->linux_cr3 = vmcb->cr3;

--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:11 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
The vendor code reads the state directly from the MSRs during setup.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/include/asm/percpu.h | 3 ---
hypervisor/arch/x86/setup.c | 4 ----
2 files changed, 7 deletions(-)

diff --git a/hypervisor/arch/x86/include/asm/percpu.h b/hypervisor/arch/x86/include/asm/percpu.h
index cb4309c..654ef1d 100644
--- a/hypervisor/arch/x86/include/asm/percpu.h
+++ b/hypervisor/arch/x86/include/asm/percpu.h
@@ -82,9 +82,6 @@ struct per_cpu {
struct segment linux_gs;
struct segment linux_tss;
unsigned long linux_efer;
- unsigned long linux_sysenter_cs;
- unsigned long linux_sysenter_eip;
- unsigned long linux_sysenter_esp;
/** @} */

/** Shadow states. @{ */
diff --git a/hypervisor/arch/x86/setup.c b/hypervisor/arch/x86/setup.c
index af090af..9899238 100644
--- a/hypervisor/arch/x86/setup.c
+++ b/hypervisor/arch/x86/setup.c
@@ -200,10 +200,6 @@ int arch_cpu_init(struct per_cpu *cpu_data)

cpu_data->linux_efer = read_msr(MSR_EFER);

- cpu_data->linux_sysenter_cs = read_msr(MSR_IA32_SYSENTER_CS);
- cpu_data->linux_sysenter_eip = read_msr(MSR_IA32_SYSENTER_EIP);
- cpu_data->linux_sysenter_esp = read_msr(MSR_IA32_SYSENTER_ESP);
-
cpu_data->initialized = true;

err = apic_cpu_init(cpu_data);
--
2.1.4

Jan Kiszka

Apr 15, 2015, 1:26:11 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
SVM does not touch this MSR on VM exit, thus does not require the
restoration done in arch_cpu_restore so far. Make it VMX-specific so
that we can drop a few lines of code.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/setup.c | 1 -
hypervisor/arch/x86/svm.c | 1 -
hypervisor/arch/x86/vmx.c | 3 ++-
3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hypervisor/arch/x86/setup.c b/hypervisor/arch/x86/setup.c
index 9899238..da48833 100644
--- a/hypervisor/arch/x86/setup.c
+++ b/hypervisor/arch/x86/setup.c
@@ -280,6 +280,5 @@ void arch_cpu_restore(struct per_cpu *cpu_data, int return_code)
gdt[cpu_data->linux_tss.selector / 8] &= ~DESC_TSS_BUSY;
asm volatile("ltr %%ax" : : "a" (cpu_data->linux_tss.selector));

- write_msr(MSR_FS_BASE, cpu_data->linux_fs.base);
write_msr(MSR_GS_BASE, cpu_data->linux_gs.base);
}
diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index d35bbe1..fe0d463 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -459,7 +459,6 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)
cpu_data->linux_tss.selector = vmcb->tr.selector;

cpu_data->linux_efer = vmcb->efer & (~EFER_SVME);
- cpu_data->linux_fs.base = vmcb->fs.base;
cpu_data->linux_gs.base = vmcb->gs.base;

cpu_data->linux_ds.selector = vmcb->ds.selector;
diff --git a/hypervisor/arch/x86/vmx.c b/hypervisor/arch/x86/vmx.c
index 94a2083..dfd82a5 100644
--- a/hypervisor/arch/x86/vmx.c
+++ b/hypervisor/arch/x86/vmx.c
@@ -717,9 +717,10 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)
cpu_data->linux_tss.selector = vmcs_read32(GUEST_TR_SELECTOR);

cpu_data->linux_efer = vmcs_read64(GUEST_IA32_EFER);
- cpu_data->linux_fs.base = vmcs_read64(GUEST_FS_BASE);
cpu_data->linux_gs.base = vmcs_read64(GUEST_GS_BASE);

+ write_msr(MSR_FS_BASE, vmcs_read64(GUEST_FS_BASE));
+
write_msr(MSR_IA32_SYSENTER_CS, vmcs_read32(GUEST_SYSENTER_CS));
write_msr(MSR_IA32_SYSENTER_EIP, vmcs_read64(GUEST_SYSENTER_EIP));
write_msr(MSR_IA32_SYSENTER_ESP, vmcs_read64(GUEST_SYSENTER_ESP));
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:09 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
Reduce boilerplate code by using constants for common reset states.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 52 +++++++++++++++++------------------------------
1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index c187382..9a3afc5 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -526,6 +526,18 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)

static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
{
+ static const struct svm_segment dataseg_reset_state = {
+ .selector = 0,
+ .base = 0,
+ .limit = 0xffff,
+ .access_rights = 0x0093,
+ };
+ static const struct svm_segment dtr_reset_state = {
+ .selector = 0,
+ .base = 0,
+ .limit = 0xffff,
+ .access_rights = 0,
+ };
struct vmcb *vmcb = &cpu_data->vmcb;
unsigned long val;
bool ok = true;
@@ -549,30 +561,11 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
vmcb->cs.limit = 0xffff;
vmcb->cs.access_rights = 0x009b;

- vmcb->ds.selector = 0;
- vmcb->ds.base = 0;
- vmcb->ds.limit = 0xffff;
- vmcb->ds.access_rights = 0x0093;
-
- vmcb->es.selector = 0;
- vmcb->es.base = 0;
- vmcb->es.limit = 0xffff;
- vmcb->es.access_rights = 0x0093;
-
- vmcb->fs.selector = 0;
- vmcb->fs.base = 0;
- vmcb->fs.limit = 0xffff;
- vmcb->fs.access_rights = 0x0093;
-
- vmcb->gs.selector = 0;
- vmcb->gs.base = 0;
- vmcb->gs.limit = 0xffff;
- vmcb->gs.access_rights = 0x0093;
-
- vmcb->ss.selector = 0;
- vmcb->ss.base = 0;
- vmcb->ss.limit = 0xffff;
- vmcb->ss.access_rights = 0x0093;
+ vmcb->ds = dataseg_reset_state;
+ vmcb->es = dataseg_reset_state;
+ vmcb->fs = dataseg_reset_state;
+ vmcb->gs = dataseg_reset_state;
+ vmcb->ss = dataseg_reset_state;

vmcb->tr.selector = 0;
vmcb->tr.base = 0;
@@ -584,15 +577,8 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
vmcb->ldtr.limit = 0xffff;
vmcb->ldtr.access_rights = 0x0082;

- vmcb->gdtr.selector = 0;
- vmcb->gdtr.base = 0;
- vmcb->gdtr.limit = 0xffff;
- vmcb->gdtr.access_rights = 0;
-
- vmcb->idtr.selector = 0;
- vmcb->idtr.base = 0;
- vmcb->idtr.limit = 0xffff;
- vmcb->idtr.access_rights = 0;
+ vmcb->gdtr = dtr_reset_state;
+ vmcb->idtr = dtr_reset_state;

vmcb->efer = EFER_SVME;

--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
We can reduce the assembly required in vcpu_activate_vmm by turning
svm_vmexit into svm_vmentry, i.e. pulling the VM entry logic to the
front. Moreover, RAX can be loaded directly. There is furthermore no
need to declare clobbers as we won't return from the assembly block,
which is already expressed via __builtin_unreachable.
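
As a generic aside - this is not the Jailhouse code: when control never
returns from an asm statement, a clobber list serves no purpose, and
__builtin_unreachable() tells the compiler that the fall-through path is
dead, e.g.:

static void __attribute__((noreturn)) tail_jump(unsigned long entry)
{
	/* no clobbers needed: nothing after this point ever executes */
	asm volatile("jmp *%0" : : "r" (entry));
	__builtin_unreachable();
}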

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/include/asm/svm.h | 2 --
hypervisor/arch/x86/svm-vmexit.S | 16 ++++++++--------
hypervisor/arch/x86/svm.c | 13 +++----------
3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/hypervisor/arch/x86/include/asm/svm.h b/hypervisor/arch/x86/include/asm/svm.h
index 89f88c6..8e6c73d 100644
--- a/hypervisor/arch/x86/include/asm/svm.h
+++ b/hypervisor/arch/x86/include/asm/svm.h
@@ -350,6 +350,4 @@ struct vmcb {
u64 res16[301];
} __attribute__((packed));

-void svm_vmexit(void);
-
#endif
diff --git a/hypervisor/arch/x86/svm-vmexit.S b/hypervisor/arch/x86/svm-vmexit.S
index 77c884d..e5d5fdd 100644
--- a/hypervisor/arch/x86/svm-vmexit.S
+++ b/hypervisor/arch/x86/svm-vmexit.S
@@ -12,9 +12,13 @@

#include <asm/asm-defines.h>

-/* SVM VM-exit handling */
- .globl svm_vmexit
-svm_vmexit:
+/* SVM VM entry and handling of VM exit */
+ .globl svm_vmentry
+svm_vmentry:
+ vmload %rax
+ vmrun %rax
+ vmsave %rax
+
/* XXX: GIF is always cleared here */
push -PERCPU_STACK_END+PERCPU_VMCB_RAX(%rsp)
push %rcx
@@ -55,8 +59,4 @@ svm_vmexit:
pop %rcx
pop -PERCPU_STACK_END+PERCPU_VMCB_RAX(%rsp)

- vmload %rax
- vmrun %rax
- vmsave %rax
-
- jmp svm_vmexit
+ jmp svm_vmentry
diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 5804f70..7355aab 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -432,17 +432,10 @@ void __attribute__((noreturn)) vcpu_activate_vmm(struct per_cpu *cpu_data)
"mov 0x18(%%rdi),%%r12\n\t"
"mov 0x20(%%rdi),%%rbx\n\t"
"mov 0x28(%%rdi),%%rbp\n\t"
- "mov %0, %%rax\n\t"
- "vmload %%rax\n\t"
- "vmrun %%rax\n\t"
- "vmsave %%rax\n\t"
- /* Restore hypervisor stack */
- "mov %2, %%rsp\n\t"
- "jmp svm_vmexit"
+ "mov %2,%%rsp\n\t"
+ "jmp svm_vmentry"
: /* no output */
- : "m" (vmcb_pa), "D" (cpu_data->linux_reg), "m" (host_stack)
- : "memory", "r15", "r14", "r13", "r12",
- "rbx", "rbp", "rax", "cc");
+ : "D" (cpu_data->linux_reg), "a" (vmcb_pa), "m" (host_stack));
__builtin_unreachable();
}

--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
SVM does not overwrite these MSRs on VM exit, thus does not require the
restoration done in arch_cpu_restore so far. Make them VMX-specific so
that we can drop a few lines of code.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/setup.c | 4 ----
hypervisor/arch/x86/svm.c | 4 ----
hypervisor/arch/x86/vmx.c | 6 +++---
3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/hypervisor/arch/x86/setup.c b/hypervisor/arch/x86/setup.c
index bb73439..af090af 100644
--- a/hypervisor/arch/x86/setup.c
+++ b/hypervisor/arch/x86/setup.c
@@ -286,8 +286,4 @@ void arch_cpu_restore(struct per_cpu *cpu_data, int return_code)

write_msr(MSR_FS_BASE, cpu_data->linux_fs.base);
write_msr(MSR_GS_BASE, cpu_data->linux_gs.base);
-
- write_msr(MSR_IA32_SYSENTER_CS, cpu_data->linux_sysenter_cs);
- write_msr(MSR_IA32_SYSENTER_EIP, cpu_data->linux_sysenter_eip);
- write_msr(MSR_IA32_SYSENTER_ESP, cpu_data->linux_sysenter_esp);
}
diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 54be21d..d35bbe1 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -462,10 +462,6 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)
cpu_data->linux_fs.base = vmcb->fs.base;
cpu_data->linux_gs.base = vmcb->gs.base;

- cpu_data->linux_sysenter_cs = vmcb->sysenter_cs;
- cpu_data->linux_sysenter_eip = vmcb->sysenter_eip;
- cpu_data->linux_sysenter_esp = vmcb->sysenter_esp;
-
cpu_data->linux_ds.selector = vmcb->ds.selector;
cpu_data->linux_es.selector = vmcb->es.selector;
cpu_data->linux_fs.selector = vmcb->fs.selector;
diff --git a/hypervisor/arch/x86/vmx.c b/hypervisor/arch/x86/vmx.c
index 0cb34bd..94a2083 100644
--- a/hypervisor/arch/x86/vmx.c
+++ b/hypervisor/arch/x86/vmx.c
@@ -720,9 +720,9 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)
cpu_data->linux_fs.base = vmcs_read64(GUEST_FS_BASE);
cpu_data->linux_gs.base = vmcs_read64(GUEST_GS_BASE);

- cpu_data->linux_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
- cpu_data->linux_sysenter_eip = vmcs_read64(GUEST_SYSENTER_EIP);
- cpu_data->linux_sysenter_esp = vmcs_read64(GUEST_SYSENTER_ESP);
+ write_msr(MSR_IA32_SYSENTER_CS, vmcs_read32(GUEST_SYSENTER_CS));
+ write_msr(MSR_IA32_SYSENTER_EIP, vmcs_read64(GUEST_SYSENTER_EIP));
+ write_msr(MSR_IA32_SYSENTER_ESP, vmcs_read64(GUEST_SYSENTER_ESP));

cpu_data->linux_ds.selector = vmcs_read16(GUEST_DS_SELECTOR);
cpu_data->linux_es.selector = vmcs_read16(GUEST_ES_SELECTOR);
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:10 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
vmcb writing cannot fail on AMD, thus neither vmcb_setup nor
svm_set_cell_config can. Simply remove the error codes and related
handling.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index 9a3afc5..80caba4 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -143,16 +143,13 @@ static void set_svm_segment_from_segment(struct svm_segment *svm_segment,
svm_segment->base = segment->base;
}

-static bool svm_set_cell_config(struct cell *cell, struct vmcb *vmcb)
+static void svm_set_cell_config(struct cell *cell, struct vmcb *vmcb)
{
- /* No real need for this function; used for consistency with vmx.c */
vmcb->iopm_base_pa = paging_hvirt2phys(cell->svm.iopm);
vmcb->n_cr3 = paging_hvirt2phys(cell->svm.npt_structs.root_table);
-
- return true;
}

-static int vmcb_setup(struct per_cpu *cpu_data)
+static void vmcb_setup(struct per_cpu *cpu_data)
{
struct vmcb *vmcb = &cpu_data->vmcb;

@@ -220,7 +217,7 @@ static int vmcb_setup(struct per_cpu *cpu_data)
/* Explicitly mark all of the state as new */
vmcb->clean_bits = 0;

- return svm_set_cell_config(cpu_data->cell, vmcb);
+ svm_set_cell_config(cpu_data->cell, vmcb);
}

unsigned long arch_paging_gphys2phys(struct per_cpu *cpu_data,
@@ -376,8 +373,7 @@ int vcpu_init(struct per_cpu *cpu_data)

cpu_data->svm_state = SVMON;

- if (!vmcb_setup(cpu_data))
- return trace_error(-EIO);
+ vmcb_setup(cpu_data);

/*
* APM Volume 2, 3.1.1: "When writing the CR0 register, software should
@@ -540,7 +536,6 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
};
struct vmcb *vmcb = &cpu_data->vmcb;
unsigned long val;
- bool ok = true;

vmcb->cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
vmcb->cr3 = 0;
@@ -597,13 +592,7 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
/* Almost all of the guest state changed */
vmcb->clean_bits = 0;

- ok &= svm_set_cell_config(cpu_data->cell, vmcb);
-
- /* This is always false, but to be consistent with vmx.c... */
- if (!ok) {
- panic_printk("FATAL: CPU reset failed\n");
- panic_stop();
- }
+ svm_set_cell_config(cpu_data->cell, vmcb);
}

void vcpu_skip_emulated_instruction(unsigned int inst_len)
--
2.1.4

Jan Kiszka

unread,
Apr 15, 2015, 1:26:11 AM4/15/15
to jailho...@googlegroups.com, Valentine Sinitsyn
Benchmarks indicate that we can gain about 160 cycles per VM exit &
reentry by only saving/restoring MSR_GS_BASE. We don't touch the other
states that vmload/vmsave deals with.

Specifically, we don't depend on a valid TR/TSS while in root mode
because has neither in userspace nor uses the IST for interrupts or
exceptions.

We still need to perform vmload on handover (actually, we only need to
load MSR_GS_BASE, but vmload is simpler) and after VCPU reset. And as we
no longer save the full state, we also need to pull the missing
information for arch_cpu_restore directly from the registers on shutdown.
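
For reference - a simplified sketch, not the actual implementation: the
state vmsave/vmload would otherwise shuffle (per the APM: FS/GS/TR/LDTR
including their hidden parts, KernelGsBase, STAR/LSTAR/CSTAR/SFMASK and
the SYSENTER MSRs) is either not used by Jailhouse in root mode or left
untouched so that the Linux values survive, and only MSR_GS_BASE is
swapped by hand around the exit handler:

/* sketch only - assumes the read_msr/write_msr helpers and the vmcb
 * layout used elsewhere in this series */
static void handle_exit_sketch(struct per_cpu *cpu_data, struct vmcb *vmcb)
{
	vmcb->gs.base = read_msr(MSR_GS_BASE);           /* stash guest value */
	write_msr(MSR_GS_BASE, (unsigned long)cpu_data); /* host per-cpu base */

	/* ... dispatch on vmcb->exitcode ... */

	write_msr(MSR_GS_BASE, vmcb->gs.base);           /* back to guest */
	/* vmrun follows in svm_vmentry */
}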

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/x86/svm-vmexit.S | 2 --
hypervisor/arch/x86/svm.c | 35 ++++++++++++++++++++++++-----------
2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/hypervisor/arch/x86/svm-vmexit.S b/hypervisor/arch/x86/svm-vmexit.S
index e5d5fdd..7d0af0a 100644
--- a/hypervisor/arch/x86/svm-vmexit.S
+++ b/hypervisor/arch/x86/svm-vmexit.S
@@ -15,9 +15,7 @@
/* SVM VM entry and handling of VM exit */
.globl svm_vmentry
svm_vmentry:
- vmload %rax
vmrun %rax
- vmsave %rax

/* XXX: GIF is always cleared here */
push -PERCPU_STACK_END+PERCPU_VMCB_RAX(%rsp)
diff --git a/hypervisor/arch/x86/svm.c b/hypervisor/arch/x86/svm.c
index fe0d463..3ae81ca 100644
--- a/hypervisor/arch/x86/svm.c
+++ b/hypervisor/arch/x86/svm.c
@@ -433,6 +433,7 @@ void __attribute__((noreturn)) vcpu_activate_vmm(struct per_cpu *cpu_data)
"mov 0x20(%%rdi),%%rbx\n\t"
"mov 0x28(%%rdi),%%rbp\n\t"
"mov %2,%%rsp\n\t"
+ "vmload %%rax\n\t"
"jmp svm_vmentry"
: /* no output */
: "D" (cpu_data->linux_reg), "a" (vmcb_pa), "m" (host_stack));
@@ -456,15 +457,16 @@ void __attribute__((noreturn)) vcpu_deactivate_vmm(void)

cpu_data->linux_cs.selector = vmcb->cs.selector;

- cpu_data->linux_tss.selector = vmcb->tr.selector;
+ asm volatile("str %0" : "=m" (cpu_data->linux_tss.selector));

cpu_data->linux_efer = vmcb->efer & (~EFER_SVME);
cpu_data->linux_gs.base = vmcb->gs.base;

cpu_data->linux_ds.selector = vmcb->ds.selector;
cpu_data->linux_es.selector = vmcb->es.selector;
- cpu_data->linux_fs.selector = vmcb->fs.selector;
- cpu_data->linux_gs.selector = vmcb->gs.selector;
+
+ asm volatile("mov %%fs,%0" : "=m" (cpu_data->linux_fs.selector));
+ asm volatile("mov %%gs,%0" : "=m" (cpu_data->linux_gs.selector));

arch_cpu_restore(cpu_data, 0);

@@ -568,6 +570,12 @@ static void svm_vcpu_reset(struct per_cpu *cpu_data, unsigned int sipi_vector)
vmcb->clean_bits = 0;

svm_set_cell_config(cpu_data->cell, vmcb);
+
+ asm volatile(
+ "vmload %%rax"
+ : : "a" (paging_hvirt2phys(vmcb)) : "memory");
+ /* vmload overwrites GS_BASE - restore the host state */
+ write_msr(MSR_GS_BASE, (unsigned long)cpu_data);
}

void vcpu_skip_emulated_instruction(unsigned int inst_len)
@@ -832,6 +840,8 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
bool res = false;
int sipi_vector;

+ vmcb->gs.base = read_msr(MSR_GS_BASE);
+
/* Restore GS value expected by per_cpu data accessors */
write_msr(MSR_GS_BASE, (unsigned long)cpu_data);

@@ -859,14 +869,14 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
vcpu_reset();
}
iommu_check_pending_faults(cpu_data);
- return;
+ goto vmentry;
case VMEXIT_VMMCALL:
vcpu_handle_hypercall();
- return;
+ goto vmentry;
case VMEXIT_CR0_SEL_WRITE:
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_CR]++;
if (svm_handle_cr(cpu_data))
- return;
+ goto vmentry;
break;
case VMEXIT_MSR:
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_MSR]++;
@@ -875,7 +885,7 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
else
res = svm_handle_msr_write(cpu_data);
if (res)
- return;
+ goto vmentry;
break;
case VMEXIT_NPF:
if ((vmcb->exitinfo1 & 0x7) == 0x7 &&
@@ -884,12 +894,12 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
/* APIC access in non-AVIC mode */
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_XAPIC]++;
if (svm_handle_apic_access(vmcb))
- return;
+ goto vmentry;
} else {
/* General MMIO (IOAPIC, PCI etc) */
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_MMIO]++;
if (vcpu_handle_mmio_access())
- return;
+ goto vmentry;
}

panic_printk("FATAL: Unhandled Nested Page Fault for (%p), "
@@ -898,12 +908,12 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
break;
case VMEXIT_XSETBV:
if (vcpu_handle_xsetbv())
- return;
+ goto vmentry;
break;
case VMEXIT_IOIO:
cpu_data->stats[JAILHOUSE_CPU_STAT_VMEXITS_PIO]++;
if (vcpu_handle_io_access())
- return;
+ goto vmentry;
break;
/* TODO: Handle VMEXIT_AVIC_NOACCEL and VMEXIT_AVIC_INCOMPLETE_IPI */
default:
@@ -913,6 +923,9 @@ void vcpu_handle_exit(struct per_cpu *cpu_data)
}
dump_guest_regs(&cpu_data->guest_regs, vmcb);
panic_park();
+
+vmentry:
+ write_msr(MSR_GS_BASE, vmcb->gs.base);
}

void vcpu_park(void)
--
2.1.4

Valentine Sinitsyn

Apr 15, 2015, 2:51:04 AM
to Jan Kiszka, jailho...@googlegroups.com
A usual question: have you tried it? I remember I had obscure bugs due
to some of these not being restored properly, but unfortunately I don't
remember which one (likely something syscall-related, but still worth
checking).

Valentine

Valentine Sinitsyn

Apr 15, 2015, 2:53:27 AM
to Jan Kiszka, jailho...@googlegroups.com
On 15.04.2015 10:25, Jan Kiszka wrote:
> We can calculate PERCPU_VMCB_RAX directly and save the two intermediate
> steps.
>
> Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
> ---
> hypervisor/arch/x86/asm-defines.c | 3 +--
> hypervisor/arch/x86/svm-vmexit.S | 2 --
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/hypervisor/arch/x86/asm-defines.c b/hypervisor/arch/x86/asm-defines.c
> index 7dfae6e..f53d95e 100644
> --- a/hypervisor/arch/x86/asm-defines.c
> +++ b/hypervisor/arch/x86/asm-defines.c
> @@ -22,8 +22,7 @@ void common(void)
> OFFSET(PERCPU_LINUX_SP, per_cpu, linux_sp);
> BLANK();
>
> - OFFSET(PERCPU_VMCB, per_cpu, vmcb);
> - OFFSET(VMCB_RAX, vmcb, rax);
> + OFFSET(PERCPU_VMCB_RAX, per_cpu, vmcb.rax);
> BLANK();
Wasn't it the original way it worked in one of my early patch series,
before the review? ;-)

Valentine

Valentine Sinitsyn

Apr 15, 2015, 2:55:36 AM
to Jan Kiszka, jailho...@googlegroups.com
On 15.04.2015 10:25, Jan Kiszka wrote:
> This functions is SVM-specific.
In fact, it isn't, IIRC. However, it is used only in svm.c now.
Maybe it would be better to factor out this one and the similar function
for MMIO access parsing into a separate file, say disasm.c?
Valentine

Jan Kiszka

Apr 15, 2015, 3:06:47 AM
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-04-15 08:55, Valentine Sinitsyn wrote:
> On 15.04.2015 10:25, Jan Kiszka wrote:
>> This functions is SVM-specific.
> In fact, it isn't IIRC. However, it is used only in svm.c now.
> Maybe it would be better to factor out this one and similar function for
> MMIO access parsing into separate file, say disasm.c?

This is best done when we have a second user. Do you see one in the
existing code? CR handling in VMX is definitely not a candidate.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Jan Kiszka

Apr 15, 2015, 3:07:50 AM
to Valentine Sinitsyn, jailho...@googlegroups.com
Frankly, I don't remember. But I wouldn't be surprised if the reviewer
changed his mind again. ;)

Jan Kiszka

Apr 15, 2015, 3:10:07 AM
to Valentine Sinitsyn, jailho...@googlegroups.com
Yes, the series was run with the usual tests on our AMD G-Series target.
But I hope you can try them as well on your boards.

The question would be how these MSRs could possibly get corrupted.

Valentine Sinitsyn

Apr 15, 2015, 3:26:00 AM
to Jan Kiszka, jailho...@googlegroups.com
On 15.04.2015 12:06, Jan Kiszka wrote:
> On 2015-04-15 08:55, Valentine Sinitsyn wrote:
>> On 15.04.2015 10:25, Jan Kiszka wrote:
>>> This functions is SVM-specific.
>> In fact, it isn't IIRC. However, it is used only in svm.c now.
>> Maybe it would be better to factor out this one and similar function for
>> MMIO access parsing into separate file, say disasm.c?
>
> This is best done when we have a second user. Do you see one in the
> existing code? CR handling in VMX is definitely not a candidate.
No, svm.c is a single user here.

Valentine

Valentine Sinitsyn

Apr 15, 2015, 3:30:21 AM
to Jan Kiszka, jailho...@googlegroups.com
Shame to say, but I ruined the OS on the only HDD I had available for
testing boards some time ago. Apparently not an issue, but I haven't got
time to install a new one and make a pre-configured snapshot since then.

Will do shortly, and then test everything. I will need it for giving my
IOMMU code a try anyway ;)

Valentine

Valentine Sinitsyn

Apr 15, 2015, 3:36:14 AM
to jailho...@googlegroups.com
On 15.04.2015 10:25, Jan Kiszka wrote:
> SVM does not overwrite these MSRs on VM exit, thus does not require the
> restoration done in arch_cpu_restore so far. Make them VMX-specific so
> that we can drop a few lines of code.
>
> Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
> ---
> hypervisor/arch/x86/setup.c | 4 ----
> hypervisor/arch/x86/svm.c | 4 ----
> hypervisor/arch/x86/vmx.c | 6 +++---
> 3 files changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/hypervisor/arch/x86/setup.c b/hypervisor/arch/x86/setup.c
> index bb73439..af090af 100644
> --- a/hypervisor/arch/x86/setup.c
> +++ b/hypervisor/arch/x86/setup.c
> @@ -286,8 +286,4 @@ void arch_cpu_restore(struct per_cpu *cpu_data, int return_code)
>
> write_msr(MSR_FS_BASE, cpu_data->linux_fs.base);
> write_msr(MSR_GS_BASE, cpu_data->linux_gs.base);
> -
> - write_msr(MSR_IA32_SYSENTER_CS, cpu_data->linux_sysenter_cs);
> - write_msr(MSR_IA32_SYSENTER_EIP, cpu_data->linux_sysenter_eip);
> - write_msr(MSR_IA32_SYSENTER_ESP, cpu_data->linux_sysenter_esp);
I guess these were the MSRs I referred to. No problem then, as you
removed them from the generic path (i.e. vcpu_deactivate_vmm()).
Valentine

Valentine Sinitsyn

Apr 15, 2015, 3:45:03 AM
to Jan Kiszka, jailho...@googlegroups.com
On 15.04.2015 10:26, Jan Kiszka wrote:
> Benchmarks indicate that we can gain about 160 cycles per VM exit &
> reentry by only saving/restoring MSR_GS_BASE. We don't touch the other
> states that vmload/vmsave deals with.
>
> Specifically, we don't depend on a valid TR/TSS while in root mode
> because has neither in userspace nor uses the IST for interrupts or
> exceptions.
nor what? I'm not nagging here, just feel this is important as it
provides some summary for a 10+ message thread.
Best regards,
Valentine Sinitsyn

Jan Kiszka

Apr 15, 2015, 3:49:01 AM
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-04-15 09:44, Valentine Sinitsyn wrote:
> On 15.04.2015 10:26, Jan Kiszka wrote:
>> Benchmarks indicate that we can gain about 160 cycles per VM exit &
>> reentry by only saving/restoring MSR_GS_BASE. We don't touch the other
>> states that vmload/vmsave deals with.
>>
>> Specifically, we don't depend on a valid TR/TSS while in root mode
>> because has neither in userspace nor uses the IST for interrupts or
>> exceptions.
> nor what? I'm not nagging here, just feel this is important as it
> provides some summary for 10+ message thread.

There is a subject in the sentence missing anyway:

"because Jailhouse has neither in userspace nor uses the IST for
interrupts or exceptions, thus does not try to access the TSS."

Better? Otherwise suggest a wording. I'll fix that up before merge.

Valentine Sinitsyn

Apr 15, 2015, 3:52:05 AM
to Jan Kiszka, jailho...@googlegroups.com
On 15.04.2015 12:48, Jan Kiszka wrote:
> On 2015-04-15 09:44, Valentine Sinitsyn wrote:
>> On 15.04.2015 10:26, Jan Kiszka wrote:
>>> Benchmarks indicate that we can gain about 160 cycles per VM exit &
>>> reentry by only saving/restoring MSR_GS_BASE. We don't touch the other
>>> states that vmload/vmsave deals with.
>>>
>>> Specifically, we don't depend on a valid TR/TSS while in root mode
>>> because has neither in userspace nor uses the IST for interrupts or
>>> exceptions.
>> nor what? I'm not nagging here, just feel this is important as it
>> provides some summary for 10+ message thread.
>
> There is a subject in the sentence missing anyway:
>
> "because Jailhouse has neither in userspace nor uses the IST for
> interrupts or exceptions, thus does not try to access the TSS."
>
> Better? Otherwise suggest a wording. I'll fix that up before merge.
This is OK, thanks.

Valentine