
[PATCH v2 00/21] arm64: Virtualization Host Extension support


Marc Zyngier

Jan 25, 2016, 10:54:26 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
ARMv8.1 comes with the "Virtualization Host Extension" (VHE for
short), which enables simpler support of Type-2 hypervisors.

This extension allows the kernel to directly run at EL2, and
significantly reduces the number of system registers shared between
host and guest, reducing the overhead of virtualization.

In order to have the same kernel binary running on all versions of the
architecture, this series makes heavy use of runtime code patching.

The first 20 patches massage the KVM code to deal with VHE and enable
Linux to run at EL2. The last patch catches an ugly case when VHE
capable CPUs are paired with some of their less capable siblings. This
should never happen, but hey...

I have deliberately left out some of the more "advanced"
optimizations, as they are likely to distract the reviewer from the
core infrastructure, which is what I care about at the moment.

A few things to note:

- Given that the code has been almost entirely rewritten, I've
dropped all Acks from the new patches

- GDB is currently busted on VHE systems, as it checks for version 6
on the debug architecture, while VHE is version 7. The binutils
people are on the case.

This has been tested on the FVP_Base_SLV-V8-A model, and based on
v4.5-rc1. I've put a branch out on:

git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/vhe

* From v1:
- Full rewrite now that the World Switch is written in C code.
- Dropped the "early IRQ handling" for the moment.

Marc Zyngier (21):
arm/arm64: Add new is_kernel_in_hyp_mode predicate
arm64: Allow the arch timer to use the HYP timer
arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
arm64: KVM: Skip HYP setup when already running in HYP
arm64: KVM: VHE: Turn VTCR_EL2 setup into a reusable macro
arm64: KVM: VHE: Patch out use of HVC
arm64: KVM: VHE: Patch out kern_hyp_va
arm64: KVM: VHE: Introduce unified system register accessors
arm64: KVM: VHE: Differentiate host/guest sysreg save/restore
arm64: KVM: VHE: Split save/restore of sysregs shared between EL1 and
EL2
arm64: KVM: VHE: Use unified system register accessors
arm64: KVM: VHE: Enable minimal sysreg save/restore
arm64: KVM: VHE: Make __fpsimd_enabled VHE aware
arm64: KVM: VHE: Implement VHE activate/deactivate_traps
arm64: KVM: VHE: Use unified sysreg accessors for timer
arm64: KVM: VHE: Add fpsimd enabling on guest access
arm64: KVM: VHE: Add alternative panic handling
arm64: KVM: Introduce hyp_alternate_value helper
arm64: KVM: Move most of the fault decoding to C
arm64: VHE: Add support for running Linux in EL2 mode
arm64: Panic when VHE and non VHE CPUs coexist

arch/arm/include/asm/virt.h | 5 ++
arch/arm/kvm/arm.c | 151 +++++++++++++++++++------------
arch/arm/kvm/mmu.c | 7 ++
arch/arm64/Kconfig | 13 +++
arch/arm64/include/asm/cpufeature.h | 3 +-
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_emulate.h | 3 +
arch/arm64/include/asm/kvm_mmu.h | 34 ++++++-
arch/arm64/include/asm/virt.h | 27 ++++++
arch/arm64/kernel/asm-offsets.c | 3 -
arch/arm64/kernel/cpufeature.c | 15 +++-
arch/arm64/kernel/head.S | 51 ++++++++++-
arch/arm64/kernel/smp.c | 3 +
arch/arm64/kvm/hyp-init.S | 18 +---
arch/arm64/kvm/hyp.S | 7 ++
arch/arm64/kvm/hyp/entry.S | 6 ++
arch/arm64/kvm/hyp/hyp-entry.S | 107 +++++++---------------
arch/arm64/kvm/hyp/hyp.h | 119 ++++++++++++++++++++++--
arch/arm64/kvm/hyp/switch.c | 170 +++++++++++++++++++++++++++++++----
arch/arm64/kvm/hyp/sysreg-sr.c | 147 ++++++++++++++++++++----------
arch/arm64/kvm/hyp/timer-sr.c | 10 +--
drivers/clocksource/arm_arch_timer.c | 96 ++++++++++++--------
22 files changed, 724 insertions(+), 272 deletions(-)

--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:54:33 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With the ARMv8.1 VHE extension, it will be possible to run the kernel
at EL2 (aka HYP mode). In order for the kernel to easily find out
where it is running, add a new predicate that returns whether or
not the kernel is in HYP mode.

For completeness, the 32bit code also gets such a predicate (always
returning false) so that code common to both architectures (timers,
KVM) can use it transparently.
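
As a userspace sketch of the idea (not the kernel code itself): CurrentEL
reports the exception level in bits [3:2], and the predicate is a simple
comparison against the EL2 encoding. The stand-in parameter replaces the
real mrs read.

```c
#include <stdbool.h>

/* CurrentEL encodes the exception level in bits [3:2];
 * these mirror the kernel's CurrentEL_EL1/EL2 constants. */
#define CurrentEL_EL1 (1UL << 2)
#define CurrentEL_EL2 (2UL << 2)

/* Userspace stand-in: the kernel reads the real register with
 * asm("mrs %0, CurrentEL" : "=r" (el)). */
static bool is_kernel_in_hyp_mode(unsigned long current_el)
{
    return current_el == CurrentEL_EL2;
}
```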

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm/include/asm/virt.h | 5 +++++
arch/arm64/include/asm/virt.h | 10 ++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
index 4371f45..b6a3cef 100644
--- a/arch/arm/include/asm/virt.h
+++ b/arch/arm/include/asm/virt.h
@@ -74,6 +74,11 @@ static inline bool is_hyp_mode_mismatched(void)
{
return !!(__boot_cpu_mode & BOOT_CPU_MODE_MISMATCH);
}
+
+static inline bool is_kernel_in_hyp_mode(void)
+{
+ return false;
+}
#endif

#endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 7a5df52..9f22dd6 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -23,6 +23,8 @@

#ifndef __ASSEMBLY__

+#include <asm/ptrace.h>
+
/*
* __boot_cpu_mode records what mode CPUs were booted in.
* A correctly-implemented bootloader must start all CPUs in the same mode:
@@ -50,6 +52,14 @@ static inline bool is_hyp_mode_mismatched(void)
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
}

+static inline bool is_kernel_in_hyp_mode(void)
+{
+ u64 el;
+
+ asm("mrs %0, CurrentEL" : "=r" (el));
+ return el == CurrentEL_EL2;
+}
+
/* The section containing the hypervisor text */
extern char __hyp_text_start[];
extern char __hyp_text_end[];
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:54:54 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On a VHE-capable system, there is no point in setting VTCR_EL2
at KVM init time. We can just as well set it up when the kernel
boots, removing the need for a more complicated configuration.

In order to allow this, turn VTCR_EL2 setup into a macro that
we'll be able to reuse at boot time.
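
The macro's bit gymnastics can be mirrored in plain C. This is a sketch
under assumptions: VTCR_EL2_FLAGS is a placeholder (the real value lives
in asm/kvm_arm.h) and the VS bit is assumed to sit at bit 19.

```c
#include <stdint.h>

#define VTCR_EL2_FLAGS 0x0ULL   /* placeholder; real value in asm/kvm_arm.h */
#define VTCR_EL2_VS    19       /* assumed position of the VS bit */

/* C mirror of the setup_vtcr macro: insert PARange (low 3 bits of
 * ID_AA64MMFR0_EL1) into VTCR_EL2.PS at bit 16, and copy bit 5 of
 * ID_AA64MMFR1_EL1 (16-bit VMID support) into VTCR_EL2.VS. */
static uint64_t compute_vtcr(uint64_t mmfr0, uint64_t mmfr1)
{
    uint64_t vtcr = VTCR_EL2_FLAGS;

    vtcr &= ~(0x7ULL << 16);                /* bfi \tmp1, \tmp2, #16, #3 */
    vtcr |= (mmfr0 & 0x7) << 16;

    vtcr |= ((mmfr1 >> 5) & 1) << VTCR_EL2_VS;   /* ubfx + lsl + orr */

    return vtcr;
}
```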

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/kvm_mmu.h | 23 +++++++++++++++++++++++
arch/arm64/kvm/hyp-init.S | 18 +-----------------
2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7364339..d3e6d7b 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -56,6 +56,29 @@

#ifdef __ASSEMBLY__

+#include <asm/kvm_arm.h>
+
+.macro setup_vtcr tmp1, tmp2
+ mov \tmp1, #(VTCR_EL2_FLAGS & 0xffff)
+ movk \tmp1, #(VTCR_EL2_FLAGS >> 16), lsl #16
+ /*
+ * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in
+ * VTCR_EL2.
+ */
+ mrs \tmp2, id_aa64mmfr0_el1
+ bfi \tmp1, \tmp2, #16, #3
+ /*
+ * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS bit in
+ * VTCR_EL2.
+ */
+ mrs \tmp2, ID_AA64MMFR1_EL1
+ ubfx \tmp2, \tmp2, #5, #1
+ lsl \tmp2, \tmp2, #VTCR_EL2_VS
+ orr \tmp1, \tmp1, \tmp2
+
+ msr vtcr_el2, \tmp1
+ isb
+.endm
/*
* Convert a kernel VA into a HYP VA.
* reg: VA to be converted.
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 3e568dc..4143e2c 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -87,23 +87,7 @@ __do_hyp_init:
#endif
msr tcr_el2, x4

- ldr x4, =VTCR_EL2_FLAGS
- /*
- * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in
- * VTCR_EL2.
- */
- mrs x5, ID_AA64MMFR0_EL1
- bfi x4, x5, #16, #3
- /*
- * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS bit in
- * VTCR_EL2.
- */
- mrs x5, ID_AA64MMFR1_EL1
- ubfx x5, x5, #5, #1
- lsl x5, x5, #VTCR_EL2_VS
- orr x4, x4, x5
-
- msr vtcr_el2, x4
+ setup_vtcr x4, x5

mrs x4, mair_el1
msr mair_el2, x4
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:54:58 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
A handful of system registers are still shared between EL1 and EL2,
even while using VHE. These are tpidr*_el[01], actlr_el1, sp0, elr,
and spsr.

In order to facilitate the introduction of a VHE-specific sysreg
save/restore, move the access to these registers into their
own save/restore functions.

No functional change.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 48 +++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index bd5b543..61bad17 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -23,13 +23,29 @@

#include "hyp.h"

-/* ctxt is already in the HYP VA space */
+/*
+ * Non-VHE: Both host and guest must save everything.
+ *
+ * VHE: Host must save tpidr*_el[01], actlr_el1, sp0, pc, pstate, and
+ * guest must save everything.
+ */
+
+static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
+{
+ ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
+ ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
+ ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+ ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
+ ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
+ ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
+ ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
+}
+
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1);
- ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1);
ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1);
ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1);
@@ -41,17 +57,11 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1);
ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1);
ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1);
- ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
- ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
- ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1);
ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);

- ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
- ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
- ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1);
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
@@ -60,11 +70,24 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
{
__sysreg_save_state(ctxt);
+ __sysreg_save_common_state(ctxt);
}

void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
{
__sysreg_save_state(ctxt);
+ __sysreg_save_common_state(ctxt);
+}
+
+static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
+{
+ write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
+ write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
+ write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
+ write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
+ write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
+ write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
+ write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
}

static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
@@ -72,7 +95,6 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1);
- write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1);
write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1);
write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1);
@@ -84,17 +106,11 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1);
write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1);
write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1);
- write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
- write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
- write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1);
write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1);
write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);

- write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
- write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
- write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
@@ -103,11 +119,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_state(ctxt);
+ __sysreg_restore_common_state(ctxt);
}

void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_state(ctxt);
+ __sysreg_restore_common_state(ctxt);
}

void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:55:05 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Despite the fact that a VHE-enabled kernel runs at EL2, it uses
CPACR_EL1 to trap FPSIMD access. Add the required alternative
code to re-enable guest FPSIMD access when it has trapped to
EL2.
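
In C terms, the two alternatives amount to the following bit
manipulations (a sketch, not kernel code; register layouts per the
ARM ARM):

```c
#include <stdint.h>

#define CPTR_EL2_TFP   (1ULL << 10)  /* TFP: trap FP/SIMD to EL2 (non-VHE) */
#define CPACR_EL1_FPEN (3ULL << 20)  /* FPEN=0b11: no FP/SIMD trapping (VHE) */

/* Non-VHE: clear the trap bit in CPTR_EL2 */
static uint64_t fpsimd_enable_nvhe(uint64_t cptr)
{
    return cptr & ~CPTR_EL2_TFP;
}

/* VHE: set CPACR_EL1.FPEN, matching the "orr x2, x2, #(3 << 20)" */
static uint64_t fpsimd_enable_vhe(uint64_t cpacr)
{
    return cpacr | CPACR_EL1_FPEN;
}
```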

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/entry.S | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index fd0fbe9..759a0ec 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -130,9 +130,15 @@ ENDPROC(__guest_exit)
ENTRY(__fpsimd_guest_restore)
stp x4, lr, [sp, #-16]!

+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs x2, cptr_el2
bic x2, x2, #CPTR_EL2_TFP
msr cptr_el2, x2
+alternative_else
+ mrs x2, cpacr_el1
+ orr x2, x2, #(3 << 20)
+ msr cpacr_el1, x2
+alternative_endif
isb

mrs x3, tpidr_el2
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:55:15 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
As the kernel fully runs in HYP when VHE is enabled, we can
directly branch to the kernel's panic() implementation, and
not perform an exception return.

Add the alternative code to deal with this.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/switch.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 77f7c94..0cadb7f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -211,11 +211,34 @@ __alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);

static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";

-void __hyp_text __noreturn __hyp_panic(void)
+static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
{
unsigned long str_va = (unsigned long)__hyp_panic_string;
- u64 spsr = read_sysreg(spsr_el2);
- u64 elr = read_sysreg(elr_el2);
+
+ __hyp_do_panic(hyp_kern_va(str_va),
+ spsr, elr,
+ read_sysreg(esr_el2), read_sysreg_el2(far),
+ read_sysreg(hpfar_el2), par,
+ (void *)read_sysreg(tpidr_el2));
+}
+
+static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par)
+{
+ panic(__hyp_panic_string,
+ spsr, elr,
+ read_sysreg_el2(esr), read_sysreg_el2(far),
+ read_sysreg(hpfar_el2), par,
+ (void *)read_sysreg(tpidr_el2));
+}
+
+static hyp_alternate_select(__hyp_call_panic,
+ __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+void __hyp_text __noreturn __hyp_panic(void)
+{
+ u64 spsr = read_sysreg_el2(spsr);
+ u64 elr = read_sysreg_el2(elr);
u64 par = read_sysreg(par_el1);

if (read_sysreg(vttbr_el2)) {
@@ -230,11 +253,7 @@ void __hyp_text __noreturn __hyp_panic(void)
}

/* Call panic for real */
- __hyp_do_panic(hyp_kern_va(str_va),
- spsr, elr,
- read_sysreg(esr_el2), read_sysreg(far_el2),
- read_sysreg(hpfar_el2), par,
- (void *)read_sysreg(tpidr_el2));
+ __hyp_call_panic()(spsr, elr, par);

unreachable();
}
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:56:23 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
We're now in a position where we can introduce VHE's minimal
save/restore, which is limited to the handful of shared sysregs.

Add the required alternative function calls that result in a
"do nothing" call on VHE, and the normal save/restore for non-VHE.
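
The hyp_alternate_select helper effectively yields a function pointer
chosen by CPU feature. Modeled as a plain runtime branch (in the kernel
the choice is patched in at boot by the alternatives framework), the
shape is roughly:

```c
#include <stddef.h>

struct kvm_cpu_context;   /* opaque here */

typedef void (*sysreg_fn)(struct kvm_cpu_context *ctxt);

static void __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
    /* full EL1 sysreg save would go here */
}

/* Yes, this does nothing, on purpose */
static void __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }

/* Runtime stand-in for hyp_alternate_select(..., ARM64_HAS_VIRT_HOST_EXTN) */
static sysreg_fn sysreg_call_save_state(int has_vhe)
{
    return has_vhe ? __sysreg_do_nothing : __sysreg_save_state;
}
```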

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 7d7d757..36bbdec 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -23,6 +23,9 @@

#include "hyp.h"

+/* Yes, this does nothing, on purpose */
+static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
+
/*
* Non-VHE: Both host and guest must save everything.
*
@@ -67,9 +70,13 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
}

+static hyp_alternate_select(__sysreg_call_save_state,
+ __sysreg_save_state, __sysreg_do_nothing,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
{
- __sysreg_save_state(ctxt);
+ __sysreg_call_save_state()(ctxt);
__sysreg_save_common_state(ctxt);
}

@@ -116,9 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
}

+static hyp_alternate_select(__sysreg_call_restore_host_state,
+ __sysreg_restore_state, __sysreg_do_nothing,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
{
- __sysreg_restore_state(ctxt);
+ __sysreg_call_restore_host_state()(ctxt);
__sysreg_restore_common_state(ctxt);
}

--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:57:11 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Having both VHE and non-VHE capable CPUs in the same system
is likely to be a recipe for disaster.

If the boot CPU has VHE, but a secondary is not, we won't be
able to downgrade and run the kernel at EL1. Add CPU hotplug
to the mix, and this produces a terrifying mess.

Let's solve the problem once and for all. If you mix VHE and
non-VHE CPUs in the same system, you deserve to lose, and this
patch makes sure you don't get a chance.

This is implemented by storing the kernel execution level in
a global variable. Secondaries will park themselves in a
WFI loop if they observe a mismatch. Also, the primary CPU
will detect that the secondary CPU has died on a mismatched
execution level. Panic will follow.
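
The two-word handshake can be sketched in C as follows (a model of the
protocol only; the kernel does this in assembly with explicit cache
maintenance, and parks mismatched CPUs in WFI):

```c
#include <stdint.h>

/* Slot 0 holds the boot CPU's mode; slot 1 stays zero unless a
 * secondary disagrees. */
static uint32_t run_cpu_mode[2];

/* Called by each CPU as it comes up, with its CurrentEL value. */
static void record_cpu_mode(uint32_t current_el)
{
    if (run_cpu_mode[0] == 0)
        run_cpu_mode[0] = current_el;   /* first CPU sets the rule */
    else if (run_cpu_mode[0] != current_el)
        run_cpu_mode[1] = current_el;   /* mismatch: flag it, then park */
}

static int is_kernel_mode_mismatched(void)
{
    return run_cpu_mode[1] != 0;
}
```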

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/virt.h | 17 +++++++++++++++++
arch/arm64/kernel/head.S | 19 +++++++++++++++++++
arch/arm64/kernel/smp.c | 3 +++
3 files changed, 39 insertions(+)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 9f22dd6..f81a345 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -36,6 +36,11 @@
*/
extern u32 __boot_cpu_mode[2];

+/*
+ * __run_cpu_mode records the mode the boot CPU uses for the kernel.
+ */
+extern u32 __run_cpu_mode[2];
+
void __hyp_set_vectors(phys_addr_t phys_vector_base);
phys_addr_t __hyp_get_vectors(void);

@@ -60,6 +65,18 @@ static inline bool is_kernel_in_hyp_mode(void)
return el == CurrentEL_EL2;
}

+static inline bool is_kernel_mode_mismatched(void)
+{
+ /*
+ * A mismatched CPU will have written its own CurrentEL in
+ * __run_cpu_mode[1] (initially set to zero) after failing to
+ * match the value in __run_cpu_mode[0]. Thus, a non-zero
+ * value in __run_cpu_mode[1] is enough to detect the
+ * pathological case.
+ */
+ return !!ACCESS_ONCE(__run_cpu_mode[1]);
+}
+
/* The section containing the hypervisor text */
extern char __hyp_text_start[];
extern char __hyp_text_end[];
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2a7134c..bc44cf8 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -577,7 +577,23 @@ ENTRY(set_cpu_boot_mode_flag)
1: str w20, [x1] // This CPU has booted in EL1
dmb sy
dc ivac, x1 // Invalidate potentially stale cache line
+ adr_l x1, __run_cpu_mode
+ ldr w0, [x1]
+ mrs x20, CurrentEL
+ cbz x0, skip_el_check
+ cmp x0, x20
+ bne mismatched_el
+skip_el_check: // Only the first CPU gets to set the rule
+ str w20, [x1]
+ dmb sy
+ dc ivac, x1 // Invalidate potentially stale cache line
ret
+mismatched_el:
+ str w20, [x1, #4]
+ dmb sy
+ dc ivac, x1 // Invalidate potentially stale cache line
+1: wfi
+ b 1b
ENDPROC(set_cpu_boot_mode_flag)

/*
@@ -592,6 +608,9 @@ ENDPROC(set_cpu_boot_mode_flag)
ENTRY(__boot_cpu_mode)
.long BOOT_CPU_MODE_EL2
.long BOOT_CPU_MODE_EL1
+ENTRY(__run_cpu_mode)
+ .long 0
+ .long 0
.popsection

/*
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..bc7650a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -113,6 +113,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
pr_crit("CPU%u: failed to come online\n", cpu);
ret = -EIO;
}
+
+ if (is_kernel_mode_mismatched())
+ panic("CPU%u: incompatible execution level", cpu);
} else {
pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
}
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:57:28 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With ARMv8, host and guest share the same system register file,
making the save/restore procedure completely symmetrical.
With VHE, host and guest now have different requirements, as they
use different sysregs.

In order to prepare for this, add split sysreg save/restore functions
for both host and guest. No functional change yet.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/hyp.h | 6 ++++--
arch/arm64/kvm/hyp/switch.c | 10 +++++-----
arch/arm64/kvm/hyp/sysreg-sr.c | 24 ++++++++++++++++++++++--
3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index 744c919..5dfa883 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -153,8 +153,10 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
void __timer_save_state(struct kvm_vcpu *vcpu);
void __timer_restore_state(struct kvm_vcpu *vcpu);

-void __sysreg_save_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
void __sysreg32_save_state(struct kvm_vcpu *vcpu);
void __sysreg32_restore_state(struct kvm_vcpu *vcpu);

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ca8f5a5..9071dee 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -98,7 +98,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
guest_ctxt = &vcpu->arch.ctxt;

- __sysreg_save_state(host_ctxt);
+ __sysreg_save_host_state(host_ctxt);
__debug_cond_save_host_state(vcpu);

__activate_traps(vcpu);
@@ -112,7 +112,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
* to Cortex-A57 erratum #852523.
*/
__sysreg32_restore_state(vcpu);
- __sysreg_restore_state(guest_ctxt);
+ __sysreg_restore_guest_state(guest_ctxt);
__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);

/* Jump in the fire! */
@@ -121,7 +121,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)

fp_enabled = __fpsimd_enabled();

- __sysreg_save_state(guest_ctxt);
+ __sysreg_save_guest_state(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_save_state(vcpu);
__vgic_save_state(vcpu);
@@ -129,7 +129,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);

- __sysreg_restore_state(host_ctxt);
+ __sysreg_restore_host_state(host_ctxt);

if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
@@ -161,7 +161,7 @@ void __hyp_text __noreturn __hyp_panic(void)
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
- __sysreg_restore_state(host_ctxt);
+ __sysreg_restore_host_state(host_ctxt);
}

/* Call panic for real */
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 42563098..bd5b543 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -24,7 +24,7 @@
#include "hyp.h"

/* ctxt is already in the HYP VA space */
-void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
@@ -57,7 +57,17 @@ void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
}

-void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_save_state(ctxt);
+}
+
+void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_save_state(ctxt);
+}
+
+static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
@@ -90,6 +100,16 @@ void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
}

+void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_restore_state(ctxt);
+}
+
+void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_restore_state(ctxt);
+}
+
void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
{
u64 *spsr, *sysreg;
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:57:35 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With ARMv8.1 VHE, the architecture is able to (almost) transparently
run the kernel at EL2, despite being written for EL1.

This patch takes care of the "almost" part, mostly preventing the kernel
from dropping from EL2 to EL1, and setting up the HYP configuration.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/Kconfig | 13 +++++++++++++
arch/arm64/kernel/head.S | 32 +++++++++++++++++++++++++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..ada34df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -750,6 +750,19 @@ config ARM64_LSE_ATOMICS
not support these instructions and requires the kernel to be
built with binutils >= 2.25.

+config ARM64_VHE
+ bool "Enable support for Virtualization Host Extension (VHE)"
+ default y
+ help
+ Virtualization Host Extension (VHE) allows the kernel to run
+ directly at EL2 (instead of EL1) on processors that support
+ it. This leads to better performance for KVM, as it reduces
+ the cost of the world switch.
+
+ Selecting this option allows the VHE feature to be detected
+ at runtime, and does not affect processors that do not
+ implement this feature.
+
endmenu

endmenu
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ffe9c2b..2a7134c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -30,6 +30,7 @@
#include <asm/cache.h>
#include <asm/cputype.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kvm_mmu.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/pgtable.h>
@@ -464,8 +465,25 @@ CPU_LE( bic x0, x0, #(3 << 24) ) // Clear the EE and E0E bits for EL1
isb
ret

+2:
+#ifdef CONFIG_ARM64_VHE
+ /*
+ * Check for VHE being present. For the rest of the EL2 setup,
+ * x2 being non-zero indicates that we do have VHE, and that the
+ * kernel is intended to run at EL2.
+ */
+ mrs x2, id_aa64mmfr1_el1
+ ubfx x2, x2, #8, #4
+#else
+ mov x2, xzr
+#endif
+
/* Hyp configuration. */
-2: mov x0, #(1 << 31) // 64-bit EL1
+ mov x0, #HCR_RW // 64-bit EL1
+ cbz x2, set_hcr
+ orr x0, x0, #HCR_TGE // Enable Host Extensions
+ orr x0, x0, #HCR_E2H
+set_hcr:
msr hcr_el2, x0

/* Generic timers. */
@@ -507,6 +525,9 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems

/* Coprocessor traps. */
mov x0, #0x33ff
+ cbz x2, set_cptr
+ orr x0, x0, #(3 << 20) // Don't trap FP
+set_cptr:
msr cptr_el2, x0 // Disable copro. traps to EL2

#ifdef CONFIG_COMPAT
@@ -521,6 +542,15 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
/* Stage-2 translation */
msr vttbr_el2, xzr

+ cbz x2, install_el2_stub
+
+ setup_vtcr x4, x5
+
+ mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
+ isb
+ ret
+
+install_el2_stub:
/* Hypervisor stub */
adrp x0, __hyp_stub_vectors
add x0, x0, #:lo12:__hyp_stub_vectors
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:58:00 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
The fault decoding process (including computing the IPA in the case
of a permission fault) would be much better done in C code, as we
have a reasonable infrastructure to deal with the VHE/non-VHE
differences.

Let's move the whole thing to C, including the workaround for
erratum 834220, and just patch the odd ESR_EL2 access remaining
in hyp-entry.S.
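
As a taste of what the move to C buys, the abort triage the assembly
used to do can be expressed along these lines (field values mirror
asm/esr.h; a sketch rather than the actual switch.c code):

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_EL2 fields, mirroring asm/esr.h */
#define ESR_ELx_EC_SHIFT    26
#define ESR_ELx_EC_DABT_LOW 0x24u
#define ESR_ELx_EC_IABT_LOW 0x20u
#define ESR_ELx_FSC_TYPE    0x3cu
#define FSC_PERM            0x0cu
#define ESR_ELx_S1PTW       (1u << 7)

/* Is this a guest abort whose HPFAR_EL2 may be invalid, i.e. a
 * permission fault not taken on a stage-1 page table walk? Such
 * faults need the IPA resolved the hard way (AT S1E1R on the
 * faulting VA). */
static bool hpfar_may_be_invalid(uint32_t esr)
{
    uint32_t ec = esr >> ESR_ELx_EC_SHIFT;

    if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
        return false;                      /* not an abort we care about */
    if ((esr & ESR_ELx_FSC_TYPE) != FSC_PERM)
        return false;                      /* not a permission fault */
    return !(esr & ESR_ELx_S1PTW);         /* S1PTW set => HPFAR valid */
}
```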

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kernel/asm-offsets.c | 3 --
arch/arm64/kvm/hyp/hyp-entry.S | 69 +++--------------------------------------
arch/arm64/kvm/hyp/switch.c | 54 ++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index fffa4ac6..b0ab4e9 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -110,9 +110,6 @@ int main(void)
DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
- DEFINE(VCPU_ESR_EL2, offsetof(struct kvm_vcpu, arch.fault.esr_el2));
- DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2));
- DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2));
DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
#endif
#ifdef CONFIG_CPU_PM
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 9e0683f..213de52 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -19,7 +19,6 @@

#include <asm/alternative.h>
#include <asm/assembler.h>
-#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
@@ -67,7 +66,11 @@ ENDPROC(__vhe_hyp_call)
el1_sync: // Guest trapped into EL2
save_x0_to_x3

+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs x1, esr_el2
+alternative_else
+ mrs x1, esr_el1
+alternative_endif
lsr x2, x1, #ESR_ELx_EC_SHIFT

cmp x2, #ESR_ELx_EC_HVC64
@@ -103,72 +106,10 @@ el1_trap:
cmp x2, #ESR_ELx_EC_FP_ASIMD
b.eq __fpsimd_guest_restore

- cmp x2, #ESR_ELx_EC_DABT_LOW
- mov x0, #ESR_ELx_EC_IABT_LOW
- ccmp x2, x0, #4, ne
- b.ne 1f // Not an abort we care about
-
- /* This is an abort. Check for permission fault */
-alternative_if_not ARM64_WORKAROUND_834220
- and x2, x1, #ESR_ELx_FSC_TYPE
- cmp x2, #FSC_PERM
- b.ne 1f // Not a permission fault
-alternative_else
- nop // Use the permission fault path to
- nop // check for a valid S1 translation,
- nop // regardless of the ESR value.
-alternative_endif
-
- /*
- * Check for Stage-1 page table walk, which is guaranteed
- * to give a valid HPFAR_EL2.
- */
- tbnz x1, #7, 1f // S1PTW is set
-
- /* Preserve PAR_EL1 */
- mrs x3, par_el1
- stp x3, xzr, [sp, #-16]!
-
- /*
- * Permission fault, HPFAR_EL2 is invalid.
- * Resolve the IPA the hard way using the guest VA.
- * Stage-1 translation already validated the memory access rights.
- * As such, we can use the EL1 translation regime, and don't have
- * to distinguish between EL0 and EL1 access.
- */
- mrs x2, far_el2
- at s1e1r, x2
- isb
-
- /* Read result */
- mrs x3, par_el1
- ldp x0, xzr, [sp], #16 // Restore PAR_EL1 from the stack
- msr par_el1, x0
- tbnz x3, #0, 3f // Bail out if we failed the translation
- ubfx x3, x3, #12, #36 // Extract IPA
- lsl x3, x3, #4 // and present it like HPFAR
- b 2f
-
-1: mrs x3, hpfar_el2
- mrs x2, far_el2
-
-2: mrs x0, tpidr_el2
- str w1, [x0, #VCPU_ESR_EL2]
- str x2, [x0, #VCPU_FAR_EL2]
- str x3, [x0, #VCPU_HPFAR_EL2]
-
+ mrs x0, tpidr_el2
mov x1, #ARM_EXCEPTION_TRAP
b __guest_exit

- /*
- * Translation failed. Just return to the guest and
- * let it fault again. Another CPU is probably playing
- * behind our back.
- */
-3: restore_x0_to_x3
-
- eret
-
el1_irq:
save_x0_to_x3
mrs x0, tpidr_el2
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 0cadb7f..df2cce9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -15,6 +15,7 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

+#include <linux/types.h>
#include <asm/kvm_asm.h>

#include "hyp.h"
@@ -150,6 +151,55 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
__vgic_call_restore_state()(vcpu);
}

+static hyp_alternate_value(__check_arm_834220,
+ false, true,
+ ARM64_WORKAROUND_834220);
+
+static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
+{
+ u64 esr = read_sysreg_el2(esr);
+ u8 ec = esr >> ESR_ELx_EC_SHIFT;
+ u64 hpfar, far;
+
+ vcpu->arch.fault.esr_el2 = esr;
+
+ if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
+ return true;
+
+ far = read_sysreg_el2(far);
+
+ if (!(esr & ESR_ELx_S1PTW) &&
+ (__check_arm_834220() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+ u64 par, tmp;
+
+ /*
+ * Permission fault, HPFAR_EL2 is invalid. Resolve the
+ * IPA the hard way using the guest VA.
+ * Stage-1 translation already validated the memory
+ * access rights. As such, we can use the EL1
+ * translation regime, and don't have to distinguish
+ * between EL0 and EL1 access.
+ */
+ par = read_sysreg(par_el1);
+ asm volatile("at s1e1r, %0" : : "r" (far));
+ isb();
+
+ tmp = read_sysreg(par_el1);
+ write_sysreg(par, par_el1);
+
+ if (unlikely(tmp & 1))
+ return false; /* Translation failed, back to guest */
+
+ hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
+ } else {
+ hpfar = read_sysreg(hpfar_el2);
+ }
+
+ vcpu->arch.fault.far_el2 = far;
+ vcpu->arch.fault.hpfar_el2 = hpfar;
+ return true;
+}
+
static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
@@ -181,9 +231,13 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);

/* Jump in the fire! */
+again:
exit_code = __guest_enter(vcpu, host_ctxt);
/* And we're baaack! */

+ if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
+ goto again;
+
fp_enabled = __fpsimd_enabled();

__sysreg_save_guest_state(guest_ctxt);
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:58:29 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
We already have hyp_alternate_select() to define a function pointer
that gets patched according to a kernel feature or workaround.

It would be useful to have a similar facility that resolves to a
plain value, without requiring a function call. For this purpose,
introduce hyp_alternate_value(), which returns one of two values
depending on the state of the alternative.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/hyp.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index 44eaff7..dc75fdb 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -144,6 +144,17 @@ typeof(orig) * __hyp_text fname(void) \
return val; \
}

+#define hyp_alternate_value(fname, orig, alt, cond) \
+typeof(orig) __hyp_text fname(void) \
+{ \
+ typeof(alt) val = orig; \
+ asm volatile(ALTERNATIVE("nop \n", \
+ "mov %0, %1 \n", \
+ cond) \
+ : "+r" (val) : "r" ((typeof(orig))alt)); \
+ return val; \
+}
+
void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);

--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:59:02 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With VHE, the host never issues an HVC instruction to get into the
KVM code, as we can simply branch there.

Use runtime code patching to simplify things a bit.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp.S | 7 +++++++
arch/arm64/kvm/hyp/hyp-entry.S | 38 +++++++++++++++++++++++++++++---------
2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 0ccdcbb..0689a74 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -17,7 +17,9 @@

#include <linux/linkage.h>

+#include <asm/alternative.h>
#include <asm/assembler.h>
+#include <asm/cpufeature.h>

/*
* u64 kvm_call_hyp(void *hypfn, ...);
@@ -38,6 +40,11 @@
* arch/arm64/kernel/hyp_stub.S.
*/
ENTRY(kvm_call_hyp)
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
hvc #0
ret
+alternative_else
+ b __vhe_hyp_call
+ nop
+alternative_endif
ENDPROC(kvm_call_hyp)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 93e8d983..9e0683f 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -38,6 +38,32 @@
ldp x0, x1, [sp], #16
.endm

+.macro do_el2_call
+ /*
+ * Shuffle the parameters before calling the function
+ * pointed to in x0. Assumes parameters in x[1,2,3].
+ */
+ stp lr, xzr, [sp, #-16]!
+ mov lr, x0
+ mov x0, x1
+ mov x1, x2
+ mov x2, x3
+ blr lr
+ ldp lr, xzr, [sp], #16
+.endm
+
+ENTRY(__vhe_hyp_call)
+ do_el2_call
+ /*
+ * We used to rely on having an exception return to get
+ * an implicit isb. In the E2H case, we don't have it anymore;
+ * rather than changing all the leaf functions, just do it here
+ * before returning to the rest of the kernel.
+ */
+ isb
+ ret
+ENDPROC(__vhe_hyp_call)
+
el1_sync: // Guest trapped into EL2
save_x0_to_x3

@@ -58,19 +84,13 @@ el1_sync: // Guest trapped into EL2
mrs x0, vbar_el2
b 2f

-1: stp lr, xzr, [sp, #-16]!
-
+1:
/*
- * Compute the function address in EL2, and shuffle the parameters.
+ * Perform the EL2 call
*/
kern_hyp_va x0
- mov lr, x0
- mov x0, x1
- mov x1, x2
- mov x2, x3
- blr lr
+ do_el2_call

- ldp lr, xzr, [sp], #16
2: eret

el1_trap:
--
2.1.4

Marc Zyngier

Jan 25, 2016, 10:59:30 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Switch the timer code to the unified sysreg accessors.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/timer-sr.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
index 1051e5d..f276d9e 100644
--- a/arch/arm64/kvm/hyp/timer-sr.c
+++ b/arch/arm64/kvm/hyp/timer-sr.c
@@ -31,12 +31,12 @@ void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
u64 val;

if (kvm->arch.timer.enabled) {
- timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
- timer->cntv_cval = read_sysreg(cntv_cval_el0);
+ timer->cntv_ctl = read_sysreg_el0(cntv_ctl);
+ timer->cntv_cval = read_sysreg_el0(cntv_cval);
}

/* Disable the virtual timer */
- write_sysreg(0, cntv_ctl_el0);
+ write_sysreg_el0(0, cntv_ctl);

/* Allow physical timer/counter access for the host */
val = read_sysreg(cnthctl_el2);
@@ -64,8 +64,8 @@ void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)

if (kvm->arch.timer.enabled) {
write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
- write_sysreg(timer->cntv_cval, cntv_cval_el0);
+ write_sysreg_el0(timer->cntv_cval, cntv_cval);
isb();
- write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
+ write_sysreg_el0(timer->cntv_ctl, cntv_ctl);
}
}
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:02:00 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
The kern_hyp_va macro is pretty meaningless with VHE, as there is
only one mapping - the kernel one.

In order to keep the code readable and efficient, use runtime
patching to replace the 'and' instruction used to compute the VA
with a 'nop'.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/kvm_mmu.h | 11 ++++++++++-
arch/arm64/kvm/hyp/hyp.h | 25 ++++++++++++++++++++++---
2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d3e6d7b..62f0d14 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -23,13 +23,16 @@
#include <asm/cpufeature.h>

/*
- * As we only have the TTBR0_EL2 register, we cannot express
+ * As ARMv8.0 only has the TTBR0_EL2 register, we cannot express
* "negative" addresses. This makes it impossible to directly share
* mappings with the kernel.
*
* Instead, give the HYP mode its own VA region at a fixed offset from
* the kernel by just masking the top bits (which are all ones for a
* kernel address).
+ *
+ * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these
+ * macros (the entire kernel runs at EL2).
*/
#define HYP_PAGE_OFFSET_SHIFT VA_BITS
#define HYP_PAGE_OFFSET_MASK ((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
@@ -56,6 +59,8 @@

#ifdef __ASSEMBLY__

+#include <asm/alternative.h>
+#include <asm/cpufeature.h>
#include <asm/kvm_arm.h>

.macro setup_vtcr tmp1, tmp2
@@ -84,7 +89,11 @@
* reg: VA to be converted.
*/
.macro kern_hyp_va reg
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
and \reg, \reg, #HYP_PAGE_OFFSET_MASK
+alternative_else
+ nop
+alternative_endif
.endm

#else
diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index fb27517..fc502f3 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -25,9 +25,28 @@

#define __hyp_text __section(.hyp.text) notrace

-#define kern_hyp_va(v) (typeof(v))((unsigned long)(v) & HYP_PAGE_OFFSET_MASK)
-#define hyp_kern_va(v) (typeof(v))((unsigned long)(v) - HYP_PAGE_OFFSET \
- + PAGE_OFFSET)
+static inline unsigned long __kern_hyp_va(unsigned long v)
+{
+ asm volatile(ALTERNATIVE("and %0, %0, %1",
+ "nop",
+ ARM64_HAS_VIRT_HOST_EXTN)
+ : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
+ return v;
+}
+
+#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
+
+static inline unsigned long __hyp_kern_va(unsigned long v)
+{
+ u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET;
+ asm volatile(ALTERNATIVE("add %0, %0, %1",
+ "nop",
+ ARM64_HAS_VIRT_HOST_EXTN)
+ : "+r" (v) : "r" (offset));
+ return v;
+}
+
+#define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v)))

/**
* hyp_alternate_select - Generates patchable code sequences that are
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:02:06 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Running the kernel in HYP mode requires the HCR_E2H bit to be set
at all times, and the HCR_TGE bit to be set when running as a host
(and cleared when running as a guest). At the same time, the vector
must be set to the current role of the kernel (either host or
hypervisor), and a couple of system registers differ between VHE
and non-VHE.

We implement these by using another set of alternate functions
that get dynamically patched.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_emulate.h | 3 +++
arch/arm64/kvm/hyp/switch.c | 52 +++++++++++++++++++++++++++++++++---
3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 738a95f..73d3826 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>

/* Hyp Configuration Register (HCR) bits */
+#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
#define HCR_CD (UL(1) << 32)
#define HCR_RW_SHIFT 31
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3066328..5ae0c69 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -29,6 +29,7 @@
#include <asm/kvm_mmio.h>
#include <asm/ptrace.h>
#include <asm/cputype.h>
+#include <asm/virt.h>

unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
@@ -43,6 +44,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
+ if (is_kernel_in_hyp_mode())
+ vcpu->arch.hcr_el2 |= HCR_E2H;
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6f264dc..77f7c94 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -15,6 +15,8 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

+#include <asm/kvm_asm.h>
+
#include "hyp.h"

static bool __hyp_text __fpsimd_enabled_nvhe(void)
@@ -36,6 +38,27 @@ bool __hyp_text __fpsimd_enabled(void)
return __fpsimd_is_enabled()();
}

+static void __hyp_text __activate_traps_vhe(void)
+{
+ u64 val;
+
+ val = read_sysreg(cpacr_el1);
+ val |= 1 << 28;
+ val &= ~(3 << 20);
+ write_sysreg(val, cpacr_el1);
+
+ write_sysreg(__kvm_hyp_vector, vbar_el1);
+}
+
+static void __hyp_text __activate_traps_nvhe(void)
+{
+ write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
+}
+
+static hyp_alternate_select(__activate_traps_arch,
+ __activate_traps_nvhe, __activate_traps_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
@@ -55,16 +78,39 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
- write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+ __activate_traps_arch()();
}

-static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
+static void __hyp_text __deactivate_traps_vhe(void)
+{
+ extern char vectors[]; /* kernel exception vectors */
+ u64 val;
+
+ write_sysreg(HCR_RW | HCR_TGE | HCR_E2H, hcr_el2);
+
+ val = read_sysreg(cpacr_el1);
+ val |= 3 << 20;
+ write_sysreg(val, cpacr_el1);
+
+ write_sysreg(vectors, vbar_el1);
+}
+
+static void __hyp_text __deactivate_traps_nvhe(void)
{
write_sysreg(HCR_RW, hcr_el2);
+ write_sysreg(0, cptr_el2);
+}
+
+static hyp_alternate_select(__deactivate_traps_arch,
+ __deactivate_traps_nvhe, __deactivate_traps_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
+{
+ __deactivate_traps_arch()();
write_sysreg(0, hstr_el2);
write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
- write_sysreg(0, cptr_el2);
}

static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:02:51 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
VHE brings its own bag of new system registers, or rather system
register accessors, as it defines new ways to access both guest
and host system registers. For example, from the host:

- The host TCR_EL2 register is accessed using the TCR_EL1 accessor
- The guest TCR_EL1 register is accessed using the TCR_EL12 accessor

Obviously, this is confusing. A way to somehow reduce the complexity
of writing code for both ARMv8 and ARMv8.1 is to use a set of unified
accessors that will generate the right sysreg, depending on the mode
the CPU is running in. For example:

- read_sysreg_el1(tcr) will use TCR_EL1 on ARMv8, and TCR_EL12 on
ARMv8.1 with VHE.
- read_sysreg_el2(tcr) will use TCR_EL2 on ARMv8, and TCR_EL1 on
ARMv8.1 with VHE.

We end up with three sets of accessors ({read,write}_sysreg_el[012])
that can be directly used from C code. We take this opportunity to
also add the definitions for the new VHE sysregs.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/hyp.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)

diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index fc502f3..744c919 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -48,6 +48,78 @@ static inline unsigned long __hyp_kern_va(unsigned long v)

#define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v)))

+#define read_sysreg_elx(r,nvh,vh) \
+ ({ \
+ u64 reg; \
+ asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
+ "mrs_s %0, " __stringify(r##vh),\
+ ARM64_HAS_VIRT_HOST_EXTN) \
+ : "=r" (reg)); \
+ reg; \
+ })
+
+#define write_sysreg_elx(v,r,nvh,vh) \
+ do { \
+ u64 __val = (u64)(v); \
+ asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
+ "msr_s " __stringify(r##vh) ", %x0",\
+ ARM64_HAS_VIRT_HOST_EXTN) \
+ : : "rZ" (__val)); \
+ } while (0)
+
+/*
+ * Unified accessors for registers that have a different encoding
+ * between VHE and non-VHE. They must be specified without their "ELx"
+ * encoding.
+ */
+#define read_sysreg_el2(r) \
+ ({ \
+ u64 reg; \
+ asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\
+ "mrs %0, " __stringify(r##_EL1),\
+ ARM64_HAS_VIRT_HOST_EXTN) \
+ : "=r" (reg)); \
+ reg; \
+ })
+
+#define write_sysreg_el2(v,r) \
+ do { \
+ u64 __val = (u64)(v); \
+ asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\
+ "msr " __stringify(r##_EL1) ", %x0",\
+ ARM64_HAS_VIRT_HOST_EXTN) \
+ : : "rZ" (__val)); \
+ } while (0)
+
+#define read_sysreg_el0(r) read_sysreg_elx(r, _EL0, _EL02)
+#define write_sysreg_el0(v,r) write_sysreg_elx(v, r, _EL0, _EL02)
+#define read_sysreg_el1(r) read_sysreg_elx(r, _EL1, _EL12)
+#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
+
+/* The VHE specific system registers and their encoding */
+#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
+#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
+#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
+#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
+#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
+#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
+#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
+#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
+#define far_EL12 sys_reg(3, 5, 6, 0, 0)
+#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
+#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
+#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
+#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
+#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
+#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
+#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
+#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
+#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
+#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
+#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
+#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
+#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
+
/**
* hyp_alternate_select - Generates patchable code sequences that are
* used to switch between two implementations of a function, depending
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:02:54 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
As non-VHE and VHE have different ways to express the trapping of
FPSIMD registers to EL2, make __fpsimd_enabled a patchable predicate
and provide a VHE implementation.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/hyp.h | 5 +----
arch/arm64/kvm/hyp/switch.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index 5dfa883..44eaff7 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -171,10 +171,7 @@ void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);

void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
-static inline bool __fpsimd_enabled(void)
-{
- return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
+bool __fpsimd_enabled(void);

u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
void __noreturn __hyp_do_panic(unsigned long, ...);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9071dee..6f264dc 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -17,6 +17,25 @@

#include "hyp.h"

+static bool __hyp_text __fpsimd_enabled_nvhe(void)
+{
+ return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
+}
+
+static bool __hyp_text __fpsimd_enabled_vhe(void)
+{
+ return !!(read_sysreg(cpacr_el1) & (3 << 20));
+}
+
+static hyp_alternate_select(__fpsimd_is_enabled,
+ __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+bool __hyp_text __fpsimd_enabled(void)
+{
+ return __fpsimd_is_enabled()();
+}
+
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:05:04 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Use the recently introduced unified system register accessors for
those sysregs that behave differently depending on VHE being in
use or not.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 84 +++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 61bad17..7d7d757 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -37,34 +37,34 @@ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
- ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
- ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
+ ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
+ ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
}

static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
- ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1);
- ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1);
- ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1);
- ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1);
- ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
- ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
- ctxt->sys_regs[AFSR0_EL1] = read_sysreg(afsr0_el1);
- ctxt->sys_regs[AFSR1_EL1] = read_sysreg(afsr1_el1);
- ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
- ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1);
- ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1);
- ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1);
- ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1);
- ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
+ ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
+ ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
+ ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
+ ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
+ ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
+ ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr);
+ ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
+ ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
+ ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far);
+ ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
+ ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
+ ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
+ ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
+ ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);

ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
- ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1);
- ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
+ ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
+ ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
}

void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
@@ -86,34 +86,34 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
- write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
- write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
+ write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
+ write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
}

static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
- write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
- write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1);
- write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1);
- write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1);
- write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1);
- write_sysreg(ctxt->sys_regs[TCR_EL1], tcr_el1);
- write_sysreg(ctxt->sys_regs[ESR_EL1], esr_el1);
- write_sysreg(ctxt->sys_regs[AFSR0_EL1], afsr0_el1);
- write_sysreg(ctxt->sys_regs[AFSR1_EL1], afsr1_el1);
- write_sysreg(ctxt->sys_regs[FAR_EL1], far_el1);
- write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1);
- write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1);
- write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1);
- write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1);
- write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1);
- write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
- write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
-
- write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
- write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
- write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
+ write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
+ write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
+ write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
+ write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
+ write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
+ write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
+ write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr);
+ write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr);
+ write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0);
+ write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1);
+ write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far);
+ write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair);
+ write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar);
+ write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
+ write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
+ write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl);
+ write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
+ write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
+
+ write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
+ write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
+ write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
}

void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:06:50 AM1/25/16
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With the kernel running at EL2, there is no point trying to
configure page tables for HYP, as the kernel is already mapped.

Take this opportunity to refactor the whole init a bit, allowing
the various parts of the hypervisor bringup to be split across
multiple functions.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm/kvm/arm.c | 151 +++++++++++++++++++++++++++++++++--------------------
arch/arm/kvm/mmu.c | 7 +++
2 files changed, 100 insertions(+), 58 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dda1959..66e2d04 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1035,6 +1035,67 @@ static inline void hyp_cpu_pm_init(void)
}
#endif

+static void teardown_common_resources(void)
+{
+ free_percpu(kvm_host_cpu_state);
+}
+
+static int init_common_resources(void)
+{
+ kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
+ if (!kvm_host_cpu_state) {
+ kvm_err("Cannot allocate host CPU state\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int init_subsystems(void)
+{
+ int err;
+
+ /*
+ * Init HYP view of VGIC
+ */
+ err = kvm_vgic_hyp_init();
+ switch (err) {
+ case 0:
+ vgic_present = true;
+ break;
+ case -ENODEV:
+ case -ENXIO:
+ vgic_present = false;
+ break;
+ default:
+ return err;
+ }
+
+ /*
+ * Init HYP architected timer support
+ */
+ err = kvm_timer_hyp_init();
+ if (err)
+ return err;
+
+ kvm_perf_init();
+ kvm_coproc_table_init();
+
+ return 0;
+}
+
+static void teardown_hyp_mode(void)
+{
+ int cpu;
+
+ if (is_kernel_in_hyp_mode())
+ return;
+
+ free_hyp_pgds();
+ for_each_possible_cpu(cpu)
+ free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+}
+
/**
* Inits Hyp-mode on all online CPUs
*/
@@ -1043,6 +1104,9 @@ static int init_hyp_mode(void)
int cpu;
int err = 0;

+ if (is_kernel_in_hyp_mode())
+ return 0;
+
/*
* Allocate Hyp PGD and setup Hyp identity mapping
*/
@@ -1065,7 +1129,7 @@ static int init_hyp_mode(void)
stack_page = __get_free_page(GFP_KERNEL);
if (!stack_page) {
err = -ENOMEM;
- goto out_free_stack_pages;
+ goto out_err;
}

per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
@@ -1077,13 +1141,13 @@ static int init_hyp_mode(void)
err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
if (err) {
kvm_err("Cannot map world-switch code\n");
- goto out_free_mappings;
+ goto out_err;
}

err = create_hyp_mappings(__start_rodata, __end_rodata);
if (err) {
kvm_err("Cannot map rodata section\n");
- goto out_free_mappings;
+ goto out_err;
}

/*
@@ -1095,20 +1159,10 @@ static int init_hyp_mode(void)

if (err) {
kvm_err("Cannot map hyp stack\n");
- goto out_free_mappings;
+ goto out_err;
}
}

- /*
- * Map the host CPU structures
- */
- kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
- if (!kvm_host_cpu_state) {
- err = -ENOMEM;
- kvm_err("Cannot allocate host CPU state\n");
- goto out_free_mappings;
- }
-
for_each_possible_cpu(cpu) {
kvm_cpu_context_t *cpu_ctxt;

@@ -1117,7 +1171,7 @@ static int init_hyp_mode(void)

if (err) {
kvm_err("Cannot map host CPU state: %d\n", err);
- goto out_free_context;
+ goto out_err;
}
}

@@ -1126,34 +1180,22 @@ static int init_hyp_mode(void)
*/
on_each_cpu(cpu_init_hyp_mode, NULL, 1);

- /*
- * Init HYP view of VGIC
- */
- err = kvm_vgic_hyp_init();
- switch (err) {
- case 0:
- vgic_present = true;
- break;
- case -ENODEV:
- case -ENXIO:
- vgic_present = false;
- break;
- default:
- goto out_free_context;
- }
-
- /*
- * Init HYP architected timer support
- */
- err = kvm_timer_hyp_init();
- if (err)
- goto out_free_context;
-
#ifndef CONFIG_HOTPLUG_CPU
free_boot_hyp_pgd();
#endif

- kvm_perf_init();
+ cpu_notifier_register_begin();
+
+ err = __register_cpu_notifier(&hyp_init_cpu_nb);
+
+ cpu_notifier_register_done();
+
+ if (err) {
+ kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
+ goto out_err;
+ }
+
+ hyp_cpu_pm_init();

/* set size of VMID supported by CPU */
kvm_vmid_bits = kvm_get_vmid_bits();
@@ -1162,14 +1204,9 @@ static int init_hyp_mode(void)
kvm_info("Hyp mode initialized successfully\n");

return 0;
-out_free_context:
- free_percpu(kvm_host_cpu_state);
-out_free_mappings:
- free_hyp_pgds();
-out_free_stack_pages:
- for_each_possible_cpu(cpu)
- free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+
out_err:
+ teardown_hyp_mode();
kvm_err("error initializing Hyp mode: %d\n", err);
return err;
}
@@ -1213,26 +1250,24 @@ int kvm_arch_init(void *opaque)
}
}

- cpu_notifier_register_begin();
+ err = init_common_resources();
+ if (err)
+ return err;

err = init_hyp_mode();
if (err)
goto out_err;

- err = __register_cpu_notifier(&hyp_init_cpu_nb);
- if (err) {
- kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
- goto out_err;
- }
-
- cpu_notifier_register_done();
-
- hyp_cpu_pm_init();
+ err = init_subsystems();
+ if (err)
+ goto out_hyp;

- kvm_coproc_table_init();
return 0;
+
+out_hyp:
+ teardown_hyp_mode();
out_err:
- cpu_notifier_register_done();
+ teardown_common_resources();
return err;
}

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index aba61fd..920d0c3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -28,6 +28,7 @@
#include <asm/kvm_mmio.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
+#include <asm/virt.h>

#include "trace.h"

@@ -598,6 +599,9 @@ int create_hyp_mappings(void *from, void *to)
unsigned long start = KERN_TO_HYP((unsigned long)from);
unsigned long end = KERN_TO_HYP((unsigned long)to);

+ if (is_kernel_in_hyp_mode())
+ return 0;
+
start = start & PAGE_MASK;
end = PAGE_ALIGN(end);

@@ -630,6 +634,9 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
unsigned long start = KERN_TO_HYP((unsigned long)from);
unsigned long end = KERN_TO_HYP((unsigned long)to);

+ if (is_kernel_in_hyp_mode())
+ return 0;
+
/* Check for a valid kernel IO mapping */
if (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1))
return -EINVAL;
--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:07:09 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Add a new ARM64_HAS_VIRT_HOST_EXTN feature to indicate that the
CPU has the ARMv8.1 VHE capability.

This will be used to trigger kernel patching in KVM.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/cpufeature.h | 3 ++-
arch/arm64/kernel/cpufeature.c | 15 +++++++++++++--
2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8f271b8..c705d6a 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -30,8 +30,9 @@
#define ARM64_HAS_LSE_ATOMICS 5
#define ARM64_WORKAROUND_CAVIUM_23154 6
#define ARM64_WORKAROUND_834220 7
+#define ARM64_HAS_VIRT_HOST_EXTN 8

-#define ARM64_NCAPS 8
+#define ARM64_NCAPS 9

#ifndef __ASSEMBLY__

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5c90aa4..8d3961e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -26,6 +26,9 @@
#include <asm/cpu_ops.h>
#include <asm/processor.h>
#include <asm/sysreg.h>
+#include <asm/virt.h>
+
+#include <linux/irqchip/arm-gic-v3.h>

unsigned long elf_hwcap __read_mostly;
EXPORT_SYMBOL_GPL(elf_hwcap);
@@ -587,8 +590,6 @@ u64 read_system_reg(u32 id)
return regp->sys_val;
}

-#include <linux/irqchip/arm-gic-v3.h>
-
static bool
feature_matches(u64 reg, const struct arm64_cpu_capabilities *entry)
{
@@ -621,6 +622,11 @@ static bool has_useable_gicv3_cpuif(const struct arm64_cpu_capabilities *entry)
return has_sre;
}

+static bool runs_at_el2(const struct arm64_cpu_capabilities *entry)
+{
+ return is_kernel_in_hyp_mode();
+}
+
static const struct arm64_cpu_capabilities arm64_features[] = {
{
.desc = "GIC system register CPU interface",
@@ -651,6 +657,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.min_field_value = 2,
},
#endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+ {
+ .desc = "Virtualization Host Extensions",
+ .capability = ARM64_HAS_VIRT_HOST_EXTN,
+ .matches = runs_at_el2,
+ },
{},
};

--
2.1.4

Marc Zyngier

Jan 25, 2016, 11:08:19 AM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With the ARMv8.1 VHE, the kernel can run in HYP mode, and thus
use the HYP timer instead of the normal guest timer in a mostly
transparent way, except for the interrupt line.

This patch reworks the arch timer code to allow the selection of
the HYP PPI, possibly falling back to the guest timer if not
available.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
drivers/clocksource/arm_arch_timer.c | 96 ++++++++++++++++++++++--------------
1 file changed, 59 insertions(+), 37 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index c64d543..ffe9d1c 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -67,7 +67,7 @@ static int arch_timer_ppi[MAX_TIMER_PPI];

static struct clock_event_device __percpu *arch_timer_evt;

-static bool arch_timer_use_virtual = true;
+static enum ppi_nr arch_timer_uses_ppi = VIRT_PPI;
static bool arch_timer_c3stop;
static bool arch_timer_mem_use_virtual;

@@ -263,14 +263,20 @@ static void __arch_timer_setup(unsigned type,
clk->name = "arch_sys_timer";
clk->rating = 450;
clk->cpumask = cpumask_of(smp_processor_id());
- if (arch_timer_use_virtual) {
- clk->irq = arch_timer_ppi[VIRT_PPI];
+ clk->irq = arch_timer_ppi[arch_timer_uses_ppi];
+ switch (arch_timer_uses_ppi) {
+ case VIRT_PPI:
clk->set_state_shutdown = arch_timer_shutdown_virt;
clk->set_next_event = arch_timer_set_next_event_virt;
- } else {
- clk->irq = arch_timer_ppi[PHYS_SECURE_PPI];
+ break;
+ case PHYS_SECURE_PPI:
+ case PHYS_NONSECURE_PPI:
+ case HYP_PPI:
clk->set_state_shutdown = arch_timer_shutdown_phys;
clk->set_next_event = arch_timer_set_next_event_phys;
+ break;
+ default:
+ BUG();
}
} else {
clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
@@ -338,17 +344,20 @@ static void arch_counter_set_user_access(void)
arch_timer_set_cntkctl(cntkctl);
}

+static bool arch_timer_has_nonsecure_ppi(void)
+{
+ return (arch_timer_uses_ppi == PHYS_SECURE_PPI &&
+ arch_timer_ppi[PHYS_NONSECURE_PPI]);
+}
+
static int arch_timer_setup(struct clock_event_device *clk)
{
__arch_timer_setup(ARCH_CP15_TIMER, clk);

- if (arch_timer_use_virtual)
- enable_percpu_irq(arch_timer_ppi[VIRT_PPI], 0);
- else {
- enable_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI], 0);
- if (arch_timer_ppi[PHYS_NONSECURE_PPI])
- enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0);
- }
+ enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], 0);
+
+ if (arch_timer_has_nonsecure_ppi())
+ enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0);

arch_counter_set_user_access();
if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM))
@@ -390,7 +399,7 @@ static void arch_timer_banner(unsigned type)
(unsigned long)arch_timer_rate / 1000000,
(unsigned long)(arch_timer_rate / 10000) % 100,
type & ARCH_CP15_TIMER ?
- arch_timer_use_virtual ? "virt" : "phys" :
+ (arch_timer_uses_ppi == VIRT_PPI) ? "virt" : "phys" :
"",
type == (ARCH_CP15_TIMER | ARCH_MEM_TIMER) ? "/" : "",
type & ARCH_MEM_TIMER ?
@@ -460,7 +469,7 @@ static void __init arch_counter_register(unsigned type)

/* Register the CP15 based counter if we have one */
if (type & ARCH_CP15_TIMER) {
- if (IS_ENABLED(CONFIG_ARM64) || arch_timer_use_virtual)
+ if (IS_ENABLED(CONFIG_ARM64) || arch_timer_uses_ppi == VIRT_PPI)
arch_timer_read_counter = arch_counter_get_cntvct;
else
arch_timer_read_counter = arch_counter_get_cntpct;
@@ -490,13 +499,9 @@ static void arch_timer_stop(struct clock_event_device *clk)
pr_debug("arch_timer_teardown disable IRQ%d cpu #%d\n",
clk->irq, smp_processor_id());

- if (arch_timer_use_virtual)
- disable_percpu_irq(arch_timer_ppi[VIRT_PPI]);
- else {
- disable_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI]);
- if (arch_timer_ppi[PHYS_NONSECURE_PPI])
- disable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI]);
- }
+ disable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi]);
+ if (arch_timer_has_nonsecure_ppi())
+ disable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI]);

clk->set_state_shutdown(clk);
}
@@ -562,12 +567,14 @@ static int __init arch_timer_register(void)
goto out;
}

- if (arch_timer_use_virtual) {
- ppi = arch_timer_ppi[VIRT_PPI];
+ ppi = arch_timer_ppi[arch_timer_uses_ppi];
+ switch (arch_timer_uses_ppi) {
+ case VIRT_PPI:
err = request_percpu_irq(ppi, arch_timer_handler_virt,
"arch_timer", arch_timer_evt);
- } else {
- ppi = arch_timer_ppi[PHYS_SECURE_PPI];
+ break;
+ case PHYS_SECURE_PPI:
+ case PHYS_NONSECURE_PPI:
err = request_percpu_irq(ppi, arch_timer_handler_phys,
"arch_timer", arch_timer_evt);
if (!err && arch_timer_ppi[PHYS_NONSECURE_PPI]) {
@@ -578,6 +585,13 @@ static int __init arch_timer_register(void)
free_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI],
arch_timer_evt);
}
+ break;
+ case HYP_PPI:
+ err = request_percpu_irq(ppi, arch_timer_handler_phys,
+ "arch_timer", arch_timer_evt);
+ break;
+ default:
+ BUG();
}

if (err) {
@@ -602,15 +616,10 @@ static int __init arch_timer_register(void)
out_unreg_notify:
unregister_cpu_notifier(&arch_timer_cpu_nb);
out_free_irq:
- if (arch_timer_use_virtual)
- free_percpu_irq(arch_timer_ppi[VIRT_PPI], arch_timer_evt);
- else {
- free_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI],
+ free_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], arch_timer_evt);
+ if (arch_timer_has_nonsecure_ppi())
+ free_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI],
arch_timer_evt);
- if (arch_timer_ppi[PHYS_NONSECURE_PPI])
- free_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI],
- arch_timer_evt);
- }

out_free:
free_percpu(arch_timer_evt);
@@ -697,12 +706,25 @@ static void __init arch_timer_init(void)
*
* If no interrupt provided for virtual timer, we'll have to
* stick to the physical timer. It'd better be accessible...
+ *
+ * On ARMv8.1 with VH extensions, the kernel runs in HYP. VHE
+ * accesses to CNTP_*_EL1 registers are silently redirected to
+ * their CNTHP_*_EL2 counterparts, and use a different PPI
+ * number.
*/
if (is_hyp_mode_available() || !arch_timer_ppi[VIRT_PPI]) {
- arch_timer_use_virtual = false;
+ bool has_ppi;
+
+ if (is_kernel_in_hyp_mode()) {
+ arch_timer_uses_ppi = HYP_PPI;
+ has_ppi = !!arch_timer_ppi[HYP_PPI];
+ } else {
+ arch_timer_uses_ppi = PHYS_SECURE_PPI;
+ has_ppi = (!!arch_timer_ppi[PHYS_SECURE_PPI] ||
+ !!arch_timer_ppi[PHYS_NONSECURE_PPI]);
+ }

- if (!arch_timer_ppi[PHYS_SECURE_PPI] ||
- !arch_timer_ppi[PHYS_NONSECURE_PPI]) {
+ if (!has_ppi) {
pr_warn("arch_timer: No interrupt available, giving up\n");
return;
}
@@ -735,7 +757,7 @@ static void __init arch_timer_of_init(struct device_node *np)
*/
if (IS_ENABLED(CONFIG_ARM) &&
of_property_read_bool(np, "arm,cpu-registers-not-fw-configured"))
- arch_timer_use_virtual = false;
+ arch_timer_uses_ppi = PHYS_SECURE_PPI;

arch_timer_init();
}
--
2.1.4

Arnd Bergmann

Jan 25, 2016, 11:17:00 AM
to linux-ar...@lists.infradead.org, Marc Zyngier, Catalin Marinas, Will Deacon, Christoffer Dall, kvm...@lists.cs.columbia.edu, linux-...@vger.kernel.org, k...@vger.kernel.org
On Monday 25 January 2016 15:53:34 Marc Zyngier wrote:
> host and guest, reducing the overhead of virtualization.
>
> In order to have the same kernel binary running on all versions of the
> architecture, this series makes heavy use of runtime code patching.
>
> The first 20 patches massage the KVM code to deal with VHE and enable
> Linux to run at EL2. The last patch catches an ugly case when VHE
> capable CPUs are paired with some of their less capable siblings. This
> should never happen, but hey...
>
> I have deliberately left out some of the more "advanced"
> optimizations, as they are likely to distract the reviewer from the
> core infrastructure, which is what I care about at the moment.

One question: as you mention that you use a lot of runtime code patching
to make this work transparently, how does this compare to runtime patching
the existing kernel to run in EL2 mode without VHE? Is that even possible?

My interpretation so far has always been "that's too complicated to
do because it would require a lot of runtime patching", but now we seem
to get that anyway because we want to run a hypervisor-enabled kernel in
either EL1 or EL2 depending on the presence of another feature.

Arnd

Marc Zyngier

Jan 25, 2016, 11:23:47 AM
to Arnd Bergmann, linux-ar...@lists.infradead.org, Catalin Marinas, Will Deacon, Christoffer Dall, kvm...@lists.cs.columbia.edu, linux-...@vger.kernel.org, k...@vger.kernel.org
On 25/01/16 16:15, Arnd Bergmann wrote:
> On Monday 25 January 2016 15:53:34 Marc Zyngier wrote:
>> host and guest, reducing the overhead of virtualization.
>>
>> In order to have the same kernel binary running on all versions of the
>> architecture, this series makes heavy use of runtime code patching.
>>
>> The first 20 patches massage the KVM code to deal with VHE and enable
>> Linux to run at EL2. The last patch catches an ugly case when VHE
>> capable CPUs are paired with some of their less capable siblings. This
>> should never happen, but hey...
>>
>> I have deliberately left out some of the more "advanced"
>> optimizations, as they are likely to distract the reviewer from the
>> core infrastructure, which is what I care about at the moment.
>
> One question: as you mention that you use a lot of runtime code patching
> to make this work transparently, how does this compare to runtime patching
> the existing kernel to run in EL2 mode without VHE? Is that even possible?

I haven't explored that particular avenue - by the look of it, this
would require a lot more work, as v8.0 EL2 lacks a number of features
that Linux currently requires (like having two TTBRs, for example).

> My interpretation so far has always been "that's too complicated to
> do because it would require a lot of runtime patching", but now we seem
> to get that anyway because we want to run a hypervisor-enabled kernel in
> either EL1 or EL2 depending on the presence of another feature.

The kernel itself is mostly untouched (what runs at EL1 also runs at EL2
without any patching, because the new EL2 is now a superset of EL1). It
is the hypervisor code that gets a beating with the code-patching stick.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

Will Deacon

Jan 25, 2016, 11:26:58 AM
to Marc Zyngier, Catalin Marinas, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:34PM +0000, Marc Zyngier wrote:
> ARMv8.1 comes with the "Virtualization Host Extension" (VHE for
> short), which enables simpler support of Type-2 hypervisors.
>
> This extension allows the kernel to directly run at EL2, and
> significantly reduces the number of system registers shared between
> host and guest, reducing the overhead of virtualization.
>
> In order to have the same kernel binary running on all versions of the
> architecture, this series makes heavy use of runtime code patching.
>
> The first 20 patches massage the KVM code to deal with VHE and enable
> Linux to run at EL2. The last patch catches an ugly case when VHE
> capable CPUs are paired with some of their less capable siblings. This
> should never happen, but hey...
>
> I have deliberately left out some of the more "advanced"
> optimizations, as they are likely to distract the reviewer from the
> core infrastructure, which is what I care about at the moment.
>
> A few things to note:
>
> - Given that the code has been almost entierely rewritten, I've
> dropped all Acks from the new patches
>
> - GDB is currently busted on VHE systems, as it checks for version 6
> on the debug architecture, while VHE is version 7. The binutils
> people are on the case.

[...]

> arch/arm/include/asm/virt.h | 5 ++
> arch/arm/kvm/arm.c | 151 +++++++++++++++++++------------
> arch/arm/kvm/mmu.c | 7 ++
> arch/arm64/Kconfig | 13 +++
> arch/arm64/include/asm/cpufeature.h | 3 +-
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/include/asm/kvm_emulate.h | 3 +
> arch/arm64/include/asm/kvm_mmu.h | 34 ++++++-
> arch/arm64/include/asm/virt.h | 27 ++++++
> arch/arm64/kernel/asm-offsets.c | 3 -
> arch/arm64/kernel/cpufeature.c | 15 +++-
> arch/arm64/kernel/head.S | 51 ++++++++++-
> arch/arm64/kernel/smp.c | 3 +
> arch/arm64/kvm/hyp-init.S | 18 +---
> arch/arm64/kvm/hyp.S | 7 ++
> arch/arm64/kvm/hyp/entry.S | 6 ++
> arch/arm64/kvm/hyp/hyp-entry.S | 107 +++++++---------------
> arch/arm64/kvm/hyp/hyp.h | 119 ++++++++++++++++++++++--
> arch/arm64/kvm/hyp/switch.c | 170 +++++++++++++++++++++++++++++++----
> arch/arm64/kvm/hyp/sysreg-sr.c | 147 ++++++++++++++++++++----------
> arch/arm64/kvm/hyp/timer-sr.c | 10 +--
> drivers/clocksource/arm_arch_timer.c | 96 ++++++++++++--------
> 22 files changed, 724 insertions(+), 272 deletions(-)

Have you tried hw_breakpoint/perf/ptrace with these changes? I was under
the impression that the debug architecture was aware of E2H and did need
some changes made. I know you say that GDB is broken anyway, but we should
check that the kernel does the right thing if userspace pokes it the
right way.

Will

Arnd Bergmann

Jan 25, 2016, 11:27:01 AM
to Marc Zyngier, linux-ar...@lists.infradead.org, Catalin Marinas, Will Deacon, Christoffer Dall, kvm...@lists.cs.columbia.edu, linux-...@vger.kernel.org, k...@vger.kernel.org
On Monday 25 January 2016 16:23:37 Marc Zyngier wrote:
> On 25/01/16 16:15, Arnd Bergmann wrote:
> > On Monday 25 January 2016 15:53:34 Marc Zyngier wrote:
> >> host and guest, reducing the overhead of virtualization.
> >>
> >> In order to have the same kernel binary running on all versions of the
> >> architecture, this series makes heavy use of runtime code patching.
> >>
> >> The first 20 patches massage the KVM code to deal with VHE and enable
> >> Linux to run at EL2. The last patch catches an ugly case when VHE
> >> capable CPUs are paired with some of their less capable siblings. This
> >> should never happen, but hey...
> >>
> >> I have deliberately left out some of the more "advanced"
> >> optimizations, as they are likely to distract the reviewer from the
> >> core infrastructure, which is what I care about at the moment.
> >
> > One question: as you mention that you use a lot of runtime code patching
> > to make this work transparently, how does this compare to runtime patching
> > the existing kernel to run in EL2 mode without VHE? Is that even possible?
>
> I haven't explored that particular avenue - by the look of it, this
> would require a lot more work, as v8.0 EL2 lacks a number of features
> that Linux currently requires (like having two TTBRs, for example).

Ok, I see.

> > My interpretation so far has always been "that's too complicated to
> > do because it would require a lot of runtime patching", but now we seem
> > to get that anyway because we want to run a hypervisor-enabled kernel in
> > either EL1 or EL2 depending on the presence of another feature.
>
> The kernel itself is mostly untouched (what runs at EL1 also runs at EL2
> without any patching, because the new EL2 is now a superset of EL1). It
> is the hypervisor code that gets a beating with the code-patching stick.

Thanks for the explanation, makes sense.

Arnd

Marc Zyngier

Jan 25, 2016, 11:37:52 AM
to Will Deacon, Catalin Marinas, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I did use HW breakpoints on the model by hacking the host kernel to
return Debug Version 6 instead of 7, and things seem to work as
expected. strace also works out of the box.

As for perf, did you have something precise in mind?

Will Deacon

Jan 25, 2016, 11:44:47 AM
to Marc Zyngier, Catalin Marinas, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
It would be worth trying things like the filter options on perf events
(perf stat -e cycles:k to count cycles in kernel space) and also
breakpoints (perf stat -e mem:<addr>:rwx on kernel addresses).

Will

Marc Zyngier

Jan 25, 2016, 2:17:12 PM
to Will Deacon, Catalin Marinas, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
So indeed these didn't work (perf reported 0 for kernel accesses). The
fixes are pretty trivial, and I've put them on top of my kvm-arm64/vhe
branch, for those who want to have a look.

Suzuki K. Poulose

Jan 26, 2016, 9:05:13 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 25/01/16 15:53, Marc Zyngier wrote:
> With ARMv8.1 VHE, the architecture is able to (almost) transparently
> run the kernel at EL2, despite being written for EL1.
>
> This patch takes care of the "almost" part, mostly preventing the kernel
> from dropping from EL2 to EL1, and setting up the HYP configuration.

> #ifdef CONFIG_COMPAT
> @@ -521,6 +542,15 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
> /* Stage-2 translation */
> msr vttbr_el2, xzr
>
> + cbz x2, install_el2_stub

Though it is apparent, maybe it's worth adding a comment here that we don't drop to EL1?

> +
> + setup_vtcr x4, x5
> +
> + mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> + isb
> + ret

> +
> +install_el2_stub:

And a comment here mentioning that this installs the hyp stub and drops to EL1?

> /* Hypervisor stub */
> adrp x0, __hyp_stub_vectors
> add x0, x0, #:lo12:__hyp_stub_vectors
>

Cheers
Suzuki


Suzuki K. Poulose

Jan 26, 2016, 9:25:55 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 25/01/16 15:53, Marc Zyngier wrote:
> Having both VHE and non-VHE capable CPUs in the same system
> is likely to be a recipe for disaster.


> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..bc7650a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -113,6 +113,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> pr_crit("CPU%u: failed to come online\n", cpu);
> ret = -EIO;
> }
> +
> + if (is_kernel_mode_mismatched())
> + panic("CPU%u: incompatible execution level", cpu);


fyi,

I have a series which tries to perform some checks for early CPU features,
like this at [1] and adds support for early CPU boot failures, passing the error
status back to the master. Maybe we could move this check there (once it settles),
and fail the CPU boot with CPU_PANIC_KERNEL status.


[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401727.html

Thanks
Suzuki

Suzuki K. Poulose

Jan 26, 2016, 9:30:25 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 26/01/16 14:04, Suzuki K. Poulose wrote:
> On 25/01/16 15:53, Marc Zyngier wrote:
>> With ARMv8.1 VHE, the architecture is able to (almost) transparently
>> run the kernel at EL2, despite being written for EL1.
>>
>> This patch takes care of the "almost" part, mostly preventing the kernel
>> from dropping from EL2 to EL1, and setting up the HYP configuration.
>
>> #ifdef CONFIG_COMPAT
>> @@ -521,6 +542,15 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
>> /* Stage-2 translation */
>> msr vttbr_el2, xzr
>>
>> + cbz x2, install_el2_stub
>
> Though it is apparent, maybe it's worth adding a comment here that we don't drop to EL1?
>
>> +
>> + setup_vtcr x4, x5
>> +
>> + mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
>> + isb
>> + ret
>
>> +
>> +install_el2_stub:
>
> And a comment here mentioning that this installs the hyp stub and drops to EL1?

Also, the comments around the el2_setup invocation still say "Drop to EL1", which
may need to be updated.

Cheers
Suzuki


Marc Zyngier

Jan 26, 2016, 9:34:46 AM
to Suzuki K. Poulose, Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Definitely, there is room for consolidation in this area...

Christoffer Dall

Feb 1, 2016, 7:28:49 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
shouldn't this simply test the PHYS_SECURE_PPI since otherwise you could
potentially have the PHYS_NONSECURE_PPI but not PHYS_SECURE_PPI and
you'll try to request IRQ 0 for this later... ?

> + }
>
> - if (!arch_timer_ppi[PHYS_SECURE_PPI] ||
> - !arch_timer_ppi[PHYS_NONSECURE_PPI]) {
> + if (!has_ppi) {
> pr_warn("arch_timer: No interrupt available, giving up\n");
> return;
> }
> @@ -735,7 +757,7 @@ static void __init arch_timer_of_init(struct device_node *np)
> */
> if (IS_ENABLED(CONFIG_ARM) &&
> of_property_read_bool(np, "arm,cpu-registers-not-fw-configured"))
> - arch_timer_use_virtual = false;
> + arch_timer_uses_ppi = PHYS_SECURE_PPI;
>
> arch_timer_init();
> }
> --
> 2.1.4
>

Thanks,
-Christoffer

Christoffer Dall

Feb 1, 2016, 8:12:43 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:39PM +0000, Marc Zyngier wrote:
> On a VHE-capable system, there is no point in setting VTCR_EL2
> at KVM init time. We can perfectly set it up when the kernel
> boots, removing the need for a more complicated configuration.

what's the complicated configuration which is avoided?

>
> In order to allow this, turn VTCR_EL2 setup into a macro that
> we'll be able to reuse at boot time.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 23 +++++++++++++++++++++++
> arch/arm64/kvm/hyp-init.S | 18 +-----------------
> 2 files changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 7364339..d3e6d7b 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -56,6 +56,29 @@
>
> #ifdef __ASSEMBLY__
>
> +#include <asm/kvm_arm.h>
> +
> +.macro setup_vtcr tmp1, tmp2
> + mov \tmp1, #(VTCR_EL2_FLAGS & 0xffff)
> + movk \tmp1, #(VTCR_EL2_FLAGS >> 16), lsl #16
> + /*
> + * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in
> + * VTCR_EL2.
> + */
> + mrs \tmp2, id_aa64mmfr0_el1
> + bfi \tmp1, \tmp2, #16, #3
> + /*
> + * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS bit in
> + * VTCR_EL2.
> + */
> + mrs \tmp2, ID_AA64MMFR1_EL1
> + ubfx \tmp2, \tmp2, #5, #1
> + lsl \tmp2, \tmp2, #VTCR_EL2_VS
> + orr \tmp1, \tmp1, \tmp2
> +
> + msr vtcr_el2, \tmp1
> + isb
> +.endm

this feels like an awful lot of code in a header file.

Is it crazy to imagine wanting to have different T0SZ for different VMs
in the future? In that case, the T0SZ stuff should stay in KVM...

Thanks,
-Christoffer

> /*
> * Convert a kernel VA into a HYP VA.
> * reg: VA to be converted.
> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> index 3e568dc..4143e2c 100644
> --- a/arch/arm64/kvm/hyp-init.S
> +++ b/arch/arm64/kvm/hyp-init.S
> @@ -87,23 +87,7 @@ __do_hyp_init:
> #endif
> msr tcr_el2, x4
>
> - ldr x4, =VTCR_EL2_FLAGS
> - /*
> - * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in
> - * VTCR_EL2.
> - */
> - mrs x5, ID_AA64MMFR0_EL1
> - bfi x4, x5, #16, #3
> - /*
> - * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS bit in
> - * VTCR_EL2.
> - */
> - mrs x5, ID_AA64MMFR1_EL1
> - ubfx x5, x5, #5, #1
> - lsl x5, x5, #VTCR_EL2_VS
> - orr x4, x4, x5
> -
> - msr vtcr_el2, x4
> + setup_vtcr x4, x5
>
> mrs x4, mair_el1
> msr mair_el2, x4
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 8:16:18 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Remind me why this pair isn't just doing "str" instead of "stp" with the
xzr?

> + mov lr, x0
> + mov x0, x1
> + mov x1, x2
> + mov x2, x3
> + blr lr
> + ldp lr, xzr, [sp], #16
> +.endm
> +
> +ENTRY(__vhe_hyp_call)
> + do_el2_call
> + /*
> + * We used to rely on having an exception return to get
> + * an implicit isb. In the E2H case, we don't have it anymore.
> + * rather than changing all the leaf functions, just do it here
> + * before returning to the rest of the kernel.
> + */

why is this not the case with an ISB before do_el2_call then?

Christoffer Dall

Feb 1, 2016, 8:20:28 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
why do we need this casting of values instead of just defining these
inlines and calling them directly with proper typing?

-Christoffer

Marc Zyngier

Feb 1, 2016, 8:34:27 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Because SP has to be aligned on a 16-byte boundary at all times.

>
>> + mov lr, x0
>> + mov x0, x1
>> + mov x1, x2
>> + mov x2, x3
>> + blr lr
>> + ldp lr, xzr, [sp], #16
>> +.endm
>> +
>> +ENTRY(__vhe_hyp_call)
>> + do_el2_call
>> + /*
>> + * We used to rely on having an exception return to get
>> + * an implicit isb. In the E2H case, we don't have it anymore.
>> + * rather than changing all the leaf functions, just do it here
>> + * before returning to the rest of the kernel.
>> + */
>
> why is this not the case with an ISB before do_el2_call then?

That's a good point. I guess the safest thing to do would be to add one,
but looking at the various functions we call, I don't see any that could
go wrong by not having a ISB in their prologue.

Or maybe you've identified such a case?

Marc Zyngier

Feb 1, 2016, 8:39:10 AM2/1/16
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
We commonly pass both pointers and unsigned long to this helper. Do you
really want a separate helper for each type instead of one that does it all?

Marc Zyngier

Feb 1, 2016, 8:42:45 AM2/1/16
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I don't really see how you could have the non-secure PPI, but not the
secure one, as the binding doesn't give you the opportunity to do so (the
first interrupt is the secure one, then the non-secure one...).

Christoffer Dall

Feb 1, 2016, 8:47:27 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:42PM +0000, Marc Zyngier wrote:
> VHE brings its own bag of new system registers, or rather system
> register accessors, as it defines new ways to access both guest
> and host system registers. For example, from the host:
>
> - The host TCR_EL2 register is accessed using the TCR_EL1 accessor
> - The guest TCR_EL1 register is accessed using the TCR_EL12 accessor
>
> Obviously, this is confusing. A way to somehow reduce the complexity
> of writing code for both ARMv8 and ARMv8.1 is to use a set of unified
> accessors that will generate the right sysreg, depending on the mode
> the CPU is running in. For example:
>
> - read_sysreg_el1(tcr) will use TCR_EL1 on ARMv8, and TCR_EL12 on
> ARMv8.1 with VHE.
> - read_sysreg_el2(tcr) will use TCR_EL2 on ARMv8, and TCR_EL1 on
> ARMv8.1 with VHE.
>
> We end up with three sets of accessors ({read,write}_sysreg_el[012])
> that can be directly used from C code. We take this opportunity to
> also add the definition for the new VHE sysregs.(

weird closing parenthesis.
what is rZ?
(complete Google-fu failure misery)
as always, fun stuff to review.

> +#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
> +#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
> +

I couldn't quite decipher the spec as to how these are the right
instruction encodings, so I'm going to trust the testing that this is
done right.

> /**
> * hyp_alternate_select - Generates patchable code sequences that are
> * used to switch between two implementations of a function, depending
> --
> 2.1.4
>

Thanks,
-Christoffer

Christoffer Dall

Feb 1, 2016, 8:54:38 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:44PM +0000, Marc Zyngier wrote:
> A handful of system registers are still shared between EL1 and EL2,
> even while using VHE. These are tpidr*_el[01], actlr_el1, sp0, elr,
> and spsr.

So by shared registers you mean registers that have both an EL0/1
version and an EL2 version, but where accesses aren't rewritten
transparently?

also, by sp0 do you mean sp_el0, and by elr you mean elr_el1, and by
spsr you mean spsr_el1 ?


>
> In order to facilitate the introduction of a VHE-specific sysreg
> save/restore, move the access to these registers to their
> own save/restore functions.
>
> No functional change.

Otherwise:

Reviewed-by: Christoffer Dall <christof...@linaro.org>

>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kvm/hyp/sysreg-sr.c | 48 +++++++++++++++++++++++++++++-------------
> 1 file changed, 33 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index bd5b543..61bad17 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -23,13 +23,29 @@
>
> #include "hyp.h"
>
> -/* ctxt is already in the HYP VA space */
> +/*
> + * Non-VHE: Both host and guest must save everything.
> + *
> + * VHE: Host must save tpidr*_el[01], actlr_el1, sp0, pc, pstate, and
> + * guest must save everything.
> + */
> +
> +static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
> +{
> + ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
> + ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
> + ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> + ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
> + ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
> + ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
> + ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
> +}
> +
> static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> {
> ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
> ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
> ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1);
> - ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
> ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1);
> ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1);
> ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1);
> @@ -41,17 +57,11 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1);
> ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1);
> ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1);
> - ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
> - ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> - ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
> ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1);
> ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
> ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
>
> - ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
> - ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
> - ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
> ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
> ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1);
> ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
> @@ -60,11 +70,24 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
> {
> __sysreg_save_state(ctxt);
> + __sysreg_save_common_state(ctxt);
> }
>
> void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
> {
> __sysreg_save_state(ctxt);
> + __sysreg_save_common_state(ctxt);
> +}
> +
> +static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
> +{
> + write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
> + write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
> + write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
> + write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
> + write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
> + write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
> + write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
> }
>
> static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> @@ -72,7 +95,6 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
> write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
> write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1);
> - write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
> write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1);
> write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1);
> write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1);
> @@ -84,17 +106,11 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1);
> write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1);
> write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1);
> - write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
> - write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
> - write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
> write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1);
> write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1);
> write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
> write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
>
> - write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
> - write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
> - write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
> write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
> write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
> write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
> @@ -103,11 +119,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
> {
> __sysreg_restore_state(ctxt);
> + __sysreg_restore_common_state(ctxt);
> }
>
> void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
> {
> __sysreg_restore_state(ctxt);
> + __sysreg_restore_common_state(ctxt);
> }
>
> void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 8:58:52 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:45PM +0000, Marc Zyngier wrote:
> Use the recently introduced unified system register accessors for
> those sysregs that behave differently depending on VHE being in
> use or not.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>

Reviewed-by: Christoffer Dall <christof...@linaro.org>

Christoffer Dall

Feb 1, 2016, 8:58:58 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:35PM +0000, Marc Zyngier wrote:
> With ARMv8.1 VHE extension, it will be possible to run the kernel
> at EL2 (aka HYP mode). In order for the kernel to easily find out
> where it is running, add a new predicate that returns whether or
> not the kernel is in HYP mode.
>
> For completeness, the 32bit code also gets such a predicate (always
> returning false) so that code common to both architectures (timers,
> KVM) can use it transparently.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>

Acked-by: Christoffer Dall <christof...@linaro.org>

> ---
> arch/arm/include/asm/virt.h | 5 +++++
> arch/arm64/include/asm/virt.h | 10 ++++++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
> index 4371f45..b6a3cef 100644
> --- a/arch/arm/include/asm/virt.h
> +++ b/arch/arm/include/asm/virt.h
> @@ -74,6 +74,11 @@ static inline bool is_hyp_mode_mismatched(void)
> {
> return !!(__boot_cpu_mode & BOOT_CPU_MODE_MISMATCH);
> }
> +
> +static inline bool is_kernel_in_hyp_mode(void)
> +{
> + return false;
> +}
> #endif
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 7a5df52..9f22dd6 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -23,6 +23,8 @@
>
> #ifndef __ASSEMBLY__
>
> +#include <asm/ptrace.h>
> +
> /*
> * __boot_cpu_mode records what mode CPUs were booted in.
> * A correctly-implemented bootloader must start all CPUs in the same mode:
> @@ -50,6 +52,14 @@ static inline bool is_hyp_mode_mismatched(void)
> return __boot_cpu_mode[0] != __boot_cpu_mode[1];
> }
>
> +static inline bool is_kernel_in_hyp_mode(void)
> +{
> + u64 el;
> +
> + asm("mrs %0, CurrentEL" : "=r" (el));
> + return el == CurrentEL_EL2;
> +}
> +
> /* The section containing the hypervisor text */
> extern char __hyp_text_start[];
> extern char __hyp_text_end[];
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 8:59:08 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
unrelated change?

> static bool
> feature_matches(u64 reg, const struct arm64_cpu_capabilities *entry)
> {
> @@ -621,6 +622,11 @@ static bool has_useable_gicv3_cpuif(const struct arm64_cpu_capabilities *entry)
> return has_sre;
> }
>
> +static bool runs_at_el2(const struct arm64_cpu_capabilities *entry)
> +{
> + return is_kernel_in_hyp_mode();
> +}
> +
> static const struct arm64_cpu_capabilities arm64_features[] = {
> {
> .desc = "GIC system register CPU interface",
> @@ -651,6 +657,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .min_field_value = 2,
> },
> #endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
> + {
> + .desc = "Virtualization Host Extensions",
> + .capability = ARM64_HAS_VIRT_HOST_EXTN,
> + .matches = runs_at_el2,
> + },
> {},
> };
>
> --
> 2.1.4
>

Otherwise:
Acked-by: Christoffer Dall <christof...@linaro.org>

Christoffer Dall

Feb 1, 2016, 8:59:15 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:38PM +0000, Marc Zyngier wrote:
> With the kernel running at EL2, there is no point trying to
> configure page tables for HYP, as the kernel is already mapped.
>
> Take this opportunity to refactor the whole init a bit, allowing
> the various parts of the hypervisor bringup to be split across
> multiple functions.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>

Reviewed-by: Christoffer Dall <christof...@linaro.org>

> ---
> arch/arm/kvm/arm.c | 151 +++++++++++++++++++++++++++++++++--------------------
> arch/arm/kvm/mmu.c | 7 +++
> 2 files changed, 100 insertions(+), 58 deletions(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dda1959..66e2d04 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1035,6 +1035,67 @@ static inline void hyp_cpu_pm_init(void)
> }
> #endif
>
> +static void teardown_common_resources(void)
> +{
> + free_percpu(kvm_host_cpu_state);
> +}
> +
> +static int init_common_resources(void)
> +{
> + kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
> + if (!kvm_host_cpu_state) {
> + kvm_err("Cannot allocate host CPU state\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static int init_subsystems(void)
> +{
> + int err;
> +
> + /*
> + * Init HYP view of VGIC
> + */
> + err = kvm_vgic_hyp_init();
> + switch (err) {
> + case 0:
> + vgic_present = true;
> + break;
> + case -ENODEV:
> + case -ENXIO:
> + vgic_present = false;
> + break;
> + default:
> + return err;
> + }
> +
> + /*
> + * Init HYP architected timer support
> + */
> + err = kvm_timer_hyp_init();
> + if (err)
> + return err;
> +
> + kvm_perf_init();
> + kvm_coproc_table_init();
> +
> + return 0;
> +}
> +
> +static void teardown_hyp_mode(void)
> +{
> + int cpu;
> +
> + if (is_kernel_in_hyp_mode())
> + return;
> +
> + free_hyp_pgds();
> + for_each_possible_cpu(cpu)
> + free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
> +}
> +
> /**
> * Inits Hyp-mode on all online CPUs
> */
> @@ -1043,6 +1104,9 @@ static int init_hyp_mode(void)
> int cpu;
> int err = 0;
>
> + if (is_kernel_in_hyp_mode())
> + return 0;
> +
> /*
> * Allocate Hyp PGD and setup Hyp identity mapping
> */
> @@ -1065,7 +1129,7 @@ static int init_hyp_mode(void)
> stack_page = __get_free_page(GFP_KERNEL);
> if (!stack_page) {
> err = -ENOMEM;
> - goto out_free_stack_pages;
> + goto out_err;
> }
>
> per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
> @@ -1077,13 +1141,13 @@ static int init_hyp_mode(void)
> err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
> if (err) {
> kvm_err("Cannot map world-switch code\n");
> - goto out_free_mappings;
> + goto out_err;
> }
>
> err = create_hyp_mappings(__start_rodata, __end_rodata);
> if (err) {
> kvm_err("Cannot map rodata section\n");
> - goto out_free_mappings;
> + goto out_err;
> }
>
> /*
> @@ -1095,20 +1159,10 @@ static int init_hyp_mode(void)
>
> if (err) {
> kvm_err("Cannot map hyp stack\n");
> - goto out_free_mappings;
> + goto out_err;
> }
> }
>
> - /*
> - * Map the host CPU structures
> - */
> - kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
> - if (!kvm_host_cpu_state) {
> - err = -ENOMEM;
> - kvm_err("Cannot allocate host CPU state\n");
> - goto out_free_mappings;
> - }
> -
> for_each_possible_cpu(cpu) {
> kvm_cpu_context_t *cpu_ctxt;
>
> @@ -1117,7 +1171,7 @@ static int init_hyp_mode(void)
>
> if (err) {
> kvm_err("Cannot map host CPU state: %d\n", err);
> - goto out_free_context;
> + goto out_err;
> }
> }
>
> @@ -1126,34 +1180,22 @@ static int init_hyp_mode(void)
> */
> on_each_cpu(cpu_init_hyp_mode, NULL, 1);
>
> - /*
> - * Init HYP view of VGIC
> - */
> - err = kvm_vgic_hyp_init();
> - switch (err) {
> - case 0:
> - vgic_present = true;
> - break;
> - case -ENODEV:
> - case -ENXIO:
> - vgic_present = false;
> - break;
> - default:
> - goto out_free_context;
> - }
> -
> - /*
> - * Init HYP architected timer support
> - */
> - err = kvm_timer_hyp_init();
> - if (err)
> - goto out_free_context;
> -
> #ifndef CONFIG_HOTPLUG_CPU
> free_boot_hyp_pgd();
> #endif
>
> - kvm_perf_init();
> + cpu_notifier_register_begin();
> +
> + err = __register_cpu_notifier(&hyp_init_cpu_nb);
> +
> + cpu_notifier_register_done();
> +
> + if (err) {
> + kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
> + goto out_err;
> + }
> +
> + hyp_cpu_pm_init();
>
> /* set size of VMID supported by CPU */
> kvm_vmid_bits = kvm_get_vmid_bits();
> @@ -1162,14 +1204,9 @@ static int init_hyp_mode(void)
> kvm_info("Hyp mode initialized successfully\n");
>
> return 0;
> -out_free_context:
> - free_percpu(kvm_host_cpu_state);
> -out_free_mappings:
> - free_hyp_pgds();
> -out_free_stack_pages:
> - for_each_possible_cpu(cpu)
> - free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
> +
> out_err:
> + teardown_hyp_mode();
> kvm_err("error initializing Hyp mode: %d\n", err);
> return err;
> }
> @@ -1213,26 +1250,24 @@ int kvm_arch_init(void *opaque)
> }
> }
>
> - cpu_notifier_register_begin();
> + err = init_common_resources();
> + if (err)
> + return err;
>
> err = init_hyp_mode();
> if (err)
> goto out_err;
>
> - err = __register_cpu_notifier(&hyp_init_cpu_nb);
> - if (err) {
> - kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
> - goto out_err;
> - }
> -
> - cpu_notifier_register_done();
> -
> - hyp_cpu_pm_init();
> + err = init_subsystems();
> + if (err)
> + goto out_hyp;
>
> - kvm_coproc_table_init();
> return 0;
> +
> +out_hyp:
> + teardown_hyp_mode();
> out_err:
> - cpu_notifier_register_done();
> + teardown_common_resources();
> return err;
> }
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index aba61fd..920d0c3 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -28,6 +28,7 @@
> #include <asm/kvm_mmio.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> +#include <asm/virt.h>
>
> #include "trace.h"
>
> @@ -598,6 +599,9 @@ int create_hyp_mappings(void *from, void *to)
> unsigned long start = KERN_TO_HYP((unsigned long)from);
> unsigned long end = KERN_TO_HYP((unsigned long)to);
>
> + if (is_kernel_in_hyp_mode())
> + return 0;
> +
> start = start & PAGE_MASK;
> end = PAGE_ALIGN(end);
>
> @@ -630,6 +634,9 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
> unsigned long start = KERN_TO_HYP((unsigned long)from);
> unsigned long end = KERN_TO_HYP((unsigned long)to);
>
> + if (is_kernel_in_hyp_mode())
> + return 0;
> +
> /* Check for a valid kernel IO mapping */
> if (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1))
> return -EINVAL;
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 8:59:16 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:43PM +0000, Marc Zyngier wrote:
> With ARMv8, host and guest share the same system register file,
> making the save/restore procedure completely symmetrical.
> With VHE, host and guest now have different requirements, as they
> use different sysregs.
>
> In order to prepare for this, add split sysreg save/restore functions
> for both host and guest. No functional change yet.

Christoffer Dall

Feb 1, 2016, 9:01:56 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:46PM +0000, Marc Zyngier wrote:
> We're now in a position where we can introduce VHE's minimal
> save/restore, which is limited to the handful of shared sysregs.
>
> Add the required alternative function calls that result in a
> "do nothing" call on VHE, and the normal save/restore for non-VHE.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kvm/hyp/sysreg-sr.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 7d7d757..36bbdec 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -23,6 +23,9 @@
>
> #include "hyp.h"
>
> +/* Yes, this does nothing, on purpose */
> +static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
> +
> /*
> * Non-VHE: Both host and guest must save everything.
> *
> @@ -67,9 +70,13 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
> }
>
> +static hyp_alternate_select(__sysreg_call_save_state,

__sysreg_call_save_host_state for symmetry with the restore path below?

> + __sysreg_save_state, __sysreg_do_nothing,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
> {
> - __sysreg_save_state(ctxt);
> + __sysreg_call_save_state()(ctxt);
> __sysreg_save_common_state(ctxt);
> }
>
> @@ -116,9 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
> }
>
> +static hyp_alternate_select(__sysreg_call_restore_host_state,
> + __sysreg_restore_state, __sysreg_do_nothing,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
> {
> - __sysreg_restore_state(ctxt);
> + __sysreg_call_restore_host_state()(ctxt);
> __sysreg_restore_common_state(ctxt);
> }
>
> --
> 2.1.4

Marc Zyngier

Feb 1, 2016, 9:04:50 AM2/1/16
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
This gives the assembler the opportunity to generate an XZR register
access if the value is zero. See:

https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
Well, short of having publicly available documentation, or forcing
everyone to upgrade their binutils to be able to cope with the new
sysregs, I don't know what else to do. I'm open to suggestions, though.

>
>> +#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
>> +#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
>> +
>
> I couldn't quite decipher the spec as to how these are the right
> instruction encodings, so I'm going to trust the testing that this is
> done right.

If you have access to the spec, you have to play a substitution game
between the canonical encoding of the register being accessed and that
of the accessor used. For example:

SPSR_EL1 (3, 0, 4, 0, 0) -> SPSR_EL12 (3, 5, 4, 0, 0)

In practice, only Op1 changes.

Christoffer Dall

Feb 1, 2016, 9:17:14 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:47PM +0000, Marc Zyngier wrote:
> As non-VHE and VHE have different ways to express the trapping of
> FPSIMD registers to EL2, make __fpsimd_enabled a patchable predicate
> and provide a VHE implementation.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kvm/hyp/hyp.h | 5 +----
> arch/arm64/kvm/hyp/switch.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 5dfa883..44eaff7 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -171,10 +171,7 @@ void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
>
> void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
> void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
> -static inline bool __fpsimd_enabled(void)
> -{
> - return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> +bool __fpsimd_enabled(void);
>
> u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
> void __noreturn __hyp_do_panic(unsigned long, ...);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 9071dee..6f264dc 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -17,6 +17,25 @@
>
> #include "hyp.h"
>
> +static bool __hyp_text __fpsimd_enabled_nvhe(void)
> +{
> + return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> +}
> +
> +static bool __hyp_text __fpsimd_enabled_vhe(void)
> +{
> + return !!(read_sysreg(cpacr_el1) & (3 << 20));

so this access to cpacr_el1 is really rewritten by the HW to access
cptr_el2, and cptr_el2's bits [21:20] are redefined (FPEN) so that when
both are set we don't trap, meaning SIMD is enabled for the guest to
use. Right, simple, crisp, clear, and intuitive.

nit: you could add a define for the bitfield somewhere reusable?

> +}
> +
> +static hyp_alternate_select(__fpsimd_is_enabled,
> + __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> +bool __hyp_text __fpsimd_enabled(void)
> +{
> + return __fpsimd_is_enabled()();
> +}
> +
> static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> {
> u64 val;
> --
> 2.1.4
>

Otherwise,
Reviewed-by: Christoffer Dall <christof...@linaro.org>

Christoffer Dall

Feb 1, 2016, 9:20:29 AM2/1/16
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:48PM +0000, Marc Zyngier wrote:
> Running the kernel in HYP mode requires the HCR_E2H bit to be set
> at all times, and the HCR_TGE bit to be set when running as a host
> (and cleared when running as a guest). At the same time, the vector
> must be set to the current role of the kernel (either host or
> hypervisor), and a couple of system registers differ between VHE
> and non-VHE.
>
> We implement these by using another set of alternate functions
> that get dynamically patched.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/include/asm/kvm_emulate.h | 3 +++
> arch/arm64/kvm/hyp/switch.c | 52 +++++++++++++++++++++++++++++++++---
> 3 files changed, 53 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 738a95f..73d3826 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,7 @@
> #include <asm/types.h>
>
> /* Hyp Configuration Register (HCR) bits */
> +#define HCR_E2H (UL(1) << 34)
> #define HCR_ID (UL(1) << 33)
> #define HCR_CD (UL(1) << 32)
> #define HCR_RW_SHIFT 31
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 3066328..5ae0c69 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -29,6 +29,7 @@
> #include <asm/kvm_mmio.h>
> #include <asm/ptrace.h>
> #include <asm/cputype.h>
> +#include <asm/virt.h>
>
> unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
> unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
> @@ -43,6 +44,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> {
> vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
> + if (is_kernel_in_hyp_mode())
> + vcpu->arch.hcr_el2 |= HCR_E2H;
> if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
> vcpu->arch.hcr_el2 &= ~HCR_RW;
> }
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 6f264dc..77f7c94 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -15,6 +15,8 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <asm/kvm_asm.h>
> +
> #include "hyp.h"
>
> static bool __hyp_text __fpsimd_enabled_nvhe(void)
> @@ -36,6 +38,27 @@ bool __hyp_text __fpsimd_enabled(void)
> return __fpsimd_is_enabled()();
> }
>
> +static void __hyp_text __activate_traps_vhe(void)
> +{
> + u64 val;
> +
> + val = read_sysreg(cpacr_el1);
> + val |= 1 << 28;
> + val &= ~(3 << 20);

could you define these bitfields as well?

> + write_sysreg(val, cpacr_el1);
> +
> + write_sysreg(__kvm_hyp_vector, vbar_el1);
> +}
> +
> +static void __hyp_text __activate_traps_nvhe(void)
> +{
> + write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
> +}
> +
> +static hyp_alternate_select(__activate_traps_arch,
> + __activate_traps_nvhe, __activate_traps_vhe,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> {
> u64 val;
> @@ -55,16 +78,39 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> write_sysreg(val, hcr_el2);
> /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
> write_sysreg(1 << 15, hstr_el2);
> - write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
> write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
> + __activate_traps_arch()();
> }
>
> -static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
> +static void __hyp_text __deactivate_traps_vhe(void)
> +{
> + extern char vectors[]; /* kernel exception vectors */
> + u64 val;
> +
> + write_sysreg(HCR_RW | HCR_TGE | HCR_E2H, hcr_el2);

perhaps we should define the EL2_HOST_HCR bit settings somewhere
globally and reuse that here?

> +
> + val = read_sysreg(cpacr_el1);
> + val |= 3 << 20;
> + write_sysreg(val, cpacr_el1);
> +
> + write_sysreg(vectors, vbar_el1);
> +}
> +
> +static void __hyp_text __deactivate_traps_nvhe(void)
> {
> write_sysreg(HCR_RW, hcr_el2);
> + write_sysreg(0, cptr_el2);

I'm noticing here that there's actually a bunch of RES1 bits in the
cptr_el2, so perhaps we should fix this while we're at it?

> +}
> +
> +static hyp_alternate_select(__deactivate_traps_arch,
> + __deactivate_traps_nvhe, __deactivate_traps_vhe,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> +static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
> +{
> + __deactivate_traps_arch()();
> write_sysreg(0, hstr_el2);
> write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
> - write_sysreg(0, cptr_el2);
> }
>
> static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
> --
> 2.1.4
>

Otherwise looks good.

-Christoffer

Marc Zyngier

Feb 1, 2016, 9:22:09 AM2/1/16
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 01/02/16 13:13, Christoffer Dall wrote:
> On Mon, Jan 25, 2016 at 03:53:39PM +0000, Marc Zyngier wrote:
>> On a VHE-capable system, there is no point in setting VTCR_EL2
>> at KVM init time. We can perfectly set it up when the kernel
>> boots, removing the need for a more complicated configuration.
>
> what's the complicated configuration which is avoided?

With VHE, there is no hyp-init at all, so what we avoid is a weird init
sequence where we have to execute part of this hyp-init, but not all of it.
interrupt_head.S respectfully disagrees with you ;-).

> Is it crazy to imagine wanting to have different T0SZ for different VMs
> in the future? In that case, the T0SZ stuff should stay in KVM...

That's a rather compelling argument indeed. I'll see if I can turn the
thing around in a slightly nicer way. How about moving it out of
hyp-init.S altogether, and into C code?

Thanks,

Christoffer Dall

Feb 1, 2016, 9:23:01 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:49PM +0000, Marc Zyngier wrote:
> Switch the timer code to the unified sysreg accessors.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>

Reviewed-by: Christoffer Dall <christof...@linaro.org>

> ---
> arch/arm64/kvm/hyp/timer-sr.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
> index 1051e5d..f276d9e 100644
> --- a/arch/arm64/kvm/hyp/timer-sr.c
> +++ b/arch/arm64/kvm/hyp/timer-sr.c
> @@ -31,12 +31,12 @@ void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> u64 val;
>
> if (kvm->arch.timer.enabled) {
> - timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
> - timer->cntv_cval = read_sysreg(cntv_cval_el0);
> + timer->cntv_ctl = read_sysreg_el0(cntv_ctl);
> + timer->cntv_cval = read_sysreg_el0(cntv_cval);
> }
>
> /* Disable the virtual timer */
> - write_sysreg(0, cntv_ctl_el0);
> + write_sysreg_el0(0, cntv_ctl);
>
> /* Allow physical timer/counter access for the host */
> val = read_sysreg(cnthctl_el2);
> @@ -64,8 +64,8 @@ void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
>
> if (kvm->arch.timer.enabled) {
> write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
> - write_sysreg(timer->cntv_cval, cntv_cval_el0);
> + write_sysreg_el0(timer->cntv_cval, cntv_cval);
> isb();
> - write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
> + write_sysreg_el0(timer->cntv_ctl, cntv_ctl);
> }
> }
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 9:24:00 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:50PM +0000, Marc Zyngier wrote:
> Despite the fact that a VHE enabled kernel runs at EL2, it uses
> CPACR_EL1 to trap FPSIMD access. Add the required alternative
> code to re-enable guest FPSIMD access when it has trapped to
> EL2.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kvm/hyp/entry.S | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index fd0fbe9..759a0ec 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -130,9 +130,15 @@ ENDPROC(__guest_exit)
> ENTRY(__fpsimd_guest_restore)
> stp x4, lr, [sp, #-16]!
>
> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> mrs x2, cptr_el2
> bic x2, x2, #CPTR_EL2_TFP
> msr cptr_el2, x2
> +alternative_else
> + mrs x2, cpacr_el1
> + orr x2, x2, #(3 << 20)

nit: bitfield definition again

> + msr cpacr_el1, x2
> +alternative_endif
> isb
>
> mrs x3, tpidr_el2
> --
> 2.1.4
>

Reviewed-by: Christoffer Dall <christof...@linaro.org>
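The bare `(3 << 20)` that draws the "bitfield definition" nit is the CPACR_EL1.FPEN field, bits [21:20], where 0b11 disables FP/SIMD trapping. A hedged sketch of the named definition being asked for; the exact names are illustrative:

```c
#include <stdint.h>

/*
 * CPACR_EL1.FPEN lives in bits [21:20]; 0b11 means "no FP/SIMD
 * trapping at EL0/EL1". Naming the field answers the review nit;
 * the names chosen here are assumptions.
 */
#define CPACR_EL1_FPEN_SHIFT	20
#define CPACR_EL1_FPEN		(UINT64_C(3) << CPACR_EL1_FPEN_SHIFT)

/* Same effect as:  orr x2, x2, #(3 << 20) */
static inline uint64_t cpacr_enable_fpsimd(uint64_t cpacr)
{
	return cpacr | CPACR_EL1_FPEN;
}
```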

Christoffer Dall

Feb 1, 2016, 9:26:02 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:51PM +0000, Marc Zyngier wrote:
> As the kernel fully runs in HYP when VHE is enabled, we can
> directly branch to the kernel's panic() implementation, and
> not perform an exception return.
>
> Add the alternative code to deal with this.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>

Reviewed-by: Christoffer Dall <christof...@linaro.org>

> ---
> arch/arm64/kvm/hyp/switch.c | 35 +++++++++++++++++++++++++++--------
> 1 file changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 77f7c94..0cadb7f 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -211,11 +211,34 @@ __alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>
> static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>
> -void __hyp_text __noreturn __hyp_panic(void)
> +static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
> {
> unsigned long str_va = (unsigned long)__hyp_panic_string;
> - u64 spsr = read_sysreg(spsr_el2);
> - u64 elr = read_sysreg(elr_el2);
> +
> + __hyp_do_panic(hyp_kern_va(str_va),
> + spsr, elr,
> + read_sysreg(esr_el2), read_sysreg_el2(far),
> + read_sysreg(hpfar_el2), par,
> + (void *)read_sysreg(tpidr_el2));
> +}
> +
> +static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par)
> +{
> + panic(__hyp_panic_string,
> + spsr, elr,
> + read_sysreg_el2(esr), read_sysreg_el2(far),
> + read_sysreg(hpfar_el2), par,
> + (void *)read_sysreg(tpidr_el2));
> +}
> +
> +static hyp_alternate_select(__hyp_call_panic,
> + __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
> + ARM64_HAS_VIRT_HOST_EXTN);
> +
> +void __hyp_text __noreturn __hyp_panic(void)
> +{
> + u64 spsr = read_sysreg_el2(spsr);
> + u64 elr = read_sysreg_el2(elr);
> u64 par = read_sysreg(par_el1);
>
> if (read_sysreg(vttbr_el2)) {
> @@ -230,11 +253,7 @@ void __hyp_text __noreturn __hyp_panic(void)
> }
>
> /* Call panic for real */
> - __hyp_do_panic(hyp_kern_va(str_va),
> - spsr, elr,
> - read_sysreg(esr_el2), read_sysreg(far_el2),
> - read_sysreg(hpfar_el2), par,
> - (void *)read_sysreg(tpidr_el2));
> + __hyp_call_panic()(spsr, elr, par);
>
> unreachable();
> }
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 9:40:46 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:52PM +0000, Marc Zyngier wrote:
> We already have hyp_alternate_select() to define a function pointer
> that gets changed by a kernel feature or workaround.
>
> It would be useful to have a similar feature that resolves in a
> direct value, without requiring a function call. For this purpose,
> introduce hyp_alternate_value(), which returns one of two values
> depending on the state of the alternative.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kvm/hyp/hyp.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 44eaff7..dc75fdb 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -144,6 +144,17 @@ typeof(orig) * __hyp_text fname(void) \
> return val; \
> }
>
> +#define hyp_alternate_value(fname, orig, alt, cond) \
> +typeof(orig) __hyp_text fname(void) \
> +{ \
> + typeof(alt) val = orig; \
> + asm volatile(ALTERNATIVE("nop \n", \
> + "mov %0, %1 \n", \
> + cond) \
> + : "+r" (val) : "r" ((typeof(orig))alt)); \
> + return val; \
> +}
> +
> void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>
> --
> 2.1.4
>
I'm really not convinced that this is more readable than simply defining
a function where needed. Perhaps the thing that needs a definition is
the "asm volatile(ALTERNATIVE(...))" part? I also don't see why any of
this is specific to KVM or Hyp?

-Christoffer
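Stripped of the boot-time code patching, the selection semantics of hyp_alternate_value() can be modelled in plain C. This is only a behavioural sketch (the model macro and helper names are invented here); the real macro replaces a nop with a mov at patch time rather than branching:

```c
#include <stdbool.h>

/*
 * Behavioural model of hyp_alternate_value(): return 'alt' when the
 * CPU capability/workaround is present, 'orig' otherwise. In the
 * real macro the choice is made once, by patching the instruction
 * stream, so no branch is taken at runtime.
 */
#define hyp_alternate_value_model(fname, orig, alt, cond_fn)	\
static __typeof__(orig) fname(void)				\
{								\
	return (cond_fn)() ? (alt) : (orig);			\
}

/* Mirrors the __check_arm_834220 user later in the series. */
static bool cpu_has_workaround_834220(void) { return false; }

hyp_alternate_value_model(check_arm_834220, false, true,
			  cpu_has_workaround_834220)
```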

Christoffer Dall

Feb 1, 2016, 10:21:08 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:53PM +0000, Marc Zyngier wrote:
> The fault decoding process (including computing the IPA in the case
> of a permission fault) would be much better done in C code, as we
> have a reasonable infrastructure to deal with the VHE/non-VHE
> differences.
>
> Let's move the whole thing to C, including the workaround for
> erratum 834220, and just patch the odd ESR_EL2 access remaining
> in hyp-entry.S.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/kernel/asm-offsets.c | 3 --
> arch/arm64/kvm/hyp/hyp-entry.S | 69 +++--------------------------------------
> arch/arm64/kvm/hyp/switch.c | 54 ++++++++++++++++++++++++++++++++
> 3 files changed, 59 insertions(+), 67 deletions(-)
>
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index fffa4ac6..b0ab4e9 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -110,9 +110,6 @@ int main(void)
> DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
> DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
> DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
> - DEFINE(VCPU_ESR_EL2, offsetof(struct kvm_vcpu, arch.fault.esr_el2));
> - DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2));
> - DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2));
> DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
> #endif
> #ifdef CONFIG_CPU_PM
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index 9e0683f..213de52 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -19,7 +19,6 @@
>
> #include <asm/alternative.h>
> #include <asm/assembler.h>
> -#include <asm/asm-offsets.h>
> #include <asm/cpufeature.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_asm.h>
> @@ -67,7 +66,11 @@ ENDPROC(__vhe_hyp_call)
> el1_sync: // Guest trapped into EL2
> save_x0_to_x3
>
> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> mrs x1, esr_el2
> +alternative_else
> + mrs x1, esr_el1
> +alternative_endif

I suppose this is not technically part of what the patch description
says it does, but ok...

> lsr x2, x1, #ESR_ELx_EC_SHIFT
>
> cmp x2, #ESR_ELx_EC_HVC64
> @@ -103,72 +106,10 @@ el1_trap:
> cmp x2, #ESR_ELx_EC_FP_ASIMD
> b.eq __fpsimd_guest_restore
>
> - cmp x2, #ESR_ELx_EC_DABT_LOW
> - mov x0, #ESR_ELx_EC_IABT_LOW
> - ccmp x2, x0, #4, ne
> - b.ne 1f // Not an abort we care about
> -
> - /* This is an abort. Check for permission fault */
> -alternative_if_not ARM64_WORKAROUND_834220
> - and x2, x1, #ESR_ELx_FSC_TYPE
> - cmp x2, #FSC_PERM
> - b.ne 1f // Not a permission fault
> -alternative_else
> - nop // Use the permission fault path to
> - nop // check for a valid S1 translation,
> - nop // regardless of the ESR value.
> -alternative_endif
> -
> - /*
> - * Check for Stage-1 page table walk, which is guaranteed
> - * to give a valid HPFAR_EL2.
> - */
> - tbnz x1, #7, 1f // S1PTW is set
> -
> - /* Preserve PAR_EL1 */
> - mrs x3, par_el1
> - stp x3, xzr, [sp, #-16]!
> -
> - /*
> - * Permission fault, HPFAR_EL2 is invalid.
> - * Resolve the IPA the hard way using the guest VA.
> - * Stage-1 translation already validated the memory access rights.
> - * As such, we can use the EL1 translation regime, and don't have
> - * to distinguish between EL0 and EL1 access.
> - */
> - mrs x2, far_el2
> - at s1e1r, x2
> - isb
> -
> - /* Read result */
> - mrs x3, par_el1
> - ldp x0, xzr, [sp], #16 // Restore PAR_EL1 from the stack
> - msr par_el1, x0
> - tbnz x3, #0, 3f // Bail out if we failed the translation
> - ubfx x3, x3, #12, #36 // Extract IPA
> - lsl x3, x3, #4 // and present it like HPFAR
> - b 2f
> -
> -1: mrs x3, hpfar_el2
> - mrs x2, far_el2
> -
> -2: mrs x0, tpidr_el2
> - str w1, [x0, #VCPU_ESR_EL2]
> - str x2, [x0, #VCPU_FAR_EL2]
> - str x3, [x0, #VCPU_HPFAR_EL2]
> -
> + mrs x0, tpidr_el2
> mov x1, #ARM_EXCEPTION_TRAP
> b __guest_exit
>
> - /*
> - * Translation failed. Just return to the guest and
> - * let it fault again. Another CPU is probably playing
> - * behind our back.
> - */
> -3: restore_x0_to_x3
> -
> - eret
> -
> el1_irq:
> save_x0_to_x3
> mrs x0, tpidr_el2
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 0cadb7f..df2cce9 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -15,6 +15,7 @@
> * along with this program. If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <linux/types.h>
> #include <asm/kvm_asm.h>
>
> #include "hyp.h"
> @@ -150,6 +151,55 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
> __vgic_call_restore_state()(vcpu);
> }
>
> +static hyp_alternate_value(__check_arm_834220,
> + false, true,
> + ARM64_WORKAROUND_834220);
> +
> +static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
> +{
> + u64 esr = read_sysreg_el2(esr);
> + u8 ec = esr >> ESR_ELx_EC_SHIFT;
> + u64 hpfar, far;
> +
> + vcpu->arch.fault.esr_el2 = esr;
> +
> + if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
> + return true;
> +
> + far = read_sysreg_el2(far);
> +
> + if (!(esr & ESR_ELx_S1PTW) &&
> + (__check_arm_834220() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {

this is really hard to read. How do you feel about putting the below
block into its own function and changing to something like this:

	/*
	 * The HPFAR can be invalid if the stage 2 fault did not happen during a
	 * stage 1 page table walk (the ESR_EL2.S1PTW bit is clear) and one of
	 * the two following cases are true:
	 * 1. The fault was due to a permission fault
	 * 2. The processor carries errata 834220
	 *
	 * Therefore, for all non S1PTW faults where we either have a permission
	 * fault or the errata workaround is enabled, we resolve the IPA using
	 * the AT instruction.
	 */
	if (!(esr & ESR_ELx_S1PTW) &&
	    (__check_arm_834220() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
		if (!__translate_far_to_ipa(&hpfar))
			return false; /* Translation failed, back to guest */
	} else {
		hpfar = read_sysreg(hpfar_el2);
	}

not sure if it helps that much, perhaps it's just complicated by nature.

> + u64 par, tmp;
> +
> + /*
> + * Permission fault, HPFAR_EL2 is invalid. Resolve the
> + * IPA the hard way using the guest VA.
> + * Stage-1 translation already validated the memory
> + * access rights. As such, we can use the EL1
> + * translation regime, and don't have to distinguish
> + * between EL0 and EL1 access.
> + */
> + par = read_sysreg(par_el1);

in any case I think we also need the comment about preserving par_el1
here, which is only something we do because we may return early, IIUC.

> + asm volatile("at s1e1r, %0" : : "r" (far));
> + isb();
> +
> + tmp = read_sysreg(par_el1);
> + write_sysreg(par, par_el1);
> +
> + if (unlikely(tmp & 1))
> + return false; /* Translation failed, back to guest */
> +

nit: add comment /* Convert PAR to HPFAR format */

> + hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
> + } else {
> + hpfar = read_sysreg(hpfar_el2);
> + }
> +
> + vcpu->arch.fault.far_el2 = far;
> + vcpu->arch.fault.hpfar_el2 = hpfar;
> + return true;
> +}
> +
> static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *host_ctxt;
> @@ -181,9 +231,13 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
> __debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
>
> /* Jump in the fire! */
> +again:
> exit_code = __guest_enter(vcpu, host_ctxt);
> /* And we're baaack! */
>
> + if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> + goto again;
> +
> fp_enabled = __fpsimd_enabled();
>
> __sysreg_save_guest_state(guest_ctxt);
> --
> 2.1.4
>
The good news is that I couldn't find any bugs in the code.

-Christoffer
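As a standalone check of the bit manipulation in the permission-fault path: on a successful AT S1E1R, PAR_EL1 holds the output PA in bits [47:12] (low bits carry attributes, bit 0 is the failure flag), and the patch's expression re-presents those bits at bit 4 to match the HPFAR_EL2 FIPA layout. A minimal sketch:

```c
#include <stdint.h>

/*
 * Mirrors the expression in the patch:
 *     hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
 * i.e. take PA[47:12] out of PAR_EL1 and present it at bit 4,
 * which is where HPFAR_EL2 wants the IPA's page-frame bits.
 */
static uint64_t par_to_hpfar(uint64_t par)
{
	return ((par >> 12) & ((UINT64_C(1) << 36) - 1)) << 4;
}
```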

Christoffer Dall

Feb 1, 2016, 10:25:56 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:54PM +0000, Marc Zyngier wrote:
> With ARMv8.1 VHE, the architecture is able to (almost) transparently
> run the kernel at EL2, despite being written for EL1.
>
> This patch takes care of the "almost" part, mostly preventing the kernel
> from dropping from EL2 to EL1, and setting up the HYP configuration.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/Kconfig | 13 +++++++++++++
> arch/arm64/kernel/head.S | 32 +++++++++++++++++++++++++++++++-
> 2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 8cc6228..ada34df 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -750,6 +750,19 @@ config ARM64_LSE_ATOMICS
> not support these instructions and requires the kernel to be
> built with binutils >= 2.25.
>
> +config ARM64_VHE
> + bool "Enable support for Virtualization Host Extension (VHE)"

Extensions (plural)

> + default y
> + help
> + Virtualization Host Extension (VHE) allows the kernel to run

same

> + directly at EL2 (instead of EL1) on processors that support
> + it. This leads to better performance for KVM, as it reduces

s/it/them/ then?

> + the cost of the world switch.
> +
> + Selecting this option allows the VHE feature to be detected
> + at runtime, and does not affect processors that do not
> + implement this feature.
> +
> endmenu
>
> endmenu
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index ffe9c2b..2a7134c 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -30,6 +30,7 @@
> #include <asm/cache.h>
> #include <asm/cputype.h>
> #include <asm/kernel-pgtable.h>
> +#include <asm/kvm_mmu.h>
> #include <asm/memory.h>
> #include <asm/pgtable-hwdef.h>
> #include <asm/pgtable.h>
> @@ -464,8 +465,25 @@ CPU_LE( bic x0, x0, #(3 << 24) ) // Clear the EE and E0E bits for EL1
> isb
> ret
>
> +2:
> +#ifdef CONFIG_ARM64_VHE
> + /*
> + * Check for VHE being present. For the rest of the EL2 setup,
> + * x2 being non-zero indicates that we do have VHE, and that the
> + * kernel is intended to run at EL2.
> + */
> + mrs x2, id_aa64mmfr1_el1
> + ubfx x2, x2, #8, #4
> +#else
> + mov x2, xzr
> +#endif
> +
> /* Hyp configuration. */
> -2: mov x0, #(1 << 31) // 64-bit EL1
> + mov x0, #HCR_RW // 64-bit EL1
> + cbz x2, set_hcr
> + orr x0, x0, #HCR_TGE // Enable Host Extensions
> + orr x0, x0, #HCR_E2H
> +set_hcr:
> msr hcr_el2, x0
>
> /* Generic timers. */
> @@ -507,6 +525,9 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
>
> /* Coprocessor traps. */
> mov x0, #0x33ff
> + cbz x2, set_cptr
> + orr x0, x0, #(3 << 20) // Don't trap FP

nit: If you make that define we discussed earlier you can use it here too

> +set_cptr:
> msr cptr_el2, x0 // Disable copro. traps to EL2
>
> #ifdef CONFIG_COMPAT
> @@ -521,6 +542,15 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
> /* Stage-2 translation */
> msr vttbr_el2, xzr
>
> + cbz x2, install_el2_stub
> +
> + setup_vtcr x4, x5
> +
> + mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
> + isb
> + ret
> +
> +install_el2_stub:
> /* Hypervisor stub */
> adrp x0, __hyp_stub_vectors
> add x0, x0, #:lo12:__hyp_stub_vectors

Christoffer Dall

Feb 1, 2016, 10:35:50 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:55PM +0000, Marc Zyngier wrote:
> Having both VHE and non-VHE capable CPUs in the same system
> is likely to be a recipe for disaster.
>
> If the boot CPU has VHE, but a secondary is not, we won't be
> able to downgrade and run the kernel at EL1. Add CPU hotplug
> to the mix, and this produces a terrifying mess.
>
> Let's solve the problem once and for all. If you mix VHE and
> non-VHE CPUs in the same system, you deserve to lose, and this
> patch makes sure you don't get a chance.
>
> This is implemented by storing the kernel execution level in
> a global variable. Secondaries will park themselves in a
> WFI loop if they observe a mismatch. Also, the primary CPU
> will detect that the secondary CPU has died on a mismatched
> execution level. Panic will follow.
>
> Signed-off-by: Marc Zyngier <marc.z...@arm.com>
> ---
> arch/arm64/include/asm/virt.h | 17 +++++++++++++++++
> arch/arm64/kernel/head.S | 19 +++++++++++++++++++
> arch/arm64/kernel/smp.c | 3 +++
> 3 files changed, 39 insertions(+)
>
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 9f22dd6..f81a345 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -36,6 +36,11 @@
> */
> extern u32 __boot_cpu_mode[2];
>
> +/*
> + * __run_cpu_mode records the mode the boot CPU uses for the kernel.
> + */
> +extern u32 __run_cpu_mode[2];
> +
> void __hyp_set_vectors(phys_addr_t phys_vector_base);
> phys_addr_t __hyp_get_vectors(void);
>
> @@ -60,6 +65,18 @@ static inline bool is_kernel_in_hyp_mode(void)
> return el == CurrentEL_EL2;
> }
>
> +static inline bool is_kernel_mode_mismatched(void)
> +{
> + /*
> + * A mismatched CPU will have written its own CurrentEL in
> + * __run_cpu_mode[1] (initially set to zero) after failing to
> + * match the value in __run_cpu_mode[0]. Thus, a non-zero
> + * value in __run_cpu_mode[1] is enough to detect the
> + * pathological case.
> + */
> + return !!ACCESS_ONCE(__run_cpu_mode[1]);
> +}
> +
> /* The section containing the hypervisor text */
> extern char __hyp_text_start[];
> extern char __hyp_text_end[];
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 2a7134c..bc44cf8 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -577,7 +577,23 @@ ENTRY(set_cpu_boot_mode_flag)
> 1: str w20, [x1] // This CPU has booted in EL1
> dmb sy
> dc ivac, x1 // Invalidate potentially stale cache line
> + adr_l x1, __run_cpu_mode
> + ldr w0, [x1]
> + mrs x20, CurrentEL
> + cbz x0, skip_el_check
> + cmp x0, x20
> + bne mismatched_el

can't you do a ret here instead of writing the same value and flushing
caches etc.?

> +skip_el_check: // Only the first CPU gets to set the rule
> + str w20, [x1]
> + dmb sy
> + dc ivac, x1 // Invalidate potentially stale cache line
> ret
> +mismatched_el:
> + str w20, [x1, #4]
> + dmb sy
> + dc ivac, x1 // Invalidate potentially stale cache line
> +1: wfi

I'm no expert on SMP bringup, but doesn't this prevent the CPU from
signaling completion and thus you'll never actually reach the checking
code in __cpu_up?

Thanks,
-Christoffer

> + b 1b
> ENDPROC(set_cpu_boot_mode_flag)
>
> /*
> @@ -592,6 +608,9 @@ ENDPROC(set_cpu_boot_mode_flag)
> ENTRY(__boot_cpu_mode)
> .long BOOT_CPU_MODE_EL2
> .long BOOT_CPU_MODE_EL1
> +ENTRY(__run_cpu_mode)
> + .long 0
> + .long 0
> .popsection
>
> /*
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..bc7650a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -113,6 +113,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> pr_crit("CPU%u: failed to come online\n", cpu);
> ret = -EIO;
> }
> +
> + if (is_kernel_mode_mismatched())
> + panic("CPU%u: incompatible execution level", cpu);
> } else {
> pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> }
> --
> 2.1.4
>

Christoffer Dall

Feb 1, 2016, 10:36:50 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I didn't bring the DT-binding-by-heart part of my brain to work today.

You're right, thanks.

-Christoffer

Catalin Marinas

Feb 1, 2016, 10:37:02 AM
to Marc Zyngier, Christoffer Dall, k...@vger.kernel.org, Will Deacon, linux-...@vger.kernel.org, kvm...@lists.cs.columbia.edu, linux-ar...@lists.infradead.org
On Mon, Feb 01, 2016 at 01:34:16PM +0000, Marc Zyngier wrote:
> On 01/02/16 13:16, Christoffer Dall wrote:
> > On Mon, Jan 25, 2016 at 03:53:40PM +0000, Marc Zyngier wrote:
> >> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> >> index 93e8d983..9e0683f 100644
> >> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> >> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> >> @@ -38,6 +38,32 @@
> >> ldp x0, x1, [sp], #16
> >> .endm
> >>
> >> +.macro do_el2_call
> >> + /*
> >> + * Shuffle the parameters before calling the function
> >> + * pointed to in x0. Assumes parameters in x[1,2,3].
> >> + */
> >> + stp lr, xzr, [sp, #-16]!
> >
> > remind me why this pair isn't just doing "str" instead of "stp" with the
> > xzr ?
>
> Because SP has to be aligned on a 16 bytes boundary at all times.

You could do something like:

sub sp, sp, #16
str lr, [sp]

--
Catalin
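The constraint behind all these spellings of the push is that AArch64 can check SP for 16-byte alignment whenever SP is used as a base register, so the stack pointer must always move in multiples of 16. A toy model of the variants discussed in this subthread (helper names invented):

```c
#include <stdbool.h>
#include <stdint.h>

static bool sp_is_aligned(uint64_t sp)
{
	return (sp & 0xf) == 0;	/* AArch64 SP alignment requirement */
}

/* stp lr, xzr, [sp, #-16]!  -- pad with xzr, single instruction */
static uint64_t push_stp_pair(uint64_t sp)     { return sp - 16; }

/* sub sp, sp, #16 ; str lr, [sp]  -- Catalin's two-instruction variant */
static uint64_t push_sub_then_str(uint64_t sp) { return sp - 16; }

/*
 * str lr, [sp, #-8]!  -- the naive single push: leaves SP only
 * 8-byte aligned, which faults on the next SP-based access when
 * SP alignment checking is enabled.
 */
static uint64_t push_str_single(uint64_t sp)   { return sp - 8; }
```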

Christoffer Dall

Feb 1, 2016, 10:37:56 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
That sounds better to me.

Thanks,
-Christoffer

Christoffer Dall

Feb 1, 2016, 10:38:55 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Feb 01, 2016 at 01:34:16PM +0000, Marc Zyngier wrote:
right, duh.

> >
> >> + mov lr, x0
> >> + mov x0, x1
> >> + mov x1, x2
> >> + mov x2, x3
> >> + blr lr
> >> + ldp lr, xzr, [sp], #16
> >> +.endm
> >> +
> >> +ENTRY(__vhe_hyp_call)
> >> + do_el2_call
> >> + /*
> >> + * We used to rely on having an exception return to get
> >> + * an implicit isb. In the E2H case, we don't have it anymore.
> >> + * rather than changing all the leaf functions, just do it here
> >> + * before returning to the rest of the kernel.
> >> + */
> >
> > why is this not the case with an ISB before do_el2_call then?
>
> That's a good point. I guess the safest thing to do would be to add one,
> but looking at the various functions we call, I don't see any that could
> go wrong by not having a ISB in their prologue.
>
> Or maybe you've identified such a case?
>
I think I argued on Mario's VFP patches that we could rely on an ISB
before the hyp call, but they're not merged yet, so, hey...

-Christoffer

Christoffer Dall

Feb 1, 2016, 10:40:37 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
no, what I was suggesting was to always take a specific type and change
the callers that don't match to use a specific cast so you give the
compiler a chance to scream at you when writing new code.

But I don't feel strongly about it.

Thanks,
-Christoffer

Christoffer Dall

Feb 1, 2016, 10:43:15 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I don't know why I didn't find this.

So "Integer constant zero" means this may be zero. Right.
I wasn't suggesting you do anything different; I just put the stupid
comment there so you knew that I actually checked the values, and so
that I didn't forget that I did and end up doing it again if I look at
this later...

> >
> >> +#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
> >> +#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
> >> +
> >
> > I couldn't quite decipher the spec as to how these are the right
> > instruction encodings, so I'm going to trust the testing that this is
> > done right.
>
> If you have access to the spec, you have to play a substitution game
> between the canonical encoding of the register accessed, and the
> register used. For example:
>
> SPSR_EL1 (3, 0, 4, 0, 0) -> SPSR_EL12 (3, 5, 4, 0, 0)
>
> In practice, only Op1 changes.
>
That's what I assumed, thanks for confirming.

-Christoffer
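The substitution game can be checked mechanically. Assuming the sys_reg() field shifts from the v4.5-era sysreg.h (Op0 at bit 19, Op1 at 16, CRn at 12, CRm at 8, Op2 at 5), the EL1 and EL12 encodings differ only in Op1:

```c
#include <stdint.h>

/*
 * sys_reg() field packing, with the shifts hedged from the v4.5-era
 * arch/arm64/include/asm/sysreg.h.
 */
#define sys_reg(op0, op1, crn, crm, op2)			\
	(((op0) << 19) | ((op1) << 16) | ((crn) << 12) |	\
	 ((crm) << 8) | ((op2) << 5))

#define SPSR_EL1	sys_reg(3, 0, 4, 0, 0)	/* canonical encoding */
#define spsr_EL12	sys_reg(3, 5, 4, 0, 0)	/* as in the patch */

#define Op1_MASK	(0x7 << 16)
```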

Marc Zyngier

Feb 1, 2016, 11:20:52 AM
to Catalin Marinas, Christoffer Dall, k...@vger.kernel.org, Will Deacon, linux-...@vger.kernel.org, kvm...@lists.cs.columbia.edu, linux-ar...@lists.infradead.org
Ah, fair enough. I'll fold that in.

Christoffer Dall

Feb 1, 2016, 11:25:08 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Mon, Jan 25, 2016 at 03:53:34PM +0000, Marc Zyngier wrote:
> ARMv8.1 comes with the "Virtualization Host Extension" (VHE for
> short), which enables simpler support of Type-2 hypervisors.
>
> This extension allows the kernel to directly run at EL2, and
> significantly reduces the number of system registers shared between
> host and guest, reducing the overhead of virtualization.
>
> In order to have the same kernel binary running on all versions of the
> architecture, this series makes heavy use of runtime code patching.
>
> The first 20 patches massage the KVM code to deal with VHE and enable
> Linux to run at EL2. The last patch catches an ugly case when VHE
> capable CPUs are paired with some of their less capable siblings. This
> should never happen, but hey...
>
> I have deliberately left out some of the more "advanced"
> optimizations, as they are likely to distract the reviewer from the
> core infrastructure, which is what I care about at the moment.
>
> A few things to note:
>
> - Given that the code has been almost entirely rewritten, I've
> dropped all Acks from the new patches
>
> - GDB is currently busted on VHE systems, as it checks for version 6
> on the debug architecture, while VHE is version 7. The binutils
> people are on the case.
>
> This has been tested on the FVP_Base_SLV-V8-A model, and based on
> v4.5-rc1. I've put a branch out on:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/vhe
>
> * From v1:
> - Full rewrite now that the World Switch is written in C code.
> - Dropped the "early IRQ handling" for the moment.
>
> Marc Zyngier (21):
> arm/arm64: Add new is_kernel_in_hyp_mode predicate
> arm64: Allow the arch timer to use the HYP timer
> arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
> arm64: KVM: Skip HYP setup when already running in HYP
> arm64: KVM: VHE: Turn VTCR_EL2 setup into a reusable macro
> arm64: KVM: VHE: Patch out use of HVC
> arm64: KVM: VHE: Patch out kern_hyp_va
> arm64: KVM: VHE: Introduce unified system register accessors
> arm64: KVM: VHE: Differenciate host/guest sysreg save/restore
> arm64: KVM: VHE: Split save/restore of sysregs shared between EL1 and
> EL2
> arm64: KVM: VHE: Use unified system register accessors
> arm64: KVM: VHE: Enable minimal sysreg save/restore
> arm64: KVM: VHE: Make __fpsimd_enabled VHE aware
> arm64: KVM: VHE: Implement VHE activate/deactivate_traps
> arm64: KVM: VHE: Use unified sysreg accessors for timer
> arm64: KVM: VHE: Add fpsimd enabling on guest access
> arm64: KVM: VHE: Add alternative panic handling
> arm64: KVM: Introduce hyp_alternate_value helper
> arm64: KVM: Move most of the fault decoding to C
> arm64: VHE: Add support for running Linux in EL2 mode
> arm64: Panic when VHE and non VHE CPUs coexist

These patches generally look awesome!

I found some trailing white space in patch 6 and 7 that you can fix up
if you care to.

Thanks,
-Christoffer

Ard Biesheuvel

Feb 1, 2016, 12:08:39 PM
to Marc Zyngier, Catalin Marinas, KVM devel mailing list, Will Deacon, linux-...@vger.kernel.org, Christoffer Dall, kvm...@lists.cs.columbia.edu, linux-ar...@lists.infradead.org
Since we're micro-reviewing: what's wrong with

str lr, [sp, #-16]!

?

Marc Zyngier

Feb 1, 2016, 12:28:47 PM
to Ard Biesheuvel, Catalin Marinas, KVM devel mailing list, Will Deacon, linux-...@vger.kernel.org, Christoffer Dall, kvm...@lists.cs.columbia.edu, linux-ar...@lists.infradead.org
I suspect that on most micro-architectures, a register writeback is
going to be slower than doing the sub independently.

I may be wrong, though.

Marc Zyngier

Feb 2, 2016, 4:46:19 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 01/02/16 13:54, Christoffer Dall wrote:
> On Mon, Jan 25, 2016 at 03:53:44PM +0000, Marc Zyngier wrote:
>> A handful of system registers are still shared between EL1 and EL2,
>> even while using VHE. These are tpidr*_el[01], actlr_el1, sp0, elr,
>> and spsr.
>
> So by shared registers you mean registers that do both have an EL0/1
> version as well as an EL2 version, but where accesses aren't rewritten
> transparently?

No, I mean that these registers do *not* have a separate banked version.
There is only a single set of registers, which has to be saved/restored
the old way.

>
> also, by sp0 do you mean sp_el0, and by elr you mean elr_el1, and by
> spsr you mean spsr_el1 ?

sp0 -> sp_el0 indeed. elr and spsr really are the guest PC and PSTATE,
so I should really reword this commit message, it is utterly confusing.

>
>>
>> In order to facilitate the introduction of a VHE-specific sysreg
>> save/restore, make move the access to these registers to their
>> own save/restore functions.
>>
>> No functional change.
>
> Otherwise:
>
> Reviewed-by: Christoffer Dall <christof...@linaro.org>

Thanks,

Marc Zyngier

Feb 2, 2016, 6:27:20 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Sure.
I'm going for HCR_HOST_VHE_FLAGS, as a counterpart to HCR_GUEST_FLAGS.

Marc Zyngier

Feb 2, 2016, 8:42:19 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I can easily factor out the whole asm volatile part. What I'm trying to
avoid is an additional function call, but maybe we shouldn't need to
worry about the overhead on page faults altogether?

I'll drop it for now, and we can reconsider it later.

Marc Zyngier

Feb 2, 2016, 9:24:28 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
That's what the "... and just patch the odd ESR_EL2 access remaining in
hyp-entry.S." meant. Would you prefer this as a separate patch?
Not only. At that point, we still haven't saved the vcpu sysregs, so we
must save/restore them in order to save them later for good. Not the fastest
thing, but I guess that everything sucks so much when we take a page
fault that it really doesn't matter.
Right. So I've applied most of your comments directly, because they
definitely made sense. Let's see how it looks on round 3.

Thanks,

Marc Zyngier

Feb 2, 2016, 10:32:16 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Yes, good point.

>
>> +skip_el_check: // Only the first CPU gets to set the rule
>> + str w20, [x1]
>> + dmb sy
>> + dc ivac, x1 // Invalidate potentially stale cache line
>> ret
>> +mismatched_el:
>> + str w20, [x1, #4]
>> + dmb sy
>> + dc ivac, x1 // Invalidate potentially stale cache line
>> +1: wfi
>
> I'm no expert on SMP bringup, but doesn't this prevent the CPU from
> signaling completion and thus you'll never actually reach the checking
> code in __cpu_up?

Indeed, and that's the whole point. The primary CPU will notice that the
secondary CPU has failed to boot (timeout), and will find the reason in
__run_cpu_mode.

Christoffer Dall

Feb 2, 2016, 10:41:55 AM
to Marc Zyngier, Ard Biesheuvel, Catalin Marinas, KVM devel mailing list, Will Deacon, linux-...@vger.kernel.org, kvm...@lists.cs.columbia.edu, linux-ar...@lists.infradead.org
For the record, I don't mind it the way it was in the original patch
either, I was just curious about the store of xzr and had forgotten the
stack alignment thing.

-Christoffer

Christoffer Dall

Feb 2, 2016, 10:46:26 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On Tue, Feb 02, 2016 at 09:46:05AM +0000, Marc Zyngier wrote:
> On 01/02/16 13:54, Christoffer Dall wrote:
> > On Mon, Jan 25, 2016 at 03:53:44PM +0000, Marc Zyngier wrote:
> >> A handful of system registers are still shared between EL1 and EL2,
> >> even while using VHE. These are tpidr*_el[01], actlr_el1, sp0, elr,
> >> and spsr.
> >
> > So by shared registers you mean registers that do both have an EL0/1
> > version as well as an EL2 version, but where accesses aren't rewritten
> > transparently?
>
> No, I mean that these registers do *not* have a separate banked version.
> There is only a single set of registers, which have to be save/restored
> the old way.

huh, ARMv8 clearly specifies the existence of TPIDR_EL0, TPIDR_EL1, and
TPIDR_EL2, for example.

I cannot seem to find anywhere in the VHE spec that says that the
TPIDR_EL2 goes away. I'm confused now.

>
> >
> > also, by sp0 do you mean sp_el0, and by elr you mean elr_el1, and by
> > spsr you mean spsr_el1 ?
>
> sp0 -> sp_el0 indeed. elr and spsr really are the guest PC and PSTATE,
> so I should really reword this commit message, it is utterly confusing.
>
I guess I don't understand the definition of a 'shared' register given
your comments here...

Thanks,
-Christoffer

Christoffer Dall

Feb 2, 2016, 10:47:24 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Sounds good to me.

Thanks,
-Christoffer

Christoffer Dall

Feb 2, 2016, 10:50:04 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
I guess I stopped reading at ", and" for some reason, sorry.

It's fine to keep it in this patch.
Right. I was a bit confused by reading this code in the C version, but
on the other hand, this is the kind of code you shouldn't expect to
understand easily; you really have to know what you're doing here, so
perhaps I'm creating too much fuss for nothing.
ok, sounds great.

Thanks,
-Christoffer

Marc Zyngier

Feb 2, 2016, 11:19:54 AM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
On 02/02/16 15:46, Christoffer Dall wrote:
> On Tue, Feb 02, 2016 at 09:46:05AM +0000, Marc Zyngier wrote:
>> On 01/02/16 13:54, Christoffer Dall wrote:
>>> On Mon, Jan 25, 2016 at 03:53:44PM +0000, Marc Zyngier wrote:
>>>> A handful of system registers are still shared between EL1 and EL2,
>>>> even while using VHE. These are tpidr*_el[01], actlr_el1, sp0, elr,
>>>> and spsr.
>>>
>>> So by shared registers you mean registers that do both have an EL0/1
>>> version as well as an EL2 version, but where accesses aren't rewritten
>>> transparently?
>>
>> No, I mean that these registers do *not* have a separate banked version.
>> There is only a single set of registers, which have to be save/restored
>> the old way.
>
> huh, ARMv8 clearly specifies the existence of TPIDR_EL0, TPIDR_EL1, and
> TPIDR_EL2, for example.
>
> I cannot seem to find anywhere in the VHE spec that says that the
> TPIDR_EL2 goes away. I'm confused now.

Nothing goes away, but these registers do not get renamed either. For
example, TPIDR_EL1 doesn't magically access TPIDR_EL2 when running at
EL2+VHE, and there is no TPIDR_EL12 accessor either.

So TPIDR_EL1 is effectively "shared" between host and guest, and must be
save/restored (note that the host kernel still uses TPIDR_EL1 even when
running with VHE, and that KVM still uses TPIDR_EL2 to cache the current
vcpu).
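
The save/restore dance for these non-banked registers can be sketched in
plain C. The struct and helper names below are illustrative stand-ins,
not the kernel's actual kvm_cpu_context layout or functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * With VHE, a handful of registers (tpidr*_el[01], actlr_el1, sp_el0,
 * ...) are NOT banked between host and guest: there is one physical
 * copy, so the world switch must swap values by hand.
 */
struct shared_sysregs {
	uint64_t tpidr_el0;
	uint64_t tpidr_el1;
	uint64_t actlr_el1;
	uint64_t sp_el0;
};

/* The one physical copy of the shared registers. */
static struct shared_sysregs hw;

static void save_shared(struct shared_sysregs *ctxt)
{
	*ctxt = hw;
}

static void restore_shared(const struct shared_sysregs *ctxt)
{
	hw = *ctxt;
}

/* Entering the guest: stash the host values, load the guest's. */
static void enter_guest(struct shared_sysregs *host,
			const struct shared_sysregs *guest)
{
	save_shared(host);
	restore_shared(guest);
}

/* Exiting the guest: stash the guest values, reload the host's. */
static void exit_guest(struct shared_sysregs *guest,
		       const struct shared_sysregs *host)
{
	save_shared(guest);
	restore_shared(host);
}
```

The point of the thread above is precisely that, unlike the renamed
registers, these cannot be left in place across a world switch.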

>>
>>>
>>> also, by sp0 do you mean sp_el0, and by elr you mean elr_el1, and by
>>> spsr you mean spsr_el1 ?
>>
>> sp0 -> sp_el0 indeed. elr and spsr really are the guest PC and PSTATE,
>> so I should really reword this commit message, it is utterly confusing.
>>
> I guess I don't understand the definition of a 'shared' register given
> your comments here...

Does this make it clearer?

Christoffer Dall

Feb 2, 2016, 3:07:01 PM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
ok, I can understand as long as we're saying a register is shared
between the host and the guest, but it was the "registers are shared
between EL1 and EL2" that threw me off.

> >>
> >>>
> >>> also, by sp0 do you mean sp_el0, and by elr you mean elr_el1, and by
> >>> spsr you mean spsr_el1 ?
> >>
> >> sp0 -> sp_el0 indeed. elr and spsr really are the guest PC and PSTATE,
> >> so I should really reword this commit message, it is utterly confusing.
> >>
> > I guess I don't understand the definition of a 'shared' register given
> > your comments here...
>
> Does this make it clearer?
>
Yes. You could patch the host, when using VHE, to use TPIDR_EL2 if you
wanted, and store the vcpu pointer on the stack while running the guest,
but there's probably no real benefit in doing so.

I'll be shutting up now...

Thanks,
-Christoffer

Christoffer Dall

Feb 3, 2016, 3:50:05 AM
to Marc Zyngier, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
That wasn't exactly my point. If I understand correctly and __cpu_up is
the primary CPU executing a function to bring up a secondary core, then
it will wait for the cpu_running completion which should be signalled by
the secondary core, but because the secondary core never makes any
progress it will timeout the wait for completion and you will see that
error "..failed to come online" instead of the "incompatible execution
level".

(This is based on my reading of the code as the completion is signalled
in secondary_start_kernel, which happens after this stuff above in
head.S).

-Christoffer

Marc Zyngier

Feb 3, 2016, 12:45:59 PM
to Christoffer Dall, Catalin Marinas, Will Deacon, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
It will actually do both. Here's an example on the model configured for
such a braindead case:

CPU4: failed to come online
Kernel panic - not syncing: CPU4: incompatible execution level
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc2+ #5459
Hardware name: FVP Base (DT)
Call trace:
[<ffffffc0000899e0>] dump_backtrace+0x0/0x180
[<ffffffc000089b74>] show_stack+0x14/0x20
[<ffffffc000333b08>] dump_stack+0x90/0xc8
[<ffffffc00014d424>] panic+0x10c/0x250
[<ffffffc00008ef24>] __cpu_up+0xfc/0x100
[<ffffffc0000b7a9c>] _cpu_up+0x154/0x188
[<ffffffc0000b7b54>] cpu_up+0x84/0xa8
[<ffffffc0009e9d00>] smp_init+0xbc/0xc0
[<ffffffc0009dca10>] kernel_init_freeable+0x94/0x1ec
[<ffffffc000712f90>] kernel_init+0x10/0xe0
[<ffffffc000085cd0>] ret_from_fork+0x10/0x40

Am I missing something *really* obvious?

Marc Zyngier

Feb 3, 2016, 1:00:36 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
ARMv8.1 comes with the "Virtualization Host Extension" (VHE for
short), which enables simpler support of Type-2 hypervisors.

This extension allows the kernel to directly run at EL2, and
significantly reduces the number of system registers shared between
host and guest, reducing the overhead of virtualization.

In order to have the same kernel binary running on all versions of the
architecture, this series makes heavy use of runtime code patching.

The first 22 patches massage the KVM code to deal with VHE and enable
Linux to run at EL2. The last patch catches an ugly case when VHE
capable CPUs are paired with some of their less capable siblings. This
should never happen, but hey...

I have deliberately left out some of the more "advanced"
optimizations, as they are likely to distract the reviewer from the
core infrastructure, which is what I care about at the moment.

Note: GDB is currently busted on VHE systems, as it checks for version
6 on the debug architecture, while VHE is version 7. The
binutils people are on the case.

This has been tested on the FVP_Base_SLV-V8-A model, and based on
v4.5-rc2. I've put a branch out on:

git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/vhe

* From v2:
- Added support for perf to count kernel events in EL2
- Added support for EL2 breakpoints
- Moved the VTCR_EL2 setup from assembly to C
- Made the fault handling easier to understand (hopefully)
- Plenty of smaller fixups

* From v1:
- Full rewrite now that the World Switch is written in C code.
- Dropped the "early IRQ handling" for the moment.

Marc Zyngier (23):
arm/arm64: KVM: Add hook for C-based stage2 init
arm64: KVM: Switch to C-based stage2 init
arm/arm64: Add new is_kernel_in_hyp_mode predicate
arm64: Allow the arch timer to use the HYP timer
arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
arm64: KVM: Skip HYP setup when already running in HYP
arm64: KVM: VHE: Patch out use of HVC
arm64: KVM: VHE: Patch out kern_hyp_va
arm64: KVM: VHE: Introduce unified system register accessors
arm64: KVM: VHE: Differenciate host/guest sysreg save/restore
arm64: KVM: VHE: Split save/restore of registers shared between guest
and host
arm64: KVM: VHE: Use unified system register accessors
arm64: KVM: VHE: Enable minimal sysreg save/restore
arm64: KVM: VHE: Make __fpsimd_enabled VHE aware
arm64: KVM: VHE: Implement VHE activate/deactivate_traps
arm64: KVM: VHE: Use unified sysreg accessors for timer
arm64: KVM: VHE: Add fpsimd enabling on guest access
arm64: KVM: VHE: Add alternative panic handling
arm64: KVM: Move most of the fault decoding to C
arm64: perf: Count EL2 events if the kernel is running in HYP
arm64: hw_breakpoint: Allow EL2 breakpoints if running in HYP
arm64: VHE: Add support for running Linux in EL2 mode
arm64: Panic when VHE and non VHE CPUs coexist

arch/arm/include/asm/kvm_host.h | 4 +
arch/arm/include/asm/virt.h | 5 +
arch/arm/kvm/arm.c | 174 +++++++++++++++++++----------
arch/arm/kvm/mmu.c | 7 ++
arch/arm64/Kconfig | 13 +++
arch/arm64/include/asm/cpufeature.h | 3 +-
arch/arm64/include/asm/hw_breakpoint.h | 49 ++++++---
arch/arm64/include/asm/kvm_arm.h | 6 +-
arch/arm64/include/asm/kvm_asm.h | 2 +
arch/arm64/include/asm/kvm_emulate.h | 3 +
arch/arm64/include/asm/kvm_host.h | 6 +
arch/arm64/include/asm/kvm_mmu.h | 12 +-
arch/arm64/include/asm/virt.h | 27 +++++
arch/arm64/kernel/asm-offsets.c | 3 -
arch/arm64/kernel/cpufeature.c | 11 ++
arch/arm64/kernel/head.S | 48 +++++++-
arch/arm64/kernel/perf_event.c | 14 ++-
arch/arm64/kernel/smp.c | 3 +
arch/arm64/kvm/hyp-init.S | 18 ---
arch/arm64/kvm/hyp.S | 7 ++
arch/arm64/kvm/hyp/Makefile | 1 +
arch/arm64/kvm/hyp/entry.S | 6 +
arch/arm64/kvm/hyp/hyp-entry.S | 109 ++++++------------
arch/arm64/kvm/hyp/hyp.h | 108 ++++++++++++++++--
arch/arm64/kvm/hyp/s2-setup.c | 44 ++++++++
arch/arm64/kvm/hyp/switch.c | 196 ++++++++++++++++++++++++++++++---
arch/arm64/kvm/hyp/sysreg-sr.c | 147 ++++++++++++++++---------
arch/arm64/kvm/hyp/timer-sr.c | 10 +-
drivers/clocksource/arm_arch_timer.c | 96 +++++++++-------
29 files changed, 837 insertions(+), 295 deletions(-)
create mode 100644 arch/arm64/kvm/hyp/s2-setup.c

--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:00:41 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With the ARMv8.1 VHE, the kernel can run in HYP mode, and thus
use the HYP timer instead of the normal guest timer in a mostly
transparent way, except for the interrupt line.

This patch reworks the arch timer code to allow the selection of
the HYP PPI, possibly falling back to the guest timer if not
available.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
+ }

- if (!arch_timer_ppi[PHYS_SECURE_PPI] ||
- !arch_timer_ppi[PHYS_NONSECURE_PPI]) {
+ if (!has_ppi) {
pr_warn("arch_timer: No interrupt available, giving up\n");
return;
}
@@ -735,7 +757,7 @@ static void __init arch_timer_of_init(struct device_node *np)
*/
if (IS_ENABLED(CONFIG_ARM) &&
of_property_read_bool(np, "arm,cpu-registers-not-fw-configured"))
- arch_timer_use_virtual = false;
+ arch_timer_uses_ppi = PHYS_SECURE_PPI;

arch_timer_init();
}
--
2.1.4
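
The interrupt selection policy this patch introduces (prefer the HYP
timer PPI when the kernel runs at EL2, fall back to the virtual timer
otherwise) can be sketched in C. The enum mirrors the driver's PPI
slots, but the helper and the IRQ numbers are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* PPI slots, mirroring the driver's enum (numbering illustrative). */
enum ppi_nr {
	PHYS_SECURE_PPI,
	PHYS_NONSECURE_PPI,
	VIRT_PPI,
	HYP_PPI,
	MAX_TIMER_PPI
};

/*
 * Use the HYP PPI only when the kernel runs at EL2 and the firmware
 * actually described that interrupt; otherwise keep the virtual
 * (guest) timer. Hypothetical helper, not the driver's real code.
 */
static enum ppi_nr pick_timer_ppi(bool kernel_in_hyp,
				  const int irqs[MAX_TIMER_PPI])
{
	if (kernel_in_hyp && irqs[HYP_PPI])
		return HYP_PPI;
	return VIRT_PPI;
}
```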

Marc Zyngier

Feb 3, 2016, 1:00:49 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With the kernel running at EL2, there is no point trying to
configure page tables for HYP, as the kernel is already mapped.

Take this opportunity to refactor the whole init a bit, allowing
the various parts of the hypervisor bringup to be split across
multiple functions.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm/kvm/arm.c | 173 +++++++++++++++++++++++++++++++++++------------------
arch/arm/kvm/mmu.c | 7 +++
2 files changed, 121 insertions(+), 59 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6b76e01..58f89e3 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -967,6 +967,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
}

+static void cpu_init_stage2(void *dummy)
+{
+ __cpu_init_stage2();
+}
+
static void cpu_init_hyp_mode(void *dummy)
{
phys_addr_t boot_pgd_ptr;
@@ -1036,6 +1041,82 @@ static inline void hyp_cpu_pm_init(void)
}
#endif

+static void teardown_common_resources(void)
+{
+ free_percpu(kvm_host_cpu_state);
+}
+
+static int init_common_resources(void)
+{
+ kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
+ if (!kvm_host_cpu_state) {
+ kvm_err("Cannot allocate host CPU state\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int init_subsystems(void)
+{
+ int err;
+
+ /*
+ * Init HYP view of VGIC
+ */
+ err = kvm_vgic_hyp_init();
+ switch (err) {
+ case 0:
+ vgic_present = true;
+ break;
+ case -ENODEV:
+ case -ENXIO:
+ vgic_present = false;
+ break;
+ default:
+ return err;
+ }
+
+ /*
+ * Init HYP architected timer support
+ */
+ err = kvm_timer_hyp_init();
+ if (err)
+ return err;
+
+ kvm_perf_init();
+ kvm_coproc_table_init();
+
+ return 0;
+}
+
+static void teardown_hyp_mode(void)
+{
+ int cpu;
+
+ if (is_kernel_in_hyp_mode())
+ return;
+
+ free_hyp_pgds();
+ for_each_possible_cpu(cpu)
+ free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+}
+
+static int init_vhe_mode(void)
+{
+ /*
+ * Execute the init code on each CPU.
+ */
+ on_each_cpu(cpu_init_stage2, NULL, 1);
+
+ /* set size of VMID supported by CPU */
+ kvm_vmid_bits = kvm_get_vmid_bits();
+ kvm_info("%d-bit VMID\n", kvm_vmid_bits);
+
+ kvm_info("VHE mode initialized successfully\n");
+ return 0;
+}
+
/**
* Inits Hyp-mode on all online CPUs
*/
@@ -1066,7 +1147,7 @@ static int init_hyp_mode(void)
stack_page = __get_free_page(GFP_KERNEL);
if (!stack_page) {
err = -ENOMEM;
- goto out_free_stack_pages;
+ goto out_err;
}

per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
@@ -1078,13 +1159,13 @@ static int init_hyp_mode(void)
err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
if (err) {
kvm_err("Cannot map world-switch code\n");
- goto out_free_mappings;
+ goto out_err;
}

err = create_hyp_mappings(__start_rodata, __end_rodata);
if (err) {
kvm_err("Cannot map rodata section\n");
- goto out_free_mappings;
+ goto out_err;
}

/*
@@ -1096,20 +1177,10 @@ static int init_hyp_mode(void)

if (err) {
kvm_err("Cannot map hyp stack\n");
- goto out_free_mappings;
+ goto out_err;
}
}

- /*
- * Map the host CPU structures
- */
- kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t);
- if (!kvm_host_cpu_state) {
- err = -ENOMEM;
- kvm_err("Cannot allocate host CPU state\n");
- goto out_free_mappings;
- }
-
for_each_possible_cpu(cpu) {
kvm_cpu_context_t *cpu_ctxt;

@@ -1118,7 +1189,7 @@ static int init_hyp_mode(void)

if (err) {
kvm_err("Cannot map host CPU state: %d\n", err);
- goto out_free_context;
+ goto out_err;
}
}

@@ -1127,34 +1198,22 @@ static int init_hyp_mode(void)
*/
on_each_cpu(cpu_init_hyp_mode, NULL, 1);

- /*
- * Init HYP view of VGIC
- */
- err = kvm_vgic_hyp_init();
- switch (err) {
- case 0:
- vgic_present = true;
- break;
- case -ENODEV:
- case -ENXIO:
- vgic_present = false;
- break;
- default:
- goto out_free_context;
- }
-
- /*
- * Init HYP architected timer support
- */
- err = kvm_timer_hyp_init();
- if (err)
- goto out_free_context;
-
#ifndef CONFIG_HOTPLUG_CPU
free_boot_hyp_pgd();
#endif

- kvm_perf_init();
+ cpu_notifier_register_begin();
+
+ err = __register_cpu_notifier(&hyp_init_cpu_nb);
+
+ cpu_notifier_register_done();
+
+ if (err) {
+ kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
+ goto out_err;
+ }
+
+ hyp_cpu_pm_init();

/* set size of VMID supported by CPU */
kvm_vmid_bits = kvm_get_vmid_bits();
@@ -1163,14 +1222,9 @@ static int init_hyp_mode(void)
kvm_info("Hyp mode initialized successfully\n");

return 0;
-out_free_context:
- free_percpu(kvm_host_cpu_state);
-out_free_mappings:
- free_hyp_pgds();
-out_free_stack_pages:
- for_each_possible_cpu(cpu)
- free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
+
out_err:
+ teardown_hyp_mode();
kvm_err("error initializing Hyp mode: %d\n", err);
return err;
}
@@ -1214,26 +1268,27 @@ int kvm_arch_init(void *opaque)
}
}

- cpu_notifier_register_begin();
-
- err = init_hyp_mode();
+ err = init_common_resources();
if (err)
- goto out_err;
+ return err;

- err = __register_cpu_notifier(&hyp_init_cpu_nb);
- if (err) {
- kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
+ if (is_kernel_in_hyp_mode())
+ err = init_vhe_mode();
+ else
+ err = init_hyp_mode();
+ if (err)
goto out_err;
- }
-
- cpu_notifier_register_done();

- hyp_cpu_pm_init();
+ err = init_subsystems();
+ if (err)
+ goto out_hyp;

- kvm_coproc_table_init();
return 0;
+
+out_hyp:
+ teardown_hyp_mode();
out_err:
- cpu_notifier_register_done();
+ teardown_common_resources();
return err;
}

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index aba61fd..920d0c3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -28,6 +28,7 @@
#include <asm/kvm_mmio.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
+#include <asm/virt.h>

#include "trace.h"

@@ -598,6 +599,9 @@ int create_hyp_mappings(void *from, void *to)
unsigned long start = KERN_TO_HYP((unsigned long)from);
unsigned long end = KERN_TO_HYP((unsigned long)to);

+ if (is_kernel_in_hyp_mode())
+ return 0;
+
start = start & PAGE_MASK;
end = PAGE_ALIGN(end);

@@ -630,6 +634,9 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
unsigned long start = KERN_TO_HYP((unsigned long)from);
unsigned long end = KERN_TO_HYP((unsigned long)to);

+ if (is_kernel_in_hyp_mode())
+ return 0;
+
/* Check for a valid kernel IO mapping */
if (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1))
return -EINVAL;
--
2.1.4
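
The shape of the reworked kvm_arch_init() boils down to: pick the VHE
or non-VHE init path at runtime, then only tear down HYP state that was
actually created. A userspace sketch with stubbed-out init functions
(all names and error values hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Stubbed-out stand-ins for the functions the patch introduces. */
static bool in_hyp_mode;	/* models is_kernel_in_hyp_mode() */
static int  subsystems_err;	/* injectable failure for the test */
static int  teardown_calls;

static int init_vhe_mode_stub(void)  { return 0; } /* nothing to map */
static int init_hyp_mode_stub(void)  { return 0; } /* builds HYP page tables */
static int init_subsystems_stub(void){ return subsystems_err; }

static void teardown_hyp_mode_stub(void)
{
	if (in_hyp_mode)	/* VHE: no HYP mappings were created */
		return;
	teardown_calls++;
}

/* The overall flow of the refactored init. */
static int kvm_arch_init_sketch(void)
{
	int err;

	err = in_hyp_mode ? init_vhe_mode_stub() : init_hyp_mode_stub();
	if (err)
		return err;

	err = init_subsystems_stub();
	if (err) {
		teardown_hyp_mode_stub();
		return err;
	}
	return 0;
}
```

The early return in the teardown stub mirrors the
is_kernel_in_hyp_mode() check the patch adds to teardown_hyp_mode().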

Marc Zyngier

Feb 3, 2016, 1:00:54 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
As non-VHE and VHE have different ways to express the trapping of
FPSIMD registers to EL2, make __fpsimd_enabled a patchable predicate
and provide a VHE implementation.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/kvm_arm.h | 3 +++
arch/arm64/kvm/hyp/hyp.h | 5 +----
arch/arm64/kvm/hyp/switch.c | 19 +++++++++++++++++++
3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 738a95f..498335e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -217,4 +217,7 @@
ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)

+#define CPACR_EL1_FPEN (3 << 20)
+#define CPACR_EL1_TTA (1 << 28)
+
#endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index 5dfa883..44eaff7 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -171,10 +171,7 @@ void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);

void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
-static inline bool __fpsimd_enabled(void)
-{
- return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
+bool __fpsimd_enabled(void);

u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
void __noreturn __hyp_do_panic(unsigned long, ...);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9071dee..0db161e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -17,6 +17,25 @@

#include "hyp.h"

+static bool __hyp_text __fpsimd_enabled_nvhe(void)
+{
+ return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
+}
+
+static bool __hyp_text __fpsimd_enabled_vhe(void)
+{
+ return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
+}
+
+static hyp_alternate_select(__fpsimd_is_enabled,
+ __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+bool __hyp_text __fpsimd_enabled(void)
+{
+ return __fpsimd_is_enabled()();
+}
+
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
--
2.1.4
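
hyp_alternate_select() patches the choice between the two variants
into the instruction stream at boot; in userspace the same idea can be
modelled as a one-time function-pointer selection. The fake register
state below exists only so both predicates can be exercised; the bit
definitions match the ones used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TFP	(1u << 10)	/* FP traps to EL2 (non-VHE) */
#define CPACR_EL1_FPEN	(3u << 20)	/* FP enabled (VHE) */

/* Fake "system register" state, for userspace testing only. */
static uint64_t fake_cptr_el2, fake_cpacr_el1;

static bool fpsimd_enabled_nvhe(void)
{
	return !(fake_cptr_el2 & CPTR_EL2_TFP);
}

static bool fpsimd_enabled_vhe(void)
{
	return !!(fake_cpacr_el1 & CPACR_EL1_FPEN);
}

/*
 * Model of hyp_alternate_select(): the kernel patches the winner in
 * at boot; here we simply pick a function pointer once, based on the
 * capability bit.
 */
typedef bool (*fpsimd_pred_t)(void);

static fpsimd_pred_t select_fpsimd_pred(bool has_vhe)
{
	return has_vhe ? fpsimd_enabled_vhe : fpsimd_enabled_nvhe;
}
```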

Marc Zyngier

Feb 3, 2016, 1:01:06 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With ARMv8.1 VHE, the architecture is able to (almost) transparently
run the kernel at EL2, despite being written for EL1.

This patch takes care of the "almost" part, mostly preventing the kernel
from dropping from EL2 to EL1, and setting up the HYP configuration.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/Kconfig | 13 +++++++++++++
arch/arm64/kernel/head.S | 28 +++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..cf118d9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -750,6 +750,19 @@ config ARM64_LSE_ATOMICS
not support these instructions and requires the kernel to be
built with binutils >= 2.25.

+config ARM64_VHE
+ bool "Enable support for Virtualization Host Extensions (VHE)"
+ default y
+ help
+ Virtualization Host Extensions (VHE) allow the kernel to run
+ directly at EL2 (instead of EL1) on processors that support
+ it. This leads to better performance for KVM, as they reduce
+ the cost of the world switch.
+
+ Selecting this option allows the VHE feature to be detected
+ at runtime, and does not affect processors that do not
+ implement this feature.
+
endmenu

endmenu
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 917d981..6f2f377 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -30,6 +30,7 @@
#include <asm/cache.h>
#include <asm/cputype.h>
#include <asm/kernel-pgtable.h>
+#include <asm/kvm_arm.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
#include <asm/pgtable.h>
@@ -464,9 +465,27 @@ CPU_LE( bic x0, x0, #(3 << 24) ) // Clear the EE and E0E bits for EL1
isb
ret

+2:
+#ifdef CONFIG_ARM64_VHE
+ /*
+ * Check for VHE being present. For the rest of the EL2 setup,
+ * x2 being non-zero indicates that we do have VHE, and that the
+ * kernel is intended to run at EL2.
+ */
+ mrs x2, id_aa64mmfr1_el1
+ ubfx x2, x2, #8, #4
+#else
+ mov x2, xzr
+#endif
+
/* Hyp configuration. */
-2: mov x0, #(1 << 31) // 64-bit EL1
+ mov x0, #HCR_RW // 64-bit EL1
+ cbz x2, set_hcr
+ orr x0, x0, #HCR_TGE // Enable Host Extensions
+ orr x0, x0, #HCR_E2H
+set_hcr:
msr hcr_el2, x0
+ isb

/* Generic timers. */
mrs x0, cnthctl_el2
@@ -526,6 +545,13 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
/* Stage-2 translation */
msr vttbr_el2, xzr

+ cbz x2, install_el2_stub
+
+ mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2
+ isb
+ ret
+
+install_el2_stub:
/* Hypervisor stub */
adrp x0, __hyp_stub_vectors
add x0, x0, #:lo12:__hyp_stub_vectors
--
2.1.4
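
The `ubfx x2, x2, #8, #4` in the patch extracts the VH field, bits
[11:8] of ID_AA64MMFR1_EL1; a non-zero value means VHE is implemented.
The same check in C (the helper name is ours, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract ID_AA64MMFR1_EL1.VH (bits [11:8]); non-zero means VHE. */
static bool cpu_has_vhe(uint64_t id_aa64mmfr1)
{
	return ((id_aa64mmfr1 >> 8) & 0xf) != 0;
}
```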

Marc Zyngier

Feb 3, 2016, 1:01:19 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Having both VHE and non-VHE capable CPUs in the same system
is likely to be a recipe for disaster.

If the boot CPU has VHE, but a secondary is not, we won't be
able to downgrade and run the kernel at EL1. Add CPU hotplug
to the mix, and this produces a terrifying mess.

Let's solve the problem once and for all. If you mix VHE and
non-VHE CPUs in the same system, you deserve to lose, and this
patch makes sure you don't get a chance.

This is implemented by storing the kernel execution level in
a global variable. Secondaries will park themselves in a
WFI loop if they observe a mismatch. Also, the primary CPU
will detect that the secondary CPU has died on a mismatched
execution level. Panic will follow.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/virt.h | 17 +++++++++++++++++
arch/arm64/kernel/head.S | 20 ++++++++++++++++++++
arch/arm64/kernel/smp.c | 3 +++
3 files changed, 40 insertions(+)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 9f22dd6..f81a345 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -36,6 +36,11 @@
*/
extern u32 __boot_cpu_mode[2];

+/*
+ * __run_cpu_mode records the mode the boot CPU uses for the kernel.
+ */
+extern u32 __run_cpu_mode[2];
+
void __hyp_set_vectors(phys_addr_t phys_vector_base);
phys_addr_t __hyp_get_vectors(void);

@@ -60,6 +65,18 @@ static inline bool is_kernel_in_hyp_mode(void)
return el == CurrentEL_EL2;
}

+static inline bool is_kernel_mode_mismatched(void)
+{
+ /*
+ * A mismatched CPU will have written its own CurrentEL in
+ * __run_cpu_mode[1] (initially set to zero) after failing to
+ * match the value in __run_cpu_mode[0]. Thus, a non-zero
+ * value in __run_cpu_mode[1] is enough to detect the
+ * pathological case.
+ */
+ return !!ACCESS_ONCE(__run_cpu_mode[1]);
+}
+
/* The section containing the hypervisor text */
extern char __hyp_text_start[];
extern char __hyp_text_end[];
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 6f2f377..f9b6a5b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -578,7 +578,24 @@ ENTRY(set_cpu_boot_mode_flag)
1: str w20, [x1] // This CPU has booted in EL1
dmb sy
dc ivac, x1 // Invalidate potentially stale cache line
+ adr_l x1, __run_cpu_mode
+ ldr w0, [x1]
+ mrs x20, CurrentEL
+ cbz x0, skip_el_check
+ cmp x0, x20
+ bne mismatched_el
ret
+skip_el_check: // Only the first CPU gets to set the rule
+ str w20, [x1]
+ dmb sy
+ dc ivac, x1 // Invalidate potentially stale cache line
+ ret
+mismatched_el:
+ str w20, [x1, #4]
+ dmb sy
+ dc ivac, x1 // Invalidate potentially stale cache line
+1: wfi
+ b 1b
ENDPROC(set_cpu_boot_mode_flag)

/*
@@ -593,6 +610,9 @@ ENDPROC(set_cpu_boot_mode_flag)
ENTRY(__boot_cpu_mode)
.long BOOT_CPU_MODE_EL2
.long BOOT_CPU_MODE_EL1
+ENTRY(__run_cpu_mode)
+ .long 0
+ .long 0
.popsection

/*
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..bc7650a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -113,6 +113,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
pr_crit("CPU%u: failed to come online\n", cpu);
ret = -EIO;
}
+
+ if (is_kernel_mode_mismatched())
+ panic("CPU%u: incompatible execution level", cpu);
} else {
pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
}
--
2.1.4
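
The two-slot __run_cpu_mode protocol above can be modelled in
userspace as follows. This is only a sketch of the control flow: the
real code also has to perform the dmb/dc cache maintenance, which is
elided here, and the stand-in functions are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* CurrentEL encodings: the EL lives in bits [3:2]. */
#define CURRENT_EL1	(1u << 2)
#define CURRENT_EL2	(2u << 2)

/*
 * Model of __run_cpu_mode[2]: slot 0 is set once by the first CPU to
 * record the kernel's execution level; slot 1 is only ever written by
 * a secondary that disagrees, just before it parks itself in the WFI
 * loop.
 */
static uint32_t run_cpu_mode[2];

/* Returns 0 when the CPU would park itself on a mismatch. */
static int record_cpu_mode(uint32_t current_el)
{
	if (run_cpu_mode[0] == 0) {	/* first CPU sets the rule */
		run_cpu_mode[0] = current_el;
		return 1;
	}
	if (run_cpu_mode[0] == current_el)
		return 1;
	run_cpu_mode[1] = current_el;	/* flag the mismatch */
	return 0;
}

/* What __cpu_up() checks after the boot timeout. */
static int kernel_mode_mismatched(void)
{
	return run_cpu_mode[1] != 0;
}
```

This is why the primary sees both messages in Marc's log: the timeout
fires first ("failed to come online"), then the mismatch check panics.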

Marc Zyngier

Feb 3, 2016, 1:01:43 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
The fault decoding process (including computing the IPA in the case
of a permission fault) would be much better done in C code, as we
have a reasonable infrastructure to deal with the VHE/non-VHE
differences.

Let's move the whole thing to C, including the workaround for
erratum 834220, and just patch the odd ESR_EL2 access remaining
in hyp-entry.S.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kernel/asm-offsets.c | 3 --
arch/arm64/kvm/hyp/hyp-entry.S | 69 +++------------------------------
arch/arm64/kvm/hyp/switch.c | 85 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 90 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index fffa4ac6..b0ab4e9 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -110,9 +110,6 @@ int main(void)
DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
- DEFINE(VCPU_ESR_EL2, offsetof(struct kvm_vcpu, arch.fault.esr_el2));
- DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2));
- DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2));
DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
#endif
#ifdef CONFIG_CPU_PM
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 1bdeee7..3488894 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -19,7 +19,6 @@

#include <asm/alternative.h>
#include <asm/assembler.h>
-#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
@@ -69,7 +68,11 @@ ENDPROC(__vhe_hyp_call)
el1_sync: // Guest trapped into EL2
save_x0_to_x3

+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs x1, esr_el2
+alternative_else
+ mrs x1, esr_el1
+alternative_endif
lsr x2, x1, #ESR_ELx_EC_SHIFT

cmp x2, #ESR_ELx_EC_HVC64
@@ -105,72 +108,10 @@ el1_trap:
cmp x2, #ESR_ELx_EC_FP_ASIMD
b.eq __fpsimd_guest_restore

- cmp x2, #ESR_ELx_EC_DABT_LOW
- mov x0, #ESR_ELx_EC_IABT_LOW
- ccmp x2, x0, #4, ne
- b.ne 1f // Not an abort we care about
-
- /* This is an abort. Check for permission fault */
-alternative_if_not ARM64_WORKAROUND_834220
- and x2, x1, #ESR_ELx_FSC_TYPE
- cmp x2, #FSC_PERM
- b.ne 1f // Not a permission fault
-alternative_else
- nop // Use the permission fault path to
- nop // check for a valid S1 translation,
- nop // regardless of the ESR value.
-alternative_endif
-
- /*
- * Check for Stage-1 page table walk, which is guaranteed
- * to give a valid HPFAR_EL2.
- */
- tbnz x1, #7, 1f // S1PTW is set
-
- /* Preserve PAR_EL1 */
- mrs x3, par_el1
- stp x3, xzr, [sp, #-16]!
-
- /*
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index e90683a..a192357 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -15,6 +15,7 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

+#include <linux/types.h>
#include <asm/kvm_asm.h>

#include "hyp.h"
@@ -145,6 +146,86 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
__vgic_call_restore_state()(vcpu);
}

+static bool __hyp_text __true_value(void)
+{
+ return true;
+}
+
+static bool __hyp_text __false_value(void)
+{
+ return false;
+}
+
+static hyp_alternate_select(__check_arm_834220,
+ __false_value, __true_value,
+ ARM64_WORKAROUND_834220);
+
+static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
+{
+ u64 par, tmp;
+
+ /*
+ * Resolve the IPA the hard way using the guest VA.
+ *
+ * Stage-1 translation already validated the memory access
+ * rights. As such, we can use the EL1 translation regime, and
+ * don't have to distinguish between EL0 and EL1 access.
+ *
+ * We do need to save/restore PAR_EL1 though, as we haven't
+ * saved the guest context yet, and we may return early...
+ */
+ par = read_sysreg(par_el1);
+ asm volatile("at s1e1r, %0" : : "r" (far));
+ isb();
+
+ tmp = read_sysreg(par_el1);
+ write_sysreg(par, par_el1);
+
+ if (unlikely(tmp & 1))
+ return false; /* Translation failed, back to guest */
+
+ /* Convert PAR to HPFAR format */
+ *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
+ return true;
+}
+
+static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
+{
+ u64 esr = read_sysreg_el2(esr);
+ u8 ec = esr >> ESR_ELx_EC_SHIFT;
+ u64 hpfar, far;
+
+ vcpu->arch.fault.esr_el2 = esr;
+
+ if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
+ return true;
+
+ far = read_sysreg_el2(far);
+
+ /*
+ * The HPFAR can be invalid if the stage 2 fault did not
+ * happen during a stage 1 page table walk (the ESR_EL2.S1PTW
+ * bit is clear) and one of the two following cases are true:
+ * 1. The fault was due to a permission fault
+ * 2. The processor carries errata 834220
+ *
+ * Therefore, for all non S1PTW faults where we either have a
+ * permission fault or the errata workaround is enabled, we
+ * resolve the IPA using the AT instruction.
+ */
+ if (!(esr & ESR_ELx_S1PTW) &&
+ (__check_arm_834220()() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+ if (!__translate_far_to_hpfar(far, &hpfar))
+ return false;
+ } else {
+ hpfar = read_sysreg(hpfar_el2);
+ }
+
+ vcpu->arch.fault.far_el2 = far;
+ vcpu->arch.fault.hpfar_el2 = hpfar;
+ return true;
+}
+
static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
@@ -176,9 +257,13 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)

Marc Zyngier

Feb 3, 2016, 1:01:50 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With VHE, we place kernel {watch,break}-points at EL2 to get things
like kgdb and "perf -e mem:..." working.

This requires a bit of repainting in the low-level encode/decode
logic, but is otherwise pretty simple.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/hw_breakpoint.h | 49 +++++++++++++++++++++-------------
1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 9732908..0da0272 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -18,6 +18,7 @@

#include <asm/cputype.h>
#include <asm/cpufeature.h>
+#include <asm/virt.h>

#ifdef __KERNEL__

@@ -35,24 +36,6 @@ struct arch_hw_breakpoint {
struct arch_hw_breakpoint_ctrl ctrl;
};

-static inline u32 encode_ctrl_reg(struct arch_hw_breakpoint_ctrl ctrl)
-{
- return (ctrl.len << 5) | (ctrl.type << 3) | (ctrl.privilege << 1) |
- ctrl.enabled;
-}
-
-static inline void decode_ctrl_reg(u32 reg,
- struct arch_hw_breakpoint_ctrl *ctrl)
-{
- ctrl->enabled = reg & 0x1;
- reg >>= 1;
- ctrl->privilege = reg & 0x3;
- reg >>= 2;
- ctrl->type = reg & 0x3;
- reg >>= 2;
- ctrl->len = reg & 0xff;
-}
-
/* Breakpoint */
#define ARM_BREAKPOINT_EXECUTE 0

@@ -76,6 +59,36 @@ static inline void decode_ctrl_reg(u32 reg,
#define ARM_KERNEL_STEP_ACTIVE 1
#define ARM_KERNEL_STEP_SUSPEND 2

+#define DBG_HMC_HYP (1 << 13)
+#define DBG_SSC_HYP (3 << 14)
+
+static inline u32 encode_ctrl_reg(struct arch_hw_breakpoint_ctrl ctrl)
+{
+ u32 val = (ctrl.len << 5) | (ctrl.type << 3) | ctrl.enabled;
+
+ if (is_kernel_in_hyp_mode() && ctrl.privilege == AARCH64_BREAKPOINT_EL1)
+ val |= DBG_HMC_HYP | DBG_SSC_HYP;
+ else
+ val |= ctrl.privilege << 1;
+
+ return val;
+}
+
+static inline void decode_ctrl_reg(u32 reg,
+ struct arch_hw_breakpoint_ctrl *ctrl)
+{
+ ctrl->enabled = reg & 0x1;
+ reg >>= 1;
+ if (is_kernel_in_hyp_mode())
+ ctrl->privilege = !!(reg & (DBG_HMC_HYP >> 1));
+ else
+ ctrl->privilege = reg & 0x3;
+ reg >>= 2;
+ ctrl->type = reg & 0x3;
+ reg >>= 2;
+ ctrl->len = reg & 0xff;
+}
+
/*
* Limits.
* Changing these will require modifications to the register accessors.
--
2.1.4
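Since the new encode/decode pair is pure bit manipulation, it can be exercised in isolation. A hedged standalone reconstruction (the bit layout and `DBG_*` constants are copied from the patch; `is_kernel_in_hyp_mode()` is replaced by a plain flag so the roundtrip can be tested, and the `AARCH64_BREAKPOINT_EL1` value is an assumption about the kernel header):

```c
#include <assert.h>
#include <stdint.h>

#define DBG_HMC_HYP (1 << 13)
#define DBG_SSC_HYP (3 << 14)
#define AARCH64_BREAKPOINT_EL1 1   /* assumed value from hw_breakpoint.h */

struct ctrl { uint32_t len, type, privilege, enabled; };

static int kernel_in_hyp;   /* stand-in for is_kernel_in_hyp_mode() */

static uint32_t encode_ctrl_reg(struct ctrl c)
{
    uint32_t val = (c.len << 5) | (c.type << 3) | c.enabled;

    /* With VHE, an "EL1" breakpoint is really an EL2 one: use HMC+SSC
     * instead of the PMC privilege field. */
    if (kernel_in_hyp && c.privilege == AARCH64_BREAKPOINT_EL1)
        val |= DBG_HMC_HYP | DBG_SSC_HYP;
    else
        val |= c.privilege << 1;
    return val;
}

static void decode_ctrl_reg(uint32_t reg, struct ctrl *c)
{
    c->enabled = reg & 0x1;
    reg >>= 1;
    if (kernel_in_hyp)
        c->privilege = !!(reg & (DBG_HMC_HYP >> 1));
    else
        c->privilege = reg & 0x3;
    reg >>= 2;
    c->type = reg & 0x3;
    reg >>= 2;
    c->len = reg & 0xff;
}
```

The roundtrip holds in both modes because the 8-bit `len` field (bits 5-12) sits entirely below the HMC (bit 13) and SSC (bits 14-15) fields.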

Marc Zyngier

Feb 3, 2016, 1:02:14 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
When the kernel is running in HYP (with VHE), it is necessary to
include EL2 events if the user requests counting kernel or
hypervisor events.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kernel/perf_event.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index f7ab14c..6013a38 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -20,6 +20,7 @@
*/

#include <asm/irq_regs.h>
+#include <asm/virt.h>

#include <linux/of.h>
#include <linux/perf/arm_pmu.h>
@@ -693,10 +694,15 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
return -EPERM;
if (attr->exclude_user)
config_base |= ARMV8_EXCLUDE_EL0;
- if (attr->exclude_kernel)
- config_base |= ARMV8_EXCLUDE_EL1;
- if (!attr->exclude_hv)
- config_base |= ARMV8_INCLUDE_EL2;
+ if (is_kernel_in_hyp_mode()) {
+ if (!attr->exclude_kernel || !attr->exclude_hv)
+ config_base |= ARMV8_INCLUDE_EL2;
+ } else {
+ if (attr->exclude_kernel)
+ config_base |= ARMV8_EXCLUDE_EL1;
+ if (!attr->exclude_hv)
+ config_base |= ARMV8_INCLUDE_EL2;
+ }

/*
* Install the filter into config_base as this is used to
--
2.1.4
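The filter change reduces to a pure function of the exclude flags: with VHE, "kernel" and "hypervisor" are the same exception level, so either one being included forces EL2 counting. A standalone sketch of that decision table (the bit positions below are placeholders, not the real `ARMV8_*` constants from perf_event.c):

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions; the real ARMV8_* values live in perf_event.c. */
#define EXCLUDE_EL0 (1u << 30)
#define EXCLUDE_EL1 (1u << 31)
#define INCLUDE_EL2 (1u << 27)

static uint32_t event_filter(int vhe, int excl_user, int excl_kernel, int excl_hv)
{
    uint32_t config_base = 0;

    if (excl_user)
        config_base |= EXCLUDE_EL0;
    if (vhe) {
        /* Kernel runs at EL2: counting kernel OR hyp events means EL2. */
        if (!excl_kernel || !excl_hv)
            config_base |= INCLUDE_EL2;
    } else {
        if (excl_kernel)
            config_base |= EXCLUDE_EL1;
        if (!excl_hv)
            config_base |= INCLUDE_EL2;
    }
    return config_base;
}
```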

Marc Zyngier

Feb 3, 2016, 1:02:38 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Despite the fact that a VHE enabled kernel runs at EL2, it uses
CPACR_EL1 to trap FPSIMD access. Add the required alternative
code to re-enable guest FPSIMD access when it has trapped to
EL2.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/entry.S | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index fd0fbe9..ce9e5e5 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -130,9 +130,15 @@ ENDPROC(__guest_exit)
ENTRY(__fpsimd_guest_restore)
stp x4, lr, [sp, #-16]!

+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs x2, cptr_el2
bic x2, x2, #CPTR_EL2_TFP
msr cptr_el2, x2
+alternative_else
+ mrs x2, cpacr_el1
+ orr x2, x2, #CPACR_EL1_FPEN
+ msr cpacr_el1, x2
+alternative_endif
isb

mrs x3, tpidr_el2
--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:03:08 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
As the kernel fully runs in HYP when VHE is enabled, we can
directly branch to the kernel's panic() implementation, and
not perform an exception return.

Add the alternative code to deal with this.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/switch.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 686ca35..e90683a 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -206,11 +206,34 @@ __alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);

static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";

-void __hyp_text __noreturn __hyp_panic(void)
+static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
{
unsigned long str_va = (unsigned long)__hyp_panic_string;
- u64 spsr = read_sysreg(spsr_el2);
- u64 elr = read_sysreg(elr_el2);
+
+ __hyp_do_panic(hyp_kern_va(str_va),
+ spsr, elr,
+ read_sysreg(esr_el2), read_sysreg_el2(far),
+ read_sysreg(hpfar_el2), par,
+ (void *)read_sysreg(tpidr_el2));
+}
+
+static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par)
+{
+ panic(__hyp_panic_string,
+ spsr, elr,
+ read_sysreg_el2(esr), read_sysreg_el2(far),
+ read_sysreg(hpfar_el2), par,
+ (void *)read_sysreg(tpidr_el2));
+}
+
+static hyp_alternate_select(__hyp_call_panic,
+ __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+void __hyp_text __noreturn __hyp_panic(void)
+{
+ u64 spsr = read_sysreg_el2(spsr);
+ u64 elr = read_sysreg_el2(elr);
u64 par = read_sysreg(par_el1);

if (read_sysreg(vttbr_el2)) {
@@ -225,11 +248,7 @@ void __hyp_text __noreturn __hyp_panic(void)
}

/* Call panic for real */
- __hyp_do_panic(hyp_kern_va(str_va),
- spsr, elr,
- read_sysreg(esr_el2), read_sysreg(far_el2),
- read_sysreg(hpfar_el2), par,
- (void *)read_sysreg(tpidr_el2));
+ __hyp_call_panic()(spsr, elr, par);

unreachable();
}
--
2.1.4
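The `hyp_alternate_select()` idiom used here (and throughout the series) defines a function that returns one of two function pointers depending on a CPU capability; in the kernel the selection is patched into a direct branch at boot via the alternatives framework. A simplified, runnable illustration of the idiom, with the runtime patching replaced by a plain capability lookup (everything below is a stand-in, not the kernel macro):

```c
#include <assert.h>

enum { HAS_VHE };
static int caps[1];
static int has_cap(int c) { return caps[c]; }

/* Simplified stand-in for hyp_alternate_select(): the real macro emits
 * alternative_if_not so the choice is patched in once at boot, not
 * re-evaluated on every call. */
#define alternate_select(name, f_no_cap, f_cap, cap)        \
    static int (*name(void))(void)                          \
    {                                                       \
        return has_cap(cap) ? f_cap : f_no_cap;             \
    }

static int panic_nvhe(void) { return 1; }  /* would call __hyp_do_panic() */
static int panic_vhe(void)  { return 2; }  /* would call panic() directly */

alternate_select(call_panic, panic_nvhe, panic_vhe, HAS_VHE)
```

This also explains the double-call syntax in the patch: `__hyp_call_panic()(spsr, elr, par)` first resolves the selected function, then invokes it.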

Marc Zyngier

Feb 3, 2016, 1:03:33 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Switch the timer code to the unified sysreg accessors.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/timer-sr.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
index 1051e5d..f276d9e 100644
--- a/arch/arm64/kvm/hyp/timer-sr.c
+++ b/arch/arm64/kvm/hyp/timer-sr.c
@@ -31,12 +31,12 @@ void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
u64 val;

if (kvm->arch.timer.enabled) {
- timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
- timer->cntv_cval = read_sysreg(cntv_cval_el0);
+ timer->cntv_ctl = read_sysreg_el0(cntv_ctl);
+ timer->cntv_cval = read_sysreg_el0(cntv_cval);
}

/* Disable the virtual timer */
- write_sysreg(0, cntv_ctl_el0);
+ write_sysreg_el0(0, cntv_ctl);

/* Allow physical timer/counter access for the host */
val = read_sysreg(cnthctl_el2);
@@ -64,8 +64,8 @@ void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)

if (kvm->arch.timer.enabled) {
write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
- write_sysreg(timer->cntv_cval, cntv_cval_el0);
+ write_sysreg_el0(timer->cntv_cval, cntv_cval);
isb();
- write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
+ write_sysreg_el0(timer->cntv_ctl, cntv_ctl);
}
}
--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:04:13 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Running the kernel in HYP mode requires the HCR_E2H bit to be set
at all times, and the HCR_TGE bit to be set when running as a host
(and cleared when running as a guest). At the same time, the vector
must be set to the current role of the kernel (either host or
hypervisor), and a couple of system registers differ between VHE
and non-VHE.

We implement these by using another set of alternate functions
that get dynamically patched.

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/include/asm/kvm_arm.h | 3 ++-
arch/arm64/include/asm/kvm_emulate.h | 3 +++
arch/arm64/kvm/hyp/switch.c | 47 +++++++++++++++++++++++++++++++++---
3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 498335e..100cbec 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>

/* Hyp Configuration Register (HCR) bits */
+#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
#define HCR_CD (UL(1) << 32)
#define HCR_RW_SHIFT 31
@@ -81,7 +82,7 @@
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
#define HCR_INT_OVERRIDE (HCR_FMO | HCR_IMO)
-
+#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)

/* Hyp System Control Register (SCTLR_EL2) bits */
#define SCTLR_EL2_EE (1 << 25)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3066328..5ae0c69 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -29,6 +29,7 @@
#include <asm/kvm_mmio.h>
#include <asm/ptrace.h>
#include <asm/cputype.h>
+#include <asm/virt.h>

unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
@@ -43,6 +44,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
+ if (is_kernel_in_hyp_mode())
+ vcpu->arch.hcr_el2 |= HCR_E2H;
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 0db161e..686ca35 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -15,6 +15,8 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

+#include <asm/kvm_asm.h>
+
#include "hyp.h"

static bool __hyp_text __fpsimd_enabled_nvhe(void)
@@ -36,6 +38,27 @@ bool __hyp_text __fpsimd_enabled(void)
return __fpsimd_is_enabled()();
}

+static void __hyp_text __activate_traps_vhe(void)
+{
+ u64 val;
+
+ val = read_sysreg(cpacr_el1);
+ val |= CPACR_EL1_TTA;
+ val &= ~CPACR_EL1_FPEN;
+ write_sysreg(val, cpacr_el1);
+
+ write_sysreg(__kvm_hyp_vector, vbar_el1);
+}
+
+static void __hyp_text __activate_traps_nvhe(void)
+{
+ write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
+}
+
+static hyp_alternate_select(__activate_traps_arch,
+ __activate_traps_nvhe, __activate_traps_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
@@ -55,16 +78,34 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
- write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+ __activate_traps_arch()();
}

-static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
+static void __hyp_text __deactivate_traps_vhe(void)
+{
+ extern char vectors[]; /* kernel exception vectors */
+
+ write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+ write_sysreg(CPACR_EL1_FPEN, cpacr_el1);
+ write_sysreg(vectors, vbar_el1);
+}
+
+static void __hyp_text __deactivate_traps_nvhe(void)
{
write_sysreg(HCR_RW, hcr_el2);
+ write_sysreg(0, cptr_el2);
+}
+
+static hyp_alternate_select(__deactivate_traps_arch,
+ __deactivate_traps_nvhe, __deactivate_traps_vhe,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
+static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
+{
+ __deactivate_traps_arch()();
write_sysreg(0, hstr_el2);
write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
- write_sysreg(0, cptr_el2);
}

static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:04:35 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
Use the recently introduced unified system register accessors for
those sysregs that behave differently depending on VHE being in
use or not.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 84 +++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 61bad17..7d7d757 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -37,34 +37,34 @@ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
- ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
- ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
+ ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
+ ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
}

static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
- ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1);
- ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1);
- ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1);
- ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1);
- ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
- ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
- ctxt->sys_regs[AFSR0_EL1] = read_sysreg(afsr0_el1);
- ctxt->sys_regs[AFSR1_EL1] = read_sysreg(afsr1_el1);
- ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
- ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1);
- ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1);
- ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1);
- ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1);
- ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
+ ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
+ ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
+ ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
+ ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
+ ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
+ ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr);
+ ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
+ ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
+ ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far);
+ ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
+ ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
+ ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
+ ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
+ ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);

ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
- ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1);
- ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
+ ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
+ ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
}

void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
@@ -86,34 +86,34 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
- write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
- write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
+ write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
+ write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
}

static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
- write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
- write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1);
- write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1);
- write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1);
- write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1);
- write_sysreg(ctxt->sys_regs[TCR_EL1], tcr_el1);
- write_sysreg(ctxt->sys_regs[ESR_EL1], esr_el1);
- write_sysreg(ctxt->sys_regs[AFSR0_EL1], afsr0_el1);
- write_sysreg(ctxt->sys_regs[AFSR1_EL1], afsr1_el1);
- write_sysreg(ctxt->sys_regs[FAR_EL1], far_el1);
- write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1);
- write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1);
- write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1);
- write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1);
- write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1);
- write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
- write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
-
- write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
- write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
- write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
+ write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
+ write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
+ write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
+ write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
+ write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
+ write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
+ write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr);
+ write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr);
+ write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0);
+ write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1);
+ write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far);
+ write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair);
+ write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar);
+ write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
+ write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
+ write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl);
+ write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
+ write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
+
+ write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
+ write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
+ write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
}

void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:05:12 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
A handful of system registers are still shared between host and guest,
even while using VHE (tpidr*_el[01] and actlr_el1).

Also, some of the vcpu state (sp_el0, PC and PSTATE) must be
saved/restored on entry/exit, as they are used on the host as well.

In order to facilitate the introduction of a VHE-specific sysreg
save/restore, move the access to these registers to their
own save/restore functions.

No functional change.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 48 +++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index bd5b543..61bad17 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -23,13 +23,29 @@

#include "hyp.h"

-/* ctxt is already in the HYP VA space */
+/*
+ * Non-VHE: Both host and guest must save everything.
+ *
+ * VHE: Host must save tpidr*_el[01], actlr_el1, sp0, pc, pstate, and
+ * guest must save everything.
+ */
+
+static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
+{
+ ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
+ ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
+ ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+ ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
+ ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
+ ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
+ ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
+}
+
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1);
- ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1);
ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1);
ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1);
@@ -41,17 +57,11 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1);
ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1);
ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1);
- ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
- ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
- ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1);
ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);

- ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
- ctxt->gp_regs.regs.pc = read_sysreg(elr_el2);
- ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2);
ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1);
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
@@ -60,11 +70,24 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
{
__sysreg_save_state(ctxt);
+ __sysreg_save_common_state(ctxt);
}

void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
{
__sysreg_save_state(ctxt);
+ __sysreg_save_common_state(ctxt);
+}
+
+static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
+{
+ write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
+ write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
+ write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
+ write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
+ write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
+ write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
+ write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
}

static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
@@ -72,7 +95,6 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1);
- write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1);
write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1);
write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1);
@@ -84,17 +106,11 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1);
write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1);
write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1);
- write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
- write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
- write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1);
write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1);
write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);

- write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
- write_sysreg(ctxt->gp_regs.regs.pc, elr_el2);
- write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2);
write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
@@ -103,11 +119,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_state(ctxt);
+ __sysreg_restore_common_state(ctxt);
}

void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
{
__sysreg_restore_state(ctxt);
+ __sysreg_restore_common_state(ctxt);
}

void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
--
2.1.4

Marc Zyngier

Feb 3, 2016, 1:05:54 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
We're now in a position where we can introduce VHE's minimal
save/restore, which is limited to the handful of shared sysregs.

Add the required alternative function calls that result in a
"do nothing" call on VHE, and the normal save/restore for non-VHE.

Reviewed-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/sysreg-sr.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 7d7d757..74b5f81 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -23,6 +23,9 @@

#include "hyp.h"

+/* Yes, this does nothing, on purpose */
+static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
+
/*
* Non-VHE: Both host and guest must save everything.
*
@@ -67,9 +70,13 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
}

+static hyp_alternate_select(__sysreg_call_save_host_state,
+ __sysreg_save_state, __sysreg_do_nothing,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
{
- __sysreg_save_state(ctxt);
+ __sysreg_call_save_host_state()(ctxt);
__sysreg_save_common_state(ctxt);
}

@@ -116,9 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
}

+static hyp_alternate_select(__sysreg_call_restore_host_state,
+ __sysreg_restore_state, __sysreg_do_nothing,
+ ARM64_HAS_VIRT_HOST_EXTN);
+
void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
{
- __sysreg_restore_state(ctxt);
+ __sysreg_call_restore_host_state()(ctxt);
__sysreg_restore_common_state(ctxt);
}

--
2.1.4

Marc Zyngier
Feb 3, 2016, 1:06:44 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With VHE, the host never issues an HVC instruction to get into the
KVM code, as we can simply branch there.

Use runtime code patching to simplify things a bit.
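The calling convention this preserves can be modelled in C. This is a sketch with hypothetical names (el2_call, add3), not the kernel's API: in the assembly, x0 carries the function pointer and x1-x3 carry its arguments, and do_el2_call shuffles them down so the callee sees them as its first three parameters. Whether we get there via an HVC trap (non-VHE) or a plain branch (VHE), the shuffle is identical, which is why the patch factors it into one macro.

```c
#include <assert.h>

/* Model of the do_el2_call register shuffle: the first argument is
 * the function to invoke at EL2, the rest are its parameters. */
typedef long (*hyp_fn)(long, long, long);

static long el2_call(hyp_fn fn, long a, long b, long c)
{
	/* With VHE this is a direct call; without VHE the same point is
	 * reached through an HVC trap into EL2. The callee is the same. */
	return fn(a, b, c);
}

/* Example EL2 "leaf function". */
static long add3(long a, long b, long c)
{
	return a + b + c;
}
```

A usage such as el2_call(add3, 1, 2, 3) mirrors kvm_call_hyp(fn, x1, x2, x3): the wrapper hides which entry mechanism was patched in.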

Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp.S | 7 +++++++
arch/arm64/kvm/hyp/hyp-entry.S | 40 +++++++++++++++++++++++++++++++---------
2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 0ccdcbb..0689a74 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -17,7 +17,9 @@

#include <linux/linkage.h>

+#include <asm/alternative.h>
#include <asm/assembler.h>
+#include <asm/cpufeature.h>

/*
* u64 kvm_call_hyp(void *hypfn, ...);
@@ -38,6 +40,11 @@
* arch/arm64/kernel/hyp_stub.S.
*/
ENTRY(kvm_call_hyp)
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
hvc #0
ret
+alternative_else
+ b __vhe_hyp_call
+ nop
+alternative_endif
ENDPROC(kvm_call_hyp)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 93e8d983..1bdeee7 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -38,6 +38,34 @@
ldp x0, x1, [sp], #16
.endm

+.macro do_el2_call
+ /*
+ * Shuffle the parameters before calling the function
+ * pointed to in x0. Assumes parameters in x[1,2,3].
+ */
+ sub sp, sp, #16
+ str lr, [sp]
+ mov lr, x0
+ mov x0, x1
+ mov x1, x2
+ mov x2, x3
+ blr lr
+ ldr lr, [sp]
+ add sp, sp, #16
+.endm
+
+ENTRY(__vhe_hyp_call)
+ do_el2_call
+ /*
+ * We used to rely on having an exception return to get
+ * an implicit isb. In the E2H case, we don't have it anymore.
+ * Rather than changing all the leaf functions, just do it here
+ * before returning to the rest of the kernel.
+ */
+ isb
+ ret
+ENDPROC(__vhe_hyp_call)
+
el1_sync: // Guest trapped into EL2
save_x0_to_x3

@@ -58,19 +86,13 @@ el1_sync: // Guest trapped into EL2
mrs x0, vbar_el2
b 2f

-1: stp lr, xzr, [sp, #-16]!
-
+1:
/*
- * Compute the function address in EL2, and shuffle the parameters.
+ * Perform the EL2 call
*/
kern_hyp_va x0
- mov lr, x0
- mov x0, x1
- mov x1, x2
- mov x2, x3
- blr lr
+ do_el2_call

- ldp lr, xzr, [sp], #16
2: eret

el1_trap:
--
2.1.4

Marc Zyngier
Feb 3, 2016, 1:06:46 PM
to Catalin Marinas, Will Deacon, Christoffer Dall, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, k...@vger.kernel.org, kvm...@lists.cs.columbia.edu
With ARMv8, host and guest share the same system register file,
making the save/restore procedure completely symmetrical.
With VHE, host and guest now have different requirements, as they
use different sysregs.

In order to prepare for this, add split sysreg save/restore functions
for both host and guest. No functional changes yet.
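The wrapper pattern can be sketched in a few lines of plain C (hypothetical names, not the kernel's types): both wrappers forward to one shared implementation today, so behaviour is unchanged, but the split gives a later patch a seam where the host path alone can be patched to do less under VHE.

```c
#include <assert.h>

/* Hypothetical stand-in for kvm_cpu_context. */
struct cpu_ctx { int regs[2]; };

/* One shared implementation, as before the split. */
static void sysreg_save(struct cpu_ctx *c)
{
	c->regs[0] = c->regs[1];	/* pretend to snapshot state */
}

/* Separate entry points for host and guest. Identical for now;
 * the host one becomes patchable independently later. */
void sysreg_save_host(struct cpu_ctx *c)  { sysreg_save(c); }
void sysreg_save_guest(struct cpu_ctx *c) { sysreg_save(c); }
```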

Acked-by: Christoffer Dall <christof...@linaro.org>
Signed-off-by: Marc Zyngier <marc.z...@arm.com>
---
arch/arm64/kvm/hyp/hyp.h | 6 ++++--
arch/arm64/kvm/hyp/switch.c | 10 +++++-----
arch/arm64/kvm/hyp/sysreg-sr.c | 24 ++++++++++++++++++++++--
3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index 744c919..5dfa883 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -153,8 +153,10 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
void __timer_save_state(struct kvm_vcpu *vcpu);
void __timer_restore_state(struct kvm_vcpu *vcpu);

-void __sysreg_save_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
void __sysreg32_save_state(struct kvm_vcpu *vcpu);
void __sysreg32_restore_state(struct kvm_vcpu *vcpu);

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ca8f5a5..9071dee 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -98,7 +98,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
guest_ctxt = &vcpu->arch.ctxt;

- __sysreg_save_state(host_ctxt);
+ __sysreg_save_host_state(host_ctxt);
__debug_cond_save_host_state(vcpu);

__activate_traps(vcpu);
@@ -112,7 +112,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
* to Cortex-A57 erratum #852523.
*/
__sysreg32_restore_state(vcpu);
- __sysreg_restore_state(guest_ctxt);
+ __sysreg_restore_guest_state(guest_ctxt);
__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);

/* Jump in the fire! */
@@ -121,7 +121,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)

fp_enabled = __fpsimd_enabled();

- __sysreg_save_state(guest_ctxt);
+ __sysreg_save_guest_state(guest_ctxt);
__sysreg32_save_state(vcpu);
__timer_save_state(vcpu);
__vgic_save_state(vcpu);
@@ -129,7 +129,7 @@ static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);

- __sysreg_restore_state(host_ctxt);
+ __sysreg_restore_host_state(host_ctxt);

if (fp_enabled) {
__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
@@ -161,7 +161,7 @@ void __hyp_text __noreturn __hyp_panic(void)
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
__deactivate_traps(vcpu);
__deactivate_vm(vcpu);
- __sysreg_restore_state(host_ctxt);
+ __sysreg_restore_host_state(host_ctxt);
}

/* Call panic for real */
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 42563098..bd5b543 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -24,7 +24,7 @@
#include "hyp.h"

/* ctxt is already in the HYP VA space */
-void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
@@ -57,7 +57,17 @@ void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
ctxt->gp_regs.spsr[KVM_SPSR_EL1] = read_sysreg(spsr_el1);
}

-void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_save_state(ctxt);
+}
+
+void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_save_state(ctxt);
+}
+
+static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
@@ -90,6 +100,16 @@ void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
}

+void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_restore_state(ctxt);
+}
+
+void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
+{
+ __sysreg_restore_state(ctxt);
+}
+
void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
{
u64 *spsr, *sysreg;
--
2.1.4
