On 31/08/2016 01:16, Andrew Waterman wrote:
> On Tue, Aug 30, 2016 at 4:01 AM, Paolo Bonzini <bon...@gnu.org> wrote:
>> The separation of S and H levels in RISC-V is repeating how PPC first,
>> and ARM later introduced virtualization extensions. As mentioned in the
>> privileged interface specification, this model suits Type-1 hypervisors
>> very well, but it can be a bit messy for Type-2 hypervisors such as KVM.
>>
>> For Type-2 hypervisors, the same kernel (e.g., Linux) would have to be
>> able to run both in S and H mode. In one case it would use s* registers,
>> in the other it would use h* registers. It is possible to avoid this
>> through a small stub that sets up the h* CSRs so as to be able to use s*
>> registers for regular kernel operation. For example, all interrupts
>> are usually delegated. This works, but it makes the world switch code
>> very slow, as we've seen with KVM on ARMv7, because it needs to read
>> and write all the s* CSRs twice (once on entry, once on exit).
>
> One thing that confuses me is why swapping the supervisor state is
> so expensive. It takes roughly 30 instructions to swap out all the
> supervisor CSRs and the PLIC state, none of which should require a
> pipeline flush or egregious stalling. Swapping the integer/FP
> registers should dominate.

On ARM the culprit is mostly the GIC (see the ISCA 2016 article on VHE,
http://www.cs.columbia.edu/~cdall/pubs/isca2016-dall.pdf); saving its
state takes 75% of the time on the microbenchmark.

I pointed out supervisor state because you do need to swap it twice.
GPRs only have to be swapped once by the stub, because the host kernel's
world switch code can be written to withstand clobbering of most
GPRs. Swapping FPRs can be avoided completely.

It depends on the processor architecture whether swapping the supervisor
CSRs requires pipeline flushes or not. However, HRET and trapping back
to H mode probably do require pipeline flushes, and why do those twice
rather than once?...
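
To make the "twice" concrete, here is a toy Python model of a stub-based world switch. It is only a sketch under stated assumptions: the `Hart` class, the `S_CSRS` list (an illustrative subset of supervisor CSR names) and the function names are made up for this example and are not taken from KVM or from any specification. It counts s* CSR accesses for one guest entry plus one guest exit.

```python
# Toy model: counts supervisor CSR accesses in a stub-based world switch.
# All names are hypothetical; S_CSRS is an illustrative subset only.
S_CSRS = ["sstatus", "sie", "stvec", "sscratch",
          "sepc", "scause", "sbadaddr", "sptbr"]


class Hart:
    """A hart that only models s* CSRs and counts accesses to them."""

    def __init__(self, initial):
        self.csr = dict(initial)
        self.reads = 0
        self.writes = 0

    def csrr(self, name):
        self.reads += 1
        return self.csr[name]

    def csrw(self, name, value):
        self.writes += 1
        self.csr[name] = value


def swap_s_state(hart, save_area, load_area):
    """Save the live s* CSRs into save_area, then load them from load_area."""
    for name in S_CSRS:
        save_area[name] = hart.csrr(name)
        hart.csrw(name, load_area[name])


def run_guest_once(hart, host_ctx, guest_ctx):
    """One guest entry followed by one guest exit, as the stub would do it."""
    swap_s_state(hart, host_ctx, guest_ctx)   # entry: host state out, guest in
    hart.csr["sepc"] += 4                     # stand-in for the guest running
    swap_s_state(hart, guest_ctx, host_ctx)   # exit: guest state out, host in


def round_trip_counts():
    """Return (reads, writes, host_state_restored) for one round trip."""
    host = {name: i for i, name in enumerate(S_CSRS)}
    guest = {name: 100 + i for i, name in enumerate(S_CSRS)}
    hart = Hart(host)
    run_guest_once(hart, {}, guest)
    return hart.reads, hart.writes, hart.csr == host
```

In this model every CSR in the list is read twice and written twice per round trip. It says nothing about pipeline flushes, which depend on the implementation.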

H mode and the H mode stub also complicate paging. There are basically
two choices:

- use physical addresses in H mode (as in PPC), which limits or at least
  complicates the implementation of the hypervisor;

- keep H mode page tables synchronized with host kernel page tables;
  this may require defining two separate page table formats in
  the hypervisor specification, one for H mode itself (possibly based on
  the S mode format) and one for second-level address translation.

Without H mode, hypervisor page tables _are_ the host kernel page
tables, because the host kernel runs in S mode and uses sptbr as usual.
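
The bookkeeping difference between the second choice and the no-H-mode case can be sketched with a toy model. Everything here is hypothetical: the `PageTable`, `HostWithHStub` and `HostWithoutHMode` classes are invented for illustration, and a dict stands in for a real page table. The sketch only shows that a mirrored H mode table needs an extra update for every host mapping change, while a single shared table needs none.

```python
class PageTable:
    """A dict standing in for a page table: virtual page -> physical page."""

    def __init__(self):
        self.entries = {}

    def map(self, vpn, ppn):
        self.entries[vpn] = ppn

    def unmap(self, vpn):
        self.entries.pop(vpn, None)


class HostWithHStub:
    """Host kernel in S mode plus an H mode table that must mirror it."""

    def __init__(self):
        self.s_table = PageTable()
        self.h_table = PageTable()
        self.sync_updates = 0

    def map(self, vpn, ppn):
        self.s_table.map(vpn, ppn)
        self.h_table.map(vpn, ppn)   # extra work: keep the mirror in sync
        self.sync_updates += 1

    def unmap(self, vpn):
        self.s_table.unmap(vpn)
        self.h_table.unmap(vpn)      # extra work: keep the mirror in sync
        self.sync_updates += 1


class HostWithoutHMode:
    """Host kernel is the hypervisor: one table, nothing to mirror."""

    def __init__(self):
        self.s_table = PageTable()
        self.sync_updates = 0

    @property
    def h_table(self):
        return self.s_table          # the hypervisor table is the host table

    def map(self, vpn, ppn):
        self.s_table.map(vpn, ppn)

    def unmap(self, vpn):
        self.s_table.unmap(vpn)


def replay(host, ops):
    """Apply ("map", vpn, ppn) / ("unmap", vpn, None) ops; return sync count."""
    for op, vpn, ppn in ops:
        if op == "map":
            host.map(vpn, ppn)
        else:
            host.unmap(vpn)
    return host.sync_updates
```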

I would just like to avoid ending up with Linux in H mode five years
down the road (as in ARM VHE): that is good for performance, but it is
more complex and even worse for nested virtualization support.

RISC-V has the advantage of starting from a clean slate, and the
s390/x86 approach is IMHO clearly superior.

Thanks,
Paolo