On 2016-07-20 11:05, Marc Zyngier wrote:
> Hey Jan,
>
> On 20/07/16 07:09, Jan Kiszka wrote:
>> Hi all,
>>
>> I'd like to have some word by an expert on what we have to do in
>
> OK, I'll leave... ;-)
Don't make me nervous...! ;)
>
>> Jailhouse when changing the stage-2 page tables. This is what currently
>> happens for 32-bit ARM (maybe Tony can add a word on potential
>> deviations for ARM64):
>>
>> - all affected CPUs, except the one executing the changes, are put in
>> an busy-wait loop
>> - we modify the page tables, i.e. we directly write the desired state
>> on the executing CPU
>> - we resume the CPUs, and that triggers
>> - TLBs flushes (TLBIALL) on CPUs affected by the changes
>> - in some cases also dcache flushes
>
> It would be interesting to find out in which context you're doing that
> because the "busy-wait" loop seems a bit odd. What I'd expect would be to:
>
> - take the page-table lock
Consider that busy-wait as a "big-hypervisor lock", in replacement of
fine-grained locking to coordinate potential concurrent modifications.
Jailhouse only reconfigures those mappings at well defined points, and
that with that "big hammer".
> - clear the entry
> - DCache clean for that line (only on ARMv7)
> - TBLIALLIS (ARMv8 can do better by only invalidating the S2 VA)
> - Set the entry to the new value
And here we need not additional DCache flush?
> - release the lock
>
> There is no need to do a second TLBI, as invalid entries are guaranteed
> not to be cached in the TLBs. You do not need to to TBLIALL on all CPUs,
> and can rely on a TLBIALLIS that will broadcast it for you.
Ah, that would simplify things!
>
>>
>> I've two questions for a while now:
>>
>> - Are dcache flushes consistently required, or does the MMU snoop
>> caches when doing stage-2 translations? We could easily add a
>> DCCIMVAC for the modified entries, but I saw KVM flushing even
>> page-wise.
>
> Two things:
> - ARMv8 guarantees that the page table walker can snoop the caches,
> while we don't have that guarantee on ARMv7, so you need to clean the
> cache to the PoC (if memory serves well).
OK.
> - KVM cleans the cache to the PoC when mapping a page if the guest
> doesn't have its own caches enabled. Otherwise you will happily run with
> junk while the data sits pretty in the cache. (see my presentation at
> KF'15).
Caching/cache flushing vs. guest caches disabled is another topic which
caused a lot of headache to me in the past weeks (because it seems
fairly broken in Jailhouse, up to causing hypervisor crashes). As you
already brought it to that point:
We currently flush the DCache completely before starting a new guest
(with new code etc.), and also after resetting a guest CPU into
caches-disabled state. The flushing takes place in the middle of the
vmexit. Once in a while, I got nasty crashes due to inconsistent states
of the hypervisor (states that are shared between the CPUs).
My theory for this: After the flush, the CPU picks up some state
variable again into its local cache, then enters the guest which has
caches off. While we are in the guest, another CPU changes that shared
state, but the change is not propagated to the other CPU cache because
that is off. Now that first CPU leaves the guest, uses its stale cache
content and ruins our day. Following that theory, I moved the flush
right before the vmentry, ie. at a point where no more shared state is
touched. That made the crashes go away, apparently.
However, if that theory is true, I think we would have to flush caches
before *every* vmentry if the guest has caches off (we don't to that
right now). Now, what are the real requirements when dealing with guests
in non-cached mode?
>
>> - Under which conditions do we need to apply [1] ("break before make")
>> in our setup? Is it really required to first write an invalid
>> PTE/PMD/etc., then flush TLBs on all affected CPUs, and only then
>> write the new entries?
>
> BBM is always required if you're changing the output address of a page
> table entry. No ifs, no buts. For a very long time, we thought we could
> get away without it, but as CPUs get much more aggressive at partial TLB
> caching, we've seen all kind of weird issues, and Mark can rant about it
> for hours (I'm sure he will once he reads this email).
>
> You don't have to BBM if you're changing the permissions (or going from
> an invalid entry to a valid one, obviously).
>
>> Any insights welcome (including potential references to ARM ARM sections
>> so that we can document this in the code)!
>
> Now you're asking too much. I've only had one coffee, and the idea of
> trying to navigate the ARM ARM that early makes me feel slightly
> nauseous. Or maybe that the beer from last night. Or the weather.
>
> I'll blame the weather.
>
> M.
>
Hehe, fully understand. I will reference this discussion. ;)
Thanks!
Jan