Do you mean upgrading Jailhouse (and only that) from 0.9 to 0.11?
We refactored quite a bit between the two releases. Most notably, 0.10
brought per-CPU page tables.
You could first try to narrow down the cause a bit: do the exit
statistics look different between the two versions?
With Intel x86, you should
normally have no exit for an external or timer interrupt injection.
From
that perspective, even 20 µs is too high. Try to identify the path that
causes the latency.
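For pinning that down, a minimal sketch of TSC-based checkpoint sampling
could look like this (plain C; assumes an invariant TSC, and where the
checkpoints go is purely illustrative):

/*
 * Minimal sketch: timestamp checkpoints along the critical path and
 * print the deltas. Assumes an invariant TSC; the checkpoint
 * placement is illustrative.
 */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	/* lfence keeps earlier instructions from drifting past rdtsc */
	asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint64_t t0, t1, t2;

	t0 = rdtsc();
	/* ... e.g., trigger the interrupt / wait for injection ... */
	t1 = rdtsc();
	/* ... e.g., scheduler and application code under test ... */
	t2 = rdtsc();

	printf("injection: %llu cycles, handling: %llu cycles\n",
	       (unsigned long long)(t1 - t0),
	       (unsigned long long)(t2 - t1));
	return 0;
}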
You could also try to bisect Jailhouse between the two versions in
order to identify the causing commit. But that is only plan B, I would say.
Hi all,

I'm not sure if this is relevant in this case, but I have noticed that on
Intel x86-64, if hardware P-states (HWP) are enabled in the CPU (which they
are by default if the CPU supports them), this introduces frequency-scaling
coupling between cores, even when the cores are isolated in separate cells.
So you get this unexpected behavior where an inmate with one core runs
faster if the other cores are very busy, and a lot slower if the other
cores are all idle. This is because the CPU itself does the frequency
scaling automatically in hardware and knows nothing about what Jailhouse is
doing.

Passing intel_pstate=no_hwp on the Linux kernel command line disables
hardware P-states and, as far as I can tell, gets rid of this coupling. HWP
appears to be a relatively new feature (last couple of years) in Intel CPUs.
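If you want to verify whether HWP is active on a given box, you can read
the IA32_PM_ENABLE MSR (0x770, bit 0, per the Intel SDM) through the msr
driver. A rough sketch, assuming "modprobe msr" has been done and we run
as root:

/*
 * Sketch: check whether HWP is enabled by reading IA32_PM_ENABLE
 * (MSR 0x770, bit 0) via /dev/cpu/0/msr. On CPUs without HWP
 * support the read simply fails.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0 || pread(fd, &val, sizeof(val), 0x770) != sizeof(val)) {
		perror("reading IA32_PM_ENABLE failed");
		return 1;
	}
	printf("HWP %s\n", (val & 1) ? "enabled" : "disabled");
	close(fd);
	return 0;
}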
-Michael
On 25.10.19 15:52, Henning Schild wrote:
> Well, you only have so many shared resources, and if it is not
> additional exits/interrupts, then it is contention on shared resources.
>
> We are probably talking about caches, TLBs and buses.
>
> You should be able to use, e.g., "perf" on Linux to read out hardware
> performance counters. There you might want to look for TLB and
> cache misses.
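
To make that concrete: the quick route is something like
"perf stat -e cache-misses,dTLB-load-misses <cmd>". Programmatically, the
same can be done around a specific code section with the perf_event_open()
syscall; a rough sketch (event choice and placement are illustrative):

/*
 * Sketch: count cache misses around a code section via
 * perf_event_open() - essentially what "perf stat -e cache-misses"
 * does under the hood.
 */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CACHE_MISSES;
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	/* measure this thread, on any CPU */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* ... run the workload under test here ... */

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("cache misses: %llu\n", (unsigned long long)count);
	close(fd);
	return 0;
}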
>
> But the bisecting might be the better idea. Jan already mentioned the
> "features" that could be responsible. With a bit of educated guessing
> you will get away with just a few tries.
BTW, does your RTOS happen to use any of the inmate bootstrap code
to start in Jailhouse? That also changed.
Jan
>
> Henning
>
> On Fri, 25 Oct 2019 00:04:36 -0700,
> Chung-Fan Yang <soni...@gmail.com> wrote:
Interesting findings already, but I'm afraid we will need to dig deeper:
Can you describe everything that is part of your measured latency path?
Do you just run code in guest mode or do you also trigger VM exits, e.g. to
issue ivshmem interrupts to a remote side?
Maybe you can sample some latencies along the critical path so that we have a better picture about
where we lose time, overall or rather on specific actions.
>
> Maybe you can sample some latencies along the critical path so that
> we have a better picture about where we lose time, overall or rather
> on specific actions.
>
>
> Basically, it is an overall slowdown.
> But code in the scheduler and the application slows down more than in
> other places.
>
> BTW, I ran the test again with a partially working setup of <kernel
> 4.19/Jailhouse v0.11/old ivshmem2>.
> Currently, I cannot get my application running, for reasons I have not
> figured out yet, but I am already observing some slowdown.
> Pinging the RTOS via ivshmem-net, the RTT shows about 2x the latency:
> * <kernel 4.19/Jailhouse v0.11/old ivshmem2>: ~0.060ms
> * <kernel 4.19/Jailhouse v0.11/new ivshmem2>: ~0.130ms
>
Sounds like we have some caching-related problem. You could enable
access to the perf MSRs (a small patch to the MSR bitmap in vmx.c) and
check whether you see excessive cache misses in the counters.
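The MSR bitmap layout itself is architectural (Intel SDM): one 4K page with
a 1K read bitmap for MSRs 0x0-0x1fff, a 1K read bitmap for
0xc0000000-0xc0001fff, and the two matching write bitmaps behind them. A
sketch of the kind of change, with illustrative helper names; how the
bitmap is actually named and wired up in vmx.c may differ:

/*
 * Sketch only: clear the read/write intercept bits for a low MSR
 * (0x0..0x1fff) in a VMX MSR bitmap page. Offsets follow the
 * architectural layout: read bitmap for low MSRs at 0, the matching
 * write bitmap at 2K.
 */
#include <stdint.h>

#define MSR_IA32_PMC0		0x0c1
#define MSR_IA32_PERFEVTSEL0	0x186

static void vmx_passthrough_low_msr(uint8_t *msr_bitmap, unsigned int msr)
{
	/* read bitmap for MSRs 0x0..0x1fff starts at offset 0 */
	msr_bitmap[msr / 8] &= ~(1 << (msr % 8));
	/* write bitmap for the same range starts at offset 2K */
	msr_bitmap[2048 + msr / 8] &= ~(1 << (msr % 8));
}

/* e.g., expose the first general-purpose counter and its control MSR */
static void expose_perf_msrs(uint8_t *msr_bitmap)
{
	vmx_passthrough_low_msr(msr_bitmap, MSR_IA32_PMC0);
	vmx_passthrough_low_msr(msr_bitmap, MSR_IA32_PERFEVTSEL0);
}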