
[PATCH 0/5] x86: KVM vdso and clock improvements


Andy Lutomirski

Dec 9, 2015, 6:12:28 PM
to x...@kernel.org, Marcelo Tosatti, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
NB: patch 1 doesn't really belong here, but it makes this a lot
easier for me to test. Patch 1, if it's okay at all, should go
through the kvm tree. The rest should probably go through
tip:x86/vdso once they're reviewed.

I'll do a followup to enable vdso pvclock on 32-bit guests.
I'm not currently set up to test it. (The KVM people could also
do it very easily on top of these patches.)

Andy Lutomirski (5):
x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86/vdso: Remove pvclock fixmap machinery
x86/vdso: Enable vdso pvclock access on all vdso variants

arch/x86/entry/vdso/vclock_gettime.c | 151 ++++++++++++++++------------------
arch/x86/entry/vdso/vdso-layout.lds.S | 3 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 14 ++++
arch/x86/include/asm/fixmap.h | 5 --
arch/x86/include/asm/pvclock.h | 14 ++--
arch/x86/include/asm/vdso.h | 1 +
arch/x86/kernel/kvmclock.c | 11 ++-
arch/x86/kernel/pvclock.c | 24 ------
arch/x86/kvm/x86.c | 75 +----------------
10 files changed, 110 insertions(+), 191 deletions(-)

--
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Paolo Bonzini

Dec 10, 2015, 4:10:10 AM
to Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski


On 10/12/2015 00:12, Andy Lutomirski wrote:
> From: Andy Lutomirski <lu...@amacapital.net>
>
> The pvclock vdso code was too abstracted to understand easily and
> excessively paranoid. Simplify it for a huge speedup.
>
> This opens the door for additional simplifications, as the vdso no
> longer accesses the pvti for any vcpu other than vcpu 0.
>
> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
> With this change, it takes 29ns, which is almost as fast as the pure TSC
> implementation.
>
> Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
> ---
> arch/x86/entry/vdso/vclock_gettime.c | 81 ++++++++++++++++++++----------------
> 1 file changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index ca94fa649251..c325ba1bdddf 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>
> static notrace cycle_t vread_pvclock(int *mode)
> {
> - const struct pvclock_vsyscall_time_info *pvti;
> + const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
> cycle_t ret;
> - u64 last;
> - u32 version;
> - u8 flags;
> - unsigned cpu, cpu1;
> -
> + u64 tsc, pvti_tsc;
> + u64 last, delta, pvti_system_time;
> + u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
>
> /*
> - * Note: hypervisor must guarantee that:
> - * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> - * 2. that per-CPU pvclock time info is updated if the
> - * underlying CPU changes.
> - * 3. that version is increased whenever underlying CPU
> - * changes.
> + * Note: The kernel and hypervisor must guarantee that cpu ID
> + * number maps 1:1 to per-CPU pvclock time info.
> + *
> + * Because the hypervisor is entirely unaware of guest userspace
> + * preemption, it cannot guarantee that per-CPU pvclock time
> + * info is updated if the underlying CPU changes or that that
> + * version is increased whenever underlying CPU changes.
> *
> + * On KVM, we are guaranteed that pvti updates for any vCPU are
> + * atomic as seen by *all* vCPUs. This is an even stronger
> + * guarantee than we get with a normal seqlock.
> + *
> + * On Xen, we don't appear to have that guarantee, but Xen still
> + * supplies a valid seqlock using the version field.
> +
> + * We only do pvclock vdso timing at all if
> + * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> + * mean that all vCPUs have matching pvti and that the TSC is
> + * synced, so we can just look at vCPU 0's pvti.
> */
> - do {
> - cpu = __getcpu() & VGETCPU_CPU_MASK;
> - /* TODO: We can put vcpu id into higher bits of pvti.version.
> - * This will save a couple of cycles by getting rid of
> - * __getcpu() calls (Gleb).
> - */
> -
> - pvti = get_pvti(cpu);
> -
> - version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
> -
> - /*
> - * Test we're still on the cpu as well as the version.
> - * We could have been migrated just after the first
> - * vgetcpu but before fetching the version, so we
> - * wouldn't notice a version change.
> - */
> - cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> - } while (unlikely(cpu != cpu1 ||
> - (pvti->pvti.version & 1) ||
> - pvti->pvti.version != version));
> -
> - if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> +
> + if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> *mode = VCLOCK_NONE;
> + return 0;
> + }
> +
> + do {
> + version = pvti->version;
> +
> + /* This is also a read barrier, so we'll read version first. */
> + tsc = rdtsc_ordered();
> +
> + pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
> + pvti_tsc_shift = pvti->tsc_shift;
> + pvti_system_time = pvti->system_time;
> + pvti_tsc = pvti->tsc_timestamp;
> +
> + /* Make sure that the version double-check is last. */
> + smp_rmb();
> + } while (unlikely((version & 1) || version != pvti->version));
> +
> + delta = tsc - pvti_tsc;
> + ret = pvti_system_time +
> + pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
> + pvti_tsc_shift);
>
> /* refer to tsc.c read_tsc() comment for rationale */
> last = gtod->cycle_last;
>

Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
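
For readers following the patch above: the new retry loop in vread_pvclock() is a seqlock-style read keyed on the version field (odd means an update is in flight). A minimal userspace sketch of that pattern follows. It assumes C11 atomics in place of rdtsc_ordered() and smp_rmb(), and the struct and function names (demo_pvti, demo_pvti_write, demo_pvti_read) are hypothetical stand-ins, not kernel or hypervisor APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the pvti fields the reader snapshots. */
struct demo_pvti {
	_Atomic uint32_t version;	/* odd while an update is in progress */
	_Atomic uint64_t tsc_timestamp;
	_Atomic uint64_t system_time;
};

struct demo_snapshot {
	uint64_t tsc_timestamp;
	uint64_t system_time;
};

static void demo_pvti_init(struct demo_pvti *p)
{
	atomic_init(&p->version, 0);
	atomic_init(&p->tsc_timestamp, 0);
	atomic_init(&p->system_time, 0);
}

/* Writer (the hypervisor's role): version goes odd, data changes, version goes even. */
static void demo_pvti_write(struct demo_pvti *p, uint64_t tsc, uint64_t ns)
{
	uint32_t v = atomic_load_explicit(&p->version, memory_order_relaxed);

	atomic_store_explicit(&p->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->tsc_timestamp, tsc, memory_order_relaxed);
	atomic_store_explicit(&p->system_time, ns, memory_order_relaxed);
	atomic_store_explicit(&p->version, v + 2, memory_order_release);
}

/* Reader: retry until the same even version is seen before and after the copy. */
static struct demo_snapshot demo_pvti_read(struct demo_pvti *p)
{
	struct demo_snapshot s;
	uint32_t version;

	do {
		version = atomic_load_explicit(&p->version, memory_order_acquire);
		s.tsc_timestamp = atomic_load_explicit(&p->tsc_timestamp,
						       memory_order_relaxed);
		s.system_time = atomic_load_explicit(&p->system_time,
						     memory_order_relaxed);
		/* Keep the version re-check ordered after the data reads. */
		atomic_thread_fence(memory_order_acquire);
	} while ((version & 1) ||
		 version != atomic_load_explicit(&p->version,
						 memory_order_relaxed));

	return s;
}
```

The reader never writes shared state, which is why this works from a read-only vdso mapping; the cost of a concurrent update is only an extra trip around the loop.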

Paolo Bonzini

Dec 10, 2015, 4:10:26 AM
to Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf


On 10/12/2015 00:12, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <lu...@kernel.org>
> ---
> arch/x86/entry/vdso/vclock_gettime.c | 1 -
> arch/x86/entry/vdso/vma.c | 1 +
> arch/x86/include/asm/fixmap.h | 5 -----
> arch/x86/include/asm/pvclock.h | 5 -----
> arch/x86/kernel/kvmclock.c | 6 ------
> arch/x86/kernel/pvclock.c | 24 ------------------------
> 6 files changed, 1 insertion(+), 41 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index 5dd363d54348..59a98c25bde7 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -45,7 +45,6 @@ extern u8 pvclock_page
>
> #include <linux/kernel.h>
> #include <asm/vsyscall.h>
> -#include <asm/fixmap.h>
> #include <asm/pvclock.h>
>
> notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index aa828191c654..b8f69e264ac4 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -12,6 +12,7 @@
> #include <linux/random.h>
> #include <linux/elf.h>
> #include <linux/cpu.h>
> +#include <asm/pvclock.h>
> #include <asm/vgtod.h>
> #include <asm/proto.h>
> #include <asm/vdso.h>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index f80d70009ff8..6d7d0e52ed5a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -19,7 +19,6 @@
> #include <asm/acpi.h>
> #include <asm/apicdef.h>
> #include <asm/page.h>
> -#include <asm/pvclock.h>
> #ifdef CONFIG_X86_32
> #include <linux/threads.h>
> #include <asm/kmap_types.h>
> @@ -72,10 +71,6 @@ enum fixed_addresses {
> #ifdef CONFIG_X86_VSYSCALL_EMULATION
> VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
> #endif
> -#ifdef CONFIG_PARAVIRT_CLOCK
> - PVCLOCK_FIXMAP_BEGIN,
> - PVCLOCK_FIXMAP_END = PVCLOCK_FIXMAP_BEGIN+PVCLOCK_VSYSCALL_NR_PAGES-1,
> -#endif
> #endif
> FIX_DBGP_BASE,
> FIX_EARLYCON_MEM_BASE,
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 3864398c7cb2..66df22b2e0c9 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -100,10 +100,5 @@ struct pvclock_vsyscall_time_info {
> } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>
> #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
> -#define PVCLOCK_VSYSCALL_NR_PAGES (((NR_CPUS-1)/(PAGE_SIZE/PVTI_SIZE))+1)
> -
> -int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
> - int size);
> -struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu);
>
> #endif /* _ASM_X86_PVCLOCK_H */
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index ec1b06dc82d2..72cef58693c7 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -310,7 +310,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
> {
> #ifdef CONFIG_X86_64
> int cpu;
> - int ret;
> u8 flags;
> struct pvclock_vcpu_time_info *vcpu_time;
> unsigned int size;
> @@ -330,11 +329,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
> return 1;
> }
>
> - if ((ret = pvclock_init_vsyscall(hv_clock, size))) {
> - put_cpu();
> - return ret;
> - }
> -
> put_cpu();
>
> kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 2f355d229a58..99bfc025111d 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -140,27 +140,3 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>
> set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
> }
> -
> -#ifdef CONFIG_X86_64
> -/*
> - * Initialize the generic pvclock vsyscall state. This will allocate
> - * a/some page(s) for the per-vcpu pvclock information, set up a
> - * fixmap mapping for the page(s)
> - */
> -
> -int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
> - int size)
> -{
> - int idx;
> -
> - WARN_ON (size != PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE);
> -
> - for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
> - __set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
> - __pa(i) + (idx*PAGE_SIZE),
> - PAGE_KERNEL_VVAR);
> - }
> -
> - return 0;
> -}
> -#endif
>

Acked-by: Paolo Bonzini <pbon...@redhat.com>
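
To make the removed sizing logic concrete: PVCLOCK_VSYSCALL_NR_PAGES reserved enough fixmap pages to hold one pvti entry per possible CPU, and the old get_pvti() split a CPU number into a page index and a slot within that page. The sketch below redoes that arithmetic in userspace. It assumes 4 KiB pages and a 64-byte (cache-line aligned) entry; both constants and all demo_ names are illustrative assumptions rather than values taken from the patch.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE		4096u	/* assumed page size */
#define DEMO_PVTI_SIZE		64u	/* assumed cache-line aligned entry size */
#define DEMO_PVTI_PER_PAGE	(DEMO_PAGE_SIZE / DEMO_PVTI_SIZE)

/* Same shape as ((NR_CPUS-1)/(PAGE_SIZE/PVTI_SIZE))+1 in the removed macro. */
static unsigned demo_nr_pages(unsigned nr_cpus)
{
	assert(nr_cpus > 0);
	return (nr_cpus - 1) / DEMO_PVTI_PER_PAGE + 1;
}

struct demo_slot {
	unsigned page;		/* which fixmap page held the entry */
	unsigned offset;	/* index of the entry within that page */
};

/* Same split the old get_pvti(cpu) performed with '/' and '%'. */
static struct demo_slot demo_locate(unsigned cpu)
{
	struct demo_slot s = {
		.page = cpu / DEMO_PVTI_PER_PAGE,
		.offset = cpu % DEMO_PVTI_PER_PAGE,
	};

	return s;
}
```

Under these assumptions a build with 8192 possible CPUs would have reserved 128 fixmap pages, whereas reading only vCPU 0's entry needs a single page.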

Paolo Bonzini

Dec 10, 2015, 4:11:11 AM
to Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf


On 10/12/2015 00:12, Andy Lutomirski wrote:
> Now that pvclock doesn't require access to the fixmap, all vdso
> variants can use it.
>
> The kernel side isn't wired up for 32-bit kernels yet, but this
> covers 32-bit and x32 userspace on 64-bit kernels.
>
> Signed-off-by: Andy Lutomirski <lu...@kernel.org>
> ---
> arch/x86/entry/vdso/vclock_gettime.c | 91 ++++++++++++++++--------------------
> 1 file changed, 40 insertions(+), 51 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index 59a98c25bde7..8602f06c759f 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -17,8 +17,10 @@
> #include <asm/vvar.h>
> #include <asm/unistd.h>
> #include <asm/msr.h>
> +#include <asm/pvclock.h>
> #include <linux/math64.h>
> #include <linux/time.h>
> +#include <linux/kernel.h>
>
> #define gtod (&VVAR(vsyscall_gtod_data))
>
> @@ -43,10 +45,6 @@ extern u8 pvclock_page
>
> #ifndef BUILD_VDSO32
>
> -#include <linux/kernel.h>
> -#include <asm/vsyscall.h>
> -#include <asm/pvclock.h>
> -
> notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> {
> long ret;
> @@ -64,8 +62,42 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
> return ret;
> }
>
> -#ifdef CONFIG_PARAVIRT_CLOCK
>
> +#else
> +
> +notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> +{
> + long ret;
> +
> + asm(
> + "mov %%ebx, %%edx \n"
> + "mov %2, %%ebx \n"
> + "call __kernel_vsyscall \n"
> + "mov %%edx, %%ebx \n"
> + : "=a" (ret)
> + : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
> + : "memory", "edx");
> + return ret;
> +}
> +
> +notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
> +{
> + long ret;
> +
> + asm(
> + "mov %%ebx, %%edx \n"
> + "mov %2, %%ebx \n"
> + "call __kernel_vsyscall \n"
> + "mov %%edx, %%ebx \n"
> + : "=a" (ret)
> + : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
> + : "memory", "edx");
> + return ret;
> +}
> +
> +#endif
> +
> +#ifdef CONFIG_PARAVIRT_CLOCK
> static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
> {
> return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
> @@ -109,9 +141,9 @@ static notrace cycle_t vread_pvclock(int *mode)
> do {
> version = pvti->version;
>
> - /* This is also a read barrier, so we'll read version first. */
> - tsc = rdtsc_ordered();
> + smp_rmb();
>
> + tsc = rdtsc_ordered();
> pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
> pvti_tsc_shift = pvti->tsc_shift;
> pvti_system_time = pvti->system_time;
> @@ -126,7 +158,7 @@ static notrace cycle_t vread_pvclock(int *mode)
> pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
> pvti_tsc_shift);
>
> - /* refer to tsc.c read_tsc() comment for rationale */
> + /* refer to vread_tsc() comment for rationale */
> last = gtod->cycle_last;
>
> if (likely(ret >= last))
> @@ -136,49 +168,6 @@ static notrace cycle_t vread_pvclock(int *mode)
> }
> #endif
>
> -#else
> -
> -notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
> -{
> - long ret;
> -
> - asm(
> - "mov %%ebx, %%edx \n"
> - "mov %2, %%ebx \n"
> - "call __kernel_vsyscall \n"
> - "mov %%edx, %%ebx \n"
> - : "=a" (ret)
> - : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
> - : "memory", "edx");
> - return ret;
> -}
> -
> -notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
> -{
> - long ret;
> -
> - asm(
> - "mov %%ebx, %%edx \n"
> - "mov %2, %%ebx \n"
> - "call __kernel_vsyscall \n"
> - "mov %%edx, %%ebx \n"
> - : "=a" (ret)
> - : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
> - : "memory", "edx");
> - return ret;
> -}
> -
> -#ifdef CONFIG_PARAVIRT_CLOCK
> -
> -static notrace cycle_t vread_pvclock(int *mode)
> -{
> - *mode = VCLOCK_NONE;
> - return 0;
> -}
> -#endif
> -
> -#endif
> -
> notrace static cycle_t vread_tsc(void)
> {
> cycle_t ret = (cycle_t)rdtsc_ordered();
>

Acked-by: Paolo Bonzini <pbon...@redhat.com>
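
The `last = gtod->cycle_last` check at the end of vread_pvclock() keeps the returned value from going backwards relative to the last timekeeping update. A small sketch of why that matters follows. It assumes the fall-through path returns `last` (the hunk above shows only the `ret >= last` branch), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Never report a reading older than the last timekeeping update. */
static uint64_t demo_clamp(uint64_t now, uint64_t cycle_last)
{
	return now >= cycle_last ? now : cycle_last;
}

/*
 * Callers subtract cycle_last from the reading. Without the clamp, a
 * reading slightly behind cycle_last would wrap to a huge unsigned delta.
 */
static uint64_t demo_elapsed(uint64_t now, uint64_t cycle_last)
{
	uint64_t clamped = demo_clamp(now, cycle_last);

	assert(clamped >= cycle_last);
	return clamped - cycle_last;
}
```

For example, a reading of 80 against a cycle_last of 90 yields an elapsed value of 0 rather than a value near 2^64.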

Paolo Bonzini

Dec 10, 2015, 4:12:00 AM
to Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf


On 10/12/2015 00:12, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <lu...@kernel.org>
> ---
> arch/x86/entry/vdso/vclock_gettime.c | 20 ++++++++------------
> arch/x86/entry/vdso/vdso-layout.lds.S | 3 ++-
> arch/x86/entry/vdso/vdso2c.c | 3 +++
> arch/x86/entry/vdso/vma.c | 13 +++++++++++++
> arch/x86/include/asm/pvclock.h | 9 +++++++++
> arch/x86/include/asm/vdso.h | 1 +
> arch/x86/kernel/kvmclock.c | 5 +++++
> 7 files changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index c325ba1bdddf..5dd363d54348 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -36,6 +36,11 @@ static notrace cycle_t vread_hpet(void)
> }
> #endif
>
> +#ifdef CONFIG_PARAVIRT_CLOCK
> +extern u8 pvclock_page
> + __attribute__((visibility("hidden")));
> +#endif
> +
> #ifndef BUILD_VDSO32
>
> #include <linux/kernel.h>
> @@ -62,23 +67,14 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
>
> #ifdef CONFIG_PARAVIRT_CLOCK
>
> -static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
> +static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
> {
> - const struct pvclock_vsyscall_time_info *pvti_base;
> - int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
> - int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
> -
> - BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
> -
> - pvti_base = (struct pvclock_vsyscall_time_info *)
> - __fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
> -
> - return &pvti_base[offset];
> + return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
> }
>
> static notrace cycle_t vread_pvclock(int *mode)
> {
> - const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
> + const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
> cycle_t ret;
> u64 tsc, pvti_tsc;
> u64 last, delta, pvti_system_time;
> diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
> index de2c921025f5..4158acc17df0 100644
> --- a/arch/x86/entry/vdso/vdso-layout.lds.S
> +++ b/arch/x86/entry/vdso/vdso-layout.lds.S
> @@ -25,7 +25,7 @@ SECTIONS
> * segment.
> */
>
> - vvar_start = . - 2 * PAGE_SIZE;
> + vvar_start = . - 3 * PAGE_SIZE;
> vvar_page = vvar_start;
>
> /* Place all vvars at the offsets in asm/vvar.h. */
> @@ -36,6 +36,7 @@ SECTIONS
> #undef EMIT_VVAR
>
> hpet_page = vvar_start + PAGE_SIZE;
> + pvclock_page = vvar_start + 2 * PAGE_SIZE;
>
> . = SIZEOF_HEADERS;
>
> diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
> index 785d9922b106..491020b2826d 100644
> --- a/arch/x86/entry/vdso/vdso2c.c
> +++ b/arch/x86/entry/vdso/vdso2c.c
> @@ -73,6 +73,7 @@ enum {
> sym_vvar_start,
> sym_vvar_page,
> sym_hpet_page,
> + sym_pvclock_page,
> sym_VDSO_FAKE_SECTION_TABLE_START,
> sym_VDSO_FAKE_SECTION_TABLE_END,
> };
> @@ -80,6 +81,7 @@ enum {
> const int special_pages[] = {
> sym_vvar_page,
> sym_hpet_page,
> + sym_pvclock_page,
> };
>
> struct vdso_sym {
> @@ -91,6 +93,7 @@ struct vdso_sym required_syms[] = {
> [sym_vvar_start] = {"vvar_start", true},
> [sym_vvar_page] = {"vvar_page", true},
> [sym_hpet_page] = {"hpet_page", true},
> + [sym_pvclock_page] = {"pvclock_page", true},
> [sym_VDSO_FAKE_SECTION_TABLE_START] = {
> "VDSO_FAKE_SECTION_TABLE_START", false
> },
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 64df47148160..aa828191c654 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -100,6 +100,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
> .name = "[vvar]",
> .pages = no_pages,
> };
> + struct pvclock_vsyscall_time_info *pvti;
>
> if (calculate_addr) {
> addr = vdso_addr(current->mm->start_stack,
> @@ -169,6 +170,18 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
> }
> #endif
>
> + pvti = pvclock_pvti_cpu0_va();
> + if (pvti && image->sym_pvclock_page) {
> + ret = remap_pfn_range(vma,
> + text_start + image->sym_pvclock_page,
> + __pa(pvti) >> PAGE_SHIFT,
> + PAGE_SIZE,
> + PAGE_READONLY);
> +
> + if (ret)
> + goto up_fail;
> + }
> +
> up_fail:
> if (ret)
> current->mm->context.vdso = NULL;
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 7a6bed5c08bc..3864398c7cb2 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -4,6 +4,15 @@
> #include <linux/clocksource.h>
> #include <asm/pvclock-abi.h>
>
> +#ifdef CONFIG_PARAVIRT_CLOCK
> +extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
> +#else
> +static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return NULL;
> +}
> +#endif
> +
> /* some helper functions for xen and kvm pv clock sources */
> cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
> u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
> diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
> index 756de9190aec..deabaf9759b6 100644
> --- a/arch/x86/include/asm/vdso.h
> +++ b/arch/x86/include/asm/vdso.h
> @@ -22,6 +22,7 @@ struct vdso_image {
>
> long sym_vvar_page;
> long sym_hpet_page;
> + long sym_pvclock_page;
> long sym_VDSO32_NOTE_MASK;
> long sym___kernel_sigreturn;
> long sym___kernel_rt_sigreturn;
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 2bd81e302427..ec1b06dc82d2 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -45,6 +45,11 @@ early_param("no-kvmclock", parse_no_kvmclock);
> static struct pvclock_vsyscall_time_info *hv_clock;
> static struct pvclock_wall_clock wall_clock;
>
> +struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return hv_clock;
> +}
> +
> /*
> * The wallclock is the time of day when we booted. Since then, some time may
> * have elapsed since the hypervisor wrote the data. So we try to account for
>

Acked-by: Paolo Bonzini <pbon...@redhat.com>
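
The linker-script change above places three special pages immediately below the vdso text, and map_vdso() then maps the pvclock page at `text_start + image->sym_pvclock_page`. The sketch below works through those offsets. It assumes 4 KiB pages and that `.` is 0 at the point where vvar_start is defined; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096L	/* assumed page size */

/* Offsets relative to the start of the vdso text, as vdso2c would record them. */
struct demo_vdso_layout {
	long sym_vvar_start;
	long sym_vvar_page;
	long sym_hpet_page;
	long sym_pvclock_page;
};

static struct demo_vdso_layout demo_layout(void)
{
	struct demo_vdso_layout l;

	l.sym_vvar_start = 0 - 3 * DEMO_PAGE_SIZE;	/* vvar_start = . - 3 * PAGE_SIZE */
	l.sym_vvar_page = l.sym_vvar_start;
	l.sym_hpet_page = l.sym_vvar_start + DEMO_PAGE_SIZE;
	l.sym_pvclock_page = l.sym_vvar_start + 2 * DEMO_PAGE_SIZE;
	return l;
}

/* Address at which a special page lands for a given vdso text address. */
static uintptr_t demo_page_addr(uintptr_t text_start, long sym_offset)
{
	assert(text_start % DEMO_PAGE_SIZE == 0);
	return (uintptr_t)((intptr_t)text_start + sym_offset);
}
```

With these assumptions the pvclock page is the one directly below the text, so a vdso whose text starts at 0x7000 would see it at 0x6000.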

Andy Lutomirski

Dec 10, 2015, 10:22:33 PM
to Andy Lutomirski, X86 ML, linux-...@vger.kernel.org, linu...@kvack.org, Andrew Morton
On Thu, Dec 10, 2015 at 7:20 PM, Andy Lutomirski <lu...@kernel.org> wrote:
> NB: patch 1 doesn't really belong here, but it makes this a lot

Ugh, please disregard the resend. I typoed my git send-email command slightly.

--Andy

Ingo Molnar

Dec 11, 2015, 2:53:02 AM
to Paolo Bonzini, Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
Thanks. I've added your Reviewed-by to the 1/5 patch as well - to be able to put
the whole series into the tip:x86/entry tree. Let me know if you'd like it to be
done differently.

Thanks,

Ingo

Ingo Molnar

Dec 11, 2015, 3:06:35 AM
to Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf

* Andy Lutomirski <lu...@kernel.org> wrote:

> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index f80d70009ff8..6d7d0e52ed5a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -19,7 +19,6 @@
> #include <asm/acpi.h>
> #include <asm/apicdef.h>
> #include <asm/page.h>
> -#include <asm/pvclock.h>
> #ifdef CONFIG_X86_32
> #include <linux/threads.h>
> #include <asm/kmap_types.h>

So this change triggered a build failure on 64-bit allmodconfig - fixed via the
patch below. Your change unearthed a latent bug, a missing header inclusion.

Thanks,

Ingo

============>
From d51953b0873358d13b189996e6976dfa12a9b59d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mi...@kernel.org>
Date: Fri, 11 Dec 2015 09:01:30 +0100
Subject: [PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()

This build failure triggers on 64-bit allmodconfig:

arch/x86/platform/uv/uv_nmi.c:493:2: error: implicit declaration of function ‘clocksource_touch_watchdog’ [-Werror=implicit-function-declaration]

which is caused by recent changes exposing a missing clocksource.h include
in uv_nmi.c:

cc1e24fdb064 x86/vdso: Remove pvclock fixmap machinery

this file got clocksource.h indirectly via fixmap.h - that stealth route
of header inclusion is now gone.

Cc: Borislav Petkov <b...@alien8.de>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/platform/uv/uv_nmi.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 327f21c3bde1..8dd80050d705 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -28,6 +28,7 @@
#include <linux/nmi.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/clocksource.h>

#include <asm/apic.h>
#include <asm/current.h>

Paolo Bonzini

Dec 11, 2015, 3:42:17 AM
to Ingo Molnar, Andy Lutomirski, x...@kernel.org, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
The 1/5 patch is entirely in KVM and is not necessary for the rest of
the series to work. I would like it to be separate, because Marcelo has
not yet chimed in to say why it was necessary.

Can you just apply patches 2-5?

Paolo

Andy Lutomirski

Dec 11, 2015, 12:34:18 PM
to Ingo Molnar, Andy Lutomirski, X86 ML, Marcelo Tosatti, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf
LGTM.

--Andy

Andy Lutomirski

Dec 11, 2015, 1:03:51 PM
to Paolo Bonzini, Ingo Molnar, Andy Lutomirski, X86 ML, Marcelo Tosatti, Radim Krcmar, linux-...@vger.kernel.org, kvm list, Alexander Graf
Yes, please. I don't grok the clock update mechanism in the KVM host
well enough to be sure that patch 1 is actually correct. All I know
is that it works better on my laptop with the patch than without the
patch and that it seems at least conceptually correct.

In any event, patch 1 is a host patch and 2-5 are guest patches, and
they only interact to the extent that it's hard for me to test 2-5 on
the guest without patch 1 on the host because without patch 1 my
laptop's host kernel tends to disable stable kvmclock, thus disabling
the entire mechanism in the guest.

--Andy
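
The dependency described above, where a host that stops advertising a stable kvmclock disables the guest's vdso fast path, can be sketched as a simple gate on the flags byte. This is an illustration only: the DEMO_ constants mirror PVCLOCK_TSC_STABLE_BIT and the VCLOCK_ modes in spirit, their numeric values are assumptions, and demo_gettime() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TSC_STABLE_BIT (1u << 0)	/* stands in for PVCLOCK_TSC_STABLE_BIT */

enum demo_vclock_mode {
	DEMO_VCLOCK_NONE,	/* caller must fall back to a real syscall */
	DEMO_VCLOCK_PVCLOCK,	/* vdso may compute the time itself */
};

/* The vdso only trusts vCPU 0's pvti when the host marks the clock stable. */
static enum demo_vclock_mode demo_pick_mode(uint8_t pvti_flags)
{
	return (pvti_flags & DEMO_TSC_STABLE_BIT) ? DEMO_VCLOCK_PVCLOCK
						  : DEMO_VCLOCK_NONE;
}

/* Returns 1 and fills *out on the fast path, 0 when a syscall is needed. */
static int demo_gettime(uint8_t pvti_flags, uint64_t fast_ns, uint64_t *out)
{
	assert(out != 0);
	if (demo_pick_mode(pvti_flags) == DEMO_VCLOCK_NONE)
		return 0;
	*out = fast_ns;
	return 1;
}
```

A guest on a host that clears the stable bit therefore always takes the slow path, which is why patches 2-5 are hard to exercise without a host that keeps the bit set.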

tip-bot for Andy Lutomirski

Dec 14, 2015, 3:17:59 AM
to linux-ti...@vger.kernel.org, lu...@amacapital.net, tg...@linutronix.de, h...@zytor.com, b...@alien8.de, pbon...@redhat.com, pet...@infradead.org, mi...@kernel.org, brg...@gmail.com, dvla...@redhat.com, linux-...@vger.kernel.org, torv...@linux-foundation.org
Commit-ID: 6b078f5de7fc0851af4102493c7b5bb07e49c4cb
Gitweb: http://git.kernel.org/tip/6b078f5de7fc0851af4102493c7b5bb07e49c4cb
Author: Andy Lutomirski <lu...@amacapital.net>
AuthorDate: Thu, 10 Dec 2015 19:20:19 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 11 Dec 2015 08:56:02 +0100

x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

The pvclock vdso code was too abstracted to understand easily
and excessively paranoid. Simplify it for a huge speedup.

This opens the door for additional simplifications, as the vdso
no longer accesses the pvti for any vcpu other than vcpu 0.

Before, vclock_gettime using kvm-clock took about 45ns on my
machine. With this change, it takes 29ns, which is almost as
fast as the pure TSC implementation.

Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: linu...@kvack.org
Link: http://lkml.kernel.org/r/6b51dcc41f1b101f963945c5ec7093...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 81 ++++++++++++++++++++----------------
1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index ca94fa6..c325ba1 100644
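
For context on the arithmetic in this commit: once a consistent snapshot is taken, the time is system_time plus the TSC delta scaled by a 32-bit fixed-point multiplier and a shift. The sketch below is a userspace approximation of that conversion, not the kernel's pvclock_scale_delta() itself; it splits the 64-bit delta so no 128-bit multiply is needed, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Compute ((delta shifted by 'shift') * mul_frac) >> 32 without 128-bit math. */
static uint64_t demo_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t hi, lo;

	assert(shift > -64 && shift < 64);
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* delta * mul = hi * mul * 2^32 + lo * mul, so the >> 32 splits cleanly. */
	hi = delta >> 32;
	lo = delta & 0xffffffffu;
	return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* Nanoseconds = system_time + scaled (tsc - tsc_timestamp). */
static uint64_t demo_pvclock_ns(uint64_t tsc, uint64_t tsc_timestamp,
				uint64_t system_time, uint32_t mul_frac,
				int shift)
{
	return system_time +
	       demo_scale_delta(tsc - tsc_timestamp, mul_frac, shift);
}
```

As a worked case, a multiplier of 0x80000000 with shift 0 represents 0.5 ns per cycle (a 2 GHz TSC), so 2000 cycles past the timestamp add 1000 ns to system_time.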

tip-bot for Andy Lutomirski

Dec 14, 2015, 3:18:22 AM
to linux-ti...@vger.kernel.org, torv...@linux-foundation.org, linux-...@vger.kernel.org, h...@zytor.com, lu...@amacapital.net, lu...@kernel.org, b...@alien8.de, tg...@linutronix.de, pbon...@redhat.com, brg...@gmail.com, mi...@kernel.org, pet...@infradead.org, dvla...@redhat.com
Commit-ID: dac16fba6fc590fa7239676b35ed75dae4c4cd2b
Gitweb: http://git.kernel.org/tip/dac16fba6fc590fa7239676b35ed75dae4c4cd2b
Author: Andy Lutomirski <lu...@kernel.org>
AuthorDate: Thu, 10 Dec 2015 19:20:20 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 11 Dec 2015 08:56:03 +0100

x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap

Signed-off-by: Andy Lutomirski <lu...@kernel.org>
Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
Cc: Andy Lutomirski <lu...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: linu...@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f9...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 20 ++++++++------------
arch/x86/entry/vdso/vdso-layout.lds.S | 3 ++-
arch/x86/entry/vdso/vdso2c.c | 3 +++
arch/x86/entry/vdso/vma.c | 13 +++++++++++++
arch/x86/include/asm/pvclock.h | 9 +++++++++
arch/x86/include/asm/vdso.h | 1 +
arch/x86/kernel/kvmclock.c | 5 +++++
7 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index c325ba1..5dd363d 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -36,6 +36,11 @@ static notrace cycle_t vread_hpet(void)
}
#endif

+#ifdef CONFIG_PARAVIRT_CLOCK
+extern u8 pvclock_page
+ __attribute__((visibility("hidden")));
+#endif
+
#ifndef BUILD_VDSO32

#include <linux/kernel.h>
@@ -62,23 +67,14 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)

#ifdef CONFIG_PARAVIRT_CLOCK

-static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
+static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
{
- const struct pvclock_vsyscall_time_info *pvti_base;
- int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
- int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
-
- BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
-
- pvti_base = (struct pvclock_vsyscall_time_info *)
- __fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
-
- return &pvti_base[offset];
+ return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
}

static notrace cycle_t vread_pvclock(int *mode)
{
- const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
+ const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
cycle_t ret;
u64 tsc, pvti_tsc;
u64 last, delta, pvti_system_time;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index de2c921..4158acc 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
* segment.
*/

- vvar_start = . - 2 * PAGE_SIZE;
+ vvar_start = . - 3 * PAGE_SIZE;
vvar_page = vvar_start;

/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
#undef EMIT_VVAR

hpet_page = vvar_start + PAGE_SIZE;
+ pvclock_page = vvar_start + 2 * PAGE_SIZE;

. = SIZEOF_HEADERS;

diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 785d992..491020b 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -73,6 +73,7 @@ enum {
sym_vvar_start,
sym_vvar_page,
sym_hpet_page,
+ sym_pvclock_page,
sym_VDSO_FAKE_SECTION_TABLE_START,
sym_VDSO_FAKE_SECTION_TABLE_END,
};
@@ -80,6 +81,7 @@ enum {
const int special_pages[] = {
sym_vvar_page,
sym_hpet_page,
+ sym_pvclock_page,
};

struct vdso_sym {
@@ -91,6 +93,7 @@ struct vdso_sym required_syms[] = {
[sym_vvar_start] = {"vvar_start", true},
[sym_vvar_page] = {"vvar_page", true},
[sym_hpet_page] = {"hpet_page", true},
+ [sym_pvclock_page] = {"pvclock_page", true},
[sym_VDSO_FAKE_SECTION_TABLE_START] = {
"VDSO_FAKE_SECTION_TABLE_START", false
},
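diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 64df471..aa82819 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -100,6 +100,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
.name = "[vvar]",
.pages = no_pages,
};
+ struct pvclock_vsyscall_time_info *pvti;

if (calculate_addr) {
addr = vdso_addr(current->mm->start_stack,
@@ -169,6 +170,18 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
}
#endif

+ pvti = pvclock_pvti_cpu0_va();
+ if (pvti && image->sym_pvclock_page) {
+ ret = remap_pfn_range(vma,
+ text_start + image->sym_pvclock_page,
+ __pa(pvti) >> PAGE_SHIFT,
+ PAGE_SIZE,
+ PAGE_READONLY);
+
+ if (ret)
+ goto up_fail;
+ }
+
up_fail:
if (ret)
current->mm->context.vdso = NULL;
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 7a6bed5..3864398 100644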
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,6 +4,15 @@
#include <linux/clocksource.h>
#include <asm/pvclock-abi.h>

+#ifdef CONFIG_PARAVIRT_CLOCK
+extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
+#else
+static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return NULL;
+}
+#endif
+
/* some helper functions for xen and kvm pv clock sources */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 756de91..deabaf9 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -22,6 +22,7 @@ struct vdso_image {

long sym_vvar_page;
long sym_hpet_page;
+ long sym_pvclock_page;
long sym_VDSO32_NOTE_MASK;
long sym___kernel_sigreturn;
long sym___kernel_rt_sigreturn;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2bd81e3..ec1b06d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -45,6 +45,11 @@ early_param("no-kvmclock", parse_no_kvmclock);
static struct pvclock_vsyscall_time_info *hv_clock;
static struct pvclock_wall_clock wall_clock;

+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return hv_clock;
+}
+
/*
* The wallclock is the time of day when we booted. Since then, some time may
* have elapsed since the hypervisor wrote the data. So we try to account for

tip-bot for Andy Lutomirski

Dec 14, 2015, 3:18:45 AM12/14/15
to linux-ti...@vger.kernel.org, pet...@infradead.org, mi...@kernel.org, linux-...@vger.kernel.org, h...@zytor.com, dvla...@redhat.com, pbon...@redhat.com, torv...@linux-foundation.org, lu...@amacapital.net, lu...@kernel.org, tg...@linutronix.de, b...@alien8.de, brg...@gmail.com
Commit-ID: cc1e24fdb064d3126a494716f22ad4fc39306742
Gitweb: http://git.kernel.org/tip/cc1e24fdb064d3126a494716f22ad4fc39306742
Author: Andy Lutomirski <lu...@kernel.org>
AuthorDate: Thu, 10 Dec 2015 19:20:21 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 11 Dec 2015 08:56:03 +0100

x86/vdso: Remove pvclock fixmap machinery

Signed-off-by: Andy Lutomirski <lu...@kernel.org>
Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
Cc: Andy Lutomirski <lu...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: linu...@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a200...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 1 -
arch/x86/entry/vdso/vma.c | 1 +
arch/x86/include/asm/fixmap.h | 5 -----
arch/x86/include/asm/pvclock.h | 5 -----
arch/x86/kernel/kvmclock.c | 6 ------
arch/x86/kernel/pvclock.c | 24 ------------------------
6 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 5dd363d..59a98c2 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -45,7 +45,6 @@ extern u8 pvclock_page

#include <linux/kernel.h>
#include <asm/vsyscall.h>
-#include <asm/fixmap.h>
#include <asm/pvclock.h>

notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index aa82819..b8f69e2 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -12,6 +12,7 @@
#include <linux/random.h>
#include <linux/elf.h>
#include <linux/cpu.h>
+#include <asm/pvclock.h>
#include <asm/vgtod.h>
#include <asm/proto.h>
#include <asm/vdso.h>
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index f80d700..6d7d0e5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -19,7 +19,6 @@
#include <asm/acpi.h>
#include <asm/apicdef.h>
#include <asm/page.h>
-#include <asm/pvclock.h>
#ifdef CONFIG_X86_32
#include <linux/threads.h>
#include <asm/kmap_types.h>
@@ -72,10 +71,6 @@ enum fixed_addresses {
#ifdef CONFIG_X86_VSYSCALL_EMULATION
VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT,
#endif
-#ifdef CONFIG_PARAVIRT_CLOCK
- PVCLOCK_FIXMAP_BEGIN,
- PVCLOCK_FIXMAP_END = PVCLOCK_FIXMAP_BEGIN+PVCLOCK_VSYSCALL_NR_PAGES-1,
-#endif
#endif
FIX_DBGP_BASE,
FIX_EARLYCON_MEM_BASE,
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 3864398..66df22b 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -100,10 +100,5 @@ struct pvclock_vsyscall_time_info {
} __attribute__((__aligned__(SMP_CACHE_BYTES)));

#define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
-#define PVCLOCK_VSYSCALL_NR_PAGES (((NR_CPUS-1)/(PAGE_SIZE/PVTI_SIZE))+1)
-
-int __init pvclock_init_vsyscall(struct pvclock_vsyscall_time_info *i,
- int size);
-struct pvclock_vcpu_time_info *pvclock_get_vsyscall_time_info(int cpu);

#endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ec1b06d..72cef58 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -310,7 +310,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
{
#ifdef CONFIG_X86_64
int cpu;
- int ret;
u8 flags;
struct pvclock_vcpu_time_info *vcpu_time;
unsigned int size;
@@ -330,11 +329,6 @@ int __init kvm_setup_vsyscall_timeinfo(void)
return 1;
}

- if ((ret = pvclock_init_vsyscall(hv_clock, size))) {
- put_cpu();
- return ret;
- }
-
put_cpu();

kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 2f355d2..99bfc02 100644
- return 0;
-}
-#endif

tip-bot for Andy Lutomirski

Dec 14, 2015, 3:18:58 AM12/14/15
to linux-ti...@vger.kernel.org, pbon...@redhat.com, brg...@gmail.com, b...@alien8.de, tg...@linutronix.de, dvla...@redhat.com, torv...@linux-foundation.org, lu...@amacapital.net, pet...@infradead.org, linux-...@vger.kernel.org, h...@zytor.com, lu...@kernel.org, mi...@kernel.org
Commit-ID: 76480a6a55a03d0fe5dd6290ccde7f78678ab85e
Gitweb: http://git.kernel.org/tip/76480a6a55a03d0fe5dd6290ccde7f78678ab85e
Author: Andy Lutomirski <lu...@kernel.org>
AuthorDate: Thu, 10 Dec 2015 19:20:22 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 11 Dec 2015 08:56:03 +0100

x86/vdso: Enable vdso pvclock access on all vdso variants

Now that pvclock doesn't require access to the fixmap, all vdso
variants can use it.

The kernel side isn't wired up for 32-bit kernels yet, but this
covers 32-bit and x32 userspace on 64-bit kernels.

Signed-off-by: Andy Lutomirski <lu...@kernel.org>
Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
Cc: Andy Lutomirski <lu...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: linu...@kvack.org
Link: http://lkml.kernel.org/r/a7ef693b7a4c88dd2173dc1d4bf6bc...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 91 ++++++++++++++++--------------------
1 file changed, 40 insertions(+), 51 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 59a98c2..8602f06 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,8 +17,10 @@
#include <asm/vvar.h>
#include <asm/unistd.h>
#include <asm/msr.h>
+#include <asm/pvclock.h>
#include <linux/math64.h>
#include <linux/time.h>
+#include <linux/kernel.h>

#define gtod (&VVAR(vsyscall_gtod_data))

@@ -43,10 +45,6 @@ extern u8 pvclock_page

#ifndef BUILD_VDSO32

-#include <linux/kernel.h>
-#include <asm/vsyscall.h>
-#include <asm/pvclock.h>
-
notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
{
return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
@@ -109,9 +141,9 @@ static notrace cycle_t vread_pvclock(int *mode)
do {
version = pvti->version;

- /* This is also a read barrier, so we'll read version first. */
- tsc = rdtsc_ordered();
+ smp_rmb();

+ tsc = rdtsc_ordered();
pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
pvti_tsc_shift = pvti->tsc_shift;
pvti_system_time = pvti->system_time;
@@ -126,7 +158,7 @@ static notrace cycle_t vread_pvclock(int *mode)
pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
pvti_tsc_shift);

- /* refer to tsc.c read_tsc() comment for rationale */
+ /* refer to vread_tsc() comment for rationale */
last = gtod->cycle_last;

if (likely(ret >= last))
@@ -136,49 +168,6 @@ static notrace cycle_t vread_pvclock(int *mode)
}
#endif

-#else
-
-notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
-{
- long ret;
-
- asm(
- "mov %%ebx, %%edx \n"
- "mov %2, %%ebx \n"
- "call __kernel_vsyscall \n"
- "mov %%edx, %%ebx \n"
- : "=a" (ret)
- : "0" (__NR_clock_gettime), "g" (clock), "c" (ts)
- : "memory", "edx");
- return ret;
-}
-
-notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
-{
- long ret;
-
- asm(
- "mov %%ebx, %%edx \n"
- "mov %2, %%ebx \n"
- "call __kernel_vsyscall \n"
- "mov %%edx, %%ebx \n"
- : "=a" (ret)
- : "0" (__NR_gettimeofday), "g" (tv), "c" (tz)
- : "memory", "edx");
- return ret;
-}
-
-#ifdef CONFIG_PARAVIRT_CLOCK
-
-static notrace cycle_t vread_pvclock(int *mode)
-{
- *mode = VCLOCK_NONE;
- return 0;
-}
-#endif
-
-#endif
-
notrace static cycle_t vread_tsc(void)
{
cycle_t ret = (cycle_t)rdtsc_ordered();

Andy Lutomirski

Dec 20, 2015, 6:06:57 AM12/20/15
to x...@kernel.org, Marcelo Tosatti, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
x86: KVM vdso and clock improvements

NB: patch 1 doesn't really belong here, but it makes this a lot
easier for me to test. Patch 1, if it's okay at all, should go
through the kvm tree. The rest should probably go through
tip:x86/vdso once they're reviewed.

I'll do a followup to enable vdso pvclock on 32-bit guests.
I'm not currently set up to test it. (The KVM people could also
do it very easily on top of these patches.)

Changes from v1:
- Dropped patch 1
- Added Paolo's review and acks
- Fixed a build issue on some configs

Andy Lutomirski (4):
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86/vdso: Remove pvclock fixmap machinery
x86/vdso: Enable vdso pvclock access on all vdso variants

arch/x86/entry/vdso/vclock_gettime.c | 151 ++++++++++++++++------------------
arch/x86/entry/vdso/vdso-layout.lds.S | 3 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 14 ++++
arch/x86/include/asm/fixmap.h | 5 --
arch/x86/include/asm/pvclock.h | 14 ++--
arch/x86/include/asm/vdso.h | 1 +
arch/x86/kernel/kvmclock.c | 11 ++-
arch/x86/kernel/pvclock.c | 24 ------
9 files changed, 107 insertions(+), 119 deletions(-)

--
2.5.0

Marcelo Tosatti

Jan 4, 2016, 3:49:22 PM1/4/16
to Andy Lutomirski, x...@kernel.org, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
On Sun, Dec 20, 2015 at 03:05:41AM -0800, Andy Lutomirski wrote:
> From: Andy Lutomirski <lu...@amacapital.net>
>
> The pvclock vdso code was too abstracted to understand easily and
> excessively paranoid. Simplify it for a huge speedup.
>
> This opens the door for additional simplifications, as the vdso no
> longer accesses the pvti for any vcpu other than vcpu 0.
>
> Before, vclock_gettime using kvm-clock took about 45ns on my machine.
> With this change, it takes 29ns, which is almost as fast as the pure TSC
> implementation.
>
> Reviewed-by: Paolo Bonzini <pbon...@redhat.com>
> Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
> ---
> arch/x86/entry/vdso/vclock_gettime.c | 81 ++++++++++++++++++++----------------
> 1 file changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index ca94fa649251..c325ba1bdddf 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>
> static notrace cycle_t vread_pvclock(int *mode)
Andy,

What happens if PVCLOCK_TSC_STABLE_BIT is disabled here?

> +
> + delta = tsc - pvti_tsc;
> + ret = pvti_system_time +
> + pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
> + pvti_tsc_shift);
>
> /* refer to tsc.c read_tsc() comment for rationale */
> last = gtod->cycle_last;
> --
> 2.5.0
>
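The time computation in the quoted hunk can be sketched in plain C as below. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, and the scaling helper reproduces the arithmetic I understand pvclock_scale_delta() to perform (shift, then multiply by a 32.32 fixed-point fraction), written portably rather than with a 128-bit multiply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for pvclock_scale_delta(): apply the binary
 * shift, then multiply by a 32.32 fixed-point fraction and keep the
 * integer part of the product.
 */
static uint64_t sketch_scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(shift > -64 && shift < 64);

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/*
	 * (delta * mul_frac) >> 32, computed in two halves so the
	 * intermediate products fit in 64 bits.
	 */
	return (delta >> 32) * mul_frac +
	       (((delta & 0xffffffffu) * mul_frac) >> 32);
}

/*
 * Mirrors the quoted lines:
 *   delta = tsc - pvti_tsc;
 *   ret = pvti_system_time + pvclock_scale_delta(delta, mul, shift);
 */
static uint64_t sketch_pvclock_ns(uint64_t tsc, uint64_t pvti_tsc,
				  uint64_t pvti_system_time,
				  uint32_t mul_frac, int shift)
{
	uint64_t delta = tsc - pvti_tsc;

	return pvti_system_time + sketch_scale_delta(delta, mul_frac, shift);
}
```

With mul_frac = 0x80000000 (one half) and shift = 1, the two factors cancel, so a delta of 1000 cycles adds exactly 1000 ns to pvti_system_time.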
Andy Lutomirski

Jan 4, 2016, 5:33:45 PM1/4/16
to Marcelo Tosatti, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf
Do you mean what happens if it's disabled in the loop part after the
first check? If that's actually possible, I'll do a follow-up to bail
if that happens by moving the check into the loop.

--Andy

Marcelo Tosatti

Jan 4, 2016, 5:59:48 PM1/4/16
to Andy Lutomirski, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf
It is possible.

Andy Lutomirski

Jan 4, 2016, 6:14:40 PM1/4/16
to X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf, Andy Lutomirski
If the clock becomes unstable while we're reading it, we need to
bail. We can do this by simply moving the check into the seqcount
loop.

Reported-by: Marcelo Tosatti <mtos...@redhat.com>
Signed-off-by: Andy Lutomirski <lu...@kernel.org>
---

Marcelo, how's this?

arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 8602f06c759f..1a50e09c945b 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -126,23 +126,23 @@ static notrace cycle_t vread_pvclock(int *mode)
*
* On Xen, we don't appear to have that guarantee, but Xen still
* supplies a valid seqlock using the version field.
-
+ *
* We only do pvclock vdso timing at all if
* PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
* mean that all vCPUs have matching pvti and that the TSC is
* synced, so we can just look at vCPU 0's pvti.
*/

- if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
- *mode = VCLOCK_NONE;
- return 0;
- }
-
do {
version = pvti->version;

smp_rmb();

+ if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
+ *mode = VCLOCK_NONE;
+ return 0;
+ }
+
tsc = rdtsc_ordered();
pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
pvti_tsc_shift = pvti->tsc_shift;
--
2.4.3
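The effect of the patch above can be sketched as a userspace approximation of the reader loop. Everything named sketch_* or SKETCH_* is hypothetical, the struct is a cut-down stand-in for the real pvti layout, and the C11 acquire fences only approximate smp_rmb(); treat it as an illustration of where the STABLE check sits relative to the version reads, not as the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PVCLOCK_TSC_STABLE_BIT. */
#define SKETCH_TSC_STABLE_BIT 0x1u

/* Cut-down, hypothetical version of the per-vCPU time info. */
struct sketch_pvti {
	volatile uint32_t version;	/* odd while the writer is updating */
	volatile uint64_t tsc_timestamp;
	volatile uint64_t system_time;
	volatile uint8_t flags;
};

/*
 * Take a snapshot of the time parameters.  Returns false if the
 * clock is not marked stable, in which case the caller is expected
 * to fall back to a slower path (the patch sets VCLOCK_NONE).
 */
static bool sketch_read_pvti(const struct sketch_pvti *pvti,
			     uint64_t *tsc_timestamp, uint64_t *system_time)
{
	uint32_t version;

	assert(pvti && tsc_timestamp && system_time);

	do {
		version = pvti->version;

		/* Approximates smp_rmb(): read version before the payload. */
		atomic_thread_fence(memory_order_acquire);

		/*
		 * The check now sits inside the retry loop, so the flag
		 * is read under the same version as the other fields.
		 */
		if (!(pvti->flags & SKETCH_TSC_STABLE_BIT))
			return false;

		*tsc_timestamp = pvti->tsc_timestamp;
		*system_time = pvti->system_time;

		/* Make sure the version re-check comes last. */
		atomic_thread_fence(memory_order_acquire);
	} while ((version & 1) || version != pvti->version);

	return true;
}
```

If a writer changes the flags together with the other fields, a reader that observed the old flags either retries because the version moved, or returns a snapshot that was consistent with those flags.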

Marcelo Tosatti

Jan 7, 2016, 4:03:50 PM1/7/16
to Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, k...@vger.kernel.org, Alexander Graf
Check it before returning the value (once cleared, it can't be set back
to 1), similarly to what was in place before.


Andy Lutomirski

Jan 7, 2016, 4:14:09 PM1/7/16
to Marcelo Tosatti, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf
I don't understand what you mean.

In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
which is correct as long as STABLE_BIT can never change from 0 to 1.

In the -tip code, it's clearly wrong.

In the code in this patch, it should be correct regardless of how
STABLE_BIT changes as long as the seqcount works. Given that the
performance cost of doing that is zero, I'd rather keep it that way.
If we're really paranoid, we could move it after the rest of the pvti
reads and add a barrier, but is there really any host on which that
matters?

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

Paolo Bonzini

Jan 7, 2016, 4:48:09 PM1/7/16
to Andy Lutomirski, Marcelo Tosatti, Andy Lutomirski, X86 ML, Radim Krcmar, linux-...@vger.kernel.org, kvm list, Alexander Graf


On 07/01/2016 22:13, Andy Lutomirski wrote:
> I don't understand what you mean.
>
> In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
> which is correct as long as STABLE_BIT can never change from 0 to 1.
>
> In the -tip code, it's clearly wrong.
>
> In the code in this patch, it should be correct regardless of how
> STABLE_BIT changes as long as the seqcount works. Given that the
> performance cost of doing that is zero, I'd rather keep it that way.
> If we're really paranoid, we could move it after the rest of the pvti
> reads and add a barrier, but is there really any host on which that
> matters?

I agree that your patch is fine.

Reviewed-by: Paolo Bonzini <pbon...@redhat.com>

Paolo

Marcelo Tosatti

Jan 8, 2016, 2:45:56 PM1/8/16
to Andy Lutomirski, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf
Right, it's OK due to the version check, thanks.

Andy Lutomirski

Jan 12, 2016, 2:49:10 PM1/12/16
to Ingo Molnar, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf, Marcelo Tosatti
Hi Ingo-

Can you apply this before the tip:x86/asm pull request goes out? It
fixes a regression in tip:x86/asm.

--Andy

Ingo Molnar

Jan 13, 2016, 5:46:27 AM1/13/16
to Andy Lutomirski, Andy Lutomirski, X86 ML, Radim Krcmar, Paolo Bonzini, linux-...@vger.kernel.org, kvm list, Alexander Graf, Marcelo Tosatti, Linus Torvalds

* Andy Lutomirski <lu...@amacapital.net> wrote:

> Hi Ingo-
>
> Can you apply this before the tip:x86/asm pull request goes out? It
> fixes a regression in tip:x86/asm.

Ooops, saw this mail too late - I'll merge this up into x86/urgent right now and
send all pending fixes to Linus.

Thanks,

Ingo

tip-bot for Andy Lutomirski

Jan 14, 2016, 4:08:16 AM1/14/16
to linux-ti...@vger.kernel.org, linux-...@vger.kernel.org, h...@zytor.com, ag...@suse.de, pbon...@redhat.com, mtos...@redhat.com, mi...@kernel.org, pet...@infradead.org, b...@alien8.de, lu...@amacapital.net, rkr...@redhat.com, lu...@kernel.org, torv...@linux-foundation.org, tg...@linutronix.de, dvla...@redhat.com, brg...@gmail.com
Commit-ID: 78fd8c7288e0a4bba3ad1d69caf9396a6b69cb00
Gitweb: http://git.kernel.org/tip/78fd8c7288e0a4bba3ad1d69caf9396a6b69cb00
Author: Andy Lutomirski <lu...@kernel.org>
AuthorDate: Mon, 4 Jan 2016 15:14:28 -0800
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 13 Jan 2016 11:46:29 +0100

x86/vdso/pvclock: Protect STABLE check with the seqcount

If the clock becomes unstable while we're reading it, we need to
bail. We can do this by simply moving the check into the
seqcount loop.

Reported-by: Marcelo Tosatti <mtos...@redhat.com>
Signed-off-by: Andy Lutomirski <lu...@kernel.org>
Cc: Alexander Graf <ag...@suse.de>
Cc: Andy Lutomirski <lu...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Paolo Bonzini <pbon...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Radim Krcmar <rkr...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Link: http://lkml.kernel.org/r/755dcedb17269e1d7ce12a9a713dea...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 8602f06..1a50e09 100644