kvm: Improving undercommit,overcommit scenarios in PLE handler
 
Newsgroups: fa.linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Wed, 26 Sep 2012 13:26:30 UTC
Local: Wed, Sep 26 2012 9:26 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote:
> Wouldn't a clean solution be to promote a task's scheduler
> class to the spinner class when we PLE (or come from some special
> syscall
> for userspace spinlocks?)?

Userspace spinlocks are typically employed to avoid syscalls..

> That class would be higher priority than the
> fair class and would schedule in FIFO order, but it would only run its
> tasks for short periods before switching.

Since lock hold times aren't limited, esp. for things like userspace
'spin' locks, you've got a very good denial of service / opportunity for
abuse right there.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
Newsgroups: fa.linux.kernel
From: Andrew Jones <drjo...@redhat.com>
Date: Wed, 26 Sep 2012 13:40:37 UTC
Local: Wed, Sep 26 2012 9:40 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

On Wed, Sep 26, 2012 at 03:26:11PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote:
> > Wouldn't a clean solution be to promote a task's scheduler
> > class to the spinner class when we PLE (or come from some special
> > syscall
> > for userspace spinlocks?)?

> Userspace spinlocks are typically employed to avoid syscalls..

I'm guessing there could be a slow path - spin N times and then give
up and yield.

> > That class would be higher priority than the
> > fair class and would schedule in FIFO order, but it would only run its
> > tasks for short periods before switching.

> Since lock hold times aren't limited, esp. for things like userspace
> 'spin' locks, you've got a very good denial of service / opportunity for
> abuse right there.

Maybe add some throttling to avoid overuse/maliciousness?


 
Newsgroups: fa.linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Wed, 26 Sep 2012 13:45:35 UTC
Local: Wed, Sep 26 2012 9:45 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

On Wed, 2012-09-26 at 15:39 +0200, Andrew Jones wrote:
> On Wed, Sep 26, 2012 at 03:26:11PM +0200, Peter Zijlstra wrote:
> > On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote:
> > > Wouldn't a clean solution be to promote a task's scheduler
> > > class to the spinner class when we PLE (or come from some special
> > > syscall
> > > for userspace spinlocks?)?

> > Userspace spinlocks are typically employed to avoid syscalls..

> I'm guessing there could be a slow path - spin N times and then give
> up and yield.

Much better they should do a blocking futex call or so, once you do the
syscall you're in kernel space anyway and have paid the transition cost.

> > > That class would be higher priority than the
> > > fair class and would schedule in FIFO order, but it would only run its
> > > tasks for short periods before switching.

> > Since lock hold times aren't limited, esp. for things like userspace
> > 'spin' locks, you've got a very good denial of service / opportunity for
> > abuse right there.

> Maybe add some throttling to avoid overuse/maliciousness?

At which point you're pretty much back to where you started.

A much better approach is using things like priority inheritance, which
can be extended to cover the fair class just fine..

Also note that user-space spinning is inherently prone to live-locks
when combined with the static priority RT scheduling classes.

In general it's a very bad idea.
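A minimal sketch of the shape suggested above, assuming Linux and C11 atomics: spin a bounded number of times in userspace, then block in the kernel with a futex rather than yield. The names hybrid_lock and SPIN_LIMIT are hypothetical, and the three-state lock word is a common textbook layout, not something taken from this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = free, 1 = held, 2 = held and a waiter may be asleep. */
typedef struct { atomic_int state; } hybrid_lock;

#define SPIN_LIMIT 100 /* hypothetical tuning knob */

static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void hybrid_lock_acquire(hybrid_lock *l)
{
    /* Fast path: bounded spinning, no syscall. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak(&l->state, &expected, 1))
            return;
    }
    /* Slow path: mark the lock contended and sleep in the kernel.
     * The transition cost is paid once, and the waiter stops burning CPU. */
    while (atomic_exchange(&l->state, 2) != 0)
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
}

static void hybrid_lock_release(hybrid_lock *l)
{
    /* Only enter the kernel when a waiter may be asleep. */
    if (atomic_exchange(&l->state, 0) == 2)
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

The uncontended path never makes a syscall, and a waiter that gives up spinning sleeps instead of competing with the lock holder for CPU time. This does nothing about priority inversion; that is what the priority inheritance remark above addresses.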


 
Newsgroups: fa.linux.kernel
From: Gleb Natapov <g...@redhat.com>
Date: Thu, 27 Sep 2012 07:44:24 UTC
Local: Thurs, Sep 27 2012 3:44 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Can a user return notifier be used instead? Set a bit in
kvm->preempted_vcpus on return to userspace.

--
                        Gleb.


 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 08:37:20 UTC
Local: Thurs, Sep 27 2012 4:37 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/25/2012 03:40 PM, Raghavendra K T wrote:

This gives us a good case for tracking preemption on a per-vm basis.  As
long as we aren't preempted, we can keep the PLE window high, and also
return immediately from the handler without looking for candidates.
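A rough userspace model of the early return described above, assuming a per-VM counter of preempted vcpus. The struct and function names are hypothetical and are not KVM's; a real implementation would update the counter from preempt notifier callbacks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM state: how many vcpu threads are currently preempted. */
struct vm_model {
    atomic_int preempted_vcpus;
};

/* Would be called when a runnable vcpu thread is scheduled out. */
static void vm_note_preempted(struct vm_model *vm)
{
    atomic_fetch_add(&vm->preempted_vcpus, 1);
}

/* Would be called when that vcpu thread is scheduled back in. */
static void vm_note_resumed(struct vm_model *vm)
{
    int old = atomic_fetch_sub(&vm->preempted_vcpus, 1);
    assert(old > 0);
    (void)old;
}

/* PLE handler entry check: with no preempted vcpu there is nobody worth
 * boosting, so the handler can return without scanning for candidates. */
static bool ple_should_search(struct vm_model *vm)
{
    return atomic_load(&vm->preempted_vcpus) > 0;
}
```

In the undercommitted case the counter stays at zero, so the handler's cost collapses to a single atomic load.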

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 08:43:50 UTC
Local: Thurs, Sep 27 2012 4:43 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/25/2012 04:21 PM, Takuya Yoshikawa wrote:

> On Tue, 25 Sep 2012 10:12:49 +0200
> Avi Kivity <a...@redhat.com> wrote:

>> It will.  The tradeoff is between false-positive costs (undercommit) and
>> true positive costs (overcommit).  I think undercommit should perform
>> well no matter what.

>> If we utilize preempt notifiers to track overcommit dynamically, then we
>> can vary the spin time dynamically.  Keep it long initially, as we get
>> more preempted vcpus make it shorter.

> What will happen if we pin each vcpu thread to some core?
> I don't want to see so many vcpu threads moving around without
> being pinned at all.

If you do that you've removed a lot of flexibility from the scheduler,
so overcommit becomes even less likely to work well (a trivial example
is pinning two vcpus from the same vm to the same core -- it's so
obviously bad no one considers doing it).

> In that case, we don't want to make KVM do any work of searching
> a vcpu thread to yield to.

Why not?  If a vcpu thread on another core has been preempted, and is
the lock holder, and we can boost it, then we've fixed our problem.
Even if the spinning thread keeps spinning because it is the only task
eligible to run on its core.

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 08:50:53 UTC
Local: Thurs, Sep 27 2012 4:50 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/25/2012 04:43 PM, Jiannan Ouyang wrote:

> I've actually implemented this preempted_bitmap idea.

Interesting, please share the code if you can.

> However, I'm doing this to expose this information to the guest, so the
> guest is able to know if the lock holder is preempted or not before
> spinning. Right now, I'm running an experiment to show that this idea works.

> I'm wondering what do you guys think of the relationship between the
> pv_ticketlock approach and PLE handler approach. Are we going to adopt
> PLE instead of the pv ticketlock, and why?

Right now we're searching for the best solution.  The tradeoffs are more
or less:

PLE:
- works for unmodified / non-Linux guests
- works for all types of spins (e.g. smp_call_function*())
- utilizes an existing hardware interface (PAUSE instruction) so likely
more robust compared to a software interface

PV:
- has more information, so it can perform better

Given these tradeoffs, if we can get PLE to work for moderate amounts of
overcommit then I'll prefer it (even if it slightly underperforms PV).
If we are unable to make it work well, then we'll have to add PV.

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 08:59:40 UTC
Local: Thurs, Sep 27 2012 4:59 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 09:44 AM, Gleb Natapov wrote:

User return notifier is per-cpu, not per-task.  There is a new task_work
(<linux/task_work.h>) that does what you want.  With these
technicalities out of the way, I think it's the wrong idea.  If a vcpu
thread is in userspace, that doesn't mean it's preempted, there's no
point in boosting it if it's already running.

btw, we can have secondary effects.  A vcpu can be waiting for a lock in
the host kernel, or for a host page fault.  There's no point in boosting
anything for that.  Or a vcpu in userspace can be waiting for a lock
that is held by another thread, which has been preempted.  This is (like
I think Peter already said) a priority inheritance problem.  However
with fine-grained locking in userspace, we can make it go away.  The
guest kernel is unlikely to access one device simultaneously from two
threads (and if it does, we just need to improve the threading in the
device model).

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Gleb Natapov <g...@redhat.com>
Date: Thu, 27 Sep 2012 09:11:43 UTC
Local: Thurs, Sep 27 2012 5:11 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Ah, so you want to set a bit in kvm->preempted_vcpus if the task is _not_
TASK_RUNNING in sched_out (you wrote the opposite in your email)? If a task
is in userspace it is definitely not preempted.

> btw, we can have secondary effects.  A vcpu can be waiting for a lock in
> the host kernel, or for a host page fault.  There's no point in boosting
> anything for that.  Or a vcpu in userspace can be waiting for a lock
> that is held by another thread, which has been preempted.

Do you mean a userspace spinlock? Because otherwise a task that waits on
a kernel lock will sleep in the kernel.

>                                                            This is (like
> I think Peter already said) a priority inheritance problem.  However
> with fine-grained locking in userspace, we can make it go away.  The
> guest kernel is unlikely to access one device simultaneously from two
> threads (and if it does, we just need to improve the threading in the
> device model).

> --
> error compiling committee.c: too many arguments to function

--
                        Gleb.

 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 09:34:37 UTC
Local: Thurs, Sep 27 2012 5:34 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 11:11 AM, Gleb Natapov wrote:

>> User return notifier is per-cpu, not per-task.  There is a new task_work
>> (<linux/task_work.h>) that does what you want.  With these
>> technicalities out of the way, I think it's the wrong idea.  If a vcpu
>> thread is in userspace, that doesn't mean it's preempted, there's no
>> point in boosting it if it's already running.

> Ah, so you want to set bit in kvm->preempted_vcpus if task is _not_
> TASK_RUNNING in sched_out (you wrote opposite in your email)? If a task
> is in userspace it is definitely not preempted.

No, as I originally wrote.  If it's TASK_RUNNING when it saw sched_out,
then it is preempted (i.e. runnable), not sleeping on some waitqueue,
voluntarily (HLT) or involuntarily (page fault).
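To make the rule concrete, here is a small model of the sched_out classification, assuming at most 64 vcpus and a plain (non-atomic) bitmap. The enum, struct, and function names are hypothetical stand-ins; kernel code would use the task state and atomic bit operations instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the task state seen at sched_out time. */
enum task_state_model { MODEL_TASK_RUNNING, MODEL_TASK_SLEEPING };

struct vm_bitmap_model {
    uint64_t preempted_vcpus; /* bit n set: vcpu n was preempted while runnable */
};

/* sched_out: a task that is still runnable was preempted; a task that is not
 * runnable went to sleep (HLT, page fault, ...) and gains nothing from a boost. */
static void model_sched_out(struct vm_bitmap_model *vm, unsigned vcpu_id,
                            enum task_state_model state)
{
    assert(vcpu_id < 64);
    if (state == MODEL_TASK_RUNNING)
        vm->preempted_vcpus |= UINT64_C(1) << vcpu_id;
}

/* sched_in: the vcpu is on a CPU again, so it is no longer preempted. */
static void model_sched_in(struct vm_bitmap_model *vm, unsigned vcpu_id)
{
    assert(vcpu_id < 64);
    vm->preempted_vcpus &= ~(UINT64_C(1) << vcpu_id);
}

static bool model_is_preempted(const struct vm_bitmap_model *vm, unsigned vcpu_id)
{
    assert(vcpu_id < 64);
    return (vm->preempted_vcpus >> vcpu_id) & 1u;
}
```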

>> btw, we can have secondary effects.  A vcpu can be waiting for a lock in
>> the host kernel, or for a host page fault.  There's no point in boosting
>> anything for that.  Or a vcpu in userspace can be waiting for a lock
>> that is held by another thread, which has been preempted.
> Do you mean a userspace spinlock? Because otherwise a task that waits on
> a kernel lock will sleep in the kernel.

I meant a kernel mutex.

vcpu 0: take guest spinlock
vcpu 0: vmexit
vcpu 0: spin_lock(some_lock)
vcpu 1: take same guest spinlock
vcpu 1: PLE vmexit
vcpu 1: wtf?

Waiting on a host kernel spinlock is not too bad because we expect to be
out shortly.  Waiting on a host kernel mutex can be a lot worse.

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 09:53:47 UTC
Local: Thurs, Sep 27 2012 5:53 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/25/2012 08:30 PM, Dor Laor wrote:

No, I am not there yet.

So in summary, we are suffering from inconsistent benchmark results
while measuring the benefit of our improvements in PLE/pvlock etc.

The good points of your suggestion are:
- It gives predictability to the workload that runs in the guest, so that
we have a pi-pi comparison of improvements.

- We can easily tune the workload via sysfs, and we can have a script to
automate that.

What is complicated is:
- How can we simulate a workload close to what we measure with
benchmarks?
- How can we mimic lock holding time / lock hierarchy close to the way
it is seen with real workloads (e.g. a highly contended zone lru lock
with a similar amount of lock holding time)?
- How close would it be when we forget about other types of spinning
(e.g. flush_tlb)?

So I feel it is not as trivial as it looks.



 
Newsgroups: fa.linux.kernel
From: Gleb Natapov <g...@redhat.com>
Date: Thu, 27 Sep 2012 10:00:48 UTC
Local: Thurs, Sep 27 2012 6:00 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Of course, I got it all backwards. Need more coffee.

We can't do much about it without PV spinlock since there is no
information about which vcpu holds which guest spinlock, no?

--
                        Gleb.


 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 10:05:43 UTC
Local: Thurs, Sep 27 2012 6:05 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 11:58 AM, Gleb Natapov wrote:

It doesn't help.  If the lock holder is waiting for another lock in the
host kernel, boosting it doesn't help even if we know who it is.  We
need to boost the real lock holder, but we have no idea who it is (and
even if we did, we often can't do anything about it).

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Gleb Natapov <g...@redhat.com>
Date: Thu, 27 Sep 2012 10:08:20 UTC
Local: Thurs, Sep 27 2012 6:08 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

Without PV locks we will boost a random preempted vcpu instead of going to
sleep in the situation you described.

--
                        Gleb.


 
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 10:11:40 UTC
Local: Thurs, Sep 27 2012 6:11 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/26/2012 05:57 PM, Konrad Rzeszutek Wilk wrote:

Hmm, true. I think it is indeed difficult to shoehorn in all workloads.



 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 10:15:36 UTC
Local: Thurs, Sep 27 2012 6:15 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 12:08 PM, Gleb Natapov wrote:

True.  In theory boosting a random vcpu shouldn't have any negative
effects though.  Right now the problem is that the boosting itself is
expensive.

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 10:25:32 UTC
Local: Thurs, Sep 27 2012 6:25 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/26/2012 06:27 PM, Andrew Jones wrote:

We did check this early on in the original undercommit patch, and it gave
results closer to the PLE-disabled case. But I agree with Peter that it is
ugly to export nr_running info to the PLE handler.

Looking at the results and comparing A and C,

> Base = 3.6.0-rc5 + ple handler optimization patches
> A = Base + checking rq_running in vcpu_on_spin() patch
> B = Base + checking rq->nr_running in sched/core
> C = Base - PLE

>    % improvements w.r.t BASE
> ---+------------+------------+------------+
>    |      A     |    B       |     C      |
> ---+------------+------------+------------+
> 1x | 206.37603  |  139.70410 |  210.19323 |

I have a feeling that the vmexit has not caused significant overhead
compared to iterating over vcpus in the PLE handler. Does that sound right?

But

> vmcs_write32(PLE_WINDOW, (kvm->ple_window += PLE_WINDOW_BUMP))

is worth trying. I will have to see it eventually.
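A sketch of what the window bump could look like once it is bounded, assuming hypothetical minimum, maximum, and bump values (the thread gives none). Only the grow step comes from the quoted line; the halving shrink step is an assumption added to illustrate the "make it shorter as more vcpus are preempted" idea from earlier in the thread. The model returns the value that a real handler would pass to vmcs_write32(PLE_WINDOW, ...).

```c
#include <assert.h>
#include <stdint.h>

/* All three constants are hypothetical placeholders. */
#define PLE_WINDOW_MIN  4096u
#define PLE_WINDOW_MAX  65536u
#define PLE_WINDOW_BUMP 4096u

struct vm_window_model {
    uint32_t ple_window; /* current per-VM window */
};

/* No preemption seen: spin longer before exiting, up to a ceiling. */
static uint32_t ple_window_grow(struct vm_window_model *vm)
{
    assert(vm->ple_window <= PLE_WINDOW_MAX);
    uint32_t next = vm->ple_window + PLE_WINDOW_BUMP;
    if (next > PLE_WINDOW_MAX)
        next = PLE_WINDOW_MAX;
    vm->ple_window = next;
    return next;
}

/* Preempted vcpus seen: exit sooner so a candidate can be boosted (assumed policy). */
static uint32_t ple_window_shrink(struct vm_window_model *vm)
{
    uint32_t next = vm->ple_window / 2;
    if (next < PLE_WINDOW_MIN)
        next = PLE_WINDOW_MIN;
    vm->ple_window = next;
    return next;
}
```

The ceiling matters because the quoted expression, applied on every exit, would otherwise grow without bound.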



 
Newsgroups: fa.linux.kernel
From: Andrew Jones <drjo...@redhat.com>
Date: Thu, 27 Sep 2012 10:28:54 UTC
Local: Thurs, Sep 27 2012 6:28 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

Are you measuring the combined throughput of all running guests, or
just looking at the results of the benchmarks in a single test guest?

I've done some benchmarking as well and my stddevs look pretty good for
kcbench, ebizzy, dbench, and sysbench-memory. I do 5 runs for each
overcommit level (1.0 - 3.0, stepped by .25 or .5), and 2 runs of that
full sequence of tests (one with the overcommit levels in scrambled
order). The relative stddevs for each of the sets of 5 runs look pretty
good, and the data for the 2 runs match nicely as well.

To try and get consistent results I do the following
- interleave the memory of all guests across all numa nodes on the
  machine
- echo 0 > /proc/sys/kernel/randomize_va_space on both host and test
  guest
- echo 3 > /proc/sys/vm/drop_caches on both host and test guest before
  each run
- use a ramdisk for the benchmark output files on all running guests
- no periodically running services installed on the test guest
- HT is turned off as you do, although I'd like to try running again
  with it turned back on

Although, I still need to run again measuring the combined throughput
of all running vms (including the ones launched just to generate busy
vcpus). Maybe my results won't be as consistent then...
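For anyone wanting to apply the same consistency check to their own runs, here is a small sketch that tests whether the relative standard deviation of a set of runs stays under a threshold. The function names and the threshold parameter are hypothetical; the comparison is done on squared values so that no sqrt (and no libm) is needed, and a positive mean is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static double sample_mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
static double sample_variance(const double *x, size_t n)
{
    assert(n >= 2);
    double m = sample_mean(x, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* True when stddev / mean <= max_rel, checked as variance <= (max_rel * mean)^2. */
static bool runs_consistent(const double *x, size_t n, double max_rel)
{
    double limit = max_rel * sample_mean(x, n);
    return sample_variance(x, n) <= limit * limit;
}
```

With five runs per overcommit level, calling runs_consistent(scores, 5, 0.02) would flag any level whose spread exceeds 2% of its mean; the 2% figure is only an example.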

Drew


 
Newsgroups: fa.linux.kernel
From: Dor Laor <dl...@redhat.com>
Date: Thu, 27 Sep 2012 10:33:24 UTC
Local: Thurs, Sep 27 2012 6:33 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/27/2012 11:49 AM, Raghavendra K T wrote:

You can spin for a similar instruction count that you're interested in.

> - How close it would be to when we forget about other types of spinning
> (for e.g, flush_tlb).

> So I feel it is not as trivial as it looks like.

Indeed, this is mainly a tool that can serve to optimize a few synthetic
workloads.
I still believe that it is worth going through this exercise, since a 100%
predictable and controlled case can help us purely assess the state of the
PLE and pvticket code. Otherwise we're dealing with too many parameters
and assumptions at once.
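A bare-bones sketch of such a controlled workload, assuming a test-and-set spinlock built on C11 atomic_flag and a lock hold time expressed as a loop count. All names are hypothetical; a real tool would run one worker per guest thread and expose hold_spins as a tunable (sysfs was suggested earlier in the thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct synth_lock {
    atomic_flag held;
};

/* Burn a fixed number of loop iterations; volatile keeps the compiler
 * from deleting the delay loop. */
static void synth_hold(uint64_t spins)
{
    for (volatile uint64_t i = 0; i < spins; i++)
        ;
}

static void synth_lock_acquire(struct synth_lock *l)
{
    /* A guest kernel would execute PAUSE in this loop, which is what PLE counts. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void synth_lock_release(struct synth_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Take the lock `iterations` times, holding it for `hold_spins` loop
 * iterations each time; the shared counter lets a harness verify exclusion. */
static void synth_worker(struct synth_lock *l, uint64_t iterations,
                         uint64_t hold_spins, uint64_t *shared_counter)
{
    for (uint64_t i = 0; i < iterations; i++) {
        synth_lock_acquire(l);
        (*shared_counter)++;
        synth_hold(hold_spins);
        synth_lock_release(l);
    }
}
```

Because the hold time is the only variable, runs at different overcommit levels can be compared without guessing how much time a real workload spends inside its locks. As noted above, this says nothing about other kinds of spinning such as flush_tlb.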

Dor



 
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 10:44:52 UTC
Local: Thurs, Sep 27 2012 6:44 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/27/2012 12:28 PM, Andrew Jones wrote:

Another way to test is to execute

 perf stat -e 'kvm_exit exit_reason==40' sleep 10

to see how many PAUSEs were intercepted in a given time (except I just
invented the filter syntax).  The fewer we get, the more useful work the
system does.  This ignores kvm_vcpu_on_spin overhead though, so it's
just a rough measure.

--
error compiling committee.c: too many arguments to function


 
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 11:27:47 UTC
Local: Thurs, Sep 27 2012 7:27 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/27/2012 02:06 PM, Avi Kivity wrote:

1) So do you think the deferred-preemption patch (which Vatsa mentioned
long back) is another thing worth trying, so that we reduce the chance
of LHP (lock-holder preemption)?

IIRC, with deferred preemption:
we will have a hook in the spinlock/unlock path to measure the depth of
locks held, shared with the host scheduler (maybe via MSRs now).
The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather,
gives it, say, one more chance).

2) Looking at the results (comparing A & C), I do feel we have
significant overhead in iterating over vcpus (even when compared to a
vmexit), so we would still need the undercommit fix suggested by PeterZ
(improving by 140%)?

Looking back at the threads and discussions so far, I am trying to
summarize them. I feel these, at least, are the potential candidates
to go in:

1) Avoiding double runqueue lock overhead  (Andrew Theurer/ PeterZ)
2) Dynamically changing PLE window (Avi/Andrew/Chegu)
3) preempt_notify handler to identify preempted VCPUs (Avi)
4) Avoiding iterating over VCPUs in undercommit scenario. (Raghu/PeterZ)
5) Avoiding unnecessary spinning in overcommit scenario (Raghu/Rik)
6) Pv spinlock
7) Jiannan's proposed improvements
8) Defer preemption patches

Did we miss anything (or add anything extra)?

So here are my action items:
- I plan to repost this series with what PeterZ and Rik suggested, along
with performance analysis.
- I'll go back and explore (3) and (6).

Please let me know.



 
Discussion subject changed to "kvm: Handle undercommitted guest case in PLE handler" by Raghavendra K T
Raghavendra K T  
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 11:30:53 UTC
Local: Thurs, Sep 27 2012 7:30 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 02:20 PM, Avi Kivity wrote:

Should we also consider that we always have an edge here for non-PLE
machines?

> Given these tradeoffs, if we can get PLE to work for moderate amounts of
> overcommit then I'll prefer it (even if it slightly underperforms PV).
> If we are unable to make it work well, then we'll have to add PV.

Avi,
Thanks for this summary. It is a great help in proceeding in the right
direction.



 
Discussion subject changed to "kvm: Improving undercommit,overcommit scenarios in PLE handler" by Raghavendra K T
Raghavendra K T  
Newsgroups: fa.linux.kernel
From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
Date: Thu, 27 Sep 2012 11:35:26 UTC
Local: Thurs, Sep 27 2012 7:35 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/27/2012 03:58 PM, Andrew Jones wrote:

I was not doing this.

> - echo 3 > /proc/sys/vm/drop_caches on both host and test guest before
>    each run

I was already doing this, as you know.

> - use a ramdisk for the benchmark output files on all running guests

Yes, this is also helpful.

> - no periodically running services installed on the test guest
> - HT is turned off as you do, although I'd like to try running again
>    with it turned back on
> Although, I still need to run again measuring the combined throughput
> of all running vms (including the ones launched just to generate busy
> vcpus). Maybe my results won't be as consistent then...

Maybe. I take the average from all the VMs.


 
Avi Kivity  
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 12:04:47 UTC
Local: Thurs, Sep 27 2012 8:04 am
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 09/27/2012 01:23 PM, Raghavendra K T wrote:

>> This gives us a good case for tracking preemption on a per-vm basis.  As
>> long as we aren't preempted, we can keep the PLE window high, and also
>> return immediately from the handler without looking for candidates.
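
The per-VM tracking idea quoted above could be sketched roughly as follows. This is a toy model, not KVM code: the class names, the `sched_out`/`sched_in` hooks (standing in for preempt notifiers), the doubling policy, and the window bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical bounds for the PLE window, chosen only for this sketch.
PLE_WINDOW_MIN = 4096
PLE_WINDOW_MAX = 65536


@dataclass
class Vcpu:
    vcpu_id: int
    preempted: bool = False  # True while the host has this vcpu scheduled out


@dataclass
class Vm:
    vcpus: list = field(default_factory=list)
    preempted_count: int = 0          # per-VM count of preempted vcpus
    ple_window: int = PLE_WINDOW_MIN


def sched_out(vm, vcpu):
    """Hook run when the host preempts a vcpu."""
    if not vcpu.preempted:
        vcpu.preempted = True
        vm.preempted_count += 1


def sched_in(vm, vcpu):
    """Hook run when the vcpu gets a physical CPU again."""
    if vcpu.preempted:
        vcpu.preempted = False
        vm.preempted_count -= 1


def handle_pause_exit(vm, spinner):
    """Return a vcpu to yield to, or None when the handler returns at once."""
    if vm.preempted_count == 0:
        # Nobody is preempted: keep the window high and skip the search.
        vm.ple_window = min(vm.ple_window * 2, PLE_WINDOW_MAX)
        return None
    # Some vcpu is preempted: shrink the window and look for a candidate.
    vm.ple_window = PLE_WINDOW_MIN
    for vcpu in vm.vcpus:
        if vcpu is not spinner and vcpu.preempted:
            return vcpu
    return None
```

The point of the per-VM counter is that the undercommitted path is a single comparison; the vcpu list is only walked once a preemption has actually been recorded.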

> 1) So do you think the deferred-preemption patch (which Vatsa mentioned
> long back) is another thing worth trying, so that we reduce the chance
> of LHP (lock-holder preemption)?

Yes, we have to keep it in mind.  It will be useful for fine-grained
locks, not so much for coarse locks or IPIs.

I would still of course prefer a PLE solution, but if we can't get it to
work we can consider preemption deferral.

> IIRC, with deferred preemption:
> we will have a hook in the spinlock/unlock path to measure the depth of
> locks held, shared with the host scheduler (maybe via MSRs now).
> The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather,
> gives it, say, one more chance).

A downside is that we have to do that even when undercommitted.

Also there may be a lot of false positives (deferred preemptions even
when there is no contention).
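
To make the trade-off concrete, here is a toy model of the deferral scheme described in the quoted text. Everything in it is hypothetical: the shared-state layout, the function names, and the "defer exactly once" policy are assumptions for illustration, not the actual patches.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    """State a guest vcpu would publish to the host (hypothetical layout)."""
    lock_depth: int = 0          # number of spinlocks the vcpu currently holds
    deferred_once: bool = False  # host has already granted its one extra chance


def guest_spin_lock(state):
    # This bookkeeping runs on every lock, even when undercommitted.
    state.lock_depth += 1


def guest_spin_unlock(state):
    state.lock_depth -= 1


def host_should_preempt(state):
    """Return True if the host preempts now, False if it defers once."""
    if state.lock_depth > 0 and not state.deferred_once:
        # A lock is held: give the vcpu one more chance to release it.
        # The host cannot tell whether anyone is waiting on the lock, so
        # this may be a false positive.
        state.deferred_once = True
        return False
    state.deferred_once = False
    return True
```

Both downsides show up directly in the model: `guest_spin_lock` and `guest_spin_unlock` pay the bookkeeping cost unconditionally, and `host_should_preempt` defers on `lock_depth > 0` alone, with no knowledge of contention.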

> 2) Looking at the results (comparing A & C), I do feel we have
> significant overhead in iterating over vcpus (even when compared to a
> vmexit), so we would still need the undercommit fix suggested by PeterZ
> (improving by 140%)?

Looking only at the current runqueue?  My worry is that it misses a lot
of cases.  Maybe try the current runqueue first and then others.
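
One way to read "current runqueue first and then others" is a two-pass search, sketched below. The runqueue model (a dict from CPU number to the list of waiting vcpus), the `is_eligible` predicate, and the function name are all assumptions for illustration and do not correspond to scheduler internals.

```python
def pick_yield_target(spinner_cpu, runqueues, is_eligible):
    """Pick a vcpu to yield to, preferring the spinner's own runqueue.

    runqueues maps a CPU number to the vcpus waiting (runnable but not
    running) on that CPU. Returns (cpu, vcpu) or None if nothing qualifies.
    """
    # Pass 1: the local runqueue only, which is the cheap case.
    for vcpu in runqueues.get(spinner_cpu, []):
        if is_eligible(vcpu):
            return spinner_cpu, vcpu

    # Pass 2: fall back to the other runqueues so remote candidates
    # are not missed.
    for cpu, queue in runqueues.items():
        if cpu == spinner_cpu:
            continue
        for vcpu in queue:
            if is_eligible(vcpu):
                return cpu, vcpu
    return None
```

With this ordering, an undercommitted host with an empty local runqueue still pays for the second pass, so an early exit for that case would need to be layered on top.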

Or were you referring to something else?

Undoubtedly we'll think of more stuff.  But this looks like a good start.

--
error compiling committee.c: too many arguments to function


 
Discussion subject changed to "kvm: Handle undercommitted guest case in PLE handler" by Avi Kivity
Avi Kivity  
Newsgroups: fa.linux.kernel
From: Avi Kivity <a...@redhat.com>
Date: Thu, 27 Sep 2012 12:07:19 UTC
Local: Thurs, Sep 27 2012 8:07 am
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 09/27/2012 01:26 PM, Raghavendra K T wrote:

True.  The deployment share for these is decreasing rapidly though.  I
hate optimizing for obsolete hardware.

--
error compiling committee.c: too many arguments to function


 