[PATCH] emulate accessed bit for EPT

Rik van Riel

Feb 3, 2010, 4:20:01 PM

Currently KVM pretends that pages with EPT mappings never got
accessed. This has some side effects in the VM, like swapping
out actively used guest pages and needlessly breaking up actively
used hugepages.

We can avoid those very costly side effects by emulating the
accessed bit for EPT PTEs, which should only be slightly costly
because pages pass through page_referenced infrequently.

TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().

This seems to help prevent KVM guests from being swapped out when
they should not on my system.

Signed-off-by: Rik van Riel <ri...@redhat.com>
---
Jeff, does this patch fix the issue you saw a few months ago, with
a 256MB KVM guest in a cgroup limited to 128MB memory?

arch/x86/kvm/mmu.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 89a49fb..6101615 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	u64 *spte;
 	int young = 0;

-	/* always return old for EPT */
+	/*
+	 * Emulate the accessed bit for EPT, by checking if this page has
+	 * an EPT mapping, and clearing it if it does. On the next access,
+	 * a new EPT mapping will be established.
+	 * This has some overhead, but not as much as the cost of swapping
+	 * out actively used pages or breaking up actively used hugepages.
+	 */
 	if (!shadow_accessed_mask)
-		return 0;
+		return kvm_unmap_rmapp(kvm, rmapp, data);

 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
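
The idea in the patch can be modelled in userspace. What follows is a
minimal sketch, not KVM code: struct toy_guest, toy_access() and
toy_age() are hypothetical names invented for illustration, and a plain
bool array stands in for the rmap/spte machinery. It only shows the
principle that "a mapping exists" is treated as "accessed", and that
aging drops the mapping so the next access shows up as a fault.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 4

/* Toy stand-in for a guest's EPT mappings; not a real KVM structure. */
struct toy_guest {
	bool mapped[TOY_NPAGES];	/* does the page have a mapping now? */
	unsigned long minor_faults;	/* faults taken to rebuild mappings */
};

/* Guest touches a page: a missing mapping is rebuilt by a minor fault. */
void toy_access(struct toy_guest *g, int page)
{
	assert(page >= 0 && page < TOY_NPAGES);
	if (!g->mapped[page]) {
		g->minor_faults++;
		g->mapped[page] = true;
	}
}

/*
 * Aging pass without a hardware accessed bit: a page that still has a
 * mapping is reported young, and the mapping is dropped so that the
 * next access becomes visible as a fault.
 */
int toy_age(struct toy_guest *g, int page)
{
	int young;

	assert(page >= 0 && page < TOY_NPAGES);
	young = g->mapped[page];
	g->mapped[page] = false;
	return young;
}
```

In this model a page that is touched, aged twice, and touched again
reports young, then old, then young on the following aging pass, and
each touch after an aging pass costs one minor fault.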

Balbir Singh

Feb 3, 2010, 11:20:02 PM

* Rik van Riel <ri...@redhat.com> [2010-02-03 16:11:03]:

Quite a clever implementation, one side effect is that one would see a
larger number of minor faults with EPT enabled and an increase in
allocation/frees of rmap entries, but that can be easily explained.

--
Balbir

Rik van Riel

Feb 4, 2010, 8:50:01 AM

On 02/03/2010 11:12 PM, Balbir Singh wrote:
> * Rik van Riel<ri...@redhat.com> [2010-02-03 16:11:03]:
>
>> Currently KVM pretends that pages with EPT mappings never got
>> accessed. This has some side effects in the VM, like swapping
>> out actively used guest pages and needlessly breaking up actively
>> used hugepages.
>>
>> We can avoid those very costly side effects by emulating the
>> accessed bit for EPT PTEs, which should only be slightly costly
>> because pages pass through page_referenced infrequently.

> Quite a clever implementation, one side effect is that one would see a
> larger number of minor faults with EPT enabled and an increase in
> allocation/frees of rmap entries, but that can be easily explained.

I suspect it won't be very many. I have been monitoring
/proc/meminfo on my system while testing this patch, and
it is quite typical that the size of the inactive anon
list does not change for minutes at a time.

In other words, no pages are moved onto or off of the
inactive anon list for several minutes. That corresponds
to a very small number of minor faults introduced by my
patch.

Of course, when the system is swapping, we will have more
minor faults. However, minor faults should be less of a
performance issue than major faults :)

--
All rights reversed.
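
The kind of monitoring described above can be scripted. This is a
minimal sketch, assuming the "Inactive(anon)" field that /proc/meminfo
reports; the function names meminfo_kb() and read_inactive_anon_kb()
are hypothetical, and this is not the tool that was used for the
measurements in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB for a /proc/meminfo-style key such as
 * "Inactive(anon)", or -1 if the key is not found in the stream.
 */
long meminfo_kb(FILE *f, const char *key)
{
	char line[256];
	size_t klen;

	assert(f && key);
	klen = strlen(key);

	while (fgets(line, sizeof(line), f)) {
		long kb;

		/* Require "key:" so "Inactive" does not match "Inactive(anon)". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &kb) == 1)
			return kb;
	}
	return -1;
}

/* Convenience wrapper: sample the live value, or -1 on any failure. */
long read_inactive_anon_kb(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	long kb;

	if (!f)
		return -1;
	kb = meminfo_kb(f, "Inactive(anon)");
	fclose(f);
	return kb;
}
```

Calling read_inactive_anon_kb() periodically and comparing successive
samples shows whether the inactive anon list is changing size.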

Balbir Singh

Feb 4, 2010, 10:40:02 AM

* Rik van Riel <ri...@redhat.com> [2010-02-04 08:40:43]:

> On 02/03/2010 11:12 PM, Balbir Singh wrote:
> >* Rik van Riel<ri...@redhat.com> [2010-02-03 16:11:03]:
> >
> >>Currently KVM pretends that pages with EPT mappings never got
> >>accessed. This has some side effects in the VM, like swapping
> >>out actively used guest pages and needlessly breaking up actively
> >>used hugepages.
> >>
> >>We can avoid those very costly side effects by emulating the
> >>accessed bit for EPT PTEs, which should only be slightly costly
> >>because pages pass through page_referenced infrequently.
>
> >Quite a clever implementation, one side effect is that one would see a
> >larger number of minor faults with EPT enabled and an increase in
> >allocation/frees of rmap entries, but that can be easily explained.
>
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.
>
> Of course, when the system is swapping, we will have more
> minor faults. However, minor faults should be less of a
> performance issue than major faults :)
>

I do agree with you.

--
Balbir

Rik van Riel

Feb 4, 2010, 10:50:02 AM

After 20 hours of uptime, it appears that this patch has
resolved the "KVM guests get swapped while buffer and page
cache stay in memory" problem my home system was experiencing.

Balbir Singh

Feb 4, 2010, 11:00:02 AM

* Rik van Riel <ri...@redhat.com> [2010-02-04 10:41:14]:

Is this with cgroups enabled as defined by the setup Jeff had?

--
Balbir

Jeff Dike

Feb 4, 2010, 11:20:02 AM

On Wed, Feb 03, 2010 at 04:11:03PM -0500, Rik van Riel wrote:
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?

Hum, let me dust off that workload and give it a shot...

Jeff

Andrea Arcangeli

Feb 4, 2010, 12:50:01 PM

On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.

When there's light VM pressure, ideally there should be zero overhead
caused by the patch. When there is VM pressure this will avoid some
unnecessary I/O which should outweigh the minor faults. It should be
a good default behavior.

Marcelo Tosatti

Feb 5, 2010, 12:40:02 PM

On Thu, Feb 04, 2010 at 06:47:15PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> > I suspect it won't be very many. I have been monitoring
> > /proc/meminfo on my system while testing this patch, and
> > it is quite typical that the size of the inactive anon
> > list does not change for minutes at a time.
> >
> > In other words, no pages are moved onto or off of the
> > inactive anon list for several minutes. That corresponds
> > to a very small number of minor faults introduced by my
> > patch.
>
> When there's light VM pressure, ideally there should be zero overhead
> caused by the patch. When there is VM pressure this will avoid some
> unnecessary I/O which should outweigh the minor faults. It should be
> a good default behavior.

Agree.

But perhaps a module parameter to turn accessed bit emulation off might
be handy in the future?

Andrea Arcangeli

Feb 5, 2010, 1:20:01 PM

On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> But perhaps a module parameter to turn accessed bit emulation off might
> be handy in the future?

Maybe, but somebody should show that this can overall become a
downside, which I doubt... I think if it does, the VM is to blame for
calling page_referenced when there is no point to do so just yet.

Marcelo Tosatti

Feb 7, 2010, 2:30:02 PM

On Fri, Feb 05, 2010 at 07:14:13PM +0100, Andrea Arcangeli wrote:
> On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> > But perhaps a module parameter to turn accessed bit emulation off might
> > be handy in the future?
>
> Maybe, but somebody should show that this can overall become a
> downside, which I doubt... I think if it does, the VM is to blame for
> calling page_referenced when there is no point to do so just yet.

Agreed. ACK.

Avi Kivity

Feb 8, 2010, 5:30:02 AM

On 02/03/2010 11:11 PM, Rik van Riel wrote:
> Currently KVM pretends that pages with EPT mappings never got
> accessed. This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
>
>

Applied, thanks.

>
> - /* always return old for EPT */
> + /*
> + * Emulate the accessed bit for EPT, by checking if this page has
> + * an EPT mapping, and clearing it if it does. On the next access,
> + * a new EPT mapping will be established.
> + * This has some overhead, but not as much as the cost of swapping
> + * out actively used pages or breaking up actively used hugepages.
> + */
> if (!shadow_accessed_mask)
> - return 0;
> + return kvm_unmap_rmapp(kvm, rmapp, data);
>

This could be optimized by using a software-available bit for 'present'
and the rwx bits for young, that is:

(present, rwx) -> the page is present and recently accessed, will not
cause EPT violation
(present, !rwx) -> page is present but old, will cause EPT violation
but not rmap games and get_user_pages_fast().

However that's best done later if ever.

--
error compiling committee.c: too many arguments to function
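
The two states proposed above can be sketched as a small userspace
state machine. This is only an illustration under stated assumptions:
the rwx bits are bits 0-2 of an EPT entry, but the position of the
software "present" bit (bit 52 here) is a hypothetical choice, the
sketch_* names are invented, and a real mapping would not always grant
all of rwx.

```c
#include <assert.h>
#include <stdint.h>

/* EPT read/write/execute permission bits (bits 0-2 of an entry). */
#define SKETCH_RWX		UINT64_C(0x7)
/* Hypothetical software-available bit used to mean "present". */
#define SKETCH_SW_PRESENT	(UINT64_C(1) << 52)

enum sketch_fault { SKETCH_NO_FAULT, SKETCH_FAST_FAULT, SKETCH_FULL_FAULT };

/* Guest access: classify how expensive the resulting violation would be. */
enum sketch_fault sketch_access(uint64_t *spte)
{
	assert(spte);
	if ((*spte & SKETCH_RWX) == SKETCH_RWX)
		return SKETCH_NO_FAULT;		/* (present, rwx): young */
	if (*spte & SKETCH_SW_PRESENT) {
		*spte |= SKETCH_RWX;		/* (present, !rwx): restore rwx */
		return SKETCH_FAST_FAULT;
	}
	/* Not present: stands in for the full rmap and page lookup path. */
	*spte = SKETCH_SW_PRESENT | SKETCH_RWX;
	return SKETCH_FULL_FAULT;
}

/* Aging: report young if rwx is set, then clear rwx but keep "present". */
int sketch_age(uint64_t *spte)
{
	int young;

	assert(spte);
	young = (*spte & SKETCH_RWX) != 0;
	*spte &= ~SKETCH_RWX;
	return young;
}
```

Compared with unmapping the entry outright, an aged page in this model
takes only the cheaper fault on its next access, because the entry and
its bookkeeping are still in place.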