Monitoring and acting on memory pressure

Nicholas Nethercote

Jun 14, 2011, 2:45:58 AM

to dev-pl...@lists.mozilla.org

Hi,

When talking about optimizing memory usage, one thing that comes up a lot
is the idea of monitoring memory pressure in some way, and behaving
accordingly (eg. https://bugzilla.mozilla.org/show_bug.cgi?id=660577,
comment 55 and onwards). It's an attractive idea; Firefox is deployed on a
wide range of devices, and coming up with static policies that suit all of
them is difficult.

But it's hard to do well. For example:

- If an allocation fails, you probably want to try to free up some memory.
Oh, but thanks to over-committing, what'll happen is that the allocation
will succeed, but if you touch the memory you'll crash.

- If you start to page badly, you probably want to free up some memory. But
running a GC is bad, because that'll touch lots of memory and cause more
paging. And it may require some extra allocations too.

We do some fairly ad hoc memory pressure stuff. For example, the JS engine
considers(?) doing a GC every time it allocates 32MB of memory via
malloc/calloc/new. Also, we have memory pressure events and observers, but
nothing seems to trigger them, except maybe on mobile.

----

Anyway, Gregor Wagner recently mentioned on the JS-internals list a paper from
ISMM'11 called "Waste Not, Want Not: Resource-based Garbage Collection in a
Shared Environment"
(www.cs.rochester.edu/~xiaoming/publications/ismm11-b.pdf), which is all
about doing exactly this memory pressure monitoring.

The goal is to utilize as much memory as needed, while avoiding paging, in
the presence of multiple processes executing.

The main idea is a heuristic which tracks two things.

1. The major page fault count. If more than a certain number (they used 10)
have happened since the last time it was checked, we know memory pressure
is high.

2. Resident set size (RSS). In particular, memory pressure is high if RSS
*decreases*. This only happens when pages have been evicted, which means
that memory pressure from other processes is significant.

The paper says that a heuristic based on either of these alone isn't very
good, but one that uses both works well. It's important that both these
numbers are easy to get on all OSes (I think... about:memory already gets RSS).
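
To make the two signals concrete, here is a minimal Python sketch of the combined check. It is not the paper's code: the names MemorySample, read_sample and pressure_is_high are made up for illustration, combining the two signals with a plain OR is my assumption, and the sampling shown is Linux-only (/proc/self/statm plus getrusage).

```python
import os
import resource
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySample:
    major_faults: int  # cumulative major (hard) page faults for this process
    rss_bytes: int     # current resident set size in bytes


def read_sample() -> MemorySample:
    """Linux-only sampling; other platforms need their own equivalents."""
    faults = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field: resident pages
    return MemorySample(faults, resident_pages * os.sysconf("SC_PAGE_SIZE"))


def pressure_is_high(prev: MemorySample, cur: MemorySample,
                     fault_threshold: int = 10) -> bool:
    """High if many new major faults occurred, or if RSS shrank (assumed OR)."""
    new_faults = cur.major_faults - prev.major_faults
    return new_faults > fault_threshold or cur.rss_bytes < prev.rss_bytes
```

The caller keeps the previous sample around and compares it with a fresh read_sample() at each check. The threshold of 10 is the value the paper reports using.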

As for when you want to check for memory pressure: obviously you have to do
it periodically. E.g. in a system with a generational collector, you might
do it every time the nursery fills up. This checking might have overhead,
so they use an additive-increase/multiplicative-decrease algorithm to decide
how often to actually do it. They start by doing the check every Nth time;
each time memory pressure remains low, they increase N by one; each time
it's found to be high, they divide N by 10. This allows for a quick
response when memory is tight, but limits the overhead of the checking
when memory is plentiful.
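
The additive-increase/multiplicative-decrease schedule described above can be sketched roughly like this in Python. The class and parameter names (CheckScheduler, min_interval) are hypothetical, and the floor of 1 on N is my assumption; the paper may handle the lower bound differently.

```python
from typing import Callable


class CheckScheduler:
    """Runs an expensive check only every Nth tick, adapting N AIMD-style."""

    def __init__(self, interval: int = 1, min_interval: int = 1) -> None:
        self.interval = interval          # the current N
        self.min_interval = min_interval  # assumed lower bound on N
        self._countdown = interval

    def tick(self, check: Callable[[], bool]) -> bool:
        """Call at every candidate point (e.g. each time the nursery fills).

        Returns True only when the check actually ran and reported
        high memory pressure.
        """
        self._countdown -= 1
        if self._countdown > 0:
            return False  # not our turn; skip the expensive check
        high = check()
        if high:
            # Multiplicative decrease: check far more often from now on.
            self.interval = max(self.min_interval, self.interval // 10)
        else:
            # Additive increase: back off slowly while memory is plentiful.
            self.interval += 1
        self._countdown = self.interval
        return high
```

When tick() returns True, the caller would do whatever frees memory (shrink the heap, drop caches and so on).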

The effect of this heuristic in their experiments (with Java GC) was that it
didn't slow down things when memory was plentiful, but drastically improved
things when memory was tight.

They also talk about more complicated ways that multiple processes can
co-operate (via a "whiteboard"), but that made things more complicated and
seemed to slow things down more than anything.

----

In our case, GC is only part of the story. Eg. bug 660577 is all about the
policy used to evict decoded images from the cache and how the current
policy often doesn't work well. I wonder whether a memory pressure check
every 1 or 2 seconds could be made to work well.
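
As a rough illustration of what a timer-driven check feeding observers (e.g. an image cache) might look like, here is a Python sketch. Everything in it is hypothetical: PressureMonitor, add_observer and the injected sampler are not existing Gecko APIs, and the OR of the two signals is an assumption as before.

```python
import threading
from typing import Callable, List, Tuple

Sample = Tuple[int, int]  # (cumulative major page faults, RSS in bytes)


class PressureMonitor:
    """Polls a sampler and notifies observers when pressure looks high."""

    def __init__(self, sampler: Callable[[], Sample],
                 fault_threshold: int = 10) -> None:
        self._sampler = sampler
        self._fault_threshold = fault_threshold
        self._observers: List[Callable[[], None]] = []
        self._prev = sampler()  # baseline sample

    def add_observer(self, callback: Callable[[], None]) -> None:
        self._observers.append(callback)

    def poll(self) -> bool:
        """Take one sample, compare with the previous one, notify if high."""
        faults, rss = self._sampler()
        prev_faults, prev_rss = self._prev
        self._prev = (faults, rss)
        high = (faults - prev_faults > self._fault_threshold) or rss < prev_rss
        if high:
            for callback in self._observers:
                callback()  # e.g. evict decoded images
        return high

    def run(self, stop: threading.Event, period_s: float = 1.0) -> None:
        """Poll every period_s seconds until the stop event is set."""
        while not stop.wait(period_s):
            self.poll()
```

In practice run() would live on a background thread or be replaced by a timer on the main event loop; the sampler is injected so the policy can be tested without touching the OS.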

This ties in with our infallible malloc work; I think the idea there is
that if an allocation fails, we should try to free up memory, but I'm not
sure of the details. This heuristic might be useful.

Nick

Doug Turner

Jun 14, 2011, 2:59:12 AM

to Nicholas Nethercote, dev-pl...@lists.mozilla.org

At least for Android, all low memory notifications from the system come in way too late for us to do anything about them. The end result is that the system killer kills Fennec, or the child process crashes.

Also, you may discover we had an IsLowMemory() function implemented at one point. However, its implementation was usually slow and sometimes incorrect.

Doug Turner

_______________________________________________
dev-platform mailing list
dev-pl...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Nicholas Nethercote

Jun 14, 2011, 3:12:49 AM

to Doug Turner, dev-pl...@lists.mozilla.org

On Tue, Jun 14, 2011 at 4:59 PM, Doug Turner <do...@mozilla.com> wrote:
>
> Also, you may discover we had an IsLowMemory() function implemented at one point. However, its implementation was usually slow and sometimes incorrect.

Do you know how it worked, even roughly?

Nick

Doug Turner

Jun 14, 2011, 3:46:04 AM

to Nicholas Nethercote, dev-pl...@lists.mozilla.org

Yes. It's in hg.

http://hg.mozilla.org/mozilla-central/diff/a5d1b45234b8/xpcom/base/nsMemoryImpl.cpp

Justin Lebar

Jun 14, 2011, 11:31:27 AM

to dev-pl...@lists.mozilla.org

We discussed something like this [1]. The trick, it seems, is getting the hard page fault count on Windows. But it probably can be done!

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=661304#c32

Bas Schouten

Jun 14, 2011, 12:04:38 PM

to dev-pl...@lists.mozilla.org

If I'm not mistaken, a hard page fault triggers a kernel event (hf) on Windows and can be traced through ETW, I think. I have no idea what the performance of that is, though.

Bas

Justin Lebar

Jun 14, 2011, 12:57:01 PM

Here are Glandium's thoughts on the problem of getting the page fault count on Windows: http://glandium.org/blog/?p=1963

Justin Lebar

Jun 14, 2011, 1:19:57 PM

It looks like Chrome uses ETW, although I'm not sure yet whether they use it in production.

http://code.google.com/p/sawbuck/source/browse/trunk/sawbuck/py/etw/etw/descriptors/?r=121

Mike Shaver

Jun 17, 2011, 1:53:34 PM

to mozilla.de...@googlegroups.com, dev-pl...@lists.mozilla.org

On Tue, Jun 14, 2011 at 1:19 PM, Justin Lebar <justin...@gmail.com> wrote:
> It looks like Chrome uses ETW, although I'm not sure yet whether they use it in production.
>
> http://code.google.com/p/sawbuck/source/browse/trunk/sawbuck/py/etw/etw/descriptors/?r=121

Sawbuck isn't in Chrome, it's a standalone tool for Chrome developers.
And I believe it requires privilege in order to sample those events,
but I could be wrong.

Mike