Random crazy Friday idea: Asymptotically infinite memory to ameliorate OOM crashes


Dominic Cooney

Mar 27, 2015, 1:25:33 AM
to blink-dev
From time to time I return to my computer to find that a tab has crashed, and my strong suspicion is that the process has run out of memory and decided to crash. I suspect that author code in the page or an extension is leaking, and that it's happening in a timer (setTimeout, etc.). (I *don't* know how prevalent this crash is, though. Maybe it's just me.)

So here's the crazy idea:

We could ameliorate these crashes by throttling a page's timers as its memory consumption increases.

Aside: I suspect it's setTimeout that is causing web page and extension authors to leak this memory. Take the "a clock created with timing events" example from the setTimeout reference of that venerable resource, W3Schools. It allocates this useless closure:

function startTime() {
  ... allocate a tiny bit of "stuff" ...
  setTimeout(function() { startTime() }, 500);
}

Does V8 do anything special here to prevent this from creating a chain of closures, each leaking its "stuff"? (Can it?)
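For what it's worth, here is the pattern fleshed out into a runnable sketch (the allocation and the synchronous "timer queue" are illustrative stand-ins, not the W3Schools original, so the chain can be driven to completion without real timers):

```javascript
// Simulate the recursive setTimeout pattern synchronously.
const pending = [];
function setTimeoutSim(fn, ms) { pending.push(fn); }

let ticks = 0;
function startTime() {
  const stuff = new Array(1000).fill(0); // a tiny bit of "stuff" per tick
  ticks += 1;
  if (ticks < 5) {
    // The scheduled callback references only startTime, not `stuff`, so an
    // engine that captures only referenced variables need not keep `stuff`
    // alive once this frame returns.
    setTimeoutSim(function () { startTime(); }, 500);
  }
}

startTime();
while (pending.length) pending.shift()(); // drain the simulated timer queue
console.log(ticks); // 5
```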

Dominic

Mike Lawther

Mar 27, 2015, 1:51:04 AM
to Dominic Cooney, blink-dev
My late Thursday night crazy counter-idea :) Pages that leak should crash faster, not slower. We're not doing anyone any favours keeping them on life support. The only amelioration should be to give the user a chance to save any data, but I suspect they would be better served by having the page designed to be resilient to unexpected death in the first place (e.g. persist the user's novel-in-progress to a server or local storage).

Elliott Sprehn

Mar 27, 2015, 1:55:07 AM
to Dominic Cooney, blink-dev


On Thursday, March 26, 2015, Dominic Cooney <domi...@chromium.org> wrote:
> From time to time I return to my computer to find that a tab has crashed, and my strong suspicion is that the process has ran out of memory and decided to crash. I suspect that author code in the page or an extension is leaking, and it's happening in a timer (setTimeout, etc.) (I *don't* know how prevalent this crash is, though. Maybe it's just me.)
>
> So here's the crazy idea:
>
> We could ameliorate these crashes by throttling a page's timers as its memory consumption increases.

That's an interesting idea, but I'd rather see some numbers on why Gmail wants 600MB before we throttle them. It does seem to be largely JS heap. I wonder where it all goes; could it be our fault?
 

> Aside: I suspect it's setTimeout that is causing web page and extension authors to leak this memory. Take the "a clock created with timing events" example from the setTimeout reference of that venerable resource, W3schools. It allocates this useless closure:
>
> function startTime() {
>   ... allocate a tiny bit of "stuff" ...
>   setTimeout(function() { startTime() }, 500);
> }
>
> Does V8 do anything special here to avoid that creating a chain of closures, all leaking "stuff"? (Can it?)


That shouldn't leak. I'm pretty sure V8 does capture detection so the variables aren't persisted in the context of the function. That's a pretty common pattern, it'd be weird if JS engines couldn't handle it.
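A quick sketch of what capture detection buys (variable names are illustrative): only variables the inner function actually references need to move into the heap-allocated context; the rest can die with the stack frame.

```javascript
function outer() {
  const kept = "needed later";       // referenced by inner => context slot
  const dropped = new Array(100000); // not referenced by inner => not captured,
                                     // collectible once outer() returns
  return function inner() { return kept; };
}

const f = outer();
console.log(f()); // "needed later"
```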

- E

Kentaro Hara

Mar 27, 2015, 2:00:07 AM
to Mike Lawther, Dominic Cooney, blink-dev
OOM has been a big issue for the memory team, but the hard part is that we cannot reproduce the reported OOMs in most cases. If we could get a list of URLs that cause OOM reliably, that would be very helpful (especially when we ship Oilpan).

Rather than suppressing OOM somehow, I'm interested in getting the list of URLs and fixing the OOM.


--
Kentaro Hara, Tokyo, Japan

Dominic Cooney

Mar 27, 2015, 3:01:26 AM
to Elliott Sprehn, blink-dev
My understanding of JavaScript is extremely limited, but I thought that kind of closure analysis was complicated by potential aliasing of eval and Function. But playing around with this in DevTools, it does indeed look like this works fine.

Dominic Cooney

Mar 27, 2015, 3:08:47 AM
to Mike Lawther, blink-dev
On Fri, Mar 27, 2015 at 2:50 PM, Mike Lawther <mikel...@chromium.org> wrote:
> My late Thursday night crazy counter-idea :) Pages that leak should crash faster not slower.

One way to do this would be to simply lower the heap limits for pages.

I wonder if there's a heuristic that can tell developers that they have a slow leak relatively quickly? For example, if we hermetically sealed the page from events and drove the timer callback very rapidly and looked at GC efficacy or something.

I should note that I'm still not sure if this is a very prevalent crash. It's just anecdotally the kind of crash I experience most often.

> We're not doing anyone any favours keeping them on life-support. The only amelioration should be to give the user a chance to save any data - but I suspect they would be better served by having the page designed to be resilient to unexpected death in the first place (eg persist the user's novel-in-progress to server or local storage).

This is a trick that Chrome for Android knows how to do, because background tabs being evicted there isn't uncommon, I think?

Making software reliable by periodically restarting it gives me a feeling of sadness, but it passes. It *is* effective.

Chris Harrelson

Mar 27, 2015, 2:38:33 PM
to Kentaro Hara, Mike Lawther, Dominic Cooney, blink-dev
On Thu, Mar 26, 2015 at 10:59 PM, Kentaro Hara <har...@chromium.org> wrote:
> OOM has been a big issue in the memory team, but the hard part is that we cannot reproduce the reported OOM in most cases. If we can get a list of URLs that cause OOM reliably, that is very helpful (especially when we ship Oilpan).
>
> Rather than suppressing OOM somehow, I'm interested in getting the list of URLs and fixing the OOM.

+1 to collecting this data. Do we have any such data at present in crash reporting? 




Kentaro Hara

Mar 30, 2015, 12:21:22 AM
to Chris Harrelson, Mike Lawther, Dominic Cooney, blink-dev
>> OOM has been a big issue in the memory team, but the hard part is that we cannot reproduce the reported OOM in most cases. If we can get a list of URLs that cause OOM reliably, that is very helpful (especially when we ship Oilpan).
>>
>> Rather than suppressing OOM somehow, I'm interested in getting the list of URLs and fixing the OOM.
>
> +1 to collecting this data. Do we have any such data at present in crash reporting?

Yes, we have data and are planning to keep tracking it (especially when shipping Oilpan).

BTW, an OOM crash is the worst-case scenario caused by memory growth. It is indeed important to keep track of OOM crashes, but it would also be important to have a more "modest" metric with which we can catch memory growth on real-world websites.

Here is an idea for the "modest" metric:

Add a UseCounter that is incremented when a renderer process's memory consumption exceeds X MB (where X is 256, 512, 1024, 2048, etc.). Each UseCounter is incremented at most once per renderer process. For example, if the memory used by a renderer process changes like 0 MB => 1500 MB => 500 MB => 1500 MB, then the UseCounters for 256 MB, 512 MB and 1024 MB are each incremented once.

The UseCounter for X MB approximates the number of renderer processes that would crash with OOM if each renderer process were limited to X MB of memory. By observing the UseCounters for various X's, I guess we can understand the peak memory usage of Chrome on real-world websites.
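If I understand it right, the counting logic amounts to something like this sketch (names are illustrative, not actual Chromium code): each counter fires at most once per process, on the first sample that crosses its threshold.

```javascript
const THRESHOLDS_MB = [256, 512, 1024, 2048];

function countThresholdCrossings(samplesMb) {
  const fired = new Set();
  for (const mb of samplesMb) {
    for (const t of THRESHOLDS_MB) {
      if (mb >= t) fired.add(t); // counted at most once per process
    }
  }
  return [...fired].sort((a, b) => a - b);
}

// 0 => 1500 => 500 => 1500 MB: the 256, 512 and 1024 counters fire once each.
console.log(countThresholdCrossings([0, 1500, 500, 1500])); // [ 256, 512, 1024 ]
```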

What do you think? If it looks good, I can implement it in PartitionAlloc and Oilpan.

Darin Fisher

Mar 30, 2015, 12:26:44 AM
to Mike Lawther, Dominic Cooney, blink-dev
+1

Plus, perhaps prior to killing the page we should dispatch an event to the page to let it know when memory is getting tight. Maybe a web developer could respond to that by clearing their own caches, etc.
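Until such an event exists, a page could approximate one by polling its own heap usage and shedding caches past a threshold. A minimal sketch; the reader function is an assumption supplied by the caller (in Chrome one could plug in the non-standard performance.memory's usedJSHeapSize / jsHeapSizeLimit):

```javascript
function makeMemoryPressureCheck(readUsageFraction, threshold, onPressure) {
  return function check() {
    if (readUsageFraction() > threshold) onPressure();
  };
}

// Deterministic demo with a fake reader standing in for real heap stats:
let cacheCleared = false;
const check = makeMemoryPressureCheck(() => 0.95, 0.9, () => { cacheCleared = true; });
check();
console.log(cacheCleared); // true
```

In a page this might be driven by something like setInterval(check, 10000), with onPressure clearing application caches.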

-Darin

Elliott Sprehn

Mar 30, 2015, 12:36:06 AM
to Darin Fisher, Mike Lawther, Dominic Cooney, blink-dev
On Sun, Mar 29, 2015 at 9:26 PM, Darin Fisher <da...@chromium.org> wrote:
> +1
>
> Plus, perhaps prior to killing the page we should dispatch an event to the page to let it know when memory is getting tight. Maybe a web developer could respond to that by clearing their own caches, etc.


+1, developers have been asking for low memory notifications (like other native platforms) for a very long time.

- E

Peter Kasting

Mar 30, 2015, 12:36:46 AM
to Dominic Cooney, blink-dev
I'm totally hijacking this thread here, but -- I've noticed WAY more "He's dead, Jim" renderer kill pages in my usage of Dev channel over the last month or two compared to previously.  I've been assuming this is an OOM kill, but I guess I don't really know.  Are we tracking these sorts of things via crash dumps or the like?  Is there something I should be doing to report these when they happen, especially when they seem to be unpredictable/not reproducible on command?

PK

Kentaro Hara

Mar 30, 2015, 12:39:53 AM
to Elliott Sprehn, Darin Fisher, Mike Lawther, Dominic Cooney, blink-dev
+1. Is there any event we can use for low memory notifications? Or is this something we need to spec?

Elliott Sprehn

Mar 30, 2015, 1:13:22 AM
to Kentaro Hara, Darin Fisher, Mike Lawther, Dominic Cooney, blink-dev
A new event must be added. Historically there's been some tension in the standards community around adding such an event.


Someone probably needs to take ownership of making this happen. :)

- E 

Julien Chaffraix

Mar 30, 2015, 11:20:50 AM
to Kentaro Hara, Chris Harrelson, Mike Lawther, Dominic Cooney, blink-dev
On Sun, Mar 29, 2015 at 9:20 PM, Kentaro Hara <har...@chromium.org> wrote:
>> > OOM has been a big issue in the memory team, but the hard part is that
>> > we cannot reproduce the reported OOM in most cases. If we can get a list of
>> > URLs that cause OOM reliably, that is very helpful (especially when we ship
>> > Oilpan).
>> >
>> > Rather than suppressing OOM somehow, I'm interested in getting the list
>> > of URLs and fixing the OOM.
>>
>> +1 to collecting this data. Do we have any such data at present in crash
>> reporting?
>
>
> Yes, we have data and are planning to keep tracking of the data (especially
> when shipping Oilpan).
>
> BTW, OOM crash is the worst scenario caused by memory increase. It is indeed
> important to keep track of OOM crashes, but it would be also important to
> have a more "modest" metric with which we can catch memory increase in
> real-world websites.
>
> Here is an idea for the "modest" metric:
>
> Add a UseCounter which is counted up when the memory consumption exceeds X
> MB (where X is 256, 512, 1024, 2048 etc). The UseCounter is counted up only
> once per renderer process. For example, if the memory used by a renderer
> process has changed like 0 MB => 1500 MB => 500 MB => 1500 MB, then the
> UseCounter for 256 MB is counted up once, the UseCounter for 512 MB is
> counted up once and the UseCounter for 1024 MB is counted up once.
>
> The UseCounter for X MB implies the number of renderer processes that will
> crash by OOM under the assumption that the memory each renderer process can
> use is limited to X MB. By observing the UseCounters for various X's, I
> guess we can understand the peak memory increase of Chrome in real-world
> websites.
>
> What do you think? If it looks good, I can implement it to ParitionAlloc and
> Oilpan.

I am concerned about the actionability of this UseCounter (not about
its usefulness as a thought experiment, it's definitely worthwhile).

If you find out that X% of the web-pages reach the limit, what would
be the next step(s)? Without some extra information (e.g. a URL) to
point back to and investigate / correlate, there is little that can be
done and we may as well instrument with Telemetry.

Also, a data point on that: we landed a UseCounter to measure layout
time in ms and it didn't really give us much [1]. This UseCounter is
also trying to make a continuous distribution of pages fit into a
single variable. That's probably going to smooth out the most
interesting information.

Julien

[1] See the description of the revert that explains the findings:
https://codereview.chromium.org/969973002

Daniel Bratell

Mar 30, 2015, 12:39:28 PM
to Dominic Cooney, Mike Lawther, blink-dev, Joakim Bengtsson
On Fri, 27 Mar 2015 06:50:40 +0100, Mike Lawther <mikel...@chromium.org> wrote:

> My late Thursday night crazy counter-idea :) Pages that leak should crash faster not slower. We're not doing anyone any favours keeping them on life-support. The only amelioration should be to give the user a chance to save any data - but I suspect they would be better served by having the page designed to be resilient to unexpected death in the first place (eg persist the user's novel-in-progress to server or local storage).

I would like some kind of clear indication that tells me a tab is making my computer usage slow, noisy and painful. High and increasing memory usage is such a thing, but so is high CPU usage from a tab. Then I would prefer it be left up to me to decide if I want that "bad" tab running or not.

I've never gotten that idea past any UI designer though (in the end I implemented the secret opera:cpu in Opera ~11-12 for me and me only :-) ), but I still think it would be a good thing if someone could figure out a good UI for it. An intense pulsating tab background?

From the Chromium side it becomes a matter of tracking memory usage per tab, which is harder than it sounds: memory usage is typically tracked per process, there can be more than one tab per process, and more than one process per tab (browser data, renderer, GPU and plugin all contribute). Our LinuxSDK products have memory limitations and sandboxed memory usage, but I suspect that is primarily written for the embedded scenario (few "tabs", very little memory overall, no pagefile). Adding jb@op to CC in case he has anything to add/correct.

/Daniel

Kentaro Hara

Mar 30, 2015, 7:25:51 PM
to Julien Chaffraix, Chris Harrelson, Mike Lawther, Dominic Cooney, blink-dev
> I am concerned about the actionability of this UseCounter (not about
> its usefulness as a thought experiment, it's definitely worthwhile).

You're right. The UseCounter for the memory usage is just for confirming how well we're doing. It is a signal that indicates we have to take _some_ action (or we don't have to take any action) but it doesn't give us a hint on what the _some_ action should be. For the action part, we can use OOM crash reports which contain URLs. I've just added a never-inlined function that produces OOM crash reports in Oilpan [1].

I'm concerned about a scenario where Oilpan increases memory usage and/or leaks on real-world websites. I want to have a way to detect it so that we can take action if needed.

Either way I think it is helpful to get a big picture of the memory usage in real-world websites :)


> If you find out that X% of the web-pages reach the limit, what would
> be the next step(s)? Without some extra information (e.g. a URL) to
> point back to and investigate / correlate, there is little that can be
> done and we may as well instrument with Telemetry.

FWIW, we're tracking GC pause times using UseCounters and that information is helpful to confirm that almost all of them fit in 5 ms.



Mike Lawther

Mar 31, 2015, 2:49:17 AM
to Kentaro Hara, Chris Harrelson, Dominic Cooney, blink-dev
We already have 'Memory.Renderer', which is 'The private working set used by each renderer process. Each renderer process provides one sample. Recorded once per UMA ping.' Units are KB.

Does this give pretty much the same information already?

Just for fun, here is mine from Mac 43.0.2342.2 (Official Build) dev (64-bit). I'm a little suspicious that the top one is exactly 500000KB though.

Histogram: Memory.Renderer recorded 2017 samples, average = 120310.7 (flags = 0x1)
0 O (1 = 0.0%)
1000 ...
25447 -O (3 = 0.1%) {0.0%}
28965 -------------------------------O (92 = 4.6%) {0.2%}
32969 ----------------------------------------------------------------------O (209 = 10.4%) {4.8%}
37526 ------------------------------------O (108 = 5.4%) {15.1%}
42713 -----------------------------------O (104 = 5.2%) {20.5%}
48617 --------------------------------------------------O (148 = 7.3%) {25.6%}
55338 ----------------------------------------------O (137 = 6.8%) {33.0%}
62988 ------------------------------------O (108 = 5.4%) {39.8%}
71695 ----------------------O (64 = 3.2%) {45.1%}
81606 ----------------O (49 = 2.4%) {48.3%}
92887 -------------------O (57 = 2.8%) {50.7%}
105727 ----------------------------------O (100 = 5.0%) {53.5%}
120342 ------------------------------------------------------------------------O (214 = 10.6%) {58.5%}
136978 ------------------------------------------O (126 = 6.2%) {69.1%}
155913 -----------------------------------------O (122 = 6.0%) {75.4%}
177466 --------------------------O (78 = 3.9%) {81.4%}
201998 ----------------------O (65 = 3.2%) {85.3%}
229921 -----------------------O (68 = 3.4%) {88.5%}
261704 ----------------O (48 = 2.4%) {91.9%}
297881 ----------O (30 = 1.5%) {94.2%}
339058 ------O (19 = 0.9%) {95.7%}
385928 -------O (22 = 1.1%) {96.7%}
439277 --------O (23 = 1.1%) {97.8%}
500000 -------O (22 = 1.1%) {98.9%}

Kentaro Hara

Mar 31, 2015, 9:01:28 AM
to Mike Lawther, Chris Harrelson, Dominic Cooney, blink-dev
> We already have 'Memory.Renderer', which is 'The private working set used by each renderer process. Each renderer process provides one sample. Recorded once per UMA ping.' Units are KB.
>
> Does this give pretty much the same information already?

Yeah, this is something I want, although I want to split the result into V8's memory, PartitionAlloc's memory and Oilpan's memory at least (in order to verify that Oilpan is not doing a bad thing in the real world). As far as I can tell from OOM crash reports, V8 is much more likely to hit OOM than PartitionAlloc, so I guess V8's memory dominates the renderer memory usage.

Daniel Bratell

Mar 31, 2015, 10:57:56 AM
to Kentaro Hara, Mike Lawther, Chris Harrelson, Dominic Cooney, blink-dev
On Tue, 31 Mar 2015 08:48:53 +0200, Mike Lawther <mikel...@chromium.org> wrote:

> We already have 'Memory.Renderer', which is 'The private working set used by each renderer process. Each renderer process provides one sample. Recorded once per UMA ping.' Units are KB.
>
> Does this give pretty much the same information already?
>
> Just for fun, here is mine from Mac 43.0.2342.2 (Official Build) dev (64-bit). I'm a little suspicious that the top one is exactly 500000KB though.

It's buckets between 0 and 500000, divided on a log scale. I don't know exactly when the data is sampled; probably not at the end of a long session when the renderer has grown, but even so, 10% of renderers report 200 MB or more. Some web pages are fat. Whether Blink/V8 contributes to that size or the pages themselves are to blame would be excellent information.

/Daniel
