V8 GC timing in createElement benchmarks

18 views
Skip to first unread message

Kentaro Hara

unread,
Sep 30, 2015, 8:36:43 AM9/30/15
to oilpan-...@chromium.org, Jochen Eisinger, ul...@chromium.org, hpayer
Hi

I'm fixing performance regressions introduced by shipping Oilpan. Currently Dromaeo.dom-modify is regressing by 10% and I noticed that this is a regression caused by V8 GC timing. Since I didn't observe the regression a couple of months ago, this may be a regression introduced recently.

Consider the following code:

  for (i = 0; i < 500000; i++)
    document.createElement("select");

If we run the code with oilpan, we get the following sequence of Oilpan's GCs.

conservative GC  <---- Oilpan triggers a GC
marking=0.10 ms, markingTotal=573.12 ms
postSweep: collectionRate=0.00 <---- collection rate is 0% because all wrappers are alive in the V8 side
conservative GC
collectGarbage: ConservativeGC
marking=7.23 ms, markingTotal=580.35 ms
postSweep: collectionRate=0.00
conservative GC
collectGarbage: ConservativeGC
marking=51.81 ms, markingTotal=632.16 ms
postSweep: collectionRate=0.00
memory pressure GC  <---- Oilpan finally triggers a memory pressure GC (i.e., urgent GC)
collectGarbage: ConservativeGC
marking=82.13 ms, markingTotal=714.30 ms
postSweep: collectionRate=0.00
memory pressure GC
collectGarbage: ConservativeGC
marking=156.17 ms, markingTotal=870.47 ms  <---- The marking time exceeds 150 ms... No V8 GCs are triggered during the benchmark.
postSweep: collectionRate=0.00

No V8 minor/major GC is triggered during the benchmark. Consequently, the number of objects Oilpan's GC must trace increases and the marking time eventually exceeds 150 ms.

V8's major GC is sometimes triggered during the benchmark but it is unhelpful because the weak processing doesn't run until we finish the current task (i.e., Oilpan needs to trace all wrappers anyway).

Is this an expected behavior? Any advice is welcome to fix the long marking time :)

Thanks!

--
Kentaro Hara, Tokyo, Japan

Kentaro Hara

unread,
Sep 30, 2015, 8:38:40 AM9/30/15
to oilpan-...@chromium.org, Jochen Eisinger, ul...@chromium.org, hpayer
I attached a minimized benchmark that exposes the issue.
select_element.html

Kentaro Hara

unread,
Oct 2, 2015, 5:49:50 AM10/2/15
to oilpan-...@chromium.org, Jochen Eisinger, ul...@chromium.org, hpayer
Do you have any advice on this issue?

This is blocking us from shipping oilpan. If you don't have a plan to fix this soon, one option I can take is to add some hack to Oilpan's GC so that we don't trigger Oilpan's GC until V8's GC is observed (when the collection rate is extremely low).

Ulan Degenbaev

unread,
Oct 2, 2015, 6:01:37 AM10/2/15
to Kentaro Hara, oilpan-...@chromium.org, Jochen Eisinger, hpayer
> Since I didn't observe the regression a couple of months ago, this may be a regression introduced recently.
> No V8 minor/major GC is triggered during the benchmark. 
Was there a change related to calling AdjustAmountOfExternalAllocatedMemory for document.createElement in blink recently?

Kentaro Hara

unread,
Oct 2, 2015, 6:06:15 AM10/2/15
to Ulan Degenbaev, oilpan-...@chromium.org, Jochen Eisinger, hpayer
On Fri, Oct 2, 2015 at 7:01 PM, Ulan Degenbaev <ul...@google.com> wrote:
> Since I didn't observe the regression a couple of months ago, this may be a regression introduced recently.
> No V8 minor/major GC is triggered during the benchmark. 
Was there a change related to calling AdjustAmountOfExternalAllocatedMemory for document.createElement in blink recently?

No, the amount of memory consumed by normal Nodes has not been reported by AdjustAmountOfExternalAllocatedMemory (for a couple of years).

Ulan Degenbaev

unread,
Oct 2, 2015, 6:27:29 AM10/2/15
to Kentaro Hara, oilpan-...@chromium.org, Jochen Eisinger, hpayer
I removed the gc() calls from the repro test you attached, and still get a lot of V8 GCs at TOT(revision 351991). How do I reproduce the "no v8 GC" case?

[13596:0xaaa5eb24000]      821 ms: Scavenge 2.7 (39.1) -> 2.3 (39.1) MB, 0.9 / 0 ms [allocation failure].
[13596:0xaaa5eb24000]      839 ms: Scavenge 2.8 (39.1) -> 2.5 (40.1) MB, 2.4 / 0 ms [allocation failure].
[13596:0xaaa5eb24000]      916 ms: Scavenge 3.3 (40.1) -> 3.0 (40.1) MB, 4.7 / 2 ms [allocation failure].
[13596:0xaaa5eb24000]     1009 ms: Scavenge 4.5 (40.1) -> 4.1 (40.1) MB, 6.3 / 2 ms [allocation failure].
[13596:0xaaa5eb24000]     1046 ms: Scavenge 4.5 (40.1) -> 4.2 (41.1) MB, 7.6 / 2 ms [allocation failure].
[13596:0xaaa5eb24000]     1256 ms: Scavenge 7.7 (41.1) -> 7.3 (41.1) MB, 8.9 / 3 ms [allocation failure].
[13596:0xaaa5eb24000]     1295 ms: Scavenge 7.7 (41.1) -> 7.3 (44.1) MB, 14.3 / 3 ms [allocation failure].
[13596:0xaaa5eb24000]     1736 ms: Scavenge 14.8 (44.1) -> 14.4 (44.1) MB, 15.4 / 4 ms [allocation failure].
[13596:0xaaa5eb24000]     1792 ms: Scavenge 14.8 (44.1) -> 14.5 (51.1) MB, 31.5 / 5 ms [allocation failure].
[13596:0xaaa5eb24000] Increasing marking speed to 3 due to high promotion rate
[13596:0xaaa5eb24000]     2061 ms: Mark-sweep 16.2 (51.1) -> 1.9 (41.1) MB, 106.2 / 63 ms (+ 2.3 ms in 27 steps since start of marking, biggest step 0.1 ms) [GC interrupt] [GC in old space requested].
[13596:0xaaa5eb24000]     3031 ms: Scavenge 17.7 (41.1) -> 17.3 (41.1) MB, 42.0 / 19 ms [allocation failure].
[13596:0xaaa5eb24000]     3120 ms: Scavenge 17.7 (41.1) -> 17.3 (54.1) MB, 67.3 / 20 ms [allocation failure].
[13596:0xaaa5eb24000] Increasing marking speed to 3 due to high promotion rate
[13596:0xaaa5eb24000]     3416 ms: Mark-sweep 19.2 (54.1) -> 1.9 (42.1) MB, 113.4 / 65 ms (+ 9.3 ms in 28 steps since start of marking, biggest step 6.7 ms) [GC interrupt] [GC in old space requested].
[13596:0xaaa5eb24000]     4338 ms: Scavenge 17.7 (42.1) -> 17.3 (42.1) MB, 30.4 / 9 ms [allocation failure].
[13596:0xaaa5eb24000]     4413 ms: Scavenge 17.7 (42.1) -> 17.3 (54.1) MB, 53.9 / 10 ms [allocation failure].
[13596:0xaaa5eb24000] Increasing marking speed to 3 due to high promotion rate
[13596:0xaaa5eb24000]     4698 ms: Mark-sweep 19.2 (54.1) -> 1.9 (42.1) MB, 110.7 / 60 ms (+ 3.7 ms in 28 steps since start of marking, biggest step 1.4 ms) [GC interrupt] [GC in old space requested].
[13596:0xaaa5eb24000]     5606 ms: Scavenge 17.7 (42.1) -> 17.3 (42.1) MB, 29.8 / 9 ms [allocation failure].
[13596:0xaaa5eb24000]     5671 ms: Scavenge 17.7 (42.1) -> 17.3 (54.1) MB, 44.7 / 9 ms [allocation failure].
[13596:0xaaa5eb24000] Increasing marking speed to 3 due to high promotion rate
[13596:0xaaa5eb24000]     5966 ms: Mark-sweep 19.2 (54.1) -> 1.9 (42.1) MB, 128.7 / 77 ms (+ 3.1 ms in 28 steps since start of marking, biggest step 1.1 ms) [GC interrupt] [GC in old space requested].


Kentaro Hara

unread,
Oct 2, 2015, 6:40:43 AM10/2/15
to Ulan Degenbaev, oilpan-...@chromium.org, Jochen Eisinger, hpayer
Oh, sorry!!! I noticed that this is an Oilpan bug.

Without oilpan, the amount of memory used by normal Nodes is reported by AdjustAmountOfMemory(). So a lot of V8 GCs are properly triggered when many Nodes are allocated.


However, with oilpan, it is not reported because Partitions::currentDOMMemoryUsage() returns 0 (because Nodes are no longer allocated in PartitionAlloc). So no V8 GCs are triggered even when many Nodes are allocated.

This is an Oilpan bug. Let me fix it. I'm sorry about the noise :)

Ulan Degenbaev

unread,
Oct 2, 2015, 6:46:54 AM10/2/15
to Kentaro Hara, oilpan-...@chromium.org, Jochen Eisinger, hpayer
No problem :) I am glad that we found it.

Kentaro Hara

unread,
Oct 2, 2015, 8:25:11 AM10/2/15
to oilpan-...@chromium.org
-V8 guys

Without oilpan, the amount of memory used by normal Nodes is reported by AdjustAmountOfMemory(). So a lot of V8 GCs are properly triggered when many Nodes are allocated. 
 
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/core/v8/V8GCController.cpp&sq=package:chromium&type=cs&l=509&rcl=1443756686 
 
However, with oilpan, it is not reported because Partitions::currentDOMMemoryUsage() returns 0 (because Nodes are no longer allocated in PartitionAlloc). So no V8 GCs are triggered even when many Nodes are allocated.

This analysis is correct, but I noticed that currentDOMMemoryUsage() is not the cause of the createElement regression.

The value returned by currentDOMMemoryUsage() is reported to V8 only at didProcessTask (i.e., at the end of the event loop), so whether it is reported or not doesn't change the timing of V8 GCs scheduled during the event loop (i.e., it doesn't change the timing of V8 GCs scheduled during createElement.html).

A real problem is as follows.

- V8 minor GCs are triggered during createElement.html.
- However, the number of wrappers one minor GC can collect is limited to 10000 to avoid causing a long pause time in the minor GC (https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/core/v8/V8GCController.cpp&q=v8gccontroller&sq=package:chromium&type=cs&l=124).
- As a result, the minor GC collects only 10000 wrappers. A lot of wrappers are promoted to the old space without being collected. The number of wrappers exceed 400000.
- If Oilpan's GC is triggered, it needs to trace the 400000 wrappers. This causes the 150 ms pause.

I think there are two solutions for this issue.

(a) Drop the limit of 10000 only in oilpan builds.

(b) If there are a lot of wrappers, don't trigger Oilpan's memory pressure GC and conservative GC until a V8 major GC is triggered.

I'd prefer (a). Dropping the limit doesn't make sense in non-oilpan but makes sense in oilpan. If we don't collect wrappers that are collectable in a minor GC, it will just increase the pause time of the next Oilpan's GC(s). In order to decrease max(pause time of V8 minor GC, pause time of Oilpan's GC), it is better to collect as many wrappers as possible in the minor GC. Also I'm not sure how likely we will exceed the limit of 10000 in real-world applications. (BTW, haraken@ is the guy who added the limit three years ago :-)

I'm okay with (b), but if we set a improper threshold, we have a risk of hitting OOM.

Sigbjorn: What do you think?

Sigbjorn Finne

unread,
Oct 2, 2015, 8:48:14 AM10/2/15
to Kentaro Hara, oilpan-...@chromium.org
The testcase in the commit message for that limit (
https://bugs.webkit.org/attachment.cgi?id=185940 ) looks suspiciously
familiar also :)

Try going with (a)? It does look like false economy to limit upfront
minor GC cost like that, _but_ we are dealing with pathological inputs
here. (So you could argue that spending longer time in minor GCs for
those cases is acceptable, I suppose.)

--sigbjorn
Reply all
Reply to author
Forward
0 new messages