Memory team snippet


Kentaro Hara

Jun 28, 2015, 8:21:53 PM
to blink-dev
(haraken, all) Discussed Q3 goals of the memory team. In short, we'll focus on the following items in Q3:

- Finish shipping Oilpan
- Understand Blink's memory usage in real-world websites and provide continuous data about the memory usage (using memory-infra + telemetry)
- Support more objects in the leak detector and fix the detected leaks
- Introduce a low-device mode to Blink (which aggressively discards memory)
- PartitionAlloc everywhere (in both Chromium and Blink)
- Make buffer allocation faster and more space-efficient (Vector, ArrayBuffer, StringBuilder etc)

(yutak, haraken) Investigated peak memory increases sometimes observed in Oilpan and mostly identified the reasons:

- First, as Oilpan triggers GCs more frequently, PartitionAlloc's memory usage increases. This sounds really strange because it means that more GCs lead to higher memory usage. It took us a couple of days to identify that this is (just) because as Oilpan runs GCs more frequently, more TraceEvents are recorded. The TraceEvent objects are allocated in PartitionAlloc. LOL :)

- The second reason is a serious one -- the "floating garbage" problem (https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html). Imagine a DOM object X that can be collected in one V8 GC cycle in the non-Oilpan world. In the Oilpan world, it is possible that the DOM object X needs a sequence of Oilpan GC => V8 major GC => Oilpan GC to get collected. This delays object destruction and can increase the peak memory usage. In the long term, this problem is going to be solved by unifying V8's GC with Oilpan's GC (i.e., making Oilpan's objects traceable from the V8 GC; in that world, we can collect the DOM object X in one V8 GC cycle), but we need a short-term solution to ship Oilpan. Still investigating.

(keishi, haraken) Finally got a full understanding of the lifetime relationship of accessibility objects. Fixed a couple of existing leaks. Landed a CL to move accessibility/ to Oilpan (but was reverted).

(tasak) Collecting and analyzing performance and memory data of PartitionAlloc-Chromium on Android vs. jemalloc/dlmalloc-Chromium on Android. It seems that jemalloc and dlmalloc are highly optimized for Android and PartitionAlloc might not be a clear win there.

(sigbjorn) Took care of all crashes detected by ClusterFuzz in Oilpan builds. Now we're ready to enable lazy sweeping on trunk, but it is blocked by a clang bug around ASan.

(bashi) Working on memory-infra + telemetry to get Blink's memory usage in real-world websites.

(peria) Fixing leaks of core/fetch/. Collecting performance of core/animations/.



--
Kentaro Hara, Tokyo, Japan

Kentaro Hara

Jun 29, 2015, 2:12:02 AM
to blink-dev
Forgot to add:

(cevans) Implemented a purge memory API that scans all heaps in PartitionAlloc, finds unused system pages, and discards those system pages as much as possible. This is a powerful API for reducing PartitionAlloc's memory usage. A bit more study is needed to decide when to dispatch the purge API.
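For what it's worth, the core mechanism is roughly "walk the partitions, find system-page-aligned regions that hold no live allocations, and tell the OS it may reclaim them". A minimal sketch of that idea only -- freeSystemPageRanges() below is a made-up placeholder for the real partition traversal, not the actual PartitionAlloc API:

#include <sys/mman.h>  // madvise, MADV_DONTNEED (POSIX; other platforms need a different call)
#include <cstddef>
#include <vector>

struct PageRange {
  void* address;
  size_t length;  // multiple of the system page size
};

// Placeholder for the real work: enumerate system-page-aligned ranges that
// currently hold no live allocations.
static std::vector<PageRange> freeSystemPageRanges() {
  return {};
}

// Sketch of a purge pass: the virtual mappings stay valid, but the kernel is
// free to drop the physical pages backing them until they are touched again.
static void purgeUnusedSystemPages() {
  for (const PageRange& range : freeSystemPageRanges())
    madvise(range.address, range.length, MADV_DONTNEED);
}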

(ssid, primiano, haraken) Exporting more heap/object information of Oilpan to memory-infra. It's mostly working with the cool UI!



Daniel Bratell

Jun 29, 2015, 4:58:16 AM
to blink-dev, Kentaro Hara
On Mon, 29 Jun 2015 02:21:19 +0200, Kentaro Hara <har...@chromium.org> wrote:

- Introduce a low-device mode to Blink (which aggressively discards memory)

I don't know if there is a design for this, but I would recommend staying away from a "either low-device or high-device" approach and instead have some kind of numerical setting for how aggressive this should be. Devices and personal computers come in a range of configurations and working environments and "two sizes fit all" will not work.

Furthermore, to complicate things, a desktop computer with 8 GB of RAM running a heavy background task (or maybe just many tabs) might be equally short of memory as a mobile phone with 768 MB or a TV with 256 MB.

/Daniel

--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Kentaro Hara

Jun 29, 2015, 5:31:00 AM
to Daniel Bratell, blink-dev
I don't know if there is a design for this, but I would recommend staying away from a "either low-device or high-device" approach and instead have some kind of numerical setting for how aggressive this should be. Devices and personal computers come in a range of configurations and working environments and "two sizes fit all" will not work.

I don't yet have a design doc. What I'm planning is something like:

- Purge unused system pages in PartitionAlloc & Oilpan.
- Get various uncontrolled caches in Blink under control and discard the contents.
- (Experimental) Forcibly drop references to V8 wrappers created by a V8 context that we navigated away from a long time ago.

FWIW, V8, GPU buffer etc already support a low-device mode to save memory aggressively.

Daniel Bratell

Jun 29, 2015, 8:24:49 AM
to Kentaro Hara, blink-dev
On Mon, 29 Jun 2015 11:30:27 +0200, Kentaro Hara <har...@chromium.org> wrote:

I don't know if there is a design for this, but I would recommend staying away from a "either low-device or high-device" approach and instead have some kind of numerical setting for how aggressive this should be. Devices and personal computers come in a range of configurations and working environments and "two sizes fit all" will not work.

I don't yet have a design doc. What I'm planning is something like:

- Purge unused system pages in PartitionAlloc & Oilpan.
- Get various uncontrolled caches in Blink under control and discard the contents.

These seem like good things to do regardless of device if the memory pressure is high. Or good things to do regardless, full stop. Right now we depend a bit too much on the paging system to remove unused memory from RAM, and that won't work if there is no pagefile. If there is a pagefile, the paging system can remove unused memory, but only at the cost of writing it to a slow disk (very slow compared to RAM).

- (Experimental) Forcibly drop references to V8 wrappers created by a V8 context that we navigated away from a long time ago.

It says experimental, but this would break things visibly, right? If someone/something keeps an object alive, it *might* be to take a look at it later, and it wouldn't want that object to have been broken. So this is only to be used when desperate.

FWIW, V8, GPU buffer etc already support a low-device mode to save memory aggressively.

I don't mind tuning for devices, but I mind trying to divide the set of all possible devices into two piles, one that is given "memory efficient but a little bit slower" code, and one that is given "fast, but wasteful of memory" code. That is basically what Presto did 10-15 years ago, and we ended up replacing it with more fine-tuned solutions, because what we got in ~2003 was "slow code that uses less memory" and "memory-hungry code that is a bit faster" with nothing in between.

I would much rather have code that tunes itself to the circumstances, or that can at least be tuned in a range of settings from "I'm desperate" to "I have access to all the world's RAM and CPU".

(I believe the v8 code has 4 different settings for memory usage plus the "low device" mode).

Kentaro Hara

Jun 29, 2015, 10:07:13 AM
to Daniel Bratell, blink-dev
Just to clarify, I'm not intending to make an intrusive change to the Blink code base to support a low-end device mode. What I'm planning is a modest one:

- Register Blink's caches (e.g., V8PerIsolateData has a lot of caches that store V8 objects such as v8::FunctionTemplate) with a CacheController. The CacheController aggressively discards the caches in the low-end device mode (a rough sketch follows this list).

- Memory allocators (PartitionAlloc & Oilpan) expose purge APIs to discard as many unused system pages as possible. The purge APIs are called aggressively in the low-end device mode etc.
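Since there's no design doc yet, here is only a rough illustration of the registration idea; CacheController and DiscardableCache below are hypothetical names for this sketch, not existing Blink classes:

#include <vector>

// Hypothetical interface for anything that can drop its contents on demand.
class DiscardableCache {
public:
  virtual ~DiscardableCache() = default;
  virtual void discard() = 0;  // Drop cached contents; they can be rebuilt lazily.
};

// Hypothetical controller: caches register themselves, and under memory
// pressure (or always, in the low-end device mode) everything registered
// gets discarded.
class CacheController {
public:
  void registerCache(DiscardableCache* cache) { m_caches.push_back(cache); }

  void onMemoryPressure(bool lowEndDeviceMode) {
    if (!lowEndDeviceMode)
      return;  // A real implementation would likely have graduated levels.
    for (DiscardableCache* cache : m_caches)
      cache->discard();
  }

private:
  std::vector<DiscardableCache*> m_caches;  // Caches would unregister on destruction (omitted).
};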

bashi@ is now implementing a framework with memory-infra + telemetry, which will give us Blink's memory usage in real-world websites. I'm planning to use it as a metric of the memory reduction work.



Tom Hudson

Jun 29, 2015, 10:35:12 AM
to Daniel Bratell, Kentaro Hara, blink-dev
Since 4.4, the Android kernel has had a "low RAM device" flag (https://source.android.com/devices/tech/ram/low-ram.html) set on 512MB devices. More nuance would be nice, but that's what the OS offers us at this time.

Tom

Daniel Bratell

Jun 29, 2015, 1:00:53 PM
to Tom Hudson, Kentaro Hara, blink-dev
I don't know the Android APIs, but maybe there are ways to manually check memory usage and how much seems to be available. I suspect my 768 MB phone with too many background apps installed is more memory starved than a brand new 512 MB phone. As long as the browser doesn't have access to all the hardware, checking what is available seems more valuable than checking what is installed.

Primiano Tucci

Jun 29, 2015, 1:32:23 PM
to Daniel Bratell, Tom Hudson, Kentaro Hara, blink-dev
Hmm, I am not convinced this strategy would work on Android, given the way the lowmem killer and the framework work.
TL;DR: Android kills as many non-fg apps as needed to try to maintain a margin of free memory (this is technically wrong in a lot of subtle aspects and corner cases, but is a good approximation of reality).
I bet that what you will find if you measure the free memory on an Android system is that, with the exception of a fresh boot, the amount stays pretty much constant. In other words: if you become a better citizen and use less memory, you will likely cause a previously running app to not be evicted and to stay there. Which is good, but not directly measurable by looking at the amount of free memory in the system.

Daniel Bratell

Jun 29, 2015, 4:19:46 PM
to Primiano Tucci, Tom Hudson, Kentaro Hara, blink-dev
On Mon, 29 Jun 2015 19:32:19 +0200, Primiano Tucci <prim...@chromium.org> wrote:

Hmm, I am not convinced this strategy would work on Android, given the way the lowmem killer and the framework work.
TL;DR: Android kills as many non-fg apps as needed to try to maintain a margin of free memory (this is technically wrong in a lot of subtle aspects and corner cases, but is a good approximation of reality).
I bet that what you will find if you measure the free memory on an Android system is that, with the exception of a fresh boot, the amount stays pretty much constant. In other words: if you become a better citizen and use less memory, you will likely cause a previously running app to not be evicted and to stay there. Which is good, but not directly measurable by looking at the amount of free memory in the system.

True. 70 MB "free" seems to be the steady state here.

Would it be possible to do some math with the various processes' memory usage from a userspace app, so that only the memory usage of "active" processes is subtracted from the total memory, or is that kind of information not available? It is the kind of thing you would expect an operating system to do, but running a whole set of processes with various modes and priorities is already diving into the OS level.
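For what it's worth, kernels from 3.14 onwards expose a MemAvailable estimate in /proc/meminfo that tries to answer exactly the "how much could be used without thrashing" question, although many Android devices of this generation won't have it. A rough userspace sketch (error handling omitted):

#include <fstream>
#include <sstream>
#include <string>

// Returns the kernel's estimate of available memory in kB, or -1 if the
// MemAvailable field is missing (it was only added in Linux 3.14).
long readMemAvailableKB() {
  std::ifstream meminfo("/proc/meminfo");
  std::string line;
  while (std::getline(meminfo, line)) {
    if (line.compare(0, 13, "MemAvailable:") == 0) {
      std::istringstream fields(line.substr(13));
      long kb = -1;
      fields >> kb;
      return kb;
    }
  }
  return -1;
}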

Kentaro Hara

Jul 5, 2015, 8:03:33 PM
to blink-dev
Hi

Note: Currently stack traces of ASan + Oilpan are broken due to a LLVM bug. A workaround is to use "--no-sandbox". See https://code.google.com/p/chromium/issues/detail?id=502974 for more details.

Oilpan things:

(sigbjornf, haraken) Finally enabled lazy sweeping on trunk. We're carefully watching if the lazy sweeping causes any issue around destruction ordering on trunk. Landed another ASan verification to detect use-after-free in Oilpan's heaps (mostly associated with destruction ordering issues in Blink).

(haraken) Enabled idle GCs on trunk. However, it was reverted because it broke telemetry tests.

(peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.

(yutak, keishi) Investigating Blink's memory workload and its relationship to Oilpan's GC and V8's GC. yutak@ identified that (1) forcing a precise GC at a page navigation completely solves the peak memory increase of Blogger (but we cannot simply do this because it can lead to too many precise GCs), (2) if we properly optimize GC timings, the peak memory increases in other memory benchmarks are gone (which means that the peak memory increases are not caused by Oilpan's memory allocation strategy -- e.g., type-specific heaps, not yet discarding unused system pages, worst-fit allocation etc; we identified that what matters is just GC timing). yutak@ is investigating more to optimize the heuristics that determines the GC timing.

(haraken) Reduced sizeof(Persistent) from 4*sizeof(void*) to 2*sizeof(void*). This is important to reduce sizeof(DOM object) and thus improve cache locality.

(haraken) Landed a change to decommit unused mmap regions more aggressively. Another idea to further reduce Oilpan's memory usage is to find unused system pages while sweeping and discard them. Experimenting.

(sigbjornf) Moved ScrollableArea, EventSource etc to Oilpan's heap.


Non-oilpan things:

(hajimehoshi) hajimehoshi@ is back to the memory team! Supported ActiveDOMObjects in the leak detector (https://docs.google.com/document/d/1sFAsZxeISKnbGdXoLZlB2tDZ8pvO102ePFQx6TX4X14/edit). Then the leak detector detected 120 leaks in existing layout tests... Also supported ScriptPromise, Frame etc in the leak detector. We're planning to distribute the leak-fix work once we get a list of problematic leaks.

(bashi) Managed to make memory-infra + telemetry workable and collected the first results on Blink's memory usage relative to the total memory usage of a renderer process in real-world websites. Analyzing the data to make sure that it is correct and consistent.

(tasak) Re-collecting performance & memory data of full-PartitionAlloc-Chromium, full-tcmalloc-Chromium and full-system-allocator-Chromium on Win 32 bit, Win 64 bit, Mac, Linux and Nexus.

(haraken) Started experimenting with exporting Oilpan's buffer allocator to normal Vectors, HashMaps, StringBuilders, ArrayBuffers etc. It will take a couple of weeks to make it workable and collect data.

Primiano Tucci

Jul 6, 2015, 6:37:14 AM
to Kentaro Hara, blink-dev, mig...@chromium.org, pic...@chromium.org
(peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.

Many thanks for getting all the numbers. I was looking at both the doc (which doesn't seem open to comments) and the telemetry data and I see some numbers that concern me a bit.

The doc mentions "Those regressions are at most 9MB in real number, and are small enough to be ignored.". I don't know if there is one, or what the decision threshold is on desktop platforms, but on Android regressions of a few MB in the page cyclers have proven to be noticed and problematic (crbug.com/475637 is just one of the recent horror stories that comes to mind, and that was about a +2 MB delta)

I see some regressions in the order of MB in the telemetry data you reported. Do we understand where they come from? Is there anything we can do there?

Metric: Memory/vm_private_dirty_final_renderer (summarizing the biggest deltas, |delta| > 1M)
Nexus 7
Blogger: +4M
Wordpress: -1M (Good)

Nexus 4
Blogger: +6.2M
G+: +1.88 M
Facebook -1M (Good)

Desktop
Blogger: +16.8 M
Facebook: +2.4M
Gmail: +6.7 M
G+: +3M
Wordpress: +2M

Thanks,
Primiano
-- 

Kentaro Hara

Jul 6, 2015, 6:52:39 AM
to Primiano Tucci, blink-dev, mig...@chromium.org, pic...@chromium.org
On Mon, Jul 6, 2015 at 7:37 PM, Primiano Tucci <prim...@chromium.org> wrote:
(peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.

Many thanks for getting all the numbers. I was looking at both the doc (which doesn't seem open to comments) and the telemetry data and I see some numbers that concern me a bit.

The doc mentions "Those regressions are at most 9MB in real number, and are small enough to be ignored.". I don't know if there is one, or what the decision threshold is on desktop platforms, but on Android regressions of a few MB in the page cyclers have proven to be noticed and problematic (crbug.com/475637 is just one of the recent horror stories that comes to mind, and that was about a +2 MB delta)

9 MB is way too large.

peria@: Where do you observe the 9 MB regression in the telemetry data? Also, I wonder whether Blogger, Gmail and Google Calendar really use Web animations that heavily.

 
I see some regressions in the order of MB in the telemetry data you reported. Do we understand where they come from? Is there anything we can do there?

Metric: Memory/vm_private_dirty_final_renderer (summarizing the biggest deltas, |delta| > 1M)
Nexus 7
Blogger: +4M
Wordpress: -1M (Good)

Nexus 4
Blogger: +6.2M
G+: +1.88 M
Facebook -1M (Good)

Desktop
Blogger: +16.8 M
Facebook: +2.4M
Gmail: +6.7 M
G+: +3M
Wordpress: +2M

These *_final_* metrics don't make much sense because the result depends heavily on when the last GC happened. What matters are the *_peak_* metrics (and I see no substantial regression in the *_peak_* metrics).

(BTW, I forgot to remove the document link before sending the snippet -- we were planning to discuss the result with the SYD team and then share it with blink-dev :-)



 

Kentaro Hara

Jul 7, 2015, 3:09:22 AM
to Primiano Tucci, blink-dev, mig...@chromium.org, pic...@chromium.org


Primiano@: Today we took a look at the telemetry data in detail, and in short, we realized that the data is not yet ready to support any consistent conclusion. Give us a couple more days :)

Kentaro Hara

Jul 12, 2015, 7:55:58 PM
to blink-dev
Hi

Oilpan:

(haraken, peria) Enabled idle GCs on trunk and fixed related issues. Now both lazy sweeping and idle GCs are enabled on trunk. They seem stabilized :)

(yutak, haraken) Changing the way we estimate the live object size to make the GC heuristics saner (https://codereview.chromium.org/1211573006/). Currently Oilpan reacts very sensitively to V8's major GCs, on the assumption that V8's major GCs are rare and each one will drop a lot of persistent handles, but that is no longer true because V8's major GCs are getting more and more incremental. Removing the assumption from Oilpan's GC heuristics.

(peria) Moved core/html/canvas/, modules/canvas2d/, XMLHttpRequest, MessagePort to Oilpan's heap. Working on modules/webgl/ (which needs a lot of performance investigation work).

(peria) Noticed that the performance results for core/animations/ are not yet consistent. Recollecting the data.

(keishi) Removing raw pointers to on-heap objects. It turned out that there are 100+ raw pointers to on-heap objects, which we need to remove before shipping...

(haraken) Investigated crashes of WebRTC exposed by Oilpan.


Non-Oilpan:

(bashi) Collected an overview of the memory breakdown of a renderer process (V8, PartitionAlloc, Oilpan, GPU etc) in top representative websites. Working with a couple of teams to confirm that the collected results are consistent and reliable. We're planning to share the data with you by the end of July.

(hajimehoshi) Supported v8::Context, ScriptPromise and Frame in the leak detector. A lot of leaks were detected, but it's possible that most of them are false-positives. Investigating if each detected leak is a real one or not.

(haraken) Created a CL that completely switches the buffer partition of PartitionAlloc to the buffer allocator of Oilpan. It is expected to speed up Vector, HashTable, StringBuilder etc. I'll collect performance numbers. (BTW, I noticed that V8's ArrayBuffers won't benefit from the buffer allocator because ArrayBuffers are not expandable/shrinkable.)

(tasak) Recollecting performance/memory results of tcmalloc vs. PartitionAlloc vs. system allocators on Linux, Mac, Win 32 bit, Win 64 bit and Nexus.

Kentaro Hara

Jul 20, 2015, 9:33:02 PM
to blink-dev
Oilpan:

(keishi, yutak, sigbjornf, haraken) In preparation for shipping Oilpan, we're removing all raw pointers to on-heap objects. There are >100 raw pointers (https://docs.google.com/spreadsheets/d/1VWmeQyPGQ96T8gN8dcqDJWIkV0C9S80hG6XbTW-papE/edit#gid=0).

(haraken, yutak, keishi) Landed a new GC heuristic that does not depend on the assumption that V8's major GC will drop a lot of persistent handles (this is no longer true). This change made the estimation of the live object size much saner. keishi@ is experimenting with a GC heuristic that takes page navigations (which drop a lot of persistent handles) into account.

(peria) Re-collected performance & memory numbers for shipping Oilpan for core/animations. In short: there are no performance or memory regressions on Nexus 4 and Nexus 7. There is no memory regression on Linux, but there is some performance regression on Linux. Still investigating.


Non-Oilpan:

(haraken) Started a discussion about discardable items in Blink (https://docs.google.com/document/d/13RMN1dExjQdnSjEaEgxOxdakX8UndbqjAmc2mi_DSp0/edit). Now we have a good list of discardable items to experiment (thanks!). In particular, I'm interested in how much memory we can save by discarding a layout object tree.

(bashi) Improving memory-infra + telemetry and collecting Blink's memory usage. The collected values are still unreliable in many senses (e.g., the sum of values reported by each allocator is sometimes significantly different from the private_dirty reported by OS; tcmalloc sometimes reports negative values etc). However, we're getting confident about the following fact:

-- Calculate r = (Memory reported by PartitionAlloc + Oilpan) / (Private_dirty of a renderer process reported by OS)
-- r indicates how much memory Blink is consuming in a renderer process.

We're planning to share some results with blink-dev@ by the end of July. It will at least clarify how much memory Blink is consuming in a renderer process in representative websites.

(tasak) Finished collecting full performance & memory results of tcmalloc vs. PartitionAlloc vs. system allocators on all of Nexus, Linux, Mac, Win 32 bit and Win 64 bit. Still analyzing the result, but our overall conclusion would be as follows:

-- Regarding performance, PartitionAlloc is much faster than tcmalloc and system allocators.
-- Regarding memory usage, PartitionAlloc is sometimes better but sometimes worse. It seems that the memory usage doesn't really change depending on what allocators we use. This would be because the actual amount of objects we have to allocate doesn't change depending on what allocators we use.

We'll publish a summary document soon.

(tasak, hajimehoshi) Given the above, it's not clear if we can say with confidence that PartitionAlloc is better than tcmalloc in Chromium overall. So, at the moment, we decided to switch our efforts from replacing everything in Chromium with PartitionAlloc to replacing everything in Blink with PartitionAlloc. (This is important to get reliable data with memory-infra + telemetry.)

(hajimehoshi) Adding type-information to objects allocated in PartitionAlloc to understand what objects are the main memory consumer in Blink.

(hajimehoshi) After supporting ActiveDOMObjects, ScriptPromise, v8::Context and Frame in the leak detector, the detector detected a lot of leaks. However, we noticed that most of them are false-positives. We're concluding that Blink doesn't have a serious leak.

Daniel Bratell

Jul 21, 2015, 1:20:58 PM
to blink-dev, Kentaro Hara
On Tue, 21 Jul 2015 03:32:28 +0200, Kentaro Hara <har...@chromium.org> wrote:

-- Regarding performance, PartitionAlloc is much faster than tcmalloc and system allocators.
-- Regarding memory usage, PartitionAlloc is sometimes better but sometimes worse. It seems that the memory usage doesn't really change depending on what allocators we use. This would be because the actual amount of objects we have to allocate doesn't change depending on what allocators we use.

There is one scenario that I know has been problematic in the past, and that is very long-lived documents, or documents with a lot of churn. Some allocators are better than others at handling that scenario. Some lose a lot of memory to fragmentation, others do not.

Do you know if tcmalloc or PartitionAlloc has been tested in such long-term scenarios?

Kentaro Hara

Jul 21, 2015, 7:21:21 PM
to Daniel Bratell, blink-dev
We're just testing short-term scenarios with telemetry. Regarding memory usage, PartitionAlloc performs slightly better than tcmalloc and the system allocators in both the renderer process and the browser process on average, but the win does not look convincing enough to justify the replacement.

I agree that it would be worth testing long-term scenarios to see the impact on fragmentation.

Kentaro Hara

Jul 27, 2015, 3:25:37 AM
to blink-dev
Hi

Oilpan:

(keishi, yutak, haraken) Removing lots of raw pointers to on-heap objects in preparation for shipping Oilpan for everything (https://docs.google.com/spreadsheets/d/1VWmeQyPGQ96T8gN8dcqDJWIkV0C9S80hG6XbTW-papE/edit?pli=1#gid=0). 50% completed.

(haraken) Moved the StyleImage hierarchy to Oilpan. Moving the EventListener hierarchy to Oilpan. Understood how the lifetime of the Resource hierarchy is managed.

(peria) Created a CL to ship Oilpan for modules/webgl and collected full performance/memory numbers. The numbers are looking stable and good, but we noticed that we forgot to measure smoothness.touch_webgl_cases. Collecting the performance/memory number for it.

(peria) Collecting performance numbers to ship Oilpan for core/animations. We concluded that there are a couple of real regressions in smoothness.tough_animation_cases, only on Linux, and that the regression was introduced sometime in the past two months. Bisecting the range to find the culprit.

(sigbjornf) We noticed that WebXXX objects can be destructed by a different thread (in the Chromium side) from the thread that created the WebXXX object. This can cause problems if the WebXXX object holds a Persistent handle to Oilpan objects. Handling the issue.

(peria, haraken) Moving html/track/ to Oilpan.


Non-oilpan:

(bashi, primiano) Analyzing Blink's memory usage with memory-infra + telemetry and making the numbers more accurate. The hard part is that if we enable memory-infra, the memory consumed by the TracedValues (and their related data structures) is counted as part of the memory usage, which can significantly skew the numbers reported by each allocator and private_dirty.

(tasak) We're almost concluding that switching tcmalloc & the system allocators in Blink to PartitionAlloc is a clear performance win. However, we must be careful about the partition where the remaining objects are allocated. When we open Gmail, there are still >30000 objects in a renderer process allocated outside PartitionAlloc (and we're going to move those objects to PartitionAlloc). If we simply put all the objects into the FastMalloc partition, it regresses performance of some blink_perf benchmarks on Nexus 4. The regression is gone if we allocate the objects in a dedicated partition. This result indicates that the remaining objects make the cache locality of the FastMalloc partition worse. We need more investigation.

(haraken) Created a CL that exports Oilpan's buffer allocator to Blink's Vector (https://codereview.chromium.org/1220253004). The attached image summarizes the result for the following code:

// Micro-benchmark sketch: repeatedly build and destroy a Vector of SIZE elements
// (kIterations and SIZE are arbitrary).
for (int iteration = 0; iteration < kIterations; iteration++) {
  Vector<int> vec;
  for (int i = 0; i < SIZE; i++) {
    vec.append(i);
  }
}

- PartitionAlloc's Vector is 3 times slower than std::vector.
- We can fix the regression by (1) using the Oilpan's buffer allocator and (2) increasing the vector expansion rate from 1.25 to 2.0.
- However, I noticed that Oilpan's buffer allocator will be usable only for Vectors. (I was assuming that we could use the allocator for HashTable, StringBuilder and ArrayBuffer and improve their performance, but that didn't improve performance for some complicated reasons.) I'm not sure if we want to land the 1500 LOC change only for Vectors. I'll do more study and write a document that will lead to some conclusion.

(hajimehoshi) Added an instrumentation to PartitionAlloc so that PartitionAlloc can profile memory usages per object type. Collecting data using real-world websites.
buffer_alloc_1_limited.eps

Kentaro Hara

Aug 2, 2015, 7:48:26 PM
to blink-dev
Hi

Oilpan:

(peria) Finished collecting performance & memory numbers for shipping Oilpan for modules/webgl and wrote up a document (https://docs.google.com/document/d/1QUDGL5a2wBZVyUrzlGjzKFe1sQgOOvm6gNhB1B7VRoE/edit?pli=1). No regression/improvement is observed. Discussing with the WebGL team.

(peria) Shipping Oilpan for core/animations is blocked by a strange regression in Linux. Bisecting to find a culprit CL.

(haraken) Landed WeakPersistent. You can use a WeakPersistent to create a weak pointer from an off-heap object to an on-heap object. Just a quick summary (a small usage sketch follows the list):

Persistent: A strong pointer. off-heap => on-heap.
WeakPersistent: A weak pointer. off-heap => on-heap.
Member: A strong pointer. on-heap => on-heap.
WeakMember: A weak pointer. on-heap => on-heap.
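A small usage sketch of the four handle types (schematic only; real Oilpan classes need the usual tracing boilerplate, and the exact signatures below are approximations):

// Schematic example, not exact Blink boilerplate.
class Target : public GarbageCollected<Target> {
public:
  void trace(Visitor*) { }
};

// On-heap object pointing at another on-heap object.
class OnHeapHolder : public GarbageCollected<OnHeapHolder> {
public:
  void trace(Visitor* visitor) {
    visitor->trace(m_strong);
    visitor->trace(m_weak);
  }
  Member<Target> m_strong;    // keeps the target alive
  WeakMember<Target> m_weak;  // cleared by the GC when the target dies
};

// Ordinary (off-heap) C++ object pointing into the Oilpan heap.
class OffHeapHolder {
public:
  Persistent<Target> m_strong;    // keeps the target alive across GCs
  WeakPersistent<Target> m_weak;  // cleared by the GC when the target dies
};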

(keishi, yutak, haraken) Removing raw pointers to on-heap objects in preparation for shipping oilpan for everything. 65% completed (https://docs.google.com/spreadsheets/d/1VWmeQyPGQ96T8gN8dcqDJWIkV0C9S80hG6XbTW-papE/edit?pli=1#gid=0).

(peria) Shipped Oilpan for html/track.

(haraken) Moved the EventListener hierarchy to Oilpan's heap.


Memory reduction:

(bashi) Shared the "renderer memory breakdown" result with blink-dev@ (https://docs.google.com/document/d/1zlGQkwkWEu5LUg-CrZHHRuhDtZcuvrRv8bjdC8-noaU/edit?pli=1#heading=h.a898thpufhtq). In short, we found that Blink is consuming 10 - 50% of the renderer memory in low-memory mobile devices and 10 - 35% of the renderer memory in desktops. This clearly means that Blink needs a serious memory reduction. If you're busy, you can just look at the numbers in the spreadsheet (https://docs.google.com/spreadsheets/d/1nDmgP2SaFS_CmqFoNW4WDGYWkFl6CPUWmiVAoySv9bE/edit?pli=1#gid=0).

(bashi) Defined the top 10 webpages where Blink's memory reduction is a key (https://docs.google.com/document/d/1QYCM5UeX90OodWt1vl6jlShhv8kg0HnXY7SdPCyILRs/edit?pli=1). We're planning to use the numbers of the webpages as a metric for the upcoming memory reduction work.

(hajimehoshi) Mostly finished implementing a per-object-type profiler for PartitionAlloc. The profiler can list the top 10 object types in each partition of PartitionAlloc. Collecting data from real-world webpages.

(tasak) As a result of our long investigations, our proposal is going to be "replace all allocators in a renderer process with PartitionAlloc" (i.e., a browser process is out of our scope). Re-collecting the data that supports the proposal. It is important to replace the allocators in a renderer process with PartitionAlloc to get all their objects under our control and visualize the objects with the per-object-type profiler.

(haraken) I was in Munich to discuss a bunch of memory stuff with the V8 team. Learned what V8 is doing in low-memory scenarios etc.

Kentaro Hara

Aug 2, 2015, 8:14:02 PM
to blink-dev
(sigbjornf, haraken) Looked into the 50 crashes detected by Oilpan's ClusterFuzzer. There were no crashes caused by Oilpan. Oilpan builds look pretty stable.

(sigbjornf) Handling the destruction issues of cross-thread pointers in CryptoResult. Given that WebXXX pointers that retain a cross-thread pointer can be destructed on a thread that is different from the thread that created the cross-thread pointer (this is nasty...), we need to improve the infrastructure for cross-thread pointers in Oilpan.



Kentaro Hara

Aug 9, 2015, 8:10:46 PM
to blink-dev
Hi

Oilpan:

(peria) Finally got an approval from WebGL guys to ship oilpan for WebGL! (Document: https://docs.google.com/document/d/1QUDGL5a2wBZVyUrzlGjzKFe1sQgOOvm6gNhB1B7VRoE/edit CL: https://codereview.chromium.org/1234883002/) Will ship it this week.

(haraken, keishi, peria) Investigated the regression in Web animations and concluded that the regression is caused by a crazy number of AnimatableValues repeatedly created and destroyed in the tough_animation_cases benchmarks. I concluded that it is hard to resolve the regression without doing one of: (a) implementing incremental marking, (b) unshipping Oilpan from AnimatableValues, or (c) refactoring AnimatableValues so that the values can be reused. (a) is not realistic in the short term. I'm chatting with the animation team about (b) or (c).

(haraken) Noticed that Oilpan can hit an urgent GC in some (artificial) benchmarks because GCs are not triggered until it consumes too much memory. Making the GC heuristics saner so that GCs are triggered more properly (https://codereview.chromium.org/1272083003/).

(keishi, yutak, sigbjornf, haraken) Removing raw pointers & references to on-heap objects.

(sigbjornf) Made CrossThreadPersistent destructible after the thread that created the CrossThreadPersistent gets detached. The CrossThreadPersistent is auto-cleared when the thread gets detached. This simplified handling of a bunch of WebXXX classes that can be destructed by a different Chromium-side thread.

(sigbjornf) Landed a SelfKeepAlive<T> handle.
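For context, SelfKeepAlive<T> is (roughly) a persistent handle that an on-heap object points at itself with, so it stays alive while an asynchronous operation is in flight even if nothing else references it. A schematic usage sketch (the exact API is approximated here):

// Schematic example, not exact Blink code.
class AsyncLoader : public GarbageCollected<AsyncLoader> {
public:
  void start() {
    m_keepAlive = this;  // Pin ourselves until the async work completes.
    // ... kick off asynchronous work that eventually calls didFinish() ...
  }

  void didFinish() {
    // ... deliver the result ...
    m_keepAlive.clear();  // Unpin; the object becomes collectable again.
  }

  void trace(Visitor*) { }

private:
  SelfKeepAlive<AsyncLoader> m_keepAlive;
};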


Memory reduction:

(bashi) Adding the key 10 pages (which we're going to use as a metric for upcoming memory reduction work) to telemetry (https://docs.google.com/document/d/1QYCM5UeX90OodWt1vl6jlShhv8kg0HnXY7SdPCyILRs/edit).

(hajimehoshi) Investigating Blink objects allocated in the key 10 pages using the per-object-type profiler. What we have found so far:

- The top 5 objects that consume memory of the fastMalloc partition are CSS-related objects (CSSValues, DescendantInvalidationSet, StyleRule etc). However, the CSS-related objects consume only <5% of Blink's total memory. So this is not a place we should optimize first.

- The place we should really optimize is StringImpl. StringImpls sometimes consume >30% of Blink's total memory. We're investigating where large StringImpls are coming from.

- There are a couple of >1 MB StringImpls that store JavaScript source code. We investigated if we can drop the source code, but realized that we won't be able to drop it because V8 needs the source code to react to function.toString().

- Investigating ways to reduce the memory consumed by >1 MB StringImpls: optimizing UChar vs. LChar usage, gzipping large StringImpls (it reduces a 5 MB script to 300 KB), etc. Also still investigating where the large StringImpls come from. Our hypothesis is that most of them come from Resources. We'll publish a document once we get more data. (A rough sketch of the compression idea follows below.)
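As a rough illustration of the gzip idea, using plain zlib (not necessarily what Blink would actually use, and sidestepping the question of when and how to decompress):

#include <zlib.h>
#include <cstddef>
#include <vector>

// Compress a large string's bytes with zlib. The original bytes could then be
// dropped and lazily re-inflated when the string is actually needed.
std::vector<unsigned char> compressBytes(const char* data, size_t length) {
  uLongf compressedSize = compressBound(length);
  std::vector<unsigned char> compressed(compressedSize);
  if (compress2(compressed.data(), &compressedSize,
                reinterpret_cast<const Bytef*>(data), length,
                Z_BEST_COMPRESSION) != Z_OK)
    return {};  // On failure, keep the original string uncompressed.
  compressed.resize(compressedSize);
  return compressed;
}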

(tasak) Published a document and proposed to move all remaining Blink objects to PartitionAlloc (https://docs.google.com/document/d/1e4OvVMFuPxtoLGr6VkAu7qTaHWS02qBv5ndQoW8v1Wg/edit#heading=h.lm4d5out9k6c).

(bashi) Removed an if(s_initialized) branch from all allocation paths of PartitionAlloc. The check was needed because there were some call paths that allocate objects in PartitionAlloc before WTF::initialize is called. bashi@ removed all the problematic call paths and guaranteed that WTF::initialize is called first.

Kentaro Hara

Aug 16, 2015, 8:45:10 PM
to blink-dev
Hi

Tokyo is in a holiday season called "Obon".

Oilpan:

Where we're now:

1) Ship Oilpan for WebGL (Done)
2) Ship Oilpan for EventTargets (Done)
3) Fix a regression in Speedometer (Done)
4) Remove raw pointers to on-heap objects (Mostly done)
5) Ship Oilpan for core/animations/ (CL is ready but blocked by a regression in frame_times)
6) Fix peak memory increase in memory.top_7_stress (Still investigating)
7) Ship Oilpan for everything

My plan is to flip the Oilpan flag by the end of Sep at the latest.

(haraken, keishi) Investigating the regression in the frame_times metric of tough_animation_cases. For an experiment, I tried to unship Oilpan from AnimatableValue, InterpolableValue, CSSAnimationUpdate etc and removed heavily allocated objects from Oilpan's heap. This dramatically (> 80%) reduced the number of objects in Oilpan and reduced the GC overhead down to < 1% of the total execution time. However, we're still observing a 10% regression in the frame_times metric. This indicates that the regression is coming from outside the GC overhead. Still investigating.

(keishi) Landed a page navigation GC. A page navigation GC is triggered when a frame that has a substantial number of Nodes is navigated. This is important to let the next V8 GC collect DOM objects held by that frame and thus reduce the peak memory increase. This significantly reduces the peak memory increase in memory.top_7_stress, but not yet completely.
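Conceptually, the heuristic is something like the sketch below (the names and the threshold are made up for illustration; the real logic lives in Oilpan's GC scheduling code):

#include <cstddef>
#include <functional>

// Illustrative only: when navigating away from a frame that owned a large DOM,
// request a GC so the next collection can promptly reclaim that frame's objects.
void maybeSchedulePageNavigationGC(size_t nodeCountOfNavigatedFrame,
                                   const std::function<void()>& scheduleGC) {
  const size_t kNodeCountThreshold = 10000;  // made-up threshold
  if (nodeCountOfNavigatedFrame >= kNodeCountThreshold)
    scheduleGC();  // e.g. a hook into Oilpan's GC scheduler
}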

(haraken) Investigating the remaining regression in the peak memory increase in memory.top_7_stress.

(peria) Shipped Oilpan for the AbstractWorker hierarchy.

(peria) Tried to ship Oilpan for DOMWindow but realized that it's hard to do that without shipping Oilpan for the Node hierarchy. Suspended.

(yutak, sigbjornf) Removed a lot of raw pointers to on-heap objects.


Memory reduction:

(haraken) Sent an Intent-to-implement to implement a MemoryPurgeController (Design doc: https://docs.google.com/document/d/1TbtkhXpjw_8lftLwELuEPEgcyWfXuVLuqW2KYJZeaBA/edit).

(bashi) Adding the key 10 pages as the target of our memory reduction efforts to the telemetry page set.

(hajimehoshi) Investigating the top 5 objects in the key 10 pages using the per-object-type profiler of PartitionAlloc. It looks like the top 5 objects tend to be StringImpls, and most of them come from JavaScript source code and V8 code caching. At the very least, it is clear that reducing StringImpls is a key. Investigating how much we can reduce the memory by compressing the JavaScript source code or discarding the V8 code caches.

(tasak) Proposed to replace Blink's allocator with PartitionAlloc and approved. Started landing the CLs for the replacement (Design doc: https://docs.google.com/document/d/1e4OvVMFuPxtoLGr6VkAu7qTaHWS02qBv5ndQoW8v1Wg/edit?pli=1).

(bashi) Exported the isLowEndDeviceMode flag to Blink.

(bashi) Removing all call paths that allocate objects on PartitionAlloc before WTF::initialize.

Kentaro Hara

Aug 23, 2015, 8:41:23 PM
to blink-dev
Hi

Oilpan:

(keishi, haraken) Fixing the frame_times regression in core/animations/. Fixed a heavy allocation design of CSSAnimationUpdate and significantly reduced the number of objects allocated on the heap. On the other hand, keishi@ identified that the frame_times regression is not related to GC overhead. frame_times is regressing by >10% while marking & sweeping are consuming only <1% of the total time. Since we didn't observe such a huge regression as of March, we started bisecting.

(haraken) Investigated the peak memory increase in memory.top_7_stress. Created more than 80 Chromium builds to figure out the relationship between GC parameters and the peak memory increase. The important fact I found is that the peak memory increase is completely gone if we very frequently schedule Oilpan's GCs. This means that the peak memory increase is just caused by GC timings (i.e., we're just hitting the well-known trade-off between performance and memory about GC timings -- we're _not_ hitting any fundamental issues of Oilpan such as leaks, extra overhead of Oilpan's heap layout etc). I wrote a CL to optimize the GC timings, but it is hard to get rid of the regression as long as we adopt a GC.

(yutak, sigbjornf, peria) Removing raw pointers to on-heap objects. yutak@ introduced CrossThreadWeakPersistent.


Memory reduction:

(hajimehoshi) Investigating Blink's memory usage on the key 10 pages on both desktops and mobile using the per-object-type profiler. For now, the key findings are: (1) StringImpl's reduction is a key, (2) the top 5 StringImpls are likely to be consumed by JavaScript raw source code & V8's code caching.

(bashi) Implementing MemoryPurgeController.

(bashi) Adding the key 10 pages to telemetry so that we can keep track of the metric of our memory reduction efforts.

(tasak) Replacing allocators in Blink with PartitionAlloc.

Kouhei Ueno

Aug 23, 2015, 11:58:20 PM
to Kentaro Hara, blink-dev

Inactive Tab Reclaiming Subteam:


Tab state transfer stats UMAs. crbug.com/517335

(kouhei) Win/Linux/CrOS implementation landed for M46

(tzik) Android implementation


Introduce WebPageImportanceSignal to hint Blink->Chromium importance of the tab state. crbug.com/520838

(kouhei) Landed the first signal “hadFormInteraction”. WIP on “issuedFetchWithSideEffects”.


Reclaim unused memory from inactive tabs

(tzik) Wrote LevelDB (the backend of IndexedDB) cache prune patch. Submitted to the internal repository.


Reload from disk cache:

(tzik) Investigating feasibility of disk cache pinning of resources used by inactive tabs.





--
Kouhei Ueno

Kentaro Hara

Aug 30, 2015, 8:10:01 PM
to blink-dev
Hi

Oilpan:

(haraken, keishi, peria) Investigating the frame_times regression in core/animations/. This is the final blocker for shipping oilpan. We created a bunch of experimental CLs to diagnose the regression (e.g., unship oilpan from AnimatableValues, CSSValues etc) and figured out that the frame_times regression is caused by the two factors:

1) The current heavy allocation design of AnimatableValues and InterpolableValues
2) Some memory locality issue around CSSValue

Regarding 1), we confirmed that we can circumvent the problem (at least in short term) by unshipping oilpan from those heavily allocated objects. We're planning to propose it.

Regarding 2), we need more investigations. It is not an option to unship oilpan from CSSValues. Anyway, unknown memory locality issues must be resolved before shipping.

(haraken, yutak) Confirmed that the peak memory increase in memory.top_7_stress is gone or explained.

(yutak, sigbjornf) Removing raw pointers to on-heap objects.


Memory reduction:

(bashi) Started experimenting with purging things via MemoryPurgeController. Experimenting with purging discardable items in the document (https://docs.google.com/document/d/13RMN1dExjQdnSjEaEgxOxdakX8UndbqjAmc2mi_DSp0/edit).

(bashi) Adding the key 10 pages to telemetry. It's blocked by a couple of telemetry bugs.

(hajimehoshi) Collecting numbers about how much StringImpls can be explained by JavaScript source code and V8's code caching in real-world websites. Writing a document.

(tasak) Replacing allocators in Blink with PartitionAlloc. Landing a lot of patches that add WTF_MAKE_FAST_ALLOCATED, DISALLOW_ALLOCATION, ALLOW_ONLY_INLINE_ALLOCATION or STACK_ALLOCATED. I think we should invent a better name for the macros :)

Tab serialization:

(kouhei) Synced with georgesak@ and chrisha@ on collaborating with Tab Discarding effort. Planning to visit the MON office on Sept. 10-11.

(tzik) If the hit rate of the disk cache is high enough, we don't need to implement a special mechanism for pinning resources of discarded tabs; i.e., we can just use the disk cache if the hit rate is high enough. Looking into the current implementation of the disk cache and improving a way to measure the hit rate.

(tzik) Landed LevelDB cache pruning to google3 repo. Roll out to Chrome is pending.

Elliott Sprehn

Aug 31, 2015, 12:38:56 PM
to Kentaro Hara, blink-dev

Why can't we unship the GC for CSSValue? They're leaf nodes and are never exposed to script. They also have simple lifetimes. I don't think oilpan buys us anything for them.

Btw if oilpan doesn't deal well with allocation heavy things how well is it going to handle pages that do a lot of DOM churn?

Sigbjorn Finne

Aug 31, 2015, 1:05:39 PM
to Elliott Sprehn, Kentaro Hara, blink-dev
On 8/31/2015 18:38, 'Elliott Sprehn' via blink-dev wrote:
> Why can't we unship the GC for CSSValue? They're leaf nodes and are
> never exposed to script. They also have simple lifetimes. I don't think
> oilpan buys us anything for them.
>

This is what https://codereview.chromium.org/1164573002 discussed and
partially adopted; what is its current status?

--sigbjorn

Elliott Sprehn

Aug 31, 2015, 2:46:37 PM
to Sigbjorn Finne, Sasha Bermeister, tim...@chromium.org, Kentaro Hara, blink-dev
I think we're paused on doing immediates and are just refactoring the code to start. I do think we should switch these back to plain RefCounted; in general I'm not sure why we switched so much of core/css to be GC'ed. Most of the objects live for the lifetime of the page, and many of them are immutable (ex. CSSValue). Making them GC'ed just makes Oilpan's heap bigger and the GC pauses longer.

Kentaro Hara

Aug 31, 2015, 7:32:43 PM
to Elliott Sprehn, Sigbjorn Finne, Sasha Bermeister, tim...@chromium.org, blink-dev
Why can't we unship the GC for CSSValue? They're leaf nodes and are never exposed to script. They also have simple lifetimes. I don't think oilpan buys us anything for them.

Unshipping Oilpan from CSSValues adds a bunch of Persistent handles from performance-sensitive on-heap objects to the CSSValues. This decreases performance. Actually I tried to unship Oilpan from CSSValues in https://codereview.chromium.org/1303173007/ and confirmed that it leads to a performance loss.

Another reason is that Oilpan should be designed so that it can tolerate heavily allocated objects like CSSValues. If oilpan cannot support fundamental objects like CSSValues, it would imply that its GC infrastructure is too weak.


Btw if oilpan doesn't deal well with allocation heavy things how well is it going to handle pages that do a lot of DOM churn?

Oilpan already has a robust enough infrastructure to support heavily allocated objects such as CSSValues, Nodes etc., but doesn't yet have a robust enough infrastructure to support super-incredibly heavily allocated objects such as AnimatableValues and InterpolableValues (Note: Putting AnimatableValues & InterpolableValues on Oilpan's heap regresses frame_times, but the regression happens only in the super-micro benchmark on Linux machines).

Overall, I believe that Oilpan already has a robust enough infrastructure to support Blink's common workloads. The final part we're working on now is how to actually land Oilpan without causing any performance regression in any micro-benchmark. It needs some tweaks (like unshipping Oilpan from some objects).



Elliott Sprehn

Aug 31, 2015, 7:40:36 PM
to Kentaro Hara, Sigbjorn Finne, Sasha Bermeister, tim...@chromium.org, blink-dev
On Tue, Sep 1, 2015 at 1:32 AM, Kentaro Hara <har...@chromium.org> wrote:
Why can't we unship the GC for CSSValue? They're leaf nodes and are never exposed to script. They also have simple lifetimes. I don't think oilpan buys us anything for them.

Unshipping Oilpan from CSSValues adds a bunch of Persistent handles from performance-sensitive on-heap objects to the CSSValues. This decreases performance. Actually I tried to unship Oilpan from CSSValues in https://codereview.chromium.org/1303173007/ and confirmed that it leads to a performance loss.

That's because you're trying to GC the StyleImage still, don't do that. :)
 
Also Persistent shouldn't be so expensive, how do we fix that?


Another reason is that Oilpan should be designed so that it can tolerate heavily allocated objects like CSSValues. If oilpan cannot support fundamental objects like CSSValues, it would imply that its GC infrastructure is too weak.

Then it should be able to handle cases like the animations though, and you had to unship those. It also couldn't handle the layout tree.
 


Btw if oilpan doesn't deal well with allocation heavy things how well is it going to handle pages that do a lot of DOM churn?

Oilpan already has a robust enough infrastructure to support heavily allocated objects such as CSSValues, Nodes etc., but doesn't yet have a robust enough infrastructure to support super-incredibly heavily allocated objects such as AnimatableValues and InterpolableValues (Note: Putting AnimatableValues & InterpolableValues on Oilpan's heap regresses frame_times, but the regression happens only in the super-micro benchmark on Linux machines).

Overall, I believe that Oilpan already has a robust enough infrastructure to support Blink's common workloads. The final part we're working on now is how to actually land Oilpan without causing any performance regression in any micro-benchmark. It needs some tweaks (like unshipping Oilpan from some objects).


Unshipping from animations but not from objects that live as long as the page doesn't make sense to me. Why are we trying to GC all the "stable" objects that live forever?

- E 

Sasha Bermeister

Aug 31, 2015, 7:40:53 PM
to Kentaro Hara, Elliott Sprehn, Sigbjorn Finne, tim...@chromium.org, blink-dev
What if we tied the lifetime of CSSValues directly to the stylesheet, i.e. CSSValues are created and allocated in a block of memory inside StyleSheetContents and are freed only when the stylesheet is destroyed? Functions would only be able to take pointers/references to these objects and any form of GC would be unnecessary, as their lifetime is fixed.

I'm not sure what the lifetime of platform values is right now, but they could probably be tied in a similar way to ComputedStyle.
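A bare-bones sketch of that arena idea (a hypothetical class; it ignores alignment, oversized allocations, and the fact that CSSValues have destructors and reference other heap objects):

#include <cstddef>
#include <memory>
#include <vector>

// Bump-pointer arena that could be owned by e.g. a StyleSheetContents. Values
// parsed for the sheet are placed here, and all the memory is released at once
// when the sheet is destroyed -- no per-value refcounting or GC.
class ValueArena {
public:
  // Assumes size <= kChunkSize; alignment is ignored for brevity.
  void* allocate(size_t size) {
    if (m_chunks.empty() || m_offset + size > kChunkSize) {
      m_chunks.push_back(std::make_unique<char[]>(kChunkSize));
      m_offset = 0;
    }
    void* result = m_chunks.back().get() + m_offset;
    m_offset += size;
    return result;
  }
  // The implicit destructor frees all chunks; destructors of the placed values
  // are not run, so this only suits values that don't need them.

private:
  static const size_t kChunkSize = 64 * 1024;
  std::vector<std::unique_ptr<char[]>> m_chunks;
  size_t m_offset = 0;
};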

Kentaro Hara

Aug 31, 2015, 8:22:27 PM
to Elliott Sprehn, Sigbjorn Finne, Sasha Bermeister, tim...@chromium.org, blink-dev
Unshipping from animations but not from objects that live as long as the page doesn't make sense to me. Why are we trying to GC all the "stable" objects that live forever?

It's just because the number of CSSValues is much smaller (and thus no problem) than the number of AnimatableValues & InterpolableValues. In tough_animation_cases.css_properties_*, the number of AnimatableValues & InterpolableValues & other temporary objects associated with them is 200000 whereas the number of CSSValues is 50000.

Just to clarify:

- For Web animations, it is important to ship Oilpan for animation objects such as Animation, AnimationTimeline, ElementAnimation, SampledEffect etc. It is hard to realize their correct lifetime relationship without having Oilpan. We confirmed that we can ship Oilpan for these objects without introducing any performance regression. We're planning to do this now.

- Putting AnimatableValues & InterpolableValues on Oilpan's heap just makes the problem complex. They are temporary objects created to represent each animation value and thus put incredible pressure on Oilpan's heap. Also, the lifetime of these temporary objects is very clear, and thus there is no advantage in putting them on Oilpan's heap. Of course, we'll be able to resolve the performance issue by implementing incremental marking or a generational GC, but I don't think that the benefit outweighs the engineering cost & complexity. At the very least, this won't be a task for Oilpan v1.

- Whether we should put CSSValues on Oilpan's heap or not might be controversial. As long as it causes no performance issues, I'd prefer just putting them on Oilpan's heap.




Kentaro Hara

unread,
Sep 6, 2015, 8:23:58 PM9/6/15
to blink-dev
Hi

Oilpan:

1) Address all issues around destruction ordering of Blink objects (<-- Done)
2) Enable lazy sweeping (<-- Done).
3) Enable idle GC (<-- Done).
4) Fix or explain the peak memory increase observed in memory benchmarks (<-- Done).
5) Ship Oilpan for WebGL (<-- Done).
6) Ship Oilpan for Web animations (<-- Done).
7) Ship Oilpan for EventTargets (<-- Done).
8) Ship Oilpan for other core/ objects independent from the Node hierarchy (<-- Done).
9) Re-collect a full performance/memory result to ship Oilpan for everything by default (<-- We're now here).
10) Flip the Oilpan flag to 1.

(haraken, keishi) Finally shipped Oilpan for core/animations/. Unshipped Oilpan from temporary animation objects. See https://groups.google.com/a/chromium.org/d/topic/oilpan-reviews/V7d-7o4AbeA/discussion for more details about why we decided to unship Oilpan from the temporary objects.

(keishi, yutak) Collecting final performance numbers for the Node hierarchy. It seems there is a slight regression in CSS-related micro-benchmarks (we didn't observe the regression as of April). Investigating.

(peria, sigbjornf) Fixing various stabilization issues.


Memory reduction:

(hajimehoshi) Wrote an insightful document, "Breakdown of Blink’s memory usage" (https://docs.google.com/document/d/1XnN8RAOJDeoiNR9A4LqaUt4PQ3GXLPUJUzVIY49oRRg/edit#). In summary:

- We implemented a per-object-type profiler.
- Using the profiler, we broke down Blink's memory in the key 10 pages.
- We identified that the largest memory consumer is StringImpls. Reducing StringImpls is key to reducing Blink's memory.
- We experimented with compressing large StringImpls and confirmed that string compression will reduce Blink's memory by 10 - 60% (18% on average).

(bashi) Experimenting with purging discardable items one by one to see how much memory we can save by discarding each item. At the moment, we haven't succeeded in finding items that have a high impact on memory reduction.

(tasak) We want to understand how much memory Blink is using when Blink hits OOM. Added a UMA about it with a hacky CL (https://codereview.chromium.org/1315113006/). (It would be great if we had a way to add parameters to crash reports... then we wouldn't need this kind of hack.)

(tasak) Replacing allocators in Blink with PartitionAlloc. Almost completed.

Kentaro Hara

unread,
Sep 6, 2015, 8:27:21 PM9/6/15
to blink-dev
(haraken) Scanned all bugs that have the Cr-Blink-MemoryAllocator label. Reduced the number from 102 to 59. 95% of the bugs are Oilpan-related and most of them are going to be fixed as a natural consequence of shipping Oilpan. There are few security-related bugs.

Kentaro Hara

unread,
Sep 13, 2015, 9:29:06 PM9/13/15
to blink-dev
Hi

Oilpan:

(haraken, keishi, sigbjornf) Collected a full performance/memory result of non-Oilpan vs. Oilpan. As I expected :D, we found newly introduced regressions in a bunch of micro-benchmarks. We created the following changes to fix the regressions. I hope we've now addressed almost all the regressions on Oilpan.

- Optimize GC heuristics more (https://codereview.chromium.org/1325783007/).
- Use EphemeralRange in spellchecker/ (https://codereview.chromium.org/1331893002/).
- Stop allocating a vector buffer in DistributedNodes' constructor (https://codereview.chromium.org/1333813002/).
- Significantly reduce the number of persistent handles & the overhead per persistent handle (https://codereview.chromium.org/1338573003/).

We'll recollect the performance/memory numbers once we land the fixes.

(peria) Shipped Oilpan for accessibility/.

(peria) Moving MediaStream-related objects to Oilpan. Facing a couple of issues around destruction ordering.

(yutak) Hardening syntax verification for Oilpan. Added a runtime verification to check that GarbageCollected objects are not allocated on the stack or as a part of another object (such allocation is actually safe in itself, but risks resulting in code that is unsafe). Fixed all call sites.
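As a rough illustration of how such a runtime verification can work (this is a simplified sketch, not the actual Oilpan check, which looks at Oilpan's own heap metadata; all names below are invented):

  #include <cassert>
  #include <cstddef>
  #include <cstdlib>
  #include <set>

  class GarbageCollectedBase {
   public:
    void* operator new(std::size_t size) {
      void* memory = malloc(size);  // Stand-in for Oilpan's heap allocator.
      allocatedObjects().insert(memory);
      return memory;
    }
    void operator delete(void* memory) {
      allocatedObjects().erase(memory);
      free(memory);
    }

   protected:
    GarbageCollectedBase() {
      // Fails for stack-allocated instances and for instances embedded in
      // other objects, because those never went through operator new.
      // Assumes this base subobject sits at offset 0 of the allocation.
      assert(allocatedObjects().count(this));
    }

   private:
    static std::set<void*>& allocatedObjects() {
      static std::set<void*> s_objects;  // Not thread-safe; sketch only.
      return s_objects;
    }
  };

With something like this in place, "new Foo()" passes while "Foo onStack;" asserts in Foo's base constructor.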


Memory reduction:

(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

(hajimehoshi) Investigating where each of the large StringImpls in the key 10 pages is coming from. Most of them come from JavaScript source code. Another origin is CSSImageValue::m_relativeURL and CSSImageValue::m_absoluteURL. It sometimes happens that identical data-urls are duplicated between m_relativeURL and m_absoluteURL. Maybe we want to change the String to AtomicString. (We'll start a separate thread for this.)

(bashi) Still working on adding the key 10 pages to telemetry. Bots are still failing.

(bashi) Experimenting with discarding items listed in the document. At the moment, we're still not successful at finding items that have an impact on Blink's overall memory. It is indeed true that Blink has a lot of uncontrolled caches, but it seems that discarding the caches doesn't have a big impact on Blink's memory reduction. Still experimenting.

(bashi, hajimehoshi) To land the per-object-type profiler for PartitionAlloc, we need a way to get a class name for each object allocation. We need to get the class name using a stack trace somehow. We're considering the best way to do that.


Tab serialization:

(kouhei) Summarized the current priorities of the tab serialization project in the document.

(kouhei) Travelling MTV/MON. Syncing with leads.

(tzik) Continuing to optimize DiskCache.

- Created a URLRequest sniffer to collect better cache efficiency measurement.
- Set up a local server to measure the cache hit rate.
- Ported usage-based eviction from Blockfile backend to Simple backend.

As a result, the cache hit rate improved from 41.7% to 49.5% on a benchmark.

(tzik) Implemented an infrastructure for an eviction algorithm simulator.

Kentaro Hara

unread,
Sep 13, 2015, 9:34:30 PM9/13/15
to blink-dev
On Mon, Sep 14, 2015 at 10:28 AM, Kentaro Hara <har...@chromium.org> wrote:
Hi

Oilpan:

(haraken, keishi, sigbjornf) Collected a full performance/memory result of non-Oilpan vs. Oilpan. As I expected :D, we found newly introduced regressions in a bunch of micro-benchmarks. We created the following changes to fix the regressions. I hope we've now addressed almost all the regressions on Oilpan.

- Optimize GC heuristics more (https://codereview.chromium.org/1325783007/).
- Use EphemeralRange in spellchecker/ (https://codereview.chromium.org/1331893002/).
- Stop allocating a vector buffer in DistributedNodes' constructor (https://codereview.chromium.org/1333813002/).
- Significantly reduce the number of persistent handles & the overhead per persistent handle (https://codereview.chromium.org/1338573003/).

We'll recollect the performance/memory numbers once we land the fixes.

(peria) Shipped Oilpan for accessibility/.

(peria) Moving MediaStream-related objects to Oilpan. Facing a couple of issues around destruction ordering.

(yutak) Hardening syntax verification for Oilpan. Added a runtime verification to check that GarbageCollected objects are not allocated on the stack or as a part of another object (such allocation is actually safe in itself, but risks resulting in code that is unsafe). Fixed all call sites.


Memory reduction:

(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

(hajimehoshi) Investigating where each of the large StringImpls in the key 10 pages is coming from. Most of them come from JavaScript source code. Another origin is CSSImageValue::m_relativeURL and CSSImageValue::m_absoluteURL. It sometimes happens that identical data-urls are duplicated between m_relativeURL and m_absoluteURL. Maybe we want to change the String to AtomicString. (We'll start a separate thread for this.)

(tasak) Investigating where the SharedBuffers are coming from. Identified that StyleSheetResource is one of the main consumers of SharedBuffers and found a way to clear the StyleSheetResource earlier (than today).

(haraken, tasak, bashi, hajimehoshi) Synced with the London team. Discussed how to collaborate.

 

(bashi) Still working on adding the key 10 pages to telemetry. Bots are still failing.

(bashi) Experimenting with discarding items listed in the document. At the moment, we're still not successful at finding items that have an impact on Blink's overall memory. It is indeed true that Blink has a lot of uncontrolled caches, but it seems that discarding the caches doesn't have a big impact on Blink's memory reduction. Still experimenting.

(bashi, hajimehoshi) To land the per-object-type profiler for PartitionAlloc, we need a way to get a class name for each object allocation. We need to get the class name using a stack trace somehow. We're considering the best way to do that. 
 
 
Tab serialization:

(kouhei) Summarized the current priorities of the tab serialization project in the document.

(kouhei) Travelling MTV/MON. Syncing with leads.

(tzik) Continuing to optimize DiskCache.

- Created a URLRequest sniffer to collect better cache efficiency measurement.
- Set up a local server to measure the cache hit rate.
- Ported usage-based eviction from Blockfile backend to Simple backend.

As a result, the cache hit rate improved from 41.7% to 49.5% on a benchmark.

(tzik) Implemented an infrastructure for an eviction algorithm simulator.



--
Kentaro Hara, Tokyo, Japan

Daniel Bratell

unread,
Sep 14, 2015, 6:08:57 AM9/14/15
to blink-dev, Kentaro Hara, Ross McIlroy
On Mon, 14 Sep 2015 03:28:27 +0200, Kentaro Hara <har...@chromium.org> wrote:


Memory reduction:

(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

There were some other posts that indicated a large part of this was JavaScript source code passed from the network code, via Blink, into V8. Is that interpretation correct, and if so, can something be done in particular for that use case/code path?

With the V8 Ignition project happening, there might be room for rethinking this.

(An old (pre-Opera 10.5) Opera method was to recreate the source code from the AST if needed (very rarely needed). It was dropped in the O10.5-O12 engine (Carakan) for various reasons; whitespace preservation might have been one issue.)

Kentaro Hara

unread,
Sep 14, 2015, 8:00:44 AM9/14/15
to Daniel Bratell, blink-dev, Ross McIlroy
On Mon, Sep 14, 2015 at 7:08 PM, Daniel Bratell <bra...@opera.com> wrote:
On Mon, 14 Sep 2015 03:28:27 +0200, Kentaro Hara <har...@chromium.org> wrote:


Memory reduction:

(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

There were some other posts that indicated a large part of this was JavaScript source code passed from the network code, via Blink, into V8. Is that interpretation correct, and if so, can something be done in particular for that use case/code path?

That was my original plan, but later I noticed that we can implement the compression at the StringImpl layer without adding a lot of complexity. So I'm currently investigating the approach.
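For reference, a minimal sketch of what compression at that layer could look like (the class and its API are made up; the real StringImpl is refcounted and supports 8-bit/16-bit data, none of which is modeled here; zlib is used purely as an example codec, and error handling is omitted):

  #include <zlib.h>
  #include <string>
  #include <vector>

  // Keep large, rarely-touched payloads compressed; decompress on access.
  class CompressedString {
   public:
    explicit CompressedString(const std::string& plain)
        : m_originalSize(plain.size()) {
      uLongf compressedSize = compressBound(plain.size());
      m_compressed.resize(compressedSize);
      compress2(reinterpret_cast<Bytef*>(m_compressed.data()), &compressedSize,
                reinterpret_cast<const Bytef*>(plain.data()), plain.size(),
                Z_BEST_SPEED);
      m_compressed.resize(compressedSize);  // Shrink to the actual size.
    }

    // A real implementation would cache the decompressed buffer and
    // re-compress it only under memory pressure.
    std::string decompress() const {
      std::string plain(m_originalSize, '\0');
      uLongf destLen = m_originalSize;
      uncompress(reinterpret_cast<Bytef*>(&plain[0]), &destLen,
                 reinterpret_cast<const Bytef*>(m_compressed.data()),
                 m_compressed.size());
      return plain;
    }

   private:
    std::vector<char> m_compressed;
    size_t m_originalSize;
  };

The interesting part is exactly the policy: when to compress, when to keep the decompressed form cached, and how often V8 touches the external string after that point.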


 
With the V8 Ignition project happening, there might be room for rethinking this.

(An old (pre-Opera 10.5) Opera method was to recreate the source code from the AST if needed (very rarely needed). It was dropped in the O10.5-O12 engine (Carakan) for various reasons; whitespace preservation might have been one issue.)

/Daniel


--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Ross McIlroy

unread,
Sep 14, 2015, 11:43:48 AM9/14/15
to Daniel Bratell, blink-dev, Kentaro Hara, Ross McIlroy
It is very early days for the V8 Ignition interpreter project, so it would be quite a way down the road before it will be usable. One of the goals is to avoid having to re-parse the JS source in order to perform optimized compilation in TurboFan; however, there are a number of other situations where we would still need the JS source - e.g., code which calls toString() on the function, or code which is to be optimized by Crankshaft instead of TurboFan. As such, this is possibly something we could aspire to as a long-term goal, but not something that would be a short-term fix.

Daniel Bratell

unread,
Sep 14, 2015, 2:06:07 PM9/14/15
to Kentaro Hara, blink-dev, Ross McIlroy
On Mon, 14 Sep 2015 14:00:10 +0200, Kentaro Hara <har...@chromium.org> wrote:

On Mon, Sep 14, 2015 at 7:08 PM, Daniel Bratell <bra...@opera.com> wrote:
On Mon, 14 Sep 2015 03:28:27 +0200, Kentaro Hara <har...@chromium.org> wrote:


Memory reduction:

(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

There were some other posts that indicated a large part of this was JavaScript source code passed from the network code, via Blink, into V8. Is that interpretation correct, and if so, can something be done in particular for that use case/code path?

That was my original plan, but later I noticed that we can implement the compression at the StringImpl layer without adding a lot of complexity. So I'm currently investigating the approach.

Looks simple enough to do just to gather some data but I would guess there might be a tail of performance fixes from such a change. I'll stay tuned for more!

Kentaro Hara

unread,
Sep 20, 2015, 8:34:35 PM9/20/15
to blink-dev
Hi

Tokyo is on public holidays until Wed. We'll be quiet.

Oilpan:

(haraken, sigbjornf, keishi) Recollected performance numbers and addressing remaining regressions.

- Parser.iframe-append-remove (Addressed by this CL.)
- Dromaeo.dom-modify (Addressed by this CL, although I don't understand why it improves performance. Maybe we're hitting a compiler bug. Some regression still remains but I realized that it is not a real regression. The way Dromaeo.dom-modify measures runs/sec is just inappropriate.)
- CSS.ClassInvalidation (Addressed by this CL, although I don't understand why it improves performance. Maybe we're hitting a compiler bug.)
- DOM.textarea-dom (Addressed by this CL.)
- CSS.FocusUpdate

I'm getting the impression that we're already entering a phase of optimizing code for micro-benchmarks (i.e., it seems that we've finished addressing the fundamental issues in Oilpan's infrastructure). Once we land the above CLs, we'll recollect performance numbers and try to propose Oilpan's shipping to blink-dev@ (even if a couple of micro-benchmarks still regress).

(peria) Moved LifecycleNotifier to Oilpan's heap.

(peria) Moving MediaStreamSource etc to Oilpan's heap. Hitting a complex issue about destruction ordering between Chromium and Blink.

(yutak) Refactoring the GC plugin to harden syntax verification.


Memory reduction:

(haraken, hajimehoshi) Experimenting with compressing large StringImpls. One concern raised by the V8 team is that V8 may be accessing large JavaScript source code quite often to (re)compile the source. To measure how often V8 accesses large external strings, haraken@ created a V8 CL that always falls back to Blink's binding layer when V8 accesses external strings. According to hajimehoshi@'s measurements, the number of large external strings accessed by V8 (after the point at which the strings would get compressed) looks very limited. So we decided to start implementing StringImpl compression and get more accurate numbers.

(bashi) To land the per-object-type profiler for PartitionAlloc on trunk, we need a way to get a stack trace from PartitionAlloc::allocate() (in order to know what object is being allocated on PartitionAlloc). Investigating the best way to get the stack trace. Once the repositories are merged, maybe we can just use the stack trace APIs in src/base/debug/.
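For illustration, here is a sketch of the kind of capture the allocation hook would do (a POSIX-only sketch using backtrace(); the real code would use the portable helpers in src/base/debug/ and hook PartitionAlloc rather than a free function):

  #include <execinfo.h>
  #include <array>

  struct AllocationSite {
    std::array<void*, 8> frames{};  // A few frames suffice to identify the site.
    int depth = 0;
  };

  // Called from the allocation path; the captured frames can later be
  // symbolized offline to recover the allocating class/function.
  AllocationSite captureAllocationSite() {
    AllocationSite site;
    site.depth = backtrace(site.frames.data(),
                           static_cast<int>(site.frames.size()));
    return site;
  }

The open question is cost: capturing and storing even a short trace on every allocation is expensive, so we'd likely need sampling.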

(bashi, haraken) Investigated a short-term way (i.e., mergeable to old Chrome branches) to fix leaks caused by V8 <=> Blink reference cycles around CustomEvent, but couldn't come up with a workable idea. In the long term, it will be fixed by traceWrapper, though.

(tasak) Investigating the lifetime of SharedBuffer. The top three consumers of SharedBuffers are as follows.

- ScriptResource (The lifetime is managed correctly. ScriptResource drops a reference to the underlying SharedBuffer immediately after the ScriptResource passes the ownership to V8's external string.)

- StyleSheetResource (This sometimes lives longer than needed because we're missing a couple of setResource(0) calls. tasak@ created a CL to add setResource(0), but some inspector tests are failing.)

- ImageResource (This sometimes lives longer than needed. The culprit is ResourceFetcher::m_documentResources, which can retain a lot of ImageResources. The m_documentResources is not cleared until garbageCollectDocumentResources() gets called but there is no guarantee that it is called soon. tasak@ is trying to remove m_documentResources.)

Kentaro Hara

unread,
Sep 27, 2015, 8:06:36 PM9/27/15
to blink-dev
Hi

Tokyo was on public holidays on Mon, Tue and Wed (called Silver Week; cf. Golden Week in May).

Oilpan:

(haraken, sigbjornf, keishi) Investigating the remaining regressions one by one. As a result of fixing the regressions, we often reach the conclusion that we were just gaming micro-benchmarks (e.g., the CLs listed below). It seems that the only substantial regressions are the following two:

- bindings.create-element (it seems that object allocation in Oilpan is slower than that in PartitionAlloc)
- css.FocusUpdate & blink_style.top_25 (it seems that style recalculation in Oilpan is slower than in non-Oilpan)

(sigbjornf) Tried to fix a regression in css.ClassInvalidation but mistakenly created a CL that makes non-Oilpan even faster (CL) :-)

(sigbjornf) Fixed a regression in dom.textarea-dom (CL).

(sigbjornf) Fixed OOM in dromaeo.dom-modify in 32-bit Windows (CL).

(keishi) Fixed a slight regression in Speedometer by tweaking a GC policy (CL).

(haraken) Tried to fix a regression in css.ClassInvalidation but gave up because fixing it would just be gaming micro-benchmarks & compiler optimizations (CL).

(kbr, haraken, sigbjornf) Fixed WebGL crashes related to Oilpan.

(haraken) Doing Q4 planning. Needless to say, the goal is to do the flag flip ASAP....


Memory reduction:

(bashi, haraken) The Polymer team reported that documents in Polymer apps are leaking (their count keeps growing). We identified that it is caused by a known leak due to a V8 <=> Blink reference cycle around CustomEvent::m_detail. bashi@ created a CL to fix it. (In the long term, these leaks caused by V8 <=> Blink reference cycles will be completely removed by introducing traceWrapper.)

(hajimehoshi) In Gmail, a huge data-url is duplicated between CSSImageValue::m_relativeURL and CSSImageValue::m_absoluteURL. Changed them to use AtomicString, which saves 137 KB in Gmail (CL).
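For context, the reason AtomicString helps here is plain interning: equal strings share one underlying buffer. A rough illustration of the mechanism (not Blink's actual AtomicString implementation; the Interner class is invented):

  #include <string>
  #include <unordered_set>

  class Interner {
   public:
    // Returns a reference to the single stored copy of the value, so a
    // data-url stored as both the relative and the absolute URL costs
    // its bytes only once.
    const std::string& intern(const std::string& s) {
      return *m_table.insert(s).first;
    }
   private:
    std::unordered_set<std::string> m_table;
  };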

(bashi) When implementing an allocation-site profiler in memory-infra, we need to get a stack trace when each object gets allocated in PartitionAlloc or Oilpan. Considering a way to get the stack trace.

(tasak) Experimenting with a way to remove ResourceFetcher::m_documentResources. The hash map keeps Resource objects alive longer than needed and thus causes memory bloat.

(haraken) Doing Q4 planning.


Tab serialization:

(kouhei) Doing Q4 planning. A tentative objective is going to be "80% of the tabs should be restored within 1 second on Nexus devices".

(kouhei) Synced with MON team. Bumped up the priority for WebPageImportanceSignals to unblock shipping tab discarding (CL, CL).

(tzik) Published a design document of the cache eviction algorithm in disk_cache.

(tzik) Investigated a cache mode suitable for tab restoration. Maybe we can just use the same mode as back-forward navigation, where resources are loaded from the cache regardless of their expiration.

Kentaro Hara

unread,
Oct 4, 2015, 8:58:00 PM10/4/15
to blink-dev, Project TRIM
Hi

Oilpan:

(haraken, keishi) As I wrote last week, the remaining regressions we have to address are the following two:

a) create-element & query-selector & dromaeo.dom-modify
b) blink_style.top_25.update_style

Regarding a), I identified the cause and uploaded a fix. Regarding b), keishi@ is still minimizing the test case.

(haraken) Updated GC heuristics in many ways to reduce the risk of increasing peak memory usage in Oilpan.

(peria) Moved MediaStreamSource's family to Oilpan's heap. This required a bunch of work: addressing destruction issues between Chromium and Blink, fixing leaks, and updating tests.

(peria) Started moving GarbageCollected types from platform/ to wtf/. The motivation is that we want to implement runtime verifications to check that Vector<T*>, HashSet<T*>, etc. are not used for a T that is GarbageCollected. It is important to detect this error because raw pointers in off-heap collections are not traced by Oilpan (i.e., they risk causing use-after-free). To implement the runtime verifications in off-heap collections in wtf/ (e.g., Vector, HashTable), GarbageCollected types must be defined in wtf/. (Note: We're not planning to move Oilpan from platform/ to wtf/. We're planning to move only Member, Persistent, GarbageCollected, etc. to wtf/.)
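As a rough sketch of the kind of check this unlocks once the GarbageCollected marker types live in wtf/ (the trait and container below are illustrative only, and the check is shown at compile time for simplicity; the actual plan is a runtime verification inside the real wtf/ collections):

  #include <type_traits>
  #include <vector>

  class GarbageCollectedTag {};  // Stand-in for the marker base in wtf/.

  template <typename T>
  struct IsGarbageCollected : std::is_base_of<GarbageCollectedTag, T> {};

  // An off-heap vector that refuses raw pointers to garbage-collected
  // types, because Oilpan would never trace them and they could dangle.
  template <typename T>
  class OffHeapVector : public std::vector<T> {
    static_assert(
        !std::is_pointer<T>::value ||
            !IsGarbageCollected<typename std::remove_pointer<T>::type>::value,
        "Raw pointers to garbage-collected types are not traced; "
        "use HeapVector<Member<T>> instead.");
  };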

(yutak) Refactoring the GC plugin to get it under control.

(sigbjornf) A lot of maintenance/stabilization work.

(haraken, keishi, peria, yutak) Finalized Q4 plans.


Memory reduction:

(haraken, tasak) We're going to give a presentation on "the 5 most impactful projects to reduce Blink's memory" at the APAC BrownBag this week. Collecting a lot of data to support the proposal.

(bashi) Added the key 10 pages to telemetry and started tracking its memory usage per allocator.

(bashi) Experimenting with dropping discardable items in Blink and investigating its memory impact. The status is described in this spreadsheet.

(tasak) Investigating the memory impact of ImageResource, FontResource and FontCache.

(hajimehoshi) On vacation.

(haraken, bashi, tasak) Finalized Q4 plans.


Instant tab restore:

(kouhei, tzik, haraken) Finalized Q4 plans. Overall, we're planning to focus on 1) drastic memory reduction for inactive tabs and 2) instant tab restore from a disk cache.

Daniel Bratell

unread,
Oct 6, 2015, 12:00:11 PM10/6/15
to blink-dev, Project TRIM, Kentaro Hara, ba...@chromium.org
On Mon, 05 Oct 2015 02:57:24 +0200, Kentaro Hara <har...@chromium.org> wrote:


Memory reduction:

(haraken, tasak) We're going to give a presentation on "the 5 most impactful projects to reduce Blink's memory" at the APAC BrownBag this week. Collecting a lot of data to support the proposal.

I hope you can make this information available outside Google as well.


(bashi) Experimenting with dropping discardable items in Blink and investigating its memory impact. The status is described in this spreadsheet.

bashi, could you make it possible to comment in it? I want to write comments! :-)

As a micro-snippet from me: I'm looking at DisplayLists in Blink and cc. Currently trying to measure the performance impact of some changes.

Kenichi Ishibashi

unread,
Oct 6, 2015, 6:55:41 PM10/6/15
to Daniel Bratell, blink-dev, Project TRIM, Kentaro Hara
On Wed, Oct 7, 2015 at 1:00 AM, Daniel Bratell <bra...@opera.com> wrote:
On Mon, 05 Oct 2015 02:57:24 +0200, Kentaro Hara <har...@chromium.org> wrote:


Memory reduction:

(haraken, tasak) We're going to give a presentation on "the 5 most impactful projects to reduce Blink's memory" at the APAC BrownBag this week. Collecting a lot of data to support the proposal.

I hope you can make this information available outside Google as well.


(bashi) Experimenting with dropping discardable items in Blink and investigating its memory impact. The status is described in this spreadsheet.

bashi, could you make it possible to comment in it? I want to write comments! :-)
Done :) 

Kentaro Hara

unread,
Oct 12, 2015, 2:05:14 AM10/12/15
to blink-dev
Hi

Oilpan:

(haraken) Fixed a regression observed in create-element, Dromaeo/dom-modify, query-selector.

(sigbjorn, keishi) Investigated a regression observed in style recalculations in blink_style.top_25. We finally identified that the regression happens because the benchmark creates and destroys a ton of StyleResolvers. We created a CL that fixes the regression, but I don't think the CL is worth landing because it is just gaming micro-benchmarks. I'm planning to justify (i.e., document and accept) the regression instead.

(haraken, keishi, yutak, peria) In the end, I think we've addressed all the substantial regressions observed on Oilpan builds. We started the final performance measurement. Once the result is good, I'll send an announcement to blink-dev@ about the Oilpan flag flip.


Memory reduction:

(bashi, tasak, hajimehoshi, haraken) Wrote a slide that compiles our first insights about Blink's memory usage. I want to emphasize that all the data in the slide was collected through the super hard work of bashi@, tasak@ and hajimehoshi@ over the past 3 months.

(tasak) Did a lot of measurements for collecting the data for the slide.

(bashi) Finished the first round investigation of memory impact of discardable items in Blink. The result is summarized in this spreadsheet.

(hajimehoshi) Creating a prototype to compress StringImpls.


Instant tab restore:

(tzik) Created an infra to automate tab restoration benchmarks. Using the infra, broke down the time taken to restore a tab from a disk cache. The result is summarized in this document and this spreadsheet.

Based on the data, in Q4, kouhei@ and tzik@ will focus on significantly reducing the time taken to restore a tab from a disk cache (i.e., "instant tab restore").

(kouhei) Set up a telemetry-based infra for tracking inactive tabs' memory. Studying existing mechanisms for memory pressure management in Chromium.

Kentaro Hara

unread,
Oct 18, 2015, 9:24:46 PM10/18/15
to blink-dev
Hi

Oilpan:

(haraken, yutak, keishi, peria) Finished collecting full performance numbers on all platforms. Oilpan's performance now looks good enough, and we published a proposal to ship Oilpan for everything. It was approved by Blink. haraken@ is discussing concrete shipping steps with the release managers.

(yutak) Started writing a comprehensive document about Oilpan's programming rules.

(peria) Vector<T*> should not be used if T is a GarbageCollected object (you have to use HeapVector<Member<T>> instead). Adding a runtime verification for this; fixed >20 wrong usages in the code base.


Memory reduction:

(haraken, bashi, tasak, hajimehoshi, kinuko, hiroshige) Talked about our short-term action items. Here is the list:

  • MemoryPurgeController should work on trunk (bashi@)

    • For inactive tabs & MemoryPressureListeners

    • Purge discardable items (worth introducing DiscardableHashMap?)

    • Add UMAs

  • memory-infra should have more profiling data about Blink objects (tasak@ and yukishiino@)

    • Allocation-site profiler and object-type profiler should be integrated to memory-infra (waiting for ruuda@)

      • StringImpls, Vectors, HashTables, objects in FastMalloc partition (should be integrated with perf-insights?)

    • Cross-allocator relationships should be explained in memory-infra

      • Resource => locked discardable memory

      • ImageResource => Skia image

      • FontResource => Skia FontFace

      • LayoutObject => CC

  • The key 10 pages should be profiled in more detail (yukishiino@)

    • Break down short-running applications again after resolving unclear points

      • Take the result on Linux (where malloc is explained)

      • Explain the dark matter

      • Explain the FastMalloc partition

      • Exclude unlocked discardable memory

    • Break down long-running applications (lower priority)

    • Explain Vectors and HashTables

  • Large StringImpls should be compressed (hajimehoshi@)

    • Create a prototype

    • Collect data

  • Memory retained by Resources should be explained and purged (hiroshige@)

    • Step 1: All SharedBuffers and locked discardable memory in the key 10 pages should be explained

    • Step 2: ResourcePtrs shouldn’t be kept alive longer than needed

      • HTMLLinkElement::m_styleSheetResource should be promptly cleared when it finishes parsing

      • ResourceFetcher::m_documentResources should be removed

    • Step 3: All memory retained by Resources (including various caches) should be visualized in memory-infra

      • ImageResource, ScriptResource, StylesheetResource, FontResource etc

      • Skia image, Font cache, Glyph cache etc

  • Telemetry+perfinsights should provide enough benchmarks and metrics to keep track of our memory-reduction efforts (bashi@)

    • List up items we want to add to perf-insights before Nat comes to Tokyo

  • Worth considering:

    • Introduce DiscardableHashMap (we should ask blink-dev@)

    • Is there anything in Blink that should be moved to Chromium’s discardable memory?

(bashi) Exporting detailed memory information in Blink to perf-insights. We'll work with Nat this week.

(hajimehoshi) Creating a prototype for large StringImpl compression.


Instant tab restore:

(kouhei) Created WIP patches to accelerate first meaningful paint.

- Don't bother layout until first navigation is done.

- BackgroundHTMLParser: Introduce ParsedChunkQueue to pass ParsedChunks to main thread


(tzik) Investigating the performance of tab restoration. Fixing low-hanging fruit:

- UA string cache for faster resource request

- Preallocated StringImpl creation for CoreInitializer speed up


(tzik) We found that it takes 250 ms from when a renderer process is created to when WebKit::initialize is called. Investigating why it takes as much as 250 ms. This is a large bottleneck in tab restoration.
