- Introduce a low-device mode to Blink (which aggressively discards memory)
I don't know if there is a design for this, but I would recommend staying away from an "either low-device or high-device" approach and instead having some kind of numerical setting for how aggressive this should be. Devices and personal computers come in a range of configurations and working environments, and "two sizes fit all" will not work.
I don't yet have a design doc. What I'm planning is something like:
- Purge unused system pages in PartitionAlloc & Oilpan.
- Get various uncontrolled caches in Blink under control and discard their contents.
- (Experimental) Forcibly drop references to V8 wrappers created by a V8 context we navigated away from a long time ago.
FWIW, V8, GPU buffers, etc. already support a low-device mode to save memory aggressively.
Hmm, I am not convinced this strategy would work on Android, given the way the lowmem killer and the framework work.

TL;DR: Android kills as many non-foreground apps as needed, trying to maintain a margin of free memory (this is technically wrong in a lot of subtle aspects and corner cases, but is a good approximation of reality).

I bet that what you will find if you measure the free memory on an Android system is that, with the exception of a fresh boot, that amount stays pretty much constant. In other words: if you become a better citizen and use less memory, you will likely cause a previously running app to not be evicted and stay there. Which is good, but not directly measurable by looking at the amount of free memory in the system.
(peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.
> (peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.

Many thanks for getting all the numbers. I was looking at both the doc (which doesn't seem open to comments) and the telemetry data, and I see some numbers that concern me a bit.

The doc mentions "Those regressions are at most 9MB in real number, and are small enough to be ignored." I don't know if there is a decisional threshold on desktop platforms, or what it is, but on Android regressions of a few MB in the page cyclers have proven to be noticed and problematic (crbug.com/475637 is just one of the recent horror stories that comes to mind, and that was about a +2 MB delta).
I see some regressions in the order of MB in the telemetry data you reported. Do we understand where they come from? Is there anything we can do there?

Metric: Memory/vm_private_dirty_final_renderer (summarizing the biggest deltas > 1M)

Nexus 7
Blogger: +4M
Wordpress: -1M (Good)

Nexus 4
Blogger: +6.2M
G+: +1.88M

Desktop
Facebook: -1M (Good)
Blogger: +16.8M
Facebook: +2.4M
Gmail: +6.7M
G+: +3M
Wordpress: +2M
Thanks,
Primiano

On Mon, Jul 6, 2015 at 1:02 AM, Kentaro Hara <har...@chromium.org> wrote:

> Hi
>
> Note: Currently stack traces of ASan + Oilpan are broken due to an LLVM bug. A workaround is to use "--no-sandbox". See https://code.google.com/p/chromium/issues/detail?id=502974 for more details.
>
> Oilpan things:
> (sigbjornf, haraken) Finally enabled lazy sweeping on trunk. We're carefully watching if the lazy sweeping causes any issue around destruction ordering on trunk. Landed another ASan verification to detect use-after-free in Oilpan's heaps (mostly associated with destruction ordering issues in Blink).
> (haraken) Enabled idle GCs on trunk. However, it was reverted because it broke telemetry tests.
> (peria) Finished collecting performance & memory data for core/animations/ and wrote up a document (https://docs.google.com/document/d/1tQqDtDN8xiDFqpzRTA5oubBaWscMZRx63rYLtpApG_o/edit#heading=h.qwbn9p8fji5x). It looks like there is no observable regression/improvement. Will chat with the SYD team if it's ok to ship Oilpan for core/animations/.
> (yutak, keishi) Investigating Blink's memory workload and its relationship to Oilpan's GC and V8's GC. yutak@ identified that (1) forcing a precise GC at a page navigation completely solves the peak memory increase of Blogger (but we cannot simply do this because it can lead to too many precise GCs), and (2) if we properly optimize GC timings, the peak memory increases in other memory benchmarks are gone. This means that the peak memory increases are not caused by Oilpan's memory allocation strategy -- e.g., type-specific heaps, not yet discarding unused system pages, worst-fit allocation etc; what matters is just GC timing. yutak@ is investigating more to optimize the heuristics that determine the GC timing.
> (haraken) Reduced sizeof(Persistent) from 4*sizeof(void*) to 2*sizeof(void*). This is important to reduce sizeof(DOM object) and thus improve cache locality.
> (haraken) Landed a change to decommit unused mmap regions more aggressively. Another idea to further reduce Oilpan's memory usage is to find unused system pages while sweeping and discard them. Experimenting.
> (sigbjornf) Moved ScrollableArea, EventSource etc to Oilpan's heap.
>
> Non-oilpan things:
> (hajimehoshi) hajimehoshi@ is back to the memory team! Supported ActiveDOMObjects in the leak detector (https://docs.google.com/document/d/1sFAsZxeISKnbGdXoLZlB2tDZ8pvO102ePFQx6TX4X14/edit). Then the leak detector detected 120 leaks in existing layout tests... Also supported ScriptPromise, Frame etc in the leak detector. We're planning to distribute the leak-fix work once we get a list of problematic leaks.
> (bashi) Managed to make memory-infra + telemetry workable and collected the first result of Blink's memory usage relative to the total memory usage of a renderer process on real-world websites. Analyzing the data to make sure that the data is correct and consistent.
> (tasak) Re-collecting performance & memory data of full-PartitionAlloc-Chromium, full-tcmalloc-Chromium and full-system-allocator-Chromium on Win 32 bit, Win 64 bit, Mac, Linux and Nexus.
> (haraken) Started experimenting with exporting Oilpan's buffer allocator to normal Vectors, HashMaps, StringBuilders, ArrayBuffers etc. It will take a couple of weeks to make it workable and collect data.
>
> --
> Kentaro Hara, Tokyo, Japan
On Mon, Jul 6, 2015 at 7:37 PM, Primiano Tucci <prim...@chromium.org> wrote:

> Many thanks for getting all the numbers. I was looking at both the doc (which doesn't seem open to comments) and the telemetry data and I see some numbers that concern me a bit.
>
> The doc mentions "Those regressions are at most 9MB in real number, and are small enough to be ignored.". I don't know if there is and what is the decisional threshold on desktop platforms, but on Android few MB regressions in the page cyclers have proven to be noticed and problematic (crbug.com/475637 is just one of the recent horror stories that comes on top of my head, and that was about a +2 MB delta)

9 MB is way too large.

peria@: Where do you observe the 9 MB regression in the telemetry data? Also I wonder if Blogger, Gmail and Google Calendar are really using Web Animations so heavily.

> I see some regressions in the order of MB in the telemetry data you reported. Do we understand where they come from? Is there anything we can do there?
>
> Metric: Memory/vm_private_dirty_final_renderer (summarizing the biggest deltas > 1M)
>
> Nexus 7
> Blogger: +4M
> Wordpress: -1M (Good)
>
> Nexus 4
> Blogger: +6.2M
> G+: +1.88M
>
> Desktop
> Facebook: -1M (Good)
> Blogger: +16.8M
> Facebook: +2.4M
> Gmail: +6.7M
> G+: +3M
> Wordpress: +2M

These *_final_* metrics don't make much sense because the result highly depends on when the last GC happened. What matters are the *_peak_* metrics (and I see no substantial regression in the *_peak_* metrics).

(BTW, I forgot to remove the document link before sending the snippet -- we were planning to discuss the result with the SYD team and then share it with blink-dev :-)
- Regarding performance, PartitionAlloc is much faster than tcmalloc and system allocators.
- Regarding memory usage, PartitionAlloc is sometimes better but sometimes worse. It seems that the memory usage doesn't really change depending on which allocator we use. This would be because the actual amount of objects we have to allocate doesn't change depending on which allocator we use.
Inactive Tab Reclaiming Subteam:
Tab state transfer stats UMAs. crbug.com/517335
(kouhei) Win/Linux/CrOS implementation landed for M46
(tzik) Android implementation
Introduce WebPageImportanceSignal to hint the importance of tab state from Blink to Chromium. crbug.com/520838
(kouhei) Landed the first signal, “hadFormInteraction”. WIP on “issuedFetchWithSideEffects”.
Reclaim unused memory from inactive tabs
(tzik) Wrote LevelDB (the backend of IndexedDB) cache prune patch. Submitted to the internal repository.
Reload from disk cache:
(tzik) Investigating feasibility of disk cache pinning of resources used by inactive tabs.
Why can't we unship the GC for CSSValue? They're leaf nodes and are never exposed to script. They also have simple lifetimes. I don't think oilpan buys us anything for them.
Btw if oilpan doesn't deal well with allocation heavy things how well is it going to handle pages that do a lot of DOM churn?
> Why can't we unship the GC for CSSValue? They're leaf nodes and are never exposed to script. They also have simple lifetimes. I don't think oilpan buys us anything for them.

Unshipping Oilpan from CSSValues adds a bunch of Persistent handles from performance-sensitive on-heap objects to the CSSValues. This decreases performance. Actually, I tried to unship Oilpan from CSSValues in https://codereview.chromium.org/1303173007/ and confirmed that it leads to a performance loss.
Another reason is that Oilpan should be designed so that it can tolerate heavily allocated objects like CSSValues. If oilpan cannot support fundamental objects like CSSValues, it would imply that its GC infrastructure is too weak.
> Btw if oilpan doesn't deal well with allocation heavy things how well is it going to handle pages that do a lot of DOM churn?

Oilpan already has an infrastructure robust enough to support heavily allocated objects such as CSSValues, Nodes etc, but doesn't yet have one robust enough to support extremely heavily allocated objects such as AnimatableValues and InterpolableValues. (Note: Putting AnimatableValues & InterpolableValues on Oilpan's heap regresses frame_times, but the regression happens only in a super-micro benchmark on Linux machines.)
Overall, I believe that Oilpan already has an infrastructure robust enough to support Blink's common workloads. The final part we're working on now is how to actually land Oilpan without causing a performance regression in any micro benchmark. It needs some tweaks (like unshipping Oilpan from some objects).
Unshipping from animations but not from objects that live as long as the page doesn't make sense to me. Why are we trying to GC all the "stable" objects that live forever?
Hi

Oilpan:
(haraken, keishi, sigbjornf) Collected a full performance/memory result of non-Oilpan vs. Oilpan. As I expected :D, we found newly introduced regressions in a bunch of micro-benchmarks. We created the following changes to fix the regressions. I hope we've now addressed almost all the regressions on Oilpan.
- Optimize GC heuristics more (https://codereview.chromium.org/1325783007/).
- Use EphemeralRange in spellchecker/ (https://codereview.chromium.org/1331893002/).
- Stop allocating a vector buffer in DistributedNodes' constructor (https://codereview.chromium.org/1333813002/).
- Significantly reduce the number of persistent handles & the overhead per persistent handle (https://codereview.chromium.org/1338573003/).
We'll recollect the performance/memory numbers once we land the fixes.
(peria) Shipped Oilpan for accessibility/.
(peria) Moving MediaStream-related objects to Oilpan. Facing a couple of issues around destruction ordering.
(yutak) Hardening syntax verification for Oilpan. Added a runtime verification to check that GarbageCollected objects are not allocated on the stack or as a part of another object (this is actually safe but has a risk of resulting in code that is unsafe). Fixed all call sites.

Memory reduction:
(haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% on average). Based on the data, I wrote a document and proposed a way to compress StringImpls.
(hajimehoshi) Investigating where each of the large StringImpls in the key 10 pages comes from. Most of them come from JavaScript source code. Another origin is CSSImageValue::m_relativeURL and CSSImageValue::m_absoluteURL. It sometimes happens that identical data-urls are duplicated in m_relativeURL and m_absoluteURL. Maybe we want to change the String to AtomicString. (We'll start a separate thread for this.)
(bashi) Still tackling adding the key 10 pages to telemetry. Bots are still failing.
(bashi) Experimenting with discarding items listed in the document. At the moment, we're still not successful at finding items that have an impact on Blink's overall memory. It is indeed true that Blink has a lot of uncontrolled caches, but it seems that discarding the caches doesn't have a big impact on Blink's memory reduction. Still experimenting.
(bashi, hajimehoshi) To land the per-object-type profiler for PartitionAlloc, we need a way to get a class name for each object allocation. We need to get the class name using a stack trace somehow. We're considering the best way to do that.
Tab serialization:
(kouhei) Summarized the current priorities of the tab serialization project in the document.
(kouhei) Travelling MTV/MON. Syncing with leads.
(tzik) Continuing to optimize DiskCache.
- Created a URLRequest sniffer to collect better cache efficiency measurements.
- Set up a local server to measure the cache hit rate.
- Ported usage-based eviction from the Blockfile backend to the Simple backend.
As a result, the cache hit rate improved from 41.7% to 49.5% on a benchmark.
(tzik) Implemented an infrastructure for an eviction algorithm simulator.

--
Kentaro Hara, Tokyo, Japan
On Mon, 14 Sep 2015 03:28:27 +0200, Kentaro Hara <har...@chromium.org> wrote:

> Memory reduction:
> (haraken) Our recent profiling showed that the largest memory consumer in Blink is StringImpls. We also confirmed that compressing large StringImpls will reduce Blink’s memory usage by 5 - 60% (17% in average). Based on the data, I wrote a document and proposed a way to compress StringImpls.

There were some other posts indicating that a large part of this was JavaScript source code passed from the network code, via Blink, into V8. Is that interpretation correct, and if so, can something be done in particular for that use case/code path?
With the V8 Ignition project happening, there might be room for rethinking this.

(An old (pre Opera-10.5) Opera method was to recreate the source code from the AST if needed (very rarely needed). Dropped in the O10.5-O12 engine (Carakan) because of reasons. Whitespace preservation might have been one issue.)

/Daniel

--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */
On Mon, Sep 14, 2015 at 7:08 PM, Daniel Bratell <bra...@opera.com> wrote:

> There were some other posts that indicated a large part of this was javascript sourcecode passed from the network code, via blink, into v8. Is that interpretation correct and if so, can something be done in particular to that use case/code path?

That was my original plan, but later I noticed that we can implement the compression at the StringImpl layer without adding a lot of complexity. So I'm currently investigating that approach.
Memory reduction:
(haraken, tasak) We're going to give a presentation on "the 5 most impactful projects to reduce Blink's memory" at the APAC BrownBag this week. Collecting a lot of data to support the proposal.
(bashi) Experimenting with dropping discardable items in Blink and investigating its memory impact. The status is described in this spreadsheet.
On Mon, 05 Oct 2015 02:57:24 +0200, Kentaro Hara <har...@chromium.org> wrote:

> Memory reduction:
> (haraken, tasak) We're going to give a presentation of "the most 5 impactful projects to reduce Blink's memory" at APAC BrownBag this week. Collecting a lot of data to support the proposal.

I hope you can make this information available outside Google as well.

> (bashi) Experimenting with dropping discardable items in Blink and investigating its memory impact. The status is described in this spreadsheet.

bashi, could you make it possible to comment in it? I want to write comments! :-)
MemoryPurgeController should work on trunk (bashi@)
For inactive tabs & MemoryPressureListeners
Purge discardable items (worth introducing DiscardableHashMap?)
Add UMAs
memory-infra should have more profiling data about Blink objects (tasak@ and yukishiino@)
Allocation-site profiler and object-type profiler should be integrated to memory-infra (waiting for ruuda@)
StringImpls, Vectors, HashTables, objects in FastMalloc partition (should be integrated with perf-insights?)
Cross-allocator relationships should be explained in memory-infra
Resource => locked discardable memory
ImageResource => Skia image
FontResource => Skia FontFace
LayoutObject => CC
The key 10 pages should be profiled in more detail (yukishiino@)
Break down short-running applications again after resolving unclear points
Take the result on Linux (where malloc is explained)
Explain the dark matter
Explain the FastMalloc partition
Exclude unlocked discardable memory
Break down long-running applications (lower priority)
Explain Vectors and HashTables
Large StringImpls should be compressed (hajimehoshi@)
Create a prototype
Collect data
Memory retained by Resources should be explained and purged (hiroshige@)
Step 1: All SharedBuffers and locked discardable memory in the key 10 pages should be explained
Step 2: ResourcePtrs shouldn’t be kept alive longer than needed
HTMLLinkElement::m_styleSheetResource should be promptly cleared when it finishes parsing
ResourceFetcher::m_documentResources should be removed
Step 3: All memory retained by Resources (including various caches) should be visualized in memory-infra
ImageResource, ScriptResource, StylesheetResource, FontResource etc
Skia image, Font cache, Glyph cache etc
Telemetry+perfinsights should provide enough benchmarks and metrics to keep track of our memory-reduction efforts (bashi@)
List the items we want to add to perf-insights before Nat comes to Tokyo
Worth considering:
Introduce DiscardableHashMap (we should ask blink-dev@)
(kouhei) Created WIP patches to accelerate first meaningful paint.
- Don't bother layout until first navigation is done.
- BackgroundHTMLParser: Introduce ParsedChunkQueue to pass ParsedChunks to main thread
(tzik) Investigating performance of tab restoration. Fixing low-hanging fruits:
- UA string cache for faster resource request
- Preallocated StringImpl creation for CoreInitializer speed up
(tzik) We found that it takes 250 ms from when a renderer process is created until WebKit::initialize is called. Investigating why it takes as much as 250 ms. This is a large bottleneck in tab restoration.