Platform Architecture Team snippet

Kentaro Hara

May 22, 2018, 4:24:10 AM

to platform-architecture-dev, blink-dev

Hi

So much exciting work is going on!

Memory:

(tasak) Published an excellent study (internal) of real-world memory UKM on desktops. tasak@ is doing a follow-up survey to understand the correlation between PrivateMemoryFootprint and the number of (detached) frames. If there is a strong correlation, we can think about a targeted optimization / intervention for detached frames. Adding more memory UKMs.
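The follow-up survey boils down to measuring how strongly two per-sample quantities move together. Below is a minimal sketch of one common way to quantify that, Pearson's r, assuming the UKM data has been exported as two parallel arrays (PrivateMemoryFootprint and the number of detached frames per sample). The function name and data layout are hypothetical, and the actual analysis may use a different statistic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally sized samples, e.g.
// per-page PrivateMemoryFootprint and the number of detached frames.
// Hypothetical helper for illustration. Returns NaN if the inputs are empty,
// have different sizes, or either sample has zero variance.
double PearsonCorrelation(const std::vector<double>& x,
                          const std::vector<double>& y) {
  const std::size_t n = x.size();
  if (n == 0 || n != y.size()) return std::nan("");

  double mean_x = 0, mean_y = 0;
  for (std::size_t i = 0; i < n; ++i) {
    mean_x += x[i];
    mean_y += y[i];
  }
  mean_x /= n;
  mean_y /= n;

  // Accumulate covariance and variances around the means.
  double cov = 0, var_x = 0, var_y = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const double dx = x[i] - mean_x;
    const double dy = y[i] - mean_y;
    cov += dx * dy;
    var_x += dx * dx;
    var_y += dy * dy;
  }
  if (var_x == 0 || var_y == 0) return std::nan("");
  return cov / std::sqrt(var_x * var_y);
}
```

A value near 1 would indicate a strong positive correlation; on its own it would not show that detached frames cause the extra memory.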

(tasak) mariakhomenko@ conducted an excellent memory ablation study (internal) to understand how an X MB memory increase on the browser process affects various performance metrics. tasak@ is setting up a similar experiment for a renderer process.

(mlippautz) Enabled an IsIncrementalMarking() check on ToT to measure its performance impact. IsIncrementalMarking() currently always returns false because incremental marking is not enabled yet. However, the check itself risks regressing performance because it has to be inserted into performance-sensitive places like Member::operator=(). According to the perf bots, it didn't regress Speedometer but did regress some micro benchmarks in blink_perf. Investigating.
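To illustrate why an always-false check can still cost something, here is a heavily simplified, hypothetical Member<T> whose assignment operator consults an IsIncrementalMarking()-style flag before taking a write-barrier slow path. This is not Blink's actual heap code; ThreadHeapState, HeapState() and WriteBarrierSlow() are invented for the sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for the heap's global marking state. By default
// incremental marking is off, mirroring the ToT experiment where
// IsIncrementalMarking() always returns false.
struct ThreadHeapState {
  bool incremental_marking = false;
  int barrier_calls = 0;  // counts slow-path invocations, for illustration
};

inline ThreadHeapState& HeapState() {
  static ThreadHeapState state;
  return state;
}

inline bool IsIncrementalMarking() { return HeapState().incremental_marking; }

// Hypothetical slow path: a real write barrier would mark the newly stored
// object so the concurrent/incremental marker cannot miss it. Here it only
// records that it was taken.
inline void WriteBarrierSlow(const void* /*object*/) {
  ++HeapState().barrier_calls;
}

// Simplified traced pointer field. Every store pays for the
// IsIncrementalMarking() branch, even while marking is disabled.
template <typename T>
class Member {
 public:
  Member() = default;
  explicit Member(T* raw) : raw_(raw) {}
  Member(const Member&) = default;

  Member& operator=(T* raw) {
    raw_ = raw;
    if (IsIncrementalMarking() && raw)  // the check being measured
      WriteBarrierSlow(raw);
    return *this;
  }

  // Copy assignment must go through the same barrier.
  Member& operator=(const Member& other) { return *this = other.raw_; }

  T* Get() const { return raw_; }

 private:
  T* raw_ = nullptr;
};
```

The branch is cheap and usually well predicted, but it runs on every pointer store into a traced field, so a micro benchmark dominated by such stores could plausibly notice it while a larger benchmark like Speedometer does not.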

(keishi, mlippautz, haraken) Investigating how to shut down workers. When close() (from inside the worker) or worker.terminate() is called, we have to shut down the worker without running pending tasks. However, this is problematic because developers sometimes expect posted tasks and mojo callbacks to always run. If we don't run pending tasks, some Persistent handles may leak, some memory may leak, some state may remain inconsistent, etc. That said, it is also dangerous to run pending tasks after forcibly shutting down the worker. Hmm... You can see more discussion here. keishi@ is implementing a graceful worker shutdown, at least from the Oilpan perspective.
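A toy model of the dilemma, assuming a posted task is the only thing that removes an entry from a Persistent-like registry. PersistentRegistry, PendingTask and WorkerTaskQueue are hypothetical names, not Blink's worker or scheduler classes, and the "graceful" variant (an explicit cancel hook per task) is just one possible design, not necessarily what keishi@ is implementing.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical stand-in for a registry of Persistent handles: an entry (and
// whatever it keeps alive) stays until someone explicitly removes it.
using PersistentRegistry = std::set<int>;

struct PendingTask {
  std::function<void()> run;     // the posted task / mojo callback body
  std::function<void()> cancel;  // cleanup to run if the task is dropped
};

class WorkerTaskQueue {
 public:
  void Post(PendingTask task) { tasks_.push_back(std::move(task)); }

  void RunAll() {
    while (!tasks_.empty()) {
      PendingTask task = std::move(tasks_.front());
      tasks_.pop_front();
      if (task.run) task.run();
    }
  }

  // Forced shutdown: drop every pending task without running it. Any
  // cleanup that lived inside |run| never happens.
  void TerminateForcibly() { tasks_.clear(); }

  // One possible graceful variant: still skip the task bodies, but let each
  // task release what it holds through its cancel hook.
  void TerminateGracefully() {
    while (!tasks_.empty()) {
      PendingTask task = std::move(tasks_.front());
      tasks_.pop_front();
      if (task.cancel) task.cancel();
    }
  }

  bool empty() const { return tasks_.empty(); }

 private:
  std::deque<PendingTask> tasks_;
};
```

Forced termination leaves the registry entry behind (the leak), while TerminateGracefully() releases it without running the task body. The catch is that real tasks and mojo callbacks would each need to provide such a hook.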

(peria, mlippautz) Working on merging TraceWrappers and Trace. Now it's guaranteed that members traced by TraceWrappers are a subset of members traced by Trace. We're getting close :)
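Roughly why the subset guarantee matters: if every member visited by TraceWrappers() is also visited by Trace(), a single Trace() can in principle serve both purposes. A minimal sketch that checks the invariant with a recording visitor; RecordingVisitor, ExampleNode and the field names are hypothetical, and real Oilpan visitors look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical visitor that only records which fields were visited.
struct RecordingVisitor {
  std::set<std::string> visited;
  void Trace(const std::string& field) { visited.insert(field); }
};

// Toy class with both tracing methods. Field names are illustrative.
struct ExampleNode {
  void Trace(RecordingVisitor* v) const {
    v->Trace("parent_");
    v->Trace("first_child_");
    v->Trace("event_listeners_");
  }
  void TraceWrappers(RecordingVisitor* v) const {
    v->Trace("first_child_");
    v->Trace("event_listeners_");
  }
};

// The invariant: everything TraceWrappers() visits is also visited by
// Trace(). std::includes works here because std::set iterates in order.
template <typename T>
bool TraceWrappersIsSubsetOfTrace(const T& object) {
  RecordingVisitor trace, wrappers;
  object.Trace(&trace);
  object.TraceWrappers(&wrappers);
  return std::includes(trace.visited.begin(), trace.visited.end(),
                       wrappers.visited.begin(), wrappers.visited.end());
}
```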

(yuzus, ssid) Collecting and analyzing Finch numbers for OOM intervention v0. The numbers are improving. ssid@ proposed a plan to improve the OOM detection signal.

(yuzus) Implemented a prototype of OOM intervention v1 (i.e., dropping cross-origin iframes when the renderer's memory usage is high). Waiting for a UI review.
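A minimal sketch of the v1 policy as described, assuming a single renderer-wide memory threshold and plain string comparison of origins (real origin checks are more involved). FrameInfo, FramesToDrop() and the threshold parameter are hypothetical; the prototype's actual trigger and selection logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-frame summary.
struct FrameInfo {
  std::string origin;  // e.g. "https://ads.example"
  bool is_main_frame;
};

// Returns the indices of frames that would be dropped: every cross-origin
// iframe, but only when renderer memory exceeds |threshold_bytes|.
std::vector<std::size_t> FramesToDrop(const std::vector<FrameInfo>& frames,
                                      const std::string& main_origin,
                                      std::size_t renderer_memory_bytes,
                                      std::size_t threshold_bytes) {
  std::vector<std::size_t> result;
  if (renderer_memory_bytes <= threshold_bytes) return result;
  for (std::size_t i = 0; i < frames.size(); ++i) {
    // Simplification: origins compared as strings.
    if (!frames[i].is_main_frame && frames[i].origin != main_origin)
      result.push_back(i);
  }
  return result;
}
```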

(ulan) Implementing prOOMpt. Landed a mechanism to detect when a V8 heap is about to hit its limit, so that we can show a UI and reload the tab before it crashes.
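A self-contained simulation of the idea, assuming the engine calls back shortly before an allocation would exceed the heap limit and lets the embedder grant temporary headroom. SimulatedHeap and its callback are hypothetical and do not use V8's real API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Toy heap with a hard limit. When an allocation would exceed the limit, it
// first invokes a "near limit" callback, which may return a raised limit so
// the embedder has time to react (e.g. show UI, then reload the tab).
class SimulatedHeap {
 public:
  using NearLimitCallback =
      std::function<std::size_t(std::size_t current_limit)>;

  explicit SimulatedHeap(std::size_t limit) : limit_(limit) {}

  void SetNearLimitCallback(NearLimitCallback cb) { callback_ = std::move(cb); }

  // Returns false if the allocation still does not fit after the callback;
  // a real engine would crash with OOM at that point.
  bool Allocate(std::size_t bytes) {
    if (used_ + bytes > limit_ && callback_) limit_ = callback_(limit_);
    if (used_ + bytes > limit_) return false;
    used_ += bytes;
    return true;
  }

  std::size_t used() const { return used_; }
  std::size_t limit() const { return limit_; }

 private:
  std::size_t used_ = 0;
  std::size_t limit_;
  NearLimitCallback callback_;
};
```

In this sketch, the extra headroom granted by the callback is what gives the browser a window to show the prompt instead of crashing at the exact moment the limit is hit.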

Scheduling:

(hajimehoshi, altimin, haraken) Reached consensus on a guideline for how to use task types. Added as a comment in the code base.

(hajimehoshi) Replacing Unthrottled tasks with more appropriate task types.

(hajimehoshi) Breaking down the "None" task type in UMA into more specific task types. The goal is to explain >95% of task execution in UMA by task type.
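One way to state the >95% goal as a number, assuming "task execution" means summed run time per task type (the update doesn't say whether it is time or count). The function and the task type names used with it are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Fraction of total task execution attributed to a specific task type,
// i.e. everything except the catch-all "None" bucket.
double ExplainedFraction(const std::map<std::string, double>& time_by_task_type) {
  double total = 0, unexplained = 0;
  for (const auto& [type, time] : time_by_task_type) {
    total += time;
    if (type == "None") unexplained += time;
  }
  if (total == 0) return 0;
  return (total - unexplained) / total;
}
```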

(altimin, haraken) Discussed how to throttle / freeze more tasks in background tabs on Android and desktops. Discussed how to untangle the mess of the scheduling architecture.

Bindings:

(yukishiino) Replaced the implementation of NodeFilter with an IDL callback interface. The next step is to replace the implementation of EventListener with an IDL callback interface.

(peria) Made the overload resolution algorithm in the IDL compiler more spec-conformant (CL).

Code architecture:

(dgozman, pfeldman) Added an excellent guideline to README.md on what platform/, core/, modules/, bindings/ and controller/ mean.

(haraken) Updated the Onion Soup spreadsheet to the latest status and pinged all owners / teams. Overall we're making great progress :)

(jbroman) Merged WTF::Optional with base::Optional.

(jbroman) Merged WTF::AutoReset with base::AutoReset.

(lfg) Onion-souped cache storage.

(lfg) Onion-souped origin trials.

(adithyas) Onion-souping speech recognition.

(dgozman) Onion-souping manifest.

(dgozman) Onion-souped clipboard.

(Note: This is not a complete list of the arch team's achievements. It is just a list of the arch team's achievements that haraken@ is aware of, plus other teams' achievements closely related to the arch team. Feel free to add more by replying to this thread.)

--

Kentaro Hara, Tokyo, Japan

Kentaro Hara

May 22, 2018, 9:45:52 AM

to platform-architecture-dev, blink-dev

Forgot to add...

Scheduling:

(yutak) Pushed Scheduling Architecture 2.0 forward and removed lots of complexity and unused classes / methods from platform/scheduler/. Removed WebSchedulerImpl. Removed RenderWebSchedulerImpl. Removed unused public APIs from WebThreadScheduler, WebMainThreadScheduler, etc.

Ben Kelly

May 22, 2018, 9:57:35 AM

to Kentaro Hara, platform-architecture-dev, blink-dev

On Tue, May 22, 2018 at 4:23 AM, Kentaro Hara <har...@chromium.org> wrote:

> (tasak) Published an excellent study (internal) of real-world memory UKM on desktops. tasak@ is doing a follow-up survey to understand the correlation between PrivateMemoryFootprint and # of (detached) frames. If there is a strong correlation, we can think about a targeted optimization / intervention for detached frames. Adding more memory UKMs.

FWIW, we have also noticed excessive memory usage on sites with many detached iframes. For example, we've seen issues with twitter leaking frames via their cross-domain messaging (xdm) library:

https://bugzilla.mozilla.org/show_bug.cgi?id=1277376

If you have ideas for mitigating this situation I think we'd be interested in at least discussing it.

Thanks.

Ben

Kentaro Hara

May 22, 2018, 10:24:26 AM

to Ben Kelly, platform-architecture-dev, blink-dev

Thanks Ben for the info!

One thing we could do is to expose the information to DevTools and make developers aware of leaking windows. (Though just exposing information might not be enough to incentivize many developers to fix the problem.)

Another approach is to implement an intervention that forcibly drops leaking windows. The tricky part is that the fact that the windows are leaking indicates that a user script still has a reference to them. If we forcibly drop the leaking windows, we will end up creating dangling pointers (which will cause security issues).

I welcome your ideas :)

Thanks.

Ben Kelly

May 22, 2018, 11:01:32 AM

to Kentaro Hara, platform-architecture-dev, blink-dev

On Tue, May 22, 2018 at 10:23 AM, Kentaro Hara <har...@chromium.org> wrote:

> One thing we could do is to expose the information to DevTools and let developers be aware of leaking windows. (Though just exposing information might not be enough to incentivize many developers to fix the problem.)

Perhaps it would be possible to include a count of detached iframes in the memory reporting API that's been proposed. In theory sites that are looking at memory would care about leaked iframes. Of course, it would have to be careful not to leak details about iframes further nested in cross-origin frames. It might also not fit in with the overall design of that API.

It would also be nice if we could get common libraries/patterns to avoid these leaks. For example, if the xdm library used MessageChannel instead of direct window.postMessage() then I think that twitter leak would not occur.

> Another approach is to implement some intervention to forcibly drop leaking windows. The tricky part is that the fact that the windows are leaking indicates that a user script still has a reference to those windows. If we forcibly drop the leaking windows, it will end up with creating dangling pointers (which will cause security issues).

Maybe triggering a page reload if it hits an excessive threshold would be a bit better. Still, I doubt we would do anything this breaking in Firefox.

Sorry I don't have anything better to suggest at the moment.

Ben

Domenic Denicola

May 22, 2018, 11:11:09 AM

to Ben Kelly, Kentaro Hara, platform-architecture-dev, blink-dev

From: platform-arc...@chromium.org <platform-arc...@chromium.org> On Behalf Of Ben Kelly

>> Another approach is to implement some intervention to forcibly drop leaking windows. The tricky part is that the fact that the windows are leaking indicates that a user script still has a reference to those windows. If we forcibly drop the leaking windows, it will end up with creating dangling pointers (which will cause security issues).
>
> Maybe triggering a page reload if it hits an excessive threshold would be a bit better. Still, I doubt we would do anything this breaking in Firefox.

I'm not sure if this would be more or less breaking, but we could have some sort of "neutered window" where all its methods and properties throw (similar to a cross-origin window, but more so). And we could transition the JS window object reference into this state, which would allow freeing the C++ Window object.
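A minimal sketch of the neutering idea, assuming script only ever reaches the C++ Window through a proxy object it cannot bypass. Window and WindowProxy here are toy classes that merely echo the thread's terminology; they are not Blink's or V8's classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the C++ Window object behind a frame.
struct Window {
  std::string name = "child-frame";
};

// Script holds the proxy, never a raw Window*. Neutering frees the Window
// and makes every later access throw, instead of leaving a dangling pointer.
class WindowProxy {
 public:
  explicit WindowProxy(std::unique_ptr<Window> window)
      : window_(std::move(window)) {}

  // Analogous to a property access from script.
  const std::string& name() const { return Target().name; }

  // Intervention: release the Window; the proxy itself stays valid.
  void Neuter() { window_.reset(); }

  bool is_neutered() const { return window_ == nullptr; }

 private:
  Window& Target() const {
    if (!window_) throw std::runtime_error("window has been neutered");
    return *window_;
  }

  std::unique_ptr<Window> window_;
};
```

Because the proxy owns the Window, freeing it leaves no dangling pointer; script just gets an exception on its next access. The sketch does nothing for references script already holds to other objects from that frame.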

I think Edge does something rather similar to this, although it does it very aggressively, not as an intervention-of-last-resort. When we talked to them about it in the past they said it was a cause of web compat pain. So... ¯\_(ツ)_/¯

Boris Zbarsky

May 22, 2018, 4:10:04 PM

to Domenic Denicola, Ben Kelly, Kentaro Hara, platform-architecture-dev, blink-dev

On 5/22/18 11:11 AM, Domenic Denicola wrote:
> I'm not sure if this would be more or less breaking, but we could have some sort of "neutered window" where all its methods and properties throw (similar to a cross-origin window, but more so).

For what it's worth, Firefox does this sort of thing for references from
the browser UI to unloaded web pages: once you navigate away from the
page, the references from the browser UI effectively switch up the proxy
handler to one that just throws from all internal methods.

Doing this on the web would in fact have some of the compat pain the
Edge people mention. It would also be difficult to do for references to
non-WindowProxy (and non-Location) objects, I expect.

-Boris

Kentaro Hara

May 23, 2018, 12:06:07 AM

to Boris Zbarsky, Domenic Denicola, Ben Kelly, platform-architecture-dev, blink-dev

Thanks all for the ideas!

The idea of the neutered window definitely sounds interesting.

> It would also be difficult to do for references to non-WindowProxy (and non-Location) objects, I expect.

Yeah, I'm concerned about this as well. Conceptually we can neuter only WindowProxy and leave other objects as is (then I think we can release most of the memory of that iframe), but I'm not yet sure how much work would be needed to make V8 & Blink work with neutered WindowProxies. (e.g., in V8 any V8 wrapper has a strong reference to a creation context -- what would happen if we neuter the creation context?)

Either way, we'll investigate the correlation between leaking iframes and memory usage using real-world UKM. If there's a strong correlation, we can dig into the details.
