Re: E10s Planning, Plugins, and Responsiveness Meeting

Honza Bambas

Nov 6, 2011, 4:37:58 PM

to dev-pl...@lists.mozilla.org

Are there any meeting notes to look at? I wasn't online on Friday and
missed the meeting.

Thanks.
-hb-

On 11/3/2011 7:01 PM, Damon Sicore wrote:
> All,
>
> There will be a meeting Friday, November 4, at 10:00AM PDT to discuss optimizing our efforts to improve browser responsiveness. Specifically, we will discuss remaining tasks for e10s and if and how we will resource them, hangs and jank caused by plugins, and other efforts to improve responsiveness. Below is the agenda for the meeting. If you are in the BCC list (using BCC list to avoid dev-planning auto-bouncing this as spam), you should plan to attend.
>
> Meeting Details:
>
> # When: Friday, Nov 4, 10am PDT. Blocking out two hours for this discussion.
> # Mozilla Mountain View: Warp Core, 3rd floor
> # 650-903-0800 or 650-215-1282 x92 Conf# 95312 (US/INTL)
> # 1-800-707-2533 (pin 369) Conf# 95312 (US)
> # Vidyo Room: Warp Core
> # Vidyo Guest URL: https://v.mozilla.com/flex.html?roomdirect.html&key=UK1zyrd7Vhym (please mute)
> # irc.mozilla.org #planning for backchannel
>
> # Agenda
>
> 1) Confirm responsiveness is a top goal (no matter the method) for engineering.
>
> 2) Discuss potential jank problems to be solved outside E10S and the measurements we are using to track them
>
> + Andreas & Joel/Ted's tools
> + Taras' I/O tracking
> + Places
> + Incremental GC
> + Cycle Collector
>
> 3) Role of E10S in solving jank issues
>
> 4) Out of process plugin hangs, jank, memory, and lifecycle issues - Identify efforts and staff.
>
> 5) E10s Future prioritization, staffing and drivers
> _______________________________________________
> dev-planning mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning
>

Damon Sicore

Nov 8, 2011, 4:12:07 PM

to Damon Sicore, mozilla.dev.planning group

All,

Per the message below, a meeting was held to discuss E10s and responsiveness resourcing and planning. We listed and discussed the major projects that have an E10s aspect:

1) Front-end inversion of control flow efforts, currently staffed by Gavin, Felipe, and Drew.
2) Add-on static analysis to determine the path towards add-on compatibility, dherman.
3) Jetpack
4) Graphics E10s (E10s layers, D3D9, 10, and OS X OpenGL Layers)
5) Dev Tools (dcamp and crew)
6) A11y (dbolter and crew)
7) IndexedDB and printing support (Kyle and Smaug)
8) Eng tools special powers.

Today, our responsiveness problems are not well defined; however, it's clear that they exist: users consistently report hangs and jank (scrolling that pauses or a UI that stops responding), as well as Flash hangs and jitter during video playback. These are problems we cannot ignore. There are specific efforts that require focus in order to address responsiveness:

A) Out of process plugin optimizations: Memory leaks, restarts, hangs, and kill timer problems.
B) Event loop tuning, Gecko Reflow tuning: We need to instrument and optimize.
C) Incremental GC: JS team is on track to deliver incremental GC this quarter.
D) Cycle collector optimizations.
E) Places optimizations: Taras' team is working on optimizing; however, we need to make hard decisions about when and where to use an SQL database. We also need to consider alternatives to SQLite.
F) Front-end jank.
G) Content stimulated jank.

We discussed the merits of efforts 1-8 above and decided to suspend the front-end IOC work (1) in order to focus on Places optimizations, item (E). The dev tools efforts (5) will also be suspended.

In addition, we discussed the formation of a program and/or a team strictly responsible for improving responsiveness. Items A-G above embody a significant amount of work, and to address these issues, we'll need to apply significant resources. David Mandelin suggested that someone should be living and breathing responsiveness. As a result, it was decided that JP, Johnath, Bob Moss, and I would form a program, identifying specific resources to address responsiveness issues as a whole. This type of effort has been extremely effective in the past (CritSmash resulted in shipping with zero reproducible sg:crits, CrashKill dramatically improved our stability and our ability to track product stability, etc.). This program will be similar.

Summary: We'll suspend the efforts noted above to refocus on responsiveness issues with a special program to be formed by Johnath, JP, Bob Moss, and myself. As in previous special programs, we'll need everyone's attention and support to identify and fix responsiveness issues. As Firefox is a portal to the web, users demand that it be as responsive as possible. This is a basic fact. This is not an effort any of us can ignore. Just like the critical security bug program (CritSmash), we'll need developers to immediately respond and act on identified responsiveness issues.

All my best,

Damon

On Nov 3, 2011, at 11:01 AM, Damon Sicore wrote:

> All,
>
> There will be a meeting Friday, November 4, at 10:00AM PDT to discuss optimizing our efforts to improve browser responsiveness. Specifically, we will discuss remaining tasks for e10s and if and how we will resource them, hangs and jank caused by plugins, and other efforts to improve responsiveness. Below is the agenda for the meeting. If you are in the BCC list (using BCC list to avoid dev-planning auto-bouncing this as spam), you should plan to attend.
>
> Meeting Details:
>
> # When: Friday, Nov 4, 10am PDT. Blocking out two hours for this discussion.
> # Mozilla Mountain View: Warp Core, 3rd floor
> # 650-903-0800 or 650-215-1282 x92 Conf# 95312 (US/INTL)
> # 1-800-707-2533 (pin 369) Conf# 95312 (US)
> # Vidyo Room: Warp Core
> # Vidyo Guest URL: https://v.mozilla.com/flex.html?roomdirect.html&key=UK1zyrd7Vhym (please mute)
> # irc.mozilla.org #planning for backchannel
>
> # Agenda
>
> 1) Confirm responsiveness is a top goal (no matter the method) for engineering.
>
> 2) Discuss potential jank problems to be solved outside E10S and the measurements we are using to track them
>
> + Andreas & Joel/Ted's tools

Boris Zbarsky

Nov 8, 2011, 5:06:07 PM

On 11/8/11 4:12 PM, Damon Sicore wrote:
> G) Content stimulated jank.

This is the big one I run into, and it seems very difficult to address
without something like e10s. Do we have any ideas on attacking it?

-Boris

Robert O'Callahan

Nov 8, 2011, 6:56:03 PM

to Boris Zbarsky, dev-pl...@lists.mozilla.org

Yeah, I have no idea how to solve this outside e10s and I haven't heard
anyone else suggest anything either. At least for problems that boil down
to "page runs long-running JS script without yielding".

Under D, "Cycle collector optimizations", while I think we can make some
incremental improvements to make pause times a bit shorter, I have no idea
how we can eliminate nasty CC pauses for large heaps without e10s (or
something similar in risk).

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Robert O'Callahan

Nov 8, 2011, 6:58:29 PM

to Boris Zbarsky, dev-pl...@lists.mozilla.org

To be clear --- I think it very likely makes sense to temporarily delay
e10s to focus on short-term wins, especially stuff like Places that
wouldn't be helped by e10s at all. But I also think we have to have e10s to
win the war on jank.

smaug

Nov 8, 2011, 7:17:48 PM

to Damon Sicore, mozilla.dev.planning group

We should, IMO, also *actively* blocklist badly behaving addons.
https://bugzilla.mozilla.org/show_bug.cgi?id=694683
is an example. F-Secure was making Firefox almost unusable on
a fast, new laptop.

Andrew McCreight

Nov 8, 2011, 7:24:49 PM

to dev-pl...@lists.mozilla.org

----- Original Message -----
> Under D, "Cycle collector optimizations", while I think we can make some
> incremental improvements to make pause times a bit shorter, I have no idea
> how we can eliminate nasty CC pauses for large heaps without e10s (or
> something similar in risk).

Generally, I think long CC pause times are a symptom of leaks. But it would be nice for the cycle collector to be more bulletproof. As you say, there are various incremental ways to reduce CC times (interruptible CC, cycle collect pure DOM cycles separately from ones involving JS, aging out objects, etc.), but I don't know how much these will help in these cases we've been seeing recently where there may be some kind of leak resulting in multi-second pauses.

There is work on concurrent cycle collection, but this would probably require changing how we ref count cycle collected objects in the browser, which would not be a light undertaking, to say the least...
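
To illustrate what "interruptible" could mean in the list above, here is a minimal sketch that walks an object graph in bounded slices and hands control back to the caller between slices. It is not the Firefox cycle collector and does no cycle detection; the function name `traverse_in_slices`, the dict-based graph, and the `budget_s` parameter are all assumptions made for illustration.

```python
import time
from collections import deque


def traverse_in_slices(graph, roots, budget_s, clock=time.monotonic):
    """Visit every node reachable from `roots`, one time slice at a time.

    `graph` is a hypothetical adjacency mapping (node -> list of children).
    After each slice the generator yields the nodes it visited, so the
    caller can return to its event loop before asking for the next slice.
    """
    seen = set(roots)
    work = deque(roots)
    while work:
        slice_end = clock() + budget_s
        visited = []
        while work:
            node = work.popleft()
            visited.append(node)
            for child in graph.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    work.append(child)
            # Check the budget after each node so every slice makes progress.
            if clock() >= slice_end:
                break
        yield visited


if __name__ == "__main__":
    # Illustrative graph containing a cycle (a -> b -> d -> a).
    demo = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
    for chunk in traverse_in_slices(demo, ["a"], budget_s=0.010):
        print(chunk)
```

The trade-off in a sketch like this is that the graph may change between slices, which a real collector would have to account for; the sketch ignores that entirely.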

Robert O'Callahan

Nov 8, 2011, 9:14:55 PM

to Andrew McCreight, dev-pl...@lists.mozilla.org

On Wed, Nov 9, 2011 at 1:24 PM, Andrew McCreight <amccr...@mozilla.com> wrote:

> Generally, I think long CC pause times are a symptom of leaks. But it
> would be nice for the cycle collector to be more bulletproof. As you say,
> there are various incremental ways to reduce CC times (interruptible CC,
> cycle collect pure DOM cycles separately from ones involving JS, aging out
> objects, etc.), but I don't know how much these will help in these cases
> we've been seeing recently where there may be some kind of leak resulting
> in multi-second pauses.
>

The goal is to get a steady 60fps, which means CC pause times of about 10ms
max (maybe that's even a bit generous). I suspect even non-leaky heap sizes
can produce pauses that long.
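
The arithmetic behind that 10ms figure can be sketched as follows. The 60fps target and the 10ms pause come from the message above; the 2000ms value is only an illustrative stand-in for a "multi-second" pause, and the helper names are hypothetical.

```python
TARGET_FPS = 60
frame_budget_ms = 1000 / TARGET_FPS  # roughly 16.7 ms available per frame


def headroom_ms(pause_ms, budget_ms=frame_budget_ms):
    """Time left in a single frame for everything else after one pause."""
    return budget_ms - pause_ms


def frames_missed(pause_ms, fps=TARGET_FPS):
    """Whole frame intervals that elapse during a pause of `pause_ms`.

    Integer arithmetic avoids float rounding at exact multiples.
    """
    return pause_ms * fps // 1000


if __name__ == "__main__":
    print(round(frame_budget_ms, 1))  # per-frame budget in ms
    print(round(headroom_ms(10), 1))  # what a 10 ms pause leaves over
    print(frames_missed(2000))        # frames lost to a 2 s pause
```

A 10ms pause leaves under 7ms of each frame for layout, painting, and script, which is why the figure may be "a bit generous".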

Shawn Wilsher

Nov 8, 2011, 10:41:28 PM

to dev-pl...@lists.mozilla.org

On 11/8/2011 1:12 PM, Damon Sicore wrote:
> E) Places optimizations: Taras' team is working on optimizing; however, we need to make hard decisions about when and where to use an SQL database. We also need to consider alternatives to SQLite.

Places is one of the few places where we need to use a SQL database
unless we plan on dropping a bunch of features. Is this remark about
SQLite usage in general?

Cheers,

Shawn

Doug Turner

Nov 8, 2011, 10:56:49 PM

to Shawn Wilsher, dev-pl...@lists.mozilla.org

On Nov 8, 2011, at 7:41 PM, Shawn Wilsher wrote:

> Places is one of the few places where we need to use a SQL database unless we plan on dropping a bunch of features. Is this remark about SQLite usage in general?

OOC, what features would we have to drop if we moved away from SQL? I spent a few weeks in the code and didn't see anything directly tied to SQL that couldn't be replaced by a different data store, but clearly I didn't mess with the entire Places schema…

Dietrich Ayala

Nov 8, 2011, 11:12:22 PM

to Doug Turner, dev-pl...@lists.mozilla.org, Shawn Wilsher

For all ~10 SQLite databases used, we should evaluate whether SQL is
the right tool, Places included. SQLite is certainly not required for
all Places features, and we absolutely should look at alternatives
including hybrid storage solutions, or switching away from SQLite
altogether if its specific performance challenges are insurmountable.

But storage is only one part of the story - switching persistent
storage solutions is not a panacea for broader architectural problems.

There are various scenarios in which chrome code can block the UI for
unreasonable periods of time. For instance, I recently found a
Facebook page that results in the session-restore code blocking the UI
while serializing session history for subframes. Combinations of
user-data and web content could be causing chrome side-effects like
this in the wild on a regular basis, resulting in worse problems than
any of our known storage-related problems.

Telemetry data is starting to help our visibility here, and we should
continue to push on it - implementing broader instrumentation that
tells us exactly what's causing long event-loop lag, etc.
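
As a rough illustration of measuring event-loop lag, the sketch below uses Python's asyncio rather than Gecko's event loop or the Telemetry API, neither of which is shown here. A probe asks to be woken at a fixed interval and records how late each wake-up arrives; the names `measure_lag`, `blocking_task`, and `run_demo`, and all the timing values, are assumptions for the example.

```python
import asyncio
import time


async def measure_lag(interval, duration):
    """Wake every `interval` seconds and record how late each wake-up was."""
    loop = asyncio.get_running_loop()
    lags = []
    deadline = loop.time() + duration
    while loop.time() < deadline:
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        lags.append(max(0.0, loop.time() - expected))
    return lags


async def blocking_task(delay, block_for):
    """Stand-in for a long synchronous job that runs on the loop's thread."""
    await asyncio.sleep(delay)
    time.sleep(block_for)  # deliberately blocks the event loop


async def run_demo():
    probe = asyncio.create_task(measure_lag(interval=0.01, duration=0.3))
    await blocking_task(delay=0.05, block_for=0.1)
    return await probe


lags = asyncio.run(run_demo())
print(f"worst observed lag: {max(lags) * 1000:.0f} ms over {len(lags)} samples")
```

A probe like this only shows that the loop was late, not why; attributing the lag to a specific caller needs additional instrumentation around the suspected code paths.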