Re: E10s Planning, Plugins, and Responsiveness Meeting - Friday, Nov 4, 10AM PDT

Honza Bambas

Nov 6, 2011, 4:37:58 PM
to dev-pl...@lists.mozilla.org
Are there any meeting notes to look at? I wasn't online on Friday and
missed the meeting.

Thanks.
-hb-

On 11/3/2011 7:01 PM, Damon Sicore wrote:
> All,
>
> There will be a meeting Friday, November 4, at 10:00AM PDT to discuss optimizing our efforts to improve browser responsiveness. Specifically, we will discuss remaining tasks for e10s and if and how we will resource them, hangs and jank caused by plugins, and other efforts to improve responsiveness. Below is the agenda for the meeting. If you are in the BCC list (using BCC list to avoid dev-planning auto-bouncing this as spam), you should plan to attend.
>
> Meeting Details:
>
> # When: Friday, Nov 4, 10am PDT. Blocking out two hours for this discussion.
> # Mozilla Mountain View: Warp Core, 3rd floor
> # 650-903-0800 or 650-215-1282 x92 Conf# 95312 (US/INTL)
> # 1-800-707-2533 (pin 369) Conf# 95312 (US)
> # Vidyo Room: Warp Core
> # Vidyo Guest URL: https://v.mozilla.com/flex.html?roomdirect.html&key=UK1zyrd7Vhym (please mute)
> # irc.mozilla.org #planning for backchannel
>
> # Agenda
>
> 1) Confirm responsiveness is a top goal (no matter the method) for engineering.
>
> 2) Discuss potential jank problems to be solved outside E10S and the measurements we are using to track them
>
> + Andreas & Joel/Ted's tools
> + Taras' I/O tracking
> + Places
> + Incremental GC
> + Cycle Collector
>
> 3) Role of E10S in solving jank issues
>
> 4) Out of process plugin hangs, jank, memory, and lifecycle issues - Identify efforts and staff.
>
> 5) E10s Future prioritization, staffing and drivers
> _______________________________________________
> dev-planning mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning
>

Damon Sicore

Nov 8, 2011, 4:12:07 PM
to Damon Sicore, mozilla.dev.planning group
All,

Per the message below, a meeting was held to discuss E10s and responsiveness resourcing and planning. We listed and discussed the major projects that have an E10s aspect:

1) Front-end inversion of control flow efforts, currently staffed by Gavin, Felipe, and Drew.
2) Add-on static analysis to determine the path towards add-on compatibility, dherman.
3) Jetpack
4) Graphics E10s (E10s layers, D3D9, 10, and OS X OpenGL Layers)
5) Dev Tools (dcamp and crew)
6) A11y (dbolter and crew)
7) IndexedDB and printing support (Kyle and Smaug)
8) Eng tools special powers.

Today, we have product issues around responsiveness that are not well defined; however, it's clear that responsiveness issues exist as our users are providing consistent feedback indicating hangs and jank (problems where scrolling is paused or the UI is unresponsive) or Flash hangs or jitters during video playback. These are problems we cannot ignore. There are specific efforts that require focus in order to address responsiveness:

A) Out of process plugin optimizations: Memory leaks, restarts, hangs, and kill timer problems.
B) Event loop tuning, Gecko Reflow tuning: We need to instrument and optimize.
C) Incremental GC: JS team is on track to deliver incremental GC this quarter.
D) Cycle collector optimizations.
E) Places optimizations: Taras' team is working on optimizing; however, we need to make hard decisions about when and where to use an SQL database. We also need to consider alternatives to SQLite.
F) Front-end jank.
G) Content stimulated jank.

We discussed the merits of efforts 1-8 above and decided that the front-end IOC work (1) should be suspended in order to focus on Places optimizations, item (E). The dev tools efforts (5) would also be suspended.

In addition, we discussed the formation of a program and/or a team strictly responsible for improving responsiveness. Items A-G above embody a significant amount of work, and to address these issues, we'll need to apply significant resources. David Mandelin suggested that someone should be living and breathing responsiveness. As a result, it was decided that JP, Johnath, Bob Moss, and I would form a program, identifying specific resources to address responsiveness issues as a whole. This type of effort has been extremely effective in the past: CritSmash resulted in shipping with zero reproducible sg:crits, and CrashKill dramatically improved our stability and our ability to track product stability. This program will be similar.

Summary: We'll suspend the efforts noted above to refocus on responsiveness issues with a special program to be formed by Johnath, JP, Bob Moss, and myself. As in previous special programs, we'll need everyone's attention and support to identify and fix responsiveness issues. As Firefox is a portal to the web, users demand that it be as responsive as possible. This is a basic fact. This is not an effort any of us can ignore. Just like the critical security bug program (CritSmash), we'll need developers to immediately respond and act on identified responsiveness issues.

All my best,

Damon




On Nov 3, 2011, at 11:01 AM, Damon Sicore wrote:

> All,
>
> There will be a meeting Friday, November 4, at 10:00AM PDT to discuss optimizing our efforts to improve browser responsiveness. Specifically, we will discuss remaining tasks for e10s and if and how we will resource them, hangs and jank caused by plugins, and other efforts to improve responsiveness. Below is the agenda for the meeting. If you are in the BCC list (using BCC list to avoid dev-planning auto-bouncing this as spam), you should plan to attend.
>
> Meeting Details:
>
> # When: Friday, Nov 4, 10am PDT. Blocking out two hours for this discussion.
> # Mozilla Mountain View: Warp Core, 3rd floor
> # 650-903-0800 or 650-215-1282 x92 Conf# 95312 (US/INTL)
> # 1-800-707-2533 (pin 369) Conf# 95312 (US)
> # Vidyo Room: Warp Core
> # Vidyo Guest URL: https://v.mozilla.com/flex.html?roomdirect.html&key=UK1zyrd7Vhym (please mute)
> # irc.mozilla.org #planning for backchannel
>
> # Agenda
>
> 1) Confirm responsiveness is a top goal (no matter the method) for engineering.
>
> 2) Discuss potential jank problems to be solved outside E10S and the measurements we are using to track them
>
> + Andreas & Joel/Ted's tools

Boris Zbarsky

Nov 8, 2011, 5:06:07 PM
On 11/8/11 4:12 PM, Damon Sicore wrote:
> G) Content stimulated jank.

This is the big one I run into that seems to be very difficult outside
something like e10s. Do we have any ideas on attacking it?

-Boris

Robert O'Callahan

Nov 8, 2011, 6:56:03 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
Yeah, I have no idea how to solve this outside e10s and I haven't heard
anyone else suggest anything either. At least for problems that boil down
to "page runs long-running JS script without yielding".

Under D, "Cycle collector optimizations", while I think we can make some
incremental improvements to make pause times a bit shorter, I have no idea
how we can eliminate nasty CC pauses for large heaps without e10s (or
something similar in risk).

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Robert O'Callahan

Nov 8, 2011, 6:58:29 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
To be clear --- I think it very likely makes sense to temporarily delay
e10s to focus on short-term wins, especially stuff like Places that
wouldn't be helped by e10s at all. But I also think we have to have e10s to
win the war on jank.

smaug

Nov 8, 2011, 7:17:48 PM
to Damon Sicore, mozilla.dev.planning group
We should, IMO, also *actively* blocklist badly behaving addons.
https://bugzilla.mozilla.org/show_bug.cgi?id=694683
is an example. F-Secure was making Firefox almost unusable on
a fast, new laptop.

Andrew McCreight

Nov 8, 2011, 7:24:49 PM
to dev-pl...@lists.mozilla.org
----- Original Message -----
> Under D, "Cycle collector optimizations", while I think we can make
> some
> incremental improvements to make pause times a bit shorter, I have no
> idea
> how we can eliminate nasty CC pauses for large heaps without e10s (or
> something similar in risk).

Generally, I think long CC pause times are a symptom of leaks. But it would be nice for the cycle collector to be more bulletproof. As you say, there are various incremental ways to reduce CC times (interruptible CC, cycle collect pure DOM cycles separately from ones involving JS, aging out objects, etc.), but I don't know how much these will help in these cases we've been seeing recently where there may be some kind of leak resulting in multi-second pauses.

There is work on concurrent cycle collection, but this would probably require changing how we ref count cycle collected objects in the browser, which would not be a light undertaking, to say the least...


>
> Rob
> --
> "If we claim to be without sin, we deceive ourselves and the truth is
> not
> in us. If we confess our sins, he is faithful and just and will
> forgive us
> our sins and purify us from all unrighteousness. If we claim we have
> not
> sinned, we make him out to be a liar and his word is not in us." [1
> John
> 1:8-10]

Robert O'Callahan

Nov 8, 2011, 9:14:55 PM
to Andrew McCreight, dev-pl...@lists.mozilla.org
On Wed, Nov 9, 2011 at 1:24 PM, Andrew McCreight <amccr...@mozilla.com> wrote:

> Generally, I think long CC pause times are a symptom of leaks. But it
> would be nice for the cycle collector to be more bulletproof. As you say,
> there are various incremental ways to reduce CC times (interruptible CC,
> cycle collect pure DOM cycles separately from ones involving JS, aging out
> objects, etc.), but I don't know how much these will help in these cases
> we've been seeing recently where there may be some kind of leak resulting
> in multi-second pauses.
>

The goal is to get a steady 60fps, which means CC pause times of about 10ms
max (maybe that's even a bit generous). I suspect even non-leaky heap sizes
can hit that.
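
To put numbers on that target: at 60fps each frame has 1000/60 ≈ 16.7ms, so a 10ms pause leaves under 7ms for everything else in the frame. A minimal sketch of the arithmetic follows; the function names are illustrative only and are not part of any Gecko API.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def remaining_after_pause_ms(fps: float, pause_ms: float) -> float:
    """Time left in a single frame after a collector pause of pause_ms."""
    return frame_budget_ms(fps) - pause_ms


def frames_dropped(fps: float, pause_ms: float) -> int:
    """Whole frame intervals consumed by one uninterrupted pause."""
    return int(pause_ms * fps // 1000)
```

Under these definitions a 10ms pause still fits inside one frame, while a hypothetical 2000ms pause (the "multi-second" case discussed earlier in the thread) spans 120 frame intervals at 60fps.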

Shawn Wilsher

Nov 8, 2011, 10:41:28 PM
to dev-pl...@lists.mozilla.org
On 11/8/2011 1:12 PM, Damon Sicore wrote:
> E) Places optimizations: Taras' team is working on optimizing; however, we need to make hard decisions about when and where to use an SQL database. We also need to consider alternatives to SQLite.
Places is one of the few places where we need to use a SQL database
unless we plan on dropping a bunch of features. Is this remark about
SQLite usage in general?

Cheers,

Shawn

Doug Turner

Nov 8, 2011, 10:56:49 PM
to Shawn Wilsher, dev-pl...@lists.mozilla.org

On Nov 8, 2011, at 7:41 PM, Shawn Wilsher wrote:

> Places is one of the few places where we need to use a SQL database unless we plan on dropping a bunch of features. Is this remark about SQLite usage in general?


OOC, what features would we have to drop if we moved away from SQL? I spent a few weeks in the code and didn't see anything directly tied to SQL that couldn't be replaced by a different data store, but clearly I didn't mess with the entire Places schema…


Dietrich Ayala

Nov 8, 2011, 11:12:22 PM
to Doug Turner, dev-pl...@lists.mozilla.org, Shawn Wilsher
For all ~10 SQLite databases used, we should evaluate whether SQL is
the right tool, Places included. SQLite is certainly not required for
all Places features, and we absolutely should look at alternatives
including hybrid storage solutions, or switching away from SQLite
altogether if its specific performance challenges are insurmountable.

But storage is only one part of the story - switching persistent
storage solutions is not a panacea for broader architectural problems.

There are various scenarios in which chrome code can block the UI for
unreasonable periods of time. For instance, I recently found a
Facebook page that results in the session-restore code blocking the UI
while serializing session history for subframes. Combinations of
user-data and web content could be causing chrome side-effects like
this in the wild on a regular basis, resulting in worse problems than
any of our known storage-related problems.

Telemetry data is starting to help our visibility here, and we should
continue to push on it - implementing broader instrumentation that
tells us exactly what's causing long event-loop lag, etc.
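
One common way to instrument event-loop lag is a tracer event that is re-posted at a fixed interval: if the loop services it late, the extra delay is time the loop spent blocked. The sketch below only shows that calculation over recorded timestamps. It is an assumption about how such a probe could work, not a description of the actual Telemetry probes, and `tracer_lags` and `long_stalls` are hypothetical names.

```python
def tracer_lags(serviced_at_ms, interval_ms):
    """Lag of each tracer event beyond its scheduled interval.

    serviced_at_ms: timestamps (ms) at which the loop actually ran the tracer.
    interval_ms: how often the tracer was supposed to run.
    """
    lags = []
    for previous, current in zip(serviced_at_ms, serviced_at_ms[1:]):
        lags.append(max(0, (current - previous) - interval_ms))
    return lags


def long_stalls(serviced_at_ms, interval_ms, threshold_ms=50):
    """(start_time, lag) pairs for stalls at or above threshold_ms."""
    lags = tracer_lags(serviced_at_ms, interval_ms)
    return [(start, lag)
            for start, lag in zip(serviced_at_ms, lags)
            if lag >= threshold_ms]
```

Attributing each stall to a cause (which is what the message above asks for) would need extra data, such as a stack sample taken while the loop is blocked; this sketch only finds when the stalls happened.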

Boris Zbarsky

Nov 8, 2011, 11:14:57 PM
On 11/8/11 6:56 PM, Robert O'Callahan wrote:
> At least for problems that boil down
> to "page runs long-running JS script without yielding".

This one is perhaps sorta-solvable. In particular, we could do exactly
what we do for our slow script dialog right now but slightly better: off
an operation callback block all event delivery to the page, spin up a
nested event loop, process events, then return control to the page.
Unless the tab gets closed, in which case we don't return. We already
support stopping a script from the operation callback.

The hard part above is "block all event delivery to the page".

A possibly somewhat harder problem is what to do when page JS asks for a
sync layout on a large page, or some other C++ operation that can't
quite handle being interrupted partway (even interruptible reflow can't
handle being interrupted at arbitrarily fine resolution).

-Boris
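
The control flow described above can be modelled in a few lines. This is a toy simulation under stated assumptions: `Browser`, `post`, and `run_script` are hypothetical names, not Gecko interfaces, and the "script" is just a loop that invokes the callback every `check_every` steps. It shows chrome events being processed from the callback, page events being held back until the script returns, and a tab close stopping the script.

```python
from collections import deque


class Browser:
    def __init__(self, check_every=1000):
        self.pending = deque()          # (target, name) events awaiting dispatch
        self.deferred_page = deque()    # page events blocked while the script runs
        self.log = []
        self.tab_closed = False
        self.check_every = check_every

    def post(self, target, name):
        self.pending.append((target, name))

    def operation_callback(self):
        """Invoked periodically while a script runs; False means stop the script."""
        while self.pending:             # stands in for the nested event loop
            target, name = self.pending.popleft()
            if target == "page":
                self.deferred_page.append(name)   # block delivery to the page
            elif name == "close-tab":
                self.tab_closed = True
                return False
            else:
                self.log.append("chrome:" + name)
        return True

    def run_script(self, steps, on_step=None):
        for i in range(steps):
            if on_step is not None:
                on_step(i)
            if i % self.check_every == 0 and not self.operation_callback():
                self.log.append("script stopped")
                return False
        # The script has returned control: deliver what was held back.
        while self.deferred_page:
            self.log.append("page:" + self.deferred_page.popleft())
        return True
```

The simulation sidesteps exactly the parts identified as hard: a real implementation has to decide which events count as "for the page", and cannot interrupt C++ operations such as a synchronous layout at an arbitrary point.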

Doug Turner

Nov 8, 2011, 11:22:25 PM
to Dietrich Ayala, dev-pl...@lists.mozilla.org, Shawn Wilsher
Using SQLite, or more likely its usage, is a major problem for the mozilla platform, and it is a source of huge problems for mobile. See https://bugzilla.mozilla.org/show_bug.cgi?id=696141#c7

How do we start evaluating alternatives for all of these databases? Are you driving that?

Doug

Robert O'Callahan

Nov 8, 2011, 11:40:48 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Wed, Nov 9, 2011 at 5:14 PM, Boris Zbarsky <bzba...@mit.edu> wrote:

> On 11/8/11 6:56 PM, Robert O'Callahan wrote:
>
>> At least for problems that boil down
>> to "page runs long-running JS script without yielding".
>>
>
> This one is perhaps sorta-solvable. In particular, we could do exactly
> what we do for our slow script dialog right now but slightly better: off an
> operation callback block all event delivery to the page, spin up a nested
> event loop, process events, then return control to the page. Unless the tab
> gets closed, in which case we don't return. We already support stopping a
> script from the operation callback.
>

> The hard part above is "block all event delivery to the page".
>

Effectively we'd be trying to emulate multiple threads/processes by using a
single thread with some cooperative context switching and a lot of magic to
avoid reentrancy. That doesn't sound like a workable short-term solution to
me.

Dietrich Ayala

Nov 8, 2011, 11:40:48 PM
to Doug Turner, dev-pl...@lists.mozilla.org, Shawn Wilsher
On Tue, Nov 8, 2011 at 8:22 PM, Doug Turner <do...@mozilla.com> wrote:
> Using SQLite, or more likely its usage, is a major problem for the mozilla platform, and it is a source of huge problems for mobile.

Yes, our non-performant usage of SQLite is most often the culprit of
our storage-related problems. However, Brendan had thoughts on why
SQLite's mutex dependencies were a poor design in another thread, so
maybe he'll say more here on why using it is inherently bad.

> How do we start evaluating alternatives for all of these databases?  Are you driving that?

I'm not driving that, nor do I think it's something one person should
drive. Usage of SQLite spans from network code all the way up to the
glass in content preferences.

Each SQLite consumer should be evaluating whether it is too big a
hammer for their needs, and looking for main-thread-IO blocking going
on due to their usage of it. See bug 699820.

LevelDB support in the Moz platform is happening in bug 679852, so
maybe that's a solution for current SQLite consumers that can be
implemented on a key/value store alone. But I haven't seen much data
on how it would address specific problems that we're having.

Like we talked about in IRC, I think it would be worth having someone
(perf team maybe?) build a matrix of storage options in our platform,
their pros/cons, recommended best-practices, etc. That would provide
something for feature-owners to look at when evaluating the best
storage approach for their needs. Maybe we should do reviews like we
do with Security team :)

Boris Zbarsky

Nov 8, 2011, 11:43:49 PM
On 11/8/11 11:40 PM, Robert O'Callahan wrote:
> Effectively we'd be trying to emulate multiple threads/processes by using a
> single thread with some cooperative context switching and a lot of magic to
> avoid reentrancy.

Yep.

Note that we need to have most of said magic to properly do sync XHR, by
the way.....

> That doesn't sound like a workable short-term solution to me.

I guess that depends on our definitions of term durations. I can see
this maybe being doable in 6-9 months if we try. Maybe.

-Boris

Dave Townsend

Nov 8, 2011, 11:45:27 PM
On 11/8/2011 8:40 PM, Dietrich Ayala wrote:
> On Tue, Nov 8, 2011 at 8:22 PM, Doug Turner <do...@mozilla.com> wrote:
>> Using SQLite, or more likely its usage, is a major problem for the mozilla platform, and it is a source of huge problems for mobile.
>
> Yes, our non-performant usage of SQLite is most often the culprit of
> our storage-related problems. However, Brendan had thoughts on why
> SQLite's mutex dependencies were a poor design in another thread, so
> maybe he'll say more here on why using it is inherently bad.
>
>> How do we start evaluating alternatives for all of these databases? Are you driving that?
>
> I'm not driving that, nor do I think it's something one person should
> drive. Usage of SQLite spans from network code all the way up to the
> glass in content preferences.
>
> Each SQLite consumer should be evaluating whether it is too big a
> hammer for their needs, and looking for main-thread-IO blocking going
> on due to their usage of it. See bug 699820.

I agree that no one person should be evaluating whether each SQLite user
is doing so for the right reasons. However, I think it would be
fantastically useful for one or two people to put together a list of
good storage mechanisms, along with their performance and memory
characteristics, to help those people at least narrow down which options
they should be looking into as alternatives.

Marco Bonardo

Nov 9, 2011, 6:54:32 AM
On 09/11/2011 05:22, Doug Turner wrote:
> Using SQLite, or more likely its usage, is a major problem for the mozilla platform, and it is a source of huge problems for mobile. See https://bugzilla.mozilla.org/show_bug.cgi?id=696141#c7

The fact that we were unable to use SQLite correctly doesn't by itself make
it a bad choice for everything. The problems are:
- We decided to use SQLite where we should not have, that is, in cases
where there is no need to run complicated queries. See for example the
searchService, where a JSON file would have been more than enough, or
DOMStorage, where a simple hash database like LevelDB would have been
much better. There are more of these.
- Where we used SQLite, we mostly did it wrong. Whoever built Places
initially had no idea what a database is, and we are still fighting those
bad decisions. There are other examples in the codebase where we may use it
better, though. So surely we have issues that are not directly due to the
chosen datastore.
- SQLite has some issues with slop memory. We identified most of them in
bug 699708, and the SQLite team is evaluating solutions.
- Some default settings we use are particularly bad; see for example bug
692487, which reduces the cache size.

> How do we start evaluating alternatives for all of these databases? Are you driving that?

Bug 699820 has some connection to services using storage on the main
thread; these may be a good starting point for evaluating alternatives. For
sure, a lot of consumers don't need a database: any consumer that doesn't
need to query more than 1 field at a time, or that just has to read all
entries, is a bad database consumer.

-m
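
For a consumer of the kind described above, one that looks things up by a single key or reads everything at once, a JSON file can be enough. The following is a minimal sketch under that assumption; `JsonStore` is a hypothetical name, and nothing here reflects the search service's real data format. The write goes to a temporary file that is then renamed over the target, so a crash mid-write does not leave a truncated file behind.

```python
import json
import os
import tempfile


class JsonStore:
    """Whole-file key/value store: load everything, look up by one key."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)
        except FileNotFoundError:
            self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value

    def flush(self):
        # Write to a temp file in the same directory, then rename it into place.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
```

In a browser the read and the flush would both have to happen off the main thread; this sketch leaves that out and shows only the data-structure choice.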

Marco Bonardo

Nov 9, 2011, 7:05:19 AM
On 09/11/2011 04:56, Doug Turner wrote:
> OOC, what features would we have to drop if we moved away from SQL? I spent a few weeks in the code and didn't see anything directly tied to SQL that couldn't be replaced by a different data store, but clearly I didn't mess with the entire Places schema…

I don't think this discussion of "unimplementable features" brings
anything useful. I can implement all the features you want with a txt file,
but clearly they may have performance and functionality limits.
In my opinion Places can't move away from a database, unless you want to
fight worse performance issues or provide a really feature-limited
solution with good performance. And we already plan to drop some
features and data to come up with a smaller and more efficient datastore.
Surely we should be open to alternatives, but so far nobody has provided a
decent one.
Personally I think we should start converting all those Storage users
that don't need a SQLite db, and make the ones that need it as slick as
possible, and obviously async. We have 12 databases in the profile
folder; at first look I only see 2 or 3 that deserve that (that said, I
don't have deep knowledge of the needs of each single module).
-m
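
A first pass at "which of these databases deserve to be databases" could start from a simple survey of the profile folder. The sketch below is one hedged way to do that: `survey_databases` is a hypothetical helper, it assumes the databases use the `.sqlite` extension, and it should be run against a copy of a profile rather than one a running browser has open. `PRAGMA page_size`, `PRAGMA page_count`, and the `sqlite_master` table are standard SQLite features.

```python
import glob
import os
import sqlite3


def survey_databases(profile_dir):
    """Page size, size on disk, and table names for each *.sqlite file."""
    report = {}
    for path in sorted(glob.glob(os.path.join(profile_dir, "*.sqlite"))):
        conn = sqlite3.connect(path)
        try:
            page_size = conn.execute("PRAGMA page_size").fetchone()[0]
            page_count = conn.execute("PRAGMA page_count").fetchone()[0]
            tables = [row[0] for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name")]
        finally:
            conn.close()
        report[os.path.basename(path)] = {
            "page_size": page_size,
            "size_bytes": page_size * page_count,
            "tables": tables,
        }
    return report
```

A database with one or two tables that are only ever read whole is a candidate for a simpler store; the survey cannot tell how a table is queried, so it is only a starting point.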

Nicholas Nethercote

Nov 9, 2011, 7:23:52 AM
to Marco Bonardo, dev-pl...@lists.mozilla.org
On Wed, Nov 9, 2011 at 3:54 AM, Marco Bonardo <ma...@supereva.it> wrote:
>
> - SQLite has some issues with slop memory, we identified most of them in bug
> 699708, and SQLite team is evaluating solutions.

I have a patch from Richard Hipp that will hopefully fix this, I plan
to evaluate it tomorrow. CC yourself on the bug if you're interested
in hearing the outcome.

FWIW, I think the awesome bar is, well, awesome. No other browser
does nearly as good a job with suggestions. I'd be really sad to see
it dumbed down.

Nick

Dietrich Ayala

Nov 9, 2011, 10:38:20 AM
to Nicholas Nethercote, Marco Bonardo, dev-pl...@lists.mozilla.org
On Wed, Nov 9, 2011 at 4:23 AM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
> FWIW, I think the awesome bar is, well, awesome.  No other browser
> does nearly as good a job with suggestions.  I'd be really sad to see
> it dumbed down.

I haven't seen anyone suggest (<- teehee!) that we should dumb down
the awesomebar. What exactly are you referring to?

Mike Connor

Nov 9, 2011, 3:27:50 PM
to Dietrich Ayala, Marco Bonardo, dev-pl...@lists.mozilla.org, Nicholas Nethercote
Fennec Native is going to use the Android datastore for bookmarks and history. Since they have considerably less metadata around visits/types/etc, it will be dumbed down in comparison.

-- Mike

Philipp von Weitershausen

Nov 9, 2011, 3:43:10 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
I would like to weigh in on this some more. It seems that without
e10s, we're going to be the only major browser that remains single
threaded (and thus single core) for DOM and content script execution.
That means sloppily written websites will seemingly work fine without
locking up other browsers, whereas Firefox will continue to be janky.
This will look like Firefox's fault, not the website's fault. Judging
from the many "hey when is Firefox finally going to get process
separation?" questions I get from web developers, I think people are
already blaming Firefox instead of their own code.

This point doesn't have a lot of technical "meat", but I think it's
huge in terms of developer mindshare.

Nicholas Nethercote

Nov 9, 2011, 6:36:40 PM
to Dietrich Ayala, Marco Bonardo, dev-pl...@lists.mozilla.org
On Thu, Nov 10, 2011 at 2:38 AM, Dietrich Ayala <auto...@gmail.com> wrote:
>
> I haven't seen anyone suggest (<- teehee!) that we should dumb down
> the awesomebar. What exactly are you referring to?

The Fennec native front-end has already ditched SQLite and so doesn't
use frecency for its awesome bar suggestions. Apparently it uses
"something close" to frecency.

Nick

Mike Hommey

Nov 9, 2011, 6:41:57 PM
to Nicholas Nethercote, Marco Bonardo, dev-pl...@lists.mozilla.org, Dietrich Ayala
Also note that the original Fennec front-end wasn't doing the same thing
as desktop anyways, and was already not as awesome as desktop.

Mike

Gervase Markham

Nov 10, 2011, 6:07:59 AM
to Dietrich Ayala, Nicholas Nethercote, Marco Bonardo
On 09/11/11 15:38, Dietrich Ayala wrote:
> I haven't seen anyone suggest (<- teehee!) that we should dumb down
> the awesomebar. What exactly are you referring to?

The fear is that if we use the platform store, we will no longer be able
to store metadata which is necessary to make the awesomebar as awesome
as it is, and/or will not be able to run the queries we need to run. If
that's not true from a technical perspective, that would be a useful
thing for someone to say.

Gerv

Shawn Wilsher

Nov 13, 2011, 10:42:00 PM
to Doug Turner, dev-pl...@lists.mozilla.org
On 11/8/2011 7:56 PM, Doug Turner wrote:
> OOC, what features would we have to drop if we moved away from SQL? I spent a few weeks in the code and didn't see anything directly tied to SQL that couldn't be replaced by a different data store, but clearly I didn't mess with the entire Places schema…
I suppose I misspoke a bit. We could implement all the features of
places without SQL, but I'm fairly certain (~95%) that we'd have a hard
time making it as performant in a timely manner. We've sunk a lot of
engineering into places (and cookies) SQL performance.

I have yet to see any hard data saying "X would be better than SQL for
this component based on requirements Y", which makes me hesitant to jump
on the bandwagon yet.

Cheers,

Shawn

Shawn Wilsher

Nov 13, 2011, 11:06:41 PM
to Dietrich Ayala, Doug Turner, dev-pl...@lists.mozilla.org
On 11/8/2011 8:40 PM, Dietrich Ayala wrote:
> Yes, our non-performant usage of SQLite is most often the culprit of
> our storage-related problems. However, Brendan had thoughts on why
> SQLite's mutex dependencies were a poor design in another thread, so
> maybe he'll say more here on why using it is inherently bad.
The mutex issues he's brought up are more of a side-effect of how we use
SQLite. If we used the `sqlite3` objects on only one thread, we could
turn off all the expensive mutexes that kill us now. This was part of
my plan for `mozIAsyncStorageConnection` to encourage adoption of the
asynchronous API (it's faster!). Sadly, I've never had time to
implement that.

I should probably make a roadmap for Storage and hope that people with
more time than I can implement it...

Cheers,

Shawn
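
The single-thread ownership idea can be sketched as a connection that lives on one dedicated thread and takes requests through a queue. This is an illustration of the pattern only, not the `mozIAsyncStorageConnection` design: `AsyncConnection` is a hypothetical class, and Python's `sqlite3` module does not let the caller turn SQLite's mutexes off. In the C API the corresponding knob is the `SQLITE_OPEN_NOMUTEX` flag to `sqlite3_open_v2`, which is safe only when a connection is never used by two threads at once, which is what this structure guarantees.

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future


class AsyncConnection:
    """Owns one SQLite connection on one worker thread."""

    def __init__(self, path=":memory:"):
        self._requests = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(path,), daemon=True)
        self._thread.start()

    def _run(self, path):
        # The connection is created, used, and closed on this thread only.
        conn = sqlite3.connect(path)
        while True:
            item = self._requests.get()
            if item is None:            # shutdown sentinel
                break
            sql, params, future = item
            try:
                rows = conn.execute(sql, params).fetchall()
                conn.commit()
                future.set_result(rows)
            except Exception as exc:
                future.set_exception(exc)
        conn.close()

    def execute(self, sql, params=()):
        """Queue a statement; the returned Future resolves to its rows."""
        future = Future()
        self._requests.put((sql, params, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Callers never touch the connection directly, so the calling thread is not blocked on disk I/O unless it chooses to wait on the returned future.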

Nicholas Nethercote

Nov 14, 2011, 12:20:30 AM
to Marco Bonardo, Richard Hipp, dev-pl...@lists.mozilla.org
On Wed, Nov 9, 2011 at 4:23 AM, Nicholas Nethercote
<n.neth...@gmail.com> wrote:
>>
>> - SQLite has some issues with slop memory, we identified most of them in bug
>> 699708, and SQLite team is evaluating solutions.
>
> I have a patch from Richard Hipp that will hopefully fix this, I plan
> to evaluate it tomorrow.  CC yourself on the bug if you're interested
> in hearing the outcome.

Richard gave me another patch that causes SQLite to drop cache
allocations more eagerly. In my limited testing so far it gives
drastic improvements -- e.g. 2--3x reduction in SQLite memory
consumption as reported by about:memory. And it's possible that
people with old places DBs where the page size is 1KB or 4KB may see
even larger improvements.

Richard said this change causes a 5--10% performance drop in his
performance testing, but I don't know how to determine what
performance drop (if any) we'd see in Firefox.

If anyone wants to try it, it's a one line change, I've included it
below. Richard is keen to hear any feedback.

Nick



diff --git a/db/sqlite3/src/sqlite3.c b/db/sqlite3/src/sqlite3.c
--- a/db/sqlite3/src/sqlite3.c
+++ b/db/sqlite3/src/sqlite3.c
@@ -36068,17 +36068,17 @@ static void pcache1Unpin(sqlite3_pcache
   pcache1EnterMutex(pGroup);
 
   /* It is an error to call this function if the page is already
   ** part of the PGroup LRU list.
   */
   assert( pPage->pLruPrev==0 && pPage->pLruNext==0 );
   assert( pGroup->pLruHead!=pPage && pGroup->pLruTail!=pPage );
 
-  if( reuseUnlikely || pGroup->nCurrentPage>pGroup->nMaxPage ){
+  if( reuseUnlikely || pGroup->nCurrentPage>pGroup->nMaxPage || 1 ){
     pcache1RemoveFromHash(pPage);
     pcache1FreePage(pPage);
   }else{
     /* Add the page to the PGroup LRU list. */
     if( pGroup->pLruHead ){
       pGroup->pLruHead->pLruPrev = pPage;
       pPage->pLruNext = pGroup->pLruHead;
       pGroup->pLruHead = pPage;
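
The effect of the `|| 1` can be seen in a toy model: with it, every unpinned page is freed immediately instead of being kept on the LRU list for reuse, so memory drops and every later access to that page is a miss. The Python below is a deliberately simplified stand-in for that branch, not a port of SQLite's pcache1; `ToyPageCache` and `replay` are hypothetical names, and the model ignores recycling and mutexes.

```python
class ToyPageCache:
    def __init__(self, max_pages, always_free=False):
        self.max_pages = max_pages
        self.always_free = always_free   # models the "|| 1" in the patch
        self.unpinned = []               # pages kept after unpin, newest first
        self.allocated = 0               # pages currently held in memory
        self.hits = 0
        self.misses = 0

    def fetch(self, pgno):
        if pgno in self.unpinned:
            self.unpinned.remove(pgno)   # reuse a cached page
            self.hits += 1
        else:
            self.allocated += 1          # allocate and read from disk
            self.misses += 1

    def unpin(self, pgno, reuse_unlikely=False):
        if (reuse_unlikely or self.allocated > self.max_pages
                or self.always_free):
            self.allocated -= 1          # free the page right away
        else:
            self.unpinned.insert(0, pgno)


def replay(cache, accesses):
    """Fetch and immediately unpin each page number in turn."""
    for pgno in accesses:
        cache.fetch(pgno)
        cache.unpin(pgno)
    return cache
```

Replaying the access pattern 1, 2, 3, 1, 2, 3 through both variants shows the trade-off in miniature: the stock policy ends with three pages resident and three hits, the patched one with no pages resident and no hits. How that maps onto real Firefox workloads is the open question raised in the message above.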

Shawn Wilsher

Nov 14, 2011, 12:32:47 AM
to dev-pl...@lists.mozilla.org, m...@mozilla.com
On 11/13/2011 9:20 PM, Nicholas Nethercote wrote:
> Richard said this change causes a 5--10% performance drop in his
> performance testing, but I don't know how to determine what
> performance drop (if any) we'd see in Firefox.
We don't (that I'm aware of, at least) have any good tests for this. tp
has caught stuff in the past for us, but that's really a poor metric for
evaluating our performance needs of SQLite.

Marco might have some benchmarks for the location bar.

Cheers,

Shawn

David Mandelin

Nov 14, 2011, 6:56:24 PM
On 11/13/2011 8:06 PM, Shawn Wilsher wrote:
> I suppose I misspoke a bit. We could implement all the features of places
> without SQL, but I'm fairly certain (~95%) that we'd have a hard time making
> it as performant in a timely manner.

The hypothesis (not verified by me) is that the current version is not
performant. I think we're aiming for something quite a bit better. I
don't know the details to characterize 'better' precisely. Maybe someone
else does.

> We've sunk a lot of engineering into places (and cookies) SQL performance.

I don't think that's relevant in itself to the decision.

> I have yet to see any hard data saying "X would be better than SQL for this
> component based on requirements Y", which makes me hesitant to jump on the
> bandwagon yet.

I haven't seen any data at all. But, one question that you might know
the answer to is, can we make it good enough using sqlite?
That sounds very helpful. I think better yet, a set of options and where
each one lands us in terms of final performance.

Dave

David Rajchenbach-Teller

Nov 15, 2011, 3:53:40 AM
to Shawn Wilsher, Doug Turner, dev-pl...@lists.mozilla.org, Dietrich Ayala
On 11/14/11 5:06 AM, Shawn Wilsher wrote:
> On 11/8/2011 8:40 PM, Dietrich Ayala wrote:
>> Yes, our non-performant usage of SQLite is most often the culprit
>> of our storage-related problems. However, Brendan had thoughts on
>> why SQLite's mutex dependencies were a poor design in another
>> thread, so maybe he'll say more here on why using it is
>> inherently bad.
> The mutex issues he's brought up are more of a side-effect of how
> we use SQLite. If we used the `sqlite3` objects on only one
> thread, we could turn off all the expensive mutexes that kill us
> now. This was part of my plan for `mozIAsyncStorageConnection` to
> encourage adoption of the asynchronous API (it's faster!). Sadly,
> I've never had time to implement that.
>
> I should probably make a roadmap for Storage and hope that people
> with more time than I can implement it...

That would be great.

Cheers,
David

Marco Bonardo

Nov 15, 2011, 5:51:21 AM
On 15/11/2011 00:56, David Mandelin wrote:
> I haven't seen any data at all. But, one question that you might know
> the answer to is, can we make it good enough using sqlite?

Yes, definitely, the existing issues are well-known and with available
solutions, there is nothing that could be solved only by changing the
datastore. I think instead there are lots of things that may go wrong
doing that.
That's more a matter for small consumers like the search service, where
another datastore may indeed remove unneeded overhead without bad
consequences.
The current situation is not as bad as pointed out, there are clearly
architectural issues to be solved and those are being worked on, it
takes a while due to the size of the module, the backwards compatibility
issues and the low resources available.

Btw, today I'll file a bug about a pure async connection, I was already
thinking of it and Shawn confirmed it was among his ideas, so it makes sense.

PS: numbers are important, we can't continue discussing what's
performant and what's not without numbers. On mobile I see us moving to
the system history API, but I see no evidence that it is faster, nor that
it avoids main-thread IO. We'll do it regardless, since there are other
reasons behind that, but it's dangerous to make decisions based on
hypotheses.

Cheers,
Marco

Shawn Wilsher

Nov 16, 2011, 1:16:00 AM
to dev-pl...@lists.mozilla.org
On 11/14/2011 3:56 PM, David Mandelin wrote:
> The hypothesis (not verified by me) is that the current version is not
> performant. I think we're aiming for something quite a bit better. I
> don't know the details to characterize 'better' precisely. Maybe someone
> else does.
I would love to know what those are. I haven't seen anything other than
people saying it's not performant myself.

> I don't think that's relevant in itself to the decision.
Not in itself, no. My point there was that we shouldn't assume that
something else is going to be remarkably better. We need data before we
should be willing to throw away a bunch of work.

> I haven't seen any data at all. But, one question that you might know
> the answer to is, can we make it good enough using sqlite?
It really depends on what "it" is and how much engineering resources we
want to throw at the problem. Making places use asynchronous I/O has
been, at best, a two man show for the past two or three years. Those
two people weren't even working on it full time; they had other
responsibilities.

> That sounds very helpful. I think better yet, a set of options and where
> each one lands us in terms of final performance.
I'm not sure we have many options, to be honest. Our biggest problem
right now is doing disk I/O on the GUI thread. Fixing that will go a
long ways to improving things (IMO).

Cheers,

Shawn
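
A minimal sketch of the pattern Shawn describes above (one thread owns the connection, callers never block on disk), written in Python purely for illustration. `AsyncStorage`, the `visits` table, and the example URL are hypothetical names, not Storage or Places APIs. Python's `sqlite3` module does not expose SQLite's threading-mode switches, so this shows only the ownership pattern, not the mutex savings.

```python
import sqlite3
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncStorage:
    """Hypothetical wrapper: a single worker thread owns the only connection."""

    def __init__(self, path: str) -> None:
        # One worker means every statement runs on the same thread, in order.
        self._worker = ThreadPoolExecutor(max_workers=1)
        # Opening is done on the worker too; this sketch waits for it once.
        self._conn = self._worker.submit(sqlite3.connect, path).result()

    def execute(self, sql: str, params: tuple = ()) -> Future:
        # Returns immediately; the calling thread never touches the disk.
        return self._worker.submit(self._run, sql, params)

    def _run(self, sql: str, params: tuple) -> list:
        with self._conn:  # commit on success, roll back on error
            return self._conn.execute(sql, params).fetchall()

    def close(self) -> None:
        # Queued behind all pending statements, so nothing is dropped.
        self._worker.submit(self._conn.close).result()
        self._worker.shutdown()


storage = AsyncStorage(":memory:")
storage.execute("CREATE TABLE visits (url TEXT, visit_count INTEGER)")
storage.execute("INSERT INTO visits VALUES (?, ?)", ("https://example.org/", 3))

results = []
pending = storage.execute("SELECT url, visit_count FROM visits")
# A UI thread would register a callback instead of waiting on the result.
pending.add_done_callback(lambda f: results.append(f.result()))
storage.close()
```

Because the worker drains its queue in order, the caller gets ordering guarantees without holding any lock of its own.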

Taras Glek

Nov 16, 2011, 1:20:39 PM
to Shawn Wilsher
On 11/15/2011 10:16 PM, Shawn Wilsher wrote:
> On 11/14/2011 3:56 PM, David Mandelin wrote:
>> The hypothesis (not verified by me) is that the current version is not
>> performant. I think we're aiming for something quite a bit better. I
>> don't know the details to characterize 'better' precisely. Maybe someone
>> else does.
> I would love to know what those are. I haven't seen anything other than
> people saying it's not performant myself.


I disagree with mak that we have nothing that would improve with a
better backend, but I don't disagree that we should fix the frontend to
the db too.


https://bugzilla.mozilla.org/show_bug.cgi?id=699051 will give us hard
data on exactly what sucks.

In the meantime we track async query completion times via
https://bugzilla.mozilla.org/show_bug.cgi?id=693667 We have queries that
take 0.5, 3, 18 seconds. I will post this data once metrics team makes
that easier.

Experience suggests that only extremely inefficient disk backends (such
as a general purpose database, poorly designed disk cache, etc) can lead
to such horrific query times on a fairly small dataset.

I have never ever seen a high performance project that relies solely on
SQL for queries. Most of the time people deploy some sort of caching
in front of the SQL and eventually that cache becomes the primary
datastore. Full ACID is overkill for application-data storage, hence
hype over nosql.

>
>> I don't think that's relevant in itself to the decision.
> Not in itself, no. My point there was that we shouldn't assume that
> something else is going to be remarkably better. We need data before we
> should be willing to throw away a bunch of work.
>
>> I haven't seen any data at all. But, one question that you might know
>> the answer to is, can we make it good enough using sqlite?
> It really depends on what "it" is and how much engineering resources we
> want to throw at the problem. Making places use asynchronous I/O has
> been, at best, a two man show for the past two or three years. Those two
> people weren't even working on it full time; they had other
> responsibilities.
>
>> That sounds very helpful. I think better yet, a set of options and where
>> each one lands us in terms of final performance.
> I'm not sure we have many options, to be honest. Our biggest problem
> right now is doing disk I/O on the GUI thread. Fixing that will go a
> long ways to improving things (IMO).

There are 2 fundamental problems here and they make each other worse:
1) blocking on io
2) somewhat unbounded amount of io that is caused by our backend choices
and interactions between different IO users.

Taras

Marco Bonardo

Nov 16, 2011, 2:04:12 PM
On 16/11/2011 19:20, Taras Glek wrote:
> I disagree with mak that we have nothing that would improve with a
> better backend, but I don't disagree that we should fix the frontend to
> the db too.

I never said that, I said it depends on the use case, we should change
the backend (meaning the datastore) wherever it makes sense. Also
"better backend" is so generic that I can't argue on it, depending on
the direction you want to take, you can always find something better on
filesize, performance, IO impact, memory impact, data safety, privacy...
What I said is that we can make much better use of the current
backend, which is far from saying another backend can't do better.
If you understood the opposite, please forgive my sucky English capabilities.

> Experience suggests that only extremely inefficient disk backends (such
> as a general purpose database, poorly designed disk cache, etc) can lead
> to such horrific query times on a fairly small dataset.

You may have the best datastore in the world, but if you use it wrongly
it will perform worse than anything. I'm sure I can make levelDB perform
worse than SQLite by using it wrongly, that doesn't mean it sucks, right?
If you want to argue on the fact we use SQLite to store less than 10
search engines definition that we don't even need to filter, I'm all for
killing those crazy usages, and we (Yoric exactly) are already working
on that.

> Full ACID is overkill for application-data storage, hence
> hype over nosql.

I think that is absolutely acknowledged; many of our consumers will be
fine with levelDB and some even with JSON (the search service case is
the king).

I honestly don't see any disagreement.

Cheers,
Marco

David Rajchenbach-Teller

Nov 17, 2011, 5:15:57 AM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

To my ears, this discussion sounds a lot like a debate on
continuations/fibers/coroutines vs. multi-threading/multi-process. No
magic bullet here, but this is a problem that many projects have
encountered already.

Imho, we will probably realize eventually that we need both – or a
variation thereof – for different reasons.

Fibers & co. – which we have in Rhino, by the way – provide nice
mechanisms for writing synchronous code that can work nicely in an
asynchronous environment, in particular as one piece of code can wait
for the reply of another thread/process/server/... without blocking
everything. This is good for in-process add-ons, for porting existing
code without major interface breakage, etc. They are also good for
performance when communications between tasks are the highest cost.

System-level concurrency – multiprocess-style – is good as it permits
leveraging additional CPUs, enforces cleaner separation of resources,
and forces one to write async code. This is good for performance when
little is shared between tasks, communications are relatively rare and
small and/or the main constraint is the amount of CPU.

Anyway, should we try and go in the direction of fibers, this is a
path that has been explored and for which we should be able to find
existing experience.

Cheers,
David
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)

iQEcBAEBAgAGBQJOxN7dAAoJED+FkPgNe9W+7S4H/13OQ/8L6QJGQTCOPb/Bryn7
1dRPStVxAX9uXCzEHIDJ4Vbw3WedpBoEN/itupmqqYwSqcDfyECNCMKo2/81U3ep
fW5FAnKMqPSIgWa1tM+WA39qE149koPYK+KQOnsr7T11lEmdtgG2h6xdOjVg699j
8RWzZs7aSMqCmJy224+82L3ekPPD2VDy89soxou+MjqbWKmAtRwiUCRULO2qjAlz
hQXi/BkxIZ+R/PkIoWS+IU1yXY50pU0vQ01UrN2FYzbTNflKyxiAX9T5Da8EjG/f
B4iCbZlDwIRBvsowRAy8k2gD26IPchdoUMzc7IMIyPk5DJ1NAnl22YotYjIx7Sk=
=DlWK
-----END PGP SIGNATURE-----
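
To make David's point about synchronous-looking code concrete, here is a small sketch using Python's `asyncio` coroutines. These are a stackless relative of fibers, not the Rhino continuations he mentions. `slow_lookup`, `handler`, and `ticker` are hypothetical names; `slow_lookup` stands in for any blocking call to another thread, process, or server.

```python
import asyncio
import time


def slow_lookup(key: str) -> str:
    # Stand-in for a blocking call (disk, another process, a server).
    time.sleep(0.05)
    return key.upper()


async def handler(key: str) -> str:
    loop = asyncio.get_running_loop()
    # Reads like a plain call, but only this coroutine is suspended while
    # the blocking work runs on a thread from the default executor.
    value = await loop.run_in_executor(None, slow_lookup, key)
    return f"result: {value}"


async def main() -> list:
    ticks = []

    async def ticker() -> None:
        # Keeps making progress while handler() is waiting.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(handler("places"), ticker())
    return [result, ticks]


outcome = asyncio.run(main())
```

The lookup still costs its full latency; what changes is that the rest of the loop keeps running in the meantime, which is the property that matters for responsiveness.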

Robert Kaiser

Nov 17, 2011, 11:43:07 AM
Taras Glek schrieb:
> I have never ever seen a high performance project that relies solely on
> SQL for queries. Most of the time people deploy some sort of caching
> in front of the SQL and eventually that cache becomes the primary
> datastore. Full ACID is overkill for application-data storage, hence
> hype over nosql.

Well, mork history and HTML+RDF-backed bookmarks were pretty fast in
general and I've heard users complain about perf in SeaMonkey when we
switched to places (which we did later than Firefox). Of course, those
backends were a lot less powerful and mork is an awful on-disk storage
format, but what we did there is to read _all_ data into memory and
strictly use it from there, which of course makes things way more
performant - at the cost of swallowing huge amounts of memory.
The question in the end is where the best tradeoff lies.

Robert Kaiser

--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)
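
One way to picture the tradeoff Robert raises is a bounded read-through cache: hot entries are served from memory, everything else still goes to SQLite, and the memory cost is capped. This is only an illustrative sketch; `CachedTitles`, the `bookmarks` table, and the capacity are hypothetical and not how Places caches anything.

```python
import sqlite3
from collections import OrderedDict


class CachedTitles:
    """Hypothetical LRU read-through cache in front of a SQLite table."""

    def __init__(self, conn: sqlite3.Connection, capacity: int = 2) -> None:
        self._conn = conn
        self._capacity = capacity
        self._cache = OrderedDict()  # url -> title, oldest entry first
        self.misses = 0

    def title_for(self, url: str):
        if url in self._cache:
            self._cache.move_to_end(url)  # mark as most recently used
            return self._cache[url]
        # Cache miss: fall through to the database.
        self.misses += 1
        row = self._conn.execute(
            "SELECT title FROM bookmarks WHERE url = ?", (url,)
        ).fetchone()
        title = row[0] if row else None
        self._cache[url] = title
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return title


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (url TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?)",
    [("a", "Alpha"), ("b", "Beta"), ("c", "Gamma")],
)
titles = CachedTitles(conn, capacity=2)
```

Setting the capacity to the size of the whole table gives the "read everything into memory" end of the spectrum; setting it to zero gives the "always hit the database" end.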

Marco Bonardo

Nov 17, 2011, 11:55:35 AM
On 17/11/2011 17:43, Robert Kaiser wrote:
> Well, mork history and HTML+RDF-backed bookmarks were pretty fast in
> general and I've heard users complain about perf in SeaMonkey when we
> switched to places (which we did later than Firefox).

It was also before a ton of perf improvements in Places, iirc when
SeaMonkey switched to Places we were still using temp table partitioning
to reduce fsyncs, plus we didn't have async history and favicons. Dark
and ugly times :(

> format, but what we did there is to read _all_ data into memory and

That in some part is also what Chrome does with bookmarks. Honestly I
don't think it's worth it, we have lots of better ways to multiply
bookmarks speed than keeping everything in memory. I added some caching
lately to improve that, while the refactoring proceeds.

-m

Justin Lebar

Nov 17, 2011, 12:41:42 PM
to Taras Glek, dev-pl...@lists.mozilla.org
Taras wrote:
> In the meantime we track async query completion times via https://bugzilla.mozilla.org/show_bug.cgi?id=693667
> We have queries that take 0.5, 3, 18 seconds. I will post this data once metrics team makes that easier.
>
> Experience suggests that only extremely inefficient disk backends (such as a general purpose database,
> poorly designed disk cache, etc) can lead to such horrific query times on a fairly small dataset.

It's likely SQLite is using the wrong query plan, or is otherwise
mis-tuned. SQLite is capable of being smart, but also capable of
being dumb if we don't feed it the right data. This is what [1] is
about (and why I've been arguing so loudly for the need for regression
tests, here and in that bug).

Maybe it's a problem that it's hard to keep SQLite's statistics fresh
enough that the query optimizer doesn't do stupid things. As I've
said, I think it's fair to criticize SQLite for fragility in the face
of so many knobs to tune. But that is not an indictment of the speed
of general-purpose databases.

Taras wrote:
> I have never ever seen a high performance project that relies solely on SQL for queries.
> Most of the time people deploy some sort of caching in front of the SQL and eventually
> that cache becomes the primary datastore. Full ACID is overkill for application-data
> storage, hence hype over nosql.

Firefox's usage requirements are entirely different from those of the
projects which are making a fuss about nosql.

People use caching layers and/or NoSQL to scale databases to hundreds
of thousands of users. As you point out, some sites (e.g. Facebook
and Second Life, last time I checked) put a caching layer in front of
a relational database. They still have to go to the database on a
cache miss, so the database *still has to be fast*.

It's very hard to scale an ACID database horizontally (i.e. over many
machines). It's much easier to scale nosql, because it doesn't have
the same consistency requirements. Horizontal scaling also lets you
keep a lot more data in RAM (because you have more machines, so
therefore more RAM); this is where much of the NoSQL latency
improvement comes from.

But that experience is completely irrelevant to Firefox. The problem
isn't scaling our storage engine to thousands of queries a second, but
rather ensuring that it has consistently fast response times under
minimal load.


I don't mind beating up on SQLite -- it's bitten us more than once.
But the idea here and elsewhere in this thread that a general-purpose
relational database cannot possibly be fast enough for our needs is,
imo, without merit.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=683876
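
The kind of regression test Justin argues for can be sketched with SQLite's `EXPLAIN QUERY PLAN`: run `ANALYZE` so the planner has statistics, then assert that a hot query still uses the index you expect. The `places` table and `idx_places_url` index below are simplified, hypothetical stand-ins, not the real Places schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE places (id INTEGER PRIMARY KEY, url TEXT, frecency INTEGER)"
)
conn.executemany(
    "INSERT INTO places (url, frecency) VALUES (?, ?)",
    [(f"https://example.org/{i}", i % 100) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_places_url ON places (url)")
conn.execute("ANALYZE")  # refresh the statistics the query planner relies on


def plan_for(sql: str, params: tuple = ()) -> str:
    """Return the planner's chosen strategy as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return "\n".join(row[-1] for row in rows)


plan = plan_for(
    "SELECT id FROM places WHERE url = ?", ("https://example.org/42",)
)
# Fails loudly if a schema or statistics change makes SQLite stop using
# the index for this lookup.
assert "idx_places_url" in plan, plan
```

The exact wording of the plan text varies between SQLite versions, so the check looks only for the index name rather than matching the whole string.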

Robert Kaiser

Nov 18, 2011, 10:49:19 AM
Marco Bonardo schrieb:
> On 17/11/2011 17:43, Robert Kaiser wrote:
>> Well, mork history and HTML+RDF-backed bookmarks were pretty fast in
>> general and I've heard users complain about perf in SeaMonkey when we
>> switched to places (which we did later than Firefox).
>
> It was also before a ton of perf improvements in Places, iirc when
> SeaMonkey switched to Places we were still using temp table partitioning
> to reduce fsyncs, plus we didn't have async history and favicons. Dark
> and ugly times :(

True for sure - I of course didn't imply that we should go to using
mork/HTML/RDF there. ;-)
Still, I think it's good to mention that those now-much-despised
backends were pretty fast (once they had loaded the data) and how they
achieved that. We of course had crappy favicons support (we used the
browser cache which randomly expired them) and much less history in
addition to no frecency-based awesomebar (but judging from mobile world,
some of us are ready to throw that away again now).

>> format, but what we did there is to read _all_ data into memory and
>
> That in some part is also what Chrome does with bookmarks. Honestly I
> don't think it's worth it, we have lots of better ways to multiply
> bookmarks speed than keeping everything in memory. I added some caching
> lately to improve that, while the refactoring proceeds.

I'm sure that loading everything into memory is not a good idea (and
MemShrink people would be pretty unhappy about it). Good to hear we have
known avenues to make things better there in other ways.