Startup Timeline - Can you help?

4 views
Skip to first unread message

Johnathan Nightingale

unread,
May 6, 2010, 4:11:41 PM5/6/10
to dev-pl...@lists.mozilla.org
So, as part of a Q2 engineering goal, we instrumented the piss out of
startup using mozilla-central's new function timer [1] support. The
probes haven't all landed on central, but they're in a user repo[2],
as tracked in bug 560647 [3].

Our results point to some obvious candidates for improvement:

https://wiki.mozilla.org/User:Johnath/startup

Many of these will be familiar to people who have been looking at
startup for some time, but my hope is that breaking it down this way
might get them enough attention to get them taken care of.


Want in? Great! Pick off a piece and be a hero! Let me know how I can
help!


THE CULPRITS:

* Reflows (10-40%) - We doReflow on browser.xul 21 times after the
initialReflow. We doReflow 36 times all told, and if you naively add
up all the time consumed by functions with "reflow" in their name
(FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
totalling 325ms (40%).

How do we reduce that? Delay reflow? Implement the browser in html to
take advantage of better performance optimization? Build the browser
DOM out of document and add it all at once? display:none the UI until
we're done screwing with it?

* delayedStartup (8%) - delayedStartup itself takes 65ms (8%), and
that likely under-credits things, since it's likely that
delayedStartup causes various delayed activities to occur (reflows,
new setTimeouts being queued, &c). Who feels like carving some of that
up?

* XRE_main blocks (14%)
• pre-XPCOM startup - 17ms (2.1%) - This one can be broken down with
more probes
• nsXREDirProvider::DoStartup - 14ms (1.8%) - Half of this is
nsExtensionManager.js
• startupNotifier - 22ms (2.8%) - At least 13ms of this is
nsSessionStartup.js
• post-final-ui-startup churn in XRE_main - 60ms (7.5%) - After final-
ui-startup and before Run we lose a good deal of time.

* Petty Criminals (5-6%) - There are several pieces which take just
long enough to be annoying:
• nsContentDLF::CreateXULDocument - 12ms (1.6%)
• GetCMSOutputProfile - 10ms (1.3%)
• Search service init - 9ms (1.1%)
• Parsing gre-resources/ua.css - 10ms (1.3%)
• Parsing skin/browser.css - 7ms (0.9%)
• urlclassifier triggering psm init - 6-10ms (0.8-1.2%) In other
cases, we've seen this block take 10x this long - plausibly having
something to do with the available entropy through the OS. Worth
cleaning up regardless.
• Our default Google-hosted homepage triggers more script execution
and reflows than necessary. It probably consumes at least a 63ms
(8.2%) block of execution, including about 10ms (1.3%) of script
execution and 47ms (6.1%) of reflowing.
• hiddenWindow.html. The hit from the hidden window varies by
platform, but something so empty should really not have much hit at
all. We should look at what we want this to accomplish, and make sure
it's doing it as minimally as possible.

Cheers,

Johnathan

[1] https://wiki.mozilla.org/Platform/FunctionTimer_Usage
[2] http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=560647
---
Johnathan Nightingale
Director of Firefox Development
joh...@mozilla.com


Jeff Muizelaar

unread,
May 6, 2010, 4:19:53 PM5/6/10
to Johnathan Nightingale, dev-pl...@lists.mozilla.org

On 6-May-10, at 4:11 PM, Johnathan Nightingale wrote:
> • GetCMSOutputProfile - 10ms (1.3%)

Bug 548612 should cut this down by at least 4x

-Jeff

Axel Hecht

unread,
May 6, 2010, 5:26:12 PM5/6/10
to
On 06.05.10 22:11, Johnathan Nightingale wrote:
> So, as part of a Q2 engineering goal, we instrumented the piss out of
> startup using mozilla-central's new function timer [1] support. The
> probes haven't all landed on central, but they're in a user repo[2], as
> tracked in bug 560647 [3].
>
> Our results point to some obvious candidates for improvement:
>
> https://wiki.mozilla.org/User:Johnath/startup
>
> Many of these will be familiar to people who have been looking at
> startup for some time, but my hope is that breaking it down this way
> might get them enough attention to get them taken care of.
>
>
> Want in? Great! Pick off a piece and be a hero! Let me know how I can help!
>
>
> THE CULPRITS:
>
> * Reflows (10-40%) - We doReflow on browser.xul 21 times after the
> initialReflow. We doReflow 36 times all told, and if you naively add up
> all the time consumed by functions with "reflow" in their name
> (FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
> totalling 325ms (40%).
>
> How do we reduce that? Delay reflow? Implement the browser in html to
> take advantage of better performance optimization? Build the browser DOM
> out of document and add it all at once? display:none the UI until we're
> done screwing with it?

This one surprises me, I was leaving meatings (on l20n land) with the
impression that we don't reflow xul. I'd love to get this one a bit more
clarified, maybe I'm confusing incremental layout with reflows. Also,
we'll be interested to learn which feature of xul this is. Like, if l20n
changes how we're doing content creation for xul documents, how much
does whatever we learn here transports?

Axel

L. David Baron

unread,
May 6, 2010, 5:54:53 PM5/6/10
to Johnathan Nightingale, dev-pl...@lists.mozilla.org
On Thursday 2010-05-06 16:11 -0400, Johnathan Nightingale wrote:
> Our results point to some obvious candidates for improvement:
>
> https://wiki.mozilla.org/User:Johnath/startup

> * Reflows (10-40%) - We doReflow on browser.xul 21 times after the


> initialReflow. We doReflow 36 times all told, and if you naively add
> up all the time consumed by functions with "reflow" in their name
> (FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
> totalling 325ms (40%).
>
> How do we reduce that? Delay reflow? Implement the browser in html
> to take advantage of better performance optimization? Build the
> browser DOM out of document and add it all at once? display:none the
> UI until we're done screwing with it?

For XUL documents we already delay reflow much more than for HTML; I
think we wait until onload.

A few questions: the 'Our Definition of "Startup"' section suggests
that startup doesn't end until you're able to type "google.com" in
the URL bar and it hits the network. If that's all included, then
I'd expect to see one (very small) reflow for each character typed
in the URL bar; possibly more if the awesomebar starts to display
things.

I'm not that worried about the large number of reflows on
browser.xul given the data that only the first is taking a
significant amount of time. I'd be interested in the time
distribution, though.


Where would I find what you're counting as reflow, exactly? My
memory is that the function called "InitialReflow" actually does
much more than reflow, especially for XUL documents (where we're
delaying it until the content tree is fully constructed).
Specifically, are you counting the frame construction and style
resolution that happens inside InitialReflow as reflow?

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

L. David Baron

unread,
May 6, 2010, 6:15:18 PM5/6/10
to Johnathan Nightingale, dev-pl...@lists.mozilla.org
One other thought about this: I don't think gathering numbers like
"X% of startup is taken up by Y" is one of the strong areas of
timeline analysis, particularly in our codebase. (It has other very
strong areas, like understanding the effects of network and I/O
latency.)

One of the things we do in our codebase is that there are many
things we initialize lazily. This has both benefits (we get to
completely skip some things that aren't needed) and costs (perhaps
increased cache churn). Some of these are rather coarse (initialize
all of X the first time it's requested) and some are very
fine-grained (such as style data computation).


An approach I've taken to gathering "X%" analyses in the past was to
build an ordered list of functions such that all time inside the
first would be attributed to it, all remaining time inside the
second would be attributed to it, etc. This allowed listing the
leaves in such lazy initialization patterns before the things that
call them. (tools/jprof/split-profile.pl is a tool to do this on a
jprof profile;
https://bug144533.bugzilla.mozilla.org/attachment.cgi?id=88207
contains an example (very out-of-date) set of splitting functions.)

This approach can be done on top of timeline instrumentation for
things that are coarsely lazily initialized. However, it doesn't
work for things that are finely lazily initialized at a frequency
approaching the resolution of the timing used for the timeline. For
that, doing the analysis on top of the data from a sampling profiler
(shark, jprof, etc.) is a better approach. Doing it on top of data
from a sampling profiler also has the advantage that it allows you
to check your work by looking at the profile of each piece to figure
out if there's stuff in it that's misattributed.

(But then there's the question of how accurate sampling profilers
are, especially on the workload that happens during startup.)

Ehsan Akhgari

unread,
May 6, 2010, 6:22:03 PM5/6/10
to L. David Baron, dev-pl...@lists.mozilla.org, Johnathan Nightingale
On Thu, May 6, 2010 at 5:54 PM, L. David Baron <dba...@dbaron.org> wrote:
> On Thursday 2010-05-06 16:11 -0400, Johnathan Nightingale wrote:
>> Our results point to some obvious candidates for improvement:
>>
>>     https://wiki.mozilla.org/User:Johnath/startup
>
>> * Reflows (10-40%)  - We doReflow on browser.xul 21 times after the
>> initialReflow. We doReflow 36 times all told, and if you naively add
>> up all the time consumed by functions with "reflow" in their name
>> (FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
>> totalling 325ms (40%).
>>
>> How do we reduce that? Delay reflow? Implement the browser in html
>> to take advantage of better performance optimization? Build the
>> browser DOM out of document and add it all at once? display:none the
>> UI until we're done screwing with it?
>
> For XUL documents we already delay reflow much more than for HTML; I
> think we wait until onload.
>
> A few questions:  the 'Our Definition of "Startup"' section suggests
> that startup doesn't end until you're able to type "google.com" in
> the URL bar and it hits the network.  If that's all included, then
> I'd expect to see one (very small) reflow for each character typed
> in the URL bar; possibly more if the awesomebar starts to display
> things.

Yes, that is all included, and we see the reflows inside the keypress
event handlers, as expected.

> I'm not that worried about the large number of reflows on
> browser.xul given the data that only the first is taking a
> significant amount of time.  I'd be interested in the time
> distribution, though.

Well, you can see the actual log(s) in bug 560647. The latest
annotated version of the log is:

https://bugzilla.mozilla.org/attachment.cgi?id=443703

> Where would I find what you're counting as reflow, exactly?  My
> memory is that the function called "InitialReflow" actually does
> much more than reflow, especially for XUL documents (where we're
> delaying it until the content tree is fully constructed).
> Specifically, are you counting the frame construction and style
> resolution that happens inside InitialReflow as reflow?

Here is a rolled-up patch from the probes in my user repo on top of
the mozilla-central tree:

https://bugzilla.mozilla.org/attachment.cgi?id=443809

Specifically about reflows, I stuck to PresShell methods for now.
But we can add more probes and remove the old ones if needed. In
fact, feel free to do so and push your changes to that user repo.

--
Ehsan
<http://ehsanakhgari.org/>

Ehsan Akhgari

unread,
May 6, 2010, 6:29:13 PM5/6/10
to L. David Baron, dev-pl...@lists.mozilla.org, Johnathan Nightingale
One thing to note here is that we were trying to keep our timeline as
coarse-grained as possible. Also, we needed to associate pieces of
code with conceptual things that happen in our code base (such as
parsing ua.css). I don't think that the current timeline based
approach is suitable to "profiling" different pieces of the code. We
can use real profiling tools for that. The major goal here was to
identify a set of actionable areas for us to start investigating.


--
Ehsan
<http://ehsanakhgari.org/>

> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>

Boris Zbarsky

unread,
May 7, 2010, 12:07:36 AM5/7/10
to
On 5/6/10 5:26 PM, Axel Hecht wrote:
> This one surprises me, I was leaving meatings (on l20n land) with the
> impression that we don't reflow xul.

That would be a wrong impression.

> I'd love to get this one a bit more clarified, maybe I'm confusing incremental layout with reflows.

In Gecko-speak, "reflow" is any layout whatsoever.

We perform layout of XUL, trust me.

We also perform dynamic updates to XUL layout.

We do not perform XUL layout during parsing of a XUL document; we delay
it until the document is fully loaded. However nothing stops scripts
from modifying the document after that point; see "dynamic updates" above.

> Also, we'll be interested to learn which feature of xul this is.

Which "this"?

-Boris

Boris Zbarsky

unread,
May 7, 2010, 12:17:03 AM5/7/10
to
On 5/6/10 4:11 PM, Johnathan Nightingale wrote:
> * Reflows (10-40%) - We doReflow on browser.xul 21 times after the
> initialReflow. We doReflow 36 times all told, and if you naively add up
> all the time consumed by functions with "reflow" in their name
> (FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
> totalling 325ms (40%).

I'm trying to reconcile this with the other numbers from the wiki.
147ms for initial reflow, 81ms for later reflows.... where does 325ms
come from? What are the actual breakdowns by "function with 'reflow' in
the name"?

> How do we reduce that?

That depends.

> Delay reflow?

We already do as much as possible.

> Implement the browser in html to take advantage of better performance optimization?

The parts of HTML that give you flexbox-like behavior are not that
optimized.

> Build the browser DOM out of document and add it all at once?

We already do that, more or less.

> display:none the UI until we're done screwing with it?

Not an option; see below.

The 147ms for initial reflow, if you just timed how long InitialReflow
takes to run includes all of the frame tree construction, attachment of
all the XBL, execution of all the XBL constructors, and actual layout.
It would be interesting to set probes in InitialReflow to measure these
quite separate steps. If my past profiles are any guide, the "attach
all the XBL and run the XBL constructors" part is a large fraction of
the total.

Note that XBL1 is not attached in display:none subtrees.

Note that XBL1 is not exactly highly optimized on the C++ end.

Note that our XBL bindings are not optimized at all as far as I can see.

If I'm right and XBL is a large part of the cost here, then one obvious
way to reduce this time is to have fewer and simpler XBL bindings
attached to stuff in the UI.

> � nsContentDLF::CreateXULDocument - 12ms (1.6%)

I wonder why that takes so long. We shouldn't be creating more than a
dozen XUL documents or so during startup, I'd think. Bug filed?

-Boris

L. David Baron

unread,
May 7, 2010, 12:41:04 AM5/7/10
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Friday 2010-05-07 00:17 -0400, Boris Zbarsky wrote:
> >• nsContentDLF::CreateXULDocument - 12ms (1.6%)

>
> I wonder why that takes so long. We shouldn't be creating more than
> a dozen XUL documents or so during startup, I'd think. Bug filed?

But I could certainly imagine some other initialization (one that
really does need a bit of time) nesting inside of the first call to
it.

Boris Zbarsky

unread,
May 7, 2010, 12:44:46 AM5/7/10
to
On 5/7/10 12:41 AM, L. David Baron wrote:
> But I could certainly imagine some other initialization (one that
> really does need a bit of time) nesting inside of the first call to
> it.

Yes, absolutely. That's where bug+data comes in...

-Boris

Boris Zbarsky

unread,
May 7, 2010, 12:46:28 AM5/7/10
to
On 5/6/10 6:22 PM, Ehsan Akhgari wrote:
> Well, you can see the actual log(s) in bug 560647. The latest
> annotated version of the log is:
>
> https://bugzilla.mozilla.org/attachment.cgi?id=443703

Is the format documented somewhere?

-Boris

johnjbarton

unread,
May 7, 2010, 12:46:48 AM5/7/10
to
On 5/6/2010 9:17 PM, Boris Zbarsky wrote:
> On 5/6/10 4:11 PM, Johnathan Nightingale wrote:
...

> Note that XBL1 is not exactly highly optimized on the C++ end.
>
> Note that our XBL bindings are not optimized at all as far as I can see.

And this is how it should be. Most of the XBL binding is one time only
(right?). So in the big picture of a dynamic, evolving browser project
focused on Web leadership, a small, one-time cost is a good trade off
for flexibility. I am looking forward to hearing about timing studies on
Web pages and memory-reduction efforts since many users complain about
these important problems.

jjb

Boris Zbarsky

unread,
May 7, 2010, 12:53:20 AM5/7/10
to
On 5/7/10 12:46 AM, johnjbarton wrote:
> And this is how it should be. Most of the XBL binding is one time only
> (right?).

Depends on what you mean, but "probably not".

> So in the big picture of a dynamic, evolving browser project
> focused on Web leadership, a small, one-time cost is a good trade off
> for flexibility.

Until the small one-time costs make you start up forever, yes.

> I am looking forward to hearing about timing studies on
> Web pages and memory-reduction efforts since many users complain about
> these important problems.

We've done just that for memory on web pages, if you recall. See
Stuart's blog from the 1.9 timeframe.

We do in fact perform timing studies on web pages all the time (or at
least I do), but the issue there is a much wider variety of stuff going
on, a lot of which we have no control over. We optimize things as we
find them, obviously.

For startup, we have full control over most of the layers of the stack
(where cpu usage is concerned), and only one kind of browser window.
Modulo extensions.

-Boris

Ehsan Akhgari

unread,
May 7, 2010, 2:42:48 AM5/7/10
to Boris Zbarsky, dev-pl...@lists.mozilla.org

Ehsan Akhgari

unread,
May 7, 2010, 2:46:15 AM5/7/10
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Fri, May 7, 2010 at 12:17 AM, Boris Zbarsky <bzba...@mit.edu> wrote:
> The 147ms for initial reflow, if you just timed how long InitialReflow takes
> to run includes all of the frame tree construction, attachment of all the
> XBL, execution of all the XBL constructors, and actual layout. It would be
> interesting to set probes in InitialReflow to measure these quite separate
> steps.  If my past profiles are any guide, the "attach all the XBL and run
> the XBL constructors" part is a large fraction of the total.

Yes, the current probe is crude, and it only measures the total time
for PresShell::InitialReflow. You could add marks to that function in
order to get a breakdown of how much each part takes (I'd done that
myself, but I'm not sure where useful marks can be added.)

>> • nsContentDLF::CreateXULDocument - 12ms (1.6%)


>
> I wonder why that takes so long.  We shouldn't be creating more than a dozen
> XUL documents or so during startup, I'd think.  Bug filed?

We haven't filed any new bugs based on this work yet. I think for the
most part, experts in each module should analyze the data and file
bugs based on the analysis. If you think that this needs a lot of
analysis, and the analysis itself is worth filing a bug, please let me
know and I'll file one.

--
Ehsan
<http://ehsanakhgari.org/>

Boris Zbarsky

unread,
May 7, 2010, 9:39:50 AM5/7/10
to
On 5/7/10 2:46 AM, Ehsan Akhgari wrote:
> Yes, the current probe is crude, and it only measures the total time
> for PresShell::InitialReflow. You could add marks to that function in
> order to get a breakdown of how much each part takes (I'd done that
> myself, but I'm not sure where useful marks can be added.)

I'll try to take a look, but in the meantime, the right places are:

1) Start of InitialReflow
2) Right after the first block under the |if (root)| block.
3) Right after the ProcessAttachedQueue call in the |if (root)| block.
4) At the end of the |if (root)| block.
5) End of function (though this should be basically == #4).

Now that I look at this again, we no longer even do reflow under
InitialReflow... we just do it async.

-Boris

Ehsan Akhgari

unread,
May 7, 2010, 12:05:25 PM5/7/10
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Fri, May 7, 2010 at 9:39 AM, Boris Zbarsky <bzba...@mit.edu> wrote:
> On 5/7/10 2:46 AM, Ehsan Akhgari wrote:
>>
>> Yes, the current probe is crude, and it only measures the total time
>> for PresShell::InitialReflow.  You could add marks to that function in
>> order to get a breakdown of how much each part takes (I'd done that
>> myself, but I'm not sure where useful marks can be added.)
>
> I'll try to take a look, but in the meantime, the right places are:
>
> 1)  Start of InitialReflow
> 2)  Right after the first block under the |if (root)| block.
> 3)  Right after the ProcessAttachedQueue call in the |if (root)| block.
> 4)  At the end of the |if (root)| block.
> 5)  End of function (though this should be basically == #4).

I added these marks:

http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/rev/e38c1b8a4f2a

Feel free to tweak them if I got them wrong. I didn't at the first
and last marks, because those will be covered by the FunctionTimer
object's ctor and dtor.

> Now that I look at this again, we no longer even do reflow under
> InitialReflow... we just do it async.

So, I guess for the most part, the time spent in InitialReflow is
actually spent on creating the frames and binding XBL, right?

--
Ehsan
<http://ehsanakhgari.org/>

Boris Zbarsky

unread,
May 7, 2010, 1:35:20 PM5/7/10
to
On 5/7/10 12:05 PM, Ehsan Akhgari wrote:
> I added these marks:
>
> http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/rev/e38c1b8a4f2a
>
> Feel free to tweak them if I got them wrong.

http://hg.mozilla.org/users/eakhgari_mozilla.com/startup-timeline/rev/a2e33223aa78

> So, I guess for the most part, the time spent in InitialReflow is
> actually spent on creating the frames and binding XBL, right?

Yep.

-Boris

Gijs Kruitbosch

unread,
May 7, 2010, 6:25:33 PM5/7/10
to
On 06/05/2010 22:11 PM, Johnathan Nightingale wrote:
> <snip>
> � nsXREDirProvider::DoStartup - 14ms (1.8%) - Half of this is
> nsExtensionManager.js

Is this pre- or post- new add-ons manager?

~ Gijs

Kyle Huey

unread,
May 7, 2010, 6:33:01 PM5/7/10
to Gijs Kruitbosch, dev-pl...@lists.mozilla.org
Pre. nsExtensionManager.js goes away in the rewrite.

On Fri, May 7, 2010 at 6:25 PM, Gijs Kruitbosch
<gijskru...@gmail.com> wrote:
> On 06/05/2010 22:11 PM, Johnathan Nightingale wrote:
>>
>> <snip>

>> • nsXREDirProvider::DoStartup - 14ms (1.8%) - Half of this is


>> nsExtensionManager.js
>
> Is this pre- or post- new add-ons manager?
>
> ~ Gijs

Robert Kaiser

unread,
May 9, 2010, 7:31:00 AM5/9/10
to
Boris Zbarsky schrieb:

> If my past profiles are any guide, the "attach all
> the XBL and run the XBL constructors" part is a large fraction of the
> total.

I've heard for ages now that XBL perf is a PITA, but reality is that
we're using a good amount of XBL all over the place in UI as it's a
pretty useful technology after all.

Does the XBL(1) code really suck so much that nobody dares to look into
optimizing it? From the outside (in terms of knowing or working on any
of that code) it looks like everything else has been optimized at least
a fair bit over the years but XBL hasn't, and I wonder if that
impression is wrong or else what is the reason for that and if we can do
something about it.

As a note, just in the last few days, the places team has gotten rid of
some XBL in their code and replaced it with a rather lot of JS (AFAIK)
and one major reason was that this action improves perf...

Robert Kaiser

Boris Zbarsky

unread,
May 9, 2010, 8:27:56 AM5/9/10
to
On 5/9/10 7:31 AM, Robert Kaiser wrote:
> Does the XBL(1) code really suck so much that nobody dares to look into
> optimizing it?

1) Yes.
2) XBL1 behavior is unspecified. The only spec is the code. Any
behavior change is likely to break consumers. Some of the
obvious ways to optimize are behavior changes.
3) A significant part of the problem is that the fundamental
model (turning one element into N for N easily as large as 10,
turning each attribute set into N for N easily as large as 10,
etc) is just slow. 10 times slower than not bloating up the
DOM tree like that, obviously. "Optimizing" away the concept
of XBL anonymous content is not exactly an option.

So yes, it's a very useful tool. It's also a tool that's easy to
misuse; my numbers above are off-the-top-of-my-head examples of actual
bindings we use that I've seen come up in profiles.

-Boris

Reply all
Reply to author
Forward
0 new messages