Our results point to some obvious candidates for improvement:
Many of these will be familiar to people who have been looking at
startup for some time, but my hope is that breaking it down this way
might get them enough attention to get them taken care of.
Want in? Great! Pick off a piece and be a hero! Let me know how I can
* Reflows (10-40%) - We doReflow on browser.xul 21 times after the
initialReflow. We doReflow 36 times all told, and if you naively add
up all the time consumed by functions with "reflow" in their name
(FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
totalling 325ms (40%).
How do we reduce that? Delay reflow? Implement the browser in html to
take advantage of better performance optimization? Build the browser
DOM out of document and add it all at once? display:none the UI until
we're done screwing with it?
* delayedStartup (8%) - delayedStartup itself takes 65ms (8%), and
that likely under-credits things, since it's likely that
delayedStartup causes various delayed activities to occur (reflows,
new setTimeouts being queued, &c). Who feels like carving some of that
* XRE_main blocks (14%)
• pre-XPCOM startup - 17ms (2.1%) - This one can be broken down with
• nsXREDirProvider::DoStartup - 14ms (1.8%) - Half of this is
• startupNotifier - 22ms (2.8%) - At least 13ms of this is
• post-final-ui-startup churn in XRE_main - 60ms (7.5%) - After final-
ui-startup and before Run we lose a good deal of time.
* Petty Criminals (5-6%) - There are several pieces which take just
long enough to be annoying:
• nsContentDLF::CreateXULDocument - 12ms (1.6%)
• GetCMSOutputProfile - 10ms (1.3%)
• Search service init - 9ms (1.1%)
• Parsing gre-resources/ua.css - 10ms (1.3%)
• Parsing skin/browser.css - 7ms (0.9%)
• urlclassifier triggering psm init - 6-10ms (0.8-1.2%) In other
cases, we've seen this block take 10x this long - plausibly having
something to do with the available entropy through the OS. Worth
cleaning up regardless.
• Our default Google-hosted homepage triggers more script execution
and reflows than necessary. It probably consumes at least a 63ms
(8.2%) block of execution, including about 10ms (1.3%) of script
execution and 47ms (6.1%) of reflowing.
• hiddenWindow.html. The hit from the hidden window varies by
platform, but something so empty should really not have much hit at
all. We should look at what we want this to accomplish, and make sure
it's doing it as minimally as possible.
Director of Firefox Development
Bug 548612 should cut this down by at least 4x
This one surprises me, I was leaving meetings (on l20n land) with the
impression that we don't reflow xul. I'd love to get this one a bit more
clarified, maybe I'm confusing incremental layout with reflows. Also,
we'll be interested to learn which feature of xul this is. Like, if l20n
changes how we're doing content creation for xul documents, how much
of whatever we learn here transfers?
> * Reflows (10-40%) - We doReflow on browser.xul 21 times after the
> initialReflow. We doReflow 36 times all told, and if you naively add
> up all the time consumed by functions with "reflow" in their name
> (FrameNeedsReflow, ProcessReflowCommands, &c), there are 262 calls,
> totalling 325ms (40%).
> How do we reduce that? Delay reflow? Implement the browser in html
> to take advantage of better performance optimization? Build the
> browser DOM out of document and add it all at once? display:none the
> UI until we're done screwing with it?
For XUL documents we already delay reflow much more than for HTML; I
think we wait until onload.
A few questions: the 'Our Definition of "Startup"' section suggests
that startup doesn't end until you're able to type "google.com" in
the URL bar and it hits the network. If that's all included, then
I'd expect to see one (very small) reflow for each character typed
in the URL bar; possibly more if the awesomebar starts to display
I'm not that worried about the large number of reflows on
browser.xul given the data that only the first is taking a
significant amount of time. I'd be interested in the time
Where would I find what you're counting as reflow, exactly? My
memory is that the function called "InitialReflow" actually does
much more than reflow, especially for XUL documents (where we're
delaying it until the content tree is fully constructed).
Specifically, are you counting the frame construction and style
resolution that happens inside InitialReflow as reflow?
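To illustrate why the answer matters, here is a minimal sketch, assuming each probe logs a (name, start, end) interval in milliseconds. The log entries and timings below are invented for illustration and are not taken from the logs discussed in this thread. Summing every interval whose name contains "reflow" counts nested time more than once, while merging overlapping intervals counts each millisecond once.

```python
def naive_total(intervals, needle="reflow"):
    """Sum the duration of every interval whose name contains `needle`."""
    return sum(end - start for name, start, end in intervals
               if needle in name.lower())


def merged_total(intervals, needle="reflow"):
    """Wall-clock time covered by matching intervals, overlaps counted once."""
    spans = sorted((start, end) for name, start, end in intervals
                   if needle in name.lower())
    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or nested: extend the current run if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical log: (probe name, start ms, end ms).
log = [
    ("InitialReflow", 0, 100),
    ("ProcessReflowCommands", 20, 60),   # nested inside InitialReflow
    ("DoReflow", 30, 50),                # nested inside ProcessReflowCommands
    ("DoReflow", 150, 160),              # a later, separate reflow
    ("CreateXULDocument", 200, 212),     # name does not match "reflow"
]

print(naive_total(log))   # 170
print(merged_total(log))  # 110
```

In this made-up log the naive sum is 170ms against 110ms of actual wall-clock coverage, so the size of the gap depends entirely on how deeply the probes nest.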
One thing we do throughout our codebase is initialize many
things lazily. This has both benefits (we get to
completely skip some things that aren't needed) and costs (perhaps
increased cache churn). Some of these are rather coarse (initialize
all of X the first time it's requested) and some are very
fine-grained (such as style data computation).
An approach I've taken to gathering "X%" analyses in the past was to
build an ordered list of functions such that all time inside the
first would be attributed to it, all remaining time inside the
second would be attributed to it, etc. This allowed listing the
leaves in such lazy initialization patterns before the things that
call them. (tools/jprof/split-profile.pl is a tool to do this on a
contains an example (very out-of-date) set of splitting functions.)
This approach can be done on top of timeline instrumentation for
things that are coarsely lazily initialized. However, it doesn't
work for things that are finely lazily initialized at a frequency
approaching the resolution of the timing used for the timeline. For
that, doing the analysis on top of the data from a sampling profiler
(shark, jprof, etc.) is a better approach. Doing it on top of data
from a sampling profiler also has the advantage that it allows you
to check your work by looking at the profile of each piece to figure
out if there's stuff in it that's misattributed.
(But then there's the question of how accurate sampling profilers
are, especially on the workload that happens during startup.)
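The ordered-list attribution described above can be sketched on top of sampled call stacks. This is an illustration under stated assumptions, not a port of split-profile.pl: the sample format (a list of function names per sample) and the names ResolveStyle and ConstructFrames are hypothetical. Each sample is credited to the first function in the ordered list that appears anywhere in its stack, so leaves listed early claim their time before their callers do.

```python
from collections import Counter


def split_samples(samples, ordered_functions):
    """Attribute each stack sample to the first listed function found in it.

    `samples` is an iterable of call stacks (lists of function names).
    `ordered_functions` is the priority list: leaves of lazy-initialization
    patterns should come before the functions that call them.
    """
    buckets = Counter()
    for stack in samples:
        frames = set(stack)
        for func in ordered_functions:
            if func in frames:
                buckets[func] += 1
                break
        else:
            # No listed function appears in this stack.
            buckets["(unattributed)"] += 1
    return buckets


# Hypothetical samples, outermost frame first.
samples = [
    ["main", "InitialReflow", "ResolveStyle"],
    ["main", "InitialReflow", "ConstructFrames"],
    ["main", "InitialReflow", "ConstructFrames", "ResolveStyle"],
    ["main", "delayedStartup", "ResolveStyle"],
    ["main", "delayedStartup"],
    ["main"],
]

# Lazily initialized leaf first, then its callers.
order = ["ResolveStyle", "ConstructFrames", "InitialReflow", "delayedStartup"]
result = split_samples(samples, order)
for name in order + ["(unattributed)"]:
    print(name, result[name])
```

With ResolveStyle listed first, style resolution triggered from inside delayedStartup is charged to ResolveStyle rather than to delayedStartup. Reordering the list changes the split, which is why inspecting the profile of each bucket is a useful check.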
Yes, that is all included, and we see the reflows inside the keypress
event handlers, as expected.
> I'm not that worried about the large number of reflows on
> browser.xul given the data that only the first is taking a
> significant amount of time. I'd be interested in the time
> distribution, though.
Well, you can see the actual log(s) in bug 560647. The latest
annotated version of the log is:
> Where would I find what you're counting as reflow, exactly? My
> memory is that the function called "InitialReflow" actually does
> much more than reflow, especially for XUL documents (where we're
> delaying it until the content tree is fully constructed).
> Specifically, are you counting the frame construction and style
> resolution that happens inside InitialReflow as reflow?
Here is a rolled-up patch from the probes in my user repo on top of
the mozilla-central tree:
Specifically about reflows, I stuck to PresShell methods for now.
But we can add more probes and remove the old ones if needed. In
fact, feel free to do so and push your changes to that user repo.
That would be a wrong impression.
> I'd love to get this one a bit more clarified, maybe I'm confusing incremental layout with reflows.
In Gecko-speak, "reflow" is any layout whatsoever.
We perform layout of XUL, trust me.
We also perform dynamic updates to XUL layout.
We do not perform XUL layout during parsing of a XUL document; we delay
it until the document is fully loaded. However nothing stops scripts
from modifying the document after that point; see "dynamic updates" above.
> Also, we'll be interested to learn which feature of xul this is.
I'm trying to reconcile this with the other numbers from the wiki.
147ms for initial reflow, 81ms for later reflows.... where does 325ms
come from? What are the actual breakdowns by "function with 'reflow' in
> How do we reduce that?
> Delay reflow?
We already do as much as possible.
> Implement the browser in html to take advantage of better performance optimization?
The parts of HTML that give you flexbox-like behavior are not that
> Build the browser DOM out of document and add it all at once?
We already do that, more or less.
> display:none the UI until we're done screwing with it?
Not an option; see below.
The 147ms for initial reflow, if you just timed how long InitialReflow
takes to run, includes all of the frame tree construction, attachment of
all the XBL, execution of all the XBL constructors, and actual layout.
It would be interesting to set probes in InitialReflow to measure these
quite separate steps. If my past profiles are any guide, the "attach
all the XBL and run the XBL constructors" part is a large fraction of
Note that XBL1 is not attached in display:none subtrees.
Note that XBL1 is not exactly highly optimized on the C++ end.
Note that our XBL bindings are not optimized at all as far as I can see.
If I'm right and XBL is a large part of the cost here, then one obvious
way to reduce this time is to have fewer and simpler XBL bindings
attached to stuff in the UI.
> • nsContentDLF::CreateXULDocument - 12ms (1.6%)
I wonder why that takes so long. We shouldn't be creating more than a
dozen XUL documents or so during startup, I'd think. Bug filed?
But I could certainly imagine some other initialization (one that
really does need a bit of time) nesting inside of the first call to
Yes, absolutely. That's where bug+data comes in...
Is the format documented somewhere?
And this is how it should be. Most of the XBL binding is one time only
(right?). So in the big picture of a dynamic, evolving browser project
focused on Web leadership, a small, one-time cost is a good trade off
for flexibility. I am looking forward to hearing about timing studies on
Web pages and memory-reduction efforts since many users complain about
these important problems.
Depends on what you mean, but "probably not".
> So in the big picture of a dynamic, evolving browser project
> focused on Web leadership, a small, one-time cost is a good trade off
> for flexibility.
Until the small one-time costs make your startup take forever, yes.
> I am looking forward to hearing about timing studies on
> Web pages and memory-reduction efforts since many users complain about
> these important problems.
We've done just that for memory on web pages, if you recall. See
Stuart's blog from the 1.9 timeframe.
We do in fact perform timing studies on web pages all the time (or at
least I do), but the issue there is a much wider variety of stuff going
on, a lot of which we have no control over. We optimize things as we
find them, obviously.
For startup, we have full control over most of the layers of the stack
(where cpu usage is concerned), and only one kind of browser window.
There is some documentation available in this post:
Yes, the current probe is crude, and it only measures the total time
for PresShell::InitialReflow. You could add marks to that function in
order to get a breakdown of how much time each part takes. (I'd have done
that myself, but I'm not sure where useful marks can be added.)
>> • nsContentDLF::CreateXULDocument - 12ms (1.6%)
> I wonder why that takes so long. We shouldn't be creating more than a dozen
> XUL documents or so during startup, I'd think. Bug filed?
We haven't filed any new bugs based on this work yet. I think for the
most part, experts in each module should analyze the data and file
bugs based on the analysis. If you think that this needs a lot of
analysis, and the analysis itself is worth filing a bug, please let me
know and I'll file one.
I'll try to take a look, but in the meantime, the right places are:
1) Start of InitialReflow
2) Right after the first block under the |if (root)| block.
3) Right after the ProcessAttachedQueue call in the |if (root)| block.
4) At the end of the |if (root)| block.
5) End of function (though this should be basically == #4).
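As a rough sketch of how marks at the points listed above turn into a per-phase breakdown: the class below records a timestamp at each mark and reports the time elapsed since the previous one. MarkTimer is a hypothetical helper written for this illustration, not the FunctionTimer API, and the clock values are invented so the output is deterministic.

```python
import time


class MarkTimer:
    """Records named marks and reports the time between consecutive marks."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        # The constructor plays the role of mark 1 (start of function).
        self._marks = [("mark 1", clock())]

    def mark(self, label):
        self._marks.append((label, self._clock()))

    def phases(self):
        """Return [(label, time elapsed since the previous mark), ...]."""
        return [(label, t - prev_t)
                for (_, prev_t), (label, t)
                in zip(self._marks, self._marks[1:])]


# Fake clock in milliseconds; the values are made up for the example.
ticks = iter([0, 5, 12, 20, 21])
timer = MarkTimer(clock=lambda: next(ticks))
timer.mark("mark 2")  # after the first block under if (root)
timer.mark("mark 3")  # after the ProcessAttachedQueue call
timer.mark("mark 4")  # end of the if (root) block
timer.mark("mark 5")  # end of function

for label, elapsed in timer.phases():
    print(label, elapsed)
```

Each reported value is the cost of the code between two adjacent marks, so the interval ending at mark 3 isolates whatever runs up to and including ProcessAttachedQueue after mark 2.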
Now that I look at this again, we no longer even do reflow under
InitialReflow... we just do it async.
I added these marks:
Feel free to tweak them if I got them wrong. I didn't add the first
and last marks, because those will be covered by the FunctionTimer
object's ctor and dtor.
> Now that I look at this again, we no longer even do reflow under
> InitialReflow... we just do it async.
So, I guess for the most part, the time spent in InitialReflow is
actually spent on creating the frames and binding XBL, right?
> So, I guess for the most part, the time spent in InitialReflow is
> actually spent on creating the frames and binding XBL, right?
Is this pre- or post- new add-ons manager?
On Fri, May 7, 2010 at 6:25 PM, Gijs Kruitbosch
> On 06/05/2010 22:11 PM, Johnathan Nightingale wrote:
>> • nsXREDirProvider::DoStartup - 14ms (1.8%) - Half of this is
> Is this pre- or post- new add-ons manager?
> ~ Gijs
I've heard for ages now that XBL perf is a PITA, but the reality is that
we're using a good amount of XBL all over the place in UI as it's a
pretty useful technology after all.
Does the XBL(1) code really suck so much that nobody dares to look into
optimizing it? From the outside (in terms of knowing or working on any
of that code) it looks like everything else has been optimized at least
a fair bit over the years but XBL hasn't, and I wonder if that
impression is wrong or else what is the reason for that and if we can do
something about it.
As a note, just in the last few days, the places team has gotten rid of
some XBL in their code and replaced it with rather a lot of JS (AFAIK)
and one major reason was that this action improves perf...
2) XBL1 behavior is unspecified. The only spec is the code. Any
behavior change is likely to break consumers. Some of the
obvious ways to optimize are behavior changes.
3) A significant part of the problem is that the fundamental
model (turning one element into N for N easily as large as 10,
turning each attribute set into N for N easily as large as 10,
etc) is just slow. 10 times slower than not bloating up the
DOM tree like that, obviously. "Optimizing" away the concept
of XBL anonymous content is not exactly an option.
So yes, it's a very useful tool. It's also a tool that's easy to
misuse; my numbers above are off-the-top-of-my-head examples of actual
bindings we use that I've seen come up in profiles.