
optimizing browser handling of Facebook Timeline scrolling


Christopher Blizzard

Feb 14, 2012, 9:29:16 AM
to dev-pl...@lists.mozilla.org
This mail didn't make it to dev-platform for some reason, but now it will!

-------- Original Message --------
Subject: optimizing browser handling of Facebook Timeline scrolling
Date: Sat, 11 Feb 2012 05:04:19 +0000
From: Steven Young <sty...@fb.com>
To: dev-pl...@lists.mozilla.org <dev-pl...@lists.mozilla.org>



I'm on the Timeline team at Facebook, which is going to be the new
format for everyone's profiles real soon now.
https://www.facebook.com/about/timeline We'd like to improve its browser
performance, so I'd appreciate any suggestions for things we should
change to speed it up. In particular, we'd like to make scrolling down
through new content smoother. There are often brief (e.g. 300 ms)
browser lockups, and other times there just seems to be a general
feeling of heaviness.

I'm going to list some of the specific issues we've identified, which we
are debating how best to fix, but I'm also very interested to hear
whatever anyone else thinks are the biggest perf bottlenecks.

A few problems:

(1) HTML / DOM size and CSS

Our HTML is huge. About half of it is coming from the light blue
"like/comment" widgets at the bottom of most stories. Within those
widgets, a major portion of it is always the same. (Some of that is only
needed once the user clicks into the widget, but we don't want another
server round trip to fetch it.) We also have a lot of CSS rules, and
applying all that CSS to all those DOM nodes gets
expensive. Experimentally, removing all like/comment widgets from the
page does give noticeably smoother scrolling, although it doesn't
completely fix the problem.

Related: We've also noticed that if you scroll very far down a
content-rich timeline, and then open and close the inline photo viewer,
this causes a noticeable lag, as it re-renders all existing content on
the page. To fix this, we investigated dynamically removing offscreen
content from the DOM and replacing it with empty divs of the same
height, but we decided it wasn't worth the code complexity and fragility.

(2) Repaints

There are several fixed elements on the page like the blue bar at the
top, the side bar, and our date navigator with the months/years.
Chrome's --show-paint-rects flag showed that under most circumstances
these fixed-position elements forced full-screen repaints instead of
incremental repaints. The rules for what triggers a repaint vary from
browser to browser, but we would ideally like to fix this everywhere.
The cost of full page repaints also sometimes varies dramatically even
comparing Chrome on two fairly newish Mac laptops. (At the time we did
this, we didn't have a good repaint tool for Firefox.)

(3) Javascript for loading content as you scroll down

We dynamically load timeline sections (e.g. a set of stories from 2009)
using our BigPipe system
(https://www.facebook.com/note.php?note_id=389414033919) in an iframe.
In a nutshell, the HTTP response to the iframe is sent with chunked
encoding, a <script> tag at a time. Each script tag contains some code
and HTML content that is passed up to the parent window, which
requests the CSS and JS associated with that HTML content. Once the CSS
is downloaded, the HTML (timeline story markup) is inserted into an
offscreen DOM element. Then, once the JS is loaded, we do some fairly
complicated work before we actually display the content.
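For illustration, a single chunk of that response might look roughly like the following. The callback name and payload fields here are illustrative guesses based on the description above, not Facebook's actual BigPipe API:

```html
<!-- One <script> chunk of the chunked HTTP response into the iframe.
     Names and fields are hypothetical stand-ins. -->
<script>
parent.onPageletArrive({
  id: "timeline_section_2009",            // which timeline section this is
  html: "<div class=\"story\">...</div>", // markup, inserted offscreen first
  css:  ["timeline-section.css"],         // fetched before HTML insertion
  js:   ["timeline-section.js"]           // fetched before display
});
</script>
```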

First, we lay out the timeline stories in an offscreen
element (position:absolute; left:-9999px) before inserting them into the
viewable page. We then have JS which checks the heights of all the
stories in the offscreen element so it can swap stories back and
forth between the two columns, to keep things sorted by time going down
the page. To do this, we query and cache the stories' offsetTop values
all at once where possible. Probably, we could eliminate all this
height-checking and column balancing if we implemented a machine
learning algorithm to predict the height of each unit in advance, on the
server side.
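The balancing decision itself can be kept as a pure function over the measured heights, which separates the DOM reads from the logic. This greedy shortest-column policy is a simplified stand-in, not the actual time-sorting algorithm described above:

```javascript
// Assign each story (identified by index, given its measured pixel
// height) to whichever of the two columns is currently shorter.
// Returns two arrays of story indices.
function balanceColumns(heights) {
  const cols = [[], []];
  const totals = [0, 0];
  heights.forEach((h, i) => {
    const c = totals[0] <= totals[1] ? 0 : 1; // pick the shorter column
    cols[c].push(i);
    totals[c] += h;
  });
  return cols;
}
```

Because the function only consumes cached heights, all the offsetTop/offsetHeight reads can happen in one batch before any DOM writes.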


Next, in an attempt to reduce user-perceived browser freezing while
scrolling, our JS does not add new content into the bottom of the main
column as soon as it comes back from the server. Instead, we queue it up
until the user stops scrolling and add it in then. We use document
fragments where possible to insert elements. Web Inspector's profiler
showed improvements when dynamically inserting many <link
rel=stylesheet> tags in this fashion since we stopped thrashing between
"style recomputation" and JS execution for each stylesheet, and instead
just had one longer style recomputation segment.
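A minimal sketch of that queue-until-idle pattern (names are hypothetical; in page code the flush callback would build a DocumentFragment and append it to the column in one operation):

```javascript
// Holds content chunks arriving from the network while the user is
// scrolling; flush() hands the whole batch to a single callback so the
// DOM is touched once, not once per chunk.
class InsertionQueue {
  constructor(flushFn) {
    this.pending = [];
    this.flushFn = flushFn;
  }
  enqueue(chunk) {
    this.pending.push(chunk);
  }
  // Call when scrolling has gone idle; returns how many chunks flushed.
  flush() {
    if (this.pending.length === 0) return 0;
    const batch = this.pending.splice(0, this.pending.length);
    this.flushFn(batch);
    return batch.length;
  }
}
```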


We throttle scroll/resize events so they fire at most once every 150 ms.
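A minimal version of such a throttle is sketched below. The injectable clock is only there to make the logic testable; real page code would use the Date.now default and pass the wrapped handler to addEventListener:

```javascript
// Time-based throttle: the wrapped handler runs at most once per `ms`
// milliseconds; calls inside the window are dropped.
function throttle(fn, ms, now = Date.now) {
  let last = -Infinity; // so the very first call always fires
  return function (...args) {
    const t = now();
    if (t - last >= ms) {
      last = t;
      fn.apply(this, args);
    }
  };
}

// Usage sketch: window.addEventListener('scroll', throttle(onScroll, 150));
```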


All the while this is happening, we're potentially receiving more
<script> tags in the iframe and doing the same thing for other pieces of
content.


We would love any pointers you guys have.

Thanks,
Steve

Boris Zbarsky

Feb 14, 2012, 1:09:14 PM
to
On 2/14/12 9:29 AM, Christopher Blizzard wrote:
> Our HTML is huge. About half of it is coming from the light blue
> "like/comment" widgets at the bottom of most stories. Within those
> widgets, a major portion of it is always the same. (Some of that is only
> needed once the user clicks into the widget, but we don't want another
> server round trip to fetch it.)

Can that part that's not needed be stored in a single global string,
since it's always the same, and inserted via innerHTML if the user
actually clicks on the widget?
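A rough sketch of this suggestion; the string contents, class names, and the expansion flag are all hypothetical:

```javascript
// The part of the widget markup that is identical everywhere lives in
// one shared string instead of being repeated in every story's payload.
const SHARED_WIDGET_HTML = '<ul class="uiList uiUfi"><!-- comment UI --></ul>';

// Parse the shared string into this widget's subtree only on first
// interaction; returns whether an insertion happened.
function expandWidget(container) {
  if (container.dataset.expanded) return false; // already expanded
  container.innerHTML += SHARED_WIDGET_HTML;
  container.dataset.expanded = '1';
  return true;
}
```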

> We also have a lot of CSS rules, and applying all that CSS to all those DOM nodes gets expensive.

What do the CSS rules look like? There are various things you can do to
speed those up that you may or may not be doing already...

> Related: We've also noticed that if you scroll very far down a
> content-rich timeline, and then open and close the inline photo viewer,
> this causes a noticeable lag, as it re-renders all existing content on
> the page. To fix this, we investigated dynamically removing offscreen
> content from the DOM and replacing it with empty divs of the same
> height, but we decided it wasn't worth the code complexity and fragility.

Re-rendering all the content on the page is a bit odd. What actually
gets changed when the inline photo viewer is opened?

> There are several fixed elements on the page like the blue bar at the
> top, the side bar, and our date navigator with the months/years.
> Chrome's --show-paint-rects flag showed that under most circumstances
> these fixed-position elements forced full-screen repaints instead of
> incremental repaints.

This is likely to be very browser-specific.... Are your fixed-position
elements completely opaque? That might help if they're not already.

> We dynamically load timeline sections (e.g. a set of stories from 2009)
> using our BigPipe system
> (https://www.facebook.com/note.php?note_id=389414033919) in an iframe.
> In a nutshell, the HTTP response to the iframe is sent with chunked
> encoding, a <script> tag at a time. Each script tag contains some code
> and and HTML content that is passed up to the parent window, which
> requests the CSS and JS associated with that HTML content. Once the CSS
> is downloaded, the HTML (timeline story markup) is inserted into an
> offscreen DOM element.

So offscreen, but in the page DOM, right? Is there a reason to do this
before the JS is loaded, esp. if the JS then modifies the markup?

Also, I assume that you're dynamically inserting the CSS at this time
too? That can be a somewhat expensive operation, since it requires
discarding and recreating various data structures. How hard a
requirement is the "insert some more globally applicable CSS" bit?

> First, we lay out the timeline stories in an offscreen element
> (position:absolute; left:-9999px) before inserting them into the
> viewable page.

When you lay them out in that offscreen element, you're inserting them
into the viewable page.

> We then have JS which checks the heights of all the
> stories in the offscreen element so it can swap stories back and
> forth between the two columns

How is the swapping implemented?

-Boris

Boris Zbarsky

Feb 14, 2012, 10:59:15 PM
to sty...@gmail.com
On 2/14/12 9:29 AM, Christopher Blizzard wrote:
> -------- Original Message --------
> Subject: optimizing browser handling of Facebook Timeline scrolling

OK, so I sat down with the page and a profiler for a bit.

First, an observation. As I scroll the behavior seems to be some lag
waiting on network, then content popping in, then content moving around
a bit. That last seems to be due to the images loading. I looked at
the source and styles, and the photo <img> tags don't seem to have a
size specified in the markup, and the only sizing styles applied to them
are "height: 100%, min-height: 100%". The parent div does have a fixed
height, so the height of the images is known ahead of time, but the
width is not; this triggers a relayout when the width becomes known. I
don't know how feasible it is to put the actual image dimensions in the
markup or styles, but it might help a bit.
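For example, serving explicit dimensions in the markup lets layout finish before any image bytes arrive (the values here are placeholders, not the real crop sizes):

```html
<!-- Both dimensions known up front, so no relayout when the image loads. -->
<img src="photo.jpg" width="180" height="120" alt="">
```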

Past that, I see us using about 18% of our CPU time in painting (images,
backgrounds, borders, text in that order).

I see about 2% spent on getBoundingClientRect from onscroll events.

A few percent in pure JS execution from what look like script onload or
afterscriptevaluated events. Another 3% in those same event handlers
but getting offsetHeights. This 3% is almost all spent on restyling;
see more on that below.

15% running script off stylesheet onload events. The time here is
mostly spent getting scrollLeft and offsetHeight and setting innerHTML.
Both the scrollLeft and the offsetHeight calls largely spend their
time recomputing style data. The vast majority of that is spent on
SelectorMatchesTree calls, which suggests that some slow selectors (see
https://developer.mozilla.org/en/Writing_Efficient_CSS ) are in use.
Looking at the rest of the selector-matching profile, I would guess
we're looking at selectors of the form ".foo *" or something along those
lines... A brief look at the stylesheets suggests that the rightmost
part is probably not "*" but rather something that just matches lots of
elements in the document ("span", "a", "img", that sort of thing). If
you're in a position to put classes directly on the elements you really
want to target, that would probably help with this part. We have some
work in progress to make situations like this faster, but it's not going
to be shipping for at least a few months. In addition to that, it looks like
either DOM or style modifications or both are happening between the
scrollLeft and offsetHeight gets. Getting rid of that would help.

9% running scripts when <script> elements load. This looks like pure JS
execution, not DOM calls of any sort.

2% JPEG decoding.

1% setting up new documents, presumably for various subframes... A bunch
of this is the setup for the Window object.

20% running JS off setTimeout. Here I see a bunch of offsetHeight gets,
scrollLeft gets, actual script execution, and a bit of
appendChild/replaceChild as well as a long tail of
getBoundingClientRect, setting src on images, and so forth. The
offsetHeight gets are spending some time on CSS box (nsIFrame)
construction and layout, but at least half is style recomputation. The
scrollLeft gets are all style recomputation... All the analysis for
stylesheet onload events above applies here.

-Boris

Facebook Steve

Feb 15, 2012, 12:03:03 AM
to
> First, an observation.  As I scroll the behavior seems to be some lag
> waiting on network, then content popping in, then content moving around
> a bit.

We feel that seeing whitespace on the page (before the new content
pops in) is ok from a user standpoint. We are much more worried about
having the browser's own UI lock up or slow down when the content is
loading. (From your own experience as a user, do you agree?)

>  I don't know how feasible it is to put the actual image dimensions in the
> markup or styles, but it might help a bit.

Our images are all cropped to standard sizes, so this seems doable.

> Past that, I see us using about 18% of our CPU time in painting (images,
> backgrounds, borders, text in that order).

I was a bit surprised in the WebKit thread to hear that compressing
images would make them paint faster. That seems like an action item.

> 15% running script off stylesheet onload events.
> 20% running JS off setTimeout.

Ok, we will hopefully eliminate most of this by fixing our layout
thrash regression and doing a better job of optimizing our CSS.

> 9% running scripts when <script> elements load.  This looks like pure JS
> execution, not DOM calls of any sort.

That would be BigPipe, and various onload event handlers subscribed to
it.

Boris Zbarsky

Feb 15, 2012, 1:26:19 AM
to
On 2/15/12 12:03 AM, Facebook Steve wrote:
>> First, an observation. As I scroll the behavior seems to be some lag
>> waiting on network, then content popping in, then content moving around
>> a bit.
>
> We feel that seeing whitespace on the page (before the new content
> pops in) is ok from a user standpoint.

Sure; I was just describing what I was seeing, in case it's not what
other people testing this are seeing.

> We are much more worried about
> having the browser's own UI lock up or slow down when the content is
> loading. (From your own experience as a user, do you agree?)

Yep.

>> 9% running scripts when <script> elements load. This looks like pure JS
>> execution, not DOM calls of any sort.
>
> That would be BigPipe, and various onload event handlers subscribed to
> it.

I wonder what's going on there... Would it be useful to try digging
into the details of what that JS is doing from our pov?

-Boris

James Ide

Feb 15, 2012, 5:14:08 AM
to sty...@gmail.com
Boris,

Thanks for your investigation and comments.

> Can that part that's not needed be stored in a single global string,
> since it's always the same, and inserted via innerHTML if the user
> actually clicks on the widget?

This is a good suggestion, although we'll likely need to do general client-side rendering since the markup isn't completely the same across all of the widgets. They each have different IDs, text content (e.g. # of likes), etc... We have observed in experiments that removing all of the comments boxes from the posts yields a noticeably smoother page.

> The parent div does have a fixed
> height, so the height of the images is known ahead of time, but the
> width is not; this triggers a relayout when the width becomes known.

This is good to know. The images are wrapped in divs--these divs have fixed dimensions and have overflow hidden to crop the images, so it seems like the images shouldn't affect the layout of the page, nor the painting outside of the div. We can likely send the image dimensions with the img tags.

>
> 15% running script off stylesheet onload events. The time here is
> mostly spent getting scrollLeft and offsetHeight and setting innerHTML.

This is part of our BigPipe system. Once the CSS associated with a subtree has been downloaded (we use feature detection for the load event--thanks!), we insert the HTML for that subtree and potentially start executing the JS associated with that subtree as well. For the timeline posts specifically, the JS inserts their HTML into an offscreen div, queries their offsetHeights to figure out which of the two columns to put them in (that's why they're in the main DOM tree and also why the offscreen div isn't display:none), and then moves them into the main center div, floating them left or right as previously decided.

> How hard a requirement is the "insert some more globally applicable CSS" bit?

Pretty hard. We could split the page into fewer pieces, therefore inserting CSS fewer times. It would be cool to have a stylesheet apply only to a given subtree but AFAIK that's only possible with iframes.

> Both the scrollLeft and the offsetHeight calls largely spend their
> time recomputing style data. The vast majority of that is spent on
> SelectorMatchesTree calls, which suggests that some slow selectors (see
> https://developer.mozilla.org/en/Writing_Efficient_CSS ) are in use.

We've eliminated most, if not all, "*" key selectors. There are still a few generic pseudo-class selectors (e.g. ":hover") that are likely as bad, and a bunch that just match tag names as you suggested. Most of our application-specific selectors use class names, but we'll likely have a few site-wide ones that do match all span elements for instance.

> In addition to that, it looks like
> either DOM or style modifications or both are happening between the
> scrollLeft and offsetHeight gets. Getting rid of that would help.

Definitely. We see this thrashing in Speed Tracer for instance (http://i.imgur.com/Xf5bF.png) and did one pass to cache all of the offsetHeights in data-* attributes, although that seems to have regressed. The caching did show an improvement in Speed Tracer, as all of the CSS recalculation condensed into a single, smaller chunk, but it still wasn't snappy to load in new content.
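The read/write batching being described can be sketched like this (element and attribute names are stand-ins):

```javascript
// Phase 1 reads every offsetHeight (forcing at most one layout flush),
// phase 2 writes the cached values into data-* attributes. No read is
// interleaved with a write, so the browser never has to recompute
// layout mid-loop ("layout thrashing").
function cacheHeights(stories) {
  // Phase 1: reads only.
  const heights = stories.map(el => el.offsetHeight);
  // Phase 2: writes only.
  stories.forEach((el, i) => { el.dataset.height = String(heights[i]); });
  return heights;
}
```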

> 9% running scripts when <script> elements load. This looks like pure JS
> execution, not DOM calls of any sort.

This is the BigPipe system, again. We send down <script> tags one at a time with HTTP chunked encoding, and each one contains a markup subtree plus references to the CSS and JS it needs. The pure JS execution you observed is likely the requests for CSS/JS being made.

> 20% running JS off setTimeout.

We defer a bunch of JS execution to wait until the user has stopped scrolling. The page performed particularly poorly when we were trying to insert new timeline posts while the user was scrolling.


Our understanding of the performance issues after using various profiling tools was that parts of the rendering process are the bottlenecks. We can look into batching up the queries to offsetHeight and related functions. Currently we have little insight into:

- why various reflows/repaints are slow
- whether it's beneficial to remove <link rel=stylesheet> elements that have unused CSS
- how various CSS rules affect performance (e.g. negative margins, floating; box-shadows were particularly bad), especially when they're on the same page together
- how to "silo off" parts of the page so that layout is constrained to a smaller subtree

Browser tools are definitely improving but a lot of the internals are still opaque.

Regards,
James

Robert O'Callahan

Feb 15, 2012, 6:19:15 AM
to James Ide, sty...@gmail.com, dev-pl...@lists.mozilla.org
On Wed, Feb 15, 2012 at 11:14 PM, James Ide <i...@jameside.com> wrote:

> Pretty hard. We could split the page into fewer pieces, therefore
> inserting CSS fewer times. It would be cool to have a stylesheet apply only
> to a given subtree but AFAIK that's only possible with iframes.
>

HTML5 has a <style scoped> feature that does this. We could implement that,
but no browser supports it yet AFAIK.
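For reference, the markup as proposed at the time looked like this; the rules inside a scoped <style> would apply only to the enclosing subtree:

```html
<div class="timeline-section">
  <style scoped>
    /* Per the HTML5 proposal, applies only within this div's subtree. */
    .story { margin-bottom: 10px; }
  </style>
  <div class="story">...</div>
</div>
```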

Rob
--
“You have heard that it was said, ‘Love your neighbor and hate your enemy.’
But I tell you, love your enemies and pray for those who persecute you,
that you may be children of your Father in heaven. ... If you love those
who love you, what reward will you get? Are not even the tax collectors
doing that? And if you greet only your own people, what are you doing more
than others?" [Matthew 5:43-47]

Boris Zbarsky

Feb 15, 2012, 10:39:23 AM
to James Ide, sty...@gmail.com
On 2/15/12 5:14 AM, James Ide wrote:
> Boris,
>
> Thanks for your investigation and comments.
>
>> Can that part that's not needed be stored in a single global string,
>> since it's always the same, and inserted via innerHTML if the user
>> actually clicks on the widget?
>
> This is a good suggestion, although we'll likely need to do general client-side rendering since the markup isn't completely the same across all of the widgets. They each have different IDs, text content (e.g. # of likes), etc....

OK. It had sounded like the markup for these was actually identical...

Something I don't quite understand, in the light of morning: why is
there any cost to styling stuff in these widgets? Are the "hidden until
the user clicks" bits not in a display:none subtree?

> This is good to know. The images are wrapped in divs--these divs have fixed dimensions and have overflow hidden to crop the images, so it seems like the images shouldn't affect the layout of the page, nor the painting outside of the div.

Hmm. Yeah, in that case I'm not sure why I saw the flicker. But note
that even in the scenario you describe there would be a relayout pass
when the image data starts coming in (though the layout of the rest of
the page may not be affected, of course), which can possibly be avoided
if the image dimensions are given up front. Sadly, it looks like we
don't optimize away the relayout in that situation.

>> 15% running script off stylesheet onload events. The time here is
>> mostly spent getting scrollLeft and offsetHeight and setting innerHTML.
>
> This is part of our BigPipe system. Once the CSS associated with a subtree has been downloaded (we use feature detection for the load event--thanks!), we insert the HTML for that subtree and potentially start executing the JS associated with that subtree as well. For the timeline posts specifically, the JS inserts their HTML into an offscreen div, queries their offsetHeights to figure out which of the two columns to put them in (that's why they're in the main DOM tree and also why the offscreen div isn't display:none), and then moves them into the main center div, floating them left or right as previously decided.

Hmm. That's actually a bit of a problem in the sense that at least in
Gecko changes to the "float" property from "none" to "left" or "right"
trigger CSS box recreation. So you're ending up creating all the boxes,
laying them out, measuring them, then destroying them all and recreating
them and having to lay them all out again.

Of course changing from "position: absolute" to "position: static" also
has to recreate the CSS boxes.

I wonder what would happen if you did the offscreen thing by just
floating the boxes left to start with but setting a very large negative
margin-left. Then when you decide which side to float them on, either
change "float" to "right" or leave it as is and remove the large
negative margin. Worth trying, at least. That approach should cause a
lot less work for the browser in Gecko, at least.
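In CSS terms, the idea is roughly the following (class names hypothetical):

```css
/* Stories start floated left but pushed offscreen by a large negative
   margin, so measuring them never requires a later float:none -> float:left
   change, which would force CSS box reconstruction in Gecko. */
.story-offscreen {
  float: left;
  margin-left: -9999px;
}

/* Left column: just drop the negative margin; float stays "left". */
.story-left  { float: left; }

/* Right column: change the float side instead. */
.story-right { float: right; }
```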

>> How hard a requirement is the "insert some more globally applicable CSS" bit?
>
> Pretty hard. We could split the page into fewer pieces, therefore inserting CSS fewer times. It would be cool to have a stylesheet apply only to a given subtree but AFAIK that's only possible with iframes.

OK. At some point scoped stylesheet support will happen, but it's going
to be at least several months before it does. In the meantime, if
your stylesheets are really meant to be scoped, you may be able to
proactively add the "scoped" attribute to them so that when it's supported
things will hopefully just work.

> We've eliminated most, if not all, "*" key selectors. There are still a few generic pseudo-class selectors (e.g. ":hover") that are likely as bad

:hover per se is not quite as bad, since you only have to walk up the
tree if the element is in fact hovered.

> and a bunch that just match tag names as you suggested. Most of our application-specific selectors use class names, but we'll likely have a few site-wide ones that do match all span elements for instance.

OK. To be clear, matching for these sorts of selectors was on the order
of 10% of the overall CPU time. So if there's any way you can put
classes on the relevant spans and whatnot and match on those, that might
help a good bit on this page at least in Gecko.

This is the BigPipe system, again. We send down <script> tags one at a time with HTTP chunked encoding, and each one contains a markup subtree plus references to the CSS and JS it needs. The pure JS execution you observed is likely the requests for CSS/JS being made.

OK, I just looked at where time is actually spent here. 2/3 of the time
for this pure script execution is just parsing and bytecode-compiling
the script. The rest is almost entirely the interpreter and jit
compiler; very little time is actually spent running jitcode (presumably
because the code really doesn't run long enough to benefit from jit
compilation).

How big are these scripts, exactly? Is there any way to move as much
functionality as possible out of them to a shared script library that's
only loaded once?

> Our understanding of the performance issues after using various profiling tools was that parts of the rendering process are the bottlenecks. We can look into batching up the queries to offsetHeight and related functions. Currently we have little insight into why various reflows/repaints are slow, whether it's beneficial to remove <link rel=stylesheet> elements that have unused CSS, how various CSS rules affect performance (e.g. negative margins, floating; box-shadows were particularly bad) especially when they're on the same page together, how to "silo off" parts of the page so that layout is constrained to a smaller subtree, etc... Browser tools are definitely improving but a lot of the internals are still opaque.

Yes, and changing.

The answers to your questions likely differ in different browsers...
And they can be hard to answer even for someone familiar with the
browser internals. :( We definitely need better tools here, if we can
figure out how to create them.

To answer your specific questions as much as I can:

1) The slowest dynamic changes in Gecko will be ones that require CSS
box reconstruction. There's no hard guideline for what these are, but
generally anything that changes the shape of the box tree (switches
things from being in-flow to out-of-flow or changes which boxes are
containing blocks for other boxes) will trigger a frame reconstruct. So
changes to "display", changes to "position" (though at some point we may
be able to optimize changes between "static" and "relative" better),
changes of float from "none" to other values or back, that sort of
thing. These will lead to Gecko destroying the old CSS boxes, redoing
selector matching on the relevant subtree, creating new CSS boxes,
having to lay them out again, etc.

2) Removing stylesheets that have "unused" CSS is likely beneficial if
the CSS makes use of descendant combinators. CSS that doesn't use those
is much cheaper. Removing a stylesheet that's already in the DOM is
somewhat expensive compared to not putting it there in the first place.

3) The performance of various CSS properties _really_ depends on the
structure of the surrounding markup, at least in Gecko... and going
forward may depend on the state of graphics hardware acceleration on the
platform (say box-shadow).

4) Siloing off parts of the page is very hard in Gecko without using
iframes. In particular, layout happens recursively from something
called "reflow roots", which in practice means roots of subframes or
text controls. We're working on being able to make absolutely
positioned elements reflow roots. Unfortunately, in general in-flow or
floated elements can affect each other, so constraining layout to one of
them in the browser is hard. In general we aim to set dirty bits on the
things that really changed and then redo layout of as little as we
can to deal with the part that's dirty. But some things (e.g. involving
floats) are very hard to optimize this way while retaining correct
behavior...

-Boris

David Rajchenbach-Teller

Feb 15, 2012, 10:54:47 AM
to Boris Zbarsky, sty...@gmail.com, dev-pl...@lists.mozilla.org
I should also point out that we have ways of visualising which parts of
the screen are repainted. This is generally designed as an internal tool
to let us optimize repaint, but it can be also of use to you:

http://msujaws.wordpress.com/2012/02/01/layout-paint-flashing-in-firefox/

--
David Rajchenbach-Teller, PhD
Performance Team, Mozilla


Robert O'Callahan

Feb 15, 2012, 2:42:34 PM
to Boris Zbarsky, sty...@gmail.com, dev-pl...@lists.mozilla.org
On Thu, Feb 16, 2012 at 4:39 AM, Boris Zbarsky <bzba...@mit.edu> wrote:

> OK. At some point scoped stylesheet support will happen, but it's going
> to be at least several months before it does. In the meantime, if your
> stylesheets are really meant to be scoped, you may be able to proactively
> add the "scoped" attribute to them so that when it's supported things will
> hopefully just work.


That's risky with nothing to test against. We wouldn't want to have trouble
deploying <style scoped> because it breaks Facebook.

Boris Zbarsky

Feb 15, 2012, 2:46:22 PM
to sty...@gmail.com, dev-pl...@lists.mozilla.org
On 2/15/12 2:42 PM, Robert O'Callahan wrote:
> That's risky with nothing to test against. We wouldn't want to have trouble
> deploying <style scoped> because it breaks Facebook.

Yeah, that's fair. Don't do that. :)

-Boris


Steven Young

Feb 15, 2012, 6:49:17 PM
to dev-pl...@lists.mozilla.org, Boris Zbarsky
>> why is there any cost to styling stuff in these widgets? Are the
"hidden until the user clicks" bits not in a display:none subtree?

Only half the widget is initially hidden. It's possible all the cost is
coming from that half. At the bottom of this email, I've included the html
for an example like/comment widget.

>> in Gecko changes to the "float" property from "none" to "left" or
"right" trigger CSS box recreation.

That's definitely useful information. :oP

>> 2/3 of the time for this pure script execution is just parsing and
bytecode-compiling the script. The rest is almost entirely the
interpreter and jit compiler; very little time is actually spent running
jitcode

James - How do you think this breaks down into actual JS vs us hiding
the html we're inserting inside the script tag?

>> Removing stylesheets that have "unused" CSS is likely beneficial if
the CSS makes use of descendant combinators. CSS that doesn't use those
is much cheaper.

Our CSS packages across the site are automatically generated. I'm not
sure how much weight they are giving to "unused CSS hurts onpage perf"
in their objective function, and we generally do have a fair amount of
unused css, so this seems worth looking into. Perhaps we could do
something hacky like comment out certain parts of the css initially,
and uncomment them in JS if they turn out to be needed.



// opening tags
<div class="clearfix"><div class="fbTimelineUFI uiCommentContainer"><form
rel="async" class="live_10100150489439753_131325686911214 commentable_item
hidden_add_comment collapsed_comments" method="post"
action="/ajax/ufi/modify.php" data-live="{&quot;seq&quot;:0}"
onsubmit="return Event.__inlineSubmit(this,event)">

// hidden inputs
<input type="hidden" name="charset_test" value="€,´,€,´,水,Д,Є"><input
type="hidden" autocomplete="off" name="post_form_id"
value="71a8d8e9faf7fd3d2c7425b51de005f5"><input type="hidden"
name="fb_dtsg" value="AQAuWn70" autocomplete="off"><input type="hidden"
autocomplete="off" name="feedback_params"
value="{&quot;actor&quot;:&quot;4715519&quot;,&quot;target_fbid&quot;:&quot;10100150489439753&quot;,&quot;target_profile_id&quot;:&quot;212031&quot;,&quot;type_id&quot;:&quot;100&quot;,&quot;source&quot;:&quot;0&quot;,&quot;assoc_obj_id&quot;:&quot;&quot;,&quot;source_app_id&quot;:&quot;0&quot;,&quot;extra_story_params&quot;:[],&quot;content_timestamp&quot;:&quot;1329038900&quot;,&quot;check_hash&quot;:&quot;d47af07ba129b9e4&quot;}"><input
type="hidden" autocomplete="off" name="timeline_ufi" value="1"><input
type="hidden" name="timeline_log_data"
value="{&quot;eventtime&quot;:1329349307,&quot;viewerid&quot;:212031,&quot;profileownerid&quot;:212031,&quot;unitimpressionid&quot;:&quot;e7ea1f6c&quot;,&quot;contentid&quot;:-5453119702195953085,&quot;timeline_unit_type&quot;:&quot;WallPostUnit&quot;,&quot;timewindowsize&quot;:2,&quot;contextwindowstart&quot;:1328083200,&quot;contextwindowend&quot;:1330588799,&quot;likedbyviewer&quot;:0}">

// the html that's visible on the screen
<div class="fbTimelineFeedbackHeader"><div class="fbTimelineFeedbackActions
clearfix"><span class="fbTimelineFeedbackLikes"></span><span
class="UIActionLinks UIActionLinks_bottom"
data-ft="{&quot;type&quot;:&quot;20&quot;}"><button class="like_link
stat_elem as_link" title="Like this item" type="submit" name="like"
data-ft="{&quot;type&quot;:22}"><span
class="default_message">Like</span><span
class="saving_message">Unlike</span></button> · <label class="uiLinkButton
comment_link" title="Leave a comment"><input
data-ft="{&quot;type&quot;:24}" type="button" value="Comment"
onclick="return fc_click(this);"></label></span></div></div>

// only needed once you start interacting with the widget
<div><ul class="uiList uiUfi focus_target fbUfi"
data-ft="{&quot;type&quot;:30}"><li class="hidden_elem uiUfiLike uiListItem
uiListVerticalItemBorder" data-ft="{&quot;type&quot;:31}"></li><li
class="translateable_info hidden_elem uiListItem
uiListVerticalItemBorder"><input type="hidden" autocomplete="off"
name="translate_on_load" value=""></li>

// text of existing displayed comments (empty in this case)
<li class="uiUfiComments uiListItem uiListVerticalItemBorder hidden_elem"
data-ft="{&quot;type&quot;:32}"><ul class="commentList"></ul></li>

// The box for typing new comments. Only visible if you open the widget.
<li class="uiUfiAddComment clearfix uiUfiSmall ufiItem ufiItem uiListItem
uiListVerticalItemBorder uiUfiAddCommentCollapsed"><div
class="UIImageBlock clearfix mentionsAddComment"><img class="uiProfilePhoto
actorPic UIImageBlock_Image UIImageBlock_ICON_Image uiProfilePhotoMedium
img" src="
https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/369639_212031_457199628_q.jpg"
alt=""><div class="commentArea UIImageBlock_Content
UIImageBlock_ICON_Content"><div class="commentBox"><div
class="uiMentionsInput textBoxContainer" id="u95p3_82"><div
class="highlighter"><div><span
class="highlighterContent"></span></div></div><div class="uiTypeahead
mentionsTypeahead" id="u95p3_83"><div class="wrap"><input type="hidden"
autocomplete="off" class="hiddenInput"><div class="innerWrap"><textarea
class="enter_submit DOMControl_placeholder uiTextareaNoResize
uiTextareaAutogrow textBox mentionsTextarea textInput" title="Write a
comment..." placeholder="Write a comment..." name="add_comment_text"
onfocus="return wait_for_load(this, event, function() {if
(!this._has_control) { new TextAreaControl(this).setAutogrow(true);
this._has_control = true; } return wait_for_load(this, event, function()
{JSCC.get('j4f3c42bac8ff060741177086').init(JSCC.get('j4f3c42bac8ff060741177088'));;JSCC.get('j4f3c42bac8ff060741177088').init([&quot;buildBestAvailableNames&quot;,&quot;hoistFriends&quot;]);JSCC.get('j4f3c42bac8ff060741177085').init({&quot;max&quot;:10},
null, JSCC.get('j4f3c42bac8ff060741177086'));;;});});"
autocomplete="off">Write a comment...</textarea></div></div></div><input
type="hidden" autocomplete="off" class="mentionsHidden"></div></div><label
class="mts commentBtn stat_elem hidden_elem optimistic_submit uiButton
uiButtonConfirm" for="u95p3_84"><input value="Comment"
class="enter_submit_target" name="comment" type="submit"
id="u95p3_84"></label></div></div></li></ul></div></form></div></div>

Boris Zbarsky

Feb 15, 2012, 9:13:00 PM
to Steven Young
On 2/15/12 6:49 PM, Steven Young wrote:
> Only half the widget is initially hidden. It's possible all the cost is
> coming from that half. At the bottom of this email, I've included the html
> for an example like/comment widget.

Yeah, it's hard to tell from that what the situation is, because I don't
know what styles all those classes get from your stylesheets.

Basically, the more "hidden" stuff you can get under an ancestor with
display:none the better, because we optimize away doing any style stuff
for elements with a display:none ancestor. Note that we have to compute
style for that ancestor itself, just to find out it's display:none, so
if you have a bunch of kids that are all display:none and toggled
together, putting them under a single parent that you toggle instead is
worth it.
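[A sketch of the restructuring Boris suggests, reusing the hidden_elem class from the widget markup quoted earlier; the grouping itself is hypothetical, not Facebook's actual markup:]

```html
<!-- Before: each hidden sibling needs its own style computation -->
<li class="hidden_elem uiUfiLike uiListItem">…</li>
<li class="hidden_elem translateable_info uiListItem">…</li>

<!-- After: one hidden ancestor; style work for the whole subtree
     below it is skipped, and only the wrapper is toggled -->
<ul class="hidden_elem">
  <li class="uiUfiLike uiListItem">…</li>
  <li class="translateable_info uiListItem">…</li>
</ul>
```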


>>> 2/3 of the time for this pure script execution is just parsing and
> bytecode-compiling the script. The rest is almost entirely the
> interpreter and jit compiler; very little time is actually spent running
> jitcode
>
> James - How do you think this breaks down into actual JS vs us hiding
> the html we're inserting inside the script tag?

James wasn't cced on the mail, fwiw. Not sure whether he's reading the
newsgroup or mailing list....

> Our CSS packages across the site are automatically generated. I'm not
> sure how much weight they are giving to "unused CSS hurts onpage perf"
> in their objective function, and we generally do have a fair amount of
> unused css, so this seems worth looking into. Perhaps we could do
> something hacky like comment out certain parts of the css initially,
> and uncomment them in JS if they turn out to be needed.

You mean uncomment before creating the <style> elements with the text?
This seems... fragile.

-Boris

James Ide

Feb 19, 2012, 9:44:39 PM
to sty...@gmail.com
On Wednesday, February 15, 2012 7:39:23 AM UTC-8, Boris Zbarsky wrote:
> I wonder what would happen if you did the offscreen thing by just
> floating the boxes left to start with but setting a very large negative
> margin-left. Then when you decide which side to float them on, either
> change "float" to "right" or leave it as is and remove the large
> negative margin.

We can experiment with this. One point of concern is that we'd adjust the scroll position twice: once after the boxes are inserted, and once again after some boxes are floated right. This is because a user might scroll down the page a bit to load the 2010 section of their timeline, then jump to the 2007 section; as content from 2010 is inserted we want to keep the viewport on 2007. We already do this but it only happens once, when the boxes are inserted with the correct left/right designation.
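[The float-plus-offscreen-margin idea Boris proposes might be sketched like this; the class names and margin value are illustrative, not from the actual stylesheets:]

```css
/* Initial render: every box floats left but sits far offscreen */
.timelineBox {
  float: left;
  margin-left: -9999px;
}

/* Once the left/right decision is made, either keep the left float… */
.timelineBox.placeLeft  { margin-left: 0; }

/* …or flip the float to the right and drop the offscreen margin */
.timelineBox.placeRight { float: right; margin-left: 0; }
```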

> 2/3 of the time
> for this pure script execution is just parsing and bytecode-compiling
> the script. The rest is almost entirely the interpreter and jit
> compiler; very little time is actually spent running jitcode

Each script tag contains mostly data (an HTML subtree and references to relevant JS/CSS files) with a tiny bit of code to schedule the CSS/HTML/JS insertion. Most of the script is a long JSON-encoded string of HTML (20-90 KB unzipped) with a few escaped characters. We have an alternative approach in which the markup is sent in an HTML comment node to reduce the payload size and move it out of the script tag, but it tends to have little effect for clients with good network conditions.

- James

Boris Zbarsky

Feb 20, 2012, 1:11:26 AM
to James Ide, sty...@gmail.com
On 2/19/12 9:44 PM, James Ide wrote:
> We can experiment with this. One point of concern is that we'd adjust the scroll position twice: once after the boxes are inserted, and once again after some boxes are floated right. This is because a user might scroll down the page a bit to load the 2010 section of their timeline, then jump to the 2007 section; as content from 2010 is inserted we want to keep the viewport on 2007. We already do this but it only happens once, when the boxes are inserted with the correct left/right designation.

Let me know how this goes. Of the things I see so far, this looks most
likely to be a significant win. Possibly more than getting rid of the
descendant selectors in stylesheets, even.

> Each script tag contains mostly data (an HTML subtree and references to relevant JS/CSS files) with a tiny bit of code to schedule the CSS/HTML/JS insertion. Most of the script is a long JSON-encoded string of HTML (20-90 KB unzipped) with a few escaped characters. We have an alternative approach in which the markup is sent in an HTML comment node to reduce the payload size and move it out of the script tag, but it tends to have little effect for clients with good network conditions.


Is this JSON string then passed to eval() or something? Or is it being
dumped into the script directly?

If this is really JSON and you can use JSON.parse to parse it (which is
optimized for the specific case of JSON in all sorts of ways that don't
apply to the parser that has to parse general JS code), that may help too.

-Boris

James Ide

Feb 20, 2012, 2:41:12 AM
to sty...@gmail.com
On Sunday, February 19, 2012 10:11:26 PM UTC-8, Boris Zbarsky wrote:
> Let me know how this goes. Of the things I see so far, this looks most
> likely to be a significant win. Possibly more than getting rid of the
> descendant selectors in stylesheets, even.

Great, we'll prioritize investigating this.

> Is this JSON string then passed to eval() or something? Or is it being
> dumped into the script directly?

It's a JSON-encoded object that is directly dumped into the script. The
setup resembles:
<script>bigpipe.schedule({"content": "<p>markup</p>", ...});</script>

Our alternative transport looks like:
<code id=x style="display:none"><!-- <p>markup</p> --></code>
<script>bigpipe.schedule({"content_id": "x"});</script>

- James

Boris Zbarsky

Feb 20, 2012, 2:52:37 AM
On 2/20/12 2:41 AM, James Ide wrote:
> It's a JSON-encoded object that is directly dumped into the script. The
> setup resembles:
> <script>bigpipe.schedule({"content": "<p>markup</p>", ...});</script>

Yeah, I'd definitely suggest at least trying sending a string plus
JSON.parse....

-Boris
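[A minimal sketch of the string-plus-JSON.parse transport being suggested here; the payload shape follows James's bigpipe example, but the id field and values are invented:]

```javascript
// Before: a large object literal that the full JS parser must compile:
//   bigpipe.schedule({"content": "<p>markup</p>", ...});

// After: ship the payload as a single string literal and hand it to
// JSON.parse, which is far cheaper than general script parsing.
// Note "</p>" must be escaped as "<\/p>" so it can't close the script tag.
var payload = JSON.parse('{"content": "<p>markup<\\/p>", "id": 42}');
// bigpipe.schedule(payload);  // same call site as before
console.log(payload.content);
```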

Henri Sivonen

Feb 20, 2012, 5:01:06 AM
to dev-pl...@lists.mozilla.org
Firefox 11 supports the 'json' responseType on XMLHttpRequest.
(Firefox 9 and 10 support "moz-json" instead.) So when the page wants
to fetch JSON data, using XHR with the 'json' responseType is the most
proper way to do it.

If the point is to let the server push JSON to the page instead of the
page requesting it, a permanently open Web Socket (also arriving in
Firefox 11) plus JSON.parse is probably the most proper way to do it.
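[A browser-only sketch of the XHR variant Henri describes; the endpoint URL is hypothetical:]

```javascript
var xhr = new XMLHttpRequest();
xhr.open("GET", "/ajax/timeline/section", true);
xhr.responseType = "json";        // "moz-json" on Firefox 9/10
xhr.onload = function () {
  var payload = xhr.response;     // already a parsed JS object
  // bigpipe.schedule(payload);   // hand off as before
};
xhr.send();
```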

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

i...@jameside.com

Apr 26, 2012, 7:42:31 PM
to dev-pl...@lists.mozilla.org
In the past two months we've changed a few things about how content is loaded on timeline. Most of the DOM operations are now batched together so there's much less thrash between JS, style recalculation, and reflow. Most of our optimizations addressed Boris's analysis of what was going on.

On Tuesday, February 14, 2012 7:59:15 PM UTC-8, Boris Zbarsky wrote:
...
> I don't know how feasible it is to put the actual image dimensions in
> the markup or styles, but it might help a bit.
>
> Past that, I see us using about 18% of our CPU time in painting (images,
> backgrounds, borders, text in that order).

All of our images should now have width and height attributes specified to prevent extra reflows, but there's still some lag when loading image-heavy sections. Some of the images are resized or cropped (with overflow: hidden) in the browser. Is there a good way to understand the performance implications of these operations? We'd like to know their cost when the image is initially displayed, as well as the effect they have when the user is just scrolling the page. If it matters, the photos are progressive JPEGs.
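[For reference, a sketch of the two cases described above — dimensions declared up front versus a client-side crop; file names and pixel values are illustrative:]

```html
<!-- Declared dimensions let layout reserve space before the image
     arrives, so no reflow happens when it finishes loading -->
<img src="photo.jpg" width="180" height="120" alt="">

<!-- Client-side crop: a sized container clips an image that is
     larger than the visible box -->
<div style="width: 180px; height: 120px; overflow: hidden;">
  <img src="photo.jpg" width="180" height="270" alt="">
</div>
```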

A while back we saw a patch that improved incremental repainting on timeline - specifically, it didn't repaint the whole center column when mousing over a post. Has that patch landed, or is there an ETA? (Nightly 15 doesn't seem to exhibit this behavior.)

> 15% running script off stylesheet onload events. The time here is
> mostly spent getting scrollLeft and offsetHeight and setting innerHTML.

The work we do here is now more organized across the page. For each content payload added to the page, there's one chunk of HTML parsing, and usually just one style recalculation and layout chunk each (sometimes there are two of each out of necessity since some style manipulations are dependent on previously computed dimensions). The duration of these operations is still longer than we'd like, but it should be much easier to understand profiling results.

> The vast majority of that is spent on
> SelectorMatchesTree calls, which suggests that some slow selectors (see
> https://developer.mozilla.org/en/Writing_Efficient_CSS ) are in use.

Our selector library had some inefficiencies in it since it sometimes ran tens of thousands of times during the lifetime of the page. We've pruned that to less than half with querySelectorAll, but some benchmarks have shown that qSA has its share of inefficiency. For example, querySelectorAll('.foo.bar').length on timeline posts was 30x slower than getElementsByClassName('foo bar').length (the .length access presumably resolves the lazy live NodeList from getElementsByClassName), so we're hoping to see a 5x+ improvement by moving away from qSA.
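[The comparison described above, as a browser-side sketch; '.foo' and '.bar' are placeholder class names, and the 30x figure is the measurement quoted in this thread:]

```javascript
// Selector-engine path: parses and matches the compound selector,
// returning a static NodeList
var viaQSA = document.querySelectorAll('.foo.bar').length;

// Class-lookup path: returns a lazy live collection; reading .length
// forces it to resolve. 'foo bar' requires both classes on an element.
var viaGEBCN = document.getElementsByClassName('foo bar').length;

// Both counts should match; only the cost differs.
```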

There is a lot of dead CSS on the page but most of those selectors are class selectors, which I'd expect not to be too slow.

> 1% setting up new documents, presumably for various subframes... A bunch
> of this is the setup for the Window object.

We decided to lazy-load the embedded Bing map iframes (on click, I believe). On Windows, xperf showed that bursts of CPU activity in IE were correlated with map initialization, which comprised iframe creation, running a lot of Facebook and Bing JS, and displaying the map tiles, which are images resized on the client AFAIK.

Aside from improving our selector library with getElementsByClassName and resizing/cropping images on the server, we think we've fixed the main sources of slowness. Assuming that the remaining jank is due to necessary work, I think we'll rely on the browser for performance improvements. If the reality is that scroll-loading content takes ~100 ms and those tiny hiccups remain noticeable during scrolling, then asynchronous scrolling and compositing stand out to me, since some mobile devices feel smoother than desktop browsers. Are there plans to build https://wiki.mozilla.org/Platform/GFX/OffMainThreadCompositing into desktop Firefox, and for scrolling in general?

- James