Main thread attribution for top million sites

Tony Gentilcore

unread,

Jul 16, 2013, 4:33:45 PM7/16/13

to blink-dev

Ravi Mistry and I ran Telemetry's loading_measurement over Alexa's top
million web sites. It aggregates the self time of each DevTools
Timeline event. I thought the breakdown of where the main thread
spends its time during page load would be interesting to the group.

EvaluateScript: 148574s 25.5%
Layout: 110849s 19.0%
FunctionCall: 75777s 13.0%
Program (Uncategorized): 59879s 10.3%
Paint: 43693s 7.5%
ParseHTML: 42455s 7.3%
RecalculateStyles: 34246s 5.9%
ResourceReceivedData: 31424s 5.4%
GCEvent: 14465s 2.5%
DecodeImage: 12360s 2.1%
ResizeImage: 2689s 0.5%
CompositeLayers: 2437s 0.4%
TimerFire: 1362s 0.2%
EventDispatch: 1325s 0.2%
PaintSetup: 655s 0.1%
ResourceReceiveResponse: 403s 0.1%
XHRReadyStateChange: 254s 0.0%
ScrollLayer: 51s 0.0%
FireAnimationFrame: 26s 0.0%
XHRLoad: 1s 0.0%

This doc lists the top 100 slowest sites for each category. It might
serve as an interesting bug list for optimizing each subsystem.
https://docs.google.com/a/chromium.org/document/d/1hDDUUNE5OUV8eCjtOj7Ow6EZ2DSBCTjQirnA3Rp5pOg/edit#

For example, check out the epic layout time on http://oilevent.com

Please let me know if there are other ways you'd like to see this data sliced.

-Tony

Ojan Vafai

unread,

Jul 16, 2013, 4:50:19 PM7/16/13

to Tony Gentilcore, blink-dev

Very interesting data. What's the difference between EvaluateScript and FunctionCall? Also, if a script forces a layout, does that get counted as Layout, EvaluateScript, FunctionCall or some combination thereof?

Yoav Weiss

unread,

Jul 16, 2013, 4:53:46 PM7/16/13

to Tony Gentilcore, blink-dev

Extremely interesting data!

A couple of questions:

* Where would CSS parsing time fall? Is it under "Layout"?

* What was the viewport used for these tests? I have a hunch the "ImageResize" part varies according to the viewport size, and might be larger for small viewports.

Tony Gentilcore

unread,

Jul 16, 2013, 5:03:07 PM7/16/13

to Yoav Weiss, blink-dev, Pavel Feldman

> What's the difference between EvaluateScript and FunctionCall?

According to https://developers.google.com/chrome-developer-tools/docs/timeline#timeline_event_reference

Evaluate Script: A script was evaluated.
Function Call: A top-level JavaScirpt function call was made (only
appears when browser enters JavaScript engine).

That doesn't really enlighten me much. Maybe Pavel or someone from
DevTools could comment?

> if a script forces a layout, does that get counted as Layout, EvaluateScript, FunctionCall or some combination thereof?

We're calculating the self time of each event. So if a script forces
layout, it gets counted as Layout and subtracted out from the parent
EvaluateScript or FunctionCall.

> * Where would CSS parsing time fall? Is it under "Layout"?

I don't know offhand. Anyone else know?

> * What was the viewport used for these tests?

1280x1024 (in xvfb)

Ojan Vafai

unread,

Jul 16, 2013, 5:14:35 PM7/16/13

to Tony Gentilcore, Yoav Weiss, blink-dev, Pavel Feldman

On Tue, Jul 16, 2013 at 2:03 PM, Tony Gentilcore <to...@chromium.org> wrote:

> * Where would CSS parsing time fall? Is it under "Layout"?

I don't know offhand. Anyone else know?

It should fall under RecalculateStyles.

Ojan Vafai

unread,

Jul 16, 2013, 5:15:36 PM7/16/13

to Tony Gentilcore, Yoav Weiss, blink-dev, Pavel Feldman

Whoops. No, that's not right. Selector matching should fall under RecalculateStyles.

Levi Weintraub

unread,

Jul 16, 2013, 6:30:47 PM7/16/13

to Tony Gentilcore, blink-dev

For those who are curius, Ojan and I spent a moment looking at a profile on oilevent.com. We spend nearly all our time (~75%) allocating and freeing entries in the placed floats tree (the page has thousands of floats).

Interestingly, we do much better on Mac than Linux.

William Chan (陈智昌)

unread,

Jul 16, 2013, 7:30:43 PM7/16/13

to Tony Gentilcore, blink-dev

I'm interested in a few things here:

* Desktop vs mobile breakdowns.

* Doing percentage breakdowns differently. I'm worried here that the long tail edge cases are dominating the sampling. Am I misunderstanding how you're presenting these category percentages? I guess the way I'd want to see it is giving the same percentages on a per-site basis, and then averaging those percentages across all sites with each site having the same weight. Right now, IIUC, it's possible to have a single site that spends infinite time on a single paint event, and that would cause these percentages to be 0% for all events and 100% for paint, even if that only happens for a single website. If my understanding is wrong, then please explain how the sampling is actually working :)

* Identifying how important CPU time is in page load, versus how much time is spent waiting on network. Do you have this data? I suspect it's different between desktop and mobile.

* Main thread responsiveness. I'd like to know how long critical operations are spent waiting in the event loop queue. Are we delaying event handlers due to too much main thread activity? Are we not able to paint fast enough? I'm willing to tolerate lower responsiveness in page load, but when a page is mostly loaded, I'd expect it to be jank-free. My straw main proposal here would be, after the load event fires, start measuring the CPU time in the same way you've done here and scroll through the whole page down to the end of the document, and measure CPU time. Track percentages on a per-site and aggregate, and also time distributions of individual operations, and time distributions of operations that exceed 16ms and thus prevent us from painting frames fast enough.

On Tue, Jul 16, 2013 at 1:33 PM, Tony Gentilcore <to...@chromium.org> wrote:

Eric Seidel

unread,

Jul 16, 2013, 7:39:09 PM7/16/13

to Tony Gentilcore, blink-dev

This data is amazing. Thank you so much for gathering this.

On Tue, Jul 16, 2013 at 1:33 PM, Tony Gentilcore <to...@chromium.org> wrote:

Tony Gentilcore

unread,

Jul 16, 2013, 7:52:59 PM7/16/13

to William Chan (陈智昌), blink-dev

Great ideas. Inline...

On Tue, Jul 16, 2013 at 4:30 PM, William Chan (陈智昌)
<will...@chromium.org> wrote:
> I'm interested in a few things here:
> * Desktop vs mobile breakdowns.

This experiment involved a little over 160 hours of page load time
(not counting overhead) split across 100 shards. As a rule of thumb,
mobile is about 10x slower than desktop, so that means you'd be
looking at a little over 2 months of phone CPU time to run the same
experiment.

Maybe we should just optimize these slow cases first and then run it
on the phones after it is fast ;-)

> * Doing percentage breakdowns differently. I'm worried here that the long
> tail edge cases are dominating the sampling. Am I misunderstanding how
> you're presenting these category percentages? I guess the way I'd want to
> see it is giving the same percentages on a per-site basis, and then
> averaging those percentages across all sites with each site having the same
> weight. Right now, IIUC, it's possible to have a single site that spends
> infinite time on a single paint event, and that would cause these
> percentages to be 0% for all events and 100% for paint, even if that only
> happens for a single website. If my understanding is wrong, then please
> explain how the sampling is actually working :)

I was torn on which way to aggregate this and went with summing the
times and then averaging because I thought that slower pages should be
weighted higher than faster ones. I'll redo the analysis with
averaging and then summing and post the results.

> * Identifying how important CPU time is in page load, versus how much time
> is spent waiting on network. Do you have this data? I suspect it's different
> between desktop and mobile.

We could run this experiment pretty easily. Telemetry supports network
simulation (via Web Page Replay). So we could dial in "typical" mobile
network configuration and gather page loads times under simulated
network. Then we call network time the delta between the simulated
network PLT and the instant network PLT. Even though it would account
for which operations are parallel, it still would be interesting to
weigh that percentage against the other subsystems.

> * Main thread responsiveness. I'd like to know how long critical operations
> are spent waiting in the event loop queue. Are we delaying event handlers
> due to too much main thread activity? Are we not able to paint fast enough?
> I'm willing to tolerate lower responsiveness in page load, but when a page
> is mostly loaded, I'd expect it to be jank-free. My straw main proposal here
> would be, after the load event fires, start measuring the CPU time in the
> same way you've done here and scroll through the whole page down to the end
> of the document, and measure CPU time. Track percentages on a per-site and
> aggregate, and also time distributions of individual operations, and time
> distributions of operations that exceed 16ms and thus prevent us from
> painting frames fast enough.

Telemetry's smoothness_measurement answers the question of jankiness
during scrolling. We can point that at the top 1M sites and get those
stats.

-Tony

William Chan (陈智昌)

unread,

Jul 16, 2013, 9:00:26 PM7/16/13

to Tony Gentilcore, blink-dev

On Tue, Jul 16, 2013 at 4:52 PM, Tony Gentilcore <to...@chromium.org> wrote:

Great ideas. Inline...

On Tue, Jul 16, 2013 at 4:30 PM, William Chan (陈智昌)
<will...@chromium.org> wrote:
> I'm interested in a few things here:
> * Desktop vs mobile breakdowns.

This experiment involved a little over 160 hours of page load time
(not counting overhead) split across 100 shards. As a rule of thumb,
mobile is about 10x slower than desktop, so that means you'd be
looking at a little over 2 months of phone CPU time to run the same
experiment.

Maybe we should just optimize these slow cases first and then run it
on the phones after it is fast ;-)

> * Doing percentage breakdowns differently. I'm worried here that the long
> tail edge cases are dominating the sampling. Am I misunderstanding how
> you're presenting these category percentages? I guess the way I'd want to
> see it is giving the same percentages on a per-site basis, and then
> averaging those percentages across all sites with each site having the same
> weight. Right now, IIUC, it's possible to have a single site that spends
> infinite time on a single paint event, and that would cause these
> percentages to be 0% for all events and 100% for paint, even if that only
> happens for a single website. If my understanding is wrong, then please
> explain how the sampling is actually working :)

I was torn on which way to aggregate this and went with summing the
times and then averaging because I thought that slower pages should be
weighted higher than faster ones. I'll redo the analysis with
averaging and then summing and post the results.

If anything, weight by rank on Alexa. Some of these websites are ridiculously slow and I'm worried their stupidity is drowning out other samples.

> * Identifying how important CPU time is in page load, versus how much time
> is spent waiting on network. Do you have this data? I suspect it's different
> between desktop and mobile.

We could run this experiment pretty easily. Telemetry supports network
simulation (via Web Page Replay). So we could dial in "typical" mobile
network configuration and gather page loads times under simulated
network. Then we call network time the delta between the simulated
network PLT and the instant network PLT. Even though it would account
for which operations are parallel, it still would be interesting to
weigh that percentage against the other subsystems.

Love it.

> * Main thread responsiveness. I'd like to know how long critical operations
> are spent waiting in the event loop queue. Are we delaying event handlers
> due to too much main thread activity? Are we not able to paint fast enough?
> I'm willing to tolerate lower responsiveness in page load, but when a page
> is mostly loaded, I'd expect it to be jank-free. My straw main proposal here
> would be, after the load event fires, start measuring the CPU time in the
> same way you've done here and scroll through the whole page down to the end
> of the document, and measure CPU time. Track percentages on a per-site and
> aggregate, and also time distributions of individual operations, and time
> distributions of operations that exceed 16ms and thus prevent us from
> painting frames fast enough.

Telemetry's smoothness_measurement answers the question of jankiness
during scrolling. We can point that at the top 1M sites and get those
stats.

Pardon the ignorance, but what does smoothness_measurement (smoothness_metrics.py perhaps?) measure? I suspect it might answer the rendering questions I raised, but how about stuff like event handlers and what not? Also, are we delaying things like resource requests since we aren't issuing them from the parser thread? It'd be cool to track that delay. And maybe fix it by issuing resource requests directly from the parser thread.

-Tony

template-biggerboat.jpg

Tony Gentilcore

unread,

Jul 16, 2013, 10:11:43 PM7/16/13

to William Chan (陈智昌), blink-dev

> * Doing percentage breakdowns differently. I'm worried here that the long
> tail edge cases are dominating the sampling. Am I misunderstanding how
> you're presenting these category percentages? I guess the way I'd want to
> see it is giving the same percentages on a per-site basis, and then
> averaging those percentages across all sites with each site having the same
> weight. Right now, IIUC, it's possible to have a single site that spends
> infinite time on a single paint event, and that would cause these
> percentages to be 0% for all events and 100% for paint, even if that only
> happens for a single website. If my understanding is wrong, then please
> explain how the sampling is actually working :)

I was torn on which way to aggregate this and went with summing the
times and then averaging because I thought that slower pages should be
weighted higher than faster ones. I'll redo the analysis with
averaging and then summing and post the results.

If anything, weight by rank on Alexa. Some of these websites are ridiculously slow and I'm worried their stupidity is drowning out other samples.

Here's how it looks when you average the percent spent in each category instead of averaging the time spent in each category:

EvaluateScript: 21.2%

Layout: 19.3%

Program: 12.3%

Paint: 11.3%

FunctionCall: 9.3%

ResourceReceivedData: 9.2%

ParseHTML: 7.2%

RecalculateStyles: 4.7%

DecodeImage: 2.5%

GCEvent: 1.8%

ResizeImage: 0.5%

CompositeLayers: 0.2%

TimerFire: 0.2%

EventDispatch: 0.2%

ResourceReceiveResponse: 0.1%

PaintSetup: 0.1%

XHRReadyStateChange: 0.0%

ScrollLayer: 0.0%

FireAnimationFrame: 0.0%

XHRLoad: 0.0%

Some minor shifting around, but luckily I don't think it would lead to any different conclusions.

> * Main thread responsiveness. I'd like to know how long critical operations
> are spent waiting in the event loop queue. Are we delaying event handlers
> due to too much main thread activity? Are we not able to paint fast enough?
> I'm willing to tolerate lower responsiveness in page load, but when a page
> is mostly loaded, I'd expect it to be jank-free. My straw main proposal here
> would be, after the load event fires, start measuring the CPU time in the
> same way you've done here and scroll through the whole page down to the end
> of the document, and measure CPU time. Track percentages on a per-site and
> aggregate, and also time distributions of individual operations, and time
> distributions of operations that exceed 16ms and thus prevent us from
> painting frames fast enough.

Telemetry's smoothness_measurement answers the question of jankiness
during scrolling. We can point that at the top 1M sites and get those
stats.

Pardon the ignorance, but what does smoothness_measurement (smoothness_metrics.py perhaps?) measure?

http://www.chromium.org/developers/design-documents/rendering-benchmarks

https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/measurements/smoothness.py

I suspect it might answer the rendering questions I raised, but how about stuff like event handlers and what not? Also, are we delaying things like resource requests since we aren't issuing them from the parser thread? It'd be cool to track that delay. And maybe fix it by issuing resource requests directly from the parser thread.

It would not answer those questions. There's clearly more we could measure :)

johnj...@chromium.org

unread,

Jul 16, 2013, 11:42:34 PM7/16/13

to blin...@chromium.org, Yoav Weiss, Pavel Feldman

On Tuesday, July 16, 2013 2:03:07 PM UTC-7, Tony Gentilcore wrote:

> What's the difference between EvaluateScript and FunctionCall?

According to https://developers.google.com/chrome-developer-tools/docs/timeline#timeline_event_reference

Evaluate Script: A script was evaluated.

This maps to InspectorInstrumentation::willEvaluateScript

which is called around the JS compile and run of the outer JS function (often, but not always, initialization code). Thus it is a mix of JS compile and some function call time. Includes extension content-scripts and NPObject (whatever that is) and chrome.* api initialization. It does not seem to include browser-generated event-handler scripts, but they would be insignificant for this purpose. It does not seem to count eval()/new Function().

Since this category is large, it may be worthwhile separating the compile and run times.

Function Call: A top-level JavaScirpt function call was made (only
appears when browser enters JavaScript engine).

This one looks like it is all kinds of callbacks and event handlers in JS. So, eg load event handlers in these tests. Also any XHR handlers triggered on load and setTimeout/setInterval. There are more cases here than in EvaluateScript and probably the more interesting question is whether the % is skewed by some of the sites (pathology) or is representative of most of them (average JS runtime).

hth,

jjb

Adam Klein

unread,

Jul 17, 2013, 1:23:21 AM7/17/13

to johnj...@chromium.org, blink-dev, Yoav Weiss, Pavel Feldman

On Tue, Jul 16, 2013 at 8:42 PM, <johnj...@chromium.org> wrote:

On Tuesday, July 16, 2013 2:03:07 PM UTC-7, Tony Gentilcore wrote:
> What's the difference between EvaluateScript and FunctionCall?

According to https://developers.google.com/chrome-developer-tools/docs/timeline#timeline_event_reference

Evaluate Script: A script was evaluated.

This maps to InspectorInstrumentation::willEvaluateScript

which is called around the JS compile and run of the outer JS function (often, but not always, initialization code). Thus it is a mix of JS compile and some function call time. Includes extension content-scripts and NPObject (whatever that is) and chrome.* api initialization. It does not seem to include browser-generated event-handler scripts, but they would be insignificant for this purpose. It does not seem to count eval()/new Function().

Since this category is large, it may be worthwhile separating the compile and run times.

Based on what I've seen looking at about:tracing results, the compile times are likely to be minuscule compared to the run times. But I agree that it would be nice to have compile times available, as they are in traces.

William Chan (陈智昌)

unread,

Jul 17, 2013, 3:01:46 PM7/17/13

to Tony Gentilcore, blink-dev

What's all this ResourceReceivedData time?

Tony Gentilcore

unread,

Jul 17, 2013, 6:55:10 PM7/17/13

to William Chan (陈智昌), blink-dev

After making the 790,318th most popular site 4 times faster
(http://crbug.com/261308), eseidel pointed out that likely no one ever
goes there.

So if you are interested in smaller gains on more popular sites,
here's a list of the 10 slowest in each category restricted to the top
1,000 sites:
https://docs.google.com/a/chromium.org/document/d/1ca_Q7xePmCRqaYnHe7vkpCmKNFNLdDXvzgtUPt9iG8w/edit#

It was surprising to me that PLT is 15% slower in the top 1,000 than
the top million (I would have guessed the other way around). Also
interesting is that in the top 1,000 Layout is the #1 category at
24.7%.

Alec Flett

unread,

Jul 17, 2013, 7:48:41 PM7/17/13

to Tony Gentilcore, William Chan (陈智昌), blink-dev

This is really awesome.

So one thing that's got me thinking is that deep-ish links are actually really important, and in many popular cases probably more important than the homepage. Think of an article page on wikipedia, a single-video page on youtube, a team page on mlb.com, a facebook news feed, or a single news story on nytimes.com - all probably more representitive of what people actually browser than these sites' respective toplevel homepages.

Alec

William Chan (陈智昌)

unread,

Jul 17, 2013, 9:38:21 PM7/17/13

to Tony Gentilcore, blink-dev

On Wed, Jul 17, 2013 at 3:55 PM, Tony Gentilcore <to...@chromium.org> wrote:

After making the 790,318th most popular site 4 times faster
(http://crbug.com/261308), eseidel pointed out that likely no one ever
goes there.

I like investigating anomalies: https://www.youtube.com/watch?v=-3dw09N5_Aw. I think it helps with understanding complex systems, and even if these sites aren't the most popular, there can be a lot to learn from them.

So if you are interested in smaller gains on more popular sites,
here's a list of the 10 slowest in each category restricted to the top
1,000 sites:
https://docs.google.com/a/chromium.org/document/d/1ca_Q7xePmCRqaYnHe7vkpCmKNFNLdDXvzgtUPt9iG8w/edit#

It was surprising to me that PLT is 15% slower in the top 1,000 than
the top million (I would have guessed the other way around). Also
interesting is that in the top 1,000 Layout is the #1 category at
24.7%.

It'd be useful to understand the "page weight" of the top 1000 vs 1M. I definitely expect the less popular sites to be less optimized, but a lot of the optimizations that developers have traditionally focused on are resource loading / networking related, and not necessarily stuff that's more computational. I agree with lots of what Paul has to say here: http://aerotwist.com/blog/reflections-on-performance-at-google-io/. In other words, while I expect the top 1000 sites to have better PLTs, I don't think (and your data seems to suggest this) that the CPU time during page load would necessarily be better for the top 1000 sites.

Adam Barth

unread,

Jul 17, 2013, 10:08:15 PM7/17/13

to William Chan (陈智昌), Tony Gentilcore, blink-dev

On Wed, Jul 17, 2013 at 6:38 PM, William Chan (陈智昌) <will...@chromium.org> wrote:

On Wed, Jul 17, 2013 at 3:55 PM, Tony Gentilcore <to...@chromium.org> wrote:

After making the 790,318th most popular site 4 times faster
(http://crbug.com/261308), eseidel pointed out that likely no one ever
goes there.

I like investigating anomalies: https://www.youtube.com/watch?v=-3dw09N5_Aw. I think it helps with understanding complex systems, and even if these sites aren't the most popular, there can be a lot to learn from them.

Great talk. Thanks for the link.

Adam

Annie Sullivan

unread,

Jul 18, 2013, 10:58:58 AM7/18/13

to Alec Flett, Tony Gentilcore, William Chan (陈智昌), blink-dev

On Wed, Jul 17, 2013 at 7:48 PM, Alec Flett <alec...@chromium.org> wrote:

This is really awesome.

So one thing that's got me thinking is that deep-ish links are actually really important, and in many popular cases probably more important than the homepage. Think of an article page on wikipedia, a single-video page on youtube, a team page on mlb.com, a facebook news feed, or a single news story on nytimes.com - all probably more representitive of what people actually browser than these sites' respective toplevel homepages.

+1

I just downloaded the top million sites from http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Like Alec says, wikipedia.org is #7, but there are no wikipedia articles listed. Same thing for mlb.com, facebook, nytimes.com. On youtube, there are 7000 user homepages listed, but no single-video pages. We're probably missing out on a lot of great outliers by excluding articles, popular feeds, etc. I'm not sure the best way to find deep links. I checked the alexa siteinfo pages, and they list the most popular subdomains but not any deep links. I couldn't find a publicly available list of top urls or top deep links per site. I thought maybe we could just try loading a random same-domain link from each of the top 1000 sites, but I worry we'd end up with more top-level links since there are so many menus for most sites. Especially on facebook and twitter, the main url is a login page that just links to help pages, not public feeds. Anyone have ideas how we could include more deep links?

Ravi Mistry

unread,

Jul 18, 2013, 11:09:01 AM7/18/13

to Annie Sullivan, Alec Flett, Tony Gentilcore, William Chan (陈智昌), blink-dev

On Thu, Jul 18, 2013 at 10:58 AM, Annie Sullivan <sull...@google.com> wrote:

On Wed, Jul 17, 2013 at 7:48 PM, Alec Flett <alec...@chromium.org> wrote:

This is really awesome.

So one thing that's got me thinking is that deep-ish links are actually really important, and in many popular cases probably more important than the homepage. Think of an article page on wikipedia, a single-video page on youtube, a team page on mlb.com, a facebook news feed, or a single news story on nytimes.com - all probably more representitive of what people actually browser than these sites' respective toplevel homepages.

+1

I just downloaded the top million sites from http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Like Alec says, wikipedia.org is #7, but there are no wikipedia articles listed. Same thing for mlb.com, facebook, nytimes.com. On youtube, there are 7000 user homepages listed, but no single-video pages. We're probably missing out on a lot of great outliers by excluding articles, popular feeds, etc. I'm not sure the best way to find deep links. I checked the alexa siteinfo pages, and they list the most popular subdomains but not any deep links. I couldn't find a publicly available list of top urls or top deep links per site. I thought maybe we could just try loading a random same-domain link from each of the top 1000 sites, but I worry we'd end up with more top-level links since there are so many menus for most sites. Especially on facebook and twitter, the main url is a login page that just links to help pages, not public feeds. Anyone have ideas how we could include more deep links?

I completely agree with this observation. If anybody figures out a way to create a repository of deep links please let me know and I will add it to the framework.

Thanks,

Ravi

Alec

On Wed, Jul 17, 2013 at 3:55 PM, Tony Gentilcore <to...@chromium.org> wrote:

After making the 790,318th most popular site 4 times faster
(http://crbug.com/261308), eseidel pointed out that likely no one ever
goes there.

So if you are interested in smaller gains on more popular sites,
here's a list of the 10 slowest in each category restricted to the top
1,000 sites:
https://docs.google.com/a/chromium.org/document/d/1ca_Q7xePmCRqaYnHe7vkpCmKNFNLdDXvzgtUPt9iG8w/edit#

It was surprising to me that PLT is 15% slower in the top 1,000 than
the top million (I would have guessed the other way around). Also
interesting is that in the top 1,000 Layout is the #1 category at
24.7%.

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAJB_qsn_PYaRqi-pv%3DppKn%3D%2BOfBLiT3DKgwthBmSr1_jQ7eB%3DQ%40mail.gmail.com.

Alec Flett

unread,

Jul 18, 2013, 2:34:48 PM7/18/13

to Ravi Mistry, Annie Sullivan, Alec Flett, Tony Gentilcore, William Chan (陈智昌), blink-dev

Ok, this i kind of a crazy approach, but google's knowledge graph contains a list of deep links for all topics it covers - so for example given barack obama, we have all the links for him on the web:

http://www.freebase.com/m/02mjmr

(Scroll down until you get to all the links)

This stuff is available in the freebase data dumps:

https://developers.google.com/freebase/data#data-format

So you could just get a list of URLs (though you're probably talking hundreds of millions of links - the dumps are 19 Gigs compressed!) from these dumps and extract the ones whose domains are in the alexa top xxxx. this will cover certain topics like wikipedia or even nytimes topic pages, but not things like news articles.

Alec

Annie Sullivan

unread,

Jul 18, 2013, 4:27:52 PM7/18/13

to Alec Flett, Ravi Mistry, Tony Gentilcore, William Chan (陈智昌), blink-dev

On Thu, Jul 18, 2013 at 2:34 PM, Alec Flett <alec...@chromium.org> wrote:

Ok, this i kind of a crazy approach, but google's knowledge graph contains a list of deep links for all topics it covers - so for example given barack obama, we have all the links for him on the web:

http://www.freebase.com/m/02mjmr

(Scroll down until you get to all the links)

This stuff is available in the freebase data dumps:

https://developers.google.com/freebase/data#data-format

So you could just get a list of URLs (though you're probably talking hundreds of millions of links - the dumps are 19 Gigs compressed!) from these dumps and extract the ones whose domains are in the alexa top xxxx. this will cover certain topics like wikipedia or even nytimes topic pages, but not things like news articles.

This is an interesting idea! I'm downloading a data dump, and I'll take a look and see how many different domains are represented, how many of the top sites, etc.

Tony Gentilcore

unread,

Jul 22, 2013, 11:34:57 AM7/22/13

to Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

Will Chan had an excellent idea to factor network time into the
breakdown. Ravi and I redid the experiment both with unlimited network
and with a simulated cable modem connection.

Results are here:
https://docs.google.com/a/chromium.org/document/d/1cpLSSYpqi4SprkJcVxbS7af6avKM0qc-imxvkexmCZs/edit#heading=h.7dyk54du640h

The unlimited network breakdown very closely matched the original
experiment, suggesting that our results are repeatable. The netsim
version loads pages a little over 3 times slower and attributes a
little over 2/3rds of the time to network. Another interesting
observation is that Paint times climb up to the 3rd highest CPU user
under network simulation. I theorize this is because with the slower
page loads we end up painting incrementally a lot more.

In my mind this really highlights the need to repeat this experiment
on mobile devices (despite the challenge). I suspect we'll see a
significantly different breakdown.

There are lots of ideas for further work in the document and I eagerly
welcome more suggestions/ideas.

-Tony

Elliott Sprehn

unread,

Jul 22, 2013, 11:39:44 AM7/22/13

to Tony Gentilcore, Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

On Mon, Jul 22, 2013 at 8:34 AM, Tony Gentilcore <to...@chromium.org> wrote:

Will Chan had an excellent idea to factor network time into the
breakdown. Ravi and I redid the experiment both with unlimited network
and with a simulated cable modem connection.

Results are here:
https://docs.google.com/a/chromium.org/document/d/1cpLSSYpqi4SprkJcVxbS7af6avKM0qc-imxvkexmCZs/edit#heading=h.7dyk54du640h

The unlimited network breakdown very closely matched the original
experiment, suggesting that our results are repeatable. The netsim
version loads pages a little over 3 times slower and attributes a
little over 2/3rds of the time to network. Another interesting
observation is that Paint times climb up to the 3rd highest CPU user
under network simulation. I theorize this is because with the slower
page loads we end up painting incrementally a lot more.

This is especially interesting because I've had several conversations where folks were claiming recording the SkPicture was "free" by comparison to the raster. It seems we should be focusing on the performance of paint after all.

- E

Jeremy Roman

unread,

Jul 22, 2013, 11:46:00 AM7/22/13

to Elliott Sprehn, Tony Gentilcore, Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

On Mon, Jul 22, 2013 at 11:39 AM, Elliott Sprehn <esp...@chromium.org> wrote:
> This is especially interesting because I've had several conversations where
> folks were claiming recording the SkPicture was "free" by comparison to the
> raster. It seems we should be focusing on the performance of paint after
> all.

I've done some experiments here, as part of my efforts to reduce record time.

It varies a lot by page (e.g. a page with one massive blur is quick to
record, because all of the hard work happens at raster time), but
typically, record and raster time are on the same order of magnitude.
It's absolutely worth doing what we can to reduce record time --
especially since, even when impl-side painting is on, it's on the main
thread.

--
Jeremy Roman
Software Engineering Intern
Google

Tom Hudson

unread,

Jul 22, 2013, 11:51:01 AM7/22/13

to Elliott Sprehn, Tony Gentilcore, Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

On Mon, Jul 22, 2013 at 4:39 PM, Elliott Sprehn <esp...@chromium.org> wrote:

This is especially interesting because I've had several conversations where folks were claiming recording the SkPicture was "free" by comparison to the raster. It seems we should be focusing on the performance of paint after all.

I'm far more used to people complaining about SkPicture recording costs than claiming it's free.

I think we can get 2x speedup on SkPicture::record() (and owe https://codereview.chromium.org/19564007/ a review), but enne@ recently pointed out on another thread that we'd gain far more by doing less recording than by just optimizing the cost of recording. For example, in https://code.google.com/p/chromium/issues/detail?id=262912 I think a webkit-animation:pulsate is causing continuous relayout, which leads to continuous rerecording, continuous rerasterization, and continuous reuploading to the GPU.

Relative importance of recording vs rasterization also depends on whether you're doing lots of page loads, lots of animation/interaction, or lots of scrolling around relatively static pages.

Tom

Dana Jansens

unread,

Jul 22, 2013, 11:50:48 AM7/22/13

to Elliott Sprehn, Tony Gentilcore, Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

On Mon, Jul 22, 2013 at 11:39 AM, Elliott Sprehn <esp...@chromium.org> wrote:

Over the lifetime of a webpage in your browser, if you are scrolling/pinching/etc, we will re-raster many times. But unless the page invalidates heavily, we will only record once. So, it seems easy to say that we spend a lot more time rastering than we do recording, but that ignores pages with heavy invalidation, or even the amount of time to paint a page during page load.

- E

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAO9Q3iLheiJn800rvTYGaPpgcryixOTfYbyKjiVk5oeoDw9SHQ%40mail.gmail.com.

William Chan (陈智昌)

unread,

Jul 22, 2013, 12:15:35 PM7/22/13

to Tony Gentilcore, blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

You guys rock. I concur with your hypothesis on the increased paint times. I can't wait to get mobile results too. I'm uncomfortably excited about all this work and look forward to checking back on this email thread from a beach in Kauai.

I have to say I'm a little surprised at the low percentage of network time, although maybe it's not too surprising since we used a simulated cable modem. If it would not be much pain, I'd also like a network simulation of our long tail users with "slow" connections, say DSL or even slower. It would also help confirm the hypothesis about extra CPU time from more incremental paints. I am concerned that we might have suboptimal rendering behavior that makes page load time for our slow network users even slower, due to extra CPU operations.

Excuse the brevity. Sent from my iAndroid.

Tony Gentilcore

unread,

Jul 22, 2013, 12:30:57 PM7/22/13

to William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

> I have to say I'm a little surprised at the low percentage of network time,

There are some strong caveats in the "future work" section of the
document. In particular: DNS requests were free and slow start was not
simulated. So we should really consider this a lower-bound on network
time. Also explained in that section is a way to repeat the experiment
with better simulation (dummynet).

> If it would not be much pain, I'd also like a network simulation of our long tail users with "slow" connections, say DSL or even slower.

I'll consider running with some different network types after
improving the network simulation. To be honest, at this point I'm
really most excited about getting this to work on mobile though.

James Robinson

unread,

Jul 22, 2013, 12:46:25 PM7/22/13

to Elliott Sprehn, Tony Gentilcore, Alec Flett, Ravi Mistry, Annie Sullivan, William Chan (陈智昌), blink-dev

This data was recorded on a platform where we don't do SkPicture recording. The paint time here is actual rasterization time.

- James

Chris Bentzel

unread,

Jul 29, 2013, 4:37:11 PM7/29/13

to Tony Gentilcore, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

On Mon, Jul 22, 2013 at 12:30 PM, Tony Gentilcore <to...@chromium.org> wrote:
>> I have to say I'm a little surprised at the low percentage of network time,
>
> There are some strong caveats in the "future work" section of the
> document. In particular: DNS requests were free and slow start was not
> simulated. So we should really consider this a lower-bound on network
> time. Also explained in that section is a way to repeat the experiment
> with better simulation (dummynet).

Do you attribute time between the initial SendRequest and
ReceivedResponse events (for the top-level page) to network?

Even with zero-latency network and server this still may take time on
the client (lots of throttles such as SafeBrowsing, handling
redirects, checking cache, getting cookies, etc.)

>
>> If it would not be much pain, I'd also like a network simulation of our long tail users with "slow" connections, say DSL or even slower.
>
> I'll consider running with some different network types after
> improving the network simulation. To be honest, at this point I'm
> really most excited about getting this to work on mobile though.
>

> --
> You received this message because you are subscribed to the Google Groups "blink-dev" group.

> To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CANvLf_H%2Ba4yY3gM%3DXN741fzCoWjYcbk0cuG6mEkBY_g3Bt_agQ%40mail.gmail.com.
>
>

Eric Seidel

unread,

Jul 31, 2013, 4:38:12 AM7/31/13

to Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

After investigating some of the top Layout sites this evening, it
seems sites with slow layout are almost entirely dominated by
freetype/skia/libfontconfig font/glyph loading times on Linux:
https://code.google.com/p/chromium/issues/detail?id=266214

Android or Windows times would give us a better sense of what our
users are seeing.

You mentioned that it would take months to run the same experiment on
Android, but if it were possible to profile the top 1000 sites (or
even top 100) sites on Android that would be very interesting to see.

Dominic Mazzoni

unread,

Jul 31, 2013, 11:22:07 AM7/31/13

to Eric Seidel, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

On Wed, Jul 31, 2013 at 1:38 AM, Eric Seidel <ese...@chromium.org> wrote:

After investigating some of the top Layout sites this evening, it
seems sites with slow layout are almost entirely dominated by
freetype/skia/libfontconfig font/glyph loading times on Linux:
https://code.google.com/p/chromium/issues/detail?id=266214

Android or Windows times would give us a better sense of what our
users are seeing.

What about Chrome OS? Is its performance similar to Linux? If so, someone

on that team might be interested in looking into the font loading issue; it might

be easier to solve there where we control the whole stack.

- Dominic

Tony Gentilcore

unread,

Jul 31, 2013, 2:28:22 PM7/31/13

to Eric Seidel, Chris Bentzel, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

On Wed, Jul 31, 2013 at 1:38 AM, Eric Seidel <ese...@chromium.org> wrote:

After investigating some of the top Layout sites this evening, it
seems sites with slow layout are almost entirely dominated by
freetype/skia/libfontconfig font/glyph loading times on Linux:
https://code.google.com/p/chromium/issues/detail?id=266214

Android or Windows times would give us a better sense of what our
users are seeing.

You mentioned that it would take months to run the same experiment on
Android, but if it were possible to profile the top 1000 sites (or
even top 100) sites on Android that would be very interesting to see.

Interesting. I'm not able to reproduce your slow font loading behavior.

simonhatch@ created a page set with the top 10 slowest layout cases in the top million sites and when aggregating their profiles, I get an entirely reasonable looking list of layout related functions.

$ xvfb-run tools/perf/run_measurement loading_profile tools/perf/page_sets/tough_layout_cases.json --output=/tmp/csv --output-format=csv

Here's the top 10 from that suite:

WebCore::PODFreeListArena::Node*): 69.1%

WebCore::RenderBlock::FloatingObjects::computePlacedFloatsTree: 9.6%

void WebCore::PODIntervalTree&) const: 8.6%

WebCore::RenderBlock::floatingObjects const: 1.5%

WTF::IntHash::hash: 1.1%

WebCore::SelectorFilter::fastRejectSelector const: 1.0%

WebCore::RenderBlock::logicalLeftFloatOffsetForLine const: 0.8%

WebCore::PODRedBlackTree::Node*): 0.7%

WebCore::LayoutUnit::LayoutUnit: 0.6%

WebCore::rangesIntersect: 0.5%

Maybe something is still up with your perf? Are you getting a reasonable number of samples in the profiles?

Or maybe there's something up w/ your linux box?

Eric Seidel

unread,

Jul 31, 2013, 3:06:33 PM7/31/13

to Tony Gentilcore, Chris Bentzel, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

I suspect some part of the font-config time is coming from my use of
--dump-render-tree, which causes us to install fonts on startup:
https://code.google.com/p/chromium/codesearch#chromium/src/content/shell/app/webkit_test_platform_support_linux.cc

I use --dump-render-tree as it's a convenient way to get content_shell
to profile just the load time.

That said, I just ran:
perf record -g -- ./out/Release/content_shell --no-sandbox
--single-process http://mop.com
(Note the lack of --dump-render-tree.)
and then ^C'd after the page looked like it was loaded.

And all the time was in libfontconfig:
+ 17.62% content_shell libfontconfig.so.1.4.4 [.] FcCompareFamily
+ 15.22% content_shell libfontconfig.so.1.4.4 [.] FcCompareValueList
+ 11.91% content_shell libfontconfig.so.1.4.4 [.] FcConfigCompareValue
+ 5.38% content_shell libfontconfig.so.1.4.4 [.] FcSortCompare
+ 3.80% content_shell libfontconfig.so.1.4.4 [.]
FcStrCaseWalkerNextIgnoreBlanks
+ 2.97% content_shell libfontconfig.so.1.4.4 [.] FcCompare
+ 1.94% content_shell libfontconfig.so.1.4.4 [.] __popcountdi2
+ 1.79% content_shell libfontconfig.so.1.4.4 [.]
FcStrCmpIgnoreBlanksAndCase
+ 1.43% content_shell libfontconfig.so.1.4.4 [.] FcStrCaseWalkerNext
+ 1.38% content_shell libflashplayer.so [.] 0x00000000007f8adc
+ 1.36% content_shell libc-2.15.so [.] msort_with_tmp.part.0
+ 1.25% content_shell libfontconfig.so.1.4.4 [.] FcStrCmpIgnoreCase

It's possible this is unique to my config. I'll see if I can repro on
a colleague's machine this afternoon. I'm also building Chrome for
Android as we speak and will check to see how we fair on these sites
there.

Tony Gentilcore

unread,

Jul 31, 2013, 3:31:58 PM7/31/13

to Eric Seidel, Chris Bentzel, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

One difference between the ways we are profiling is that I'm only profiling page load time. You are profiling startup time + page load time. Wonder whether startup could be dominating here?

Eric Seidel

unread,

Jul 31, 2013, 3:44:15 PM7/31/13

to Tony Gentilcore, Chris Bentzel, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

I'm also profiling --single-process, which is going to include Browser time.

I've learned through talking with Dan Erat just now that Linux uses a
special (non-traced!) IPC system just for things like fonts:
https://code.google.com/p/chromium/wiki/LinuxSandboxIPC

So these 4s layout times are actually just the renderer waiting for
the return of these synchronous IPC calls:
https://code.google.com/p/chromium/codesearch#chromium/src/content/common/child_process_sandbox_support_impl_linux.cc

When I profile with --single-process I'm also seeing the browser-time
spent in FontConfig:
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/web/linux/WebFontInfo.cpp

Android hits this code too, they just don't seem to use the same
sandbox, and just call this WebFontInfo (and thus FontConfig) code
directly in the renderer.

It's very possible that something funny about my configuration is
making FontConfig take much longer than for others. I'll know more
soon.

James Robinson

unread,

Jul 31, 2013, 4:23:06 PM7/31/13

to Eric Seidel, Tony Gentilcore, Chris Bentzel, William Chan (陈智昌), blink-dev, Ravi Mistry, Annie Sullivan, Alec Flett

Profiling in --single-process is not likely to give you useful information about how Blink is behaving for our users. We take a number of shortcuts in this mode that aren't meant to be efficient and do not receive any performance attention.

- James

Message has been deleted

Patrick Meenan

unread,

Aug 5, 2013, 3:35:55 PM8/5/13

to blin...@chromium.org, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

I added support to WebPagetest to do a main-thread breakdown if a timeline is captured. In addition to the individual component times I also expose an "Idle" time which is the time between the start of the first event and end of the last event that isn't accounted for by the individual timeline events (in theory that should be the time that Chrome spends waiting for stuff - mostly from the network).

I ran the same URL list on Windows VM's (no GPU) as well as a physical machine with a GPU and on some Motorola Razr's with Android 4.1.2. The VM's ran the full list and we have data from the top 25k or so on the GPU and Android. There are raw CSV's with all of the results here and each has a link to the WebPagetest test and full timeline data for the test (the mobile links will only work for Googlers but I can push results to the public instance as needed - the VM and GPU tests should all be fully available). The tests were run using a "cable" profile which is 5Mbps down, 1Mbps up and 28ms of last-mile latency.

Windows 7 VM (Chrome 30.0.1586.1).

(773488 results)	Including Idle	No Idle
Idle	80.17%
Program	5.13%	25.85%
EvaluateScript	4.53%	22.85%
FunctionCall	2.32%	11.72%
Layout	1.93%	9.75%
ParseHTML	1.58%	7.98%
Paint	1.55%	7.81%
ResourceReceivedData	0.99%	5.01%
RecalculateStyles	0.59%	2.96%
GCEvent	0.50%	2.53%
DecodeImage	0.48%	2.44%
ResizeImage	0.10%	0.49%
TimerFire	0.07%	0.33%
EventDispatch	0.02%	0.12%
ResourceReceiveResponse	0.01%	0.05%
ScrollLayer	0.01%	0.05%
XHRReadyStateChange	0.01%	0.04%
FireAnimationFrame	0.00%	0.00%
XHRLoad	0.00%	0.00%

Windows 7 GPU (Thinkpad T430 i5 w/Intel HD 4000, Chrome 28.0.1500.95).

(26497 results)	Including Idle	No Idle
Idle	84.12%
EvaluateScript	2.82%	17.75%
Program	2.64%	16.60%
Paint	2.23%	14.02%
FunctionCall	1.82%	11.49%
Layout	1.58%	9.92%
CompositeLayers	0.83%	5.21%
ParseHTML	0.74%	4.69%
ScrollLayer	0.65%	4.11%
ResourceReceivedData	0.52%	3.25%
RecalculateStyles	0.50%	3.14%
DecodeImage	0.49%	3.06%
GCEvent	0.46%	2.90%
ResizeImage	0.27%	1.69%
FireAnimationFrame	0.08%	0.53%
PaintSetup	0.08%	0.49%
TimerFire	0.07%	0.45%
EventDispatch	0.04%	0.23%
XHRReadyStateChange	0.03%	0.22%
ResourceReceiveResponse	0.02%	0.14%
XHRLoad	0.01%	0.09%

Mobile (Motorola Razr, Android 4.1.2, Chrome 28.0.1500.45).

(23362 Results)	Including Idle	No Idle
Idle	46.17%
Rasterize	17.73%	32.94%
EvaluateScript	6.38%	11.86%
Program	6.25%	11.62%
FunctionCall	5.17%	9.61%
Layout	3.27%	6.07%
ResourceReceivedData	2.94%	5.47%
ScrollLayer	2.74%	5.09%
Paint	2.01%	3.73%
ParseHTML	1.86%	3.46%
GCEvent	1.44%	2.67%
RecalculateStyles	1.25%	2.33%
CompositeLayers	0.94%	1.76%
TimerFire	0.88%	1.63%
DecodeImage	0.55%	1.02%
FireAnimationFrame	0.24%	0.44%
EventDispatch	0.08%	0.16%
XHRReadyStateChange	0.05%	0.09%
ResourceReceiveResponse	0.02%	0.03%
XHRLoad	0.02%	0.03%

Eric Seidel

unread,

Aug 5, 2013, 3:43:31 PM8/5/13

to Patrick Meenan, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

ResourceReceivedData

2.94%

5.47%

Seems confusing.

The only time we record that is here:

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/cache/ResourceFetcher.cpp&sq=package:chromium&type=cs&l=1176&rcl=1375686820

And the only thing it does is call one function, which has one implementation:

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/ResourceLoadNotifier.cpp&sq=package:chromium&type=cs&l=65&rcl=1375686820

Does this mean we're spending 5% of total load time (on android) under:

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/ProgressTracker.cpp&sq=package:chromium&type=cs&l=169&rcl=1375686820

Or could the early return in ResourceFetcher be confusing the inspector's recording?

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/cache/ResourceFetcher.cpp&sq=package:chromium&type=cs&l=1176&rcl=1375686820

On Mon, Aug 5, 2013 at 12:24 PM, Patrick Meenan <pme...@google.com> wrote:

I added support to WebPagetest to do a main-thread breakdown if a timeline is captured. In addition to the individual component times I also expose an "Idle" time which is the time between the start of the first event and end of the last event that isn't accounted for by the individual timeline events (in theory that should be the time that Chrome spends waiting for stuff - mostly from the network).

I ran the same URL list on Windows VM's (no GPU) as well as a physical machine with a GPU and on some Motorola Razr's with Android 4.1.2. The VM's ran the full list and we have data from the top 25k or so on the GPU and Android. There are raw CSV's with all of the results here and each has a link to the WebPagetest test and full timeline data for the test (the mobile links will only work for Googlers but I can push results to the public instance as needed - the VM and GPU tests should all be fully available).

Tom Hudson

unread,

Aug 5, 2013, 4:01:09 PM8/5/13

to Patrick Meenan, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

On Mon, Aug 5, 2013 at 8:24 PM, Patrick Meenan <pme...@google.com> wrote:

Mobile (Motorola Razr, Android 4.1.2, Chrome 28.0.1500.45).

(23362 Results) Including Idle No Idle
Idle 46.17%

Rasterize 17.73% 32.94%
EvaluateScript 6.38% 11.86%

Program 6.25% 11.62%
FunctionCall 5.17% 9.61%
Layout 3.27% 6.07%

This is confusing, too. Chrome on Android should be rasterizing on another thread? Is this actually picture recording?

Tom

Patrick Meenan

unread,

Aug 5, 2013, 4:03:40 PM8/5/13

to Tom Hudson, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

FWIW, it is dumping all of the timeline events (and assuming they are for main-thread activity). Do the background rasterize events show up in the timeline? If so, is there any way to parse out the different threads?

Nat Duca

unread,

Aug 5, 2013, 4:04:14 PM8/5/13

to Eric Seidel, Patrick Meenan, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

One thing to keep in mind with the inspector-timeline results is that they use wall time --- so, if a given task does 2ms of real work but is deschedued for 50ms ,you'll see it as having a self time of 52ms. This effect is most pronounced on low-core-count machines and during IPC. Chrome is such a threaded monster, I wouldn't be surprised to find that this is influencing the numbers a bit.

We're in the process of adding thread-specific timers to chrome that stop advancing when the thread is descheduled in order to help distinguish between these two cases. https://code.google.com/p/chromium/issues/detail?id=264306 to begin with. We will probably update the loading_timeline measurement to report this once its in.

Nat Duca

unread,

Aug 5, 2013, 4:06:47 PM8/5/13

to Patrick Meenan, Tom Hudson, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

Rasterize and image decode will show up on the timeline recordings even though they're from another thread. It would be awesome to file a bug (cc caseq) to indicate whether an event was from a worker thread to help with some of this reporting.

As an alterantive, we do now have the about:tracing version of the loading measurement which gives just the records for the renderer thread. Thats here: http://src.chromium.org/chrome/trunk/src/tools/perf/measurements/loading_timeline.py

Eric Seidel

unread,

Aug 5, 2013, 7:59:57 PM8/5/13

to Nat Duca, Patrick Meenan, Tom Hudson, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

I pulled down the Android data and sorted by Layout time. Interestingly, the sites with the longest Layout time weren't actually spending that much time in layout.

When you click on the webpagetest links and go to the "processing breakdown" tab, they're all spending 50+% of their time painting.

Most of these slowest pages are .cn sites which have marquees or other ads constantly running. I suspect that we're constantly eagerly repainting those offscreen regions, similar to:

https://code.google.com/p/chromium/issues/detail?id=244045

I suspect we should consider some sort of optimization whereby if an offscreen region has repainted more than N times in the last M seconds, we should throttle it to a slower repaint time instead of 16ms.

Honestly if the user isn't interacting with the page, we should consider just stopping repainting offscreen tiles all together, no?

Nat Duca

unread,

Aug 6, 2013, 4:13:58 PM8/6/13

to Eric Seidel, Patrick Meenan, Tom Hudson, blink-dev, Chris Bentzel, Tony Gentilcore, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

I think we're hoping to address this using

https://code.google.com/p/chromium/issues/detail?id=254320 covers the rasterization part of this, and https://code.google.com/p/chromium/issues/detail?id=264985 covers the painting part of this.

Takers needed for the '85 bug. :)

Tony Gentilcore

unread,

Aug 15, 2013, 12:26:45 AM8/15/13

to Nat Duca, Eric Seidel, Patrick Meenan, Tom Hudson, blink-dev, Chris Bentzel, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

Eric asked me to run Pat's Windows and Android results through my script to find the top N slowest sites in each category.

Android top 25k: https://docs.google.com/a/chromium.org/document/d/1pq4288zv1AFDAA-atcArSvWy4rE0znPVVLgZRVhgDrY/edit#

Windows GPU top 25k: https://docs.google.com/a/chromium.org/document/d/12LRHurdJSmEhCo00s7MmSOgBgifotane4CIOWw0H81Q/edit#

Windows VM top million: https://docs.google.com/a/chromium.org/document/d/1t6B4bPMEwMqMC9IHullQoi-uqXEZ05I-SNXZlKWIcFQ/edit#

Some thoughts on this data:

* As often quoted, Android is about an order of magnitude slower than desktop. This makes CPU optimization much more important to page load time on Android than it is on desktop.

* "Program" is unaccounted for main thread time. It is much higher in Pat's results (#2 on windows GPU & android, #1 on windows VM) than my linux results (#4). We should definitely profile some slow sites in this category and understand where all this time is going.

* Do we really understand what is up with the massive Rasterize time on Android?

-Tony

Tom Hudson

unread,

Aug 15, 2013, 5:08:32 AM8/15/13

to Tony Gentilcore, Nat Duca, Eric Seidel, Patrick Meenan, blink-dev, Chris Bentzel, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

Nat said above that the Rasterize and DecodeImage time on Android would show up in these stats, even though they were on another thread. So we see:

raster worker thread: 36.6%

main thread: ~ 60%

compositor thread: not included in these statistics

That's consistent with stereotypical Android traces. Where there's v8 or relayouts or text autoscaling, the renderer main thread is very very busy. The raster worker thread is bursty, but conceivably busiest during page load. Compositor thread cost depends heavily on page size, layer count, and what the user is doing.

Tom

Nat Duca

unread,

Aug 15, 2013, 7:15:38 AM8/15/13

to Tom Hudson, Tony Gentilcore, Eric Seidel, Patrick Meenan, blink-dev, Chris Bentzel, William Chan (陈智昌), Ravi Mistry, Annie Sullivan, Alec Flett

Would be cool to see the andorid 25k with the tracing variant. Or at least try it out on a small subset. It might give some insights into the "program" mystery [my bet is compositor commits and other rendering related costs, but thats just 'cause I'm the eternal optimist].

Painting is painfully difficult, and also heinously slow. Getting those costs down is the focus of a ton of folks work in the GPU ecosystem, both up to this point, and for a long while to continue, I expect.

William Chan (陈智昌)

unread,

Aug 15, 2013, 3:43:50 PM8/15/13

to Patrick Meenan, blink-dev, Chris Bentzel, Tony Gentilcore, Ravi Mistry, Annie Sullivan, Alec Flett

Hey Pat, is it possible to redo a comparison of mobile for unlimited network vs a mobile network profile? I'd like to answer the question of, on mobile, percentage of PLT is CPU vs network.

Patrick Meenan

unread,

Aug 15, 2013, 4:22:10 PM8/15/13

to William Chan (陈智昌), blink-dev, Chris Bentzel, Tony Gentilcore, Ravi Mistry, Annie Sullivan, Alec Flett

Yes but it will probably be a couple of weeks. The Razr's we used for the test had an issue where the batteries swelled up from the continuous testing so we're taking care of that and then bringing them back online.

Marcus Bulach

unread,

Aug 16, 2013, 4:35:12 AM8/16/13

to Patrick Meenan, William Chan (陈智昌), blink-dev, Chris Bentzel, Tony Gentilcore, Ravi Mistry, Annie Sullivan, Alec Flett, Tom Wiltzius

+wiltzius

Wiltzius is reaching out to a few partners, apparently there's some "battery eliminators" that could help solving such problems..

Thanks,

Marcus

Tom Wiltzius

unread,

Aug 16, 2013, 12:19:56 PM8/16/13

to Marcus Bulach, Patrick Meenan, William Chan (陈智昌), blink-dev, Chris Bentzel, Tony Gentilcore, Ravi Mistry, Annie Sullivan, Alec Flett

Are these tests being run on a one-off setup, or on some sort of standard Chrome infrastructure?

We're trying to get Android bot reliability under control for the various Chrome for Android perf bots (i.e. everything that shows up on on chromeperf.appspot.com). Happy to share best practices as we discover them.

Patrick Meenan

unread,

Aug 16, 2013, 12:39:37 PM8/16/13

to Tom Wiltzius, Marcus Bulach, William Chan (陈智昌), blink-dev, Chris Bentzel, Tony Gentilcore, Ravi Mistry, Annie Sullivan, Alec Flett

These tests were run on a standard mobile test infrastructure that we use for web performance testing (not Chrome specific). I'll connect you with the guys running the test infrastructure to make sure everyone is sharing notes.

Reply all

Reply to author

Forward