> * Where would CSS parsing time fall? Is it under "Layout"?
I don't know offhand. Anyone else know?
Great ideas. Inline...

On Tue, Jul 16, 2013 at 4:30 PM, William Chan (陈智昌)
<will...@chromium.org> wrote:
> I'm interested in a few things here:
> * Desktop vs mobile breakdowns.

This experiment involved a little over 160 hours of page load time
(not counting overhead) split across 100 shards. As a rule of thumb,
mobile is about 10x slower than desktop, so that means you'd be
looking at a little over 2 months of phone CPU time to run the same
experiment.

Maybe we should just optimize these slow cases first and then run it
on the phones after it is fast ;-)
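For anyone checking the arithmetic, here is a back-of-the-envelope version of that estimate (numbers taken from the paragraph above; the 10x factor is the stated rule of thumb, not a measurement):

```python
# Back-of-the-envelope check on the mobile estimate above.
desktop_cpu_hours = 160   # total page load time, split across 100 shards
mobile_slowdown = 10      # rule of thumb: mobile ~10x slower than desktop

mobile_cpu_hours = desktop_cpu_hours * mobile_slowdown  # 1600 hours
mobile_cpu_months = mobile_cpu_hours / 24 / 30.0        # ~2.2 months
print(mobile_cpu_months)  # 2.22..., i.e. "a little over 2 months"
```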
> * Doing percentage breakdowns differently. I'm worried here that the long
> tail edge cases are dominating the sampling. Am I misunderstanding how
> you're presenting these category percentages? I guess the way I'd want to
> see it is giving the same percentages on a per-site basis, and then
> averaging those percentages across all sites with each site having the same
> weight. Right now, IIUC, it's possible to have a single site that spends
> infinite time on a single paint event, and that would cause these
> percentages to be 0% for all events and 100% for paint, even if that only
> happens for a single website. If my understanding is wrong, then please
> explain how the sampling is actually working :)

I was torn on which way to aggregate this and went with summing the
times and then averaging because I thought that slower pages should be
weighted higher than faster ones. I'll redo the analysis with
averaging and then summing and post the results.
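For concreteness, a minimal sketch of the two aggregation orders being discussed (the `sites` data shape here is hypothetical: each site maps to its per-category self-times in ms):

```python
from collections import defaultdict

def aggregate_sum_then_average(sites):
    """Sum times across all sites per category, then divide by the grand
    total: slow sites carry more weight (what the original analysis did)."""
    totals = defaultdict(float)
    for times in sites.values():
        for category, ms in times.items():
            totals[category] += ms
    grand_total = sum(totals.values())
    return {c: t / grand_total for c, t in totals.items()}

def aggregate_average_of_percentages(sites):
    """Compute each site's own percentage breakdown first, then average
    those with equal weight per site (what William is asking for)."""
    sums = defaultdict(float)
    for times in sites.values():
        site_total = sum(times.values())
        for category, ms in times.items():
            sums[category] += ms / site_total
    return {c: s / len(sites) for c, s in sums.items()}

# One pathologically slow site dominates the first method but not the second:
sites = {
    "fast.example": {"Layout": 10, "Paint": 10},
    "slow.example": {"Layout": 1, "Paint": 9999},
}
print(aggregate_sum_then_average(sites))        # Paint ~99.9%
print(aggregate_average_of_percentages(sites))  # Paint ~75%
```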
> * Identifying how important CPU time is in page load, versus how much time
> is spent waiting on network. Do you have this data? I suspect it's different
> between desktop and mobile.

We could run this experiment pretty easily. Telemetry supports network
simulation (via Web Page Replay). So we could dial in a "typical" mobile
network configuration and gather page load times under the simulated
network. Then we'd call network time the delta between the simulated
network PLT and the instant network PLT. Even though it wouldn't account
for which operations run in parallel, it still would be interesting to
weigh that percentage against the other subsystems.
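A minimal sketch of that attribution, assuming we already have PLTs from two runs; the function and argument names here are illustrative, not Telemetry APIs:

```python
def network_time_breakdown(plt_simulated_ms, plt_instant_ms):
    """Attribute the PLT delta between a simulated-network run (e.g. WPR
    with shaped bandwidth/latency) and an instant-network run to 'network'.

    Caveat from above: any CPU/network overlap gets lumped into the
    network bucket, since this is just a difference of totals."""
    network_ms = plt_simulated_ms - plt_instant_ms
    return {
        "network_fraction": network_ms / plt_simulated_ms,
        "cpu_and_other_fraction": plt_instant_ms / plt_simulated_ms,
    }

# e.g. pages loading ~3x slower under netsim implies ~2/3 network time:
print(network_time_breakdown(3000.0, 1000.0))
# {'network_fraction': 0.666..., 'cpu_and_other_fraction': 0.333...}
```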
> * Main thread responsiveness. I'd like to know how long critical operations
> are spent waiting in the event loop queue. Are we delaying event handlers
> due to too much main thread activity? Are we not able to paint fast enough?
> I'm willing to tolerate lower responsiveness in page load, but when a page
> is mostly loaded, I'd expect it to be jank-free. My strawman proposal here
> would be, after the load event fires, start measuring the CPU time in the
> same way you've done here and scroll through the whole page down to the end
> of the document, and measure CPU time. Track percentages on a per-site and
> aggregate basis, and also time distributions of individual operations, and
> time distributions of operations that exceed 16ms and thus prevent us from
> painting frames fast enough.

Telemetry's smoothness_measurement answers the question of jankiness
during scrolling. We can point that at the top 1M sites and get those
stats.
-Tony
> I was torn on which way to aggregate this and went with summing the
> times and then averaging because I thought that slower pages should be
> weighted higher than faster ones. I'll redo the analysis with
> averaging and then summing and post the results.

If anything, weight by rank on Alexa. Some of these websites are
ridiculously slow and I'm worried their stupidity is drowning out other
samples.
> Telemetry's smoothness_measurement answers the question of jankiness
> during scrolling. We can point that at the top 1M sites and get those
> stats.

Pardon the ignorance, but what does smoothness_measurement
(smoothness_metrics.py perhaps?) measure? I suspect it might answer the
rendering questions I raised, but how about stuff like event handlers and
whatnot? Also, are we delaying things like resource requests since we
aren't issuing them from the parser thread? It'd be cool to track that
delay. And maybe fix it by issuing resource requests directly from the
parser thread.
> What's the difference between EvaluateScript and FunctionCall?

According to https://developers.google.com/chrome-developer-tools/docs/timeline#timeline_event_reference:

Evaluate Script: A script was evaluated.
Function Call: A top-level JavaScript function call was made (only
appears when the browser enters the JavaScript engine).
On Tuesday, July 16, 2013 2:03:07 PM UTC-7, Tony Gentilcore wrote:
> Evaluate Script: A script was evaluated.

EvaluateScript is recorded around the JS compile and run of the outer JS
function (often, but not always, initialization code). Thus it is a mix of
JS compile time and some function call time. It includes extension
content scripts, NPObject (whatever that is), and chrome.* API
initialization. It does not seem to include browser-generated
event-handler scripts, but those would be insignificant for this purpose.
It does not seem to count eval()/new Function(). Since this category is
large, it may be worthwhile separating the compile and run times.
What's all this ResourceReceivedData time?
After making the 790,318th most popular site 4 times faster
(http://crbug.com/261308), eseidel pointed out that likely no one ever
goes there.
So if you are interested in smaller gains on more popular sites,
here's a list of the 10 slowest in each category restricted to the top
1,000 sites:
https://docs.google.com/a/chromium.org/document/d/1ca_Q7xePmCRqaYnHe7vkpCmKNFNLdDXvzgtUPt9iG8w/edit#
It was surprising to me that PLT is 15% slower in the top 1,000 than
the top million (I would have guessed the other way around). Also
interesting is that in the top 1,000 Layout is the #1 category at
24.7%.
On Wed, Jul 17, 2013 at 3:55 PM, Tony Gentilcore <to...@chromium.org> wrote:
> After making the 790,318th most popular site 4 times faster
> (http://crbug.com/261308), eseidel pointed out that likely no one ever
> goes there.

I like investigating anomalies: https://www.youtube.com/watch?v=-3dw09N5_Aw.
I think it helps with understanding complex systems, and even if these
sites aren't the most popular, there can be a lot to learn from them.
This is really awesome.

So one thing that's got me thinking is that deep-ish links are actually
really important, and in many popular cases probably more important than
the homepage. Think of an article page on wikipedia, a single-video page
on youtube, a team page on mlb.com, a facebook news feed, or a single
news story on nytimes.com - all probably more representative of what
people actually browse than these sites' respective top-level homepages.
On Wed, Jul 17, 2013 at 7:48 PM, Alec Flett <alec...@chromium.org> wrote:
> So one thing that's got me thinking is that deep-ish links are actually
> really important, and in many popular cases probably more important than
> the homepage.

+1

I just downloaded the top million sites from
http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Like Alec says, wikipedia.org is #7, but there are no wikipedia articles
listed. Same thing for mlb.com, facebook, and nytimes.com. On youtube,
there are 7,000 user homepages listed, but no single-video pages. We're
probably missing out on a lot of great outliers by excluding articles,
popular feeds, etc.

I'm not sure the best way to find deep links. I checked the Alexa siteinfo
pages, and they list the most popular subdomains but not any deep links. I
couldn't find a publicly available list of top URLs or top deep links per
site. I thought maybe we could just try loading a random same-domain link
from each of the top 1,000 sites, but I worry we'd end up with more
top-level links since there are so many menus on most sites. Especially on
facebook and twitter, the main URL is a login page that just links to help
pages, not public feeds. Anyone have ideas how we could include more deep
links?
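A minimal sketch of that random same-domain link idea (a hypothetical helper, not part of Telemetry; it naively picks one same-host, non-root link off a homepage, with all the menu-link skew noted above):

```python
import random
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def random_deep_link(homepage_url):
    """Fetch a homepage and return one random same-host link, or None.
    Caveat from above: menus mean this skews toward top-level pages."""
    html = urllib.request.urlopen(homepage_url, timeout=10).read().decode(
        "utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(homepage_url).netloc
    links = [urljoin(homepage_url, h) for h in parser.hrefs]
    deep = [u for u in links
            if urlparse(u).netloc == host and urlparse(u).path not in ("", "/")]
    return random.choice(deep) if deep else None

print(random_deep_link("https://en.wikipedia.org/"))
```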
Ok, this is kind of a crazy approach, but Google's knowledge graph
contains a list of deep links for all topics it covers - so, for example,
given Barack Obama, we have all the links for him on the web (scroll down
until you get to all the links).

This stuff is available in the Freebase data dumps. So you could just get
a list of URLs (though you're probably talking hundreds of millions of
links - the dumps are 19 gigs compressed!) from these dumps and extract
the ones whose domains are in the Alexa top xxxx. This will cover certain
topics like wikipedia or even nytimes topic pages, but not things like
news articles.
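A minimal sketch of that extraction step, assuming a plain-text file of candidate URLs (one per line) and the unzipped Alexa top-1m.csv (rank,domain per line); the filenames here are illustrative:

```python
import csv
from urllib.parse import urlparse

def load_alexa_domains(csv_path, top_n=1000):
    """Read 'rank,domain' rows from the Alexa dump, keeping the top N."""
    domains = set()
    with open(csv_path, newline="") as f:
        for rank, domain in csv.reader(f):
            if int(rank) > top_n:
                break
            domains.add(domain)
    return domains

def filter_deep_links(urls_path, alexa_domains):
    """Yield URLs whose host is an Alexa domain (or a subdomain of one)."""
    with open(urls_path) as f:
        for line in f:
            host = urlparse(line.strip()).netloc.lower()
            # www.nytimes.com should match nytimes.com, etc.
            if any(host == d or host.endswith("." + d) for d in alexa_domains):
                yield line.strip()

alexa = load_alexa_domains("top-1m.csv", top_n=1000)
for url in filter_deep_links("freebase_urls.txt", alexa):
    print(url)
```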
Will Chan had an excellent idea to factor network time into the
breakdown. Ravi and I redid the experiment both with unlimited network
and with a simulated cable modem connection.
Results are here:
https://docs.google.com/a/chromium.org/document/d/1cpLSSYpqi4SprkJcVxbS7af6avKM0qc-imxvkexmCZs/edit#heading=h.7dyk54du640h
The unlimited network breakdown very closely matched the original
experiment, suggesting that our results are repeatable. The netsim
version loads pages a little over 3 times slower and attributes a
little over 2/3rds of the time to network. Another interesting
observation is that Paint times climb up to the 3rd highest CPU user
under network simulation. I theorize this is because with the slower
page loads we end up painting incrementally a lot more.
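One cheap way to check that theory would be to compare Paint event counts between the two runs. A tiny hypothetical helper (illustrative names; assumes we've extracted a list of timeline event types per page load from each configuration):

```python
def paint_events_per_page(timeline_event_types):
    """Count Paint events in one page load's timeline."""
    return sum(1 for t in timeline_event_types if t == "Paint")

def mean_paint_count(runs):
    """Average Paint count across page loads in one configuration."""
    return sum(paint_events_per_page(r) for r in runs) / len(runs)

# If netsim runs show many more Paint events per page than instant-network
# runs, that supports the "more incremental painting" theory.
instant_runs = [["ParseHTML", "Paint", "Layout", "Paint"]]
netsim_runs = [["ParseHTML", "Paint", "Paint", "Layout", "Paint", "Paint"]]
print(mean_paint_count(instant_runs), mean_paint_count(netsim_runs))  # 2.0 4.0
```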
This is especially interesting because I've had several conversations where folks were claiming recording the SkPicture was "free" by comparison to the raster. It seems we should be focusing on the performance of paint after all.
- E
You guys rock. I concur with your hypothesis on the increased paint times. I can't wait to get mobile results too. I'm uncomfortably excited about all this work and look forward to checking back on this email thread from a beach in Kauai.
I have to say I'm a little surprised at the low percentage of network time, although maybe it's not too surprising since we used a simulated cable modem. If it would not be much pain, I'd also like a network simulation of our long tail users with "slow" connections, say DSL or even slower. It would also help confirm the hypothesis about extra CPU time from more incremental paints. I am concerned that we might have suboptimal rendering behavior that makes page load time for our slow network users even slower, due to extra CPU operations.
Excuse the brevity. Sent from my iAndroid.
After investigating some of the top Layout sites this evening, it
seems sites with slow layout are almost entirely dominated by
freetype/skia/libfontconfig font/glyph loading times on Linux:
https://code.google.com/p/chromium/issues/detail?id=266214
Android or Windows times would give us a better sense of what our
users are seeing.
> After investigating some of the top Layout sites this evening, it
> seems sites with slow layout are almost entirely dominated by
> freetype/skia/libfontconfig font/glyph loading times on Linux:
> https://code.google.com/p/chromium/issues/detail?id=266214
> Android or Windows times would give us a better sense of what our
> users are seeing.
You mentioned that it would take months to run the same experiment on
Android, but if it were possible to profile the top 1,000 (or even top
100) sites on Android, that would be very interesting to see.
I added support to WebPagetest to do a main-thread breakdown if a timeline
is captured. In addition to the individual component times, I also expose
an "Idle" time, which is the time between the start of the first event and
the end of the last event that isn't accounted for by the individual
timeline events (in theory that should be the time Chrome spends waiting
for stuff - mostly the network).

I ran the same URL list on Windows VMs (no GPU), as well as on a physical
machine with a GPU and on some Motorola Razrs with Android 4.1.2. The VMs
ran the full list, and we have data from the top 25k or so on the GPU
machine and Android. There are raw CSVs with all of the results here, and
each has a link to the WebPagetest test and full timeline data for the
test (the mobile links will only work for Googlers, but I can push results
to the public instance as needed - the VM and GPU tests should all be
fully available).

Windows VMs (no GPU), full list:

(773488 results) | Including Idle | No Idle |
Idle | 80.17% | |
Program | 5.13% | 25.85% |
EvaluateScript | 4.53% | 22.85% |
FunctionCall | 2.32% | 11.72% |
Layout | 1.93% | 9.75% |
ParseHTML | 1.58% | 7.98% |
Paint | 1.55% | 7.81% |
ResourceReceivedData | 0.99% | 5.01% |
RecalculateStyles | 0.59% | 2.96% |
GCEvent | 0.50% | 2.53% |
DecodeImage | 0.48% | 2.44% |
ResizeImage | 0.10% | 0.49% |
TimerFire | 0.07% | 0.33% |
EventDispatch | 0.02% | 0.12% |
ResourceReceiveResponse | 0.01% | 0.05% |
ScrollLayer | 0.01% | 0.05% |
XHRReadyStateChange | 0.01% | 0.04% |
FireAnimationFrame | 0.00% | 0.00% |
XHRLoad | 0.00% | 0.00% |
Physical machine with GPU, top ~25k:

(26497 results) | Including Idle | No Idle |
Idle | 84.12% | |
EvaluateScript | 2.82% | 17.75% |
Program | 2.64% | 16.60% |
Paint | 2.23% | 14.02% |
FunctionCall | 1.82% | 11.49% |
Layout | 1.58% | 9.92% |
CompositeLayers | 0.83% | 5.21% |
ParseHTML | 0.74% | 4.69% |
ScrollLayer | 0.65% | 4.11% |
ResourceReceivedData | 0.52% | 3.25% |
RecalculateStyles | 0.50% | 3.14% |
DecodeImage | 0.49% | 3.06% |
GCEvent | 0.46% | 2.90% |
ResizeImage | 0.27% | 1.69% |
FireAnimationFrame | 0.08% | 0.53% |
PaintSetup | 0.08% | 0.49% |
TimerFire | 0.07% | 0.45% |
EventDispatch | 0.04% | 0.23% |
XHRReadyStateChange | 0.03% | 0.22% |
ResourceReceiveResponse | 0.02% | 0.14% |
XHRLoad | 0.01% | 0.09% |
Mobile (Motorola Razr, Android 4.1.2, Chrome 28.0.1500.45):

(23362 Results) | Including Idle | No Idle |
Idle | 46.17% | |
Rasterize | 17.73% | 32.94% |
EvaluateScript | 6.38% | 11.86% |
Program | 6.25% | 11.62% |
FunctionCall | 5.17% | 9.61% |
Layout | 3.27% | 6.07% |
ResourceReceivedData | 2.94% | 5.47% |
ScrollLayer | 2.74% | 5.09% |
Paint | 2.01% | 3.73% |
ParseHTML | 1.86% | 3.46% |
GCEvent | 1.44% | 2.67% |
RecalculateStyles | 1.25% | 2.33% |
CompositeLayers | 0.94% | 1.76% |
TimerFire | 0.88% | 1.63% |
DecodeImage | 0.55% | 1.02% |
FireAnimationFrame | 0.24% | 0.44% |
EventDispatch | 0.08% | 0.16% |
XHRReadyStateChange | 0.05% | 0.09% |
ResourceReceiveResponse | 0.02% | 0.03% |
XHRLoad | 0.02% | 0.03% |
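For reference, a minimal sketch of how such a breakdown can be computed from timeline records. This is not the actual WebPagetest code, and it simplifies: it assumes the events are already flattened to non-overlapping top-level records, each a dict with `type`, `startTime`, and `endTime` in ms (real timeline events nest):

```python
from collections import defaultdict

def main_thread_breakdown(events):
    """Bucket top-level timeline event durations by type, and attribute the
    unaccounted remainder of the trace window to 'Idle'.

    Simplifying assumption (see lead-in): top-level events don't overlap,
    so summing durations doesn't double-count."""
    busy = defaultdict(float)
    for e in events:
        busy[e["type"]] += e["endTime"] - e["startTime"]
    window = (max(e["endTime"] for e in events) -
              min(e["startTime"] for e in events))
    idle = window - sum(busy.values())

    including_idle = {t: ms / window for t, ms in busy.items()}
    including_idle["Idle"] = idle / window
    busy_total = sum(busy.values())
    no_idle = {t: ms / busy_total for t, ms in busy.items()}
    return including_idle, no_idle

events = [
    {"type": "ParseHTML", "startTime": 0, "endTime": 40},
    {"type": "Layout", "startTime": 100, "endTime": 130},
    {"type": "Paint", "startTime": 130, "endTime": 150},
]
print(main_thread_breakdown(events))
# Idle = 150 - 90 = 60ms of the 150ms window
```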