My three laptops have relatively comparable hardware and run Chrome on Windows, Mac, and Linux respectively. The Linux version of Chrome feels ridiculously faster than Windows and Mac. Do we understand why this is? Can we make Windows and Mac feel that fast too?
General observations:
1) Scroll performance is extremely good. Even on Gmail, I can only get the mouse to lead the scroll bar by a dozen pixels. On Slashdot, it doesn't even look like I can do that.
2) Tab creation is very fast. Maybe the zygote is helping here? Can we pre-render the NTP on other platforms?
What version of Windows are you using? I find the double-buffering on Vista
and Win7 to have a big negative impact on performance as compared to WinXP.
I'm always delighted to run Chrome on my old WinXP laptop. It seems so
much faster there.
On X-windows, the renderer backingstores are managed by the X server, and
the transport DIBs are also managed by the X server. So, we avoid a lot of
memcpy costs incurred on Windows due to keeping the backingstores in main
memory there.
I suspect this is at least one of the bigger issues.
I also suspect that process creation is a problem on Windows. We should
probably look into having a spare child process on Windows to minimize new
tab jank. Maybe there is a bug on this already?
On Tue, Oct 27, 2009 at 10:27 PM, Darin Fisher <da...@chromium.org> wrote:
> What version of Windows are you using? I find the double-buffering on Vista
> and Win7 to have a big negative impact on performance as compared to WinXP.
> I'm always delighted to run Chrome on my old WinXP laptop. It seems so
> much faster there.
I lied. I actually have four laptops. So both Vista and XP. However, the Vista one has worse specs so I wasn't counting it.
> On X-windows, the renderer backingstores are managed by the X server, and
> the transport DIBs are also managed by the X server. So, we avoid a lot of
> memcpy costs incurred on Windows due to keeping the backingstores in main
> memory there.
We don't draw into a device dependent bitmap on Windows? Is that not similar? I was wondering if core IPC latency was lower on Linux. That number bleeds into a lot of other times.
> I suspect this is at least one of the bigger issues.
> I also suspect that process creation is a problem on Windows. We should
> probably look into having a spare child process on Windows to minimize new
> tab jank. Maybe there is a bug on this already?
If we're not doing that already, that seems like it might be a big win.
On Tue, Oct 27, 2009 at 10:35 PM, Adam Barth <aba...@chromium.org> wrote:
> On Tue, Oct 27, 2009 at 10:27 PM, Darin Fisher <da...@chromium.org> wrote:
> > What version of Windows are you using? I find the double-buffering on Vista
> > and Win7 to have a big negative impact on performance as compared to WinXP.
> > I'm always delighted to run Chrome on my old WinXP laptop. It seems so
> > much faster there.
> I lied. I actually have four laptops. So both Vista and XP.
> However, the Vista one has worse specs so I wasn't counting it.
> > On X-windows, the renderer backingstores are managed by the X server, and
> > the transport DIBs are also managed by the X server. So, we avoid a lot of
> > memcpy costs incurred on Windows due to keeping the backingstores in main
> > memory there.
> We don't draw into a device dependent bitmap on Windows? Is that not
> similar? I was wondering if core IPC latency was lower on Linux.
> That number bleeds into a lot of other times.
We do not. We once did, but DDBs are a very limited resource on Windows. They get charged against the desktop process, and if you exceed the seemingly artificial cap, then the system will start having serious problems. New apps will fail to run properly. No remote desktop for you, etc.
So, we switched away from DDBs and just use DIBs. (We use a pixel depth to match your display--kind of.)
> > I suspect this is at least one of the bigger issues.
> > I also suspect that process creation is a problem on Windows. We should
> > probably look into having a spare child process on Windows to minimize new
> > tab jank. Maybe there is a bug on this already?
> If we're not doing that already, that seems like it might be a big win.
We are most definitely not doing that yet. We could also just move process creation to a background thread. An unused process might just get swapped out and be no cheaper to "make live" than it would be to create a new process.
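The "move process creation to a background thread" idea can be sketched roughly as follows. This is only an illustration in Python, not Chrome code: `CHILD_CMD`, `launch_child`, and `open_tab` are hypothetical names, and the child is a trivial stand-in for a renderer.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a renderer: reads one line of work and answers it.
CHILD_CMD = [sys.executable, "-c",
             "import sys; print(sys.stdin.readline().strip().upper())"]

def launch_child():
    """Blocking process creation; this is the call we want off the UI thread."""
    return subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

# A single worker thread that does nothing but create processes.
launcher = ThreadPoolExecutor(max_workers=1)

def open_tab(url):
    # Start process creation in the background and return to the caller at once.
    future = launcher.submit(launch_child)
    # ... the calling thread is free to paint the tab strip here ...
    child = future.result()  # block only at the point the process is really needed
    out, _ = child.communicate(url + "\n")
    return out.strip()
```

The cost of creating the process is the same; the sketch only shows that the caller does not have to sit idle while it happens.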
When we were all out in MtnView last, one of the action items for some
of the Mac QA folks was to get a machine that triple-boots
(Mac/Win/Linux) so that we could run the same version of chrome on the
same hardware and see the differences between platforms and then to
run a bunch of tests (startup, new tab, page-cycler, etc). I'm pretty
sure krisr got the machine created, but I don't think we ever ran any
tests on it beyond that.
Anyone know what happened to our best-laid plans? This seems like
something we should be very active in tracking.
Darin Fisher wrote:
> I suspect this is at least one of the bigger issues.
> I also suspect that process creation is a problem on Windows. We should
> probably look into having a spare child process on Windows to minimize new
> tab jank. Maybe there is a bug on this already?
This shouldn't be restricted to Windows; we should do it on all platforms. And we should start the first one as early as possible during the startup process.
When I benchmarked this a few months ago on a fairly ordinary Mac, it took nearly 100ms from the time that the browser started a renderer to the time that the renderer was ready to service requests. A decent chunk of that is load time and pre-main initialization in system libraries. It's beyond our control, but there's no reason we can't make it happen sooner.
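A measurement of this kind (time from starting a child to the child reporting that it can service requests) could be sketched like this. It is a generic Python illustration, not the benchmark described above: `CHILD_CMD` and `time_to_ready` are hypothetical, and the child here does no real initialization.

```python
import subprocess
import sys
import time

# Hypothetical child: reports readiness, then waits for one line before exiting.
CHILD_CMD = [sys.executable, "-c", "print('ready', flush=True); input()"]

def time_to_ready():
    """Return (status, milliseconds) from process start to the 'ready' message."""
    start = time.perf_counter()
    child = subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True)
    line = child.stdout.readline()  # blocks until the child says it is ready
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    child.communicate("quit\n")     # let the child exit cleanly
    return line.strip(), elapsed_ms
```

The interval includes load time and everything the child does before its own code runs, which is the part a warmed-up spare would hide.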
On Tue, Oct 27, 2009 at 9:11 PM, Adam Barth <aba...@chromium.org> wrote:
> My three laptops have relatively comparable hardware and run Chrome on
> Windows, Mac, and Linux respectively. The Linux version of Chrome
> feels ridiculously faster than Windows and Mac. Do we understand why
> this is? Can we make Windows and Mac feel that fast too?
My first instinct is to say because (1) we're awesome and (2) Linux is awesome, but I'd prefer to have facts back it up. :)
There's a "perf" link on http://build.chromium.org that has builders tracking various metrics. If we get perf tests for the behaviors you care about, we can better compare and improve them.
On the other hand, I'm not sure if the hardware lines up between platforms so maybe the comparisons I do below are not valid...
> General observations:
General comments: Linux tends to be "lighter" which means it does better on older hardware, so depending on what sorts of laptops you're talking about that could be a major factor. Windowses later than 2000 or so need surprising amounts of hardware to run well. (I don't mention Mac below because there hasn't been much performance work there yet.)
> 1) Scroll performance is extremely good. Even on Gmail, I can only
> get the mouse to lead the scroll bar by a dozen pixels. On Slashdot,
> it doesn't even look like I can do that.
On "plain" pages (one scrollbar on the right, no Flash) scrolling is literally shifting the pixels down. On Linux we do this by sending a command to the X server, which is a single process that even has the graphics drivers built in so it talks directly to your graphics card and can in theory do a hardware-accelerated copy. I would expect this to be pretty fast.
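To make the "shifting the pixels" point concrete, here is a toy sketch in Python. It assumes a backing store modelled as a list of rows and a hypothetical `render_row` callback; it is not how Chrome or the X server implements scrolling, only the general shape of a scroll-by-copy.

```python
def scroll_by(rows, offset, dy, render_row):
    """Scroll a visible window of rows forward by dy document rows.

    rows holds the visible rows for document rows [offset, offset + len(rows)).
    Rows that stay visible are moved (the cheap copy); only the newly exposed
    strip is rendered. Returns the new offset.
    """
    height = len(rows)
    dy = min(dy, height)
    # The copy step: existing rows slide up, nothing is re-rendered.
    rows[:height - dy] = rows[dy:]
    new_offset = offset + dy
    # Only the exposed strip at the bottom needs real rendering work.
    for i in range(height - dy, height):
        rows[i] = render_row(new_offset + i)
    return new_offset
```

Scrolling a four-row window by one row calls `render_row` once, not four times, which is why a plain page scrolls cheaply when the copy itself is fast.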
However, Gmail is a "complicated" page (the main scrollbar is an iframe) so in that case I guess rendering speed is getting involved. There I'd expect Windows Chrome to be faster because the compiler is better and there have been more people looking at performance (I saw in another thread that tcmalloc, currently only used on Windows, improved the page cycler by 50%?).
The page cycler perf graphs are intended to test rendering speed. Do the numbers match your perception? I can't get the right graphs to load right now. It looks like spew from NOTIMPLEMENTED()s may be obscuring the data.
> 2) Tab creation is very fast. Maybe the zygote is helping here? Can
> we pre-render the NTP on other platforms?
The zygote is paused right at process start, before we've even started a renderer. On the other hand Windows process creation is more expensive.
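For readers unfamiliar with the zygote pattern in general, a rough POSIX-only sketch in Python follows. It only illustrates forking children from a parent that was set up once; `PRELOADED` and `run_zygote_child` are hypothetical names, and this is not Chrome's zygote code.

```python
import os

# Whatever setup the parent does happens once, before any fork.
PRELOADED = {"greeting": "rendered: "}

def run_zygote_child(preloaded, request):
    """Fork a child that inherits `preloaded` and handles one request."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it already has the parent's state via copy-on-write memory,
        # so no exec and no re-initialization is needed.
        os.close(read_fd)
        reply = preloaded["greeting"] + request
        os.write(write_fd, reply.encode())
        os._exit(0)
    # Parent: collect the reply and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return data.decode()
```

How much this saves depends entirely on how much work is done before the fork; a parent paused right at process start has little to share beyond the loaded binary.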
There is a "new tab" graph that attempts to measure this. The various lines on the graph are tracking how quickly we get to each stage in constructing the page. We hit the first line 20ms faster on Linux than Windows likely due to the zygote and "slow" Windows process creation, but process startup seems to be a relatively small part of the total time. Linux hits other lines later and Linux and Windows hit the finish line at around the same time.
In your case, I wonder if you have more history accumulated on your Windows profile, making the new tab computation more expensive than the equivalent one on the Linux box.
I'd expect the faster file system on Linux to eventually help here. (My experience with git has been you get an order of magnitude slower each step from Linux->Mac->Windows, but that could be git or hardware-specific.)
> 3) Startup time is faster than calculator.
I'm not sure if you're kidding. Do you mean Windows calculator? Maybe there's something wrong with your Windows box -- maybe a virus scanner or disk indexer or some other crap procmon will show is continually thrashing your computer. Or maybe you have a spare Chrome instance on another virtual desktop on your Linux box so clicking the Chrome button is just telling it to show another window.
The startup tests are intended to track startup performance, and again the Windows graphs are much better than the Linux ones. However, the difference between the two is milliseconds and my experience as a user is that Chrome rarely starts that fast, so I wonder if these graphs are really measuring what a user perceives (which frequently involves disk).
In the limit, I'd expect us to pay a lot more on Linux due to using more libraries, GTK initialization, round trips to the X server, etc. but I don't know much about Windows here.
On Wed, Oct 28, 2009 at 9:07 AM, Dan Kegel <d...@kegel.com> wrote:
> On Wed, Oct 28, 2009 at 8:05 AM, Evan Martin <e...@chromium.org> wrote:
>>> 3) Startup time is faster than calculator.
>> I'm not sure if you're kidding. Do you mean Windows calculator?
> On my home linux box (Jaunty, reasonably fast),
> warm startup time of chrome is less
> than the warm startup time of gnome calculator.
> Strange but true.
Yeah, that's why I was asking -- it wouldn't surprise me if we were faster than the GNOME one. Without any numbers I'd blame the icon theme stuff (Elliot found it can do a truly ludicrous number of disk accesses on startup).
On Wed, Oct 28, 2009 at 9:20 AM, Evan Martin <e...@chromium.org> wrote:
>>>> 3) Startup time is faster than calculator.
>>> I'm not sure if you're kidding. Do you mean Windows calculator?
>> On my home linux box (Jaunty, reasonably fast),
>> warm startup time of chrome is less
>> than the warm startup time of gnome calculator.
>> Strange but true.
> Yeah, that's why I was asking -- it wouldn't surprise me if we were
> faster than the GNOME one. Without any numbers I'd blame the icon
> theme stuff (Elliot found it can do a truly ludicrous number of disk
> accesses on startup).
Here's an experiment: Set Options > Personal Stuff > Use GTK+ Theme. Close chrome. Now try another warm start of chrome. Is it still faster than gnome calculator? If not, it's because you were using the classic chrome theme, which doesn't have to warm up GTK's *per process* icon cache. strace gnome-calculator and watch the output for reading directories and files with "icons" in them and be blown away by the amount of work done.
> When I benchmarked this a few months ago on a fairly ordinary Mac, it
> took nearly 100ms from the time that the browser started a renderer to
> the time that the renderer was ready to service requests. A decent
> chunk of that is load time and pre-main initialization in system
> libraries. It's beyond our control, but there's no reason we can't
> make it happen sooner.
Unfortunately, it's nearly impossible to continue a forked process on OS X if it uses any higher-level (above POSIX) APIs. The main problem is that Mach ports can't be replicated across the fork, so if any ports were already open, they'll all be bogus in the new process. And all kinds of stuff in the OS is done via IPC across Mach ports, most significantly to the window server.
It might be possible to create a forkable renderer by doing as much setup as possible without actually invoking any OS X-specific APIs, then initializing the rest after the fork. I don't know if this has ever been tried, or if it would provide sufficient improvement to be worth the effort.
I would expect that rendering speed would suffer somewhat due to the extra layer of pixel buffering incurred by Chrome's renderers. Has anyone experimented with giving the renderer access to a child window of the browser to allow it to draw more directly?
On Wed, Oct 28, 2009 at 1:39 PM, Jens Alfke <s...@google.com> wrote:
> Unfortunately, it's nearly impossible to continue a forked process on
> OS X if it uses any higher-level (above POSIX) APIs.
Nothing says we have to use fork(). Always having a renderer process started and waiting for instructions could also be done via other mechanisms. The same issue affects plugin startup time (e.g., Flash, i.e., YouTube :-)).
> I would expect that rendering speed would suffer somewhat due to the
> extra layer of pixel buffering incurred by Chrome's renderers. Has
> anyone experimented with giving the renderer access to a child window
> of the browser to allow it to draw more directly?
There are no public APIs for cross-process child window rendering or grouping. 10.6 introduces IOSurface, which is roughly speaking a shared GPU texture, which could be useful once the renderer is GPU accelerated.
We could of course use private APIs, but it would be nice to avoid that at least for 10.6 and above.
Jens Alfke wrote:
> Unfortunately, it's nearly impossible to continue a forked process on OS X
> if it uses any higher-level (above POSIX) APIs. The main problem is that
> Mach ports can't be replicated across the fork, so if any ports were already
> open, they'll all be bogus in the new process. And all kinds of stuff in the
> OS is done via IPC across Mach ports, most significantly to the window
> server.
Sure, we understand that. Why does that become a concern with pre-warmed renderers in a way that it's not with the renderers we're using now?
My proposal is to fork a new process, exec the renderer, and then let it bring itself up. That's exactly how we start renderers now. The only difference is that I'm suggesting we should always keep a spare one warmed up and ready to go, and we should start the initial one sooner instead of waiting for something in the browser to say "um, I'm gonna need a renderer." We can pretty much guarantee that we'll always need a renderer, let's give it a head start.
I don't want to just pre-fork a process and have it sit around with its thumb up its Mach port. That wouldn't really gain us much on the Mac anyway, because our fork is relatively cheap. As I mentioned, the big losses that we experience in bringing up a new renderer process are in loading and initialization.
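The "always keep a spare one warmed up" proposal might look roughly like the following. This is a Python sketch under stated assumptions, not Chrome code: `SpareProcessHost` and `CHILD_CMD` are hypothetical, and the child is a trivial stand-in that prints `ready` once it has finished loading.

```python
import subprocess
import sys

# Hypothetical stand-in for a renderer: announces readiness, then serves one request.
CHILD_CMD = [sys.executable, "-c",
             "print('ready', flush=True); print(input().upper())"]

class SpareProcessHost:
    """Keeps exactly one child launched ahead of demand."""

    def __init__(self):
        # Start the first child as early as possible, before anyone asks for it.
        self._spare = self._launch()

    def _launch(self):
        return subprocess.Popen(CHILD_CMD, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, text=True)

    def take(self):
        """Hand out the warmed-up child and start warming its replacement."""
        child, self._spare = self._spare, self._launch()
        # If the child is still initializing, this waits for the rest of it.
        if child.stdout.readline().strip() != "ready":
            raise RuntimeError("child failed to initialize")
        return child

    def shutdown(self):
        # Give the unused spare an empty request so it exits cleanly.
        self._spare.communicate("\n")
```

A caller would do `child = host.take()` and then talk to `child` as usual. The trade-off raised elsewhere in the thread still applies: the spare costs memory while it waits, and it may be paged out by the time it is needed.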
On Oct 28, 2009, at 11:08 AM, Mark Mentovai wrote:
> My proposal is to fork a new process, exec the renderer, and then let
> it bring itself up. That's exactly how we start renderers now. The
> only difference is that I'm suggesting we should always keep a spare
> one warmed up and ready to go,
How much would that increase memory use? (says the guy on the Memory task force...) I.e. what's the RPRVT of a warmed-up renderer process?
> and we should start the initial one
> sooner instead of waiting for something in the browser to say "um, I'm
> gonna need a renderer." We can pretty much guarantee that we'll
> always need a renderer, let's give it a head start.
If bringing up the first renderer is CPU-bound, that would be a great idea. If it's disk-bound, then it could have a negative effect on launch time. Do we have profiles/samples of renderer launch, both warm and cold?
Jens Alfke wrote:
> How much would that increase memory use? (says the guy on the Memory task
> force...) I.e. what's the RPRVT of a warmed-up renderer process?
Does it matter? At least for the startup case, that's a renderer we know we'll need anyway.
You could use this argument to shoot down keeping a spare warmed-up renderer ready to go at other times, but I don't think it's relevant to the startup case.
> If bringing up the first renderer is CPU-bound, that would be a great idea.
> If it's disk-bound, then it could have a negative effect on launch time. Do
> we have profiles/samples of renderer launch, both warm and cold?
I suspect that most (but not all) of the stuff that the renderer needs to read to warm itself up will already be in the buffer cache. The renderer shouldn't be doing much writing. But we could profile it.
On Wed, Oct 28, 2009 at 8:05 AM, Evan Martin <e...@chromium.org> wrote:
> General comments: Linux tends to be "lighter" which means it does
> better on older hardware, so depending on what sorts of laptops you're
> talking about that could be a major factor. Windowses later than 2000
> or so need surprising amounts of hardware to run well. (I don't
> mention Mac below because there hasn't been much performance work
> there yet.)
I pulled out the laptops side-by-side to be more scientific about this. Here are the stats:
So, the Linux machine has 20% more CPU to work with.
>> 1) Scroll performance is extremely good. Even on Gmail, I can only
>> get the mouse to lead the scroll bar by a dozen pixels. On Slashdot,
>> it doesn't even look like I can do that.
> On "plain" pages (one scrollbar on the right, no Flash) scrolling is
> literally shifting the pixels down. On Linux we do this by sending a
> command to the X server, which is a single process that even has the
> graphics drivers built in so it talks directly to your graphics card
> and can in theory do a hardware-accelerated copy. I would expect this
> to be pretty fast.
Looking at this more carefully, scroll performance on Slashdot is great in both Windows and Linux. On Gmail (no chat mole), there's a noticeable difference. Here's a visualization of the thumb on the scroll bar:
||
||
||
||
||
||
-- <-- Click here and pull down
--
-- <-- Linux: mouse latency gets to here
||
|| <-- Windows: mouse latency gets to here
||
||
||
||
Admittedly, it's hard to see precisely, but it affects the "feel." Scroll on Windows feels slightly heavier.
>> 2) Tab creation is very fast. Maybe the zygote is helping here? Can
>> we pre-render the NTP on other platforms?
> The zygote is paused right at process start, before we've even started
> a renderer. On the other hand Windows process creation is more
> expensive.
> There is a "new tab" graph that attempts to measure this. The various
> lines on the graph are tracking how quickly we get to each stage in
> constructing the page. We hit the first line 20ms faster on Linux
> than Windows likely due to the zygote and "slow" Windows process
> creation, but process startup seems to be a relatively small part of
> the total time. Linux hits other lines later and Linux and Windows
> hit the finish line at around the same time.
So, I retried this with a fresh profile on both. The differences are not as dramatic as I remember. I can't actually see a difference when I run them side-by-side.
>> 3) Startup time is faster than calculator.
> I'm not sure if you're kidding. Do you mean Windows calculator?
I meant Linux calculator.
> In the limit, I'd expect us to pay a lot more on Linux due to using
> more libraries, GTK initialization, round trips to the X server, etc.
> but I don't know much about Windows here.
I tried turning on the GTK theme. That killed startup performance.
Side by side, startup is easily noticeably faster on Linux. To be more precise, drawing the first pixel is noticeably faster. Total startup time is harder to judge.
Interestingly startup drawing is different between Windows and Linux. We might want to film with a high-speed camera to see exactly what's going on, but here are my impressions:
Linux draw order:
1) Fill entire window with blue (This looks bad, can we use a different color? White?).
2) Paint main UI widgets, including NTP.
3) Paint NTP thumbnails.
I bet that (2) is actually broken into two pieces, I just can't see it.
Windows draw order:
1) Paint NC region (the blue border around the edge).
2) Paint main UI widgets (without omnibox).
3) Blit NTP content area (the sweep from top to bottom is noticeable).
4) Paint omnibox.
5) Paint NTP thumbnails.
Keep in mind that this all happens very fast, so I could be imagining things.
Ideas for improving perceived Windows startup time:
1) Draw a fake omnibox with the rest of the main UI widgets. Currently we draw the omnibox really late and it looks slow and bad. You can see this if you have a dark desktop wallpaper and you focus on where the omnibox will be. You'll see a dark rectangle inside the main toolbar which is the desktop showing through. We should never show a dark rectangle there.
2) Fill the main content area with white when drawing the main UI widgets. You can see this if you focus on the bottom of where the bookmark bar is going to be (especially when the bookmark bar is set to show only on the NTP). You'll see an edge there when the bookmark bar is drawn while the main content area is still transparent. There's no reason we should ever paint an edge there.
I bet the reason Windows startup feels slower is whatever drawing operation we're using for the main content area is slow. The top-to-bottom sweep probably makes me feel like the browser isn't loaded until the sweep reaches the bottom, whereas I feel like Linux is done earlier in its startup sequence.
On Oct 28, 2009, at 11:29 AM, Mark Mentovai wrote:
> You could use this argument to shoot down keeping a spare warmed-up
> renderer ready to go at other times, but I don't think it's relevant
> to the startup case.
We weren't just talking about startup — f'rinstance, Darin mentioned "new-tab jank".
> I suspect that most (but not all) of the stuff that the renderer needs
> to read to warm itself up will already be in the buffer cache.
Not on a cold launch, since the renderer uses a lot of code (like WebCore) that the browser doesn't, and will be paging that stuff in. We'll need to benchmark both scenarios.
On Wed, Oct 28, 2009 at 3:12 PM, Jens Alfke <s...@google.com> wrote:
> Not on a cold launch, since the renderer uses a lot of code (like
> WebCore) that the browser doesn't, and will be paging that stuff in.
> We'll need to benchmark both scenarios.
Indeed. Proof of concept code that we can compare hard numbers about is always better than speculation :-).
On Wed, Oct 28, 2009 at 12:05 PM, Adam Barth <aba...@chromium.org> wrote:
> On Wed, Oct 28, 2009 at 8:05 AM, Evan Martin <e...@chromium.org> wrote:
> > General comments: Linux tends to be "lighter" which means it does better on older hardware, so depending on what sorts of laptops you're talking about that could be a major factor. Windowses later than 2000 or so need surprising amounts of hardware to run well. (I don't mention Mac below because there hasn't been much performance work there yet.)
> I pulled out the laptops side-by-side to be more scientific about this. Here are the stats:
> So, the Linux machine has 20% more CPU to work with.
> >> 1) Scroll performance is extremely good. Even on Gmail, I can only get the mouse to lead the scroll bar by a dozen pixels. On Slashdot, it doesn't even look like I can do that.
> > On "plain" pages (one scrollbar on the right, no Flash) scrolling is literally shifting the pixels down. On Linux we do this by sending a command to the X server, which is a single process that even has the graphics drivers built in so it talks directly to your graphics card and can in theory do a hardware-accelerated copy. I would expect this to be pretty fast.
> Looking at this more carefully, scroll performance on Slashdot is great in both Windows and Linux. On Gmail (no chat mole), there's a noticeable difference. Here's a visualization of the thumb on the scroll bar:
> ||
> ||
> ||
> ||
> ||
> ||
> --  <-- Click here and pull down
> --
> --  <-- Linux: mouse latency gets to here
> ||
> ||  <-- Windows: mouse latency gets to here
> ||
> ||
> ||
> ||
> Admittedly, it's hard to see precisely, but it affects the "feel." Scroll on Windows feels slightly heavier.
> >> 2) Tab creation is very fast. Maybe the zygote is helping here? Can we pre-render the NTP on other platforms?
> > The zygote is paused right at process start, before we've even started a renderer. On the other hand Windows process creation is more expensive.
> > There is a "new tab" graph that attempts to measure this. The various lines on the graph are tracking how quickly we get to each stage in constructing the page. We hit the first line 20ms faster on Linux than Windows likely due to the zygote and "slow" Windows process creation, but process startup seems to be a relatively small part of the total time. Linux hits other lines later and Linux and Windows hit the finish line at around the same time.
> So, I retried this with a fresh profile on both. The differences are not as dramatic as I remember. I can't actually see a difference when I run them side-by-side.
> >> 3) Startup time is faster than calculator.
> > I'm not sure if you're kidding. Do you mean Windows calculator?
> I meant Linux calculator.
> > In the limit, I'd expect us to pay a lot more on Linux due to using more libraries, GTK initialization, round trips to the X server, etc. but I don't know much about Windows here.
> I tried turning on the GTK theme. That killed startup performance.
> Side-by-side, startup is easily noticeably faster on Linux. To be more precise, drawing the first pixel is noticeably faster. Total startup time is harder to say.
> Interestingly, startup drawing is different between Windows and Linux. We might want to film with a high-speed camera to see exactly what's going on, but here are my impressions:
> Linux draw order:
> 1) Fill entire window with blue (This looks bad, can we use a different color? White?).
> 2) Paint main UI widgets, including NTP.
> 3) Paint NTP thumbnails.
> I bet that (2) is actually broken into two pieces, I just can't see it.
> Windows draw order:
> 1) Paint NC region (the blue border around the edge).
> 2) Paint main UI widgets (without omnibox).
> 3) Blit NTP content area (the sweep from top to bottom is noticeable).
> 4) Paint omnibox.
> 5) Paint NTP thumbnails.
> Keep in mind that this all happens very fast, so I could be imagining things.
> Ideas for improving perceived Windows startup time:
> 1) Draw a fake omnibox with the rest of the main UI widgets. Currently we draw the omnibox really late and it looks slow and bad. You can see this if you have a dark desktop wallpaper and you focus on where the omnibox will be. You'll see a dark rectangle inside the main toolbar which is the desktop showing through. We should never show a dark rectangle there.
> 2) Fill the main content area with white when drawing the main UI widgets. You can see this if you focus on the bottom of where the bookmark bar is going to be (especially when the bookmark bar is set to show only on the NTP). You'll see an edge there when the bookmark bar is drawn while the main content area is still transparent. There's no reason we should ever paint an edge there.
> I bet the reason Windows startup feels slower is whatever drawing operation we're using for the main content area is slow. The top-to-bottom sweep probably makes me feel like the browser isn't loaded until the sweep reaches the bottom, whereas I feel like Linux is done earlier in its startup sequence.
For the UI bits, I'm willing to believe that GTK, which uses cairo, hence XRender for rendering, is hardware accelerated and in any case pipelined in another process (X), and so is faster than serialized, software-rendered Skia. How big is the impact? I don't know; we're not talking about a huge number of pixels, but still...
On Wed, Oct 28, 2009 at 12:24 PM, Antoine Labour <pi...@google.com> wrote:
> On Wed, Oct 28, 2009 at 12:05 PM, Adam Barth <aba...@chromium.org> wrote:
>> I bet the reason Windows startup feels slower is whatever drawing operation we're using for the main content area is slow. The top-to-bottom sweep probably makes me feel like the browser isn't loaded until the sweep reaches the bottom, whereas I feel like Linux is done earlier in its startup sequence.
> For the UI bits, I'm willing to believe that GTK, which uses cairo, hence XRender for rendering, is hardware accelerated and in any case pipelined in another process (X), and so is faster than serialized, software rendered Skia. How much is the impact ? I don't know, we're not talking a huge amount of pixels, but still...
I wonder if the problem is that we're using a main-memory-to-video-memory blit to paint the content area on Windows at startup. How hard would it be to use a DDB (device-dependent bitmap) during startup? That would give us a video-memory-to-video-memory blit, which can easily paint the whole screen at >180 fps.
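To put the >180 fps figure in terms of memory traffic, here is a back-of-the-envelope sketch. The 1280x800 window size and 32-bit pixels are assumptions chosen for illustration, not measurements from any of the laptops in this thread, and `blit_bandwidth_mb_per_s` is a made-up helper name.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels


def blit_bandwidth_mb_per_s(width, height, fps):
    """Megabytes per second needed to repaint a width x height area fps times."""
    bytes_per_frame = width * height * BYTES_PER_PIXEL
    return bytes_per_frame * fps / 1e6


# Hypothetical full-window repaint on a 1280x800 panel.
frame_mb = 1280 * 800 * BYTES_PER_PIXEL / 1e6
print(f"one frame: {frame_mb:.3f} MB")
print(f"at 180 fps: {blit_bandwidth_mb_per_s(1280, 800, 180):.2f} MB/s")
print(f"at 60 fps: {blit_bandwidth_mb_per_s(1280, 800, 60):.2f} MB/s")
```

Under those assumptions one full repaint moves about 4 MB, so 180 fps works out to roughly 737 MB/s. Whether the main-memory-to-video-memory path can sustain that on a given machine is exactly what would need measuring.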
On Wed, Oct 28, 2009 at 12:24 PM, Antoine Labour <pi...@google.com> wrote:
> For the UI bits, I'm willing to believe that GTK, which uses cairo, hence XRender for rendering, is hardware accelerated and in any case pipelined in another process (X), and so is faster than serialized, software rendered Skia. How much is the impact ? I don't know, we're not talking a huge amount of pixels, but still...
It's not only GTK mode. On Linux, we upload (most of) the theme images to the X server so blitting the images is done server side and (hopefully) hardware accelerated.
Off the top of my head, the tabstrip and the floating bookmark bar are the only pieces of the Linux UI drawn with Skia.