Hi Gijs,
None of those things are true, in my opinion. For what it's worth, the expected regression in CART was closer to 20% than to 40%. The number surprises me a little, and we'll look into what makes CART regress so badly (on our test servers) specifically. We haven't looked closely at CART, as we mostly looked into TART and Tsvgr while investigating.
I'll address each point individually:
a) Extremely hacky work for a 1% gain does seem like a bad idea. Depending on how hacky and how maintainable, that might have been the wrong decision :-). But 1% gains can still accumulate, so I certainly wouldn't call them useless.
b) You have to look at what the tests are meant to do. Avi knows more about the tests than I do and has already supplied some information, but there's a difference between tweaking the UI or a specific portion of rendering, and radically changing the way we draw things. The tests might not be a good reflection of the average user experience, but they help us catch situations where we unwittingly harm performance, or where we want proof that a change in some core algorithm does indeed make things run faster. That makes them very useful, just not for radical architectural changes. For what it's worth, I'm not claiming this will be a net improvement in what CART or TART measure; there are, however, other interactions which improve that are inherently linked to this one and cannot be decoupled (because it is an architectural change). Similarly, we only run our tests on one hardware configuration, which in my mind again stresses their purpose as a relative regression test rather than as something representative of perceived UX.
c) A couple of things here. First of all, we consulted with people outside of the gfx team about this, and we were in agreement with the people we talked to. When it comes to moving forward architecturally we should always be ready to accept something regressing; that has nothing to do with which team is doing the regressing, and everything to do with what we're trying to accomplish. I can guarantee you, for example, that e10s will cause at least some significant regressions in some situations, yet we may very well have to accept those regressions to offer our users big improvements in other areas, as well as to pave the way for future improvements. With OMTA, for example, several aspects of our (UI) performance can be improved in a way that no amount of TART improvements in the old architecture ever could.
Now I don't want to repeat too much of what I've already said, but I'd like to reiterate that if there are 'real' regressions in the overall user experience, we will of course attempt to address them; we are also only a pref change away from going back to on-main-thread compositing.
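(For reference, turning it off looks roughly like this in a prefs.js; I'm writing the pref name from memory here, so verify it against the tree before relying on it:

```javascript
// Fall back to on-main-thread compositing.
// Pref name quoted from memory; double-check it against the
// current source before use.
user_pref("layers.offmainthreadcomposition.enabled", false);
```

The same switch can of course be flipped in about:config on a running build.)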
The focus should, in my opinion, be on what has actually been affected by this in a way that has a strong, negative impact on user experience. Considering the scope of this change I am certain those things exist, and they will need to be fixed. Perhaps the CART regression -is- a sign of some real, unacceptable perceived performance problem; I'm not ruling that out, but we have not identified an interaction which significantly regressed. This is all extremely hardware dependent though (e.g. on most of my machines TART and CART both improve with OMTC), so if someone -has- seen this cause a significant performance regression in some sort of interaction, do let us know.
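To make the 'ASAP' compositing contention I described in my earlier mail (quoted below) a bit more concrete, here's a toy model. All the numbers in it are invented purely for illustration; the real behaviour depends on the DXGI synchronization, the drivers and the hardware:

```python
# Toy model of double-buffer lock contention under 'ASAP' compositing.
# All numbers here are invented for illustration; the real behaviour
# depends on the DXGI synchronization, the drivers and the hardware.

def simulate(composite_interval, frames=100, composite_cost=2, paint_cost=4):
    """Return how many time steps the main thread spends blocked.

    The compositor holds the shared buffer lock for `composite_cost`
    steps out of every `composite_interval` steps. Before painting a
    frame, the main thread copies the previously validated area from
    the front buffer to the back buffer, which needs that same lock
    for `paint_cost` steps.
    """
    blocked = 0
    t = 0
    for _ in range(frames):
        # Spin until the compositor releases the buffer lock.
        while t % composite_interval < composite_cost:
            blocked += 1
            t += 1
        t += paint_cost  # front-to-back copy plus rasterization
    return blocked

asap = simulate(composite_interval=3)    # compositing nearly all the time
vsync = simulate(composite_interval=16)  # compositing once per 'vsync'
assert asap > vsync  # ASAP mode blocks the main thread far more often
```

This is a cartoon, of course; in the real compositor the blocking happens inside the DXGI lock rather than a spin loop, but the shape of the problem is the same: the more often the compositor holds the buffers, the more often the main thread's rasterization has to wait.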
Best regards,
Bas
Looking on from m.d.tree-management: on Fx-Team, the merge of this
change caused a >40% CART regression, too, which wasn't listed in the
original email. Was this unforeseen, and if not, why was this
considered acceptable?
As gavin noted, considering how hard we fought for 2% improvements (one
of the Australis folks said yesterday "1% was like Christmas!"), despite
our arguments that things were really OK for some of the same reasons
you gave (e.g. running in ASAP mode isn't realistic, "TART is
complicated", ...), this hurts - it makes it seem like (a) our
(sometimes extremely hacky) work was done for no good reason, or (b) the
test is fundamentally flawed and we're better off without it, or (c)
when the gfx team decides it's OK to regress it, that's fine, but not when
it happens to other people, quite irrespective of the reasons given.
All/any of those being true would give me the sad feelings. Certainly it
feels to me like (b) is true if this is really meant to be a net
perceived improvement despite causing a 40% performance regression in
our automated tests.
~ Gijs
On 18/05/2014 19:47, Bas Schouten wrote:
> Hi Gavin,
>
> There have been several e-mails on different lists, and some communication on some bugs. Sadly, the story isn't collected anywhere in a condensed form at this point, but I will try to highlight a couple of core points; some of these will be updated further as the investigation continues. The official bug is bug 946567, but the numbers and the discussion there are far outdated (there's no 400% regression ;)):
>
> - What OMTC does to TART scores differs wildly per machine: on some machines we saw up to 10% improvements, on others up to 20% regressions. There also seems to be somewhat more of a regression on Win7 than on Win8. What the average is for our users is very hard to say; frankly, I have no idea.
> - One core cause of the regression is that we're now dealing with two D3D devices when using Direct2D, since we do D2D drawing on one thread and D3D11 composition on the other. This means we have DXGI locking overhead to synchronize the two. This is unavoidable.
> - Another cause is that we now have two surfaces in order to do double buffering, which means we need to initialize more resources when new layers come into play. This, again, is unavoidable.
> - Yet another cause is that for some tests we composite 'ASAP' to get interesting numbers, but this causes some contention scenarios which are less likely to occur in real-life usage: the double buffer may copy the area validated in the last frame from the front buffer to the back buffer in order to avoid having to redraw much more, and if the compositor is compositing all the time, that copy can block the main thread's rasterization. I have some ideas on how to improve this, but I don't know how much they'll help TART; in any case, some cost here will be unavoidable as a natural consequence of double buffering.
> - The TART numbers story is complicated; sometimes it's hard to know what exactly they do and don't measure (which might differ with and without OMTC) and how that affects practical performance. I've been told this by Avi and it matches my practical experience with the numbers. I don't know the exact reasons, and Avi is probably a better person to talk to about this than I am :-).
>
> These are the core causes we were able to identify from profiling. Other than that, the things I said in my previous e-mail still apply. We believe we're offering significant UX improvements with async video, and are enabling more significant improvements in the future. Once we've fixed the obvious problems we will continue to see if there's something that can be done, either through tiling or through other improvements; particularly for the last point I mentioned, there might be some not-'too'-complex things we can do to get a small improvement back.
>
> If we want to have a more detailed discussion we should probably pick a list to have this on and try not to spam people too much :-).
>
> Bas
>
>> but tart will regress by ~20%, and several other suites will regress as well.
>> We've investigated this extensively and we believe the majority of these
>> regressions are due to the nature of OMTC and the fact that we have to do
>> more work.
>
> Where can I read more about the TART investigations? I'd like to
> understand why it is seen as inevitable, and get some of the details
> of the regression. OMTC is important, and I'm excited to see it land
> on Windows, but the Firefox and Performance teams have just come off a
> months-long effort to make significant wins in TART, and the thought
> of taking a 20% regression (huge compared to some of the improvements
> we fought for) is pretty disheartening.
>
> Gavin
>
> On Sun, May 18, 2014 at 12:16 AM, Bas Schouten <bsch...@mozilla.com> wrote:
>> Hey all,
>>
>> After quite a lot of waiting, we've switched on OMTC on Windows by default today (bug 899785). This is a great step towards moving all our platforms onto OMTC (only Linux is left now), and will allow us to remove a lot of code that we currently duplicate. Furthermore, it puts us on track for enabling other features on desktop, like APZ, off-main-thread animations and other improvements.
>>
>> Having said that, we realize that what we've currently landed and turned on is not completely bug free. There are several bugs still open (some more serious than others) which we will be addressing in the coming weeks, hopefully before the merge to Aurora. The main reason we've switched it on now is that we want to get as much data as possible from the nightly channel and our nightly user base before the Aurora merge, as well as to prevent any new regressions from creeping in while we fix the remaining problems. This was extensively discussed, both internally in the graphics team and externally with other people, and we believe we're at a point now where things are sufficiently stabilized for our nightly audience. OMTC is enabled and disabled with a single pref, so if unforeseen, serious consequences occur we can disable it quickly at any stage. We will inevitably find new bugs in the coming weeks; please link any bugs you happen to come across to bug 899785. If anything seems very serious, please let us know; we'll attempt to come up with a short-term solution rather than disabling OMTC and reducing the amount of feedback we get.
>>
>> There are also some important notes on performance, which we expect to be reported by our automated systems:
>>
>> - Bug 1000640 is about WebGL. Currently OMTC regresses WebGL performance considerably; patches to fix this are underway and it should be fixed in the very short term.
>>
>> - Several of the Talos test suite numbers will change considerably (especially with Direct2D enabled): Tscroll, for example, will improve by ~25%, but TART will regress by ~20%, and several other suites will regress as well. We've investigated this extensively and we believe the majority of these regressions are due to the nature of OMTC and the fact that we have to do more work. We see no value in holding off OMTC because of these regressions, as we'll have to go there anyway. Once the last correctness and stability problems are solved we will go back to trying to win back some of the regressed performance. We're also planning to move to a system more like tiling on desktop, which will change the performance characteristics significantly again, so we don't want to sink too much time into optimizing the current situation.
>>
>> - Memory numbers will increase somewhat. This is unavoidable: there are several things off-main-thread compositing requires (like double buffering) which inherently use more memory.
>>
>> - On a brighter note: Async video is also enabled by these patches. This means that when the main thread is busy churning JavaScript, instead of stuttering your video should now happily continue playing!
>>
>> - Also, there are some indications of a subjective improvement in scrolling performance as well.
>>
>>
>> If you have any questions please feel free to reach out to myself or other members of the graphics team!
>>
>>
>> Bas