Talos - Replacing tscroll,tsvg with tscrollx,tsvgx


Avi Halachmi

Jul 23, 2013, 6:22:04 PM
to dev-pl...@lists.mozilla.org
TL;DR: Talos tsvg,tscroll are affected by timing much more than by rendering performance because they don't stress Firefox. tsvgx,tscrollx do stress Firefox: their results are different, better, and noisier than those of the old tests. They will soon replace the old tests.


On talos, tsvg and tscroll mostly measure the overall (or average) duration to complete different cases of animation/scroll.

However, it recently became clear [1,2,3] that the results of tsvg and tscroll are affected by timer accuracy much more than by rendering performance.

This happens because the scroll/animation iterations run at fixed intervals which don't stress Firefox, so the talos result is typically just the sum of the timeout durations, and Firefox is idle for most of each iteration.

While tscroll initially iterated at 10ms and later at rAF intervals (16.7ms) [2], tsvg was much worse: it typically iterated at intervals of 100-200ms [3], leaving some tests idle for 95%+ of each interval.
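To make the arithmetic concrete, here is a minimal sketch of the old behaviour. It is a simplified model, not the actual talos harness: the function names and the sample numbers (100 iterations, 5ms of rendering work) are hypothetical and only illustrate why the result tracks the timers rather than the rendering cost.

```javascript
// Simplified, hypothetical model of a fixed-interval test (not talos code).
// An iteration can't finish before its timer fires, nor before the work is done.
function measuredDuration(iterations, intervalMs, workMs) {
  return iterations * Math.max(intervalMs, workMs);
}

// Fraction of each iteration during which Firefox has nothing to do.
function idleFraction(intervalMs, workMs) {
  const iterationMs = Math.max(intervalMs, workMs);
  return (iterationMs - workMs) / iterationMs;
}

const baseline = measuredDuration(100, 100, 5);    // 100ms interval, 5ms of work
const regressed = measuredDuration(100, 100, 10);  // rendering became twice as slow
const lateTimers = measuredDuration(100, 101, 5);  // timers fire 1ms late

console.log(baseline, regressed, lateTimers); // 10000 10000 10100
console.log(idleFraction(100, 5));            // 0.95
```

In this model, doubling the rendering cost leaves the result unchanged, while a 1ms change in timer behaviour moves it by 1%.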

As a result, tsvg and tscroll have extremely low variance, except when timer-related patches land. Needless to say, this provides little regression info on either our scroll performance or our svg performance.


Introducing: actual stress test.

tsvgx and tscrollx are almost identical to their non-x counterparts, except that they try to iterate as fast as possible, which makes their results much better representations of our svg/scroll rendering performance.

While it could have been implemented using setTimeout(loop, 0), this approach doesn't exercise the entire rendering pipeline, because the refresh driver (which flushes layout changes) iterates at its own rate (typically 60Hz).

So iterating the animation with timeouts of 0ms would result in some iterations which are not flushed and therefore not rendered. And if the rendering bottleneck is flushes/paints (and it is), these would still happen at a 60Hz rate, almost regardless of how fast the loop iterates.
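The effect can be illustrated with a toy simulation. This is a simplified model and not Gecko code: the function name, the microsecond units, and the 1ms/4ms loop intervals (stand-ins for a near-zero timeout) are assumptions made for the illustration. The animation loop marks the page dirty on every iteration, while a separate refresh tick flushes and paints at most once per tick.

```javascript
// Toy model (an assumption, not Gecko code): times are integer microseconds.
function simulate(durationUs, loopIntervalUs, refreshIntervalUs) {
  let updates = 0;     // iterations of the animation loop
  let paints = 0;      // refresh ticks that actually had something to flush
  let dirty = false;   // true when an update has not been flushed yet
  let nextUpdate = 0;
  let nextRefresh = 0;

  // Process the earliest pending event until the time budget runs out.
  while (Math.min(nextUpdate, nextRefresh) < durationUs) {
    if (nextUpdate <= nextRefresh) {
      updates++;
      dirty = true;
      nextUpdate += loopIntervalUs;
    } else {
      if (dirty) {
        paints++;
        dirty = false;
      }
      nextRefresh += refreshIntervalUs;
    }
  }
  return { updates, paints };
}

const fastLoop = simulate(1000000, 1000, 16667); // loop every 1ms, refresh at ~60Hz
const slowLoop = simulate(1000000, 4000, 16667); // loop every 4ms, refresh at ~60Hz

console.log(fastLoop); // { updates: 1000, paints: 60 }
console.log(slowLoop); // { updates: 250, paints: 60 }
```

In this model, running the loop four times as fast adds unflushed iterations but not a single extra paint, so the paint cost never becomes the thing being measured.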

The solution to exercise the entire rendering pipeline therefore starts by flushing layouts as fast as possible. This is possible by setting the pref layout.frame_rate to a very high value (we use 10000), which in practice affects only one place in the code: the refresh driver will always use a 0ms timeout to its next iteration.
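For anyone who wants to try this locally, the pref could be set with a line like the following in a profile's user.js. This is only a sketch of the standard prefs-file syntax; how the talos harness itself applies the pref is not shown here.

```javascript
// Profile prefs fragment (user.js). The value 10000 is the one mentioned above;
// it makes the refresh driver schedule its next iteration with a 0ms timeout.
user_pref("layout.frame_rate", 10000);
```

The same pref can also be changed from about:config. It is meant for testing and not for regular browsing.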

Note that 0ms timeouts at the refresh driver can and do also happen in many real-world cases - when, for one of many valid reasons, it takes more than 16ms to flush the layout, or in general, to complete a refresh driver "tick".

Not unexpectedly, this stress test exposed some issues, such as paint starvation [4] and its temporary solution [5], the need for a way to iterate quickly on OS X since by default paints are blocked on vblank [6], and some gaming concerns regarding vsync. Most of these issues are already solved (even if temporarily).

The new tests are already active but hidden, and you can view their results on tbpl by adding ?showall=1 to the URL (they're at the T(T) talos item), or if you want to display only these new tests, use https://tbpl.mozilla.org/?showall=1&jobname=rafx .

Due to their different and stressful nature, their results are not comparable with those of the old tests. They're also noisier (though still acceptably so), but hopefully they'll give us better detection of rendering regressions than the old tests.

So expect the old tsvg,tscroll to be replaced by the new tests soon [7].

Any comments on the new approach or otherwise are welcome.


[1] Bug 590422: remove timers filter - this improves timer accuracy and resulted in several talos regressions, which initiated this research into tsvg, tscroll and "ASAP iterations" mode.
[2] Bug 845943: tscroll depends incorrectly on timing.
[3] Bug 854746: tsvg regression (from having better timers).
[4] Bug 880036: Firefox can hang (paint starvation) temporarily when it can't keep up with iterations.
[5] Bug 884955: temporary measure to prevent paint starvation.
[6] Bug 888899: allow fast iterations on OS X (because otherwise it blocks on vblank).
[7] Bug 897054: replace tsvg,tscroll with the x versions.

Ted Mielczarek

Jul 29, 2013, 7:33:07 AM
to dev-pl...@lists.mozilla.org
On 7/23/2013 6:22 PM, Avi Halachmi wrote:
> TL;DR: Talos tsvg,tscroll are affected by timing much more than by rendering performance because they don't stress Firefox. tsvgx,tscrollx stress firefox: their results are different, better, noisier than the old tests. Will soon replace the old tests.
I just wanted to say thanks for actually digging into these tests and
finding a way to change them to measure what you care about. We have a
*lot* of Talos tests that get run and they're not always strongly-owned,
so making sure the data they generate is useful is an extremely valuable
task. I wish all of our Talos tests would get this kind of scrutiny!

-Ted
