Talos e10s dashboard

William Lachance

Feb 26, 2016, 6:58:54 PM

Hey,

I wrote up a dashboard for tracking the performance delta between
non-e10s and e10s on the Talos tests on nightly:

https://treeherder.allizom.org/perf.html#/e10s

(sometime next week, https://treeherder.mozilla.org/perf.html#/e10s
will work too)

Note that the tests are not always measuring exactly the same thing, so
be careful of making naive judgements based on this data (see
https://bugzilla.mozilla.org/show_bug.cgi?id=1250620 for more details).
Still, I thought this might be generally interesting so decided to post
a link to it here.

Will

Jim Mathies

Feb 27, 2016, 10:09:47 AM

On Friday, February 26, 2016 at 5:58:54 PM UTC-6, William Lachance wrote:
> Hey,
>
> I wrote up a dashboard for tracking the performance delta between
> non-e10s and e10s on the Talos tests on nightly:
>
> https://treeherder.allizom.org/perf.html#/e10s

This is great, thanks!

A couple questions - what are the run values here? For example, "100 base / 139 new", I'm curious what this means. Also, if we land a fix or someone lands a patch that causes a regression on inbound, how long would it take for that to affect values in this dashboard?

Jim

William Lachance

Feb 27, 2016, 1:22:16 PM

On 2016-02-27 10:09 AM, Jim Mathies wrote:
> On Friday, February 26, 2016 at 5:58:54 PM UTC-6, William Lachance wrote:
>> Hey,
>>
>> I wrote up a dashboard for tracking the performance delta between
>> non-e10s and e10s on the Talos tests on nightly:
>>
>> https://treeherder.allizom.org/perf.html#/e10s
>
> This is great, thanks!
>
> A couple questions - what are the run values here? For example, "100 base / 139 new", I'm curious what this means.

It refers to the number of times the talos test was run for non-e10s and
e10s, over the entire sample of pushes. 100 base / 139 new == 100 test
runs for non-e10s, 139 test runs for e10s.

> Also, if we land a fix or someone lands a patch that causes a regression on inbound, how long would it take for that to affect values in this dashboard?

We sample talos data for pushes in the last 48 hours, so it would take
that long for the dashboard to be fully updated after the patch is
landed. Some talos tests are rather noisy, so I felt we needed this much
data to be confident in the results on the graph in all cases.
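As a rough illustration of the sampling described above, here is a minimal sketch in Python. It is not Perfherder's actual code: the `(push_time, is_e10s, value)` tuple layout and the `summarize_window` name are assumptions made only for this example.

```python
from datetime import datetime, timedelta
from statistics import mean


def summarize_window(runs, now, window_hours=48):
    """Summarize non-e10s ("base") vs. e10s ("new") runs in the sampling window.

    `runs` is a list of (push_time, is_e10s, value) tuples. This layout is
    hypothetical and chosen only for the example.
    """
    cutoff = now - timedelta(hours=window_hours)
    # Keep only runs from pushes inside the window, split by configuration.
    base = [v for t, e10s, v in runs if t >= cutoff and not e10s]
    new = [v for t, e10s, v in runs if t >= cutoff and e10s]
    if not base or not new:
        return None  # nothing to compare
    base_mean, new_mean = mean(base), mean(new)
    return {
        "base_runs": len(base),  # the "N base" count shown on the dashboard
        "new_runs": len(new),    # the "N new" count shown on the dashboard
        "delta_pct": (new_mean - base_mean) / base_mean * 100.0,
    }
```

A run from a push older than the window is simply dropped, which is why a newly landed patch only fully dominates the numbers once 48 hours have passed.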

You can always monitor the graphs themselves if you want to get more
up-to-date information (hovering over a row will give you a link), and of
course :jmaher and I are monitoring perfherder for large changes here:

https://treeherder.allizom.org/perf.html#/alerts

Will

Chris Peterson

Feb 29, 2016, 1:52:46 AM

Will, this dashboard looks great! The current e10s release criteria
allow regressions up to 5% on the following Talos tests. Is it possible
to configure your e10s dashboard to display acceptable (<= 5%)
regressions on these particular tests using another color besides red?
Perhaps yellow to show that the e10s results are worse than non-e10s,
but not failure red?

glterrain
sessionrestore
TART
tpaint
tresize
tps
tp5
tp5o
tcanvasmark
tsvgx
tsvgr_opacity
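
The coloring rule requested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the "green" color for non-regressions, and the convention that a positive delta means e10s is worse are all hypothetical, not the dashboard's real implementation.

```python
# Tests listed above as allowed to regress by up to 5% under the release criteria.
TOLERATED_TESTS = {
    "glterrain", "sessionrestore", "TART", "tpaint", "tresize", "tps",
    "tp5", "tp5o", "tcanvasmark", "tsvgx", "tsvgr_opacity",
}


def regression_color(test_name, delta_pct, threshold_pct=5.0):
    """Pick a display color for one row of the comparison.

    `delta_pct` > 0 is assumed to mean the e10s result is worse than non-e10s.
    """
    if delta_pct <= 0:
        return "green"   # assumption: no regression, e10s is equal or better
    if test_name in TOLERATED_TESTS and delta_pct <= threshold_pct:
        return "yellow"  # worse than non-e10s, but within the accepted 5%
    return "red"         # regression beyond what the criteria accept
```

For example, `regression_color("tps", 3.0)` gives "yellow", while `regression_color("tps", 7.0)` gives "red".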

chris

William Lachance

Mar 1, 2016, 12:57:14 PM

On 2016-02-29 1:52 AM, Chris Peterson wrote:

> Will, this dashboard looks great! The current e10s release criteria
> allow regressions up to 5% on the following Talos tests. Is it possible
> to configure your e10s dashboard to display acceptable (<= 5%)
> regressions on these particular tests using another color besides red?
> Perhaps yellow to show that the e10s results are worse than non-e10s,
> but not failure red?
>

> ...

After talking with cpeterson on irc, I came up with a mode for the
dashboard which hides everything except for "regressions blocking the
release" according to the criteria here:

https://wiki.mozilla.org/index.php?title=Electrolysis/Release_Criteria

You can see this view here:

https://treeherder.allizom.org/perf.html#/e10s?showOnlyBlockers=1

Also, mconley suggested being able to compare the results of individual
subtests. You can access this view for any given talos test by hovering
over the line in the comparison and selecting "subtests". This sometimes
gives interesting data, for instance on "tps" some pages are clearly
causing more problems than others:

https://treeherder.allizom.org/perf.html#/e10s_comparesubtest?baseSignature=fe016968d213834efd424ca88680cfa7490b6c09&e10sSignature=5c199ff7bd97284c5f3820ba908f92275620cd8b

(notice how aljazeera.net has a consistent ~450% regression!)
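
To make the subtest comparison concrete, here is a small sketch of how per-page regressions could be ranked. It assumes lower values are better and that each side is a plain mapping from page name to a measured value; the `rank_subtests` name and the data layout are hypothetical, not taken from Perfherder.

```python
def rank_subtests(base, new):
    """Return (page, percent_regression) pairs, worst regression first.

    `base` and `new` map a page name to its non-e10s and e10s value.
    Lower is assumed to be better, so a positive percentage is a regression.
    """
    common = base.keys() & new.keys()  # only pages measured on both sides
    deltas = [
        (page, (new[page] - base[page]) / base[page] * 100.0)
        for page in common
        if base[page] > 0  # avoid dividing by zero
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

Under this definition, a ~450% regression means the e10s value is about 5.5 times the non-e10s value for that page.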

Will

Chris Peterson

Mar 1, 2016, 6:29:25 PM

On 3/1/16 9:57 AM, William Lachance wrote:
> Also, mconley suggested being able to compare the results of individual
> subtests. You can access this view for any given talos test by hovering
> over the line in the comparison and selecting "subtests". This sometimes
> gives interesting data, for instance on "tps" some pages are clearly
> causing more problems than others:
>
> https://treeherder.allizom.org/perf.html#/e10s_comparesubtest?baseSignature=fe016968d213834efd424ca88680cfa7490b6c09&e10sSignature=5c199ff7bd97284c5f3820ba908f92275620cd8b
>
>
> (notice how aljazeera.net has a consistent ~450% regression!)

This looks great, Will. Good catch on the aljazeera.net problem. The
other outliers are mail.ru (at ~410% regression) and guardian.co.uk (at
~380% regression). We should probably file bugs for those individual
sites. :)

Gabor Krizsanits

Mar 2, 2016, 6:47:05 AM

to Chris Peterson, dev-platform

I've just visited guardian.co.uk (Bug 1252822
<https://bugzilla.mozilla.org/show_bug.cgi?id=1252822>); scrolling seems
quite bad... :(
