what is new in talos, what is coming up

jma...@mozilla.com

May 1, 2015, 12:40:16 PM

It is always hard to advertise every change, but there are enough changes to post a brief summary.

What has changed:
1) :bgrins has added a new talos test 'damp' - devtools at maximum performance! This was done in https://bugzilla.mozilla.org/show_bug.cgi?id=1150215
2) Talos doesn't run on OSX 10.6 anymore thanks to https://bugzilla.mozilla.org/show_bug.cgi?id=990490
3) Talos runs on OSX 10.10 on main branches and is slowly replacing OSX 10.8 on older branches as the change is uplifted.
4) A few Android tests were turned off; we now run 3 of them, the ones that are useful for tracking performance.

What changes are coming up:
1) We are planning on reducing the number of pages we test for Android tests, mostly because the raw data for them is nearly identical to other pages. For reference, this would cut tp4m in half and trim tsvgx by a few pages. This work will help us reduce load as we transition from panda boards to real devices this summer.
2) compare-talos is being integrated into perfherder. This will be detailed in a blog post with examples of how to use it for investigating regressions and fixes. This work has been taking place in https://bugzilla.mozilla.org/show_bug.cgi?id=1142680#c50
3) This quarter we will stop reporting to datazilla, which leaves graph server and perfherder (the performance view inside of treeherder).
4) Talos counters will be reviewed and cleaned up this quarter as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1156907
5) We will start having compare-talos style views in the bugs we file for performance regressions (https://bugzilla.mozilla.org/show_bug.cgi?id=1150616).

All feedback is welcome. With active work on new tools and deliverables to make things better, you can expect more changes. Of course, the more ideas and feedback we get, the better!

Brian Grinstead

May 4, 2015, 2:08:51 PM

to jma...@mozilla.com, dev-pl...@lists.mozilla.org

The upcoming changes sound great! Is there currently a way (or plans to add a way) to track regressions / improvements for a single measurement within a test? I see that in perfherder I can add these measurements to a graph (http://mzl.la/1E17Zyo) but it’s hard to distinguish between normal variation across runs and an actual regression by looking at the graph.

Brian

> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform

Joel Maher

May 5, 2015, 11:40:45 AM

to Brian Grinstead, dev-platform

Great question Brian!

I believe you are asking if we would generate alerts based on individual
tests instead of the summary of tests. The short answer is no. In looking
at reporting the subtest results as new alerts, we found there was a lot of
noise (especially in svgx, tp5o, dromaeo, v8) as compared to the summary
alerts. cart/tart has a bit of noise, so reporting on specific pages might
be more realistic there; we could investigate this more if there is a
strong need for it.

Late last year we cleaned up our summary reporting to be a geometric mean
of all the pages/subtests which were run. This means that we do a better
job of reporting a regression of the test summary when a specific test is
the cause.
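
As a rough illustration of that summary (a sketch only: the page names and
values below are made up, and this is not the actual Talos code), a
geometric mean lets a regression in a single page show up in the summary
in proportion to its relative change:

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via logs to avoid overflow."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-page results in ms; names and numbers are illustrative only.
before = {"page_a": 100.0, "page_b": 200.0, "page_c": 400.0}
after = dict(before, page_b=300.0)  # a single page gets 50% slower

summary_before = geometric_mean(list(before.values()))
summary_after = geometric_mean(list(after.values()))

# Relative change of the summary, in percent.
change_pct = (summary_after / summary_before - 1) * 100
print(f"summary: {summary_before:.1f} -> {summary_after:.1f} ({change_pct:+.1f}%)")
```

With three pages, a 50% regression in one page moves the summary by a
factor of 1.5 ** (1/3), regardless of how large that page's raw numbers
are compared to the others.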

One thing we have been working on is a compare mode in perfherder which
replaces compare-talos. This is live, although changing rapidly
(https://treeherder.mozilla.org/perf.html#/comparechooser), and will do a
great job of telling you which specific test caused a regression.

Do you have concerns about a lack of reporting on subtests, or about tooling
to make finding the results easier? Suggestions are welcome; this is
something we work on regularly and improve to make our lives easier!

-Joel

On Mon, May 4, 2015 at 2:08 PM, Brian Grinstead <bgrin...@mozilla.com>
wrote:

> The upcoming changes sound great! Is there currently a way (or plans to
> add a way) to track regressions / improvements for a single measurement
> within a test? I see that in perfherder I can add these measurements to a
> graph (http://mzl.la/1E17Zyo) but it’s hard to distinguish between normal

jma...@mozilla.com

Jun 3, 2015, 2:47:02 PM

Things are still changing with Talos. Many things are becoming easier, while others still have kinks to work out. Here are a few things which have changed recently:

1) Android talos has been reduced to what is useful. It should run faster, and this sets us up for migrating to autophone next quarter.
2) compare-talos is in perfherder (https://treeherder.mozilla.org/perf.html#/comparechooser), other instances of compare-talos have a warning message at the top indicating you should use perfherder. We will deprecate those instances of compare-talos next quarter completely.
3) datazilla no longer collects talos data. This has been stopped on all branches we care about and we will be turning datazilla off completely next month!
4) talos counters are now streamlined a bit and showing up in perfherder (https://treeherder.mozilla.org/perf.html#/graphs)
5) compare view in perfherder has a more realistic comparison algorithm to point out regressions/improvements.

upcoming work:
1) finish evaluating talos counters - collect only what is useful
2) continue polishing perfherder graphs, compare-view
3) start generating alerts from perfherder (in parallel to graph server)
4) document and work on a smoother method for getting enough data and comparing data points on try pushes, regressions on the tree, and running tests locally.

A lot of good ideas come in from various folks (usually folks who are investigating a regression or worried about a change they are making). While in Whistler, we would love to show folks how these tools currently work, walk through the end game and ideal use cases, and brainstorm on things we are overlooking.

If you are interested, do look for a performance discussion on the schedule. It has yet to be scheduled.

Karl Tomlinson

Jun 4, 2015, 12:31:38 AM

Thanks, Joel. I've benefited from being able to use
perf.html#/comparechooser and will look forward to the performance
discussion.

jma...@mozilla.com writes:

> 2) compare-talos is in perfherder
> (https://treeherder.mozilla.org/perf.html#/comparechooser), other instances of
> compare-talos have a warning message at the top indicating you should use
> perfherder. We will deprecate those instances of compare-talos next quarter
> completely.

This is very helpful to present PGO separated and to know that
higher is better for canvasmark, but it is not yet ready to
replace http://perf.snarkfest.net/compare-talos/index.html

The treeherder version seems to randomly choose which and how many
of the results to load and so the comparison changes after
reloads of the page.

> upcoming work:

> 2) continue polishing perfherder graphs, compare-view

Perhaps the above issue is already in this work and you know it
will be addressed by next quarter, but, if not, can we keep the
snarkfest version running please until this is resolved?

Today, the treeherder version is not loading enough results to do a
reasonable comparison, while the snarkfest version doesn't seem to
have the problem and presents results almost instantly.

If I may sneak in a request or two, then a number of results or an
estimate of standard error in the mean would be helpful to know
how to interpret the standard deviations. Also, a way to see
whether higher or lower is better even for those tests without
enough data to detect a statistically significant change would be
a bonus.
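
To make the request concrete, here is a small sketch of the standard error
of the mean, using the usual definition (sample standard deviation divided
by the square root of the number of results). The sample values are made
up, and this is not what the compare view currently computes:

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical results for one test; values are illustrative only.
few = [100.0, 104.0, 96.0, 102.0, 98.0]
many = few * 4  # the same spread, but four times as many results

for label, samples in (("few", few), ("many", many)):
    print(
        f"{label}: n={len(samples)} "
        f"stdev={statistics.stdev(samples):.2f} "
        f"sem={standard_error(samples):.2f}"
    )
```

The standard deviations of the two sets are similar, but the standard
error shrinks as the number of results grows, which is why the count
matters when reading the standard deviation column.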

I'm keen to find out what is in the "Details" links, but they
currently just ask me to "wait a minute".

William Lachance

Jun 4, 2015, 1:53:34 PM

Hi Karl,

On 2015-06-04 12:30 AM, Karl Tomlinson wrote:
> jma...@mozilla.com writes:
>
>> 2) compare-talos is in perfherder
>> (https://treeherder.mozilla.org/perf.html#/comparechooser), other instances of
>> compare-talos have a warning message at the top indicating you should use
>> perfherder. We will deprecate those instances of compare-talos next quarter
>> completely.
> This is very helpful to present PGO separated and to know that
> higher is better for canvasmark, but it is not yet ready to
> replace http://perf.snarkfest.net/compare-talos/index.html
>
> The treeherder version seems to randomly choose which and how many
> of the results to load and so the comparison changes after
> reloads of the page.

So this is a bug. :) Could you please file something here:

https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree%20Management&component=Perfherder

For best results, include reproduction steps and comparison with the
existing snarkfest interface.

>> upcoming work:
>> 2) continue polishing perfherder graphs, compare-view
> Perhaps the above issue is already in this work and you know it
> will be addressed by next quarter, but, if not, can we keep the
> snarkfest version running please until this is resolved?
>
> Today, the treeherder version is not loading enough results to do a
> reasonable comparison, while the snarkfest version doesn't seem to
> have the problem and presents results almost instantly.
>
> If I may sneak in a request or two, then a number of results or an
> estimate of standard error in the mean would be helpful to know
> how to interpret the standard deviations. Also, a way to see
> whether higher or lower is better even for those tests without
> enough data to detect a statistically significant change would be
> a bonus.

These sound like good ideas. Can't remember whether we've filed issues
for them yet. Feel free to have a look at the perfherder bug list and
file a feature request if something isn't open yet:

https://wiki.mozilla.org/Auto-tools/Projects/Perfherder#Bug_Table

> I'm keen to find out what is in the "Details" links, but they
> currently just ask me to "wait a minute".

Likewise, that sounds like a bug. Clicking on details is supposed to
open up a subtest summary. Here it seems to work fine for me:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=b2ad2f9f8e53&newProject=mozilla-central&newRevision=f21fac50a8cc
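
For anyone who wants to build such a compare link by hand, here is a
minimal sketch. The parameter names are taken from the link above; the
helper name compare_url is made up for illustration, and any other
parameters the page may accept are not covered here:

```python
from urllib.parse import urlencode

# Base of the compare view, as it appears in the link above.
COMPARE_BASE = "https://treeherder.mozilla.org/perf.html#/compare"


def compare_url(original_project, original_revision, new_project, new_revision):
    """Build a compare-view link from two project/revision pairs."""
    params = [
        ("originalProject", original_project),
        ("originalRevision", original_revision),
        ("newProject", new_project),
        ("newRevision", new_revision),
    ]
    return COMPARE_BASE + "?" + urlencode(params)


url = compare_url(
    "mozilla-central", "b2ad2f9f8e53",
    "mozilla-central", "f21fac50a8cc",
)
print(url)
```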

Will

Karl Tomlinson

Jun 4, 2015, 5:42:46 PM

William Lachance writes:

> Hi Karl,
>
> On 2015-06-04 12:30 AM, Karl Tomlinson wrote:
>> jma...@mozilla.com writes:
>>
>>> We will deprecate those instances of compare-talos next quarter
>>> completely.

>> The treeherder version seems to randomly choose which and how many
>> of the results to load and so the comparison changes after
>> reloads of the page.
>
> So this is a bug. :) Could you please file something here:
>
> https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree%20Management&component=Perfherder
>
> For best results, include reproduction steps and comparison with
> the existing snarkfest interface.

Thanks for the link. Knowing the right component is half the
process of filing bugs.

https://bugzilla.mozilla.org/show_bug.cgi?id=1171707

I don't know how to compare with snarkfest, but comparing
treeherder with treeherder is sufficient to observe the bug.

>> can we keep the
>> snarkfest version running please until this is resolved?

My main concern was because I inferred from the previous post that
deprecation of snarkfest was scheduled on a timeline basis.

Can we instead schedule on a when-it's-ready basis, please?

>> If I may sneak in a request or two, then a number of results or an
>> estimate of standard error in the mean would be helpful to know
>> how to interpret the standard deviations. Also, a way to see
>> whether higher or lower is better even for those tests without
>> enough data to detect a statistically significant change would be
>> a bonus.

https://bugzilla.mozilla.org/show_bug.cgi?id=1171694
https://bugzilla.mozilla.org/show_bug.cgi?id=1171703

>> I'm keen to find out what is in the "Details" links, but they
>> currently just ask me to "wait a minute".
>
> Likewise, that sounds like a bug. Clicking on details is supposed
> to open up a subtest summary. Here it seems to work fine for me:
>
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=b2ad2f9f8e53&newProject=mozilla-central&newRevision=f21fac50a8cc

That link is loading for me, and that info can be very helpful,
thanks. However, other links are not.

https://bugzilla.mozilla.org/show_bug.cgi?id=1171710

jmaher

Jun 5, 2015, 8:12:49 AM

>
> >> can we keep the
> >> snarkfest version running please until this is resolved?
>
> My main concern was because I inferred from the previous post that
> deprecation of snarkfest was scheduled on a timeline basis.
>
> Can we instead schedule on a when-its-ready basis, please?
>

Yes, we are not doing this on a time basis. The datazilla deprecation is on a time schedule; the rest is not. Most likely this is an August/September thing. Ideally, within the next 4-6 weeks you will be using Perfherder happily for everything!

Thanks for bringing up suggestions and using the new tools.

jma...@mozilla.com

Jul 27, 2015, 2:22:22 PM

It has been a while since we posted an update on Talos.

Here are some new things:
* bug 1166132 - new tps test - tab switching
* e10s on all platforms; it only runs on mozilla-central for pgo builds. Broken tests and big regressions are tracked in bug 1144120
* perfherder is easier to use, with some polish on test selection and the compare view. Most importantly, we have found a few odd bugs that have caused duplicate data to show up. Check it out: https://treeherder.mozilla.org/perf.html#/graphs

Here is what is upcoming:
* moving talos source code in-tree (bug 787200)
* starting to move android talos to autophone (bug 1170685)
* perfherder: easier to find it when pushing to try and more polish on selecting which revisions to compare against.
* automatic 5 retriggers for talos jobs on try server

As always, if you have issues you can file bugs:
* talos: https://bugzilla.mozilla.org/enter_bug.cgi?product=Testing&component=Talos
* perfherder: https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree%20Management&component=Perfherder

Thanks for responding to regressions when pinged! Expect another update sometime in late August or early September.