proposal: replace talos with inline tests


Jim Mathies

Mar 4, 2013, 8:15:56 AM
to dev-pl...@lists.mozilla.org
For metrofx we’ve been working on getting omtc and apzc running in the browser. One of the things we need to be able to do is run performance tests that tell us whether or not the work we’re doing is having a positive effect on perf. We currently don’t have automated tests up and running for metrofx and talos is even farther off.

So to work around this I’ve been putting together some basic perf tests I can use to measure performance using the mochitest framework. I’m wondering if this might be a useful long-term answer to our perf testing problems.

Putting together talos tests is a real pain. You have to write a new test using the talos framework (which is a separate repo from mc), test the test to be sure it’s working, file rel eng bugs to get it integrated into talos test runs, populated in graph server, and tested via staging to be sure everything is working right. Overall the overhead here seems way too high.

Maybe we should consider changing this system so devs can write performance tests that suit their needs that are integrated into our main repo? Basically:

1) rework graphs server to be open ended so that it can accept data from test runs within our normal test frameworks.
2) develop a test module that can be included in tests that allows test writers to post performance data to graph server.
3) come up with a good way to manage the life cycle of active perf tests so graph server doesn’t become polluted.
4) port existing talos tests over to the mochitest framework.
5) drop talos.

Curious what people think of this idea.

Jim

Ed Morley

Mar 4, 2013, 8:42:39 AM
to Jim Mathies, auto-...@mozilla.com, dev-pl...@lists.mozilla.org
(CCing auto-...@mozilla.com)

jmaher and jhammel will be able to comment more on the talos specifics,
but few thoughts off the top of my head:

It seems like we're conflating multiple issues here:
1) "[talos] is a separate repo from mc"
2) "[it's a hassle to] test the test to be sure it’s working"
3) "[it's a hassle to get results] populated in graph server"
4) "[we need to] come up with a good way to manage the life cycle of
active perf tests so graph server doesn’t become polluted"

Switching from the talos harness to mochitest doesn't fix #2 (we still
have to test, and I don't see how it magically becomes any easier
without extra work - that could have been applied to talos instead) or
#3/#4 (orthogonal problem). It also seems like a brute force way of
fixing #1 (we could just check talos into mozilla-central).

Instead, I think we should be asking:
1) Is the best test framework for performance testing: [a] talos (with
improvements), [b] mochitest (with a significant amount of work to make
it compatible), or [c] a brand new framework?
2) Regardless of framework used, would checking it into mozilla-central
improve dev workflow enough to outweigh the downsides (see bug 787200
for history on that discussion)?
3) Regardless of framework used, how can we make the
development/testing/staging cycle less painful?
4) Regardless of framework used, who should be responsible for ensuring
we actively prune performance tests that are no longer relevant?

Note also that graphs.mozilla.org will be deprecated soon, in favour
of datazilla - which afaik is less painful for adding new test suites
(eg doesn't need manual database changes); jeads can say more on that
front.

Best wishes,

Ed
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform

Joel Maher

Mar 4, 2013, 8:59:41 AM
to Ed Morley, auto-...@mozilla.com, Jim Mathies, dev-pl...@lists.mozilla.org
Some thoughts on the subject-

I would argue against running performance tests inside of mochitest. The main reason is that mochitest has a lot of profile stuff set up for testing, as well as many other tests bundled inside the same browser session. For a standalone metric unrelated to a user scenario, though, we could consider putting performance-style tests into mochitest.

In the process of creating Datazilla, we have found endless little quirks in the end-to-end system of how performance testing works. As time goes on we have continued to push forward with the goal of making a performance system that can detect regressions automatically when the test finishes.

For the last few months we have had data going both to Datazilla and graph server and have been refining our assumptions and tools along the way. When graph server is deprecated in the near future, it will be REALLY EASY to add new tests to the collection and reporting system. That doesn't solve the problem of making it easy to add or adjust a test in the test runners (buildbot scripts), but it solves half the problem.

Many of the talos tests are old and outdated, and while we have tried to find owners for the tests, it has been a failing effort. To that end, we have disabled some Talos tests which nobody had interest in anymore. If there are tests which people feel are not useful, we should disable those ASAP to reduce the load on our infrastructure and work on creating tests which people care about.

-Joel

Jim Mathies

Mar 4, 2013, 9:16:31 AM
to dev-pl...@lists.mozilla.org
Good points, comments below.

"Ed Morley" <emo...@mozilla.com> wrote in message
news:<mailman.1992.13624045...@lists.mozilla.org>...
> (CCing auto-...@mozilla.com)
>
> jmaher and jhammel will be able to comment more on the talos specifics,
> but few thoughts off the top of my head:
>
> It seems like we're conflating multiple issues here:
> 1) "[talos] is a separate repo from mc"
> 2) "[it's a hassle to] test the test to be sure it’s working"
> 3) "[it's a hassle to get results] populated in graph server"
> 4) "[we need to] come up with a good way to manage the life cycle of
> active perf tests so graph server doesn’t become polluted"
>
> Switching from the talos harness to mochitest doesn't fix #2 (we still
> have to test, and I don't see how it magically becomes any easier without
> extra work - that could have been applied to talos instead)

I disagree here; very few devs are familiar with the talos framework and
what it takes to get a new test written, while everyone is very familiar with
mochitest and the other related test frameworks on mc. I can write a mochitest
to test perf in something simple like scrolling in about an hour. Putting
together a talos scroll test would take much longer. If Talos were on mc it
would help, but integrating into the existing test frameworks we have and use on
a regular basis seems like the simplest approach with the least amount of
overhead.

> Instead, I think we should be asking:
> 1) Is the best test framework for performance testing: [a] talos (with
> improvements), [b] mochitest (with a significant amount of work to make it
> compatible), or [c] a brand new framework?

On [b] there might be a significant amount of work in getting the infra pieces
to work (like graph server, or whatever we plan to replace it with), but
not in writing an import module that devs would use to post data.

> 2) Regardless of framework used, would checking it into mozilla-central
> improve dev workflow enough to outweigh the downsides (see bug 787200 for
> history on that discussion)?

We might want to keep talos around for "big, important tests". But I
think devs need a way to run perf tests on a smaller scale that doesn't
involve infra changes. I think having this ability would be a big win for
us.

Jim


Boris Zbarsky

Mar 4, 2013, 10:00:35 AM
On 3/4/13 8:15 AM, Jim Mathies wrote:
> So to work around this I’ve been putting together some basic perf tests I can use to measure performance using the mochitest framework.

How are you dealing with the fact that mochitest runs on heterogeneous
hardware (including VMs and the like last I checked, which could have
arbitrarily bad (or good!) performance characteristics depending on what
else is happening with the host system)?

> Maybe we should consider changing this system so devs can write performance tests that suit their needs that are integrated into our main repo? Basically:
>
> 1) rework graphs server to be open ended so that it can accept data from test runs within our normal test frameworks.
> 2) develop of test module that can be included in tests that allows test writers to post performance data to graph server.
> 3) come up with a good way to manage the life cycle of active perf tests so graph server doesn’t become polluted.
> 4) port existing talos tests over to the mochitest framework.
> 5) drop talos.

This sounds plausible, modulo the inability to port Tp in its current
state to a setup that involves the tests living in m-c, as long as the
problem above is kept in mind. Basically, reusing something
mochitest-like for developer familiarity may make sense, but it would
need to be a separate test suite run on completely separate test slaves
that are actually set up with performance testing in mind. A separate
test suite which is like mochitest is not a problem per se (we have the
ipcplugins, chrome, browserchrome, a11y tests already).

So the main win would be making it easier to add new tests in terms of
number of actions to be taken (something it seems like we could improve
with the current Talos setup too) and easier for developers to add tests
because the framework is already similar, right?

-Boris

Gregory Szorc

Mar 4, 2013, 12:36:03 PM
to Jim Mathies, dev-pl...@lists.mozilla.org
On 3/4/13 5:15 AM, Jim Mathies wrote:
> For metrofx we’ve been working on getting omtc and apzc running in the browser. One of the things we need to be able to do is run performance tests that tell us whether or not the work we’re doing is having a positive effect on perf. We currently don’t have automated tests up and running for metrofx and talos is even farther off.
>
> So to work around this I’ve been putting together some basic perf tests I can use to measure performance using the mochitest framework. I’m wondering if this might be a useful answer to our perf tests problems long term.
>
> Putting together talos tests is a real pain. You have to write a new test using the talos framework (which is a separate repo from mc), test the test to be sure it’s working, file rel eng bugs on getting it integrated into talos test runs, populated in graph server, and tested via staging to be sure everything is working right. Overall the overhead here seems way too high.
>
> Maybe we should consider changing this system so devs can write performance tests that suit their needs that are integrated into our main repo? Basically:
>
> 1) rework graphs server to be open ended so that it can accept data from test runs within our normal test frameworks.
> 2) develop of test module that can be included in tests that allows test writers to post performance data to graph server.
> 3) come up with a good way to manage the life cycle of active perf tests so graph server doesn’t become polluted.
> 4) port existing talos tests over to the mochitest framework.
> 5) drop talos.
>
> Curious what people think of this idea.

Generally speaking, I think we should have a generic framework for
declaring tests. i.e. test files for xpcshell, mochitest, Talos, etc
would all look very similar from a JS perspective. I've been wanting to
unify the in-test code for a while and over the weekend I put together a
very rough draft of what I think this should look like [1]. Please
criticize it.

If all your tests are declared the same way, then presumably the test
running code would be similar and capturing performance data would
require a single implementation affecting all test suites instead of N
1-off solutions.

I'm of the opinion that we should generally collect tons of data from
all of our testing frameworks and then sort out the meaning of that data
later (e.g. ignore data from tests running on non-homogeneous or
unreliable hardware). Maybe we don't care about things like rev X-Y
comparison of CPU cycles on an individual mochitest. But, we'd certainly
be interested if we saw an individual mochitest's CPU cycle count or
wall time double over the span of a month! You can't even raise eyebrows
unless you have data. We don't have this data today. Even if we did, it
would require separate implementations for each testing flavor
(xpcshell, mochitest, etc).

We should unify our test running code as much as possible. Then, we
should make decisions on whether it makes sense to collect and/or assess
performance data in each execution context/test flavor.

[1] https://gist.github.com/indygreg/5073810

Jim Mathies

Mar 4, 2013, 2:50:42 PM
to dev-pl...@lists.mozilla.org
"Boris Zbarsky" <bzba...@mit.edu> wrote in message news:<o7ydnYp6N66OKqnM...@mozilla.org>...
> On 3/4/13 8:15 AM, Jim Mathies wrote:
> > So to work around this I’ve been putting together some basic perf tests I can use to measure performance using the mochitest framework.
>
> How are you dealing with the fact that mochitest runs on heterogeneous
> hardware (including VMs and the like last I checked, which could have
> arbitrarily bad (or good!) performance characteristics depending on what
> else is happening with the host system)?

That sounds like a rel eng problem that could be solved. I don’t know enough about our test slaves to say for sure.

> This sounds plausible, modulo the inability to port Tp in its current
> state to a setup that involves the tests living in m-c, as long as the
> problem above is kept in mind. Basically, reusing something
> mochitest-like for developer familiarity may make sense, but it would
> need to be a separate test suite run on completely separate test slaves
> that are actually set up with performance testing in mind. A separate
> test suite which is like mochitest is not a problem per se (we have the
> ipcplugins, chrome, browserchrome, a11y tests already).

That's fine, I'm not married to mochitest, but something with similar run characteristics would be best.

> So the main win would be making it easier to add new tests in terms of
> number of actions to be taken (something it seems like we could improve
> with the current Talos setup too) and easier for developers to add tests
> because the framework is already similar, right?
>
> -Boris

Yes, basically -

1) something checked into mc that anyone can easily author or run (for tracking down regressions) without having to check out a separate repo, or set up and run a custom perf test framework.
2) performance tests that generate data that is printed to the console on local runs and could be posted to a graphs server in automation.
3) no releng overhead for the setup of new perf tests; something that is built into the test framework / infrastructure we set up.

Jim

Justin Lebar

Mar 4, 2013, 3:25:29 PM
to Jim Mathies, dev-pl...@lists.mozilla.org
> 1) something checked into mc anyone can easily author or run (for tracking down regressions) without having to checkout a separate repo, or setup and run a custom perf test framework.

I don't oppose the gist of what you're suggesting here, but please
keep in mind that small perf changes are often very difficult to track
down locally. Small changes in system and toolchain configuration can
have large effects on average build speed and its variance. For
example, I've found observable performance differences between Try and
m-c/m-i builds in the past (bug 653961), despite their build configs
being nearly identical.

In my experience, we spend the majority of our time trying to track
down small perf changes, so a change which makes it easier to track
down the source of large perf changes might not have an outsize
effect.

> 3) no releng overhead for setup of new perf tests. something that is built into the test framework / infrastructure we set up.

If we did this, we'd need to figure out how and when to promote
benchmarks to "we care about them" status.

We already don't back out changes for regressing a benchmark like
we back them out for regressing tests. I think this is at least
partially because of a general sentiment that not all of our benchmarks
correlate strongly to what they're trying to measure.

I suspect if anyone could check in a benchmark, the average quality of
benchmarks would likely stay roughly the same, but the number of
benchmarks would increase. In that case we'd have even more
benchmarks with spurious regressions to deal with.

-Justin

Justin Dolske

Mar 4, 2013, 7:25:45 PM
On 3/4/13 9:36 AM, Gregory Szorc wrote:

> If all your tests are declared the same way, then presumably the test
> running code would be similar and capturing performance data would
> require a single implementation affecting all test suites instead of N
> 1-off solutions.

We've talked about this before (perhaps in this very newsgroup) as a
cheap (?) way to get extra perf measurements beyond our current limited
set of tests, and to avoid having to add a new test suite/framework
whenever someone wants a metric... e.g. measure the run time of each
existing test, use scripts to figure out which ones are fairly stable
over time, then watch for regressions. A chance to begin again in an
orange land of opportunity and adventure!

But I'd also take the general ability to add a new test as a microbenchmark.

> We should unify our test running code as much as possible.

Oh god yes please.

Justin

Dave Mandelin

Mar 4, 2013, 7:47:10 PM
to dev-pl...@lists.mozilla.org, jma...@mozilla.com, Taras Glek
On Monday, March 4, 2013 5:15:56 AM UTC-8, Jim Mathies wrote:
> For metrofx we’ve been working on getting omtc and apzc running in the browser. One of the things we need to be able to do is run performance tests that tell us whether or not the work we’re doing is having a positive effect on perf. We currently don’t have automated tests up and running for metrofx and talos is even farther off.
>
> So to work around this I’ve been putting together some basic perf tests I can use to measure performance using the mochitest framework. I’m wondering if this might be a useful answer to our perf tests problems long term.

I think this is an incredibly interesting proposal, and I'd love to see something like it go forward. Detailed reactions below.

> Putting together talos tests is a real pain. You have to write a new test using the talos framework (which is a separate repo from mc), test the test to be sure it’s working, file rel eng bugs on getting it integrated into talos test runs, populated in graph server, and tested via staging to be sure everything is working right. Overall the overhead here seems way too high.

Yup. And that's a big problem. Not only does this make your life harder, it makes people not do as much performance testing as they otherwise might. The JS team has found that making it incredibly easy to add new correctness tests (with *zero* overhead in the common case) really helped get more tests written and used. So I think it would be great to make it a lot easier to write perf tests.

> Maybe we should consider changing this system so devs can write performance tests that suit their needs that are integrated into our main repo? Basically:
>
> 1) rework graphs server to be open ended so that it can accept data from test runs within our normal test frameworks.

IIUC, something like this is a key requirement: letting any perf test feed into the reporting system. People have pointed out that the Talos tests run on selected machines, so the perf tests should probably run on them as well, rather than on the correctness test machines. But that's only a small change to the basic idea, right?

> 2) develop of test module that can be included in tests that allows test writers to post performance data to graph server.

Does that mean a mochitest module? This part seems optional, although certainly useful. Some tests will require non-mochitest frameworks.

I believe jmaher did some work to get in-browser standard JS benchmarks running automatically and reporting to graph-server. I'm curious how that would fit in with this idea--would the test module help at all, or could there be some other, more general kind of module, so that even things like standard benchmarks can be self-serve?

> 3) come up with a good way to manage the life cycle of active perf tests so graph server doesn’t become polluted.

:-) How about optionally listing an owner for new tests, and then removing tests if no one is looking at them (according to web server logs) and there is no owner of record, or the owner doesn't say the tests are still needed?
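Sketching that policy (the field names are invented, just to show the idea): each test carries an optional owner and a last-viewed date from the web server logs, and anything unowned and unviewed past a cutoff becomes a pruning candidate.

```javascript
// Invented field names - just illustrating the pruning policy above.
const DAY_MS = 24 * 60 * 60 * 1000;

function pruneCandidates(tests, now, maxIdleDays = 90) {
  return tests
    .filter(t => {
      const idleDays = (now - t.lastViewed) / DAY_MS;
      // Keep anything with an owner of record; otherwise flag it for
      // pruning once nobody has looked at it for maxIdleDays.
      return !t.owner && idleDays > maxIdleDays;
    })
    .map(t => t.name);
}

const now = Date.parse("2013-03-04");
const candidates = pruneCandidates([
  { name: "ts", owner: "perf-team", lastViewed: Date.parse("2012-01-01") },
  { name: "old-widget-perf", owner: null, lastViewed: Date.parse("2012-06-01") },
  { name: "scroll-perf", owner: null, lastViewed: Date.parse("2013-02-20") },
], now);
```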

> 4) port existing talos tests over to the mochitest framework.
>
> 5) drop talos.

This seems like it's in the line of "fix Talos". I'm not sure if this particular 4+5 is the right way to go, but the idea has some merit.

To the extent that people don't pay attention to Talos, it seems we really don't need to do anything with it. If people are paying attention to and taking care of performance in their area, then we're covered. To take the example I happen to know best, the JS team uses AWFY to track JS performance on standard benchmarks and additional tests they've decided are useful. So Talos is not needed to track JS performance. Having all the features of the new graph server does sound pretty cool, though.

It appears that there are a few areas that are only covered by Talos for now, though. I think in that category we have warm startup time via Ts, and basic layout performance via Tp. I'm not sure about memory, because we do seem to detect increases via Talos, but we also have AWSY, and I don't know whether AWSY obviates the Talos memory measurements or not.

For that kind of thing, I'm thinking maybe we should go with the same "teams take care of their own perf tests" idea. Performance is a natural owner for Ts. I'm not entirely sure about Tp, but it's probably layout or DOM. Then those teams could decide if they wanted to switch from Talos to a different framework. If everything's working properly, and the difficulty of reproducing Talos tests locally caused enough problems to warrant it, the owning teams would notice and switch.

Dave

Robert O'Callahan

Mar 4, 2013, 7:52:47 PM
to Dave Mandelin, Taras Glek, jma...@mozilla.com, dev-pl...@lists.mozilla.org, mozilla.de...@googlegroups.com
Writing a lot of performance tests creates the problem that those tests
will take a long time to run. The nature of performance tests is that each
test must run for a relatively long time to get meaningful results.
Therefore I doubt writing lots of different performance tests can scale.
(Maybe we can find ways to eliminate noise in very short tests, but that
might be research.)

One other thing to keep in mind if we're going to start doing performance
tests differently is https://bugzilla.mozilla.org/show_bug.cgi?id=846166.
Basically Chris suggests using eideticker for performance tests a lot more.

Rob
--
Wrfhf pnyyrq gurz gbtrgure naq fnvq, “Lbh xabj gung gur ehyref bs gur
Tragvyrf ybeq vg bire gurz, naq gurve uvtu bssvpvnyf rkrepvfr nhgubevgl
bire gurz. Abg fb jvgu lbh. Vafgrnq, jubrire jnagf gb orpbzr terng nzbat
lbh zhfg or lbhe freinag, naq jubrire jnagf gb or svefg zhfg or lbhe fynir
— whfg nf gur Fba bs Zna qvq abg pbzr gb or freirq, ohg gb freir, naq gb
tvir uvf yvsr nf n enafbz sbe znal.” [Znggurj 20:25-28]

Jeff Hammel

Mar 4, 2013, 7:56:46 PM
to dev-pl...@lists.mozilla.org
I'll point out - and really this is about all I have to say on this thread - that while perf testing (that is, recording results) may be... well, not easy, but not too awful, the rigorous analysis of what the data means and whether there is a regression is often hard, since, as evidenced by Talos, the distributions are frequently non-normal and may be multi-modal. While I have no love of Talos, despite/because of sinking a year's worth of effort into it, I fear that any replacement will be done with a loss of all the wisdom harvested from the legacy system, which will then have to be relearned. If each team is responsible for its own perf testing, without a common basis and understanding of the stats analysis problem, I fear this will just multiply the problem. Frankly, one of the problems I've seen time and time again is the duplication of effort around a problem (which isn't a bad thing except...) and a lack of consolidation towards a (moz-)universal solution.
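To put a toy number on the multi-modal point (samples are made up):

```javascript
// Why the mean misleads on multi-modal perf data: a bimodal sample where
// most runs take ~10-12ms but a slow mode hits ~50ms. The mean lands
// between the modes and describes no actual run; the median tracks the
// typical one.
const samples = [10, 10, 10, 11, 11, 12, 50, 51, 52];

const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
const sorted = samples.slice().sort((a, b) => a - b);
const median = sorted[Math.floor(sorted.length / 2)];

console.log(`mean ${mean.toFixed(2)}ms vs median ${median}ms`);
```

A naive mean-based regression check on data like this flags (or misses) changes depending on how many runs fall into the slow mode, which is exactly the kind of wisdom that gets relearned the hard way.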

Dave Mandelin

Mar 4, 2013, 8:03:11 PM
to Jim Mathies, auto-...@mozilla.com, dev-pl...@lists.mozilla.org
On Monday, March 4, 2013 5:42:39 AM UTC-8, Ed Morley wrote:
> (CCing auto-...@mozilla.com)
>
> jmaher and jhammel will be able to comment more on the talos specifics,
> but few thoughts off the top of my head:
>
> It seems like we're conflating multiple issues here:
> 1) "[talos] is a separate repo from mc"

And also

1a) Talos itself is a big pain for developers to use and debug regressions in, not to mention add tests to, which they basically don't.

It seems that some of this may have changed recently, especially around using the new framework--I haven't used it in a while. I think Talos still falls down on creating new tests, though, because lots of things just don't fit its assumptions.

> 2) "[it's a hassle to] test the test to be sure it’s working"
> 3) "[it's a hassle to get results] populated in graph server"
> 4) "[we need to] come up with a good way to manage the life cycle of
> active perf tests so graph server doesn’t become polluted"

> Switching from the talos harness to mochitest doesn't fix #2 (we still
> have to test, and I don't see how it magically becomes any easier
> without extra work - that could have been applied to talos instead) or
> #3/#4 (orthogonal problem). It also seems like a brute force way of
> fixing #1 (we could just check talos into mozilla-central).

I think that part was mostly supposed to address (1a).

> Instead, I think we should be asking:
>
> 1) Is the best test framework for performance testing: [a] talos (with
> improvements), [b] mochitest (with a significant amount of work to make
> it compatible), or [c] a brand new framework?

I think that question doesn't have one answer. For JS, it's clearly "something else", but it's not even really a framework--it's just running standard benchmarks.

For other areas, there are likely different answers. That's why I was so excited about the self-serve idea. (Interestingly, I got schooled on this subject in a similar vein recently on bug tracking. :-) )

> 2) Regardless of framework used, would checking it into mozilla-central
> improve dev workflow enough to outweigh the downsides (see bug 787200
> for history on that discussion)?

Thanks for the bug link. It seems like putting Talos itself into m-c has significant disadvantages. I'm not sure what to do with other/new perf tests.

> 3) Regardless of framework used, how can we make the
> development/testing/staging cycle less painful?

I liked the original proposal a lot for this.

> 4) Regardless of framework used, who should be responsible for ensuring
> we actively prune performance tests that are no longer relevant?

I gave an idea for how to do this in my reply to the original proposal. I didn't say who would do it, but I was assuming the maintainers/operators of graph-server, with the notion that they would be highly empowered to remove anything that no one asked them to keep or that didn't otherwise have a well-documented, easily understood rationale.

Dave

Dave Mandelin

unread,
Mar 4, 2013, 8:09:44 PM
to Jim Mathies, dev-pl...@lists.mozilla.org
> We already don't back out changes for regressing a benchmark like
> we back them out for regressing tests. I think this is at least
> partially because of a general sentiment that not all of our benchmarks
> correlate strongly to what they're trying to measure.

I know this has been a hot topic lately. I think getting more clarity on this would be great, *if* of course we could have an answer that was both operationally beneficial and clear, which seems to be incredibly difficult.

But this thread gives me a new idea. If each test run in automation had an owner (as I suggested elsewhere), how about also making the owners responsible for informing the sheriffs about what to do in case of regression? If the owners know the test is reliable and measures something important, they can ask for monitoring and presumptive backout. If not, they can ask sheriffs to ignore the test, inform and coordinate with the owning team, inform the landing person only, or some other action.

Dave

Gregory Szorc

unread,
Mar 4, 2013, 8:17:29 PM
to Dave Mandelin, Jim Mathies, dev-pl...@lists.mozilla.org, mozilla.de...@googlegroups.com
This should be annotated in the tests themselves, IMO. We could even
have said annotation influence the color on TBPL. A well-written test
harness could also re-run failing tests to see if failures are constant
or intermittent. We could also introduce "expectations" instead of
"assertions" and have soft/expectancy failures for things like assertion
count mismatch. IMO we should be focusing on lessening the burden on the
sheriffs and leaving them to focus on real problems. There's so much
more we can be doing with our test infrastructure...
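The annotation-plus-retry idea above can be made concrete. The sketch below is purely illustrative: the annotation keys (`on-regression`, the policy strings) and the shape of `run_test` are hypothetical, not any existing harness API.

```python
# Sketch only: the annotation key "on-regression" and its policy values are
# hypothetical, illustrating per-test metadata that drives sheriff action.

def classify_failure(run_test, retries=3):
    """Re-run a failing test to distinguish constant from intermittent failures."""
    failures = sum(1 for _ in range(retries) if not run_test())
    if failures == retries:
        return "constant"
    return "intermittent" if failures else "passed-on-retry"

def sheriff_action(annotations, failure_kind):
    """Map a test's own annotation to the action sheriffs should take."""
    if failure_kind == "intermittent":
        return "file-intermittent-bug"
    # Fall back to a conservative default when the test owner said nothing.
    return annotations.get("on-regression", "notify-owner")

always_fail = lambda: False          # a test that fails on every re-run
kind = classify_failure(always_fail)
print(sheriff_action({"on-regression": "backout"}, kind))  # prints "backout"
```

The retry step is what lets the harness, rather than a human, decide whether a failure is worth the owner's stated policy or just an intermittent to be filed.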

Dave Mandelin

unread,
Mar 4, 2013, 8:18:37 PM
to dev-pl...@lists.mozilla.org
On Monday, March 4, 2013 4:56:46 PM UTC-8, Jeff Hammel wrote:
> I'll point out and really this is about all I have to say on this thread
> that while perf testing (that is, recording results) may be....well, not
> easy, but not too awful that rigorous analysis of what the data means
> and if there is a regression is often hard since it is often the case,
> as evidenced by Talos, that distributions are non-normal and may be
> multi-modal. While I have no love of Talos, despite/because of sinking a
> year's worth of effort into it, I fear that any replacement will be done
> with a loss of all wisdom harvested from legacy, and then relearned. If
> each team is responsible for perf testing, without a common basis and
> understanding of the stats analysis problem, I fear this will just
> multiply the problem. Frankly, one of the problems I've seen time and
> time again is the duplication of effort around a problem (which isn't a
> bad thing except...) and a lack of consolidation towards a
> (moz-)universal solution.

Those are real issues, but do you really think they are so serious? AWFY seems to do the job, and the JS team is happy with it, certainly happier with it than any other JS perf testing system we've had. One thing to note about it is that it doesn't have any automatic alarms or other actions. It's fed into human judgment only, so no statistical model is required.

On the general subject of having perf tests collected under one banner or distributed, the experience so far seems pretty clear that tests designed in a distributed way are much more successful at serving their purpose. I'm not convinced that most of these systems really need advanced statistical treatment to be useful. But if it would help, maybe it would be good to set up some kind of "perf testing group" that could meet from time to time and exchange knowledge?
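One piece of shared machinery such a group could hand out: for the non-normal, multi-modal timing data Jeff describes, a permutation test is a distribution-free way to ask whether a shift in means could plausibly be noise. This is a sketch; the sample timings and the 0.01 cutoff below are invented for illustration.

```python
# Fisher-style permutation test on the difference of means: no normality
# assumption, so it tolerates the skewed distributions Talos data shows.
# The timings and the 0.01 threshold are made up for illustration.
import random

def permutation_test(before, after, trials=10000, seed=0):
    """Estimate how often a mean shift this large appears under random relabeling."""
    rng = random.Random(seed)
    observed = abs(sum(after) / len(after) - sum(before) / len(before))
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:n], pooled[n:]
        if abs(sum(b) / len(b) - sum(a) / len(a)) >= observed:
            hits += 1
    return hits / trials

before = [310, 298, 305, 300, 312, 303, 299, 307]  # ms, hypothetical baseline
after = [330, 322, 328, 325, 331, 327, 320, 329]   # ms, hypothetical new build
p = permutation_test(before, after)
print("likely regression" if p < 0.01 else "within noise")
```

Because the null distribution is built by reshuffling the actual samples, no assumption about their shape is needed, which is exactly the property missing from naive mean-plus-stddev comparisons.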

Dave

Dave Mandelin

unread,
Mar 4, 2013, 8:22:41 PM
to Dave Mandelin, Jim Mathies, dev-pl...@lists.mozilla.org
On Monday, March 4, 2013 5:17:29 PM UTC-8, Gregory Szorc wrote:
> On 3/4/13 5:09 PM, Dave Mandelin wrote:
>
> >> We already don't back out changes for regressing a benchmark like
> >> we back them out for regressing tests. I think this is at least
> >> partially because of a general sentiment that not all of our benchmarks
> >> correlate strongly to what they're trying to measure.
>
> > I know this has been a hot topic lately. I think getting more clarity on this would be great, *if* of course we could have an answer that was both operationally beneficial and clear, which seems to be incredibly difficult.
>
> > But this thread gives me a new idea. If each test run in automation had an owner (as I suggested elsewhere), how about also making the owners responsible for informing the sheriffs about what to do in case of regression? If the owners know the test is reliable and measures something important, they can ask for monitoring and presumptive backout. If not, they can ask sheriffs to ignore the test, inform and coordinate with the owning team, inform the landing person only, or some other action.
>
> This should be annotated in the tests themselves, IMO. We could even
> have said annotation influence the color on TBPL.

I like it. We would need to make sure the annotations reflect active consideration by the test owners, but I suppose failures are likely to self-correct.

> IMO we should be focusing on lessening the burden on the
> sheriffs and leaving them to focus on real problems.

Absolutely.

Dave

Nicholas Nethercote

unread,
Mar 5, 2013, 4:37:22 AM
to Dave Mandelin, Taras Glek, jma...@mozilla.com, dev-pl...@lists.mozilla.org, mozilla.de...@googlegroups.com
On Tue, Mar 5, 2013 at 11:47 AM, Dave Mandelin <dman...@gmail.com> wrote:
>
> It appears that there a few areas that are only covered by Talos for now, though. I think in that category we have warm startup time via Ts, and basic layout performance via Tp. I'm not sure about memory, because we do seem to detect increases via Talos, but we also have AWSY, and I don't know whether AWSY obviates Talos memory measurements or not.

Talos memory measurements aren't very good because Talos cycles through
multiple sites in a single tab. So it sometimes catches start-up
memory consumption regressions (Firefox Health Report was a recent
case) but it doesn't get much beyond that.

In comparison, AWSY cycles through 100 sites with 30 tabs open at
once, which is a much better reflection of typical browsing. It also
does multiple measurements -- start-up, after loading the tabs, after
closing the tabs, etc.

It's worth pointing out that AWSY is sort of built on top of Talos --
its 100 sites are taken from the Talos Tp5 set. The good thing about
this page set is that the pages are stored entirely locally. The downside
is that all the external stuff in the pages (e.g. Facebook "Like"
buttons, Google Ad stuff, Twitter feeds) isn't present, so it's not a
particularly realistic representation of those pages; in particular,
the amount of JS present is much less than real pages have.
(https://bugzilla.mozilla.org/show_bug.cgi?id=679940#c31 is an example
of the effect of this in action.)

Nick

Jim Mathies

unread,
Mar 5, 2013, 5:49:46 AM3/5/13
to dev-pl...@lists.mozilla.org

> Writing a lot of performance tests creates the problem that those tests
> will take a long time to run. The nature of performance tests is that each
> test must run for a relatively long time to get meaningful results.
> Therefore I doubt writing lots of different performance tests can scale.
> (Maybe we can find ways to eliminate noise in very short tests, but that
> might be research.)

Well, we learn what works and what doesn't as we write more tests. A factor
like length of run is something we learn about over time as we experiment.
My whole point here is to provide an easy way for devs to experiment. We
currently do not have something like this available.

What the tests run on and how they integrate into our existing testing
infrastructure is an engineering problem we can solve.
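To make the "test module that posts performance data" idea from the original proposal concrete, here is a minimal sketch of what a test could include. Every name here (`PerfReporter`, the payload fields, the test/platform strings) is hypothetical; no such module exists yet, and a real graph server schema would look different.

```python
# Hypothetical helper a perf test could include: collect repeated timings,
# then summarize them as JSON for some graph-server-like endpoint.
import json
import time
from statistics import median

class PerfReporter:
    def __init__(self, test_name, platform):
        self.test_name = test_name
        self.platform = platform
        self.samples = []

    def measure(self, fn, runs=5):
        """Time fn() several times; perf tests need repeated runs to tame noise."""
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            self.samples.append((time.perf_counter() - start) * 1000.0)  # ms

    def payload(self):
        """Summarize the samples as the JSON an ingestion endpoint might accept."""
        return json.dumps({
            "test": self.test_name,
            "platform": self.platform,
            "unit": "ms",
            "median": median(self.samples),
            "samples": self.samples,
        })

reporter = PerfReporter("metrofx_tab_open", "win8")      # names invented
reporter.measure(lambda: sum(range(10000)))
print(json.loads(reporter.payload())["test"])  # prints "metrofx_tab_open"
```

The point is the shape of the workflow, not the API: a test pulls in one module, records numbers, and the harness ships the payload, so adding a new perf measurement costs minutes instead of a releng cycle.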

> One other thing to keep in mind if we're going to start doing performance
> tests differently is https://bugzilla.mozilla.org/show_bug.cgi?id=846166.
> Basically Chris suggests using eideticker for performance tests a lot
> more.

Eideticker is interesting, but it's also not pliable. We'd love to have
eideticker tests running for metro but the odds of that happening anytime
soon are slim due to the overhead of getting it set up. I imagine making
changes or adding tests is probably not very easy either.

Something like eideticker is great as a research project or something that
is owned by a special team that augments it over time and produces data sets
we can use. But I seriously doubt devs on m-c will ever be able to spend a
few hours writing and then checking in an eideticker test.

Jim

Andrew McCreight

unread,
Mar 5, 2013, 9:16:24 AM
to Nicholas Nethercote, Taras Glek, jma...@mozilla.com, dev-pl...@lists.mozilla.org, Dave Mandelin, mozilla dev platform
----- Original Message -----
> Talos memory measurements aren't very good because it cycles through
> multiple sites in a single tab. So it sometimes catches start-up
> memory consumption regressions (Firefox Health Report was a recent
> case) but it doesn't get much beyond that.

Another problem with the Talos memory tests, in comparison to AWSY, is that Talos opens and closes pages very rapidly, while AWSY proceeds at a more stately pace. This is very important for a memory test, because most of our GC and CC heuristics are time based. The drawback of this is that each test takes hours to run, though it is mostly just sitting around, so many test runs can be done at once on the same machine.
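The pacing point can be shown in a few lines. This is only a sketch of the AWSY-style loop; `load_page` and `measure_memory` are hypothetical stand-ins for whatever the real harness provides, and the 30-second settle time is illustrative.

```python
# Why pacing matters: GC/CC heuristics fire on timers, so a memory test
# that flips pages instantly never lets collection happen before sampling.
# load_page and measure_memory are hypothetical harness hooks.
import time

def cycle_pages(pages, load_page, measure_memory, settle_seconds=30):
    """Load each page, then wait long enough for timer-based GC/CC to run
    before sampling memory, AWSY-style."""
    readings = []
    for url in pages:
        load_page(url)
        time.sleep(settle_seconds)  # give timer-based collectors a chance
        readings.append(measure_memory())
    return readings
```

A Talos-style loop is this same code with `settle_seconds` effectively zero, which is exactly why it measures allocation churn rather than steady-state memory.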

>
> In comparison, AWSY cycles through 100 sites with 30 tabs open at
> once, which is a much better reflection of typical browsing. It also
> does multiple measurements -- start-up, after loading the tabs, after
> closing the tabs, etc.
>
> It's worth pointing out that AWSY is sort of built on top of Talos --
> its 100 sites are taken from the Talos Tp5 set. The good thing about
> this page set is that they're stored entirely locally. The downside
> is that all the external stuff in the pages (e.g. Facebook "Like"
> buttons, Google Ad stuff, Twitter feeds) isn't present, so it's a not
> particularly realistic representation of those pages; in particular,
> the amount of JS present is much less than real pages have.
> (https://bugzilla.mozilla.org/show_bug.cgi?id=679940#c31 is an
> example
> of the effect of this in action.)
>
> Nick