The current state of Talos benchmarks
Ehsan Akhgari  
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Thu, 30 Aug 2012 17:54:48 -0400
Local: Thurs, Aug 30 2012 5:54 pm
Subject: Re: The current state of Talos benchmarks
On 12-08-30 5:28 PM, Dave Mandelin wrote:

> On Thursday, August 30, 2012 9:11:25 AM UTC-7, Ehsan Akhgari wrote:
>> On 12-08-29 9:20 PM, Dave Mandelin wrote:

>>> On Wednesday, August 29, 2012 4:03:24 PM UTC-7, Ehsan Akhgari wrote:

>> In my opinion, one of the reasons why Talos is disliked is because many
>> people don't know where its code lives (hint:
>> http://hg.mozilla.org/build/talos/) and can't run those tests like other
>> test suites.  I think this would be very valuable to fix, so that
>> developers can read Talos tests like any other test, and fix or improve
>> them where needed.

> It is hard to find. And beyond that, it seems hard to use. It's been a while since I've run Talos locally, but last time I did it was a pain to set up and difficult to run, and I hear it's still kind of like that. For testing tools, "convenient for the developer" is a critical requirement, but has been neglected in the past.

> js/src/jit-test/ is an example of something that is very convenient for developers: creating a test is just adding a .js file to a directory (no manifest or extra files; by default error or crash is a fail, but you can change that for a test), the harness is a Python file with nice options, the test configuration and basic usage is documented in a README, and it lives in the tree.

Absolutely!  We really need to work hard to make them easier to run.  I
hear that the Automation team has already been making progress towards
that goal.

>>>> [...] I believe
>>>> that the bigger problem is that nobody owns watching over these numbers,
>>>> and as a result we take regressions in some benchmarks which can
>>>> actually be representative of what our users experience.

>>> The interesting thing is that we basically have no idea if that's true for any given Talos alarm.

>> That's something that I think should be judged per benchmark.  For
>> example, the Ts measurements will probably correspond very directly to
>> the startup time that our users experience.  The Tp5 measurements don't
>> directly correspond to anything like that, since nobody loads those
>> pages sequentially, but it could be an indication of average page load
>> performance.

> I exaggerated a bit--yes, some tests like Ts are pretty easy to understand and do correspond to user experience. With Tp5, I just don't know--I haven't spent any time trying to use it or looking at regressions, since JS doesn't affect it.

Right.  I think at the very least, on bigger tests like Tp5 we want to
know if something is regressed by a large amount, because that is very
likely to reflect an actual behavior change which is worth knowing about.

>>> - Speaking of false positives, we should seriously start tracking them. We should keep track of each Talos regression found and its outcome. (It would be great to track false negatives too but it's a lot harder to catch them and record them accurately.) That way we'd actually know whether we have a few false positives or a lot, or whether the false positives were coming up on certain tests. And we could use that information to improve the false positive rate over time.

>> I agree.  Do you have any suggestions on how we would track them?

> The details would vary according to the preferences of the person doing it, but I'd sketch it out something like this: when Talos detects a regression, file a bug to "resolve" it (i.e., show that it's not a real regression, show that it's an acceptable regression for the patch, or fix the regression). Then keep a file listing those bugs (with metadata for each: tests regressed, date, component, etc), and as each is closed, mark down the result: false positive, allowed, backed out, or fixed. That's your data set. Of course, various parts of this could be automated but that's not required.
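Dave's bookkeeping scheme is simple enough to sketch. The following is a minimal Python sketch, assuming one record per Talos-regression bug; the field names, the `Outcome` values (taken from his four categories), and the placeholder bug numbers are illustrative, not an existing tool. It computes, per test, the fraction of resolved alarms that turned out to be false positives, which is the number he suggests using to bring the false positive rate down over time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Iterable, List, Optional


class Outcome(Enum):
    FALSE_POSITIVE = "false positive"
    ALLOWED = "allowed"          # real, but judged acceptable for the patch
    BACKED_OUT = "backed out"
    FIXED = "fixed"


@dataclass
class RegressionRecord:
    bug: int                           # bug filed to "resolve" the alarm
    tests: List[str]                   # Talos tests that regressed
    filed: date
    component: str
    outcome: Optional[Outcome] = None  # None while the bug is still open


def false_positive_rate_by_test(records: Iterable[RegressionRecord]) -> Dict[str, float]:
    """Per test, the fraction of resolved alarms that were false positives."""
    resolved: Dict[str, int] = defaultdict(int)
    false_pos: Dict[str, int] = defaultdict(int)
    for rec in records:
        if rec.outcome is None:
            continue  # still open; no verdict yet
        for test in rec.tests:
            resolved[test] += 1
            if rec.outcome is Outcome.FALSE_POSITIVE:
                false_pos[test] += 1
    return {test: false_pos[test] / n for test, n in resolved.items()}


if __name__ == "__main__":
    # Placeholder bug numbers and data, for illustration only.
    records = [
        RegressionRecord(1, ["ts"], date(2012, 8, 1), "Startup", Outcome.FIXED),
        RegressionRecord(2, ["ts", "tp5"], date(2012, 8, 2), "Layout", Outcome.FALSE_POSITIVE),
        RegressionRecord(3, ["tp5"], date(2012, 8, 3), "Layout"),  # still open
    ]
    print(false_positive_rate_by_test(records))  # {'ts': 0.5, 'tp5': 1.0}
```

Open bugs are skipped rather than counted as "not a false positive", so a test's rate only reflects alarms that have actually been investigated.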

Oh, sorry, I should have phrased my question better.  I'm specifically
wondering who needs to track and investigate the regression if it
happened on a range of, let's say, 5 committers...

Cheers,
Ehsan


 
Discussion subject changed to "The current state of Talos benchmarks (meeting notes)" by Ehsan Akhgari
Ehsan Akhgari  
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Thu, 30 Aug 2012 17:56:40 -0400
Local: Thurs, Aug 30 2012 5:56 pm
Subject: Re: The current state of Talos benchmarks (meeting notes)
On 12-08-30 5:42 PM, Taras Glek wrote:

> * Joel will revisit maintaining Talos within mozilla-central to reduce
> developer barriers to understanding what a particular Talos test result
> means. This should also make Talos easier to run

I have filed bug 787200 for this discussion.

Ehsan


 
L. David Baron  
Newsgroups: mozilla.dev.platform
From: "L. David Baron" <dba...@dbaron.org>
Date: Thu, 30 Aug 2012 16:22:22 -0700
Local: Thurs, Aug 30 2012 7:22 pm
Subject: Re: The current state of Talos benchmarks (meeting notes)
On Thursday 2012-08-30 14:42 -0700, Taras Glek wrote:

> * Joel will revisit maintaining Talos within mozilla-central to
> reduce developer barriers to understanding what a particular Talos
> test result means. This should also make Talos easier to run

This will also solve one of the other problems that leads developers
to distrust talos, which is that a significant portion of the
performance regressions reported are (or at least were at one time)
the result of changes to the tests, but that changes to the tests
don't show up as part of the list of suspected causes of
regressions.

-David

--
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                           http://www.mozilla.org/   𝄂


 
Discussion subject changed to "The current state of Talos benchmarks" by Rafael Ávila de Espíndola
Rafael Ávila de Espíndola  
Newsgroups: mozilla.dev.platform
From: Rafael Ávila de Espíndola <respind...@mozilla.com>
Date: Thu, 30 Aug 2012 19:33:53 -0400
Local: Thurs, Aug 30 2012 7:33 pm
Subject: Re: The current state of Talos benchmarks

> Some people have noted in the past that some Talos measurements are not
> representative of something that the users would see, the Talos numbers
> are noisy, and we don't have good tools to deal with these types of
> regressions.  There might be some truth to all of these, but I believe
> that the bigger problem is that nobody owns watching over these numbers,
> and as a result we take regressions in some benchmarks which can
> actually be representative of what our users experience.

I was recently hit by most of the shortcomings you mentioned while
trying to upgrade clang. Fortunately, I found the issue on try, but I
will admit that comparing talos on try is something I only do when I
expect a problem.

I still intend to write a blog post once I am done with the update and
have more data, but here are some interesting points that have come up so far:

* compare-talos and compare.py were out of date. I was really lucky that
one of the benchmarks that still had the old name was the one that
showed the regression. I have started a script that I hope will be more
resilient to future changes (bug 786504).

* our builds are *really* hard to reproduce. The build I was downloading
from try was faster than the one I was building locally. In despair I
decided to fix at least part of this first. I found that our build was
depending on the way the bots use ccache (they set CCACHE_BASEDIR, which
changes __FILE__), the build directory (which shows up in debug info that is
not stripped), and whether the file system is case sensitive or not.

* testing on linux showed even more bizarre cases where small changes
cause performance problems. In particular, adding a nop *after the last
ret* in a function would make the js interpreter faster on sunspider. The
nop was just enough to make the function's size cross the next 16-byte
boundary, and that changed the address of every function linked after it.

* the histograms of some benchmarks don't look like a normal distribution
(https://plus.google.com/u/0/108996039294665965197/posts/8GyqMEZHHVR). I
still have to read the paper mentioned in the comments.
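The alignment effect in the third point above can be reproduced with simple arithmetic. The sketch below assumes, as a simplification of what a real linker does, that functions are laid out back to back and each one starts on a 16-byte boundary; the function sizes are made up. Growing the first function by one byte (the extra nop) pushes its end past a 16-byte boundary, and every function after it moves by 16 bytes even though none of them changed.

```python
from typing import List


def layout(sizes: List[int], align: int = 16, base: int = 0) -> List[int]:
    """Start address of each function when functions are placed consecutively
    and every function start is rounded up to a multiple of `align`."""
    addrs = []
    addr = base
    for size in sizes:
        addr = (addr + align - 1) // align * align  # round up to the next boundary
        addrs.append(addr)
        addr += size  # the next function can start no earlier than this
    return addrs


before = layout([48, 30, 20])  # [0, 48, 80]
after = layout([49, 30, 20])   # one extra byte (a nop) in the first function
print(before, after)           # [0, 48, 80] [0, 64, 96]
```

Note that growing a 47-byte function to 48 bytes would change nothing, which is why such effects look random from the point of view of the patch author.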

> I don't believe that the current situation is acceptable, especially
> with the recent focus on performance (through the Snappy project), and I
> would like to ask people if they have any ideas on what we can do to fix
> this.  The fix might be turning off some Talos tests if they're really
> not useful, asking someone or a group of people to go over these test
> results, get better tools with them, etc.  But _something_ needs to
> happen here.

There are many things we can do to make perf debugging/testing better,
but I don't think that is the main thing we need to do to solve the
problem. The tools we have do work. Try is slow and talos is noisy, but
it is possible to detect and debug regressions.

What I think we need to do is differentiate between tests that we expect to
match user experience and synthetic tests. Synthetic tests *are* useful
as they can much more easily find what changed, even if it is something
as silly as the address of some function. The difference is that we
don't want to regress on the tests that match user experience. IMHO we
*can* regress on synthetic ones as long as we know what is going on. And
yes, if a particular synthetic test is too brittle then we should remove it.

With the distinction in place we can then handle perf regressions in a
similar way to how we handle test failures: revert the offending patch
and make the original developer responsible for tracking it down. If a
patch is known to regress a synthetic benchmark, a comment on the commit
along the lines of "renaming this file causes __FILE__ to change in an
assert message and produces a spurious regression on md5" should be
sufficient. It is not the developer's *fault* that this causes a problem,
but IMHO it should still be his responsibility to track it.
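Rafael's proposal amounts to a small decision rule. Here is a minimal Python sketch of it; the test-to-category mapping and the `perf-note:` commit-message marker are hypothetical, since the thread only says that some tests match user experience and others are synthetic, and that an explanatory comment on the commit should be sufficient for the latter.

```python
from enum import Enum


class TestKind(Enum):
    USER_FACING = "expected to match user experience"
    SYNTHETIC = "synthetic"


# Hypothetical classification for illustration; the thread does not say which
# Talos tests belong in which bucket.
TEST_KIND = {
    "ts": TestKind.USER_FACING,
    "md5": TestKind.SYNTHETIC,
}

# Hypothetical convention for "I know why this regresses" notes in commit messages.
EXPLANATION_MARKER = "perf-note:"


def triage(test: str, commit_message: str) -> str:
    """Return the action for a regression on `test` caused by a commit."""
    # Unknown tests are treated like user-facing ones, i.e. conservatively.
    kind = TEST_KIND.get(test, TestKind.USER_FACING)
    if kind is TestKind.SYNTHETIC and EXPLANATION_MARKER in commit_message:
        return "accept"  # regression on a synthetic test, and it is understood
    return "back out"    # the original developer tracks it down
```

The conservative default for unclassified tests is a design choice of the sketch, not something proposed in the thread.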

> Cheers,
> Ehsan

Cheers,
Rafael

 
Discussion subject changed to "The current state of Talos benchmarks (meeting notes)" by Ben Hearsum
Ben Hearsum  
Newsgroups: mozilla.dev.platform
From: Ben Hearsum <bhear...@mozilla.com>
Date: Thu, 30 Aug 2012 21:05:36 -0400
Local: Thurs, Aug 30 2012 9:05 pm
Subject: Re: The current state of Talos benchmarks (meeting notes)
On 08/30/12 07:22 PM, L. David Baron wrote:

> On Thursday 2012-08-30 14:42 -0700, Taras Glek wrote:
>> * Joel will revisit maintaining Talos within mozilla-central to
>> reduce developer barriers to understanding what a particular Talos
>> test result means. This should also make Talos easier to run

> This will also solve one of the other problems that leads developers
> to distrust talos, which is that a significant portion of the
> performance regressions reported are (or at least were at one time)
> the result of changes to the tests, but that changes to the tests
> don't show up as part of the list of suspected causes of
> regressions.

This isn't true anymore, actually. While Talos itself isn't stored in
mozilla-central, a pointer to a specific version of it is. The test
machines pull the Talos version specified in
https://mxr.mozilla.org/mozilla-central/source/testing/talos/talos.json
at test time. For example (from
https://tbpl.mozilla.org/php/getParsedLog.php?id=14852731&tree=Firefo...
/tools/buildbot/bin/python talos_from_code.py --talos-json-url
http://hg.mozilla.org/mozilla-central/raw-file/f972f1a71e7e/testing/t...

This means that changes to the Talos suite *are* associated with a
mozilla-central revision, have tests run for them, can be backed out,
can ride trains, etc.


 
Discussion subject changed to "The current state of Talos benchmarks" by Dave Mandelin
Dave Mandelin  
Newsgroups: mozilla.dev.platform
From: Dave Mandelin <dmande...@gmail.com>
Date: Thu, 30 Aug 2012 18:13:33 -0700 (PDT)
Local: Thurs, Aug 30 2012 9:13 pm
Subject: Re: The current state of Talos benchmarks

On Thursday, August 30, 2012 2:54:55 PM UTC-7, Ehsan Akhgari wrote:
> Oh, sorry, I should have phrased my question better.  I'm specifically
> wondering who needs to track and investigate the regression if it
> happened on a range of, let's say, 5 committers...

Ah. I believe that's a job for a bugmaster, a position that we don't have filled at the moment. We need one. Perhaps one or more people in QA can step into part of that role, possibly temporarily.

Otherwise, it seems we just have to share the pain. Bisecting changesets is not necessarily an enjoyable job but it is a necessary one. I would suggest that sheriffs pick one of the 5 committers and ask that person to bisect the change and try not to pick the same person repeatedly (unless that person keeps landing the regressions!).

Dave


 
Anthony Jones  
Newsgroups: mozilla.dev.platform
From: Anthony Jones <ajo...@mozilla.com>
Date: Fri, 31 Aug 2012 14:39:14 +1200
Local: Thurs, Aug 30 2012 10:39 pm
Subject: Re: The current state of Talos benchmarks
On 31/08/12 13:13, Dave Mandelin wrote:

> Otherwise, it seems we just have to share the pain. Bisecting changesets is not necessarily an enjoyable job but it is a necessary one. I would suggest that sheriffs pick one of the 5 committers and ask that person to bisect the change and try not to pick the same person repeatedly (unless that person keeps landing the regressions!).

Finding an offending commit within n commits is scriptable.
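As a sketch of what Anthony means, here is a binary search over a push range in Python. It assumes the changeset just before the range is known good, the last one is known bad, and the regression persists once it has landed. `is_bad` is a hypothetical callback that, in practice, would build that changeset and run the affected Talos test several times against a baseline (given the noise discussed in this thread, one run is not enough). Mercurial's built-in `hg bisect` does the same bookkeeping.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changesets: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first changeset for which is_bad() is True.

    Preconditions: changesets are in push order, the parent of changesets[0]
    is good, changesets[-1] is bad, and every changeset after the culprit is
    also bad.
    """
    lo, hi = 0, len(changesets) - 1  # the culprit is always within [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid      # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return changesets[lo]
```

For n changesets this needs about log2(n) test runs; for the 5-committer range in Ehsan's question, at most 3.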

 
Anthony Hughes  
Newsgroups: mozilla.dev.platform
From: Anthony Hughes <ahug...@mozilla.com>
Date: Thu, 30 Aug 2012 20:41:16 -0700 (PDT)
Local: Thurs, Aug 30 2012 11:41 pm
Subject: Re: The current state of Talos benchmarks
I think tracking and investigating is all of our responsibility. QA definitely has a role to play and I think we've been playing that role to a certain extent. We don't always have the skills, knowledge, experience, or time to help but we always try and we are always willing to learn. We rely on Release Management to keep us apprised of what's important and we rely on developers to help us understand the code, tools, and testcases.

Having a Bugmaster will certainly improve things, but I don't think it eliminates the necessity of, or the desire for, this collaborative dynamic.


 
Discussion subject changed to "The current state of Talos benchmarks (meeting notes)" by jmaher
jmaher  
Newsgroups: mozilla.dev.platform
From: jmaher <joel.ma...@gmail.com>
Date: Fri, 31 Aug 2012 00:35:10 -0700 (PDT)
Local: Fri, Aug 31 2012 3:35 am
Subject: Re: The current state of Talos benchmarks (meeting notes)

I have backed out changes made to talos and the tests a few times due to performance regressions.  While I might not catch every one, we do treat talos changes as another changeset in m-c.  

If there is an expected shift in numbers, we create a new test.  This is why there are 5+ versions of all the tests.  It really adds a lot of overhead and breakage (e.g. compare-talos), but this way we don't confuse the old test data with the new adjusted tests.


 
Justin Lebar  
Newsgroups: mozilla.dev.platform
From: Justin Lebar <justin.le...@gmail.com>
Date: Fri, 31 Aug 2012 06:37:19 -0300
Local: Fri, Aug 31 2012 5:37 am
Subject: Re: The current state of Talos benchmarks (meeting notes)
Sorry to continue beating this horse, but I don't think it's quite dead yet:

One of the best things we could do to make finding these regressions
easier is to never coalesce Talos on mozilla-inbound.  It's crazy to
waste developer time bisecting Talos locally when we don't run it on
every push.

Another thing that would help a lot is fixing bug 752002, so people
will stop filtering the e-mails.


 
Discussion subject changed to "The current state of Talos benchmarks" by Justin Lebar
Justin Lebar  
Newsgroups: mozilla.dev.platform
From: Justin Lebar <justin.le...@gmail.com>
Date: Fri, 31 Aug 2012 07:01:15 -0300
Local: Fri, Aug 31 2012 6:01 am
Subject: Re: The current state of Talos benchmarks

> IMHO we *can* regress on synthetic ones as long as we know what is going on.

It's the requirement that we know what is going on that I think is unreasonable.

Indeed, we /have/ a "no not-understood regressions" policy, IIRC.  The
extent to which it's being ignored is at least partially indicative of
how difficult these changes can be to track down.  Rafael's post has
some great examples of how insane tracking down perf regressions can
be.

I really don't think that the right way to go about fixing our
proclivity to regress Talos is to "get tough on regressions" and make
this every committer's problem.  We shouldn't expect committers to
track down the fact that "my change pushes X function down 16 bytes,
which changes some other function's alignment, which, in combination
with a change to __FILE__, affects benchmark Y" as a regular part of
their job.  And it's not clear to me that we would have any tests left if
we eliminated from the tree all tests which are affected by this sort
of thing.

I think the right way to go about this is to first investigate which
tests are stable, and how stable they are (*).  Then a team of
engineers can gain some experience finding and understanding
regressions which occur over some period of time, so we can understand
how feasible it would be to seriously ask developers to do this as a
part of their day-to-day jobs.

I'm not saying it should be OK to regress our performance tests, as a
rule.  But I think we need to acknowledge that hunting regressions can
be time-consuming, and that a policy requiring that all regressions be
understood may hamstring our ability to get anything else done.
There's a trade-off here that we seem to be ignoring.

-Justin

(*) This is essentially SfN.


 
Ehsan Akhgari  
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Fri, 31 Aug 2012 11:32:25 -0400
Local: Fri, Aug 31 2012 11:32 am
Subject: Re: The current state of Talos benchmarks
On 12-08-31 6:01 AM, Justin Lebar wrote:

> I'm not saying it should be OK to regress our performance tests, as a
> rule.  But I think we need to acknowledge that hunting regressions can
> be time-consuming, and that a policy requiring that all regressions be
> understood may hamstring our ability to get anything else done.
> There's a trade-off here that we seem to be ignoring.

There is definitely a trade-off here, and at least for the past year
(and maybe for the past two years) we have in practice given so much
weight to the difficulty of tracking down performance regressions that
we've been ignoring them (except perhaps for a few people).

It is a mistake to take Rafael's example and extend it to the average
regression that we measure on Talos.  It's true that sometimes those
things happen, and in practice we cannot deal with them all, because we
don't have an army of Rafaels.  But it bothers me when people take an
example of a very difficult to understand regression encountered by a
person who bravely dwells in low-level compiler code generation stuff
and extend it to come up with a policy covering all regressions.
Please, let's not do that.

And let's remember the other side of the trade-off too.  A lot of blood
and tears have gone into shaving milliseconds off our startup time.
Taking a ~5% hit on startup time within a 6-week cycle effectively
means that we have undone man-months of startup optimizations.  So it's
not as if letting these regressions slip in under our noses is going to
make us all more productive.

There are extremely non-stable Talos tests, and relatively stable ones.
Let's focus on the relatively stable ones.  There are extremely hard
to diagnose performance regressions, and extremely easy ones (e.g.,
let's not wait on this lock, do this I/O, run this exponential
algorithm, load tons of XUL/XBL when a window opens, etc.)  We have many
great tools for the job, so not all regressions need to be treated the same.

Cheers,
Ehsan


 
Discussion subject changed to "The current state of Talos benchmarks (meeting notes)" by Ehsan Akhgari
Ehsan Akhgari  
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Fri, 31 Aug 2012 11:38:05 -0400
Local: Fri, Aug 31 2012 11:38 am
Subject: Re: The current state of Talos benchmarks (meeting notes)
On 12-08-31 5:37 AM, Justin Lebar wrote:

> Sorry to continue beating this horse, but I don't think it's quite dead yet:

> One of the best things we could do to make finding these regressions
> easier is to never coalesce Talos on mozilla-inbound.  It's crazy to
> waste developer time bisecting Talos locally when we don't run it on
> every push.

In order to help kill that horse, I filed bug 787447 and CCed John on
it.  :-)

Ehsan


 
Discussion subject changed to "The current state of Talos benchmarks" by Chris AtLee
Chris AtLee  
Newsgroups: mozilla.dev.platform
From: Chris AtLee <cat...@mozilla.com>
Date: Fri, 31 Aug 2012 11:45:20 -0400
Local: Fri, Aug 31 2012 11:45 am
Subject: Re: The current state of Talos benchmarks
On 31/08/12 11:32 AM, Ehsan Akhgari wrote:
> There are extremely non-stable Talos tests, and relatively stable ones.
> Let's focus on the relatively stable ones.  There are extremely hard
> to diagnose performance regressions, and extremely easy ones (i.e.,
> let's not wait on this lock, do this I/O, run this exponential
> algorithm, load tons of XUL/XBL when a window opens, etc.)  We have many
> great tools for the job, so not all regressions need to be treated the
> same.

What value do the extremely non-stable Talos tests have? Shouldn't we
stop running them if they're not giving useful information?


 
Justin Lebar  
Newsgroups: mozilla.dev.platform
From: Justin Lebar <justin.le...@gmail.com>
Date: Fri, 31 Aug 2012 12:46:44 -0300
Local: Fri, Aug 31 2012 11:46 am
Subject: Re: The current state of Talos benchmarks

> There are extremely non-stable Talos tests, and relatively stable ones.
> Let's focus on the relatively stable ones.

It's not exclusively a question of noise in the tests.  Even
regressions in stable tests are sometimes hard to track down.  I spent
two months trying to figure out why I could not reproduce a Dromaeo
regression I saw on m-i using try, and eventually gave up (bug
653961).

It's great if we try to track down this mysterious 5% startup
regression.  We shouldn't ignore important regressions.  But what I
object to is the idea that if I regress Dromaeo DOM by 2%, I'm
automatically backed out and prevented from doing any work until I
prove that the problem is that I changed a filename somewhere.


 
jmaher  
Newsgroups: mozilla.dev.platform
From: jmaher <joel.ma...@gmail.com>
Date: Fri, 31 Aug 2012 08:49:50 -0700 (PDT)
Local: Fri, Aug 31 2012 11:49 am
Subject: Re: The current state of Talos benchmarks
There are a few issues here which should be easy to address and a few other issues which are not so easy to address.

First off, everybody who is interested in talos should read the wiki page:
https://wiki.mozilla.org/Buildbot/Talos

This explains where the code lives, what tests we run and on which branches and many other details on what is done to run talos.  If you want to run talos locally, check out these instructions:
https://wiki.mozilla.org/Buildbot/Talos#Running_locally_-_Source_Code

Another concern I have read in this thread, and have heard over the last few months, is why we are even running these tests, since they are old, irrelevant, and nobody looks at them.  It is a valid concern, and something I have asked myself many times while working on Talos.  Earlier this summer I took it upon myself to find a developer to act as a point of contact for each and every test we run.  Then we figured out whether the tests were relevant and testing things we care about.  Many tests have been updated, added, or disabled in the last couple of months.

A similar complaint is about the noise in the numbers and how we can realistically detect a regression or gain value.  For minor regressions our current toolchain will not be very effective.  A lot of work has gone into looking at how we run tests, the tools we use, and whether we can apply different models to the numbers to get more reliable data.  Most of that work is documented in the Signal from Noise project: https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise.  I encourage folks to join the public meetings we hold to learn more about how we are actually solving this problem.

Back on subject: we want to pin regressions down to the exact changeset, as well as reduce the false positives that get mailed to dev.tree-management.  There is probably no silver bullet or policy we can create today which will fix our problems.  There is a big lag between a patch's Talos run and the notification in dev.tree-management.  Large regressions can be detected by visually inspecting the graph server (we link to everything from tbpl), but small regressions only become visible over time, since a minor increase can look like the regular noise in our numbers.
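To make the "small regression versus regular noise" point concrete, here is a minimal Python sketch that compares a window of runs before a push with a window after it and flags a shift only when it is large relative to run-to-run noise. It uses Welch's t statistic as a stand-in; this is an assumption for illustration, not the graph server's algorithm or the Signal from Noise project's method, and since some Talos histograms are not normal (as Rafael noted) a t statistic is only a rough heuristic. It also assumes a lower-is-better metric such as milliseconds.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def welch_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Welch's t statistic for mean(after) - mean(before).

    Assumes a lower-is-better metric (e.g. milliseconds), so a positive value
    means the runs after the push were slower.  Needs at least 2 runs per side.
    """
    se = math.sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
    diff = mean(after) - mean(before)
    if se == 0:
        # No noise at all: any change is "infinitely" significant.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def looks_like_regression(before: Sequence[float], after: Sequence[float],
                          threshold: float = 3.0) -> bool:
    """Flag a shift only when it is large relative to the run-to-run noise."""
    return welch_t(before, after) > threshold
```

With only one or two runs per side the standard error is large, which is why a small regression tends to stand out only after several more runs accumulate.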

From a talos tool maintainer's perspective, I am committed to making talos easy to run and well documented so we can all work on fixing regressions instead of offering sacrifices to the try server.  When there are requests for features, fixes or test adjustments, somebody on the A*Team will usually resolve them quickly.  While this only solves some of the pain, it is a step in the right direction until Signal From Noise can come out and solve a large portion of the other problems.


 
Dave Mandelin  
Newsgroups: mozilla.dev.platform
From: Dave Mandelin <dmande...@gmail.com>
Date: Thu, 30 Aug 2012 14:28:01 -0700 (PDT)
Local: Thurs, Aug 30 2012 5:28 pm
Subject: Re: The current state of Talos benchmarks

On Thursday, August 30, 2012 9:11:25 AM UTC-7, Ehsan Akhgari wrote:
> On 12-08-29 9:20 PM, Dave Mandelin wrote:

> > On Wednesday, August 29, 2012 4:03:24 PM UTC-7, Ehsan Akhgari wrote:

> In my opinion, one of the reasons why Talos is disliked is because many
> people don't know where its code lives (hint:
> http://hg.mozilla.org/build/talos/) and can't run those tests like other
> test suites.  I think this would be very valuable to fix, so that
> developers can read Talos tests like any other test, and fix or improve
> them where needed.

It is hard to find. And beyond that, it seems hard to use. It's been a while since I've run Talos locally, but last time I did it was a pain to set up and difficult to run, and I hear it's still kind of like that. For testing tools, "convenient for the developer" is a critical requirement, but has been neglected in the past.

js/src/jit-test/ is an example of something that is very convenient for developers: creating a test is just adding a .js file to a directory (no manifest or extra files; by default an error or crash counts as a failure, but that can be changed per test), the harness is a Python file with nice options, the test configuration and basic usage are documented in a README, and it all lives in the tree.

> >> [...] I believe
> >> that the bigger problem is that nobody owns watching over these numbers,
> >> and as a result we take regressions in some benchmarks which can
> >> actually be representative of what our users experience.

> > The interesting thing is that we basically have no idea if that's true for any given Talos alarm.

> That's something that I think should be judged per benchmark.  For
> example, the Ts measurements will probably correspond very directly to
> the startup time that our users experience.  The Tp5 measurements don't
> directly correspond to anything like that, since nobody loads those
> pages sequentially, but it could be an indication of average page load
> performance.

I exaggerated a bit--yes, some tests like Ts are pretty easy to understand and do correspond to user experience. With Tp5, I just don't know--I haven't spent any time trying to use it or looking at regressions, since JS doesn't affect it.

> >> I don't believe that the current situation is acceptable, especially
> >> with the recent focus on performance (through the Snappy project), and I
> >> would like to ask people if they have any ideas on what we can do to fix
> >> this.  The fix might be turning off some Talos tests if they're really
> >> not useful, asking someone or a group of people to go over these test
> >> results, get better tools with them, etc.  But _something_ needs to
> >> happen here.
> > - Second, as you say, get an owner for performance regressions. There are lots of ways we could do this. I think it would integrate fairly easily into our existing processes if we (automatically or by a designated person) filed a bug for each regression and marked it tracking (so the release managers would own followup). Alternately, we could have a designated person own followup. I'm not sure if that has any advantages, but release managers would probably know. But doing any of this is going to severely annoy engineers unless we get the false positive rate under control.

> Note that some of the work to differentiate between false positives
> and real regressions needs to be done by the engineers, similar to the
> work required to investigate correctness problems.  And people need to
> accept that seemingly benign changes may also cause real performance
> regressions, so it's not always possible to glance over a changeset and
> say "nah, this can't be my fault."  :-)

Agreed.

> > - Speaking of false positives, we should seriously start tracking them. We should keep track of each Talos regression found and its outcome. (It would be great to track false negatives too but it's a lot harder to catch them and record them accurately.) That way we'd actually know whether we have a few false positives or a lot, or whether the false positives were coming up on certain tests. And we could use that information to improve the false positive rate over time.

> I agree.  Do you have any suggestions on how we would track them?

The details would vary according to the preferences of the person doing it, but I'd sketch it out something like this: when Talos detects a regression, file a bug to "resolve" it (i.e., show that it's not a real regression, show that it's an acceptable regression for the patch, or fix the regression). Then keep a file listing those bugs (with metadata for each: tests regressed, date, component, etc.), and as each is closed, mark down the result: false positive, allowed, backed out, or fixed. That's your data set. Of course, various parts of this could be automated, but that's not required.
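To make the bookkeeping concrete, here is a minimal Python sketch of that data set, assuming one record per regression bug. The names (`RegressionBug`, `Outcome`, `false_positive_rates`) and the sample bugs are hypothetical illustrations, not an existing tool or real data; the point is only that once each closed bug is tagged with an outcome, the per-test false positive rate falls out directly.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class Outcome(Enum):
    # The four ways a regression bug can be resolved, as listed above.
    FALSE_POSITIVE = "false positive"
    ALLOWED = "allowed"
    BACKED_OUT = "backed out"
    FIXED = "fixed"


@dataclass
class RegressionBug:
    bug_id: int
    tests: List[str]                   # Talos tests that regressed
    filed: date
    component: str
    outcome: Optional[Outcome] = None  # None while the bug is still open


def false_positive_rates(bugs: List[RegressionBug]) -> Dict[str, float]:
    """Per test, the fraction of resolved regression bugs that were false positives."""
    resolved: Counter = Counter()
    false_positives: Counter = Counter()
    for bug in bugs:
        if bug.outcome is None:
            continue  # still open, so no outcome to count yet
        for test in bug.tests:
            resolved[test] += 1
            if bug.outcome is Outcome.FALSE_POSITIVE:
                false_positives[test] += 1
    return {test: false_positives[test] / resolved[test] for test in resolved}


# Hypothetical records, for illustration only.
bugs = [
    RegressionBug(1, ["ts"], date(2012, 8, 1), "Startup", Outcome.FIXED),
    RegressionBug(2, ["tp5"], date(2012, 8, 2), "Layout", Outcome.FALSE_POSITIVE),
    RegressionBug(3, ["tp5", "ts"], date(2012, 8, 3), "DOM", Outcome.FALSE_POSITIVE),
    RegressionBug(4, ["tp5"], date(2012, 8, 4), "DOM"),  # still open
]
print(false_positive_rates(bugs))  # prints {'ts': 0.5, 'tp5': 1.0}
```

Open bugs are skipped rather than counted as "not a false positive", so the rate only reflects regressions whose outcome is actually known.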

Dave


 
Dave Mandelin
Newsgroups: mozilla.dev.platform
From: Dave Mandelin <dmande...@gmail.com>
Date: Thu, 30 Aug 2012 18:13:33 -0700 (PDT)
Local: Thurs, Aug 30 2012 9:13 pm
Subject: Re: The current state of Talos benchmarks

On Thursday, August 30, 2012 2:54:55 PM UTC-7, Ehsan Akhgari wrote:
> Oh, sorry, I needed to ask my question better.  I'm specifically
> wondering who needs to track and investigate the regression if it
> happened on a range of, let's say, 5 committers...

Ah. I believe that's a job for a bugmaster, a position we haven't filled at the moment. We need one. Perhaps one or more people in QA can step into part of that role, possibly temporarily.

Otherwise, it seems we just have to share the pain. Bisecting changesets is not necessarily an enjoyable job, but it is a necessary one. I would suggest that sheriffs pick one of the 5 committers, ask that person to bisect the change, and try not to pick the same person repeatedly (unless that person keeps landing the regressions!).
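Bisecting is a binary search over the pushes in the regression range. A minimal Python sketch, assuming the push before the range is known good, the last push is known bad, and the regression persists once introduced. `is_regressed` is a hypothetical stand-in for "push this changeset to try and compare its Talos numbers against the baseline"; with noisy tests, each such check may itself need several runs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changesets: Sequence[T], is_regressed: Callable[[T], bool]) -> T:
    """Return the first changeset that shows the regression.

    Assumes changesets[0] is known good, changesets[-1] is known bad, and
    that every changeset after the culprit also shows the regression.
    """
    good, bad = 0, len(changesets) - 1  # invariant: index good is good, index bad is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_regressed(changesets[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return changesets[bad]


# Hypothetical range: the last known-good push followed by 5 pushes from
# different committers; pretend "c3" introduced the regression.
pushes = ["base", "a1", "b2", "c3", "d4", "e5"]
print(first_bad(pushes, lambda c: pushes.index(c) >= 3))  # prints c3
```

For a range of 5 pushes this needs at most 3 calls to `is_regressed`, which matters when every call is a round of Talos runs.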

Dave


 
jmaher
Newsgroups: mozilla.dev.platform
From: jmaher <joel.ma...@gmail.com>
Date: Fri, 31 Aug 2012 08:49:50 -0700 (PDT)
Local: Fri, Aug 31 2012 11:49 am
Subject: Re: The current state of Talos benchmarks
There are a few issues here which should be easy to address and a few other issues which are not so easy to address.

First off, everybody who is interested in Talos should read the wiki page:
https://wiki.mozilla.org/Buildbot/Talos

This explains where the code lives, which tests we run and on which branches, and many other details about how Talos is run.  If you want to run Talos locally, check out these instructions:
https://wiki.mozilla.org/Buildbot/Talos#Running_locally_-_Source_Code

Another concern I have read in this thread, and have heard over the last few months, is why we are even running these tests, since they are old, irrelevant, and nobody looks at them.  It is a valid concern, and something I have asked myself many times while working on Talos.  Earlier this summer I took it upon myself to find a developer to be the point of contact for each and every test we run.  Then we figured out whether the tests were relevant and testing things we care about.  Many tests have been updated, added, or disabled in the last couple of months.

A similar complaint is about the noise in the numbers and how we can realistically detect a regression or gain value.  For minor regressions, our current toolchain will not be very effective.  A lot of work has gone into looking at how we run tests, the tools we use, and whether we can apply different models to the numbers to get more reliable data.  Most of that work is documented in the Signal From Noise project: https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise.  I encourage folks to join the public meetings we hold to learn more about how we are actually solving this problem.
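To illustrate why small regressions get lost in the noise, here is a minimal Python sketch that compares a window of runs before a push with a window after it using Welch's t-statistic. It shows the general technique with assumed numbers and is not necessarily the exact analysis the graph server performs; the threshold, function names, and sample timings are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Welch's t-statistic for the change in mean from `before` to `after`.

    Positive values mean `after` is slower (higher times). Each window needs
    at least two runs so that the sample variance is defined.
    """
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    if se == 0:
        raise ValueError("both windows have zero variance")
    return (mean(after) - mean(before)) / se


def looks_like_regression(before: Sequence[float], after: Sequence[float],
                          threshold: float = 5.0) -> bool:
    # Hypothetical threshold: raising it cuts false positives but misses
    # more small regressions.
    return welch_t(before, after) > threshold


# Hypothetical timings (ms): the same +/-3 ms noise, shifted by 1% and by 10%.
before = [100, 103, 97, 101, 99, 102, 98, 100]
small = [t + 1 for t in before]
large = [t + 10 for t in before]
print(welch_t(before, small), looks_like_regression(before, small))  # 1.0 False
print(welch_t(before, large), looks_like_regression(before, large))  # 10.0 True
```

With identical noise, the 1% shift is indistinguishable from chance at eight runs per window while the 10% shift stands out; catching the small one needs many more runs, which is one reason small regressions take time to show up.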

Back on subject: we want to pin regressions down to the exact changeset, as well as reduce the false positives that get mailed to dev.tree-management.  There is probably no silver bullet or policy we can create today which will fix our problems.  There is a big lag between when Talos runs on a patch and when we get a notification in dev.tree-management.  Large regressions can be detected by visually inspecting the graph server (we have links to everything from tbpl), but small regressions have to be observed over time, since a minor increase can look like the regular noise in our numbers.
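Pinning a regression to the exact changeset can be framed as finding the push where the numbers shift. A minimal Python sketch, assuming one result per push in push order: it slides a split point along the series, scores each split with a two-window Welch t-statistic, and returns the best one. The function names, window size, and timings are hypothetical. A split point can only be scored once `window` results exist after it, which is one concrete source of the lag mentioned above.

```python
import math
from statistics import mean, variance
from typing import Optional, Sequence


def welch_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Welch's t-statistic; positive when `after` is slower than `before`."""
    diff = mean(after) - mean(before)
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    if se == 0:
        # Noise-free windows: any difference at all is decisive.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def most_likely_culprit(series: Sequence[float], window: int = 5) -> Optional[int]:
    """Index of the push that starts the largest upward shift in `series`.

    Compares the `window` results before each candidate push with the
    `window` results starting at it (window must be >= 2). Returns None when
    the series is too short to score any candidate.
    """
    best_index: Optional[int] = None
    best_t = -math.inf
    for i in range(window, len(series) - window + 1):
        t = welch_t(series[i - window:i], series[i:i + window])
        if best_index is None or t > best_t:
            best_index, best_t = i, t
    return best_index


# Hypothetical per-push timings: roughly 100 ms, then about 110 ms from push 6 on.
series = [100, 102, 98, 100, 101, 99, 110, 112, 108, 110, 111, 109]
print(most_likely_culprit(series, window=3))  # prints 6
```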

Coming from a Talos tool maintainer's perspective, I am committed to making Talos easy to run and well documented so we can all work on fixing regressions instead of offering sacrifices to the try server.  When there are requests for features, fixes, or test adjustments, somebody on the A*Team will usually resolve them quickly.  While this only solves some of the pain, it is a step in the right direction until Signal From Noise can come out and solve a large portion of the other problems.


 
Ehsan Akhgari
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Fri, 31 Aug 2012 15:59:22 -0400
Local: Fri, Aug 31 2012 3:59 pm
Subject: Re: The current state of Talos benchmarks
On 12-08-31 11:45 AM, Chris AtLee wrote:

> On 31/08/12 11:32 AM, Ehsan Akhgari wrote:
> > There are extremely non-stable Talos tests, and relatively stable ones.
> > Let's focus on the relatively stable ones.  There are extremely hard
> > to diagnose performance regressions, and extremely easy ones (i.e.,
> > let's not wait on this lock, do this I/O, run this exponential
> > algorithm, load tons of XUL/XBL when a window opens, etc.)  We have many
> > great tools for the job, so not all regressions need to be treated the
> > same.

> What value do the extremely non-stable Talos tests have? Shouldn't we
> stop running them if they're not giving useful information?

Either that, or find some way of making them more stable, such as not
measuring the wall clock time.
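As one illustration of measuring something other than wall-clock time, here is a minimal Python sketch that records both wall-clock time and the process's own CPU time for a single call. It only shows the difference between the two clocks; it is an assumption that CPU time would be a more stable metric for any given Talos test, and `measure` is a hypothetical helper.

```python
import time
from typing import Callable, Tuple


def measure(fn: Callable[[], object]) -> Tuple[float, float]:
    """Return (wall-clock seconds, CPU seconds of this process) for one call of fn."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


# A call that mostly sleeps: wall-clock time includes the sleep, CPU time does not.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```

The trade-off cuts both ways: CPU time ignores scheduling and I/O jitter, but for the same reason it would miss regressions that show up as waiting, such as the lock and I/O examples quoted above.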

Ehsan


 
Chris AtLee
Newsgroups: mozilla.dev.platform
From: Chris AtLee <cat...@mozilla.com>
Date: Fri, 31 Aug 2012 16:03:44 -0400
Local: Fri, Aug 31 2012 4:03 pm
Subject: Re: The current state of Talos benchmarks
On 31/08/12 03:59 PM, Ehsan Akhgari wrote:

Sure, that sounds like a great project. Until that's finished, is there
any value to running these suites, or are they expensive random number
generators?

 
Ehsan Akhgari
Newsgroups: mozilla.dev.platform
From: Ehsan Akhgari <ehsan.akhg...@gmail.com>
Date: Sat, 01 Sep 2012 10:08:40 -0400
Local: Sat, Sep 1 2012 10:08 am
Subject: Re: The current state of Talos benchmarks
On 12-08-31 4:03 PM, Chris AtLee wrote:

I think that is something that needs to be evaluated on a per-test
per-platform basis, hopefully by someone who knows a bit about
statistics.  :-)
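One simple starting point for that per-test, per-platform evaluation is to rerun an unchanged build several times and measure each suite's run-to-run noise, for example with the coefficient of variation (standard deviation divided by mean). A minimal Python sketch; the data, the 5% cutoff, and the function names are hypothetical, and a proper statistical evaluation would also weigh how small a regression each suite needs to detect.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (test name, platform)


def coefficient_of_variation(runs: List[float]) -> float:
    # Standard deviation relative to the mean; unitless, so suites measured
    # on different scales can be compared directly.
    return stdev(runs) / mean(runs)


def noisy_suites(results: Dict[Key, List[float]], max_cv: float = 0.05) -> List[Key]:
    """(test, platform) pairs whose noise on an unchanged build exceeds max_cv."""
    return sorted(key for key, runs in results.items()
                  if coefficient_of_variation(runs) > max_cv)


# Hypothetical repeated runs of the same build.
results = {
    ("ts", "linux"): [500, 505, 495, 500],
    ("tp5", "win7"): [300, 360, 250, 330],
}
print(noisy_suites(results))  # prints [('tp5', 'win7')]
```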

Cheers,
Ehsan


 
jmaher
Newsgroups: mozilla.dev.platform
From: jmaher <joel.ma...@gmail.com>
Date: Sat, 1 Sep 2012 13:18:33 -0700 (PDT)
Local: Sat, Sep 1 2012 4:18 pm
Subject: Re: The current state of Talos benchmarks

We are detecting regressions with Talos despite the high levels of noise.  So while it might appear to be a waste of machine resources to some, Talos serves a purpose.  Having people look at the results more frequently will solve many of the problems.

I would say a handful of tests/counters on certain platforms are not very useful in the current way we are reporting numbers.


 
Discussion subject changed to "The current state of Talos benchmarks (meeting notes)" by Justin Wood (Callek)
Justin Wood (Callek)
Newsgroups: mozilla.dev.platform
From: "Justin Wood (Callek)" <Cal...@gmail.com>
Date: Sun, 02 Sep 2012 00:33:59 -0400
Local: Sun, Sep 2 2012 12:33 am
Subject: Re: The current state of Talos benchmarks (meeting notes)

Taras Glek wrote:
> * Joel will revisit maintaining Talos within mozilla-central to reduce
> developer barriers to understanding what a particular Talos test result
> means. This should also make Talos easier to run

To call out this point explicitly:

I'm not convinced that folding it into m-c is the necessary way forward,
and I think that before folding in any more of our "stable" but
external-to-m-c repos, we should start a community discussion on general
guidelines for why or why not we would do that, and THEN evaluate those
against WHY we want Talos in m-c, what goals we are solving, etc.

I don't feel that "reduce developer barriers to understanding what a
particular Talos test result means" is helped by this.  If you [anyone]
think so, can you try to articulate why here in this thread?

[I note that jhammel and I, at least, were also discussing this in the bug
about moving Talos to m-c, and we both agree the discussion does not
belong in the bug.  I do feel that the move, if the Talos module owner
feels it is necessary, should not get blocked on the need for an external
process, but I do feel we should think hard about this.]

--
~Justin Wood (Callek)


 


 