
TopFails from the Tinderbox runs


Murali Nandigama

Apr 7, 2010, 5:45:53 PM
Gentle people !!

With more than 150K tests run on each check-in, and at approximately
100-120 check-ins a day ... there are a ton of test results out
there for anyone to get a grip on. Every engineer checks the test run
results of their putbacks, and some people go through all orange and/or
burning builds.

So, our team has set up a display dashboard to enable everyone to
know everything about every failure from every run .. (a lot of
every's). Here, you can see the following:

What are the latest test failures across all runs on all platforms?
What are the TOP failing tests?
Which tests have been failing over the timeline we have captured?


We also provide direct links to navigate easily from a test failure
log to the build log to the putback data. This, of course, is in
addition to some nifty timeline graphs for each test failure.

So, without further ado, we would like to present the following URLs
for your kind perusal ...

http://brasstacks.mozilla.com/topfails/
http://brasstacks.mozilla.com/topfails/topfails
http://brasstacks.mozilla.com/topfails/tests

and a sample failure timeline

http://brasstacks.mozilla.com/topfails/timeline?name=automationutils.processLeakLog%28%29

Murali Nandigama
A-Team

L. David Baron

Apr 7, 2010, 7:36:38 PM
to Murali Nandigama, dev-pl...@lists.mozilla.org
On Wednesday 2010-04-07 14:45 -0700, Murali Nandigama wrote:
> http://brasstacks.mozilla.com/topfails/topfails

A few things that I think could make this more useful are:

* a statement of what time period the data cover, both at the top,
and perhaps also for each individual item (as a way to
distinguish non-random oranges that were in the tree for a short
period of time from random ones)

* splitting up the items listed as automation.py (which is how it
lists a test timeout) and automationutils.processLeakLog() (which
is how it lists a failure due to leaks) at least by what test
the failure was on (mochitest-plain 2/5, mochitest-chrome, etc.)

In the case of timeouts, though, we can generally categorize the
timeout even more precisely by the last test that started
running.

* normalizing the paths so that failures of the same test on
different platforms don't show up as different items
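[Editorial sketch] The last point above, normalizing paths so the same
test doesn't show up as several items, could be handled with a small
key-normalization step. This is a hypothetical illustration; the real
topfails schema and the exact path layouts aren't shown in this thread:

```python
import re

def normalize_test_path(path):
    """Normalize a per-platform test path to a platform-neutral key.

    Hypothetical sketch: unifies path separators (Windows logs use
    backslashes) and strips any build-specific prefix up to a known
    test-suite root, so different platforms' spellings of the same
    test collapse to one identifier.
    """
    # Unify separators first.
    path = path.replace("\\", "/")
    # Drop anything before a recognized test-suite root directory.
    m = re.search(r"(?:^|/)(mochitest|reftest|xpcshell|chrome)/(.*)", path)
    if m:
        return f"{m.group(1)}/{m.group(2)}"
    # No known root found: leave the path unchanged.
    return path
```

With this, `obj-firefox\_tests\mochitest\test_foo.html` and
`build/mochitest/test_foo.html` both map to `mochitest/test_foo.html`.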

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Clint Talbert

Apr 7, 2010, 9:17:04 PM
On 4/7/10 4:36 PM, L. David Baron wrote:
> * a statement of what time period the data cover, both at the top,
> and perhaps also for each individual item (as a way to
> distinguish non-random oranges that were in the tree for a short
> period of time from random ones)
>
> * splitting up the items listed as automation.py (which is how it
> lists a test timeout) and automationutils.processLeakLog() (which
> is how it lists a failure due to leaks) at least by what test
> the failure was on (mochitest-plain 2/5, mochitest-chrome, etc.)
>
> In the case of timeouts, though, we can generally categorize the
> timeout even more precisely by the last test that started
> running.
>
> * normalizing the paths so that failures of the same test on
> different platforms don't show up as different items

I filed this as a general bug 557970 [1]. We'll split out the issues as we
start working on them and morph that bug into a tracking bug over time.
Thanks for the feedback.

Clint

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=557970

Ted Mielczarek

Apr 8, 2010, 9:01:31 AM
to mozilla.dev.planning group
On Wed, Apr 7, 2010 at 7:36 PM, L. David Baron <dba...@dbaron.org> wrote:
> On Wednesday 2010-04-07 14:45 -0700, Murali Nandigama wrote:
>> http://brasstacks.mozilla.com/topfails/topfails
>
> A few things that I think could make this more useful are:
>
>  * a statement of what time period the data cover, both at the top,
>   and perhaps also for each individual item (as a way to
>   distinguish non-random oranges that were in the tree for a short
>   period of time from random ones)

I think right now the query is over all data in the database, which
isn't really what we want as a long-term solution. We should probably
narrow it down to the past month or past two weeks. (Or have one of
those as a sensible default, and allow querying for a more narrow
window.)
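[Editorial sketch] Defaulting the query to a recent window, as suggested
above, could look something like this. Note that the `failures(test, date)`
table here is hypothetical, not the actual topfails schema; it only
illustrates bounding the query instead of scanning all history:

```python
import datetime
import sqlite3

def recent_failures(conn, days=14):
    """Return (test, count) rows restricted to the last `days` days.

    Hypothetical sketch: assumes a `failures` table with ISO-8601 UTC
    `date` strings, which compare correctly as text in SQLite.
    """
    cutoff = (datetime.datetime.utcnow()
              - datetime.timedelta(days=days)).isoformat()
    cur = conn.execute(
        "SELECT test, COUNT(*) AS n FROM failures "
        "WHERE date >= ? GROUP BY test ORDER BY n DESC",
        (cutoff,),
    )
    return cur.fetchall()
```

A caller could expose `days` as a query parameter, keeping the two-week
window as the sensible default.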


>  * splitting up the items listed as automation.py (which is how it
>   lists a test timeout) and automationutils.processLeakLog() (which
>   is how it lists a failure due to leaks) at least by what test
>   the failure was on (mochitest-plain 2/5, mochitest-chrome, etc.)
>
> In the case of timeouts, though, we can generally categorize the
> timeout even more precisely by the last test that started
> running.


I filed this: https://bugzilla.mozilla.org/show_bug.cgi?id=558045

-Ted

Robert Kaiser

Apr 8, 2010, 9:50:55 AM
Murali Nandigama wrote:

> So, our team has set-up a display dash board to enable every one to
> know every thing about every failure from every run .. ( lot of
> every's).

How is that data generated? Would it be possible to have that run for
the SeaMonkey and/or Thunderbird tests as well?

Robert Kaiser
