Too Many Intermittents on Travis

Michael Henretty

Dec 6, 2013, 6:04:30 PM

Travis is sad right now. So sad in fact that people are merging code before seeing green on their pull requests. Now, I know we are all sprinting hard to get our code in before the branch. But without the sanity check Travis provides, we have a much higher chance of something crash landing.

I think we need to:
1.) Add more xfails while we investigate intermittents. The tests aren't doing much good if they are intermittent anyway.
2.) Not merge code without a green Travis.

Here is a list of intermittent bugs I have seen a lot. Can we xfail these, and any more like them, to bring Travis back to a usable state?

UI Test - test_keyboard.py test_keyboard.TestKeyboard.test_keyboard_basic
https://bugzilla.mozilla.org/show_bug.cgi?id=946665

Integration Test - share_url_test.js
https://bugzilla.mozilla.org/show_bug.cgi?id=932331

Integration Test - email notifications, disable disable notification
https://bugzilla.mozilla.org/show_bug.cgi?id=922746

TL;DR: please xfail your intermittent test failures while investigating, and don't merge without a green Travis.

Michael

Fabrice Desré

Dec 6, 2013, 6:12:47 PM

to Michael Henretty, dev-...@lists.mozilla.org

On 12/06/2013 03:04 PM, Michael Henretty wrote:
> Travis is sad right now. So sad in fact that people are merging code before seeing green on their pull requests. Now, I know we are all sprinting hard to get our code in before the branch. But without the sanity check Travis provides, we have a much higher chance of something crash landing.
>
> I think we need to:
> 1.) Add more xfails while we investigate intermittents. The tests aren't doing much good if they are intermittent anyway.
> 2.) Not merge code without a green Travis.
>
> Here is a list of intermittent bugs I have seen a lot. Can we xfail these, and any more like them, to bring Travis back to a usable state?

I'm really comfortable with having people disabling test to let them
"sprint hard to get our code in before the branch". That's exactly what
the train model is meant to avoid. And I will ask for backouts if I see
abuse patterns.

What I want to understand is why are we more red now than before? Is
this an infra issue?

Fabrice
--
Fabrice Desré
b2g team
Mozilla Corporation

Fabrice Desré

Dec 6, 2013, 6:14:39 PM

to Michael Henretty, dev-...@lists.mozilla.org

On 12/06/2013 03:12 PM, Fabrice Desré wrote:
> I'm really comfortable with having people disabling test to let them

Of course, you have to read "I'm really *not* comfortable..."

Michael Henretty

Dec 6, 2013, 6:47:05 PM

On Friday, December 6, 2013 3:12:47 PM UTC-8, Fabrice Desré wrote:
> What I want to understand is why are we more red now than before? Is
> this an infra issue?

I don't think it's an infrastructure issue. I think intermittents have just been slowly accumulating to the point where recently people stopped caring about travis and merging anyway.

Fabrice Desré

Dec 6, 2013, 7:01:52 PM

to Michael Henretty, dev-...@lists.mozilla.org

On 12/06/2013 03:47 PM, Michael Henretty wrote:

> On Friday, December 6, 2013 3:12:47 PM UTC-8, Fabrice Desré wrote:
>> What I want to understand is why are we more red now than before? Is
>> this an infra issue?
>
> I don't think it's an infrastructure issue. I think intermittents have just been slowly accumulating to the point where recently people stopped caring about travis and merging anyway.

So please close the tree. I won't sign off any landing knowing that
people just ignore failures.

Fabrice Desré

Dec 6, 2013, 7:13:39 PM

to Michael Henretty, dev-...@lists.mozilla.org

On 12/06/2013 04:01 PM, Fabrice Desré wrote:
> On 12/06/2013 03:47 PM, Michael Henretty wrote:
>> On Friday, December 6, 2013 3:12:47 PM UTC-8, Fabrice Desré wrote:
>>> What I want to understand is why are we more red now than before? Is
>>> this an infra issue?
>>
>> I don't think it's an infrastructure issue. I think intermittents have just been slowly accumulating to the point where recently people stopped caring about travis and merging anyway.
>
> So please close the tree. I won't sign off any landing knowing that
> people just ignore failures.

I've been granted the power to close the tree, and just used them! We
now need a plan to fix that.

Michael Henretty

Dec 6, 2013, 7:25:18 PM

On Friday, December 6, 2013 4:13:39 PM UTC-8, Fabrice Desré wrote:
> I've been granted the power to close the tree, and just used them! We
> now need a plan to fix that.

If we aren't going to go the xfail route, I suggest choosing a few of the most offending intermittents, and not open the tree until those are fixed.

Also, we should insta-backout anything that gets merged where the PR wasn't green.

Anthony Ricaud

Dec 6, 2013, 8:20:13 PM

to mozilla-...@lists.mozilla.org

On 07/12/13 01:25, Michael Henretty wrote:

> On Friday, December 6, 2013 4:13:39 PM UTC-8, Fabrice Desré wrote:
>> I've been granted the power to close the tree, and just used them! We
>> now need a plan to fix that.
>
> If we aren't going to go the xfail route, I suggest choosing a few of the most offending intermittents, and not open the tree until those are fixed.

We cannot xfail intermittents. xfail means the tree will be red when
those intermittents are passing.
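
To make that concrete, here is a rough sketch of the colouring rule being described. It assumes a runner that treats an unexpected pass of an xfailed test as a failure; `run_color` is a hypothetical helper written for illustration, not part of any Mozilla harness.

```python
def run_color(marked_xfail: bool, test_passed: bool) -> str:
    """Return 'green' or 'red' for one test under strict expected-failure rules."""
    if marked_xfail:
        # An expected failure that fails is fine; one that passes is "unexpected".
        return "red" if test_passed else "green"
    return "green" if test_passed else "red"


# An intermittent test sometimes passes and sometimes fails (made-up outcomes).
intermittent_outcomes = [True, False, True, True, False]

unmarked = [run_color(False, passed) for passed in intermittent_outcomes]
xfailed = [run_color(True, passed) for passed in intermittent_outcomes]

print(unmarked)
print(xfailed)
```

Under this assumption, marking an intermittent as xfail only flips which runs go red; it does not remove the red.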

> Also, we should insta-backout anything that gets merged where the PR wasn't green.

Travis doesn't allow us to mark tests as orange. What I do all day long
is open bugs when I see intermittents (Julien is helping a lot with
that too). Once I open a bug, I re-run the test on Travis. You need to
be logged in to re-run one of our test suites.

I started adding "intermittent" to the names of some tests after
proposing it in a previous thread called "Marking intermittent tests in
Travis?". I only got one answer…

What we really need is more attention on intermittents. We are back in
the state where we were before the Oslo work week. Developers are not
paying attention to the result of a Travis run for their PR because
there is a 50% chance that they'll hit one of our many intermittents. I
know some of them re-run their Travis PR until they get some green but
that's a minority.

I'm the only unofficial sheriff so I cannot catch everything, I can only
catch it while I'm working, and I need to work on my main tasks so I
cannot spend too much time on this. Should we get some sheriffs looking
at Travis? Should we get managers/release management to nag people about
fixing them?

Andrew Sutherland

Dec 6, 2013, 8:49:34 PM

to dev-...@lists.mozilla.org

On 12/06/2013 08:20 PM, Anthony Ricaud wrote:
> I'm the only unofficial sheriff so I cannot catch everything, I can
> only catch it while I'm working, and I need to work on my main tasks
> so I cannot spend too much time on this. Should we get some sheriffs
> looking at Travis? Should we get managers/release management to nag
> people about fixing them?

I think this is fundamentally an engineering issue that should be
addressed by prioritization and tooling. mozilla-central has
dedicated-ish sheriffs who can and do mark intermittents all day, but that
doesn't fix the intermittent failures in and of itself. (That said, the
TBPL bugzilla traffic that results from the easily marked intermittents
is very helpful in that it makes clear just how intermittent a
problem is.)

Having said that, where engineers are failing to address this, you seem
to be getting stuck with the responsibility, so I think it makes sense
for you (or others doing what you do) to demand that we come up with
a sheriffing schedule or something that spreads the load more fairly.

Engineering-wise, I think it's two-pronged:

- Engineers should absolutely spend time on fixing intermittents and
managers should be emphasizing this as a priority.

- We need to improve our JS marionette logging in the face of failures.
It's pretty hard to tell what is happening/happened in the test
failures. I assume :lightsofapollo is all over this with the automated
testing/landing stuff he and friends are working on, but there's
probably more we can all do to help improve this.

For Thunderbird's mozmill tests where we encountered a similar problem
of "okay, so the test failed, why did the test fail?! gaaaaaah!", I
added logging and failure test capturing that was useful in many cases
to understand what was happening on the server.

A blog post with screenshots can be seen at:
http://www.visophyte.org/blog/2011/03/02/teaser-rich-contextual-information-for-thunderbird-mozmill-failures/

And a current extracted log failure (though manually fetched since ArbPL
is no longer actively running/scraping for Thunderbird):
https://clicky.visophyte.org/tools/arbpl-mozmill-standalone/?log=https://clicky.visophyte.org/examples/arbpl-mozmill/20131201/mozmill-fail.log

Note that Thunderbird's failures frequently involved focus issues,
keypresses, popups, and XBL/XUL hiccups which is why there is so much
emphasis placed on focus changes and where events were actually
handled. These are not the things that really matter for our marionette
tests. I know just having the console.log() output for the main thread
and (shimmed/faked) console.log output for the worker for the e-mail app
is probably the most bang-for-the-buck (and efforts have already been
made to try and improve this).

Andrew

Andrew Sutherland

Dec 6, 2013, 8:52:47 PM

to dev-...@lists.mozilla.org

On 12/06/2013 07:25 PM, Michael Henretty wrote:
> On Friday, December 6, 2013 4:13:39 PM UTC-8, Fabrice Desré wrote:
>> I've been granted the power to close the tree, and just used them! We
>> now need a plan to fix that.
> If we aren't going to go the xfail route, I suggest choosing a few of the most offending intermittents, and not open the tree until those are fixed.

Can you clarify the xfail route in the context of JS Marionette tests?
Does that mean landing a patch to disable the test, then putting [xfail]
in the white-board? I think other Mozilla test infrastructure has
explicit test runner support for that, but it doesn't seem like the JS
Marionette infrastructure has it.

Andrew

Gregor Wagner

Dec 6, 2013, 11:39:19 PM

On Friday, December 6, 2013 4:13:39 PM UTC-8, Fabrice Desré wrote:
> On 12/06/2013 04:01 PM, Fabrice Desré wrote:
> > On 12/06/2013 03:47 PM, Michael Henretty wrote:
> >> On Friday, December 6, 2013 3:12:47 PM UTC-8, Fabrice Desré wrote:
> >>> What I want to understand is why are we more red now than before? Is
> >>> this an infra issue?
> >>
> >> I don't think it's an infrastructure issue. I think intermittents have just been slowly accumulating to the point where recently people stopped caring about travis and merging anyway.
> >
> > So please close the tree. I won't sign off any landing knowing that
> > people just ignore failures.
>
> I've been granted the power to close the tree, and just used them! We
> now need a plan to fix that.

We need a short and medium term fix here. The short term that allows us to open the tree asap and a medium term that prevents situations like this.
Random orange doesn't help anyone and it just slows us down.
My proposal is to disable those tests for the weekend and get engineers working on them next week.
For medium term I suggest that Anthony should ping engineering managers if a bug doesn't get enough attention. Engineering managers should take care of it.
It seems like the filing works fine but the fixing takes too long or doesn't get enough attention.

-Gregor

Michael Henretty

Dec 7, 2013, 12:14:10 AM

On Friday, December 6, 2013 5:52:47 PM UTC-8, Andrew Sutherland wrote:
> Can you clarify the xfail route in the context of JS Marionette tests?

I'm sorry, I misspoke. I meant going the route of disabling these tests, not xfail. In the context of Marionette JS tests, this means 'test.skip'.

Michael Henretty

Dec 7, 2013, 12:23:48 AM

On Friday, December 6, 2013 8:39:19 PM UTC-8, Gregor Wagner wrote:
> We need a short and medium term fix here. The short term that allows us to open the tree asap and a medium term that prevents situations like this.

For the medium term, it would be really nice to have some sort of dashboard that allows us to track the frequency of these failures. Do we have anything like this in TBPL?

Andrew Sutherland

Dec 7, 2013, 12:32:23 AM

to dev-...@lists.mozilla.org

On 12/07/2013 12:23 AM, Michael Henretty wrote:
> For the medium term, it would be really nice to have some sort of dashboard that allows us to track the frequency of these failures. Do we have anything like this in TBPL?

http://brasstacks.mozilla.com/orangefactor/

https://wiki.mozilla.org/Auto-tools/Projects/WarOnOrange

Andrew

Francisco Jordano

Dec 7, 2013, 12:35:53 PM

to Fabrice Desré, Michael Henretty, dev-...@lists.mozilla.org

Hi all,

On 07/12/2013 00:01, "Fabrice Desré" <fab...@mozilla.com> wrote:

>On 12/06/2013 03:47 PM, Michael Henretty wrote:

>> On Friday, December 6, 2013 3:12:47 PM UTC-8, Fabrice Desré wrote:
>>> What I want to understand is why are we more red now than before? Is
>>> this an infra issue?

I *kind* of understand developers' frustration with intermittent failures,
just take a look at:

http://gyazo.com/07897cdc1b4d30cd2ffba24c96ab6a43
http://gyazo.com/c697e64369ca88b4e22d00260de9115a

>>
>> I don't think it's an infrastructure issue. I think intermittents have
>>just been slowly accumulating to the point where recently people stopped
>>caring about travis and merging anyway.
>
>So please close the tree. I won't sign off any landing knowing that
>people just ignore failures.

Same PR launched with 1 minute difference, and failing by different causes.

With that said, I've been trying to land 3 PRs for 2 days, and I haven't,
because it is our responsibility not to land code if Travis is not green.
I hope that we are all doing that, merging only with a green light. Can we try to
keep the number of intermittent bugs to a minimum? In the same way that koi+
or the old tef+ flags put pressure on us to get bugs solved, could these
intermittent tests acquire higher priority?

If we need to keep the tree closed and miss some features in this train to
fix that, let's do it :)

Cheers,
F.


Gregor Wagner

Dec 7, 2013, 1:27:45 PM

Quick update:
Even disabling the top failing tests leaves us with a >50% intermittent rate.
In addition, it looks like we have a regression in gecko as well. This causes a perma red on tbpl and philor was hiding Gaia-UI tests on all trees except b2g-inbound. You can follow the gecko regression status in bug 947573.
Tracing back the 'red' shows that it started with the landing of bug 937317 but I am not 100% sure because of our complex build configurations.

Right now all gaia-ui tests time out on travis so we can't reopen.

Gregor Wagner

Dec 7, 2013, 6:02:45 PM

We found the gecko regression and travis is up and running again.
We still have the problem with intermittent failures and it feels like 2 new ones show up if we disable 1 failing test.

Zac Campbell

Dec 8, 2013, 7:09:29 PM

to Gregor Wagner, dev-...@lists.mozilla.org

I had a look through the pattern of gaia-ui-test failures today and they are really unusual; there's no clear pattern of app affected or anything. Some tests that are usually very reliable are failing.

The only time I've seen this kind of failure 'pattern' is when it's something unrelated like the app crashing softly or some kind of prompt in the way. We really need the screenshot/HTML reports on Travis!

I will run the tests when I get to the office tomorrow and see what shows up. I'll report in as soon as I can.

Zac

_______________________________________________
dev-gaia mailing list
dev-...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-gaia

Gareth Aye

Dec 9, 2013, 12:31:51 PM

to Zac Campbell, Gregor Wagner, dev-...@lists.mozilla.org

> We need to improve our JS marionette logging in the face of failures.
> It's pretty hard to tell what is happening/happened in the test failures.
> I assume :lightsofapollo is all over this with the automated
> testing/landing stuff he and friends are working on, but there's probably
> more we can all do to help improve this.

Back in Oslo, I made passing --verbose to the marionette-js-runner pipe
console.* calls to the test harness. We could enable this by default to add
more fail data.

> Can you clarify the xfail route in the context of JS Marionette tests?
> Does that mean landing a patch to disable the test, then putting [xfail] in
> the white-board? I think other Mozilla test infrastructure has explicit
> test runner support for that, but it doesn't seem like the JS Marionette
> infrastructure has it.

In addition to test.skip, we have something that's more like the TBPL thing
which allows you to disable test files (instead of test cases)
https://github.com/mozilla-b2g/marionette-js-runner/commit/6e20cf69d506f26bf21fd92f770eba8b7ecb587e with
a JSON manifest. I think this isn't documented though.

> For the medium term, it would be really nice to have some sort of
> dashboard that allows us to track the frequency of these failures. Do we
> have anything like this in TBPL?

+1. I think if someone wants to hack together something that reads test
results from travis and computes flake percentage for flaky tests, it would
be extremely helpful in prioritizing our work. I already know that the
email tests that jrburke and I have been building out are suspect.
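
As a starting point, the flake-percentage computation itself could look something like the sketch below. It assumes the per-test results have already been pulled out of Travis into `(test_name, passed)` pairs; the function name and the sample data are hypothetical, and it counts every failure, so it does not by itself tell a flake apart from a genuinely broken pull request.

```python
from collections import defaultdict


def flake_percentages(runs):
    """Map each test name to the percentage of its recorded runs that failed.

    `runs` is an iterable of (test_name, passed) pairs, one per execution.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for name, passed in runs:
        totals[name] += 1
        if not passed:
            failures[name] += 1
    return {name: 100.0 * failures[name] / totals[name] for name in totals}


# Made-up results, for illustration only.
sample_runs = [
    ("share_url_test.js", True),
    ("share_url_test.js", False),
    ("test_keyboard_basic", True),
    ("test_keyboard_basic", True),
    ("test_keyboard_basic", True),
    ("test_keyboard_basic", False),
]

percentages = flake_percentages(sample_runs)
# Worst offenders first, to help decide what to look at next.
for name, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}% failing")
```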

--
Best,
Gareth

Zac Campbell

Dec 9, 2013, 12:35:39 PM

to Gareth Aye, dev-...@lists.mozilla.org

With the flake percentage how would you separate failures caused by
flakes and failures caused by the pull itself?

There's no head/tip test run to act as a benchmark.

Gareth Aye

Dec 9, 2013, 12:36:48 PM

to Zac Campbell, dev-...@lists.mozilla.org

You can rerun failing builds however many times you want to see if it's
permared or orange.
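
Roughly, the rule of thumb is the one sketched here. It assumes you have recorded the outcome of each rerun of the same build as a boolean; `classify_reruns` is a hypothetical helper, and a small number of reruns can still mislabel a very flaky test as permared.

```python
def classify_reruns(results):
    """Classify a build from the outcomes of rerunning it (True means green)."""
    if not results:
        raise ValueError("need at least one run")
    if all(results):
        return "green"
    if not any(results):
        # Every rerun failed: likely caused by the change itself.
        return "permared"
    # Mixed outcomes on identical code point at an intermittent.
    return "orange"


print(classify_reruns([False, False, False]))
print(classify_reruns([False, True, False]))
```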

On Mon, Dec 9, 2013 at 12:35 PM, Zac Campbell <zcam...@mozilla.com> wrote:

> With the flake percentage how would you separate failures caused by
> flakes and failures caused by the pull itself?
>
> There's no head/tip test run to act as a benchmark.

--
Best,
Gareth

Clint Talbert

Dec 9, 2013, 12:40:01 PM

to Gareth Aye, Zac Campbell, Gregor Wagner, dev-...@lists.mozilla.org

On 12/09/2013 09:31 AM, Gareth Aye wrote:
>> We need to improve our JS marionette logging in the face of failures.
>> It's pretty hard to tell what is happening/happened in the test failures.
>> I assume :lightsofapollo is all over this with the automated
>> testing/landing stuff he and friends are working on, but there's probably
>> more we can all do to help improve this.
>
> Back in Oslo, I made passing --verbose to the marionette-js-runner pipe
> console.* calls to the test harness. We could enable this by default to add
> more fail data.
>
>> Can you clarify the xfail route in the context of JS Marionette tests?
>> Does that mean landing a patch to disable the test, then putting [xfail] in
>> the white-board? I think other Mozilla test infrastructure has explicit
>> test runner support for that, but it doesn't seem like the JS Marionette
>> infrastructure has it.

Correct, there is an entire set of python libraries we use for all the
test harnesses (since most of them are in python) that provide all this
kind of low level infrastructure. Unfortunately, that doesn't help you
much in NodeJS land. If you want to review the code it's all here:
https://github.com/mozilla/mozbase Maybe you can copy the relevant
sections you need into NodeJS modules to help? You might also be able to
use some of them as standalone python modules and invoke them from
NodeJS in order to keep the logic in the canonical python library (this
way we don't maintain two versions of the code providing the same
functionality).

>
> In addition to test.skip, we have something that's more like the TBPL thing
> which allows you to disable test files (instead of test cases)
> https://github.com/mozilla-b2g/marionette-js-runner/commit/6e20cf69d506f26bf21fd92f770eba8b7ecb587e with
> a JSON manifest. I think this isn't documented though.
>
>> For the medium term, it would be really nice to have some sort of
>> dashboard that allows us to track the frequency of these failures. Do we
>> have anything like this in TBPL?
>
> +1. I think if someone wants to hack together something that reads test
> results from travis and computes flake percentage for flaky tests, it would
> be extremely helpful in prioritizing our work. I already know that the
> email tests that jrburke and I have been building out are suspect.

Please use our existing system called OrangeFactor
(http://brasstacks.mozilla.com/orangefactor/) for this so that we have
all the flaky test information in one place. We have already wired TBPL
to that system through the "starring" mechanism on TBPL. OrangeFactor is
just an Elasticsearch database, so you can talk to it via REST to get
data into it. You'll want to talk to mcote and jgriffin to find out more
about the APIs available for data submission to it.

Thanks,
Clint

Gregor Wagner

Dec 9, 2013, 3:30:25 PM

The tree is open again.

We fixed/disabled some intermittent tests but there are still a few remaining.

+1000 for better intermittent tracking. tbpl has its issues but being able to see the corresponding bugzilla bug for an intermittent failure and automatic logging is what we need here as well.

Anthony Ricaud

Dec 9, 2013, 5:46:11 PM

to mozilla-...@lists.mozilla.org

On 09/12/13 21:30, Gregor Wagner wrote:
> The tree is open again.

Thanks for this!

> We fixed/disabled some intermittent tests but there are still a few remaining.

Is there a bug/meta bug tracking the intermittents that we disabled so
that we can have an easy-to-follow bugzilla query to re-enable them?

> +1000 for better intermittent tracking. tbpl has its issues but being able to see the corresponding bugzilla bug for an intermittent failure and automatic logging is what we need here as well.

That was the purpose of my proposal a few weeks ago to change the names
of intermittent bugs to track them.