On Wed, Nov 4, 2015 at 7:48 AM, Michael Henretty <mhen...@mozilla.com> wrote:
>
> On Wed, Nov 4, 2015 at 4:45 PM, Fabrice Desré <fab...@mozilla.com> wrote:
>>
>> Can we *right now* identify the worst offenders by looking at the test
>> results/re-runs? You know that sheriffs will very quickly hide and
>> ignore tests that are really flaky.
>
> Yes, that's an important point. The problem is that you have to actually
> look at the logs of an individual chunk to see which tests failed. If a
> certain Gij test passes at least 1 out of its 5 given runs, it will not
> surface to Treeherder, which means we can't star it. Looking through each
> chunk log file (of which we have 40 per run) is doable, but more time
> consuming and error prone.
Jumping in on something I haven't been able to pay much attention to
myself, so I may be missing context here, but this sounds like it sets
people up to assume that if something occasionally works, we're good to
ship it, as opposed to: if it occasionally fails, we need to fix it. It
seems to me that this needs to be flipped around very aggressively for
these tests to provide much value.
- jst
Hi Gaia Folk,
If you've been doing Gaia core work for any length of time, you are probably aware that we have *many* intermittent Gij test failures on Treeherder [1]. But the problem is even worse than you may know! You see, each Gij test is run 5 times within a test chunk (e.g. Gij4) before it is marked as failing. Then that chunk itself is retried up to 5 times before the whole thing is marked as failing. This means that for a test to be marked as "passing," it only has to run successfully once in 25 attempts. I'm not kidding. Our retry logic, especially the retries inside the test chunks, makes it hard to know which intermittent tests are our worst offenders. This is bad.
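To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in plain JavaScript. It assumes runs are independent and that chunk retries re-run the test, which matches the 25-attempt figure above; the per-run pass rates are illustrative, not measured:

```js
// Probability that a flaky test is marked "passing" under
// 5 in-chunk runs x 5 chunk retries = 25 attempts total.
// Assumes independent runs; the pass rates are illustrative.
function markedPassing(perRunPassRate, attempts) {
  // The test is marked failing only if it fails every attempt.
  return 1 - Math.pow(1 - perRunPassRate, attempts);
}

[0.5, 0.2, 0.05].forEach(function (p) {
  console.log(
    'per-run pass rate ' + p * 100 + '% -> marked passing ' +
    (markedPassing(p, 25) * 100).toFixed(2) + '% of the time'
  );
});
// per-run pass rate 50% -> marked passing 100.00% of the time
// per-run pass rate 20% -> marked passing 99.62% of the time
// per-run pass rate 5%  -> marked passing 72.26% of the time
```

In other words, even a test that fails 95% of its runs will still show green nearly three quarters of the time.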
My suggestion is to stop doing the retries inside the chunks. That way, the failures will at least surface on Treeherder, which means we can star more tests, which means we'll have a lot more visibility on the bad intermittents. Sheriffs will complain a lot, so we have to be ready to act on these bugs. But the alternative is that we continue to write tests with a low "raciness" bar which, IMO, have a much lower chance of catching regressions. The longer we wait, the worse this problem becomes.
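For concreteness, this is the shape of the logic I mean. It's a hypothetical sketch, not the actual Gaia or marionette-js-runner code; `test.run()` and the retry count are stand-ins:

```js
// Hypothetical sketch of the in-chunk retry loop (not the real
// runner code). With maxRetries = 5, a test must fail five times
// in a row before the chunk reports it; dropping the count to 1
// lets every failure surface to Treeherder.
function runWithRetries(test, maxRetries) {
  for (var attempt = 1; attempt <= maxRetries; attempt++) {
    if (test.run()) {
      return { passed: true, attempts: attempt };
    }
  }
  // Only at this point does the failure become visible outside the chunk.
  return { passed: false, attempts: maxRetries };
}

// Demo: a fake test that passes ~20% of its runs.
var flaky = { run: function () { return Math.random() < 0.2; } };
console.log(runWithRetries(flaky, 5)); // usually reports passed: true
console.log(runWithRetries(flaky, 1)); // reports the failure ~80% of the time
```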
I'm not sure that it is so bad. From my own experience, regressions rarely cause intermittent failures. They mostly pop up as permareds. I think it would make sense to demonstrate that we are, in fact, masking a lot of real broken functionality before making our intermittents noisier for sheriffs.
On Wed, Nov 4, 2015 at 7:27 PM, Gareth Aye <garet...@gmail.com> wrote:
I'm not sure that it is so bad. From my own experience, regressions rarely cause intermittent failures. They mostly pop up as permareds. I think it would make sense to demonstrate that we are, in fact, masking a lot of real broken functionality before making our intermittents noisier for sheriffs.
I disagree with this mentality. For one thing, QA files bugs all the time that are themselves intermittent. They even have a template item for it: "Repro rate: XX%". With the current retry count of 25x per test, we simply cannot write an effective Gij test for one of these bugs, since the retries will make the test always pass regardless of whether the fix was effective.
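To put numbers on that, consider a bug filed with "Repro rate: 90%". A regression test for it fails any single run with probability 0.9, so it turns the job red only if it fails all 25 attempts (same independence assumption as the sketch above):

```js
// Chance that a Gij test for an intermittent bug actually goes red:
// it must fail every one of the 25 attempts, i.e. reproRate^25.
// Illustrative arithmetic, assuming independent runs.
function goesRed(reproRate, attempts) {
  return Math.pow(reproRate, attempts);
}

[0.9, 0.5, 0.2].forEach(function (r) {
  console.log(
    'repro rate ' + r * 100 + '% -> goes red with probability ' +
    goesRed(r, 25).toExponential(2)
  );
});
// 90% -> 7.18e-2 (about 7%), 50% -> 2.98e-8, 20% -> 3.36e-18
```

So even for a bug that reproduces 90% of the time, the test we write for it has only about a 7% chance of ever flagging a regression.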
Thoughts?

Thanks,
Michael
I usually have a look at the blues as well when I look at my pull request runs. And most of the time it's an issue that's not related to the test. I think this accounts for most of the issues we see, actually. Do we know why this happens?
On Mon, Nov 9, 2015 at 4:58 PM, Julien Wajsberg <jwaj...@mozilla.com> wrote:
I usually have a look at the blues as well when I look at my pull request runs. And most of the time it's an issue that's not related to the test. I think this accounts for most of the issues we see, actually. Do we know why this happens?
I started this thread specifically about Gij (integration tests). Gu (unit tests) has historically had less flakiness, but you're right that there has been some weirdness there recently too. I'm not looking at Gu yet since I think Gij is in worse shape, and I didn't want to conflate this thread with Gu runner issues.
Jonas, that particular error is on the decline. Many instances went away when we rolled out a series of fixes to run the tests on devices. The error itself was a symptom of a different issue. I would imagine that the ones we still see occurring are likely also not directly related to sockit-to-me.
Even though this is the case, we recognize that synchronous TCP socket usage isn't ideal (we never thought it was; it was just the best way to make the tests easy to write).
Fast-forward to now: we're adding a promise-based TCP driver for Marionette which will enable new tests to be written using promises. Marionette calls would always return a promise which you can .then() to do something else. It's a much nicer, standardized pattern.
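To illustrate, here's roughly how a test might read against the promise driver. This is a sketch only: the method names (goUrl, findElement, displayed) mirror today's synchronous client, and the marionette/test runner globals are assumed; none of this is the finished API:

```js
// Hypothetical promise-based Marionette test. Assumes the usual
// marionette-js-runner globals (marionette, test) and that the promise
// driver mirrors the sync client's method names; neither is confirmed.
var assert = require('assert');

marionette('system app', function () {
  var client = marionette.client();

  test('screen element is visible', function () {
    // Each call returns a promise; returning the chain lets the
    // runner wait on it instead of blocking a synchronous socket.
    return client.goUrl('app://system.gaiamobile.org/index.html')
      .then(function () {
        return client.findElement('#screen');
      })
      .then(function (el) {
        return el.displayed();
      })
      .then(function (visible) {
        assert.ok(visible, 'expected #screen to be visible');
      });
  });
});
```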
Would separating the intermittent tests into their own separate job help here? That way the retry logic can be removed for the more stable tests.