--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/CAATLsPaVmFwG9JnmJsajYfm1D1%2B2UL0-pyJTWWurBnXQA2UaPA%40mail.gmail.com.
My first question is: what exactly are "invalid test results"? I have seen that for both buggy patches and infra failures.

For infra failures, I can see the argument for stopping immediately. Do we have monitoring on these failures that is looked at by troopers?

For dev failures, I don't see why we'd want to stop immediately instead of continuing to run the other tests. It would be frustrating if a patch broke multiple test suites and one had to find this out serially.
On Wed, Nov 25, 2015 at 6:22 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> My first question is: what exactly are "invalid test results"? I have seen that for both buggy patches and infra failures.

My understanding of invalid is: if the same test suite failed both with and without the patch, that's an invalid result. It doesn't mean the patch doesn't cause issues, but it does mean that that run can't prove whether it does or not. And it might not necessarily be related to infra; it could also be that ToT is in a bad state.

And to clarify: what I meant by "solely invalid" is a bot run where all suite failures were considered invalid.

> For infra failures, I can see the argument for stopping immediately. Do we have monitoring on these failures that is looked at by troopers?
>
> For dev failures, I don't see why we'd want to stop immediately instead of continuing to run the other tests. It would be frustrating if a patch broke multiple test suites and one had to find this out serially.

Your second point made me think interrupting is in fact counterproductive: it is useful to know about other actually failing tests, as I'd be able to keep working on them even though there are invalid ones (which I would not work on).
On Wed, Nov 25, 2015 at 9:37 AM, Carlos Knippschild <car...@chromium.org> wrote:
> My understanding of invalid is: if the same test suite failed both with and without the patch, that's an invalid result. It doesn't mean the patch doesn't cause issues, but it does mean that that run can't prove whether it does or not. And it might not necessarily be related to infra; it could also be that ToT is in a bad state.
>
> And to clarify: what I meant by "solely invalid" is a bot run where all suite failures were considered invalid.

Actually, I believe Paweł is referring to a very specific type of failure that occurs when the executed test doesn't return a result in the format we're expecting. Most test steps pass structured JSON back, and the recipe validates the data; if the data is invalid for some reason, then we report that and fail the test.

This failure can occur either because of a bug in infra or a bug in a src-side script, as John says.

> Your second point made me think interrupting is in fact counterproductive: it is useful to know about other actually failing tests, as I'd be able to keep working on them even though there are invalid ones (which I would not work on).

Right. It's rare that an invalid test result on one step means that *every* step will be invalid. So, from a developer's point of view there can still be value in running the remaining tests and retrying without the patch, to help figure out what else might break.

From the CQ's point of view, though, Paweł is correct that we shouldn't ever land a patch that has failed in that way, because something's wrong somewhere :).

In this case I would err on the side of the developer (run the other tests), but make sure we send up enough alerts so that people can triage and see whether the infrastructure is broken, there are unrelated bugs in the src-side scripts, or the patch itself is at fault.

Make sense?

-- Dirk
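To make the "structured JSON back, recipe validates it" idea concrete, here is a minimal hypothetical sketch (not the actual recipe code; all field and function names are invented): the step output is parsed, and anything that cannot be turned into a per-test failure list is reported as an invalid result rather than a pass or fail.

```python
import json

class InvalidResults(Exception):
    """Raised when we cannot determine a list of individual failing tests."""

def parse_test_results(raw_output):
    """Parse a test step's JSON output into a list of failed test names.

    A harness crash (e.g. an ImportError traceback printed instead of
    JSON) surfaces as InvalidResults, not as a pass or a concrete failure.
    """
    try:
        data = json.loads(raw_output)
    except ValueError as e:
        raise InvalidResults('output is not valid JSON: %s' % e)
    if not isinstance(data, dict) or 'per_test_results' not in data:
        raise InvalidResults('missing per_test_results field')
    # Anything that is not an explicit PASS counts as a failure here.
    return [name for name, status in data['per_test_results'].items()
            if status != 'PASS']
```

Under this scheme, a well-formed run yields a (possibly empty) failure list, while a crashed harness raises instead, which is the signal the thread calls an "invalid test result".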
Thanks for the feedback so far. There are good points, and there's also a fair amount of confusion. Let me try to clarify.

1. We do not retry invalid test results without the patch.

2. I'd define invalid test results as "couldn't determine a list of individual failing tests".

3. I don't think we're hitting issues with something 100% misconfigured between the harness and the recipe. However, it seems to me that some test harnesses don't handle various edge cases well, and just bail out.

4. Make sure to see the example build http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_android_rel_ng/builds/100812 . Test results are invalid because the telemetry test harness crashed ("ImportError: No module named pkg_resources"). I think the recipe is handling this as expected, but let me know if you see room for improvement.

5. Failures with and without the patch are not considered invalid results. In fact, we ignore failures that happened with the patch if they also happened without the patch. This doesn't happen at the entire-step level, but at the individual-test level. If we can't get a list of individual failed tests, it doesn't make sense to retry without the patch.

6. I'm not aware of a working, actively used monitoring solution for infra failures, but I've been active on an internal email thread about it.

7. Yes, there's a tradeoff here: this could help reduce cycle time, but it could also make individual try jobs less useful to developers. On one hand I can understand the reluctance to make this change. On the other hand, I'm just looking for ways to avoid CQ cycle time regressions. One possibility might be to try making the change and revert it if it makes debugging harder.

7a. If you can provide examples of try runs that had invalid test results but where the other test results were still helpful to you, please let me know.

Paweł

On Wed, Nov 25, 2015 at 8:28 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> 2015-11-25 13:39 GMT-05:00 Dirk Pranke <dpr...@chromium.org>:
> > This failure can occur either because of a bug in infra or a bug in a src-side script, as John says.
>
> It's tricky; for example a DLL could be missing, so the executable didn't start at all. You don't want this to get committed. Yet that's listed as a missing shard.
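Point 5 above can be sketched in a few lines (a hypothetical illustration, not the actual recipe logic; the function name is invented): failures are compared per test, and a failure is only blamed on the patch if the same test did not also fail without it.

```python
def failures_caused_by_patch(failed_with_patch, failed_without_patch):
    """Return only the tests whose failure can be attributed to the patch.

    Tests that fail both with and without the patch (flaky tests, or a
    broken ToT) are ignored at the individual-test level.
    """
    return sorted(set(failed_with_patch) - set(failed_without_patch))

# SuiteA.test1 fails both ways, so only SuiteA.test2 is blamed on the patch.
blamed = failures_caused_by_patch(
    failed_with_patch=['SuiteA.test1', 'SuiteA.test2'],
    failed_without_patch=['SuiteA.test1'])
```

This is also why an invalid result short-circuits the comparison: without a per-test failure list from the with-patch run, there is nothing to subtract, so retrying without the patch proves nothing.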
On Wed, Nov 25, 2015 at 11:35 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
> 1. We do not retry invalid test results without the patch.
>
> 2. I'd define invalid test results as "couldn't determine a list of individual failing tests".

I'm still not sure I know the answer to my question: can this happen because of both infra and dev bugs? My experience has been yes.
On Wed, Nov 25, 2015 at 2:17 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> I'm still not sure I know the answer to my question: can this happen because of both infra and dev bugs? My experience has been yes.

I'm sorry, I thought I answered this. Yes, this can happen because of both infra and dev bugs.
On Wed, Nov 25, 2015 at 2:28 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> I'm sorry, I thought I answered this. Yes, this can happen because of both infra and dev bugs.
>
> -- Dirk

Sorry I missed that.

I thought I'd seen this from my own patches (ones that didn't involve changes to src-side scripts), i.e. a patch causing a test suite to not return results because it crashes immediately.