+1 to this policy. Flaky tests have a direct impact on productivity insofar as they increase the time it takes to land patches (flaky tests cause retries), as well as making it harder to sheriff. I'm all for disabling flaky tests. That said, sheriffs need to make an effort before blindly disabling all tests. A test that has run reliably for a long time and suddenly starts flaking is typically an indication that something recent is to blame and should be reverted.

-Scott

On Tue, Nov 14, 2017 at 12:25 PM, Dirk Pranke <dpr...@chromium.org> wrote:

Hi all,

I'm sheriff today, and in the course of my sheriff-y duties, I've realized that our policy for how we're supposed to handle flaky tests is perhaps unclear or not in broad agreement. Specifically, it is my belief that we are supposed to disable any flaky tests we come across [1]. However, the documentation I can point at [2] suggests that you're supposed to let them run.
I think the documentation needs to be changed. However, before I do that, I wanted to see if there was something I'm missing.

From the discussions we've had in the past, the objections to disabling tests generally tend to be twofold:

1) If you disable a flaky test, you lose whatever potential coverage that test might be giving us.
2) If you disable a flaky test, you can't easily tell if the test stops being flaky and starts passing (or failing consistently).

These are both valid objections.

The counterarguments I've heard (and made) to the first are that you can't actually trust the coverage the test might be giving you, since it's too hard to tell different failure modes apart, and so the value you get from continuing to run things is outweighed by the confusion of looking at ongoing failures and trying to figure out whether they're new or not.

The counterarguments to the second are that the cost of continuing to run the flaky test (both in itself, and in potential knock-on costs due to the way we run tests, where one test might interfere with another) outweighs the value you get by doing so (i.e., things don't fix themselves often enough, or rapidly enough).

These objections were discussed in the thread [3] linked from the current documentation. That discussion happened in 2012, so quite some time ago. I don't remember offhand discussions that have happened since then, but I do know that we got rid of the ability -- in gtests -- to mark a test as FLAKY, so there currently isn't a way to keep running a test but ignore failures. We could arguably make this possible, but for the moment I'm going to declare that a separate discussion.

There are related objections and topics that are relevant. First, there's the question of how we would track disabled tests, and lindsayw@ has been working on processes that we'll start following soon for this.
Second, there's the question of measuring test (and code) coverage, so we can start to learn whether this matters or not, and baxley@ and liaoyuke@ and others are working on a plan for that as well.

So, to repeat: I think we should be disabling flaky tests, and I think the docs need to be updated to make that extra clear.

Anyone strongly object to this?

-- Dirk

[1] I double-checked with a few other folks just to make sure I wasn't misremembering.
[2] https://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium/handling-a-failing-test
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
(forking to avoid detracting from Dirk's policy point. :)

While porting Chromium to the new Fuchsia operating system, we've encountered a number of flakes that turned out to be cross-platform issues that just repro'd more often for us because the scheduler & emulation introduce timing differences. Many such issues also repro on other platforms, but sufficiently rarely that they're effectively masked by the retry logic in both the test launcher and the bot infrastructure.

We also have tests which appear to pass but are actually taking longer than our allotted per-test timeout, and get away with it because they happen to be run in a batch with a load of fast-running tests. These slow tests "pass" only because they're gobbling up other tests' timeout allocation, and if the sharding & batching of tests changes then we'll see them start to mysteriously fail.

There are two bugs with proposals for reducing flake:
1. Add bots which run "10x" testing (crbug.com/782656 was filed for this for layout tests), to increase the likelihood of repro'ing flake.
2. Remove retries from most tests (crbug.com/697215), so that flaky tests can't sneak past the bots so easily.
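Proposal 2 rests on simple retry arithmetic: even a few retries hide fairly high flake rates. A back-of-the-envelope sketch in plain Python (illustrative only; the assumption that runs fail independently is ours, and real order-dependent flakes violate it):

```python
# Sketch: how retries mask flake. A test is reported green if any of its
# (1 + retries) runs passes, so assuming independent failures, the reported
# failure rate drops from p to p ** (1 + retries).

def reported_failure_rate(p: float, retries: int) -> float:
    """Chance a test with per-run failure probability p is still red after all retries."""
    return p ** (1 + retries)

# A 10%-flaky test with 2 retries shows up red only 0.1% of the time,
# which is how flaky tests sneak past the bots.
masked = reported_failure_rate(0.10, 2)
```

With retries removed, the same test would go red on one CQ attempt in ten, which is exactly the "can't sneak past so easily" effect the bug proposes.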
On Wed, Nov 15, 2017 at 6:56 PM Dirk Pranke <dpr...@chromium.org> wrote:

> Responding to a few different things at once:
>
> To Colin's (and, presumably, others') concern: we will not change policies if it makes the CQ even flakier. The flake rate is something that we actively monitor, and it's already too high. This is not something I consider up for debate :).

This statement sounds good to me! But how would removing retry *not* make the CQ flakier as an immediate consequence?
I've noticed that test retries have allowed tests that flake depending on the test execution order to land. Test T1 runs and modifies some global state, then test T2 runs and fails because of that global state. When T2 is retried, it passes because the retry happens in a new process where that global state hasn't been changed. I would imagine that running most of the big test targets with no retries and permuting the order (e.g. "--gtest_shuffle --test-launcher-retry-limit=0 --gtest_repeat=10") produces interesting flakes. For example, here is what I got running ash_unittests 10 times.

Summary of all test iterations:
2 tests failed:
AutoclickTest.MouseMovement (../../ash/autoclick/autoclick_unittest.cc:140)
WindowManagerTest.UpdateCursorVisibilityOnKeyEvent (../../ash/wm/window_manager_unittest.cc:776)
4 tests crashed:
HighlighterControllerTest.DetachedClient (../../ash/highlighter/highlighter_controller_unittest.cc:431)
MultiMirroringTest.SwitchToAndFromSoftwareMirrorMode (../../ash/display/display_manager_unittest.cc:3940)
PaletteTrayTestWithVoiceInteraction.MetalayerToolActivatesHighlighter (../../ash/system/palette/palette_tray_unittest.cc:332)
ToplevelWindowEventHandlerTest.GestureDragCaptureLoss (../../ash/wm/toplevel_window_event_handler_unittest.cc:974)
7 tests had unknown result:
HighlighterControllerTest.HighlighterGestures (../../ash/highlighter/highlighter_controller_unittest.cc:173)
HighlighterControllerTest.HighlighterPrediction (../../ash/highlighter/highlighter_controller_unittest.cc:148)
MultiMirroringTest.HardwareMirrorMode (../../ash/display/display_manager_unittest.cc:3828)
ToplevelWindowEventHandlerTest.EasyResizerUsedForTransient (../../ash/wm/toplevel_window_event_handler_unittest.cc:742)
ToplevelWindowEventHandlerTest.EscapeReverts (../../ash/wm/toplevel_window_event_handler_unittest.cc:849)
ToplevelWindowEventHandlerTest.GestureDragForUnresizableWindow (../../ash/wm/toplevel_window_event_handler_unittest.cc:773)
ToplevelWindowEventHandlerTest.MinimizeMaximizeCompletes (../../ash/wm/toplevel_window_event_handler_unittest.cc:866)
End of the summary.
On Wed, Nov 15, 2017 at 9:59 AM, Colin Blundell <blun...@chromium.org> wrote:

> This statement sounds good to me! But how would removing retry *not* make the CQ flakier as an immediate consequence?

Well, yeah, okay, any removal of retries should actually make things flakier, by definition :).

But the lion's share of CQ failures comes from things that are known to be flaky (like telemetry_unittests). If we stop retrying failures for something that basically never fails, like webkit_python_unittests, that'll have no net effect on the overall flakiness, but it might help catch new flakes a bit faster (by not retrying them).
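The "no net effect" claim can be made concrete with back-of-the-envelope numbers (the rates below are made up for illustration, not real Chromium data, and suites are assumed to fail independently):

```python
# Back-of-the-envelope: the overall CQ false-reject rate is dominated by the
# flakiest suite, so dropping retries on a near-deterministic suite barely
# moves the total. Rates are hypothetical.

def cq_flake_rate(suite_fail_rates):
    """P(at least one suite flakes) = 1 - P(all suites green), assuming independence."""
    p_all_green = 1.0
    for p in suite_fail_rates:
        p_all_green *= (1.0 - p)
    return 1.0 - p_all_green

flaky_suite = 0.05   # a known-flaky integration suite, retries or not
stable_run = 0.001   # per-run failure of a near-deterministic suite

with_retries = cq_flake_rate([flaky_suite, stable_run ** 3])  # 2 retries mask the stable suite
without_retries = cq_flake_rate([flaky_suite, stable_run])
# The difference is under 0.1 percentage points: noise next to the flaky suite,
# while new flakes in the stable suite now surface on their first failure.
```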
On Wed, Nov 15, 2017 at 10:28 AM, Dirk Pranke <dpr...@chromium.org> wrote:

> But the lion's share of CQ failures comes from things that are known to be flaky (like telemetry_unittests). If we stop retrying failures for something that basically never fails, like webkit_python_unittests, that'll have no net effect on the overall flakiness, but it might help catch new flakes a bit faster (by not retrying them).

I disagree with this assessment and think it unfairly singles out a test suite which inherently acts as a full-browser integration test, uncovering random crashes throughout the code base.
On Wed, Nov 15, 2017 at 7:21 AM <kyle...@chromium.org> wrote:

> I've noticed that test retries have allowed tests that flake depending on the test execution order to land. [...] I would imagine that running most of the big test targets with no retries and permuting the order (e.g. "--gtest_shuffle --test-launcher-retry-limit=0 --gtest_repeat=10") produces interesting flakes.

Even on this seemingly simple point, the problem is non-trivial. I wrote a little script to do exhaustive pairwise checking, and then ran it on base_unittests. At the time I ran it, 14331 of the 8151025 pairs failed; here's the list*. This is pretty "easy" flake, but sometimes even these can take some work to fix, and that's only one of the smaller binaries on one platform (Linux ASAN base_unittests in this case).

I feel our tests are in a deep hole. They require substantial investment. We don't need more tools/dashboards/bots/policies/instrumentation; it's easy to find flaky tests. We "just" need many engineering hours of work, and a will to spend them on this.
(from the right email this time)

Scott's list of failing base unittest pairs seems like a good, actionable list for deflaking that test suite. Perhaps we can start by finding owners for all of those failures, getting to a point where all those tests are either fixed or disabled (at the owners' discretion), and then turn off retries for that test suite and see how that looks? Then we can use that to inform whether we want to do the same for other suites, or whether we need a different approach.

I'm certainly happy to look at any flakes in base metrics tests for my share. Hopefully we can find owners for the other tests too.