I remember a case where a field trial was making tests fail randomly. It was quite difficult and annoying to diagnose. I'd like to disable field trials for tests. Does that sound like a good idea? If so, what'd be the best way to do that?
Responding to a possible counter-point that our tests should reflect the product as closely as possible: well, we already disable "web resources", preconnect, first run, the default browser check, and safebrowsing auto-update, force the native password store on Linux, and override GPU settings. Field trials by definition add randomness to the tests, and in my opinion having more stable tests hugely outweighs any possible loss of coverage from disabling them. If something is really important, a test should be added that has that field trial always enabled.
On Tue, Sep 27, 2011 at 4:06 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
I remember a case where a field trial was making tests fail randomly. It was quite difficult and annoying to diagnose. I'd like to disable field trials for tests. Does that sound like a good idea? If so, what'd be the best way to do that?
Disable them or have them always return the default group? Some field trials enable a feature by default and disable it in the field trial, some the opposite. I think there's a tacit assumption that the default group is what is expected to be stable and the alternatives might become the default in time.
Returning the default group for all field trials is trivial if you can determine that you're running in a test environment. That would have to be a compile-time flag I think. It would be necessary to allow some tests to override this, however, so that they can test all code-paths.
Bottom-line is that the alternative code paths from a field trial should not cause regressions. If you agree, then we should keep them enabled but make it easy to either force a test to be run under a particular field trial group or enforce checking of the field trial group before running a particular test.
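For illustration, a minimal sketch of forcing a test to run under a particular group, assuming the base::FieldTrialList::CreateFieldTrial() helper and an already-constructed FieldTrialList for the process (the trial and group names below are made up):

    #include "base/metrics/field_trial.h"

    // Sketch only: register the trial with a fixed group before the code
    // under test queries it, instead of letting it randomize. This assumes
    // a base::FieldTrialList instance already exists, as it does when the
    // full browser is started.
    void ForceFieldTrialGroupForTest() {
      // "SomeTrial" and "Experiment" are hypothetical names.
      base::FieldTrialList::CreateFieldTrial("SomeTrial", "Experiment");
    }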
If something is really important, a test should be added [that] has that field trial always enabled.
On Tue, Sep 27, 2011 at 16:45, Dominic Hamon <domi...@google.com> wrote:
On Tue, Sep 27, 2011 at 4:06 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
I remember a case where a field trial was making tests fail randomly. It was quite difficult and annoying to diagnose. I'd like to disable field trials for tests. Does that sound like a good idea? If so, what'd be the best way to do that?

Disable them or have them always return the default group? Some field trials enable a feature by default and disable it in the field trial, some the opposite. I think there's a tacit assumption that the default group is what is expected to be stable and the alternatives might become the default in time.

Yeah, to be precise, I think I mean always using the default group.

Returning the default group for all field trials is trivial if you can determine that you're running in a test environment. That would have to be a compile-time flag I think. It would be necessary to allow some tests to override this, however, so that they can test all code-paths.

Why a compile-time flag? Can't we make that a command-line flag?
Bottom-line is that the alternative code paths from a field trial should not cause regressions. If you agree, then we should keep them enabled but make it easy to either force a test to be run under a particular field trial group or enforce checking of the field trial group before running a particular test.

I anticipated that point. See my original e-mail:

If something is really important, a test should be added [that] has that field trial always enabled.
I meant that there already is a compile-time flag.
If something is really important, a test should be added [that] has that field trial always enabled.
My point is a little stronger than that. I think we should allow field trials in tests but enforce that tests that are affected by field trials should still be green. If this means running different tests for field trial versions, then so be it.
On Wed, Sep 28, 2011 at 13:19, Dominic Hamon <domi...@google.com> wrote:

I meant that there already is a compile-time flag.

What's the name of that flag?
And then I'd really like to have something runtime-based. What do you think about adding a command-line flag with the same effect?
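To make the idea concrete, here is a hedged sketch of what such a switch could look like; the switch name is hypothetical, not an existing flag, and the exact namespacing of CommandLine varies between Chromium revisions:

    #include "base/command_line.h"

    // Hypothetical switch; the real name and wiring would be decided in review.
    const char kDisableFieldTrials[] = "disable-field-trials";

    // Called from wherever field trials are normally set up; skips the setup
    // entirely when the switch is present (e.g. when run by the test launcher).
    bool ShouldSetUpFieldTrials() {
      return !CommandLine::ForCurrentProcess()->HasSwitch(kDisableFieldTrials);
    }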
If something is really important, a test should be added [that] has that field trial always enabled.
My point is a little stronger than that. I think we should allow field trials in tests but enforce that tests that are affected by field trials should still be green. If this means running different tests for field trial versions, then so be it.

I believe this is unrealistic. We have a lot of problems with flakiness. This is a tradeoff: yeah, it would be great to have 100% coverage and no flaky tests, but we're not there. If tests are flaky because of field trials activating randomly (and this has happened at least with prerendering), you are effectively losing coverage, because people don't trust the tests and ignore them. And you'll always be behind if you try to fix tests in a reactive way, i.e. detect and patch. I'm looking at a more general solution now, and this effort is a part of it.
On Thu, Sep 29, 2011 at 10:44 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:

On Wed, Sep 28, 2011 at 13:19, Dominic Hamon <domi...@google.com> wrote:

I meant that there already is a compile-time flag.

What's the name of that flag?
UNIT_TEST is certainly defined for some subset of tests.
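For reference, this is roughly how the existing UNIT_TEST define is used: it is set only when building test targets, so headers can expose test-only hooks. The class and member names below are invented for illustration:

    // Sketch of the UNIT_TEST pattern as used in headers; not a real class.
    class ThingUnderTest {
     public:
      ThingUnderTest() : enabled_(false) {}
      bool enabled() const { return enabled_; }

    #if defined(UNIT_TEST)
      // Visible only to code compiled as part of a unit test target.
      void set_enabled_for_testing(bool enabled) { enabled_ = enabled; }
    #endif

     private:
      bool enabled_;
    };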
And then I'd really like to have something runtime-based. What do you think about adding a command-line flag with the same effect?

I think this opens the door to (more) code that behaves differently during testing than in released builds.
I'm already suspicious of things like TestingProfile, which are worthwhile for keeping test footprints small but do change the order of construction/destruction and may hide race conditions, or even cause them. We should be working to minimize the differences between test code and release code.
I agree that it's challenging, but you're proposing a technical solution to a procedural problem. If people ignore the tests then what can we do to make it impossible to ignore them?
Under your proposal field trials would still introduce test regressions but they wouldn't be caught until the field trial is made the default, and that is far later than I'm comfortable with.
I'd rather that engineers are forced to consider the impact of introducing an alternative code path at the time they introduce it.
We shouldn't accept flakiness, and if a field trial introduces it then it should also introduce a fix for that flakiness.
I think I should change the tests to check which version of the code they're running against and change the expectations accordingly.
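A hedged sketch of what such a group-aware test could look like, assuming base::FieldTrialList::FindFullName(); the trial name, group name, and the helper it checks are hypothetical:

    #include <string>

    #include "base/metrics/field_trial.h"
    #include "testing/gtest/include/gtest/gtest.h"

    // Stand-in for whatever the field trial actually controls in the product.
    bool IsSomeFeatureEnabled() {
      return base::FieldTrialList::FindFullName("SomeTrial") == "Enabled";
    }

    TEST(SomeFeatureTest, ExpectationsFollowFieldTrialGroup) {
      // FindFullName() returns the empty string if the trial was never created.
      std::string group = base::FieldTrialList::FindFullName("SomeTrial");
      bool expect_feature_on = (group == "Enabled");
      EXPECT_EQ(expect_feature_on, IsSomeFeatureEnabled());
    }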
On Thu, Sep 29, 2011 at 11:05, Dominic Hamon <domi...@chromium.org> wrote:
On Thu, Sep 29, 2011 at 10:44 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:

On Wed, Sep 28, 2011 at 13:19, Dominic Hamon <domi...@google.com> wrote:

I meant that there already is a compile-time flag.

What's the name of that flag?
UNIT_TEST is certainly defined for some subset of tests.

Right, I was thinking you meant something field-trial specific. But we can't use UNIT_TEST because the field trial files are compiled into base, which is never compiled with UNIT_TEST. That symbol can only be used in headers or test-only files.

And then I'd really like to have something runtime-based. What do you think about adding a command-line flag with the same effect?

I think this opens the door to (more) code that behaves differently during testing than in released builds.

As said before, we already change the behavior quite a lot, i.e. disable safe browsing auto-update, first run, etc. This is not opening any doors that aren't already open. In fact, we have places in the testing infrastructure specifically dedicated to adding flags for different runtime behavior (test_launcher_utils).
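A hedged sketch of where such a default could live, assuming test_launcher_utils keeps a helper along the lines of PrepareBrowserCommandLineForTests (the exact signature may differ, and the switch name is the same hypothetical one as above):

    #include "base/command_line.h"

    namespace test_launcher_utils {

    // Sketch only: append the hypothetical switch next to the other
    // determinism-related switches the test launcher already sets.
    void PrepareBrowserCommandLineForTests(CommandLine* command_line) {
      // ... existing switches (no first run, no default browser check, ...) ...
      command_line->AppendSwitch("disable-field-trials");  // hypothetical
    }

    }  // namespace test_launcher_utils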
And our GPU behavior is just completely different in tests than in the real world. In my opinion this point just doesn't apply here.

I'm already suspicious of things like TestingProfile, which are worthwhile for keeping test footprints small but do change the order of construction/destruction and may hide race conditions, or even cause them. We should be working to minimize the differences between test code and release code.

TestingProfile is only used in unit tests, which should mock things out. In browser and ui tests we don't use it (if people do, it's bad). Browser and ui tests start the full browser. They are extremely susceptible to browser-global issues like crashes and so on. That includes field trials. We can't control the crash rate directly, but it should be possible to disable field trials very easily. This is a different thing from a mock (which is a unit-test thing). This is about keeping large tests stable by eliminating as much randomness from them as possible.

I agree that it's challenging, but you're proposing a technical solution to a procedural problem. If people ignore the tests then what can we do to make it impossible to ignore them?

Not impossible, but if we make tests more reliable, ignoring them becomes much more painful. Actually, I don't think the problem is procedural but in fact a technical one. The tests are not stable enough. You can't force everyone who gets unrelated test failures to chase random flakes. The test author/owner needs to fix those problems; otherwise they should be aware that there is effectively no coverage from a flaky test. This is a fact.
Under your proposal field trials would still introduce test regressions but they wouldn't be caught until the field trial is made the default, and that is far later than I'm comfortable with.

Are we catching regressions earlier because tests are more flaky? No! People just file bugs which sit there for months. Seriously, the case that made me write this is precisely a counter-example to your point: there was no bug in the product, the tests just got more unstable. We're not going to magically fix more problems by running more unstable tests and then ignoring them.
I'd rather that engineers are forced to consider the impact of introducing an alternative code path at the time they introduce it.

Agreed, this is sort of obvious.

We shouldn't accept flakiness, and if a field trial introduces it then it should also introduce a fix for that flakiness.

Yeah, again this is unrealistic. We can't detect flakiness on the first check-in that introduces it. By the time people detect it, the reputation of the test is already damaged. And many people ignore flakiness bugs, so the fix isn't always going to be quick.
Furthermore, those issues frequently affect multiple tests, which increases the scale of the problem. Unless you have an equally magical fix for increasing our flakiness removal rate, I think this is not going to work.
I think I should change the tests to check which version of the code they're running against and change the expectations accordingly.

Are you volunteering to do that for the thousands of tests we have and every field trial anyone introduces? Sorry, it just sounds unrealistic to me.
Cheers,
Jói