As part of an experiment to reserve regular time for "product excellence", the Views Toolkit subteam recently spent three weeks re-enabling disabled UI tests. Hopefully, hearing our results will encourage your team to spend time on product excellence too! (If you don't care about this, you can close this email.)
Takeaway: This seemed like a good use of our time, didn't lead to burnout, and most of the tests we re-enabled likely wouldn't have been re-enabled in the foreseeable future without our work. We also learned about our testing infrastructure and more of the UI code. We'll likely do this again.
Stats:
- 5 engineers participated for a total of about 12 engineer-weeks: kylixrd (1.5 weeks), pkasting, rameier, robliao (~half time), weili
- Re-enabled 85 tests (2 of which turned out to still be problematic and were reverted) across 21 directories. Some tests were previously disabled for all targets and some only for specific platforms. Most were re-enabled for all platforms, but with a few we only partially re-enabled.
- There are still 47 other tests "in progress", most of which will hopefully be re-enabled before EOY.
- For purposes of estimation, if we count in-progress tests as "half done", this means we re-enabled about 1.8 tests per engineer per day.
- Associated bugs go all the way back to May 2009. Many of the tests had been disabled for years. The vast majority worked fine with no changes when re-enabled. Of course, we did have to fix some real bugs, but the conclusion is that many tests that get disabled rot unnecessarily for years.
Stories:
Below are some problems we encountered when things didn't just work, as written by the engineers themselves.
***
What's In A Name? by Rob Liao
In my attempt to reenable ui_base_unittests L10nUtilTest.GetString, I innocently named the test resources label as ui_base_test_resources. Imagine to my surprise that once I started integrating against Android that repackaging the resources with android_assets suddenly requested ui_base_test_resources__build_config. Even more surprising is that components_browsertests_assets's components_tests_pak had no such dependency.
As far as I can tell, none of this is documented in Chromium's
Android build_config documents. The takeaway here is that when structuring your build system, doing special things based off of a build label identifiers name is likely to cause surprises in the future. A better way forward would be to provide a way to opt-into the magic instead of reusing mechanisms that should be passthrough.
***
In The Key Of... by Allen Bauer
While re-enabling interactive_ui_tests' FindInPageTest.SelectionRestoreOnTabSwitch and related I discovered that the reason for failures is the manner in which Chrome interacts with injected keyboard events on Windows. The test code itself is reasonable. It is the keyboard event injection code using the Windows API, SendInput, that seems to be behaving oddly with Chrome.
For some reason, phantom WM_KEYUP events are being placed into the message queue. For instance, the key events being injected are: shift-key-down, left-arrow-down, left-arrow-up, and shift-key-up. However, the events are being pulled from the queue as such: shift-key-down, shift-key-up(???), left-arrow-down, left-arrow-up, shift-key-up. This has the effect of resetting the key state of key modifiers (shift, ctrl, alt) which means the test fails. In order to verify that the problem is not in the Windows API, I wrote a simple Windows app and verified that SendInput does behave properly.
Unfortunately, this is an ongoing story without a happy ending. I enjoy a a mystery and challenge as much as anybody, but after spending several days of debugging, logging, and scouring the code, I'm still unable to determine where the phantom key up event is being placed into the message queue. When I get back to this issue, I will probably need to fire up the kernel debugger and "go primal" on this.
***
How Hard Could It Be? by Peter Kasting
The test comment said "only enabled on ChromeOS for now, since it's hard to align an EventGenerator when multiple native windows are involved". I didn't know what an EventGenerator was, but it turned out the reason for the test not working on Windows was simple: the EventGenerator wasn't correctly converting between window-relative coordinates and screen-space coordinates for the mouse events it was trying to generate.
Surely that couldn't be too hard to fix... ah, EventGenerator references something called a ScreenPositionClient to do this conversion. And Windows initializes one by default -- but the default EventGeneratorDelegate didn't bother telling the generator about it.
Easy fix! Right?
So
re-enabling one test required touching dozens of other files and hundreds of lines of changes across more than a half dozen CLs and a couple weeks of work. And the test still doesn't work on Mac. :/ OTOH, I made EventGenerator work more consistently, with better documentation, and fixed a variety of bugs in other tests too. Hopefully worth it in the end.
***
Try It Yourself:
- Look for disabled tests with a query like this -- just add "file:your/module" to the end to narrow the search.
- Pick a directory you know something about and blindly re-enable all the disabled tests. (If there are a lot and you intend to actually fix failures you find, don't tackle more than two dozen disabled tests at once; that should be enough to turn up some real bugs.)
- Run locally; --gtest_filter=testA:testB:testC lets you run multiple specific tests.
- When you find broken tests, either:
- Fix them, if you can! or,
- Leave them disabled, but make sure there's an open, assigned bug that's clear about what fails and why, and the disable message references the bug. No disabled tests without bugs, or referencing vague or long-closed bugs please!
- For tests that were supposed to have been flaky before, run locally 50 times (--gtest_repeat=50) if you can test on the relevant platform(s).
- Upload your CL and run on the trybots. For flaky tests you didn't test locally, do at least a couple of trybot runs on those platforms. You'll need to force re-runs by choosing specific bots from the "CHOOSE TRYJOBS" button; just hitting "dry run" again won't always re-run everything.
- Profit!
And if you read this far... thanks! Hope it was worth your time.
PK