Results of spending a few weeks enabling tests


Peter Kasting

Dec 14, 2018, 3:13:32 PM12/14/18
to Chromium-dev
As part of an experiment to reserve regular time for "product excellence", the Views Toolkit subteam recently spent three weeks re-enabling disabled UI tests.  Hopefully, hearing our results will encourage your team to spend time on product excellence too!  (If you don't care about this, you can close this email.)

Takeaway: This seemed like a good use of our time, didn't lead to burnout, and most of the tests we re-enabled likely wouldn't have been re-enabled in the foreseeable future without our work.  We also learned about our testing infrastructure and more of the UI code.  We'll likely do this again.

  • 5 engineers participated for a total of about 12 engineer-weeks: kylixrd (1.5 weeks), pkasting, rameier, robliao (~half time), weili
  • Re-enabled 85 tests (2 of which turned out to still be problematic and were reverted) across 21 directories.  Some tests were previously disabled for all targets and some only for specific platforms.  Most were re-enabled for all platforms, but a few were only partially re-enabled.
  • There are still 47 other tests "in progress", most of which will hopefully be re-enabled before EOY.
  • For purposes of estimation, if we count in-progress tests as "half done", this means we re-enabled about 1.8 tests per engineer per day.
  • Associated bugs go all the way back to May 2009.  Many of the tests had been disabled for years.  The vast majority worked fine with no changes when re-enabled.  Of course, we did have to fix some real bugs, but the conclusion is that many tests that get disabled rot unnecessarily for years.

Below are some problems we encountered when things didn't just work, as written by the engineers themselves.


What's In A Name? by Rob Liao

In my attempt to re-enable ui_base_unittests L10nUtilTest.GetString, I innocently named the test resources label ui_base_test_resources. Imagine my surprise when, once I started integrating against Android, repackaging the resources with android_assets suddenly requested ui_base_test_resources__build_config. Even more surprising, components_browsertests_assets's components_tests_pak had no such dependency.

Turns out, Android build rules are sensitive to the build label name. If your label name matches an entry in _java_target_whitelist (like a label that ends with _resource), then you'll get this mysterious dependency that you never wanted.

As far as I can tell, none of this is documented in Chromium's Android build_config documents. The takeaway: when structuring your build system, triggering special behavior based on a build label's name is likely to cause surprises down the road. A better way forward would be to provide an explicit opt-in to the magic instead of reusing mechanisms that should be passthrough.


In The Key Of... by Allen Bauer

While re-enabling interactive_ui_tests' FindInPageTest.SelectionRestoreOnTabSwitch and related tests, I discovered that the failures come from the way Chrome interacts with injected keyboard events on Windows. The test code itself is reasonable; it is the keyboard event injection code, which uses the Windows SendInput API, that seems to behave oddly with Chrome.

For some reason, phantom WM_KEYUP events are being placed into the message queue. For instance, the key events being injected are: shift-key-down, left-arrow-down, left-arrow-up, and shift-key-up.  However, the events are pulled from the queue as: shift-key-down, shift-key-up(???), left-arrow-down, left-arrow-up, shift-key-up. This resets the state of the modifier keys (shift, ctrl, alt), so the test fails. To verify that the problem is not in the Windows API itself, I wrote a simple Windows app and confirmed that SendInput behaves properly there.
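To see why that phantom shift-up is fatal, here is a toy model of the modifier-state tracking involved (the event type and helper are illustrative only, not the Windows API or Chrome's code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// A grossly simplified key event: which key, and press vs. release.
struct KeyEvent {
  std::string key;  // "shift" or "left"
  bool down;        // true = key down, false = key up
};

// Returns true if every left-arrow press in |queue| arrives while Shift
// is held -- what a shift+arrow selection test expects.
bool AllArrowsShifted(const std::vector<KeyEvent>& queue) {
  bool shift_held = false;
  for (const auto& e : queue) {
    if (e.key == "shift")
      shift_held = e.down;
    else if (e.key == "left" && e.down && !shift_held)
      return false;  // arrow arrived after the modifier state was reset
  }
  return true;
}
```

Feed it the injected order (shift-down, left-down, left-up, shift-up) and it returns true; insert the phantom shift-up right after shift-down and it returns false, which is exactly the failure mode described above.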

Unfortunately, this is an ongoing story without a happy ending. I enjoy a mystery and a challenge as much as anybody, but after several days of debugging, logging, and scouring the code, I'm still unable to determine where the phantom key-up event is being placed into the message queue. When I get back to this issue, I will probably need to fire up the kernel debugger and "go primal" on it.


How Hard Could It Be? by Peter Kasting

The test comment said "only enabled on ChromeOS for now, since it's hard to align an EventGenerator when multiple native windows are involved".  I didn't know what an EventGenerator was, but it turned out the reason for the test not working on Windows was simple: the EventGenerator wasn't correctly converting between window-relative coordinates and screen-space coordinates for the mouse events it was trying to generate.

Surely that couldn't be too hard to fix... ah, EventGenerator references something called a ScreenPositionClient to do this conversion.  And Windows initializes one by default -- but the default EventGeneratorDelegate didn't bother telling the generator about it.  Easy fix!  Right?

Well, no.  Because a variety of tests had come to _rely_ on using inconsistent or incorrect coordinate systems -- they passed the wrong coordinates, which got transformed incorrectly, and the result happened to work.  To fix this I had to make the coordinate system usage consistent.  That couldn't be done without passing the correct RootWindow objects around, to base the coordinates off of.  While doing that I found I had to fix some broken tests too.  And of course I did some cleanup along the way, because you should always clean up the things you start touching.

So re-enabling one test required touching dozens of other files and hundreds of lines of changes across more than a half dozen CLs and a couple weeks of work.  And the test still doesn't work on Mac.  :/  OTOH, I made EventGenerator work more consistently, with better documentation, and fixed a variety of bugs in other tests too.  Hopefully worth it in the end.
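The conversion at the heart of all this can be sketched generically as follows (the types and names are illustrative, not Chromium's actual ui/gfx or aura API):

```cpp
#include <cassert>

// Illustrative type only -- not Chromium's gfx::Point.
struct Point {
  int x;
  int y;
};

// A window whose top-left corner sits at |origin| in screen coordinates.
struct Window {
  Point origin;
};

// What a ScreenPositionClient does, in essence: offset a window-relative
// point by the window's origin to get screen-space coordinates.
Point ConvertToScreen(const Window& window, const Point& p) {
  return {window.origin.x + p.x, window.origin.y + p.y};
}
```

Skipping this step (or applying it to coordinates that were already in screen space) dispatches events at the wrong location -- which is the kind of mismatch the story above describes.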


Try It Yourself:
  • Look for disabled tests with a code search query for "DISABLED_" -- just add "file:your/module" to the end to narrow the search.
  • Pick a directory you know something about and blindly re-enable all the disabled tests.  (If there are a lot and you intend to actually fix failures you find, don't tackle more than two dozen disabled tests at once; that should be enough to turn up some real bugs.)
  • Run locally; --gtest_filter=testA:testB:testC lets you run multiple specific tests.
  • When you find broken tests, either:
    • Fix them, if you can! or,
    • Leave them disabled, but make sure there's an open, assigned bug that's clear about what fails and why, and the disable message references the bug.  No disabled tests without bugs, or referencing vague or long-closed bugs please!
  • For tests that were previously flagged as flaky, run locally 50 times (--gtest_repeat=50) if you can test on the relevant platform(s).
  • Upload your CL and run on the trybots.  For flaky tests you didn't test locally, do at least a couple of trybot runs on those platforms.  You'll need to force re-runs by choosing specific bots from the "CHOOSE TRYJOBS" button; just hitting "dry run" again won't always re-run everything.
  • Profit!
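Mechanically, "re-enabling" a test usually just means removing gtest's DISABLED_ prefix from its name: gtest still compiles such tests but skips them at runtime (you can force them with --gtest_also_run_disabled_tests). A minimal sketch of the naming rule -- the helper function is hypothetical, for illustration only:

```cpp
#include <cassert>
#include <string>

// gtest treats any test whose name begins with "DISABLED_" as disabled:
// compiled, but skipped unless --gtest_also_run_disabled_tests is passed.
// This toy helper mirrors that check; it is not a gtest API.
bool IsDisabledTestName(const std::string& test_name) {
  // Disabled iff the name begins with the "DISABLED_" prefix.
  return test_name.rfind("DISABLED_", 0) == 0;
}
```

So "blindly re-enable" above amounts to deleting that prefix (and any MAYBE_ platform indirection around it) and seeing what the bots say.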
And if you read this far... thanks!  Hope it was worth your time.


Jakob Kummerow

Dec 14, 2018, 3:42:05 PM12/14/18
to Peter Kasting, Chromium-dev
Thanks to everyone involved for doing this, good job! Product Excellence++ :-)


Justin Donnelly

Dec 17, 2018, 11:13:58 AM12/17/18
to Peter Kasting, Chromium-dev
This is really cool, thanks for the report. The stories are interesting and it's nice to have instructions on how to try this ourselves.


Chris Harrelson

Dec 17, 2018, 12:45:48 PM12/17/18
to Peter Kasting, Chromium-dev
This is great work!

Making sure tests work reliably is critical not just to product excellence but to the productivity of feature development. The only way to safely and quickly refactor code or add features is to rely on comprehensive tests to verify that behavior does not break. Not to mention that every bug caught before commit saves hours of fixing it in production.

