The Chrome Dev Infrastructure Team is removing a number of high-recall tests from the CQ and making them CI-only. This is in response to the team recently hitting capacity limits, which caused long CQ times as test tasks sat pending while waiting for available machines.
To mitigate this capacity crunch, we will take the top test suites by resource usage that analysis shows have a 100% recall rate and demote them to CI only. We expect this to save about 3.33% of CQ test run time across all builders and tests, in addition to capacity savings and speed gains from reduced pending times.
The Regression Test Selection (RTS) analysis tool analyzes CQ test runs and calculates a recall rate for every test suite: the percentage of failed CLs in the given time period that would still have failed if the test suite had not run on that builder (successful CLs do not factor into this number).
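The recall definition above can be made concrete with a small sketch. This is a hypothetical data model for illustration, not the actual RTS tool: each failed CQ attempt is represented as the set of builder/suite pairs that failed in it.

```python
# Sketch of the recall calculation described above (hypothetical data
# model, not the actual RTS tool). Each failed CQ attempt records the
# set of (builder, suite) pairs that failed in that attempt.
def recall(failed_attempts, candidate):
    """Fraction of failed attempts that would still have failed if
    `candidate` (a builder/suite pair) had not been run at all."""
    if not failed_attempts:
        return 1.0
    still_failing = sum(
        1 for failures in failed_attempts
        if any(combo != candidate for combo in failures)
    )
    return still_failing / len(failed_attempts)

attempts = [
    {("linux-rel", "blink_web_tests"), ("win-rel", "browser_tests")},
    {("win-rel", "browser_tests")},
    {("linux-rel", "blink_web_tests"), ("linux-rel", "base_unittests")},
]
# Every failed attempt also failed somewhere else, so this pair has
# 100% recall and is a candidate for demotion to CI-only.
print(recall(attempts, ("linux-rel", "blink_web_tests")))  # → 1.0
```

Note that successful attempts never enter the calculation, matching the parenthetical above: recall only measures whether failures would still have been caught.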
To be conservative and ensure no regression in test coverage, only top-resource-usage builder/suite combinations with a 100.00% recall rate are being demoted. The expectation is that coverage is maintained: the listed tests will still run on other builders, and the RTS analysis shows that any CQ runs failing those tests would still have failed without that specific builder/suite combination.
Removing other high-recall test suites could save additional resources and reduce CQ time without reducing test coverage. However, this round of demotions is scoped to the capacity issues on the Linux and Windows builders.
To avoid disruption, the demotions will first target the Linux and Windows builders, and we will monitor for test coverage regressions or increased CQ failure rates. If subsequent RTS analysis of future CQ runs shows substantial additional savings on other builders, another round of CQ updates may follow.
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Include-Ci-Only-Tests: true will enable all CI-only tests mirrored by any try builders.
Include-Ci-Only-Tests: builder1,builder2 will enable CI-only tests for the specified builders.
Include-Ci-Only-Tests: test1,test2 will enable CI-only tests for the specified tests.

A few other questions:

Just to clarify, what's proposed to be disabled here is just the blink_web_tests step and *not* any of:
- vulkan_swiftshader_blink_web_tests
- high_dpi_blink_web_tests
- not_site_per_process_blink_web_tests
- anything with *wpt* rather than *web* (except for not_site_per_process_blink_wpt_tests as explicitly listed)
Second, is the Recall Rate calculated in a way that accounts for disabling multiple test suites at the same time? For example, the recall rate for suite A and the recall rate for suite B could both be 100% even though some failures were caught only by A and B and by no other suites. Was any check run on the particular set of suites being disabled to confirm that, taken together, they still have a 100% recall rate?
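The concern here can be shown concretely: two suites can each have 100% individual recall while the pair's joint recall is lower. A hypothetical sketch (invented suite names and data, not RTS output):

```python
# Sketch of the joint-recall concern raised above (hypothetical data).
# An attempt whose only failures are suites A and B keeps each suite's
# *individual* recall at 100%, while removing both drops joint recall.
def joint_recall(failed_attempts, removed):
    """Fraction of failed attempts that still fail after removing
    every suite in the set `removed`."""
    still = [f for f in failed_attempts if f - removed]
    return len(still) / len(failed_attempts)

attempts = [
    {"A", "C"},   # also caught by C
    {"B", "C"},   # also caught by C
    {"A", "B"},   # caught *only* by A and B
]
print(joint_recall(attempts, {"A"}))       # → 1.0
print(joint_recall(attempts, {"B"}))       # → 1.0
print(joint_recall(attempts, {"A", "B"}))  # ~0.67: the third attempt
                                           # slips through
```

In other words, per-suite recall of 100% for each member of a set does not by itself guarantee 100% recall for the set as a whole.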
Also, along those lines, is the Recall Rate calculation based only on CQ+2 runs, or also on earlier CQ+1 runs and runs of individual try jobs?
Third, another interesting set of issues here is that for web tests and wpt tests, we have explicitly chosen to run a bunch of the tests only on Linux and not on other platforms. In particular, many VirtualTestSuites are configured to run only on Linux (as documented in the VirtualTestSuites configuration), and the flag-specific configurations are all run only on Linux (although with other *_blink_web_tests names that I *think* are not being disabled in this change). So it seems like this is dropping our only CQ testing for a bunch of our VirtualTestSuites configurations (in so far as they're testing web tests and not wpt tests, which I think are not being disabled here). That might then influence our future choices about what platforms to run virtual test suites on. If we have a particular shortage of Windows and Linux test machines right now, should we switch our default for one-platform virtual test suites from Linux to Mac?
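For reference, a Linux-only virtual suite entry in web_tests/VirtualTestSuites looks roughly like the following. This is an illustrative sketch with an invented prefix and flag; the exact set of fields may differ by checkout:

```json
{
  "prefix": "my-feature",
  "platforms": ["Linux"],
  "bases": ["fast/css"],
  "args": ["--enable-features=MyFeature"]
}
```

The "platforms" field is what restricts such a suite to a single platform, which is why demoting the Linux blink_web_tests step can remove the only CQ coverage for these entries.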
Jeffrey,

Can you go into a little more detail about how you came up with this proposal? For example, are you only looking at situations where someone ran a full CQ attempt (all of the configurations at once) rather than just individual builders? And by 100% recall, are you saying that, e.g., blink_web_tests *never* failed on Linux without also failing on another platform (e.g., Windows) in the same attempt?
I would not be too surprised if so; there's not a lot of platform-specific code in the blink tests (compared to many of the other suites, at least), and Linux is probably the most well-tested config, either by hand or by people testing just one config or platform before doing a full CQ attempt.

For the test suites that are being removed, can you say which other configurations they will still run on (or at least some of them)? E.g., if blink_web_tests will still be run on Linux ASAN, then not running on Linux (non-ASAN) is probably less of an issue.
As long as we have sufficient coverage of tests on other platforms and I have the above right, then I wouldn't feel too bad about removing that configuration. As you say, it sounds like it is giving us no distinctive information at all.

If, to David's point, there are a bunch of test suites that are only run in that configuration and not in any others (though I would expect most of the Linux-only virtual test suites to be caught by, e.g., the ASAN config, unless they were explicitly skipped there for some reason, such as being too slow), then I would wonder what the fact that they never fail is telling us.