PSA: Removing browser_tests from linux_chromium_asan_rel_ng

102 views
Skip to first unread message

Gary Tong

unread,
Sep 26, 2022, 2:15:34 PM9/26/22
to Chromium-dev
Hello fellow devs,

We're planning on removing the browser_tests suite from linux_chromium_asan_rel_ng on CQ. We've recently been hitting our capacity limits on our Linux testing fleet in Google cloud, and are looking at ways to reduce that load. We expect this change to have minimal effects on revert rate.

As you can see in the graph below, browser_tests takes up roughly 45% of our Linux testing capacity:
image.png

Of that, approximately 1/3 comes from linux_chromium_asan_rel_ng.

Simulations on the last 3 months of CQ activity show that we can remove browser_tests from linux_chromium_asan_rel_ng and still catch 99.8% of CL failures. Because browser_tests are run on multiple CQ builders (linux-rel, linux-wayland-rel, linux-chromeos-rel, linux-lacros-rel and win10_chromium_x64_rel_ng) there's enough overlapping coverage that most breakages are caught by other builders. If browser_tests would have been disabled on linux_chromium_asan_rel_ng for the past 3 months, CQ would have let through 15 additional CLs, some of which were likely due to test flakes.

We will continue to run browser_tests on the "Linux ASan LSan Tests (1)" CI builder.

If you have any questions or concerns, please let us know!

Gary Tong

unread,
Sep 26, 2022, 4:55:29 PM9/26/22
to Chromium-dev
Apologies for the broken image. Hopefully this one works:

image.png

David Benjamin

unread,
Sep 26, 2022, 5:14:43 PM9/26/22
to gat...@chromium.org, Chromium-dev
Does this mean we won't have any pre-commit ASan browser_tests coverage? If so, the 99.8% number is very surprising. While we would still have browser_tests coverage on the other builders you listed, none of them are instrumented for memory errors. Memory leaks won't be noticed at all, and use-after-frees (UAFs) or out-of-bounds accesses may sneak through.

Did the simulation account for fixes made by developers on CQ dry runs, or just the final CQ run? I would naively expect bugs, particularly non-flaky ones, to be caught on CQ dry runs, rather than the final CQ run. CQ dry runs introduce other scenarios: a bot like linux-rel may show some failures, but then ASan shows even more. Or, if a UAF bug still crashes, linux-rel may give some arbitrary crash far away from the bug, while ASan will pinpoint the actual error.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAEuHQJC%3DE_TRYLpcHZOJuig%2B7BfGeX9FxUcWJ3mOnR4fgvuSPg%40mail.gmail.com.

Caleb Raitto

unread,
Sep 27, 2022, 12:43:29 PM9/27/22
to Chromium-dev, David Benjamin, Chromium-dev, Gary Tong
+1 to David's comments -- I understand the need to reduce resource use. I think even if we decide to save resources by not running ASAN browser tests as part of CQ+1/2 for every CL (and rely on CI to catch these bugs), we should still have a way to trigger an ASAN browser test run *manually* on unsubmitted CLs -- these manual runs should be far more infrequent than the CQ+1/2 runs that happen at least once per every Chromium CL. 

For instance, the work I did fixing LSAN on Linux for CI/CQ largely relied on the ability to use trybots to find leaking tests (LSAN was broken in CI for a while, so we accumulated a large number of leaking tests in the codebase that needed to be disabled on ASAN/LSAN to land the fix). Many of the leaking tests were browser tests. Trying to track these down via local runs on my workstation probably would have been possible, but it would probably have been much more difficult and slow, given the large number of tests involved.

(In general, it's important to be able to manually trigger a run on an unsubmitted CL to reproduce a CI failure -- for instance, to verify a fix, or run with more logging -- my understanding is that's not currently the case? I've sometimes received bugs for CI failures that I couldn't figure out how to reproduce on trybots, or my workstation.)

-Caleb

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev+unsubscribe@chromium.org.

dan...@chromium.org

unread,
Sep 27, 2022, 12:49:52 PM9/27/22
to Caleb Raitto, Chromium-dev, David Benjamin, Gary Tong
At the moment we are moving toward:

a) Requiring writing code behind feature flags to lessen the impact of shipping bugs, because we're shipping too many bugs.
b) Proposing reducing critical coverage to catch bugs (which can't be unit-tested for) before they roll out.

I respect the desire to reduce the impact of bugs, but at the same time I think we need investment in ways to ship fewer bugs at the same time.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/412bb670-2f36-42f0-af10-5291c743a989n%40chromium.org.

Erik Staab

unread,
Sep 27, 2022, 1:31:07 PM9/27/22
to Dana Jansens, Caleb Raitto, Chromium-dev, David Benjamin, Gary Tong
> Did the simulation account for fixes made by developers on CQ dry runs, or just the final CQ run?

Yes, the metrics we analyzed include dry runs.

> Proposing reducing critical coverage to catch bugs (which can't be unit-tested for) before they roll out.

To be perfectly clear, this is just CQ coverage and the tests will continue to run in CI and be sheriffed there. We aren't talking about shipping more bugs to users, we're talking about a small impact on falsely accepted CLs and reverts (1 / 500 CLs) and load on sheriffs.

> For instance, the work I did fixing LSAN on Linux for CI/CQ largely relied on the ability to use talbots

You will still be able to include ASAN browser_tests trybot coverage when you need it with "Include-Ci-Only-Tests: true" in your CL description footers.

The fact is this test suite in this configuration is affecting our ability to provide a fast and reliable service and isn't contributing much value at CQ time to justify it running for every CL. We constantly have to balance CQ time, false rejection/acceptance rates, and sheriffing load and this particular decision isn't even close, especially when you consider the magnitude of the resource usage in this case.

dan...@chromium.org

unread,
Sep 27, 2022, 1:42:18 PM9/27/22
to Erik Staab, Caleb Raitto, Chromium-dev, David Benjamin, Gary Tong
On Tue, Sep 27, 2022 at 1:29 PM Erik Staab <est...@chromium.org> wrote:
> Did the simulation account for fixes made by developers on CQ dry runs, or just the final CQ run?

Yes, the metrics we analyzed include dry runs.

> Proposing reducing critical coverage to catch bugs (which can't be unit-tested for) before they roll out.

To be perfectly clear, this is just CQ coverage and the tests will continue to run in CI and be sheriffed there. We aren't talking about shipping more bugs to users, we're talking about a small impact on falsely accepted CLs and reverts (1 / 500 CLs) and load on sheriffs.

Thanks for clarifying this for me.

Lei Zhang

unread,
Sep 27, 2022, 1:44:05 PM9/27/22
to est...@chromium.org, Dana Jansens, Caleb Raitto, Chromium-dev, David Benjamin, Gary Tong
On Tue, Sep 27, 2022 at 10:30 AM Erik Staab <est...@chromium.org> wrote:
> You will still be able to include ASAN browser_tests trybot coverage when you need it with "Include-Ci-Only-Tests: true" in your CL description footers.

Thanks for the answer. I see this option is listed, along with several
other CQ options, in docs/infra/cq.md. It may be helpful in the future
to periodically announce new options to chromium-dev as they become
available. This is not well known, as Include-Ci-Only-Tests has only
been used ~5 times in 2022 according to git log.

Erik Staab

unread,
Sep 27, 2022, 2:04:05 PM9/27/22
to Lei Zhang, Dana Jansens, Caleb Raitto, Chromium-dev, David Benjamin, Gary Tong
> Thanks for clarifying this for me.

No problem, I think we need to do a better job communicating how we set up our systems, especially as we've had to make them more complex to scale to the project size.
When we introduced it we struggled with how to make it clear when it was happening and what to do about it. We added a build step "ci_only tests" that has the text "The following tests are not being run on this try builder because they are marked 'ci_only', adding "Include-Ci-Only-Tests: true" to the gerrit footers will cause them to run". An example of that is here. That said, it isn't very discoverage since the test names aren't in the step name so you would need to know what you're looking for. We can probably improve that so I filed a bug.

Thanks for all of the questions and for bringing up concerns!

Erik
Reply all
Reply to author
Forward
0 new messages