Re: [chromium-dev] Preliminary results in Blink Web Tests


Ben Pastene

Dec 13, 2021, 1:39:17 PM
to g...@google.com, infra-dev, Chromium-dev, Dana Jansens
Not sure what the "Preliminary results" UI is that you're referring to. Is that the new "Checks" UI in gerrit? Or the "Test Results" tab in a build page?

A link or screenshot might help.

On Mon, Dec 13, 2021 at 9:06 AM 'Gabriel Charette' via Chromium-dev <chromi...@chromium.org> wrote:
It seems that Blink Web Tests always fail in the new "Preliminary results" CQ UI and then are later ignored by CQ logic. This makes the results noisy and unactionable.

Can we:
 1) Make Blink Web Tests less flaky..?
 2) Hide known flakes in preliminary results?

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAJTZ7LLhSg_rj3oW%3D1JzERoC%2BEz%2BkM8oy%3DUPs08iu112EsqwmQ%40mail.gmail.com.

Gabriel Charette

Dec 13, 2021, 5:16:07 PM
to Ben Pastene, infra-dev, Chromium-dev, Dana Jansens
It's transient but it's there during any CQ dry run in my experience.

[Attachments: image.png, image.png]

Erik Staab

Dec 14, 2021, 4:01:56 PM
to Gabriel Charette, Gavin Mak, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens
Yeah, these preliminary results are presented between the first run of tests and subsequent retries, and when lots of tests are consistently flaky on the first attempt it ends up being pretty spammy.

Hiding known flakes could be a good workaround, since I don't think making major progress on web test flakiness will be as easy.

cc +Gavin Mak for Gerrit checks UI
cc +Matthew Warton for test results


Gabriel Charette

Dec 14, 2021, 8:50:28 PM
to Gavin Mak, Erik Staab, Gabriel Charette, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens

On Tue, Dec 14, 2021 at 5:04 PM Gavin Mak <gavi...@google.com> wrote:
It seems straightforward enough to mark these known-flaky variants in ResultDB with a tag that Gerrit could use to hide them from preliminary results.

Stepping back though, how useful are preliminary results in general? If they are more noisy than helpful, we could hide them by default and show them only if "Additional Results" are shown. This is the approach we currently use for exonerated/flaky test results.

I really like the idea of Preliminary Results, but I don't like that I currently need to sift through the ones I know to always be flaky to figure out if there's an actually interesting preliminary result.

Gavin Mak

Dec 16, 2021, 12:04:35 PM
to Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens
In that case, I think it'd be best to tag flaky tests accordingly in ResultDB. Once they're tagged, it'll be simple to omit them from showing up on Gerrit. Eventually, I could see the tag being used as a predicate such that the flaky tests aren't fetched in the first place.
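
As a rough illustration of the tag-based filtering idea, here is a minimal Python sketch. The `TestResult` shape, the `known_flaky` tag key, and the helper name are all hypothetical, invented for this example; they are not the actual ResultDB schema or Gerrit API.

```python
from dataclasses import dataclass, field


@dataclass
class TestResult:
    """Hypothetical, simplified stand-in for a test result record."""
    test_id: str
    status: str  # e.g. "FAIL" or "PASS"
    tags: dict = field(default_factory=dict)


# Hypothetical tag key; the real name would be chosen by whoever tags results.
KNOWN_FLAKY_TAG = "known_flaky"


def visible_preliminary_failures(results):
    """Return failing results that are not tagged as known flakes."""
    return [
        r for r in results
        if r.status == "FAIL" and r.tags.get(KNOWN_FLAKY_TAG) != "true"
    ]


results = [
    TestResult("foo/bar.html", "FAIL", {KNOWN_FLAKY_TAG: "true"}),
    TestResult("foo/baz.html", "FAIL"),
    TestResult("foo/qux.html", "PASS"),
]
shown = visible_preliminary_failures(results)
print([r.test_id for r in shown])  # prints ['foo/baz.html']
```

Using the tag as a fetch-time predicate, as suggested above, would move the same condition to the server side so the tagged results are never returned at all.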

Matthew, would you be the right person for actually tagging results? 

Xianzhu Wang

Dec 16, 2021, 12:04:41 PM
to gavi...@google.com, Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
Another way is to add entries in TestExpectations for the flaky tests found by FindIt. I think it has the following benefits:
1. The failures won't be shown in preliminary results.
2. We don't need two ways (TestExpectations and Known-Flake-by-FindIt) to suppress failures of flaky tests. I think TestExpectations is well known by Blink developers, while Known-Flake-by-FindIt is less known. (Also, will FindIt be deprecated?)
3. The stale test expectation removal tool (+bsheedy@) will work for all previously-known flaky tests that are no longer flaky.


Dirk Pranke

Dec 16, 2021, 12:15:54 PM
to wangx...@chromium.org, gavi...@google.com, Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
+1 to this suggestion. Please don't duplicate what TestExpectations does.

-- Dirk

Gavin Mak

Dec 16, 2021, 5:24:04 PM
to Dirk Pranke, wangx...@chromium.org, Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
I'm not familiar with TestExpectations. What, if anything, would need to be done on the Gerrit side?

Xianzhu Wang

Dec 16, 2021, 7:00:41 PM
to Gavin Mak, Dirk Pranke, Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
TestExpectations is Blink's way to suppress web test failures, permanently or temporarily.

If FindIt can add found flaky tests into TestExpectations like:
# Added by FindIt
crbug.com/bug-number foo/bar.html [ Pass Failure ]
then Gerrit probably doesn't need to do anything. FindIt needs to do more work though.
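
For readers unfamiliar with the format, here is a small Python sketch of how an entry like the one above could be read. It handles only the simple `<bug> <test> [ <results> ]` form shown in this message; the real TestExpectations syntax has more features, and the function names here are hypothetical rather than part of any Blink tooling.

```python
import re

# Matches only the simple form shown above:
#   <bug> <test path> [ <result> <result> ... ]
# The real TestExpectations syntax is richer; this sketch ignores the rest.
LINE_RE = re.compile(
    r"^(?P<bug>\S+)\s+(?P<test>\S+)\s+\[\s*(?P<results>[^\]]*?)\s*\]$"
)


def parse_expectation(line):
    """Parse one simple expectation line; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized expectation line: {line!r}")
    return {
        "bug": match.group("bug"),
        "test": match.group("test"),
        "results": match.group("results").split(),
    }


def is_marked_flaky(entry):
    """An entry that allows both Pass and Failure tolerates a flaky test."""
    return {"Pass", "Failure"} <= set(entry["results"])


entry = parse_expectation("crbug.com/bug-number foo/bar.html [ Pass Failure ]")
print(entry["test"], entry["results"])  # prints foo/bar.html ['Pass', 'Failure']
```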

The drawback is that the above change will need to be committed (probably through the commit queue) to take effect, unlike current FindIt suppressions, which take effect immediately. But this is just like how sheriffs suppress failing web tests, which seems to work well.

Gabriel Charette

Dec 17, 2021, 10:43:38 AM
to wangx...@chromium.org, Gavin Mak, Dirk Pranke, Gabriel Charette, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
Do we need to commit expectations? It'd seem better to me if Infra was able to file bugs for flaky tests, ignore them on CQ, but keep running them (auto-closing bugs of the ones that stop flaking).

I have a more general concern with explicitly disabling flaky tests, however. I've been struggling with the inability to conscientiously avoid introducing new flakes to the codebase. When making core changes that affect all tests, sometimes even in an attempt to address existing flakes, it's really hard to know that you're not introducing new flakes. It's impossible with a dry run (CQ explicitly hides flakes from you, with no way to request otherwise), so the "best" way is to wait for pinpoint/sheriffs; 2-3 days of silence is usually a good sign...

Whenever such a CL gets reverted, I'm always left to wonder how many other tests were disabled because of the systemic flake I inadvertently introduced... Some sheriffs do tag me/bug but there's no guarantee. Other disabled tests are gone forever...

If instead Infra had a database of known flakes, it could:
 1) Ignore known flakes in CQ
 2) Close bugs for flakes that vanish (like ClusterFuzz)
 3) Have a "flakiness Dry Run" on CQ where it could spot new flakes before landing :)!
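
The three behaviours above can be sketched with plain set operations. This is a minimal Python illustration only: the function name, the report keys, and the test names are hypothetical, and no such Infra database or API is implied to exist.

```python
def triage(known_flakes, observed_flakes):
    """Split flaky tests seen in one run against a set of known flakes.

    known_flakes: set of test names already tracked as flaky.
    observed_flakes: set of test names that flaked in this run.
    """
    return {
        # 1) Known flakes that flaked again: ignored by CQ.
        "ignored": sorted(known_flakes & observed_flakes),
        # 2) Known flakes that did not flake: candidates for closing bugs.
        "maybe_fixed": sorted(known_flakes - observed_flakes),
        # 3) Flakes missing from the database: surfaced as potentially new.
        "new": sorted(observed_flakes - known_flakes),
    }


# Hypothetical test names, for illustration only.
report = triage({"a.html", "b.html"}, {"b.html", "c.html"})
print(report)
# prints {'ignored': ['b.html'], 'maybe_fixed': ['a.html'], 'new': ['c.html']}
```

In practice a single quiet run would not be enough evidence to close a bug; a real system would presumably need many runs before treating a flake as gone.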

Also, re. TestExpectations: this issue is not specific to Blink Web Tests. They are #1, but there are other instances.

- Gab

K. Moon

Dec 17, 2021, 11:05:03 AM
to Gabriel Charette, wangx...@chromium.org, Gavin Mak, Dirk Pranke, Erik Staab, Matthew Warton, Ben Pastene, infra-dev, Chromium-dev, Dana Jansens, Brian Sheedy
+1; I have the uncomfortable feeling that disabling tests due to flakiness is just hiding problems introduced elsewhere in the code at least part of the time. Really wish there was better automation around this. And as a sheriff who doesn't routinely work with Web tests, committing changes to TestExpectations works, but is sorta a pain the first time you do it. And then the next time, when I've forgotten how to do it. :-)
