-expected.txt files vs TestExpectations

Nicolás Peña

May 15, 2020, 11:50:39 AM
to ecosyst...@chromium.org
Hi folks,

I was looking into a fix for a flaky test (external/wpt/event-timing/dblclick.html) and noticed that the autoroller landed 5 -expected files for the test: see here (note: I'm landing a fix soon that removes these files, but they're basically third_party/blink/web_tests/platform/win7/external/wpt/event-timing/dblclick-expected.txt plus other platform-specific expectation files). What's the advantage of doing this over populating a single line in TestExpectations? lpz@ had to do this manually for https://bugs.chromium.org/p/chromium/issues/detail?id=1082739 but shouldn't this be the default bot action? Also, marking flaky tests by platform instead of more aggressively marking them flaky altogether seems like the wrong tradeoff to me. A test owner can specify the platforms later on if the test is actually only flaky on certain platforms.

Stephen Mcgruer

May 20, 2020, 8:20:29 AM
to Nicolás Peña, ecosystem-infra
Hi Nicolás,

Sorry for the delay in replying.

Overall, I think your suggestion makes sense, but there are complications.

Having platform-specific -expected.txt files is a Blink web_tests 'thing', not a WPT thing. They exist because there are tests which legitimately produce different output on different platforms. Even different Mac versions can output different things!

So in such a world, how do we detect when differing outputs across platforms are from flake versus legitimate difference? Note that for the autoroller to land a CL, the test must have produced the same output on those platforms twice - once for us to generate expectations and once to pass the CQ! So in the worst case the test is < 1/2 flaky - which is not a high bar, but we have to balance CI resources and latency against detecting such flakes. And just adding to TestExpectations might give the wrong impression - if my test deterministically passes on Mac10.11 but fails on Mac10.12, the TestExpectations entry would now claim that it is Pass/Failure flaky.
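
A rough illustration of the two-run argument above, as a minimal sketch rather than anything the autoroller actually computes. It assumes a simplified model that is not from the thread: on a given platform the test has only two possible outputs, each run is independent, and a run produces the first output with probability q. The function name is hypothetical.

```python
def prob_two_runs_agree(q: float) -> float:
    """Probability that two independent runs produce the same output.

    Assumed model: each run yields output A with probability q and
    output B with probability 1 - q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability between 0 and 1")
    # Both runs give A, or both runs give B.
    return q * q + (1.0 - q) * (1.0 - q)


# Chance that a test with this level of flakiness gets through
# "generate expectations once, pass the CQ once".
for q in (1.0, 0.9, 0.5):
    print(f"q={q}: two runs agree with probability {prob_two_runs_agree(q):.2f}")
```

Under this model even a test that flips a coin on every run (q = 0.5) agrees with itself half of the time, which is why two matching runs are not a high bar.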

One could see various solutions, including using the upstream WPT stability checks as input data, but they will all be heuristic-y. Possibly others on the team may have more thoughts on detecting legitimate differences vs flakes.

Thanks,
Stephen

PhistucK

May 20, 2020, 8:47:43 AM
to Stephen Mcgruer, Nicolás Peña, ecosystem-infra
>  It exists because there are tests which legitimately produce different output on different platforms.
Note that -expected is also used by some to indicate bugs that should be fixed in Chrome (and not only for legitimate differences).
I read that a goal is to remove -expected (and not due to operating system differences that will remain forever, but due to bugs that should be fixed).

This part might need some clarification and strict guidelines. Perhaps a new suffix is needed for legitimate differences versus partially passing/completely failing tests due to bugs (-expected versus -buggy where -buggy would ideally be eliminated and -expected would be here to stay).
Partially passing tests would still need some file-based mechanism like this (TestExpectations entries are all-or-nothing, so partial regressions are hidden if you use them).
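
To make the all-or-nothing point concrete, here is a minimal sketch in Python. It is a simplified model, not the real web_tests runner: the helper names, the subtest names, and the one-line-per-subtest text format are all assumptions made for illustration.

```python
def coarse_status(subtests: dict[str, str]) -> str:
    """Collapse per-subtest results into one test-level status, which is
    all a TestExpectations-style entry can describe in this model."""
    return "Pass" if all(s == "PASS" for s in subtests.values()) else "Failure"


def baseline_text(subtests: dict[str, str]) -> str:
    """Render per-subtest results as text, standing in for the content
    of an -expected.txt file (format is illustrative only)."""
    return "\n".join(f"{status} {name}" for name, status in sorted(subtests.items()))


# Hypothetical subtests: one of two passes before, none pass after a regression.
before = {"first click": "PASS", "second click": "FAIL"}
after = {"first click": "FAIL", "second click": "FAIL"}

# The test-level status is "Failure" both times, so the regression is hidden.
print(coarse_status(before), coarse_status(after))

# The text baseline changes, so a comparison against it would notice.
print(baseline_text(before) != baseline_text(after))
```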

PhistucK


Stephen Mcgruer

May 20, 2020, 8:49:59 AM
to PhistucK, Nicolás Peña, ecosystem-infra
Ah, when I said 'legitimate differences' I didn't mean not-a-bug, I meant not-a-flake. I was including "there's a bug in mac 10.11, so this test fails there" in 'legitimate differences'. I can definitely see how my choice of wording would be misleading, sorry!

Fergal Daly

Jan 16, 2023, 9:16:57 PM
to ecosystem-infra, smcg...@chromium.org, Nicolás Peña, ecosystem-infra, PhistucK, Evan Stade, Mingyu Lei
The topic of which to use has come up in a CL and it's still unclear to me. When developing a feature and doing somewhat test-driven development (which can be useful when Chrome is catching up on a feature that others already have), should we add not-yet-passing tests to TestExpectations or add -expected.txt files for them?

In this CL, we're adding tests for an optional optimisation (BFCache), so the test will exit with NOT_RUN (PRECONDITION_FAILED) rather than with a fail. I don't see a way for TestExpectations to capture that distinction. Is that a missing feature of TestExpectations or a good reason to use -expected.txt?

F

Mingyu Lei

Jan 16, 2023, 9:35:18 PM
to ecosystem-infra, Fergal Daly, smcg...@chromium.org, Nicolás Peña, ecosystem-infra, PhistucK, Evan Stade, Mingyu Lei
The CL link in Fergal's comment seems wrong; here is the right one.

Nicolás Peña

Jan 17, 2023, 9:53:12 AM
to Mingyu Lei, ecosystem-infra, Fergal Daly, smcg...@chromium.org, Nicolás Peña, PhistucK, Evan Stade
I think that is a good scenario to use "-expected.txt". A TestExpectations entry would only let you say it is a "Failure", which is less information than what you can get with the file. Since the failure message is always going to be exactly the same, there is no issue with using "-expected.txt". And reading the comments, it also sounds like it does not matter too much in that particular case :)
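
A small sketch of why the "exactly the same message" condition matters, assuming a baseline check is a plain text comparison. This is a simplification for illustration: the helper name and the sample messages are hypothetical, not output from the CL.

```python
def matches_baseline(actual_output: str, baseline: str) -> bool:
    """Simplified baseline check: the run passes only if its text output
    equals the stored baseline (ignoring surrounding whitespace)."""
    return actual_output.strip() == baseline.strip()


# Hypothetical deterministic output: every run prints the same line.
stable_baseline = "PRECONDITION_FAILED feature is not enabled"
stable_runs = [stable_baseline, stable_baseline + "\n"]
print(all(matches_baseline(run, stable_baseline) for run in stable_runs))

# Hypothetical non-deterministic output: the message embeds a timing value,
# so a baseline recorded from one run does not match the next run.
unstable_runs = ["FAIL took 101ms", "FAIL took 97ms"]
print(matches_baseline(unstable_runs[1], unstable_runs[0]))
```

With a message that never changes, the baseline keeps matching; a message that varies between runs would turn the baseline itself into a source of flakiness.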

Evan Stade

Jan 17, 2023, 12:17:25 PM
to Nicolás Peña, Mingyu Lei, ecosystem-infra, Fergal Daly, smcg...@chromium.org, PhistucK
TIL. Thanks.

-- Evan Stade
