I thought css-flexbox pass rate would increase...

David Grogan

Jul 1, 2021, 7:04:27 PM

to compa...@googlegroups.com

A few weeks ago I landed two patches in Chrome 93.0.4544.0 that converted a number of reference tests, each containing many subtests per file, into checkLayout tests. Blink failed at least one case in each file, so each reference test previously contributed 0 to the pass rate. Blink passes many of the subtests in the new checkLayout versions, so I expected its pass rate to increase. Instead, it has remained at 0.917 since Chrome 93.0.4542.2. Am I misunderstanding something about how the pass rate on the dashboard is (re-)calculated?

For example, the first patch translated 12 reference tests to checkLayout tests. Each checkLayout test has 24 subtests. Blink passes ~15 subtests in each. So I expected the pass rate to increase[1] by (12 * 15/24) / 1083 ≈ 0.007, which would give a pass rate of 0.917 + 0.007 ≈ 0.924. But there has been no change.

Am I misunderstanding something or should there have been a jump in the graph?

[1] The 1083 is because there are 1083 tests in https://github.com/Ecosystem-Infra/wpt-results-analysis/blob/main/compat-2021/css-flexbox-tests.txt
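
For anyone wanting to check the estimate above, here is a minimal sketch of the arithmetic. It assumes each test contributes its fraction of passing subtests to the total and that all 1083 tests are weighted equally; the variable and function names are made up for illustration and are not taken from the dashboard code.

```python
# Numbers come from the post above; names are illustrative only.
TRANSLATED_TESTS = 12           # reference tests converted by the first patch
SUBTESTS_PER_TEST = 24          # subtests in each new checkLayout test
PASSING_SUBTESTS_PER_TEST = 15  # approximate ("~15" in the post)
TOTAL_TESTS = 1083              # entries in css-flexbox-tests.txt
CURRENT_RATE = 0.917            # pass rate shown on the dashboard


def expected_increase(translated, passing, per_test, total):
    """Expected change in pass rate if each translated test goes from a
    score of 0 to a fractional score of passing / per_test."""
    return translated * (passing / per_test) / total


increase = expected_increase(
    TRANSLATED_TESTS, PASSING_SUBTESTS_PER_TEST, SUBTESTS_PER_TEST, TOTAL_TESTS
)
expected_rate = CURRENT_RATE + increase

print(f"expected increase: {increase:.3f}")
print(f"expected pass rate: {expected_rate:.3f}")
```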

smcgruer

Jul 2, 2021, 9:58:59 AM

to Browser Compatibility 2021

Thanks David; I'm looking into this now.

I believe the two runs that should have updated the data were:

Each of them shows an increase in experimental pass rate for flexbox in the css-flexbox-experimental-full-results.csv, but the impact on css-flexbox-experimental.csv is non-existent in one case and minimal in the other.

I need to do some debugging to figure out whether this is working as intended (WAI) for some reason or a bug. :)

smcgruer

Jul 14, 2021, 10:39:17 AM

to Browser Compatibility 2021

Hi David, thanks for your patience.

In the end, this did turn out to be a bug; we were not giving tests a fractional score for subtests (it was either all or nothing). This has been fixed, and the numbers on wpt.fyi have been re-generated. There was no top-level score change from this (i.e. the effect was generally minor) and no browser benefited significantly more than any other (i.e. they all had some partially-passing tests that are now being counted), so overall we believe it just gives a more accurate read on the status.
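
To illustrate the difference between the two scoring modes described above, here is a rough sketch. It is not the actual wpt-results-analysis code; the function names and the list-of-booleans representation of subtest results are assumptions made purely for the example.

```python
from typing import Callable, List

# Hypothetical representation: one list of booleans per test,
# True meaning the subtest passed.
SubtestResults = List[bool]


def all_or_nothing_score(subtests: SubtestResults) -> float:
    """Old (buggy) behavior as described: a test counts only if every subtest passes."""
    return 1.0 if subtests and all(subtests) else 0.0


def fractional_score(subtests: SubtestResults) -> float:
    """Fixed behavior as described: a test contributes its fraction of passing subtests."""
    if not subtests:
        return 0.0
    return sum(subtests) / len(subtests)


def pass_rate(tests: List[SubtestResults],
              scorer: Callable[[SubtestResults], float]) -> float:
    """Average per-test score, weighting every test equally (an assumption)."""
    return sum(scorer(t) for t in tests) / len(tests)


# One fully passing test plus one checkLayout-style test with 15 of 24 subtests passing.
full_pass = [True] * 24
partial_pass = [True] * 15 + [False] * 9
tests = [full_pass, partial_pass]

old_rate = pass_rate(tests, all_or_nothing_score)
new_rate = pass_rate(tests, fractional_score)
print(old_rate, new_rate)
```

Under all-or-nothing scoring the partially passing test contributes nothing, which would explain why converting reference tests to checkLayout tests did not move the number until the fix landed.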

There is now a jump between 4549 and 4557 for flexbox, from 0.939 to 0.957, that I believe reflects your changes. (Note that it won't align with Chrome releases because the changes you made were to the tests, not the Chrome binary, so it's instead about when the tests landed in WPT - a bit of guessing involved :D).

Thanks,
Stephen
