A quick survey on setting up fuzzy match rules to identify resolvable flaky web tests

Vivian Zhi (支文文）

Jul 29, 2022, 8:20:20 PM

to blin...@chromium.org, Shirley Ji, Weizhong Xia, Chrome-Blink-EngProd

Hi blink-dev

I would like to let you know that blink-engprod has added fuzzy matching support for non-WPT tests. Both non-WPT reftests and pixel tests can now use the same fuzzy matching meta tags as WPT tests. results.html also shows two image diff stats: the max color channel difference and the total number of differing pixels. With these capabilities in place, we would like to investigate whether we can set up some general fuzzy match rules to help Blink developers identify flaky tests that could be resolved by adjusting fuzzy matching rules. Currently quite a few web tests are flaky because of a slight image mismatch that should have been tolerated.

Suppose we set up a general fuzzy matching rule that instructs the image comparison web tests to ignore the diff and pass the test whenever the color channel and pixel differences fall within the range of the rule. This way we could reduce test flakiness while still maintaining test accuracy and without missing real bugs.
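As a rough illustration of the rule described above, here is a minimal Python sketch. The `FuzzyRule` type and `diff_is_tolerated` function are hypothetical names made up for this example and are not part of blinkpy; the sketch only assumes that a rule is a pair of inclusive ranges, one for the max color channel difference and one for the total number of differing pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyRule:
    """Hypothetical container for the two inclusive ranges of a fuzzy rule."""
    max_difference: tuple[int, int]  # allowed range for the largest per-channel difference
    total_pixels: tuple[int, int]    # allowed range for the number of differing pixels


def diff_is_tolerated(rule: FuzzyRule, max_difference: int, total_pixels: int) -> bool:
    """Return True when both image-diff statistics fall inside the rule's ranges."""
    low_diff, high_diff = rule.max_difference
    low_pixels, high_pixels = rule.total_pixels
    return (low_diff <= max_difference <= high_diff
            and low_pixels <= total_pixels <= high_pixels)
```

With a rule such as `FuzzyRule((0, 1), (0, 1000))`, a diff with a max channel difference of 1 over 500 pixels would be tolerated, while a max channel difference of 2 would still be reported as a failure.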

We would like to ask a few quick survey questions to help us make a design decision: does it make sense to set up a universal, across-the-board fuzzy match tolerance rule for all Blink web tests, or should we make the rules specific to individual tests or test sets?

1. Is a universal fuzzy match tolerance rule acceptable for the web tests in your area?

a) If the answer is yes, what is the acceptable range of max color channel difference and pixel diff for your tests?

b) If the answer is no, please share your reasons.

2. Do you prefer adjusting fuzzy matching rules at a per-test or per-test-set level, based on the pixel difference numbers shown in results.html?

Here is some sample data to help you choose. We recently collected it from blink_web_tests results on the linux-test builder. It shows the distribution of color channel maxDifference and totalPixels diff for failing/flaky blink_web_tests.

(Note: over 70% of the tests in the color channel maxDifference 0-10 range have maxDifference=1.)

Color Channel maxDifference Range	Fail test count
0-10	98
11-100	31
101-200	28
201-260	111

totalPixels Diff Range	Fail test count
0-100	30
100-1000	57
1000-10,000	99
10,000-100,000	66
100,000-1,000,000	16
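To make the two tables easier to compare, the following Python sketch converts the counts above into percentage shares. The counts are copied from the tables; the variable and function names are only illustrative.

```python
# Fail test counts copied from the two tables above.
channel_buckets = {"0-10": 98, "11-100": 31, "101-200": 28, "201-260": 111}
pixel_buckets = {
    "0-100": 30,
    "100-1000": 57,
    "1000-10,000": 99,
    "10,000-100,000": 66,
    "100,000-1,000,000": 16,
}


def bucket_shares(buckets: dict[str, int]) -> dict[str, float]:
    """Return each bucket's share of the total, as a percentage rounded to 0.1."""
    total = sum(buckets.values())
    return {label: round(100 * count / total, 1) for label, count in buckets.items()}


for name, buckets in (("maxDifference", channel_buckets), ("totalPixels", pixel_buckets)):
    print(name, "total:", sum(buckets.values()), bucket_shares(buckets))
```

Both tables add up to 268 tests. Roughly 37% of them fall in the 0-10 maxDifference range, while about 11% differ in at most 100 pixels. The tables do not say how the two statistics overlap per test, so these shares cannot be combined into a single estimate of how many tests a given rule would recover.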

Let me know if you have any questions. Looking forward to hearing from you!

Vivian

on behalf of Chrome-Blink-EngProd

Stephen Chenney

Aug 1, 2022, 7:25:46 AM

to Vivian Zhi (支文文）, blin...@chromium.org, Shirley Ji, Weizhong Xia, Chrome-Blink-EngProd

Thanks for investigating the potential for fuzzy matching.

Rendering Core continues to oppose a single fuzzy match rule across all web_tests. We have some tests where single pixel differences matter (related to pixel snapping, for example) and a universal fuzzy match would fail to identify problems with those. This came up in practice recently when the GPU team enabled fuzzy matching without telling us, and expected failing tests started passing when they shouldn't.

Maybe specific sub teams have directories they could apply default fuzzy matching to. My guess is that the same directories where it will work will be directories with few failing tests, limiting the impact of a per-directory approach.

Is there a way to reproduce the sampling below with a side-by-side comparison of the images? I would find it helpful to look through some of the cases that would pass with <meta name="fuzzy" content="0-1;0-1000">, for example.
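For readers unfamiliar with the meta tag, the sketch below shows one way the content string in the example above could be split into its two ranges. It is a simplified illustration, not the parser used by the test runner: `parse_fuzzy_content` and `parse_range` are hypothetical names, and the sketch handles only the `maxDifference;totalPixels` order with optional `maxDifference=` and `totalPixels=` prefixes, ignoring any other syntax the real format may allow.

```python
import re

# A range is either a single number ("5") or "low-high" ("0-1000").
_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_range(text: str, name: str) -> tuple[int, int]:
    """Parse one range, optionally prefixed with 'name=', into (low, high)."""
    text = text.strip()
    prefix = name + "="
    if text.startswith(prefix):
        text = text[len(prefix):].strip()
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"invalid {name} range: {text!r}")
    low = int(match.group(1))
    # A single number N is treated as the range N-N.
    high = int(match.group(2)) if match.group(2) is not None else low
    if low > high:
        raise ValueError(f"{name} range is reversed: {text!r}")
    return low, high


def parse_fuzzy_content(content: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split 'maxDifference;totalPixels' content into two inclusive ranges."""
    parts = content.split(";")
    if len(parts) != 2:
        raise ValueError(f"expected two ';'-separated ranges: {content!r}")
    return parse_range(parts[0], "maxDifference"), parse_range(parts[1], "totalPixels")
```

For the content string `"0-1;0-1000"`, this returns `((0, 1), (0, 1000))`: a max channel difference of at most 1 across at most 1000 pixels.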

Stephen.

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPCqkTs-L5u22-Xp5U_LeBdLP%3D%2BTDH1KGv8MTmtKQFRcANCZJg%40mail.gmail.com.

Xianzhu Wang

Aug 1, 2022, 11:40:09 AM

to Stephen Chenney, Vivian Zhi (支文文）, blink-dev, Shirley Ji, Weizhong Xia, Chrome-Blink-EngProd

On Mon, Aug 1, 2022 at 4:25 AM Stephen Chenney <sche...@chromium.org> wrote:

Thanks for investigating the potential for fuzzy matching.

Rendering Core continues to oppose a single fuzzy match rule across all web_tests. We have some tests where single pixel differences matter (related to pixel snapping, for example) and a universal fuzzy match would fail to identify problems with those. This came up in practice recently when the GPU team enabled fuzzy matching without telling us, and expected failing tests started passing when they shouldn't.

I think a key difference between the original fuzzy matching rule and the rule proposed by Vivian is the ranges. With maxDifference=0-1, we should be able to catch most visible single-pixel differences. What I'm not sure about is whether a difference like rgb(1, 0, 0) vs rgb(0, 0, 0) (each component in the range 0-255) should be treated as a failure in some cases.
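To make the two statistics concrete, here is a small Python sketch that computes them for two equally sized pixel lists. It is an assumption-laden illustration, not the actual image diff tool: `image_diff_stats` is a hypothetical name, and pixels are modeled as plain tuples of 0-255 channel values.

```python
def image_diff_stats(actual, expected):
    """Return (max_difference, total_pixels) for two equally sized pixel lists.

    max_difference is the largest absolute difference in any single channel;
    total_pixels is the number of pixels that differ in at least one channel.
    """
    if len(actual) != len(expected):
        raise ValueError("images must contain the same number of pixels")
    max_difference = 0
    total_pixels = 0
    for actual_pixel, expected_pixel in zip(actual, expected):
        channel_diff = max(abs(a - e) for a, e in zip(actual_pixel, expected_pixel))
        if channel_diff > 0:
            total_pixels += 1
            max_difference = max(max_difference, channel_diff)
    return max_difference, total_pixels


# The rgb(1, 0, 0) vs rgb(0, 0, 0) case from the text, plus one identical pixel.
print(image_diff_stats([(1, 0, 0), (5, 5, 5)], [(0, 0, 0), (5, 5, 5)]))  # (1, 1)
```

Under this model, the rgb(1, 0, 0) vs rgb(0, 0, 0) case is a diff with maxDifference=1 in one pixel, which is exactly what a 0-1 range would tolerate.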

Maybe specific sub teams have directories they could apply default fuzzy matching to. My guess is that the same directories where it will work will be directories with few failing tests, limiting the impact of a per-directory approach.

Is there a way to reproduce the sampling below with a side-by-side comparison of the images? I would find it helpful to look through some of the cases that would pass with <meta name="fuzzy" content="0-1;0-1000">, for example.

A filter by actual maxDifference and totalPixels in results.html will be helpful. I can add it when I get time.

Vivian Zhi (支文文）

Aug 1, 2022, 6:16:37 PM

to Xianzhu Wang, Stephen Chenney, blink-dev, Shirley Ji, Weizhong Xia, Chrome-Blink-EngProd

Thanks for the valuable feedback, Stephen and Xianzhu! We will see if we can add a filter in results.html to grab the tests in range.

Xianzhu Wang

Aug 2, 2022, 12:04:00 PM

to Vivian Zhi (支文文）, Stephen Chenney, blink-dev, Shirley Ji, Weizhong Xia, Thorben Troebst

On Mon, Aug 1, 2022 at 10:36 AM Vivian Zhi (支文文） <viv...@google.com> wrote:

Thanks for the valuable feedback, Stephen and Xianzhu! We will see if we can add a filter in results.html to grab the tests in range.

The CL adding a pixel diff filter to results.html has landed. Thanks Thorben!

In this example results.html, you can examine the pixel results of tests whose pixel differences match a particular fuzzy rule with the following steps:

1. Enter pixel difference filter e.g. "channel_max:1-1" in the filter input box;

2. Click "All" button (as we show regressions only by default).

You might want to switch to "side-by-side view" and click the image to examine the pixel values.
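The filter in step 1 can be pictured with the following Python sketch. It is not the results.html implementation (which runs in the browser); the function names and the sample test names are hypothetical, and the only thing taken from the text is the `channel_max:low-high` filter form.

```python
def parse_channel_max_filter(text: str) -> tuple[int, int]:
    """Parse a filter such as 'channel_max:1-1' into an inclusive (low, high) range."""
    key, sep, value = text.partition(":")
    if key != "channel_max" or not sep:
        raise ValueError(f"unsupported filter: {text!r}")
    low, sep, high = value.partition("-")
    if not sep:
        raise ValueError(f"expected a low-high range: {value!r}")
    return int(low), int(high)


def select_tests(channel_max_by_test: dict[str, int], filter_text: str) -> list[str]:
    """Return the tests whose max channel difference falls inside the filter range."""
    low, high = parse_channel_max_filter(filter_text)
    return [name for name, channel_max in channel_max_by_test.items()
            if low <= channel_max <= high]


# Hypothetical sample data: test name -> observed max channel difference.
sample = {"a.html": 1, "b.html": 3, "c.html": 0}
print(select_tests(sample, "channel_max:1-1"))  # ['a.html']
```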

With "channel_max:1-1" we can see all tests whose pixel differences could be tolerated by a fuzzy rule like <meta name=fuzzy content="0-1;0-1000000">. There are 70 such tests in the example results.html. All of them look benign to me. So perhaps a universal rule (for non-WPT tests) is appropriate?

On the other hand, even if we have such a universal rule, we can only recover 70 tests. Instead of applying the rule automatically, we can also manually modify these tests to include a meta fuzzy rule.

Philip Rogers

Aug 2, 2022, 12:10:54 PM

to blink-dev, Xianzhu Wang, Stephen Chenney, blink-dev, Shirley Ji, weiz...@google.com, Thorben Troebst, Vivian Zhi (支文文）

How much of a problem is flakiness caused by minor pixel differences compared to overall flakiness? I looked at the top 10 flaky tests here and none of them were minor pixel differences.

70 tests is a manageable number and it seems reasonable to add fuzzy matching to them.
