RFC: merge layout test specific test results types

Erik Staab

unread,

Sep 26, 2016, 4:59:51 PM9/26/16

to blink-dev, Ojan Vafai, Dirk Pranke

Currently test-results.appspot.com knows about different layout test specific failure types, i.e. TEXT, IMAGE, IMAGE+TEXT, and AUDIO. I would like to merge all of these statuses to a single FAIL status. Other statuses would remain unchanged.

This will let us remove layout test special-casing in test-results and the test results JSON format, will make test results easier to comprehend for newcomers, and will make layout tests more maintainable as they'll look more like other test suites.

Please let me know if there are objections.

Thanks,

Erik

Stephen Chenney

unread,

Sep 26, 2016, 5:08:10 PM9/26/16

to Erik Staab, blink-dev, Ojan Vafai, Dirk Pranke

Are you talking about just changing the flakiness board? That is, will text/image/crash etc still be present in the layout test results page that is constructed after a test run?

I personally like being able to see quickly that a failure is image or text vs timeout or crash because it tells me how likely it is that the failure can be resolved with a new baseline (which is fast) vs. a code change (which is not so much). But I can live with the proposed change if it just affects the flakiness board.

Stephen.

Elliott Sprehn

unread,

Sep 26, 2016, 5:08:25 PM9/26/16

to Erik Staab, Dirk Pranke, blink-dev, Ojan Vafai

Do you mean merge in the UI or in the TestExpectations file?

Erik Staab

unread,

Sep 26, 2016, 6:14:45 PM9/26/16

to Elliott Sprehn, Dirk Pranke, blink-dev, Ojan Vafai

On Mon, Sep 26, 2016 at 2:08 PM, Elliott Sprehn <esp...@google.com> wrote:

Do you mean merge in the UI or in the TestExpectations file?

I believe this change was already applied to TestExpectations. Lines have "Failure" here:

https://chromium.googlesource.com/chromium/src/+/master/third_party/WebKit/LayoutTests/TestExpectations

and get mapped here:

https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py?rcl=0&l=309

This proposal is for the UI.

Erik Staab

unread,

Sep 26, 2016, 6:16:52 PM9/26/16

to Stephen Chenney, blink-dev, Ojan Vafai, Dirk Pranke

On Mon, Sep 26, 2016 at 2:08 PM, Stephen Chenney <sche...@chromium.org> wrote:

Are you talking about just changing the flakiness board? That is, will text/image/crash etc still be present in the layout test results page that is constructed after a test run?

I personally like being able to see quickly that a failure is image or text vs timeout or crash because it tells me how likely it is that the failure can be resolved with a new baseline (which is fast) vs. a code change (which is not so much). But I can live with the proposed change if it just affects the flakiness board.

I only plan to change the generated JSON file and I'm not sure if that's used as input to the layout test results page - Ojan or Dirk, do you know?

ser...@google.com

unread,

Sep 26, 2016, 8:31:20 PM9/26/16

to blink-dev, oj...@chromium.org, dpr...@chromium.org

When creating Dremel queries for Flakiness Pipeline, I've actually been hit by this, because it's non-trivial to write a query checking if the test has failed because both actual results and expected results are nested repeated proto fields. See https://yaqs.googleplex.com/eng/q/5010402896773120 for more details. It would be nice if I could just check if "PASS" is not present in the actual array.

Even though I wrote the query and now it does not matter as much to me anymore, I can image future projects processing this metadata may experience similar difficulties, therefore +1 for making it simpler.

Ojan Vafai

unread,

Sep 27, 2016, 5:39:40 AM9/27/16

to Erik Staab, Stephen Chenney, blink...@chromium.org, blink-dev, Dirk Pranke

+blink...@chromium.org

On Tue, Sep 27, 2016 at 12:16 AM Erik Staab <est...@chromium.org> wrote:

On Mon, Sep 26, 2016 at 2:08 PM, Stephen Chenney <sche...@chromium.org> wrote:
Are you talking about just changing the flakiness board? That is, will text/image/crash etc still be present in the layout test results page that is constructed after a test run?

I personally like being able to see quickly that a failure is image or text vs timeout or crash because it tells me how likely it is that the failure can be resolved with a new baseline (which is fast) vs. a code change (which is not so much). But I can live with the proposed change if it just affects the flakiness board.

I only plan to change the generated JSON file and I'm not sure if that's used as input to the layout test results page - Ojan or Dirk, do you know?

The layout test results page reads failing_results.json. Incidentally, it looks like we still upload failing_results.json to test-results, but I'm pretty sure nothing uses it from there anymore.

I propose we:

keep these specific value types for failing_results.json. That will require modifying this code.
normalize them for full_results.json as proposed in this email. The modification would go in roughly the same place as #1.
stop uploading failing_results.json to test-results.appspot and have it be a purely run-webkit-tests feature. code here

In general, I support this change. We need to make the dashboard and all the tooling built around it simpler.

Dirk Pranke

unread,

Sep 27, 2016, 2:13:35 PM9/27/16

to Erik Staab, Stephen Chenney, blink...@chromium.org, Ojan Vafai, blink-dev

I'm not sure if I understand the details of what Erik is proposing, and I think the details matter (to avoid confusion). So, here's my understanding:

1. Currently, the TestExpectations files support Pass/Failure/Timeout/Crash for expected results.

2. The actual results also include the more detailed IMAGE/IMAGE_PLUS_TEXT/TEXT_AUDIO keywords.

3. The UI can show the more detailed results in the per-builder table of history, and can also show the -expected, -actual, and -diff results for the different kinds of failures.

I believe Erik is proposing that we get rid of the more detailed keywords and the more detailed results in the per-builder table. I'm not sure if he's proposing we also get rid of the different kinds of results, but I'd guess not?

We would still keep the Pass/Failure/Timeout/Crash differences in the per-builder history, since those results are generic to both layout tests and other kinds of tests.

The underlying detailed results would still be available in the build logs and the layout_test_results.zip.

Erik, do I have that right? If so, that seems fine to me. Seeing in the history that a test changes from image to image+text is somewhat useful, but not incredibly so.

-- Dirk

Ojan Vafai

unread,

Sep 27, 2016, 2:35:31 PM9/27/16

to Dirk Pranke, Erik Staab, Stephen Chenney, blink...@chromium.org, blink-dev

That matches my understanding. Imo, understanding the transition from image to image+text is useful, but not enough to justify the implementation complexity or the complexity of having the whole team understand that layout tests are a bit different from other test suites.

Dirk Pranke

unread,

Sep 27, 2016, 2:40:44 PM9/27/16

to Ojan Vafai, Erik Staab, Stephen Chenney, blink...@chromium.org, blink-dev

Right.

Erik Staab

unread,

Oct 11, 2016, 11:06:54 AM10/11/16

to Dirk Pranke, Ojan Vafai, Stephen Chenney, blink...@chromium.org, blink-dev

Those summaries sound correct to me, thanks.

Before I change the data at the source I made a change to the test-results frontend to rewrite the layout-specific types as FAIL. You can see a demo here: before, after. If no one screams when it goes live I'll continue with the changes to the source.

You can follow progress of this work here: http://crbug.com/654500

Reply all

Reply to author

Forward