[ Skip ] in TestExpectations

Xianzhu Wang

Sep 14, 2017, 12:11:28 PM
to blink-dev
Hi,

Currently there are 431 lines containing "[ Skip ]" in TestExpectations. It seems that quite a few tests are marked Skip just to avoid flakiness. IMHO, unless we really don't want to run a test at all, we should mark it flaky or failing instead of skipping it, so that it still shows up on the flakiness dashboard. I'd even go as far as banning Skip in TestExpectations and moving the tests we really don't want to run (e.g. tests harmful to run-webkit-tests, tests that take too much time) into a separate file (SkippedTests).
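For illustration only (the bug number and test path below are made up, and this assumes the usual TestExpectations line format of bug, optional modifiers, test path, expected results), the proposal is essentially to prefer

    crbug.com/123456 [ Win ] fast/css/some-flaky-test.html [ Pass Failure ]

which keeps the test running and visible on the flakiness dashboard, over

    crbug.com/123456 [ Win ] fast/css/some-flaky-test.html [ Skip ]

which drops it from the run entirely.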

Wdyt?

Xianzhu

Dirk Pranke

Sep 14, 2017, 2:41:04 PM
to Xianzhu Wang, blink-dev
For many years, I thought it was better to use [ Pass Failure ] for flaky tests since, as you say, that keeps the tests running so we can see whether they get better or worse (e.g., start failing consistently, or start crashing).

However, I've come to the conclusion many others have reached: flaky tests are more trouble than they're worth. Suppressing the failures moves them out of sight, but they still impose a resource cost.

We have a lot of complexity in the webkitpy tooling to handle all the different kinds of test expectations. I think it's more complexity than we need, and reducing it would make it easier to build tools we can share with other parts of Chrome for reporting on disabled tests and for disabling (or re-enabling) them.

So, I actually think we should *not* support [ Pass Failure ]. Even [ Skip ] or [ Failure ] entries should be strictly time-bounded: we shouldn't tolerate new failures being added to the tree, so people need to either fix their failures or stop running the tests, since leaving them suppressed indefinitely amounts to deciding that fixing them isn't worth it.

I'm tempted to also remove the [ Release Debug ] distinction, but I'd need to look at the impact of that more first.

I do think there's a reason to keep [ WontFix ] around, though, in NeverFixTests, since it's the only good mechanism we have for indicating that a test which passes on most platforms will never pass on one of them and shouldn't be run there.
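As a rough sketch (the platform and path here are hypothetical), a NeverFixTests entry for that situation looks something like

    [ Android ] fast/forms/some-hover-dependent-test.html [ WontFix ]

i.e. the test passes on the other platforms but is never expected to pass on that one, so it simply isn't run there.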

-- Dirk



Xianzhu Wang

Sep 14, 2017, 3:12:24 PM
to Dirk Pranke, blink-dev
On Thu, Sep 14, 2017 at 11:40 AM, Dirk Pranke <dpr...@chromium.org> wrote:
So, I actually think we should *not* support [ Pass Failure ]. Even [ Skip ] or [ Failure ] entries should be strictly time-bounded: we shouldn't tolerate new failures being added to the tree, so people need to either fix their failures or stop running the tests, since leaving them suppressed indefinitely amounts to deciding that fixing them isn't worth it.

If we decide to prefer [ Skip ] to [ Pass Failure ], I think it would be better to set up bots that run the skipped tests periodically, so the flakiness dashboard can still show them.

Regarding how long to allow flaky and failure entries, I think we should also rely on the bug-tracking system: a flaky or failing entry is allowed as long as its associated bug is open, and an entry without an associated bug, or with a closed bug, is not allowed.
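Concretely (the bug number here is hypothetical), every suppression would carry an open bug, e.g.

    crbug.com/345678 fast/events/some-flaky-drag-test.html [ Pass Failure ]

and a periodic audit could flag entries whose bug has been closed, or which have no bug at all, as candidates for removal.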


I'm tempted to also remove the [ Release Debug ] distinction, but I'd need to look at the impact of that more first.

I do think there's a reason to keep [ WontFix ] around, though, in NeverFixTests, since it's the only good mechanism we have for indicating that a test which passes on most platforms will never pass on one of them and shouldn't be run there.


Agreed.

Aleks Totic

Sep 14, 2017, 5:27:59 PM
to Xianzhu Wang, Dirk Pranke, blink-dev
My LayoutTests pet peeve is [ Timeout ]
 
On the bots, there are ~200 timeouts per full test run. Each timeout is 6s, so that's 1200s, or 20 minutes, of serial run time. Running the tests in parallel helps, of course.

On my local machine, ~600 tests time out, so timeouts alone account for 60 minutes of serial run time, or 90s when fully parallel. That is >15% of all the time spent running tests.

Aleks


Stephen Chenney

Sep 18, 2017, 11:10:36 AM
to Aleks Totic, Xianzhu Wang, Dirk Pranke, blink-dev
On Thu, Sep 14, 2017 at 5:27 PM, 'Aleks Totic' via blink-dev <blin...@chromium.org> wrote:
My LayoutTests pet peeve is [ Timeout ]
 
On the bots, there are ~200 timeouts per full test run. Each timeout is 6s, so that's 1200s, or 20 minutes, of serial run time. Running the tests in parallel helps, of course.

On my local machine, ~600 tests time out, so timeouts alone account for 60 minutes of serial run time, or 90s when fully parallel. That is >15% of all the time spent running tests.

Aleks

Tests that are expected to Timeout should be treated as Skipped, as far as I am concerned. If they always time out, they are not giving any benefit at all.
 
On Thu, Sep 14, 2017 at 11:40 AM, Dirk Pranke <dpr...@chromium.org> wrote:

I'm tempted to also remove the [ Release Debug ] distinction, but I'd need to look at the impact of that more first.

There are a number of tests that produce different painting results in Debug vs. Release, particularly on Windows; it is related to inlining and register usage in the compiler. We need coverage for the tested feature, so we can't disable them entirely. Hence they appear as WontFix with a [ Debug ] modifier.
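As an illustrative sketch (the path is hypothetical), such an entry reads roughly as

    [ Win Debug ] paint/invalidation/some-subpixel-test.html [ WontFix ]

so the Release bots still provide coverage for the feature while the Debug-only pixel difference is acknowledged as something that will never match.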

Cheers,
Stephen.


Dirk Pranke

Sep 22, 2017, 6:11:27 PM
to Stephen Chenney, Aleks Totic, Xianzhu Wang, blink-dev
On Mon, Sep 18, 2017 at 8:10 AM, Stephen Chenney <sche...@chromium.org> wrote:

There are a number of tests that produce different painting results in Debug vs. Release, particularly on Windows; it is related to inlining and register usage in the compiler. We need coverage for the tested feature, so we can't disable them entirely. Hence they appear as WontFix with a [ Debug ] modifier.

Good point.

-- Dirk