"experimental" RuntimeEnabledFeatures in tests - change policy?


Chris Harrelson

Sep 13, 2018, 11:08:35 AM
to blink-dev
Hi all,

When a RuntimeEnabledFeature is marked as "test" or "experimental", it is automatically turned on for unittests and layout tests.

There have been several instances lately where this situation caused significant extra work for developers working on unrelated features, and reduced quality of testing. WebKit is also discussing the same issue.

Here are some options for how to adjust the current situation:

1. Never test what we do not ship.

2. Put strict limits on the time a feature is allowed to be tested-but-not-shipped. Proposal: should ship in the same Chrome major version in which tests were flipped to test the feature. Any earlier testing should be done with techniques such as virtual layout test suites or RuntimeEnabledFlag override/TEST_P patterns in unittests.

3. Separate "test" from "experimental". "experimental" would merely mean --enable-experimental-web-platform-features turns on the feature. "test" means on for testing. They can be turned on separately.
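For reference, a feature's status lives in its entry in runtime_enabled_features.json5. The entries below are invented for illustration (there is no "CoolFeature"); they only sketch the shape of the file:

```json5
// runtime_enabled_features.json5 (illustrative entries; feature names invented)
{
  data: [
    {
      name: "CoolFeature",
      // "experimental": enabled by --enable-experimental-web-platform-features,
      // and (today) automatically enabled for unittests and layout tests.
      status: "experimental",
    },
    {
      name: "CoolerFeature",
      // "test": enabled only when running tests.
      status: "test",
    },
  ],
}
```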

I am leaning towards adopting #2 and #3. Opinions?

Examples:

* In several instances on just my team, a developer got confused about the actual launch state of a feature and incorrectly treated an "experimental" feature as blocking the launch of a second feature (instead, it should have been a best-effort fix, discussed with the feature owner). In some cases, the "experimental" feature had actually been abandoned or put on indefinite hold awaiting staffing.

* We are not testing the code configuration we ship, which can cause us to miss bugs. I am not sure I have seen a recent instance of this, though.

* Running content_shell with --run-web-tests mysteriously changes behavior because certain features get turned on, leading to long debugging delays. This has bitten my team at least twice in the last month or so.

* There are a number of abandoned features in the "experimental" or "test" state, and turning them off takes work because it causes some tests to fail. (This happened to me a few months ago.)

Chris

Philip Rogers

Sep 13, 2018, 1:04:53 PM
to Chris Harrelson, blink-dev
I recently spent several days fixing backdrop-filter support for BlinkGenPropertyTrees, not realizing that backdrop-filter was only partially implemented and was not actively being worked on.

I support option #2 but would like to expand on what it would look like for a new feature with WPT tests. I think the pattern would be to add WPT tests as usual, but mark them as failing by default. Developers of a cool new feature would then add third_party/WebKit/LayoutTests/virtual/enable-cool-feature which would run specific subdirectories of WPT tests with the in-development feature.
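Concretely, such a virtual suite is declared in the LayoutTests/VirtualTestSuites file. A sketch of what the entry might look like, with the feature name, flag, and base directory invented for illustration:

```json
[
  {
    "prefix": "enable-cool-feature",
    "base": "external/wpt/cool-feature",
    "args": ["--enable-blink-features=CoolFeature"]
  }
]
```

Tests under the base directory would then also run as virtual/enable-cool-feature/... with the extra flag applied, while the plain copies keep running in the shipping configuration.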

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAOMQ%2Bw8n%2BHzV0qmjihQ5%2BMMCncpCkQ9cERFDdt6S1HStt22PUA%40mail.gmail.com.

Chris Harrelson

Sep 13, 2018, 1:06:35 PM
to Philip Rogers, blink-dev
On Thu, Sep 13, 2018 at 10:04 AM Philip Rogers <p...@chromium.org> wrote:
> I recently spent several days fixing backdrop-filter support for BlinkGenPropertyTrees, not realizing that backdrop-filter was only partially implemented and was not actively being worked on.
>
> I support option #2 but would like to expand on what it would look like for a new feature with WPT tests. I think the pattern would be to add WPT tests as usual, but mark them as failing by default. Developers of a cool new feature would then add third_party/WebKit/LayoutTests/virtual/enable-cool-feature which would run specific subdirectories of WPT tests with the in-development feature.

I agree that this would be the recommended feature development pattern.
 



Emil A Eklund

Sep 13, 2018, 1:16:27 PM
to Chris Harrelson, Philip Rogers, blink-dev
I've also been bitten by this in the past. Options 2 and 3 both sound good to me.

Stefan Zager

Sep 13, 2018, 1:19:48 PM
to Chris Harrelson, blink-dev
On Thu, Sep 13, 2018 at 8:08 AM Chris Harrelson <chri...@chromium.org> wrote:
> Hi all,
>
> When a RuntimeEnabledFeature is marked as "test" or "experimental", it is automatically turned on for unittests and layout tests.
>
> There have been several instances lately where this situation caused significant extra work for developers working on unrelated features, and reduced quality of testing. WebKit is also discussing the same issue.
>
> Here are some options for how to adjust the current situation:
>
> 1. Never test what we do not ship.

Amen, I am also bothered by this, and have wasted time on it.

> 2. Put strict limits on the time a feature is allowed to be tested-but-not-shipped. Proposal: should ship in the same Chrome major version in which tests were flipped to test the feature. Any earlier testing should be done with techniques such as virtual layout test suites or RuntimeEnabledFlag override/TEST_P patterns in unittests.

I would like to also investigate whether there's something lighter-weight than a virtual test suite to accomplish this. Also, if we created a virtual test suite for *every* feature marked "test" or "experimental", it would greatly increase the number of tests we run. At the least, maybe we should have a virtual test suite that turns on all of the "experimental" features at once?

> 3. Separate "test" from "experimental". "experimental" would merely mean --enable-experimental-web-platform-features turns on the feature. "test" means on for testing. They can be turned on separately.

I feel like one of these should imply the other, but I'm not sure which way it should go.
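If we went that route, the single all-experimental suite could be one more VirtualTestSuites entry, sketched roughly as follows (base path illustrative):

```json
[
  {
    "prefix": "experimental-features",
    "base": "external/wpt",
    "args": ["--enable-experimental-web-platform-features"]
  }
]
```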

Yoichi Osato

Sep 13, 2018, 10:45:43 PM
to Stefan Zager, Chris Harrelson, blink-dev
+1 to splitting stable tests from feature-enabled tests.
It is not worth paying the cost of keeping mostly abandoned features enabled on the mainstream bots.


Hayato Ito

Sep 13, 2018, 11:28:03 PM
to Chris Harrelson, blink-dev
I like option 3.

We might want to encourage { experimental: true, test: false } for new experimental features.



--
Hayato

Yutaka Hirano

Sep 13, 2018, 11:39:37 PM
to Hayato Ito, Chris Harrelson, blink-dev
Regarding virtual test suites: it is often tiresome to copy existing TestExpectations entries. It would be very convenient if a virtual test suite inherited the base tests' expectations by default.


Morten Stenshorne

Sep 14, 2018, 2:18:17 AM
to Yutaka Hirano, Hayato Ito, Chris Harrelson, blink-dev
Yutaka Hirano <yhi...@chromium.org> writes:

> Regarding using virtual test suites, It is often tiresome to copy
> existing TestExpectation entries. It would be very convenient if a
> virtual test suite inherits the base tests' expectations by default.

That's one good point. Another point about virtual tests is that they
refuse to run in parallel. Seems to me that I never get more jobs than
the number of directories involved, or something like that.

Try running virtual/layout_ng/fast/block/float; that's 215 tests, but
it still insists on running just one content_shell instance. It takes forever.

So I would really not like us to use virtual tests more extensively than
today until those issues have been fixed.

--
Morten Stenshorne, Software developer,
Blink/Layout, Google, Oslo, Norway

Philip Jägenstedt

Sep 14, 2018, 4:15:26 AM
to mste...@chromium.org, Yutaka Hirano, Hayato Ito, Chris Harrelson, blink-dev
Addressing this by option 2 or 3 or similar sounds good to me.

There is a somewhat related issue: we've enabled experimental features for Chrome in https://wpt.fyi/results/?label=experimental to stay close to the LayoutTests configuration. This is because we'd like it to be possible to verify that tests pass in vanilla Chrome before shipping. So whatever the change is here, we'll need to adapt a bit.

Chris, are you willing to drive this? Which is your own preferred solution?


Jochen Eisinger

Sep 14, 2018, 4:18:46 AM
to Philip Jägenstedt, mste...@chromium.org, Yutaka Hirano, Hayato Ito, Chris Harrelson, blink-dev
I like option 2).

I think we shouldn't make something "experimental" that isn't tested.

Hayato Ito

Sep 14, 2018, 4:34:59 AM
to Jochen Eisinger, Philip Jägenstedt, mste...@chromium.org, yhi...@chromium.org, Chris Harrelson, blink-dev
On Fri, Sep 14, 2018 at 5:18 PM Jochen Eisinger <joc...@chromium.org> wrote:
> I like option 2).
>
> I think we shouldn't make something "experimental" that isn't tested.

I don't think option 3 discourages testing. Rather, experimental features should still be tested, just in other ways: by enabling them explicitly in individual tests, or via virtual test suites.
That way, the configuration of the *main* tests would match what we are shipping.


--
Hayato

Jochen Eisinger

Sep 14, 2018, 4:47:53 AM
to Hayato Ito, Philip Jägenstedt, mste...@chromium.org, yhi...@chromium.org, Chris Harrelson, blink-dev
On Fri, Sep 14, 2018 at 10:34 AM Hayato Ito <hay...@chromium.org> wrote:
> On Fri, Sep 14, 2018 at 5:18 PM Jochen Eisinger <joc...@chromium.org> wrote:
>> I like option 2).
>>
>> I think we shouldn't make something "experimental" that isn't tested.
>
> I don't think option 3 discourages testing. Rather, experimental features should still be tested, just in other ways: by enabling them explicitly in individual tests, or via virtual test suites.
> That way, the configuration of the *main* tests would match what we are shipping.

I think in practice, this will decrease the test coverage for experimental features, as we won't be able to duplicate all tests for all features, and we'll increase the risk of random things breaking once we try to ship the feature.

For non-experimental features, we also get coverage from canary / dev, so I'm less worried about them, assuming that implementing 2) also ensures that features aren't experimental for ages.

Philip Jägenstedt

Sep 14, 2018, 4:51:51 AM
to Jochen Eisinger, Luna Lu, Hayato Ito, mste...@chromium.org, Yutaka Hirano, Chris Harrelson, blink-dev
On Fri, Sep 14, 2018 at 10:47 AM Jochen Eisinger <joc...@chromium.org> wrote:
> On Fri, Sep 14, 2018 at 10:34 AM Hayato Ito <hay...@chromium.org> wrote:
>> On Fri, Sep 14, 2018 at 5:18 PM Jochen Eisinger <joc...@chromium.org> wrote:
>>> I like option 2).
>>>
>>> I think we shouldn't make something "experimental" that isn't tested.
>>
>> I don't think option 3 discourages testing. Rather, experimental features should still be tested, just in other ways: by enabling them explicitly in individual tests, or via virtual test suites.
>> That way, the configuration of the *main* tests would match what we are shipping.
>
> I think in practice, this will decrease the test coverage for experimental features, as we won't be able to duplicate all tests for all features, and we'll increase the risk of random things breaking once we try to ship the feature.
>
> For non-experimental features, we also get coverage from canary / dev, so I'm less worried about them, assuming that implementing 2) also ensures that features aren't experimental for ages.

Right, having features stay experimental after work has stalled isn't very helpful, regardless of the testing strategy.

This reminds me a bit of the problem of API removals slipping past the milestone when they were supposed to be removed. +Luna Lu, perhaps a similar solution could be applied to both?

Harald Alvestrand

Sep 14, 2018, 4:53:36 AM
to Jochen Eisinger, Hayato Ito, Philip Jägenstedt, mste...@chromium.org, Yutaka Hirano, Chris Harrelson, blink-dev
Perhaps we should have some focus on getting rid of experiments that have ended?

I see "experimental" as having 3 states:

- "We're trying this out to see if it's possible to implement" - ideally we'd treat this like a branch, but we don't do branches, so we can't; better ideas needed. "Testing" does some of this, but impacts others' tests.

- "We want to see if Chrome will continue to work if we land this" - what we get with today's "experimental"

- "I've given up on landing this for the foreseeable future, but haven't removed it" - remove it from tests.

The last could be handled with a classification of "off" for a feature - in which case it should be linked to a bug to remove the code ASAP.



Hayato Ito

Sep 14, 2018, 4:56:42 AM
to Jochen Eisinger, Philip Jägenstedt, mste...@chromium.org, yhi...@chromium.org, Chris Harrelson, blink-dev
On Fri, Sep 14, 2018 at 5:47 PM Jochen Eisinger <joc...@chromium.org> wrote:
> On Fri, Sep 14, 2018 at 10:34 AM Hayato Ito <hay...@chromium.org> wrote:
>> On Fri, Sep 14, 2018 at 5:18 PM Jochen Eisinger <joc...@chromium.org> wrote:
>>> I like option 2).
>>>
>>> I think we shouldn't make something "experimental" that isn't tested.
>>
>> I don't think option 3 discourages testing. Rather, experimental features should still be tested, just in other ways: by enabling them explicitly in individual tests, or via virtual test suites.
>> That way, the configuration of the *main* tests would match what we are shipping.
>
> I think in practice, this will decrease the test coverage for experimental features, as we won't be able to duplicate all tests for all features, and we'll increase the risk of random things breaking once we try to ship the feature.

Yeah, I agree. However, we can always use { experimental: true, test: true } for such a feature.
I guess there are experimental features for which we want { experimental: true, test: false }, so that we don't have to worry about their side effects on the main tests.

Could this freedom make the situation complex?


--
Hayato

Daniel Ehrenberg

Sep 14, 2018, 6:35:02 AM
to sza...@chromium.org, chri...@chromium.org, blink-dev
On Thu, Sep 13, 2018 at 7:19 PM Stefan Zager <sza...@chromium.org> wrote:
> On Thu, Sep 13, 2018 at 8:08 AM Chris Harrelson <chri...@chromium.org> wrote:
>> Hi all,
>>
>> When a RuntimeEnabledFeature is marked as "test" or "experimental", it is automatically turned on for unittests and layout tests.
>>
>> There have been several instances lately where this situation caused significant extra work for developers working on unrelated features, and reduced quality of testing. WebKit is also discussing the same issue.
>>
>> Here are some options for how to adjust the current situation:
>>
>> 1. Never test what we do not ship.
>
> Amen, I am also bothered by this, and have wasted time on it.
>
>> 2. Put strict limits on the time a feature is allowed to be tested-but-not-shipped. Proposal: should ship in the same Chrome major version in which tests were flipped to test the feature. Any earlier testing should be done with techniques such as virtual layout test suites or RuntimeEnabledFlag override/TEST_P patterns in unittests.
>
> I would like to also investigate whether there's something lighter-weight than a virtual test suite to accomplish this. Also, if we created a virtual test suite for *every* feature marked "test" or "experimental", it would greatly increase the number of tests we run. At the least, maybe we should have a virtual test suite that turns on all of the "experimental" features at once?
For test262 tests, V8 used to run in "experimental JS features" mode for all tests, in order to let more tests pass. Now the test runner supports turning flags on and off per test, based on which in-development features each test uses. This approach doesn't increase the number of tests that are run (the flag is only set on the affected tests), while allowing test coverage even earlier in the development process, before the feature is ready to be added to the "experimental JS features" bucket. Where available, V8's test262 runner uses the feature metadata contained in the test262 tests' frontmatter to determine the flag settings, avoiding the need to enumerate every affected test. Would such an approach work for Blink as well, perhaps via lightweight virtual test suites?
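For those unfamiliar with it, the frontmatter Dan mentions is a YAML metadata block at the top of each test262 file; its features key lets a runner decide which flags a given test needs. The feature name below is just an example:

```javascript
// Top of a test262 test file; runners parse the YAML between /*--- and ---*/
// and can map each entry in `features` to the flag that enables it.
/*---
description: Exponentiation with BigInt operands
features: [BigInt]
---*/
```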

Dan 
 

Ned Nguyen

Oct 11, 2018, 3:28:26 PM
to blink-dev
Hi all, 
I (from the Chrome core automation team) would like to drive this (Q1 2019, realistically). I filed crbug.com/894569 to track the effort, so please bring new discussion to that bug instead.

Chris Harrelson

Oct 12, 2018, 6:30:13 PM
to Ned Nguyen, blink-dev
Thanks for driving, Ned!

All: in the meantime, I have a request for volunteers. Task: look at the list of flags in runtime_enabled_features.json5, find a feature you know about that is in status "experimental" or "test" but is not actually very close to shipping, or is abandoned, and remove those fields.

Bonus points for deleting code for truly abandoned features.
