Can we remove status:"test" for RuntimeEnabledFeatures?


Xianzhu Wang

Jul 10, 2020, 3:15:48 PM
to blink-dev
Hi,

The RuntimeEnabledFeatures doc doesn't clearly state the reasons for status:"test", but it lists a lot of problems with it. I only see one reason for it: the convenience of demoting a status:"experimental" feature when we find it is too buggy, without having to rebaseline a lot of tests. However, this moves our tests further away from what we ship.

Besides the drawbacks listed in the doc, status:"test" also makes content_shell behave differently in browser mode and web tests mode, with no easy way to remedy that. For status:"experimental" we have --enable-experimental-web-platform-features, so the problem is less severe there.

Can we remove it? I think virtual tests and flag-specific tests provide good alternatives to it.
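For context, entries in runtime_enabled_features.json5 look roughly like this (the feature names below are made up for illustration; only the status values are the real ones under discussion):

```json5
{
  data: [
    // Shipped to users; enabled everywhere.
    {name: "SomeShippedFeature", status: "stable"},
    // Behind --enable-experimental-web-platform-features.
    {name: "SomeExperimentalFeature", status: "experimental"},
    // Enabled only when running web tests; the status discussed here.
    {name: "SomeTestOnlyFeature", status: "test"},
  ],
}
```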

Jeremy Roman

Jul 10, 2020, 3:25:59 PM
to Xianzhu Wang, blink-dev
There are at least some features intended for use only in test, like MojoJSTest, and some features which aren't yet ready for experimental status (which is sometimes turned on in the wild) but are stable enough for usage in web tests. It seems like it would be tedious to use virtual test suites for all of the latter.

Would a switch to content_shell that turned on the test features without full --run-web-tests address your concern?


Xianzhu Wang

Jul 10, 2020, 5:12:45 PM
to Jeremy Roman, blink-dev
On Fri, Jul 10, 2020 at 12:25 PM Jeremy Roman <jbr...@chromium.org> wrote:
There are at least some features intended for use only in test, like MojoJSTest, and some features which aren't yet ready for experimental status (which is sometimes turned on in the wild) but are stable enough for usage in web tests. It seems like it would be tedious to use virtual test suites for all of the latter.

I don't think we should enable a feature that is not ready for experimental status for all tests on all bots, because that causes loss of test coverage of the normal code path, along with the other problems stated in the doc. A dedicated bot (e.g. this) is a working alternative, which avoids many of the problems of status:"test". Another, less formal method is to create a CL that enables the feature and try the CL on trybots periodically.

It might be fine to use status:"test" for features that are intended for use only in tests, but to avoid misuse for other features, I think we should still look for alternative ways. We already have such features that don't use status:"test", e.g. the testRunner and internals APIs.
 
Would a switch to content_shell that turned on the test features without full --run-web-tests address your concern?

This partly addresses the concern about running content_shell with test features, but doesn't address other problems.

Jeremy Roman

Jul 10, 2020, 7:39:15 PM
to Xianzhu Wang, blink-dev
On Fri, Jul 10, 2020 at 5:12 PM Xianzhu Wang <wangx...@chromium.org> wrote:
On Fri, Jul 10, 2020 at 12:25 PM Jeremy Roman <jbr...@chromium.org> wrote:
There are at least some features intended for use only in test, like MojoJSTest, and some features which aren't yet ready for experimental status (which is sometimes turned on in the wild) but are stable enough for usage in web tests. It seems like it would be tedious to use virtual test suites for all of the latter.

I don't think we should enable a feature that is not ready for experimental status for all tests on all bots, because that causes loss of test coverage of the normal code path, along with the other problems stated in the doc. A dedicated bot (e.g. this) is a working alternative, which avoids many of the problems of status:"test". Another, less formal method is to create a CL that enables the feature and try the CL on trybots periodically.

Some features change how existing code paths behave, but many are additive and expose new APIs which we're fairly confident do nothing if not invoked. I don't think there's a substantial loss of test coverage in those cases.

Xianzhu Wang

Jul 10, 2020, 8:20:09 PM
to Jeremy Roman, blink-dev
On Fri, Jul 10, 2020 at 4:39 PM Jeremy Roman <jbr...@chromium.org> wrote:
On Fri, Jul 10, 2020 at 5:12 PM Xianzhu Wang <wangx...@chromium.org> wrote:
On Fri, Jul 10, 2020 at 12:25 PM Jeremy Roman <jbr...@chromium.org> wrote:
There are at least some features intended for use only in test, like MojoJSTest, and some features which aren't yet ready for experimental status (which is sometimes turned on in the wild) but are stable enough for usage in web tests. It seems like it would be tedious to use virtual test suites for all of the latter.

I don't think we should enable a feature that is not ready for experimental status for all tests on all bots, because that causes loss of test coverage of the normal code path, along with the other problems stated in the doc. A dedicated bot (e.g. this) is a working alternative, which avoids many of the problems of status:"test". Another, less formal method is to create a CL that enables the feature and try the CL on trybots periodically.

Some features change how existing code paths behave, but many are additive and expose new APIs which we're fairly confident do nothing if not invoked. I don't think there's a substantial loss of test coverage in those cases.

I think an additive feature can be tested with dedicated tests, or with a small set of existing tests in a virtual suite, so it should not need status:test. If it can't be, it probably changes key code paths.
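As a sketch, such a small virtual suite would be a single entry in web_tests/VirtualTestSuites, roughly like the following (the names are illustrative and the field names are from memory; check the file itself for the current schema):

```json
[
  {
    "prefix": "some-additive-feature",
    "base": "external/wpt/some-feature",
    "args": ["--enable-blink-features=SomeAdditiveFeature"]
  }
]
```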

My concern is about internal features that rewrite a lot of code, e.g. LayoutNG* and CompositeAfterPaint, using status:"test".

Xianzhu Wang

Jul 13, 2020, 12:33:58 PM
to Jeremy Roman, blink-dev
What do you think about the following actions:
1. Add --enable-blink-test-features so that we can run content_shell and chrome with the same set of features as blink web tests.
2. Rename status:test to something like status:for-testing-only, to indicate that it's for features required only by testing, not for testing features.
3. Clean up current status:test features.

(BTW, I think we should also clean up status:stable features that have been enabled for more than 3 milestones.)

Jeremy Roman

Jul 13, 2020, 3:06:10 PM
to Xianzhu Wang, Kentaro Hara, blink-dev, platform-architecture-dev
I don't have any strong objection, though I'm not sure how much of an improvement it will be, and maintaining more and more virtual test suites (esp. in combination with the desire to land as much as possible in external/wpt/) seems tedious. Ideally +Kentaro Hara or someone else from plat arch can weigh in too.

Kentaro Hara

Jul 14, 2020, 4:12:40 AM
to Jeremy Roman, Xianzhu Wang, blink-dev, platform-architecture-dev
Yeah, I'd second Jeremy's opinion.

LayoutNG, CompositeAfterPaint, BFCache, Oilpan's unified heap, and other non-trivial architecture features want to start with "status:test". In common cases, they set up dedicated bots to test the feature. We could ask them to set up virtual test suites, but I'm not sure that's much of an improvement (adding more virtual test suites will increase the checkout size, right?).

1. Add --enable-blink-test-features so that we can run content_shell and chrome with the same set of features as blink web tests.

If --enable-blink-test-features enables all status:test features, that will be problematic. LayoutNG wants to enable only the LayoutNG flag when testing.

3. Clean up current status:test features.

Separately, +1 to cleaning up the status:test features :)
--
Kentaro Hara, Tokyo, Japan

Xianzhu Wang

Jul 14, 2020, 12:57:07 PM
to Kentaro Hara, Jeremy Roman, blink-dev, platform-architecture-dev
On Tue, Jul 14, 2020 at 1:12 AM Kentaro Hara <har...@chromium.org> wrote:
Yeah, I'd second Jeremy's opinion.

LayoutNG, CompositeAfterPaint, BFCache, Oilpan's unified heap, and other non-trivial architecture features want to start with "status:test". In common cases, they set up dedicated bots to test the feature. We could ask them to set up virtual test suites, but I'm not sure that's much of an improvement (adding more virtual test suites will increase the checkout size, right?).

As I understand it, Jeremy's point was that status:test was useful for additive features, but I think dedicated tests and small virtual suites are more suitable for those. Virtual tests are necessary to ensure test coverage of the code paths that we ship and that we experiment with. An additive feature should only need a small virtual suite, which won't increase the checkout size much, or at all if there are no new baselines.

More importantly, status:test causes loss of test coverage of shipped code for systemic features. I don't know about other features, but for the features that I developed or contributed to, such as BlinkGenPropertyTrees, LayoutNG and CompositeAfterPaint, we took measures to ensure test coverage of both the feature-enabled and feature-disabled code paths:
  • LayoutNG Block-Flow didn't use status:test. The launch plan ensured test coverage of both code paths, with LayoutNG disabled and enabled, at each step. We still have a dedicated bot fully testing the feature-disabled code paths, which are still executed in some cases in production.
  • The launch process of BlinkGenPropertyTrees was a bit different, but we also didn't use status:test, and we ensured test coverage at each step until we fully launched the feature.
  • We are using a dedicated bot for CompositeAfterPaint. When launching it, I would like to follow a process similar to the above.

1. Add --enable-blink-test-features so that we can run content_shell and chrome with the same set of features as blink web tests.

If --enable-blink-test-features enables all status:test features, that will be problematic. LayoutNG wants to enable only the LayoutNG flag when testing.

We enable all status:test features when running tests. Sometimes when a web test fails, we want to reproduce the failure (if possible) interactively in chrome or content_shell browser mode with the same features used for web tests; --enable-blink-test-features would be useful as a first step to check whether the failure is related to any test feature, before we know which feature it is.

Philip Rogers

Jul 15, 2020, 11:04:03 AM
to Xianzhu Wang, Kentaro Hara, Jeremy Roman, blink-dev, platform-architecture-dev
Differences between chrome/content_shell and web_tests are a recurring time sink for the team. Just this week, atotic sank a few hours on a test failure before finding it was due to a status:test feature (LayoutNGFragmentItem). This issue bites me about once a quarter.

Another issue with status:test is that it hides WPT failures from the team. For example, wpt/css/css-values/calc-numbers.html passes on our bots but fails on wpt.fyi due to the CSSCalcAsInt runtime enabled feature.

There is difficulty in using the no-status + virtual test suite approach today (e.g., the cost of flipping virtual and non-virtual expectations when launching), so I'm reluctant to argue for removing status:test, even though it's where I think we should be heading. Could we start with better documentation on launching features without status:test? With that information, I think we'll see more feature authors choosing the no-status approach going forward.


Also a big +1 to the proposal of --enable-blink-test-features as well as cleaning up existing status:test features.

Stephen Mcgruer

Jul 15, 2020, 11:19:21 AM
to Philip Rogers, Xianzhu Wang, Kentaro Hara, Jeremy Roman, blink-dev, platform-architecture-dev
Hi; chiming in from the WPT perspective (big thanks to pdr@ for bringing our attention to this, and nudging me to comment here).

We look at this from two angles: the cross-browser interop one (aka what should show up on wpt.fyi and be compared to Firefox/Safari/Edge/etc), and the usage of WPT for Chromium testing (aka external/wpt and wpt_internal).

For the former: historically we've split wpt.fyi into two groups: 'current' releases of browsers, which are intended to be representative of what users see in the world today (these will be stable releases with few to no flags on), and 'experimental' releases of browsers, which are intended to be representative of what users will see in the near-to-mid future. For the latter class, we use Chrome Dev and turn on --enable-experimental-web-platform-features, with the justification that most features behind that flag will go to stable sooner or later.

status:test features are a different story, to me. If those features are actually internal APIs required to run some test, and are not spec'd in some way, then we can't really test them cross-browser and we should work on building a standards-compliant way to test the feature (e.g. via spec work, via webdriver extensions, or via test-only API specs). Otherwise, if the feature is a 'real' web feature but is still behind status:test, what are we saying about it? It seems we're saying that we're not happy with users having access to it, and it seems that any such feature will graduate to status:experimental at some point, at which point it will be reflected on wpt.fyi.

Now, all of this changes when we look at using wptrunner in the Chromium CI itself, which is an ongoing effort[0]. There, our goal is to provide test correctness for Chromium engineers, which can mean turning on status:test flags. Likely we will have to either add a flag similar to --enable-experimental-web-platform-features for chrome, as Xianzhu suggests, or parse runtime_enabled_features.json5 to extract the individual flags and enable them via --enable-blink-features.


So overall, I think I would be supportive of (a) exposing a `--enable-test-only-features` flag or similar (though note that any test *explicitly* using an internal API would still have to go in wpt_internal; we can't upstream those), (b) cleaning up the current status:test entries, and (c) better understanding why a 'real' feature should be on-for-test but not on-for-experimental (given the lost test coverage of the not-enabled path).


Jason Chase

Jul 15, 2020, 5:01:42 PM
to blink-dev, Stephen McGruer, Xianzhu Wang, Kentaro Hara, Jeremy Roman, blink-dev, platform-architecture-dev, Philip Rogers
As others have pointed out, there's overhead to the virtual test suite approach, both technical and in process/effort. I like the idea of an incremental approach to better understand the implications and potential benefits, before moving to remove status:"test" entirely. Specifically, +1 to (a) and (b) from Stephen's email.

I'll emphasize the point that runtime enabled flags are complicated, and I don't think we have a good grasp of all the scenarios. That seems evident from crbug.com/832393 and the change history of content/child/runtime_features.cc, just to start.

If I understand correctly, one concern is that status:"test" causes less test coverage for the not-enabled paths, i.e. we should test what we ship. It's not clear to me that removing status:"test" would significantly improve test coverage. It could, to the extent that it forces features to have no status. However, I think it could also result in features being put into status:"experimental" sooner, for convenience reasons. Other than the RuntimeEnabledFeatures doc, I'm not aware of a robust process that ensures status is set appropriately. Even so, I think we have a non-trivial number of features that legitimately spend multiple releases in status:"experimental", which is another case where we're not testing what we ship.

Thanks,
Jason

Xianzhu Wang

Jul 16, 2020, 5:04:23 PM
to Jason Chase, blink-dev, Stephen McGruer, Kentaro Hara, Jeremy Roman, platform-architecture-dev, Philip Rogers
Thanks everyone for feedback.

Updates:
- Added --enable-blink-test-features (crrev.com/788887). It will enable both experimental and test features, because both are enabled for blink tests.
- In the doc:
   - Added that using status:"test" for features having substantially different code paths from production is strongly discouraged;
   - Added links to documentation and examples of alternatives to status:"test".

I think setting up reminders to clean up runtime enabled features may be helpful. For example, we could remind the owners of any feature that has been in test, experimental or stable status for more than 2 milestones. A feature with an alternative testing plan could stay longer in test or experimental status.

Jason Chase

Jul 17, 2020, 11:34:16 AM
to Xianzhu Wang, Jason Chase, blink-dev, Stephen McGruer, Kentaro Hara, Jeremy Roman, platform-architecture-dev, Philip Rogers
Yes, we definitely could benefit from a process to clean up runtime enabled features. This is on the long-term backlog for the Feature Control team (although I can't find a tracking bug). It's a similar problem to cleaning up deprecated features, for which we do have a tracking bug: crbug.com/834238.