What site-breaking changes deserve policy flag treatment?

Arvind Murching

Jul 8, 2021, 7:26:02 PM
to blink-api-owners-discuss, Rick Byers, dli...@microsoft.com
Hi,

This question is provoked by an enterprise customer who ran into a line of business site breakage due to a TablesNG change in 92.

There is precedent for introducing a policy that gives customers time to adjust to a breaking change, e.g. https://chromeenterprise.google/policies/?policy=LegacySameSiteCookieBehaviorEnabled.

What criteria are used to decide whether or not to apply such treatment to major changes that cause site regressions?

Thanks
Arvind

Daniel Bratell

Jul 9, 2021, 4:42:26 AM
to blink-api-ow...@chromium.org

I can't recall any documented checklist for when to use or not to use enterprise policies. As API Owners, we from time to time recommend that implementors add such flags, and I can say what triggers me to think about enterprise:

First, if a feature is likely to have a higher usage on intranets than on the general web, then we know that the collected usage numbers have a blind spot since many enterprises disable usage reporting. This includes old features since intranet/enterprise applications are often old but also typical enterprise features such as printing and authentication.

Second, if it's a feature that can be hard to work around. Enterprise applications often have much longer lifecycles than public applications, so an enterprise policy can give enterprises more time to implement, test and deploy an update. Sync XHR could be a good example of this.

Third, if there is "enterprise" related feedback on a shipping thread.

Fourth, if Chromium is breaking new ground. If this is a feature untested in the wild, it makes sense to be more careful than if other browsers already implement it. This mostly just emphasizes or weakens the previous points.
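The four triggers above can be sketched as a small checklist. This is only an illustration of the reasoning, not a documented Chromium process: the `ChangeAssessment` type, its field names, and `policy_triggers` are all hypothetical, and the fourth trigger is modelled as something that only strengthens the other three.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeAssessment:
    # Hypothetical fields, one per trigger described above.
    intranet_heavy_usage: bool   # likely more used on intranets than on the general web
    hard_to_work_around: bool    # sites need a long time to implement, test and deploy a fix
    enterprise_feedback: bool    # enterprise concerns raised on the shipping thread
    breaking_new_ground: bool    # behavior not yet tested in the wild by other browsers


def policy_triggers(change: ChangeAssessment) -> List[str]:
    """Return the reasons (if any) to consider an enterprise policy."""
    reasons = []
    if change.intranet_heavy_usage:
        reasons.append("usage metrics may have a blind spot for intranet use")
    if change.hard_to_work_around:
        reasons.append("enterprises may need extra time to update their applications")
    if change.enterprise_feedback:
        reasons.append("enterprise-related feedback on the shipping thread")
    # Assumption: the fourth trigger only adds weight to reasons already found.
    if change.breaking_new_ground and reasons:
        reasons.append("untested in the wild, which strengthens the reasons above")
    return reasons


example = ChangeAssessment(
    intranet_heavy_usage=True,
    hard_to_work_around=True,
    enterprise_feedback=False,
    breaking_new_ground=False,
)
for reason in policy_triggers(example):
    print(reason)
```

An empty result does not mean a policy is never useful; as noted below, the list will not capture every case.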

This won't capture every case where an enterprise policy may be useful, but I hope it will cover most of them. About TablesNG, I would not consider such changes specifically deserving of an enterprise flag because I would not expect enterprise to be affected any more or less than the normal world, and in most cases any differences would be cosmetic rather than site breaking.

In the end it will be up to the Chrome enterprise team and the implementors, though I think they in general agree with enterprise policy suggestions when they are given.

In this particular case, have you verified that the breaking change was intentional and not just a bug? Bugs do happen and they are supposed to be prevented in other ways.

/Daniel

Arvind --
You received this message because you are subscribed to the Google Groups "blink-api-owners-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-api-owners-d...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-api-owners-discuss/93cfc509-8d90-48ca-92dd-e997b330fc6bn%40chromium.org.

Rick Byers

Jul 9, 2021, 5:41:37 PM
to Daniel Bratell, blink-api-ow...@chromium.org, Brandon Heenan, Johnny Stenback
+1 to what Daniel is saying. 

Also our breaking change guidelines point to this policy. +Brandon from the Chrome enterprise team. 

Where I think we've gotten into trouble in the past is in the blurry line between what is just a trivial bug-fix and what is an "enterprise breaking change". It's sometimes very hard to identify these in advance. Sometimes we screw up and fail to identify something we really should have, but often there was just no reasonable way to know in advance without adding huge overhead to every single behavior-impacting bug we fix in the rendering engine (probably dozens every Chrome release).

Is it perhaps worth digging into the details of this TablesNG issue to see if there's anything to learn? Eg. was it discussed as a breaking change at all, or did we miss that there was some evidence it might cause compat problems? Is the issue resolved now, or should we be considering whether it should be revisited?

To me the most important thing is that we are good at being reactive and revisiting decisions when new information comes to light. Eg. we often say "this is fine as long as there are no reports of any sites being broken in beta, but if some come in then please hold and do a compat analysis and intent thread to get approval". So I think we should always be prepared to revisit changes when non-trivial compat issues are surfaced.

But this is all based on the assumption that we have no reasonable way to detect compat issues like this in advance of user reports. Chrome has never really had a web compat test suite beyond what we can collect via UMA and the manual testing that occurs with every launch, but I understand Chrome is an outlier in the browser world there. Arvind, can you share any information on the sort of web compat testing framework your team maintains these days? I've heard rumours of a legendary powerful system from the EdgeHTML days. Perhaps there is a way to leverage the Edge team's expertise and tooling to help us better identify issues like this sooner in the future?

Rick

Aleks Totic

Jul 9, 2021, 6:30:02 PM
to Rick Byers, Daniel Bratell, blink-api-owners-discuss, Brandon Heenan, Johnny Stenback
>>>>>
Is it perhaps worth digging into the details of this TablesNG issue to see if there's anything to learn? Eg. was it discussed as a breaking change at all, or did we miss that there was some evidence it might cause compat problems? Is the issue resolved now, or should we be considering whether it should be revisited?
<<<<<

As a lead TablesNG developer, I worried about "tables inside enterprise".

The old code had many quirks/bugs that were fixed, and no longer there with TablesNG. The enterprise scenario was worrisome because:
- enterprises can mandate Chrome-only
- Chrome-only pages are more likely to be dependent on Chrome-only quirks.

To alleviate the risk, I had SAP run their entire test suite with TablesNG and submit feedback before we turned the flag on.

I just assumed that enterprises would be able to turn the TablesNG flag on and off; I did not know that Chrome enterprise only manages a subset of flags. Had I been aware of this, I would have advocated giving enterprises an option to turn off TablesNG during the transition period.

Aleks

Arvind Murching

Jul 9, 2021, 7:54:29 PM
to blink-api-owners-discuss, Aleks Totic, Daniel Bratell, blink-api-owners-discuss, Brandon Heenan, Johnny Stenback, rby...@chromium.org
Appreciate the responses! Here are two bugs that illustrate the issue and may end up WontFix:

1227868 - TableNG rendering regression with min-height and percentages

1227884 - TableNG interop - relative positions on <tr> now works

The first is from an LOB site used within Microsoft, and the second was reported by a non-Microsoft platform used by many enterprises.

Rick, I'm not sure that there were magical things in the EdgeHTML process that would catch such things. I know WPT was used, but I'll check if there was more.

Thanks

Arvind

Rick Byers

Jul 12, 2021, 12:47:23 PM
to Arvind Murching, blink-api-owners-discuss, Aleks Totic, Daniel Bratell, Brandon Heenan, Johnny Stenback
Thanks Aleks, that's really interesting that you say you would have added a policy knob if you knew about it. This is something API owners sometimes ask for in breaking change intents, since it's come up as an issue in the past with enterprise customers and is part of the "enterprise-friendly change policy". There wasn't an intent thread for TablesNG, was there? In retrospect, I wonder if there were enough little bug-fixes here that it would have been worthwhile to discuss the compat risk in an intent thread? I think that likely would have led to the suggestion of adding an enterprise policy knob. Thoughts?

Arvind, would there still be any value in adding such a knob now? Aleks, what's the timeline planned for deleting the old tables code?

Rick

Ian Kilpatrick

Jul 12, 2021, 1:27:58 PM
to Rick Byers, Arvind Murching, blink-api-owners-discuss, Aleks Totic, Daniel Bratell, Brandon Heenan, Johnny Stenback
Catching up a little here.

One thing that we do to mitigate issues is to "Finch-On-By-Default". E.g. Chrome took the majority of the compat risk here by launching a release ahead of the other Chromium implementations, by finching TablesNG on (Edge/Brave/etc. 91 didn't have TablesNG on while Chrome 91 did). We work closely with our release engineers so that if a large, breaking compat issue had been identified in Chrome, we would have punted a release (i.e. we'd have finched on by default for M92 instead, with other Chromium implementations receiving the update in M93). However, the issues raised in our M91 release were relatively minor, and the majority of them have been fixed and merged in the M92 release. I don't believe we had any fixes that we merged into the M91 release.

As for these two issues, I'm looking into them at the moment. At least one we should fix, and I'm still looking into the other. (Part of the problem here is that tables don't have great test coverage for these cases :) ).

These are the only two issues that I'm aware of from TablesNG in the M92 release which is a relatively small number compared to other fixes we've done in the past. They have come in pretty late - only a week away from stable promotion - so will likely target the M93 release unless we receive more evidence. If they had come in a few weeks earlier we would have been able to apply a fix to M92. All Chromium implementations should have had TablesNG enabled by default on their Canary/Dev channels since Jan/Feb this year.

Likely we'll remove the switch and associated code in the M94/95 timeframe - typically we sometimes receive issues post N+2 releases.


Brandon Heenan

Jul 12, 2021, 2:31:53 PM
to Ian Kilpatrick, Rick Byers, Arvind Murching, blink-api-owners-discuss, Aleks Totic, Daniel Bratell, Johnny Stenback
>>>>>
I just assumed that enterprises would be able to turn TablesNG flag on and off, I did not know that Chrome enterprise only manages a subset of flags. If I was aware of this, I'd have advocated to give enterprise an option to turn off TablesNG during the transition period.
<<<<<

To expand on this, enterprises tend not to be able to use flags at all for production environments, and indeed, we actively discourage it. See this recent change where we specifically warn admins not to use flags in production. Flags today don't have any change management guarantees (they can change or be removed at any time without notice), and don't have the testing or documentation that other options have. I see them as a developer tool, not an admin tool.

In contrast, enterprise policies have change management guidelines, documentation, and interact well with common admin tools like Group Policy. If we feel like some enterprise admins will need that extra layer of control, a policy is the way to go. 
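As a rough illustration of what a policy looks like from the admin side, here is a minimal sketch that builds a managed-policy JSON document for LegacySameSiteCookieBehaviorEnabled, the policy linked at the start of this thread. The helper name `build_policy_file` is hypothetical. The value 1 is assumed to select the legacy behavior, and on Linux such a file is assumed to be placed under /etc/opt/chrome/policies/managed/; both should be checked against the policy documentation before use. No TablesNG policy exists, so none is shown.

```python
import json


def build_policy_file(policies: dict) -> str:
    """Serialize a mapping of policy names to values as a managed-policy JSON document."""
    return json.dumps(policies, indent=2, sort_keys=True)


# Assumption: 1 reverts to the legacy SameSite cookie behavior on all sites.
policy_json = build_policy_file({"LegacySameSiteCookieBehaviorEnabled": 1})
print(policy_json)
```

Unlike a chrome://flags entry, a setting like this is deployed by the admin through their management tooling rather than toggled per device by hand.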

I was aware of this particular change and we did give admins a heads up about it in the Chrome enterprise release notes, including instructions on how to turn it on in advance of launch for testing purposes. I didn't recommend a policy at the time because it wasn't expected to have any breaking changes, but that may have been a mistake.

As already expressed in this thread, it's probably more important that we react to new information as it becomes available, since we're usually working off of imperfect information prior to shipping a change.

Rick Byers

Jul 12, 2021, 2:58:43 PM
to Brandon Heenan, Ian Kilpatrick, Arvind Murching, blink-api-owners-discuss, Aleks Totic, Daniel Bratell, Johnny Stenback
Perfect, thank you Ian and Brandon! I'm really glad to hear you, Brandon, were aware of this change and that it was mentioned in the enterprise release notes. Have you observed any fallout from it yourself?

Aleks Totic

Jul 12, 2021, 3:20:23 PM
to Brandon Heenan, Ian Kilpatrick, Rick Byers, Arvind Murching, blink-api-owners-discuss, Daniel Bratell, Johnny Stenback
>>>>>
There wasn't an intent thread for TablesNG, was there? In retrospect, I wonder if there were enough little bug-fixes here that it would have been worthwhile to discuss the compat risk in an intent thread? I think that likely would have led to the suggestion of adding an enterprise policy knob. Thoughts?
<<<<<

Reading this thread over, I do not think there were any systemic failures. Enterprise was aware of the change, and made a judgement call not to add the flag. There was very little data to base this decision on. Even in retrospect, it is unclear whether having an enterprise flag would have been the right thing to do. I am not aware of any large-scale, site-is-unusable kind of breaks.

The public web adapted to the change without much trouble. This is because TablesNG implemented the standard. If your site worked in FF and Edge, only a couple of edge cases broke. For example, Chrome started supporting max-width on <COL> elements.

The problem was enterprises relying on Chrome-only non-standard behavior.

Aleks

Brandon Heenan

Jul 12, 2021, 4:11:19 PM
to Aleks Totic, Ian Kilpatrick, Rick Byers, Arvind Murching, blink-api-owners-discuss, Daniel Bratell, Johnny Stenback
We received one report of an enterprise customer reporting a rendering issue on 5-10 devices, but it appeared that they were able to fix it with a new version of the webapp from their vendor (at least that's what it sounded like to me--Ian and Aleks were both on that thread too).

At the time, we did decide this constituted enough risk to flag it for enterprises in the release notes with testing instructions and a link to file a bug in monorail, but not enough to add a policy to rollback to the old behavior. Here's what we said in the enterprise release notes before the feature shipped:

Chrome will use updated table rendering

Chrome is updating the way it renders tables on web pages. This change fixes known issues and brings Chrome closer to the behavior of other browsers, so impact is expected to be minimal. However, you should test important workflows in your environment for unexpected issues. A full explainer is available here.

You can enable the new rendering behavior using chrome://flags/#enable-table-ng in Chrome 90 and above. If you experience any unexpected issues when testing with the flag enabled, please file a chromium bug.

Maybe one take-away here is that the enterprise team should ask for a policy more often, including for any large implementation changes with high risk of long-tail bugs.

Daniel Libby

Jul 13, 2021, 2:32:17 PM
to blink-api-owners-discuss, Brandon Heenan, ikilp...@chromium.org, rby...@chromium.org, Arvind Murching, blink-api-owners-discuss, Daniel Bratell, Johnny Stenback, Aleks Totic
Chiming in late here: I agree that in this particular instance everything (so far!) has worked out reasonably well for such a large change not having a policy (kudos to Aleks and Ian for the quality of implementation and quick responses to issues filed). I don't think retroactively pursuing a policy at this point is necessary.

Orthogonal to this particular instance, it sounds like the current process is developers reaching out to the enterprise team, or the enterprise team keeping an eye out for breaking changes (bheenan@ please correct me if this is wrong). If that is the case, do API_OWNERS think it would be useful to add a sentence to the Compatibility section to ensure developers are thinking about this when evaluating ship-readiness? I do think it would be useful to have a minimal framework for developers' awareness on whether or not introducing a policy would be appropriate for specific changes.


Rick Byers

Jul 14, 2021, 12:41:17 PM
to Daniel Libby, blink-api-owners-discuss, Brandon Heenan, ikilp...@chromium.org, Arvind Murching, Daniel Bratell, Johnny Stenback, Aleks Totic
Thanks Daniel. I definitely think it should be part of what people are considering, but I'm not sure we should be starting down a pattern of "adding a sentence" to the intent template for each compat consideration they should be thinking about. Does anyone have an example of an intent where that would have led to a better outcome?

It's the API owners' responsibility to internalize and update our compatibility principles and to call out the ones which seem to need more thought in each intent, and I think the enterprise section there covers all of this well, right? Although we do point to that doc and are happy for developers to try to apply it, I personally felt it was unrealistic to expect every developer to internalize all the guidance in that document, and so we had to rely on the expertise and judgement of API owners to highlight the areas in which any given intent might be deficient. I'm just worried about the template turning even more into a monster checklist of a hundred things people have to read and think about, 90% of which won't apply to their specific situation.

Thoughts? Is there perhaps an argument for why this one principle should be highlighted in the template while the other 15 aren't? Or maybe there's other ways we can improve the guidance in the principles doc and work to reduce the risk that API owners might fail to highlight it when necessary? In particular I love adding to the "case law" examples in the doc - especially where we have examples to learn from where things went sub-optimally. 

Rick
