Intent to Ship: Forbidden and escaped host characters compliant with the URL standard


Jiacheng Guo

Mar 10, 2023, 10:07:38 AM
to blin...@chromium.org

Contact emails

g...@google.com, gotlo...@gmail.com, blink-net...@google.com

Explainer

As part of the URL Interop 2023 effort, the table of forbidden hostname characters will be updated as described in the URL spec. Characters in hostnames will no longer be percent-escaped, since the URL spec does not require it.

Specification

https://url.spec.whatwg.org/#host-writing

Summary

The rules for writing and parsing URL host characters are updated to comply with the URL Standard. The following characters will become forbidden in hostnames, as described in https://url.spec.whatwg.org/#forbidden-host-code-point: ' ' (space), '<', '>' and '|'. '[' and ']' are still allowed as part of IPv6 addresses but will be forbidden in any other hostname. The following characters will no longer be percent-escaped in hostnames: '!', '"', '$', '&', ''' (the ' character itself), '(', ')', '*', ';', '=', '`', '{', '}' and '~'.
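As a rough illustration of the two character lists above, the sketch below checks a hostname string without touching the URL API. The forbidden set is transcribed by hand from the linked section of the URL Standard and should be re-checked against the live spec; the names `FORBIDDEN_HOST_CODE_POINTS`, `NO_LONGER_ESCAPED` and `findForbiddenHostCharacters` are hypothetical, not part of Chromium or the spec.

```javascript
// Forbidden host code points, transcribed by hand from the URL Standard
// (an assumption: verify against the spec before relying on this list).
const FORBIDDEN_HOST_CODE_POINTS = new Set([
  '\u0000', '\t', '\n', '\r', ' ', '#', '/', ':', '<', '>', '?', '@',
  '[', '\\', ']', '^', '|',
]);

// Characters the summary says will no longer be percent-escaped in hostnames.
const NO_LONGER_ESCAPED = [
  '!', '"', '$', '&', "'", '(', ')', '*', ';', '=', '`', '{', '}', '~',
];

// Hypothetical helper: returns the forbidden characters found in a hostname.
// It deliberately ignores the IPv6 case, where '[' and ']' are allowed.
function findForbiddenHostCharacters(hostname) {
  return [...hostname].filter((ch) => FORBIDDEN_HOST_CODE_POINTS.has(ch));
}
```

With this sketch, `findForbiddenHostCharacters('example|.com')` returns `['|']`, while `findForbiddenHostCharacters('example!.com')` returns an empty array.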



Blink component

Blink>Network

TAG review



TAG review status

Not applicable

Risks



Interoperability and Compatibility

The URL Standard is well established, and this effort is part of URL Interop 2023. We expect the risk to be minimal.



Gecko: Positive. Firefox partially follows the forbidden character list. It treats '*' as an invalid character in hostnames. It does not percent-escape these characters in hostnames.

WebKit: Shipped/Shipping. Safari strictly follows the forbidden character list and never percent-escapes characters in hostnames.

Web developers: No signals

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?



Debuggability

URLs whose hostnames contain forbidden characters will throw a TypeError, which developers can see in the console.



Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes

Is this feature fully tested by web-platform-tests?

Yes

Flag name



Requires code in //chrome?

False

Tracking bug

https://crbug.com/1398117

Sample links


https://chromium-review.googlesource.com/c/chromium/src/+/4199790

Estimated milestones

No milestones specified



Anticipated spec changes

No spec change


Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5074885224693760

This intent message was generated by Chrome Platform Status.

Eli Grey

Mar 10, 2023, 11:20:36 AM
to blink-dev, g...@google.com
Will this affect the behavior of the URL.prototype.host accessor at all? I rely on the automatic escaping to detect 'invalid' hosts with this utility: https://gist.github.com/eligrey/6549ad0a635fa07749238911b42923da

Martin Thomson

Mar 12, 2023, 7:05:05 PM
to blink-dev, g...@google.com

Please ask rather than guessing or inferring what the Gecko position might be.  https://github.com/mozilla/standards-positions/blob/main/CONTRIBUTING.md

Yoav Weiss

Mar 13, 2023, 12:30:09 AM
to Jiacheng Guo, blin...@chromium.org
Thanks for working on interop! :)

On Fri, Mar 10, 2023 at 4:07 PM 'Jiacheng Guo' via blink-dev <blin...@chromium.org> wrote:

Contact emails

g...@google.com, gotlo...@gmail.com, blink-net...@google.com

Explainer

As part of the URL Interop 2023 effort, the table of forbidden hostname characters will be updated as described in the URL spec. Characters in hostnames will no longer be percent-escaped, since the URL spec does not require it.

Can you please explain what would be the impact of this change and provide examples of cases that are currently working and would stop working after this change is landed?
Web developers are asking questions on this thread, and it'd be good to have an explainer that answers such questions.
 

Specification

https://url.spec.whatwg.org/#host-writing

Summary

The rules for writing and parsing URL host characters are updated to comply with the URL Standard. The following characters will become forbidden in hostnames, as described in https://url.spec.whatwg.org/#forbidden-host-code-point: ' ' (space), '<', '>' and '|'. '[' and ']' are still allowed as part of IPv6 addresses but will be forbidden in any other hostname. The following characters will no longer be percent-escaped in hostnames: '!', '"', '$', '&', ''' (the ' character itself), '(', ')', '*', ';', '=', '`', '{', '}' and '~'.



Blink component

Blink>Network

TAG review



TAG review status

Not applicable

Risks



Interoperability and Compatibility

The URL Standard is well established, and this effort is part of URL Interop 2023. We expect the risk to be minimal.



Gecko: Positive. Firefox partially follows the forbidden character list. It treats '*' as an invalid character in hostnames. It does not percent-escape these characters in hostnames.

As Martin asked, please don't assume a position. Can you ask for one on Mozilla's positions repo?

Can you also elaborate on the "partially followed" part?
 

WebKit: Shipped/Shipping. Safari strictly follows the forbidden character list and never percent-escapes characters in hostnames.

Web developers: No signals

Other signals:

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?



Debuggability

URLs whose hostnames contain forbidden characters will throw a TypeError, which developers can see in the console.


Do we have use counters for content that would start throwing once this change lands?
 


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes

Is this feature fully tested by web-platform-tests?

Yes

Can you provide a link to the tests? 

Flag name



Requires code in //chrome?

False

Tracking bug

https://crbug.com/1398117

Sample links


https://chromium-review.googlesource.com/c/chromium/src/+/4199790

Estimated milestones

No milestones specified



Anticipated spec changes

No spec change


Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5074885224693760

This intent message was generated by Chrome Platform Status.

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAJQw1NyTJOqj0O0HMPQQuYrBgtjjPN3fjH8st1XP15AtsV1fPA%40mail.gmail.com.

Philip Jägenstedt

Mar 13, 2023, 5:22:11 AM
to Eli Grey, blink-dev, g...@google.com
Hi Eli,

Jumping in here to answer your question since it was easy enough to test. `new URL('https://example!.com').host` in Chrome currently returns "example%21.com", but in Safari it's "example!.com". With the proposed change, Chrome will match Safari.
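To see which of the two serializations a given engine produces, a small probe along these lines can be pasted into the console. It is only a sketch: `describeBangHostSerialization` is a hypothetical name, and the labels simply restate the Chrome and Safari results reported above.

```javascript
// Hypothetical helper: reports how the current engine serializes a host
// containing '!'. It only uses the standard URL constructor.
function describeBangHostSerialization() {
  const host = new URL('https://example!.com').host;
  if (host === 'example!.com') {
    return 'unescaped (Safari today, Chrome after the change)';
  }
  if (host === 'example%21.com') {
    return 'percent-escaped (Chrome before the change)';
  }
  return `unexpected: ${host}`;
}

console.log(describeBangHostSerialization());
```

Code that compares `url.host` against the original input, as a validity check might, sees a different answer depending on which branch is taken.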

I've also confirmed that your isValidHost("example!.com") helper is giving different results in Chrome and Safari and is sensitive to this change.

What would the downstream impact be of isValidHost() flipping its return value? Can you achieve the same thing in a way that currently works in both Chrome and Safari?

Best regards,
Philip


Philip Jägenstedt

Mar 13, 2023, 5:46:58 AM
to Yoav Weiss, Jiacheng Guo, blin...@chromium.org
To simplify and keep this moving, I've filed https://github.com/mozilla/standards-positions/issues/759 as an umbrella issue for anything URL in Interop 2023.

My view is that we can't improve our risk assessment of this by much with metrics, because we can't distinguish between harmless and serious breakage. Instead what we should do is take some comfort in the fact that the behavior is already shipping in Safari, try to ship it and revert at the first sign of trouble to evaluate. In other words, I'll happily LGTM this, but I'll send this round of feedback first, in case Yoav disagrees with that.

Some additional replies inline:

On Mon, Mar 13, 2023 at 5:30 AM Yoav Weiss <yoav...@chromium.org> wrote:
Thanks for working on interop! :)

Indeed, I'm very grateful and happy to see this work!

Can you please explain what would be the impact of this change and provide examples of cases that are currently working and would stop working after this change is landed?
Web developers are asking questions on this thread, and it'd be good to have an explainer that answers such questions.

I've replied to Eli.

More generally, since this is a change in the nitty gritty details, my concrete advice for web developers would be to test what Safari currently does and assume that's what Chrome is going to start doing. If one doesn't have access to Safari, then https://www.browserstack.com/screenshots can be used for one-off tests, as long as the test result appears on the page.

The other difference to Safari that's easy to test for is when exceptions are thrown. `new URL('https://example|.com')` returns a URL escaped as "https://example%7C.com" in Chrome, but throws TypeError in Safari.
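Code that has to tolerate both behaviors can wrap the constructor so that a thrown TypeError and a successfully parsed URL are both handled. A minimal sketch, with `hostOrNull` as a hypothetical helper name:

```javascript
// Hypothetical helper: returns the parsed host, or null if the engine
// rejects the URL by throwing a TypeError.
function hostOrNull(input) {
  try {
    return new URL(input).host;
  } catch (e) {
    if (e instanceof TypeError) return null;
    throw e; // anything else is unexpected, so do not swallow it
  }
}

const pipeResult = hostOrNull('https://example|.com');
console.log(pipeResult === null ? 'rejected' : `accepted as ${pipeResult}`);
```

An engine that throws for '|' prints "rejected"; an engine that still escapes it prints the escaped host instead.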

Do we have use counters for content that would start throwing once this change lands?

I'll let Jiacheng answer, but if the answer is no, I'm skeptical that adding use counters will meaningfully help us judge the risk of this. Breaking it down:
  • Many invalid URLs already throw exceptions, which may be caught. Knowing that new exceptions will be thrown in X% of page views will not help know how often those are caught and the web app still behaves correctly.
  • Changes in serialization are akin to changing a return value. We can't instrument the downstream effects of that and determine if the difference led to a different outcome.
Can you provide a link to the tests? 


There's no way to link to exactly the subtests that will be affected, but "Parsing: <http://example example.com> against <http://other.com/>" in url-constructor.any.html is one of them.
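For readers who want to try that case by hand, the subtest amounts to parsing the first string against the second as a base URL. The sketch below is not the web-platform-tests harness; `parseAgainstBase` is a hypothetical helper, and my understanding that the test expects a parse failure is an assumption based on the space being a forbidden host character.

```javascript
// Hypothetical reproduction of the quoted subtest: parse input against base
// and report whether the engine treats it as a failure.
function parseAgainstBase(input, base) {
  try {
    return { failure: false, href: new URL(input, base).href };
  } catch (e) {
    return { failure: true };
  }
}

const outcome = parseAgainstBase('http://example example.com', 'http://other.com/');
console.log(outcome.failure ? 'parse failure' : `parsed as ${outcome.href}`);
```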

Best regards,
Philip

Yoav Weiss

Mar 13, 2023, 6:05:46 AM
to Philip Jägenstedt, Jiacheng Guo, blin...@chromium.org
On Mon, Mar 13, 2023 at 10:46 AM Philip Jägenstedt <foo...@chromium.org> wrote:
To simplify and keep this moving, I've filed https://github.com/mozilla/standards-positions/issues/759 as an umbrella issue for anything URL in Interop 2023.

My view is that we can't improve our risk assessment of this by much with metrics, because we can't distinguish between harmless and serious breakage.

Metrics can give us an upper bound, as well as a pile of examples that one can then manually sample and assess breakage.
 
Instead what we should do is take some comfort in the fact that the behavior is already shipping in Safari, try to ship it and revert at the first sign of trouble to evaluate.

Those are not contradictory. E.g. we could add metrics (+UKM) and a flag, and then be alert for bug reports from Beta, as well as randomly examine sites that touch the relevant usecounters and see if they were broken.
Would that work from your perspective?

Philip Jägenstedt

Mar 13, 2023, 6:21:59 AM
to Yoav Weiss, Jiacheng Guo, blin...@chromium.org
On Mon, Mar 13, 2023 at 11:05 AM Yoav Weiss <yoav...@chromium.org> wrote:


On Mon, Mar 13, 2023 at 10:46 AM Philip Jägenstedt <foo...@chromium.org> wrote:
To simplify and keep this moving, I've filed https://github.com/mozilla/standards-positions/issues/759 as an umbrella issue for anything URL in Interop 2023.

My view is that we can't improve our risk assessment of this by much with metrics, because we can't distinguish between harmless and serious breakage.

Metrics can give us an upper bound, as well as a pile of examples that one can then manually sample and assess breakage.
 
Instead what we should do is take some comfort in the fact that the behavior is already shipping in Safari, try to ship it and revert at the first sign of trouble to evaluate.

Those are not contradictory. E.g. we could add metrics (+UKM) and a flag, and then be alert for bug reports from Beta, as well as randomly examine sites that touch the relevant usecounters and see if they were broken.
Would that work from your perspective?

Is the suggestion to do the same as in https://chromium-review.googlesource.com/c/chromium/src/+/4252309 (for Intent to Ship: Port overflow check in URL setters) to add the use counter but not wait for data before trying to ship this?

That would work for me if Jiacheng thinks it's reasonable in this case.

Yoav Weiss

Mar 13, 2023, 6:32:01 AM
to Philip Jägenstedt, Jiacheng Guo, blin...@chromium.org
That's what I'm suggesting (+ a manual sampling & inspection of URLs we'd get from UKM to actively verify there's no significant breakage coming)  

Jiacheng Guo

Mar 13, 2023, 7:44:15 AM
to Yoav Weiss, Philip Jägenstedt, blin...@chromium.org
For Eli Grey's question:
Yes, the behavior will change with the feature.

The behavior of the isValidHost function varies among browsers. The change will make Chrome behave as the URL Standard specifies.

I believe it's reasonable to add a use counter for the feature. Since the CL is created by an external developer, would you suggest creating a feature flag for it as well?

Jiacheng Guo

Mike Taylor

Mar 13, 2023, 10:06:28 AM
to Jiacheng Guo, Yoav Weiss, Philip Jägenstedt, blin...@chromium.org

Yes, ideally this change ships behind a flag.


Yoav Weiss

Mar 13, 2023, 10:11:57 AM
to Mike Taylor, Jiacheng Guo, Philip Jägenstedt, blin...@chromium.org
On Mon, Mar 13, 2023 at 3:06 PM Mike Taylor <mike...@chromium.org> wrote:

Yes, ideally this change ships behind a flag.

On 3/13/23 7:43 AM, 'Jiacheng Guo' via blink-dev wrote:
For Eli Grey's question:
Yes, the behavior will change with the feature.

The behavior of the isValidHost function varies among browsers. The change will make Chrome behave as the URL Standard specifies.

I believe it's reasonable to add a use counter for the feature. Since the CL is created by an external developer, would you suggest creating a feature flag for it as well?
You'd also need someone working at Google to look at internal UKM data and do manual sampling.

Philip Jägenstedt

Mar 13, 2023, 11:07:43 AM
to Mike Taylor, Jiacheng Guo, Yoav Weiss, blin...@chromium.org
Mike, do you mean in order to put this behind Finch?

On Mon, Mar 13, 2023 at 3:06 PM Mike Taylor <mike...@chromium.org> wrote:

Jiacheng Guo

Mar 13, 2023, 11:20:38 AM
to Philip Jägenstedt, Mike Taylor, Yoav Weiss, blin...@chromium.org
Then we could proceed as follows:

I will ask the contributor to implement the feature behind a feature flag and fix any test failures. The contributor may add a use counter; otherwise I will add one.
Then I can manually ship the feature behind the flag and monitor the UKM data.

Does that make sense to you?

Jiacheng Guo

Mike Taylor

Mar 13, 2023, 11:26:59 AM
to Philip Jägenstedt, Yoav Weiss, blin...@chromium.org, Jiacheng Guo

Philip, no - feature flag, in case we need to killswitch it.

Yoav Weiss

Mar 13, 2023, 11:40:20 AM
to Jiacheng Guo, Philip Jägenstedt, Mike Taylor, blin...@chromium.org
On Mon, Mar 13, 2023 at 4:20 PM Jiacheng Guo <g...@google.com> wrote:
Then we could proceed as follows:

I will ask the contributor to implement the feature behind a feature flag and fix any test failures. The contributor may add a use counter; otherwise I will add one.
Then I can manually ship the feature behind the flag and monitor the UKM data.

Does that make sense to you?

That WFM! :)

Domenic Denicola

Mar 13, 2023, 9:26:18 PM
to Philip Jägenstedt, Yoav Weiss, Jiacheng Guo, blin...@chromium.org
On Mon, Mar 13, 2023 at 6:46 PM Philip Jägenstedt <foo...@chromium.org> wrote:
To simplify and keep this moving, I've filed https://github.com/mozilla/standards-positions/issues/759 as an umbrella issue for anything URL in Interop 2023.

My view is that we can't improve our risk assessment of this by much with metrics, because we can't distinguish between harmless and serious breakage. Instead what we should do is take some comfort in the fact that the behavior is already shipping in Safari, try to ship it and revert at the first sign of trouble to evaluate. In other words, I'll happily LGTM this, but I'll send this round of feedback first, in case Yoav disagrees with that.

Some additional replies inline:

On Mon, Mar 13, 2023 at 5:30 AM Yoav Weiss <yoav...@chromium.org> wrote:
Thanks for working on interop! :)

Indeed, I'm very grateful and happy to see this work!

Can you please explain what would be the impact of this change and provide examples of cases that are currently working and would stop working after this change is landed?
Web developers are asking questions on this thread, and it'd be good to have an explainer that answers such questions.

I've replied to Eli.

More generally, since this is a change in the nitty gritty details, my concrete advice for web developers would be to test what Safari currently does and assume that's what Chrome is going to start doing. If one doesn't have access to Safari, then https://www.browserstack.com/screenshots can be used for one-off tests, as long as the test result appears on the page.

The other difference to Safari that's easy to test for is when exceptions are thrown. `new URL('https://example|.com')` returns a URL escaped as "https://example%7C.com" in Chrome, but throws TypeError in Safari.

Developers can also use the whatwg-url Node.js package, including the live URL viewer. It is kept 1:1 with the URL Standard and so reflects the behavior that all browsers will be aiming toward as part of Interop 2023 (and that Safari is already compliant with). Examples:
 

Do we have use counters for content that would start throwing once this change lands?

I'll let Jiacheng answer, but if the answer is no, I'm skeptical that adding use counters will meaningfully help us judge the risk of this. Breaking it down:
  • Many invalid URLs already throw exceptions, which may be caught. Knowing that new exceptions will be thrown in X% of page views will not help know how often those are caught and the web app still behaves correctly.
  • Changes in serialization are akin to changing a return value. We can't instrument the downstream effects of that and determine if the difference led to a different outcome.
Can you provide a link to the tests? 


There's no way to link to exactly the subtests that will be affected, but "Parsing: <http://example example.com> against <http://other.com/>" in url-constructor.any.html is one of them.

Best regards,
Philip


Eli Grey

Mar 14, 2023, 2:05:39 AM
to blink-dev, dom...@chromium.org, yoav...@chromium.org, g...@google.com, blin...@chromium.org, Philip Jägenstedt
I've updated my isValidHost() util to support this change. Could someone please have another look and let me know if my implementation now aligns well with the spec?

Jiacheng Guo

Mar 14, 2023, 2:27:33 AM
to Eli Grey, blink-dev, dom...@chromium.org, yoav...@chromium.org, Philip Jägenstedt
Hi Eli,

The implementation is not fully in line with the spec.
Although the browser does not percent-encode hosts, it does percent-decode host strings, so 'test%22.com' is a valid host string.
If all browsers were spec-compliant when parsing hosts, I believe simply setting url.host and checking for errors would work. (This is not the case now.)
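The percent-decoding point can be sketched without relying on any browser's URL parser: decode the host first, then look for forbidden characters in the decoded result. This is a simplified illustration under stated assumptions. The helper names are hypothetical, the forbidden set is a hand-transcribed subset of the URL Standard's list, and the sketch skips the IDNA, IPv4 and IPv6 steps that a real host parser performs.

```javascript
// Subset of the URL Standard's forbidden host code points (transcribed by
// hand, so treat the list as an assumption).
const FORBIDDEN_AFTER_DECODING = new Set([
  ' ', '#', '/', ':', '<', '>', '?', '@', '[', '\\', ']', '^', '|',
]);

// True for the ASCII bytes 0-9, A-F and a-f.
function isHexDigitByte(byte) {
  return (byte >= 0x30 && byte <= 0x39) ||
         (byte >= 0x41 && byte <= 0x46) ||
         (byte >= 0x61 && byte <= 0x66);
}

// Hypothetical helper: percent-decode a host string byte by byte, then decode
// the resulting bytes as UTF-8. A '%' not followed by two hex digits is kept.
function percentDecodeHost(host) {
  const input = new TextEncoder().encode(host);
  const output = [];
  for (let i = 0; i < input.length; i++) {
    if (input[i] === 0x25 && i + 2 < input.length &&
        isHexDigitByte(input[i + 1]) && isHexDigitByte(input[i + 2])) {
      output.push(parseInt(String.fromCharCode(input[i + 1], input[i + 2]), 16));
      i += 2;
    } else {
      output.push(input[i]);
    }
  }
  return new TextDecoder().decode(new Uint8Array(output));
}

// Hypothetical helper: forbidden characters that remain after decoding.
function forbiddenCharsAfterDecoding(host) {
  return [...percentDecodeHost(host)].filter((ch) => FORBIDDEN_AFTER_DECODING.has(ch));
}
```

Here `forbiddenCharsAfterDecoding('test%22.com')` is empty because %22 decodes to '"', which is not forbidden, whereas `forbiddenCharsAfterDecoding('example%7C.com')` returns `['|']` because the escaped pipe decodes to a forbidden character.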

Jiacheng Guo

Philip Jägenstedt

Mar 14, 2023, 6:04:57 AM
to Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org, yoav...@chromium.org
Hi Eli,

Adding to what Jiacheng said, I've tested isValidHost('example!.com') with your new code, and it doesn't give the same result in Chrome and Safari.

However, the url.host setter doesn't throw for invalid hosts, instead it does nothing. You could start your helper like this:

  let url;
  try {
    url = new globalThis.URL('https://' + host);
  } catch (e) {
    return false;
  }


But isValidHost('example!.com') is still not going to get the same result in Chrome and Safari, because new URL('https://example!.com') doesn't throw in either, but Chrome currently will escape the host as 'example%21.com' while Safari will leave it as 'example!.com'.
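The remark about the url.host setter can be checked with a before/after comparison, since the setter itself gives no signal. A small sketch, with `trySetHost` as a hypothetical helper name:

```javascript
// Hypothetical helper: assigns a new host and reports whether the URL's host
// actually changed. The setter does not throw, so a comparison is the only
// way to observe a rejected value.
function trySetHost(href, newHost) {
  const url = new URL(href);
  const before = url.host;
  url.host = newHost;
  return { before, after: url.host, changed: url.host !== before };
}

const attempt = trySetHost('https://example.com/', 'exa<mple.com');
console.log(attempt.changed ? `host became ${attempt.after}` : 'setter ignored the new host');
```

Note that `changed` is also false when the new host equals the old one, so this comparison alone cannot distinguish a rejected host from a no-op assignment.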

If you want something that is guaranteed to work exactly the same in all browsers before and after these changes, I think your best bet is to avoid the URL constructor/API entirely,

Philip Jägenstedt

Mar 14, 2023, 6:09:09 AM
to Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org, yoav...@chromium.org
LGTM1 to ship this change with a feature flag which we can use as a kill switch. Adding use counters so that we can get examples of breakage if it happens would be great too, if it's not too much overhead.

Yoav Weiss

Mar 14, 2023, 9:22:25 AM
to Philip Jägenstedt, Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org
LGTM2. Please make sure the use counters are exposed to UKM.

Chris Harrelson

Mar 14, 2023, 10:04:51 AM
to Yoav Weiss, Philip Jägenstedt, Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org

Hayato Ito

Aug 29, 2023, 3:02:21 AM
to Chris Harrelson, Yoav Weiss, Philip Jägenstedt, Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org
Hi there,

I'm now taking over ownership of this I2S from jiacheng@.

Regarding the metrics, I found that we did a previous measurement here:
https://bugs.chromium.org/p/chromium/issues/detail?id=1065667#c30

It appears that the rough conclusion from 3 years ago was that most characters, excluding space and asterisk, are probably fine to change to match the spec.

Given this, my plan is as follows:

1. Fix and ship, except for space and asterisk, but with a kill switch.
2. Revisit the space and asterisk issues later.

I'll update the Chrome Status entry to reflect that.

According to https://bugs.chromium.org/p/chromium/issues/detail?id=1248196#c4,
it seems the previous measurement had a noticeable performance impact.



--
Hayato

Chris Harrelson

Aug 30, 2023, 11:14:02 AM
to Hayato Ito, Yoav Weiss, Philip Jägenstedt, Jiacheng Guo, Eli Grey, blink-dev, dom...@chromium.org

Sounds good, consider this still approved.