Intent to Ship: percent-encode U+0020 SPACE when in URLs computed by custom protocol handlers

Frédéric Wang

Jul 29, 2020, 12:27:17 PM
to blink-dev
Contact emails
fw...@chromium.org

Explainer

Spec
https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers

TAG review
Not needed since it's an existing spec discussed at WHATWG.

Summary
If you register a protocol handler with navigator.registerProtocolHandler
and follow a link for that protocol containing a space, then the space
will be escaped as "%20" instead of "+".

This makes it consistent with the other browser implementing it (Firefox).
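
As a rough illustration of the difference, here is a minimal JavaScript sketch. The helper `translateUrl`, the handler URL on example.com and the `web+demo` scheme are hypothetical names chosen for this example, and `encodeURIComponent` only approximates the percent-encode set used by the HTML spec; this is not Chrome's actual implementation.

```javascript
// Hypothetical helper: substitute the target URL into a handler template.
// `spaceAs` selects how U+0020 is written: "%20" (spec, Firefox) or "+"
// (the behavior this intent proposes to change).
function translateUrl(template, targetUrl, spaceAs = "%20") {
  // encodeURIComponent always emits "%20" for a space and "%2B" for a
  // literal "+", so rewriting "%20" afterwards is unambiguous.
  let encoded = encodeURIComponent(targetUrl);
  if (spaceAs === "+") {
    encoded = encoded.replace(/%20/g, "+");
  }
  // A function replacer avoids special "$" patterns being interpreted.
  return template.replace("%s", () => encoded);
}

const handlerTemplate = "https://example.com/myhandler.html?url=%s";
const clickedLink = "web+demo:hello world";

console.log(translateUrl(handlerTemplate, clickedLink, "+"));
// https://example.com/myhandler.html?url=web%2Bdemo%3Ahello+world
console.log(translateUrl(handlerTemplate, clickedLink));
// https://example.com/myhandler.html?url=web%2Bdemo%3Ahello%20world
```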

Link to “Intent to Prototype” blink-dev discussion
None

Risks
Interoperability and Compatibility
* Interop: Low. Only one other browser (Firefox) implements custom protocol
handlers, and it percent-encodes spaces as %20. This is the behavior
in the current spec, and the proposal at
https://github.com/whatwg/html/issues/3377 does not change it.

* Compat: Low. The change is observable by checking the URL string of the
redirection page. Chrome currently encodes a space as "+", but the string
with "%20" instead will load exactly the same page.

Gecko: Positive
(https://github.com/whatwg/html/issues/3377#issuecomment-625102116)

Shipped.

WebKit: N/A

Does not apply since registerProtocolHandler is not implemented and
there is no such plan in WebKit.

Web developers: Positive (https://github.com/whatwg/html/issues/3377)
Reported by web developers to the WHATWG tracker.

Will this feature be supported on all six Blink platforms (Windows, Mac,
Linux, Chrome OS, Android, and Android WebView)?
No
All platforms except those that don't support custom protocol handlers:

https://bugs.chromium.org/p/chromium/issues/detail?id=178097
https://bugs.chromium.org/p/chromium/issues/detail?id=589502

Is this feature fully tested by web-platform-tests?
Yes
Manual WPT tests are being added in
https://github.com/web-platform-tests/wpt/pull/23504 which unfortunately
we cannot convert to automated WPT tests with the existing testing
infrastructure. This has to be verified using browser tests instead.

Tracking bug
https://bugs.chromium.org/p/chromium/issues/detail?id=1110842

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5678518908223488

This intent message was generated by Chrome Platform Status.

--
Frédéric Wang

Mike West

Jul 30, 2020, 2:32:54 PM
to blink-dev, fw...@igalia.com
LGTM1. This seems like another good, small change to align our implementation with the spec and Gecko. I'm happy to see it land.

-mike

Chris Harrelson

Jul 30, 2020, 2:46:24 PM
to Mike West, blink-dev, fw...@igalia.com
What if sites currently expect + and fail to parse %20? How often does that happen? Do we have evidence that they will not break?
 

This does not count as a positive signal from Gecko. You should instead say "Shipped" (because Gecko ships this behavior), and link to the WHATWG issue as additional data points indicating engagement on this spec issue specifically.
 

Shipped.

WebKit: N/A  

Does not apply since registerProtocolHandler is not implemented and
there is no such plan in WebKit.

They may implement in the future, so WebKit's opinion is applicable. You could instead say No Signals if you felt this did not rise to the level of asking on webkit-dev. Did they say explicitly that they don't want to implement registerProtocolHandler? If not, I think it could be useful to ask this question on webkit-dev.
 


--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/c7f8cab4-0912-4faf-9205-0522bc528aa1n%40chromium.org.

Alex Russell

Jul 30, 2020, 3:32:45 PM
to blink-dev, fw...@igalia.com
On Wednesday, July 29, 2020 at 9:27:17 AM UTC-7 fw...@igalia.com wrote:
Contact emails
fw...@chromium.org

Explainer

Spec
https://html.spec.whatwg.org/multipage/system-state.html#custom-handlers

TAG review
Not needed since it's an existing spec discussed at WHATWG.

Without commenting one way or the other on the content of the intent, this is not a valid reason to avoid seeking TAG review.

Daniel Bratell

Jul 30, 2020, 3:45:05 PM
to Chris Harrelson, Mike West, blink-dev, fw...@igalia.com

fwang, this looks like a minor and simple change, but could you please give me some background to why we're doing it? Just to get me into the loop.

/Daniel

Frédéric Wang

Jul 30, 2020, 4:29:51 PM
to Alex Russell, blink-dev
On 30/07/2020 21:32, Alex Russell wrote:

TAG review
Not needed since it's an existing spec discussed at WHATWG.

Without commenting one way or the other on the content of the intent, this is not a valid reason to avoid seeking TAG review.
 

Right, sorry it's not well formulated. I meant it's a very tiny change, it's already the behavior defined by the HTML5 specification and the ongoing interop changes discussed at the WHATWG are leaning to a consensus of keeping that behavior: https://github.com/whatwg/html/issues/3377 ; so it does not seem extra TAG review is needed here.

-- 
Frédéric Wang

Frédéric Wang

Jul 30, 2020, 4:40:28 PM
to blin...@chromium.org
Hi,

Sure, a user reported 3 years ago that Chrome was not following the specification for percent-encoding and proposed a patch: https://bugs.chromium.org/p/chromium/issues/detail?id=795919 ; however, we didn't really think this was valid and discussion continued at the WHATWG: https://github.com/whatwg/html/issues/3377

Recently, Anne van Kesteren reactivated this issue to ensure interoperable handling between Chrome and Firefox, improving the HTML5 specification and adding manual tests. The conclusion of his analysis is here:


Interest from at least two implementers is needed before going ahead with the HTML5 changes, so several Chromium developers were privately asked to comment on this and to indicate whether there is support and whether people can commit to making the required changes. Since I'm currently working on registerProtocolHandler improvements, I was put in the loop and submitted patches for Chrome.

Frédéric Wang

Jul 30, 2020, 5:27:00 PM
to blin...@chromium.org
On 30/07/2020 20:46, Chris Harrelson wrote:

* Compat: Low. This is observable by checking the URL string of the
redirection page. Current Chrome encodes space as "+" but the string
with a "%20" instead will load example the same page.

What if sites currently expect + and fail to parse %20? How often does that happen? Do we have evidence that they will not break?

First, I believe that using "+" for spaces is generally not a good idea for URLs, so it's possible that websites using registerProtocolHandler that work in Firefox currently don't work in Chrome.

+ is however the default for application/x-www-form-urlencoded forms:

https://people.igalia.com/fwang/form_url_with_space.html

https://people.igalia.com/fwang/form_url_with_space.html?variable=a+b

https://people.igalia.com/fwang/form_url_with_space.html?variable=a%20b

If someone parses the query string with URLSearchParams, there should not be any problem: both forms will be interpreted as a space.

If someone uses a regexp or similar to parse location.search, there could be some breakage depending on how it is implemented.
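
To make the distinction concrete, here is a minimal JavaScript sketch comparing URLSearchParams with two hand-rolled parsers. The function names are hypothetical and the hand-rolled variants are only examples of what a handler page might do, not code taken from any real site.

```javascript
// URLSearchParams applies form-urlencoded decoding:
// both "+" and "%20" become a space.
function readWithSearchParams(search) {
  return new URLSearchParams(search).get("variable");
}

// Hand-rolled variant A: percent-decodes, but leaves "+" untouched.
function readDecodeOnly(search) {
  const match = /[?&]variable=([^&]*)/.exec(search);
  return match ? decodeURIComponent(match[1]) : null;
}

// Hand-rolled variant B: only maps "+" to a space and never
// percent-decodes. This is the kind of parser that would start
// seeing "%20" literally after the change.
function readPlusOnly(search) {
  const match = /[?&]variable=([^&]*)/.exec(search);
  return match ? match[1].replace(/\+/g, " ") : null;
}

for (const search of ["?variable=a+b", "?variable=a%20b"]) {
  console.log(search, {
    searchParams: readWithSearchParams(search),
    decodeOnly: readDecodeOnly(search),
    plusOnly: readPlusOnly(search),
  });
}
```

Only the URLSearchParams version returns "a b" for both inputs; each hand-rolled variant handles one encoding and passes the other through unchanged.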

If that's a concern, maybe we can add a counter to measure how often ProtocolHandler::TranslateUrl is called with a URL containing a space, in order to get an upper bound. I'm not sure we can easily measure how many pages actually assume the translated URL uses a + for spaces.

 

This does not count as a positive signal from Gecko. You should instead say "Shipped" (because Gecko ships this behavior), and link to the WHATWG issue as additional data points indicating engagement on this spec issue specifically.

Correct.

 

Shipped.

WebKit: N/A  

Does not apply since registerProtocolHandler is not implemented and
there is no such plan in WebKit.

They may implement in the future, so WebKit's opinion is applicable. You could instead say No Signals if you felt this did not rise to the level of asking on webkit-dev. Did they say explicitly that they don't want to implement registerProtocolHandler? If not, I think it could be useful to ask this question on webkit-dev.
I think there were concerns in the 2015 webkit-dev discussions. Actually, checking again what they said when I raised this topic in April, it seems this was a bit more positive:

https://lists.webkit.org/pipermail/webkit-dev/2020-April/031179.html

-- 
Frédéric Wang

Yoav Weiss

Aug 3, 2020, 1:28:14 AM
to Frédéric Wang, blink-dev
Getting an upper bound for potential breakage from either use-counters or HTTPArchive sounds like a good idea.

Frédéric Wang

Aug 3, 2020, 8:57:18 AM
to Yoav Weiss, blink-dev
On 03/08/2020 07:27, Yoav Weiss wrote:
> Getting an upper bound for potential breakage from either use-counters
> or HTTPArchive sounds like a good idea.
Hi Yoav,

So regarding
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/osabCTBhDSs
one way to check URLs that could be affected would be to do a
HTTPArchive search with a regexp like (in JavaScript):
/["'](bitcoin|geo|im|irc|ircs|magnet|mailto|mms|news|nntp|openpgp4fpr|sip|sms|smsto|ssh|tel|urn|webcal|wtai|xmpp|web\+)[^"']+ [^"']*["']/

I don't know if I can do that myself with
https://console.cloud.google.com/bigquery though. Is it even possible
for non-Googlers? If not, can someone please help?
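
For reference, the regexp above can be tried locally on a few sample strings before running it on HTTPArchive. This is only a sketch: it assumes the line break in the regexp stands for a single literal space, and the helper name `mayBeAffected` and the sample markup are made up for illustration.

```javascript
// Scheme list copied from the regexp above ("web\\+" is a literal "web+").
const schemes =
  "bitcoin|geo|im|irc|ircs|magnet|mailto|mms|news|nntp|openpgp4fpr|" +
  "sip|sms|smsto|ssh|tel|urn|webcal|wtai|xmpp|web\\+";

// Assumption: the wrapped line break in the original stands for one space.
const candidate = new RegExp(`["'](${schemes})[^"']+ [^"']*["']`);

// Hypothetical helper: true if the text contains a quoted string that starts
// with one of the schemes and has a space somewhere inside it.
function mayBeAffected(text) {
  return candidate.test(text);
}

const samples = [
  '<a href="mailto:someone@example.com?subject=hello world">',
  '<a href="mailto:someone@example.com">',
  "<a href='tel:+33 1 23 45 67 89'>",
  '<a href="https://example.com/a b">',
];
for (const sample of samples) {
  console.log(mayBeAffected(sample), sample);
}
```

Note that the pattern does not require a ":" after the scheme name, so it can also match unrelated quoted strings; it gives an upper bound rather than an exact count.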

Relying on a use counter will probably take more time to get an answer.
The place where the URL translation happens
(https://chromium-review.googlesource.com/c/chromium/src/+/2324126/3/chrome/common/custom_handlers/protocol_handler.cc)
is outside blink / the web process, so this is something new to me and I
will need to figure out where/how to properly add the counter here.
However, that will probably be more reliable than a HTTPArchive search.

--
Frédéric Wang

Yoav Weiss

Aug 3, 2020, 9:19:04 AM
to Frédéric Wang, blink-dev
On Mon, Aug 3, 2020 at 2:57 PM Frédéric Wang <fw...@igalia.com> wrote:
On 03/08/2020 07:27, Yoav Weiss wrote:
> Getting an upper bound for potential breakage from either use-counters
> or HTTPArchive sounds like a good idea.
Hi Yoav,

So regarding
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/osabCTBhDSs
one way to check URLs that could be affected would be to do a
HTTPArchive search with a regexp like (in JavaScript):
/["'](bitcoin|geo|im|irc|ircs|magnet|mailto|mms|news|nntp|openpgp4fpr|sip|sms|smsto|ssh|tel|urn|webcal|wtai|xmpp|web\+)[^"']+ [^"']*["']/

I don't know I can do that myself with
https://console.cloud.google.com/bigquery though, is it even possible
for non-Googlers? If not, can someone please help?

I can help with HA. Reaching out offline.


Frédéric Wang

Aug 5, 2020, 10:59:16 AM
to blin...@chromium.org
Hi,

I've been working with Yoav on evaluating the risk. As a reminder, a
backward compatibility issue can happen only if all of the following steps occur:

1. A page registers something like
navigator.registerProtocolHandler("scheme", "myhandler.html?url=%s") for
any scheme allowed by HTML5.

2. A page on the same origin redirects to a URL with the registered
scheme, containing a space, e.g. "scheme:something with space"

3. The registered handler receives that data and is not able to properly
deal with the spaces (e.g. does not use URLSearchParams to handle both
"+" and "%20").

We have executed queries on
httparchive.response_bodies.2020_07_01_mobile, which contains about 5M
pages, in order to evaluate steps 1 and 2. This would be
better evaluated with a use counter, but given the results below, that does
not seem necessary.

httparchive.scratchspace.custom_protocol_use_with_page are URLs
containing strings with a custom scheme prefix and spaces inside. There
are quite a significant number of results, ~503000 URLs from ~406000
pages (8%). A lot are mailto, tel or sms which indeed can contain spaces.

httparchive.scratchspace.custom_protocol_registrations_including_page
are URLs with a call to registerProtocolHandler. This is quite small:
479 resources from 442 pages (< 0.01%). Again, all but 62 resources
are actually mailto.

Intersecting the two, we get only 23 different pages (< 0.001%), which
all look like webmail clients and indeed only use schemes like "mailto",
"xmpp" or a generic "urn". The space is actually due to passing
JavaScript variables built with + concatenation.

The last point suggests that people could indeed build URLs containing
spaces in JavaScript or outside the source, so these were probably not
counted in scratchspace.custom_protocol_use_with_page nor in the
intersection. However, the result of
custom_protocol_registrations_including_page seems enough by itself to show
that it is safe to go ahead with this change.

--
Frédéric Wang

Yoav Weiss

Aug 6, 2020, 11:36:43 AM
to Frédéric Wang, blink-dev
Thanks for the detailed analysis.

LGTM1

Chris Harrelson

Aug 6, 2020, 11:46:50 AM
to Yoav Weiss, Frédéric Wang, blink-dev

Daniel Bratell

Aug 6, 2020, 2:19:40 PM
to Chris Harrelson, Yoav Weiss, Frédéric Wang, blink-dev

LGTM3 (which I think makes four LGTM in total)

/Daniel
