Intent to Implement and Ship: Apply Unicode flag to `pattern' attribute of INPUT element


TAMURA, Kent

May 23, 2016, 11:11:44 AM
to blink-dev
tk...@chromium.org
https://html.spec.whatwg.org/multipage/forms.html#the-pattern-attribute
> ... when compiled as a JavaScript regular expression with only the "u" flag specified, ...

Apply the Unicode flag to the `pattern' attribute value. Syntax checking will be stricter, and '.' will match a whole surrogate pair.

Before this change, '.' in a pattern attribute value matches a single UTF-16 code unit. That is, authors need to write '..' to match an emoji character outside the BMP, which is represented by two UTF-16 code units. Emoji characters are increasingly popular, and the '..' hack should no longer be necessary.
With the Unicode flag, '.' will match a single Unicode code point: any single character, whether a BMP character or a surrogate pair such as an emoji.
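To make the '.' change concrete, here is a minimal TypeScript sketch using the standard RegExp constructor. It assumes the pattern is anchored as ^(?:pattern)$ so that the whole value must match; the helper name compilePattern is hypothetical and is not a Blink or spec API.

```typescript
// Hypothetical helper: compile a pattern attribute value anchored so that
// the entire value must match. `unicode` toggles the "u" flag so the old
// and new behavior can be compared side by side.
function compilePattern(pattern: string, unicode: boolean): RegExp {
  return new RegExp(`^(?:${pattern})$`, unicode ? "u" : "");
}

// U+1F600 GRINNING FACE, outside the BMP: stored as a surrogate pair.
const emoji = "\uD83D\uDE00";
// U+00E9 LATIN SMALL LETTER E WITH ACUTE, a single BMP code unit.
const bmpChar = "\u00E9";

console.log(emoji.length); // 2

// Without "u", '.' matches one UTF-16 code unit.
console.log(compilePattern(".", false).test(emoji));  // false
console.log(compilePattern("..", false).test(emoji)); // true (the '..' hack)

// With "u", '.' matches one code point.
console.log(compilePattern(".", true).test(emoji));   // true
console.log(compilePattern("..", true).test(emoji));  // false

// BMP characters match '.' either way.
console.log(compilePattern(".", false).test(bmpChar)); // true
console.log(compilePattern(".", true).test(bmpChar));  // true
```

Pages that relied on the '..' hack for such characters are exactly the ones whose accepted strings change.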

Firefox: Shipped
Edge: No public signals
Safari: No public signals
Web developers: No signals

Firefox has already shipped this in a stable release, and it does not appear to have been reverted.
AFAIK, WebKit and Edge don't implement this yet.

Compatibility risk
- Because of the '.' behavior change, a form may accept or reject a different set of strings.
BMP characters are not affected.
- Syntax checking for regular expressions will be stricter, e.g. unnecessary escaping such as \@ will be an error.
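The stricter syntax checking can be illustrated with a small TypeScript sketch. The helper compiles is hypothetical; it simply tries to build the anchored RegExp with and without the "u" flag. \@ is accepted as a legacy identity escape in non-Unicode mode but is a SyntaxError in Unicode mode, while an unclosed class such as [a-z is invalid in both modes.

```typescript
// Hypothetical helper: does this pattern attribute value compile as an
// anchored RegExp, with or without the "u" flag?
function compiles(pattern: string, unicode: boolean): boolean {
  try {
    new RegExp(`^(?:${pattern})$`, unicode ? "u" : "");
    return true;
  } catch (e) {
    // The RegExp constructor throws SyntaxError for invalid patterns.
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// In a JS string literal, "\\@" is the two characters \ and @.
console.log(compiles("\\@", false));  // true: legacy identity escape
console.log(compiles("\\@", true));   // false: invalid escape with "u"
console.log(compiles("@", true));     // true: the escape was unnecessary
console.log(compiles("[a-z", false)); // false: unterminated class
console.log(compiles("[a-z", true));  // false: unterminated class
```

In practice, the fix for an affected page is usually to drop the needless backslash.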

We have a use counter that covers all of the expected behavior changes:
https://www.chromestatus.com/metrics/feature/timeline/popularity/1264
At most 0.0008% of page views will be affected.


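Ongoing technical constraints
N/A

Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?
Yes

Link to entry on the feature dashboard
https://www.chromestatus.com/features/4753420745441280

Requesting approval to ship?
Yes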

--
TAMURA Kent
Software Engineer, Google


Domenic Denicola

May 23, 2016, 11:16:07 AM
to TAMURA, Kent, blink-dev
From: tk...@google.com [mailto:tk...@google.com] On Behalf Of TAMURA, Kent

> Compatibility risk
> - Syntax checking for regular expressions will be stricter. e.g. Unnecessary escaping like \@ will be an error.

To be a bit clearer about this point: "an error" in this case means that the regexp pattern will be considered invalid, and no checking will be applied to the value before it is sent to the server. Thus, it is the same as if the author had not specified a pattern="" attribute, or if the user had turned off JavaScript, or used devtools to bypass the pattern.

So this is a pretty minor risk IMO. Firefox helped mitigate it by adding a console warning for invalid patterns, which is probably a good idea anyway (e.g. for invalid patterns like "[a-z" missing the closing bracket). But maybe that is not so necessary for 0.0008% of page views.
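What "an error" means here can be modeled in a few lines of TypeScript: a pattern that fails to compile simply imposes no constraint. This is a simplified sketch, not Blink's implementation; the function name patternMismatch is borrowed from the ValidityState.patternMismatch flag, it ignores the multiple attribute, and the warning text is made up to mimic the kind of console warning Firefox added.

```typescript
// Simplified model (not Blink's code) of pattern validation for a
// single-valued <input>. Returns true when the value should be flagged
// as a pattern mismatch.
function patternMismatch(pattern: string, value: string): boolean {
  let re: RegExp;
  try {
    re = new RegExp(`^(?:${pattern})$`, "u");
  } catch (e) {
    // An invalid pattern is ignored, as if no pattern attribute were set.
    // Hypothetical warning text, in the spirit of Firefox's console message.
    console.warn(`Ignoring invalid pattern attribute: ${pattern}`);
    return false;
  }
  // An empty value is never a pattern mismatch.
  if (value === "") return false;
  return !re.test(value);
}

console.log(patternMismatch("[0-9]+", "abc")); // true: value rejected
console.log(patternMismatch("\\@", "abc"));    // false: pattern ignored
console.log(patternMismatch("[a-z", "123"));   // false: pattern ignored
```

Under this model, an author's pattern that becomes invalid stops rejecting anything, which matches the "same as no pattern attribute" reading above.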

PhistucK

May 23, 2016, 12:03:17 PM
to Domenic Denicola, TAMURA, Kent, blink-dev

On Mon, May 23, 2016 at 6:15 PM, Domenic Denicola <d...@domenic.me> wrote:
> or if the user had turned off JavaScript

No, turning off JavaScript does not turn off HTML5 validation. Disable JavaScript in the Developer Tools, open the following URL, enter 8 (or anything else that does not match) in the input field and press Enter:
data:text/html,<!doctype html><form><input name="d" pattern="090" required><input type=submit></form>
This is actually one of the benefits of HTML5 validation: it does not need JavaScript.

> adding a console warning for invalid patterns, which is probably a good idea anyway
I agree.
I also wish HTML errors (unmatched tags and more) and CSS errors showed up in the console, as they once did (HTML errors did, at least). Off topic, though.



PhistucK

Rick Byers

May 25, 2016, 11:13:14 PM
to PhistucK, Domenic Denicola, TAMURA, Kent, blink-dev
LGTM1

Chris Harrelson

May 26, 2016, 1:16:27 PM
to Rick Byers, PhistucK, Domenic Denicola, TAMURA, Kent, blink-dev
LGTM2


Philip Jägenstedt

Jun 1, 2016, 9:22:04 AM
to Chris Harrelson, Rick Byers, PhistucK, Domenic Denicola, TAMURA, Kent, blink-dev
LGTM3