Intent to Remove: Non-ASCII Case-insensitive equivalents for ASCII keywords

Frédéric Wang

Dec 16, 2019, 4:09:22 AM
to blink-dev
Primary eng (and PM) emails

fw...@igalia.com


Summary


Remove support for writing ASCII keywords using non-ASCII Case-insensitive equivalents, such as:


<link rel="ſtylesheet"> (LATIN SMALL LETTER LONG S)

window.getSelection().modify('extend', "bacKward", "character") (KELVIN SIGN)


Most web platform specifications rely on ASCII case-insensitiveness: two strings are equivalent if they are equal after lowercasing all the characters in the range A-Z [1]. Unicode extends this "A-Z downcasing" in different ways [2], so that e.g. "¡FRÉDÉRIC NO ES ESPAÑOL!" is case-insensitively equal to "¡Frédéric no es Español!". Some web platform specifications use this where it matters, e.g. for CSS font families [3].
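To make the distinction concrete, here is a minimal Python sketch (not Blink code). The helper names are made up for illustration, and Python's `str.casefold()` is used as a stand-in for Unicode full case folding:

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the characters in the range A-Z; leave everything else untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def equal_ignoring_ascii_case(a: str, b: str) -> bool:
    """ASCII case-insensitive comparison in the sense described above (hypothetical helper)."""
    return ascii_lower(a) == ascii_lower(b)


def equal_ignoring_unicode_case(a: str, b: str) -> bool:
    """Unicode case-insensitive comparison, approximated with full case folding."""
    return a.casefold() == b.casefold()


long_s_keyword = "\u017ftylesheet"   # starts with LATIN SMALL LETTER LONG S
kelvin_keyword = "bac\u212award"     # contains KELVIN SIGN

print(equal_ignoring_ascii_case("StyleSheet", "stylesheet"))
print(equal_ignoring_ascii_case(long_s_keyword, "stylesheet"))
print(equal_ignoring_unicode_case(long_s_keyword, "stylesheet"))
print(equal_ignoring_unicode_case(kelvin_keyword, "backward"))
```

Under the ASCII rule only the first comparison matches; under the Unicode rule the long S and Kelvin sign variants match as well, which is the behavior this intent proposes to remove for ASCII keywords.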


Since most web platform specifications use ASCII keywords (or even alphanumeric with a few special characters like dashes), it seems fine to restrict to ASCII case-insensitiveness. However, if you look carefully at [2], you'll find some mappings from non-ASCII to ASCII [4]. The simple mappings are KELVIN SIGN (mapped to "k") and LATIN SMALL LETTER LONG S (mapped to "s").


[5] is the general Chromium bug for moving to ASCII case-insensitiveness. From a quick look at DeprecatedEqualIgnoringCase [6], I see three different versions. IIUC, all of them use "non-Turkic case folding". The first one uses "full case folding" (which includes more mappings to ASCII, e.g. "LATIN SMALL LIGATURE ST" to "st") but the others just use "simple case folding":

- Comparison between two strings with 16-bit chars (UTF-16)

- Comparison between two strings with 8-bit chars (Latin-1 block)

- Comparison between 8-bit and 16-bit versions.
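As an illustration of the one-to-one versus one-to-many mappings mentioned above, the non-ASCII code points whose case folding lands entirely in ASCII can be listed with a short Python sketch. It relies on `str.casefold()`, which applies full (non-Turkic) case folding; the function name is hypothetical, and the exact result depends on the Unicode data bundled with the interpreter:

```python
import sys


def non_ascii_folding_to_ascii():
    """Split non-ASCII code points that fold to pure ASCII into 1:1 and 1:many groups."""
    one_to_one = {}
    one_to_many = {}
    for code_point in range(0x80, sys.maxunicode + 1):
        ch = chr(code_point)
        folded = ch.casefold()
        if folded != ch and folded.isascii():
            target = one_to_one if len(folded) == 1 else one_to_many
            target[ch] = folded
    return one_to_one, one_to_many


one_to_one, one_to_many = non_ascii_folding_to_ascii()
for ch, folded in sorted(one_to_one.items()):
    print(f"U+{ord(ch):04X} -> {folded!r}")
for ch, folded in sorted(one_to_many.items()):
    print(f"U+{ord(ch):04X} -> {folded!r}")
```

The one-to-one group is where KELVIN SIGN and LATIN SMALL LETTER LONG S show up, while ligatures such as LATIN SMALL LIGATURE ST only appear in the one-to-many group, so a comparison limited to one-to-one folding never sees them.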


This intent is about moving to ASCII case-insensitiveness for pre-defined ASCII keywords. In the Blink code, they are typically hardcoded 8-bit strings and use the "simple case folding", so keywords containing "K" and "S" are the only ones affected. I already landed a patch for <link rel="stylesheet"> [7], so I felt I should bring this up to blink-dev.


[1] https://infra.spec.whatwg.org/#ascii-case-insensitive

[2] ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt

[3] https://drafts.csswg.org/css-fonts/#localized-name-matching

[4] https://github.com/w3c/csswg-drafts/issues/4599#issuecomment-565794132

[5] https://bugs.chromium.org/p/chromium/issues/detail?id=627682

[6] https://source.chromium.org/search?q=deprecatedEqualIgnoringCase%20filepath:third_party%2Fblink%2F

[7] https://chromium-review.googlesource.com/c/chromium/src/+/1963850


Motivation

- Follow web platform specifications (which use ASCII case insensitiveness).

- Align with what WebKit and Gecko implement.

- Make Chromium usage consistent (some features like CSS colors already use ASCII case insensitiveness).


Interoperability and Compatibility Risk

The interoperability risk seems low: HTML5 uses ASCII case-insensitiveness, and that's already the case for most CSS specifications too ( https://github.com/w3c/csswg-drafts/issues/4599#issuecomment-565816911 ). I haven’t checked the status and position of other browsers, but at least <link rel="ſtylesheet"> is not supported in WebKit or Gecko.


The compatibility risk seems low too, as these are really edge cases. Regressions would happen only if someone uses the Kelvin sign or the long S in place of “k” and “s” when writing ASCII keywords, which seems really unlikely.


Alternative implementation suggestion for web developers

The alternative is to use the ASCII equivalents, which are actually easier to write.


Usage information from UseCounter

No


Entry on the feature dashboard

Not needed, it’s a very tiny change.

-- 
Frédéric Wang

Yoav Weiss

Dec 16, 2019, 4:20:58 AM
to Frédéric Wang, blink-dev
LGTM1 assuming you're also adding WPTs as part of the landing patches

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/d3f909b3-dc61-dcac-9488-ab490d444c4c%40igalia.com.

Mike West

Dec 16, 2019, 7:22:40 AM
to Yoav Weiss, Frédéric Wang, blink-dev
LGTM2.

I see ~400 calls to `WTF::DeprecatedEqualIgnoringCase()`, which is kind of a lot. Any chance it would be safe to simply `s/DeprecatedEqualIgnoringCase/EqualIgnoringASCIICase/g`?

-mike


Frédéric Wang

Dec 16, 2019, 7:31:43 AM
to Mike West, Yoav Weiss, blink-dev
I think most of them are of the form DeprecatedEqualIgnoringCase(str, "AHardcodedString") or DeprecatedEqualIgnoringCase(str, AConstantString), so it should be safe to just replace them. Moreover, there won't be any behavior change as long as the string compared against does not contain an S or a K letter. This is basically what I'm proposing here.
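A rough Python sketch of that reasoning, assuming the comparison uses a one-to-one case folding in which only KELVIN SIGN and LONG S fold to ASCII letters. The function name and the keyword list are hypothetical and only serve as an example:

```python
# ASCII letters that some non-ASCII character folds to under one-to-one case folding
# (assumption: only "s" via LONG S and "k" via KELVIN SIGN).
AFFECTED_LETTERS = frozenset("sk")


def replacement_changes_behavior(literal: str) -> bool:
    """Return True if an ASCII-only comparison could give different results for this literal."""
    return any(ch.lower() in AFFECTED_LETTERS for ch in literal)


for keyword in ["stylesheet", "backward", "hidden", "none", "keep-alive"]:
    status = "needs a closer look" if replacement_changes_behavior(keyword) else "no visible change"
    print(f"{keyword}: {status}")
```

Literals flagged by such a check are the ones where a mechanical replacement is web-visible; for the rest, the replacement is a pure refactoring.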

Other cases would probably need a more careful analysis. We definitely don't want to do that for CSS font families, and there are other cases that probably need to be clarified by spec editors (for CSS: https://github.com/w3c/csswg-drafts/issues/4599#issuecomment-565816911 ).
-- 
Frédéric Wang

Daniel Bratell

Dec 16, 2019, 8:36:38 AM
to Frédéric Wang, Mike West, Yoav Weiss, blink-dev

LGTM3 - good catch

Since WebKit and Blink differ, I guess they have already made the change successfully, or this was introduced by accident.

/Daniel

Frédéric Wang

Dec 16, 2019, 8:44:00 AM
to blin...@chromium.org
https://bugs.chromium.org/p/chromium/issues/detail?id=627682 mentions "WebKit has already switched almost all callers over to ASCII." although it does not provide any link and I was not able to find anything on webkit-dev. I would need to check the code & history to be sure.

Frédéric Wang

Jan 8, 2020, 7:14:25 AM
to blin...@chromium.org
On 16/12/2019 10:20, Yoav Weiss wrote:
LGTM1 assuming you're also adding WPTs as part of the landing patches

@Yoav Just to clarify, I understand you mean adding WPT tests for actual behavior changes? As Mike pointed out, there are a lot of calls and while a search & replace is easy, writing tests will be a bit more work. However, as I mentioned most of the keywords don't contain any S or K so the change is not visible at all in these cases. Other things that can be affected like DOMSelection.modify (https://developer.mozilla.org/en-US/docs/Web/API/Selection/modify) don't seem to be part of any web standard, so I'm not sure we should write WPT tests for them (although it's probably good to write internal ones).

-- 
Frédéric Wang

Anne van Kesteren

Jan 8, 2020, 7:25:38 AM
to Frédéric Wang, blink-dev
On Wed, Jan 8, 2020 at 1:14 PM Frédéric Wang <fw...@igalia.com> wrote:
> Other things that can be affected like DOMSelection.modify (https://developer.mozilla.org/en-US/docs/Web/API/Selection/modify) don't seem to be part of any web standard, so I'm not sure we should write WPT tests for them (although it's probably good to write internal ones).

I'd recommend writing .tentative. WPT tests if you decide to write
tests. It definitely needs to be standardized as it's implemented in
all browsers. https://github.com/w3c/selection-api/issues/37 tracks
that.

Yoav Weiss

Jan 8, 2020, 8:21:59 AM
to Frédéric Wang, blink-dev
On Wed, Jan 8, 2020 at 1:14 PM Frédéric Wang <fw...@igalia.com> wrote:
On 16/12/2019 10:20, Yoav Weiss wrote:
LGTM1 assuming you're also adding WPTs as part of the landing patches

@Yoav Just to clarify, I understand you mean adding WPT tests for actual behavior changes? As Mike pointed out, there are a lot of calls and while a search & replace is easy, writing tests will be a bit more work. However, as I mentioned most of the keywords don't contain any S or K so the change is not visible at all in these cases.

Obviously, there's no need to add tests to cases where the change is not web visible.
For cases where it is, it would be good to have tests to make sure this area is consistent across implementations over time. 

I went over code search, and indeed found quite a few examples:
- HTML tags with 's'/'k', e.g. `<link>`, `<aside>` as well as others
- Attribute values:
  - `<link rel=stylesheet>` (which you mentioned)
  - `<link media=screen>`
  - `crossorigin="use-credentials"`
  - `<input type=week>`
  - `<textarea wrap=physical>`
  - `<frame scrolling=yes>`
  - `contenteditable=false`
  - `<link type="text/css">`
  - `<embed hidden=yes>`
  - `<form method=post>`
  - `draggable=false`
  - `<frameset frameBorder=yes>`
  - `<style type="text/css">`
  - `<button type=reset>`
- HTTP headers: expires, last-modified, transfer-encoding, keep-alive, x-frame-options, x-xss-protection
- "text/css" mime types

While it's admittedly a rather long list, for everything that's a tag or attribute value, maybe it can be enough to have a test page that includes those elements/attributes with the non-ASCII equivalent and checks their values were not parsed?
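One way to produce the inputs for such a test page is sketched below in Python. This is only an assumption about how the test data could be generated, not part of any landed patch; the function name is hypothetical. It replaces one "s" or "k" at a time with LATIN SMALL LETTER LONG S or KELVIN SIGN:

```python
LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S, folds to "s"
KELVIN = "\u212a"   # KELVIN SIGN, folds to "k"
SUBSTITUTES = {"s": LONG_S, "S": LONG_S, "k": KELVIN, "K": KELVIN}


def non_ascii_variants(keyword: str) -> list:
    """Return every variant of keyword with exactly one s/k swapped for its non-ASCII look-alike."""
    variants = []
    for index, ch in enumerate(keyword):
        if ch in SUBSTITUTES:
            variants.append(keyword[:index] + SUBSTITUTES[ch] + keyword[index + 1:])
    return variants


for keyword in ["stylesheet", "screen", "week", "post", "reset"]:
    print(keyword, [variant.encode("unicode_escape").decode("ascii") for variant in non_ascii_variants(keyword)])
```

Each generated variant still matches the original keyword under Unicode case folding, so a test can assert that the browser no longer treats it as the keyword.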

The tests for HTTP headers and mime-types may indeed be a bit more tricky, and require more understanding of what that code does.



 

Other things that can be affected like DOMSelection.modify (https://developer.mozilla.org/en-US/docs/Web/API/Selection/modify) don't seem to be part of any web standard, so I'm not sure we should write WPT tests for them (although it's probably good to write internal ones).


As Anne said, tentative WPTs seem like the way to go here.
 

-- 
Frédéric Wang

Rick Byers

Jan 8, 2020, 1:15:44 PM
to Yoav Weiss, Frédéric Wang, blink-dev
It seems reasonable to me to treat this more like a bugfix than a deprecation. I.e. assume sites won't be broken by depending on an edge case, but be on the lookout for any reports and react aggressively (and reconsider strategy if necessary) rather than work to proactively quantify the compat risk.

So I'd be supportive of taking some risk and trying out a fairly mechanical replacement of DeprecatedEqualIgnoringCase (otherwise these sorts of things tend to just linger forever out of fear of fixing them). But if we hit a couple real-world cases of breakage then we may need to scale back to a subset and/or add some metrics to be more scientific about it.

Rick

Joe Medley

Jan 9, 2020, 11:34:18 AM
to Rick Byers, Yoav Weiss, Frédéric Wang, blink-dev

Frédéric Wang

Jan 9, 2020, 5:55:53 PM
to blin...@chromium.org
On 09/01/2020 17:33, 'Joe Medley' via blink-dev wrote:
"Entry on the feature dashboard

Not needed, it’s a very tiny change."

Actually we let developers know about removals.

https://www.chromestatus.com/features#browsers.chrome.status%3A%22Deprecated%22

https://www.chromestatus.com/features#browsers.chrome.status%3A%22Removed%22

https://developers.google.com/web/updates/tags/deprecations

Joe Medley | Technical Writer, Chrome DevRel | jme...@google.com | 816-678-7195
If an API's not documented it doesn't exist.

@Joe: I followed documentation here https://docs.google.com/document/d/1Z7bbuD5ZMzvvLcUs9kAgYzd65ld80d5-p4Cl2a04bV0/edit#

"The feature dashboard is used to keep track of web-facing changes in Blink (and V8) that matter to developers. Make sure your change has an entry if you think it merits outreach to developers (e.g inclusion in the Chromium Blog Beta posts). If there’s no entry, please explain why you think this change doesn’t need one (e.g. “small change”, “fits under an existing entry”). You may be asked to create one."

As I explained, I believe that in practice nobody writes ASCII keywords with non-ASCII characters, and it's not supported by other browsers, so it does not seem important to inform users about this change.

However, I'm happy to create an entry if people feel it's needed.

-- 
Frédéric Wang

Joe Medley

Jan 10, 2020, 10:41:37 AM
to Frédéric Wang, blink-dev
At least send me the tracking bug so I'll know when to list it.

Joe Medley | Technical Writer, Chrome DevRel | jme...@google.com | 816-678-7195
If an API's not documented it doesn't exist.

Frédéric Wang

Jan 10, 2020, 3:14:34 PM
to blin...@chromium.org
On 10/01/2020 16:41, 'Joe Medley' via blink-dev wrote:
At least send me the tracking bug so I'll know when to list it.
Joe Medley | Technical Writer, Chrome DevRel | jme...@google.com | 816-678-7195
If an API's not documented it doesn't exist.

There is a more general tracking bug that I mentioned in my initial email, for moving callers to EqualIgnoringASCIICase, LowerASCII, UpperASCII:

https://bugs.chromium.org/p/chromium/issues/detail?id=627682

Do you want me to create a separate tracking bug for the particular web-exposed changes corresponding to this intent-to-remove?

-- 
Frédéric Wang

Frédéric Wang

Feb 25, 2020, 3:11:06 AM
to Delan Azabani, blink-dev, yo...@yoav.ws
On 24/02/2020 12:24, Delan Azabani wrote:
On 9 January, Rick Byers wrote:
So I'd be supportive of taking some risk and trying out a fairly mechanical replacement of DeprecatedEqualIgnoringCase (otherwise these sorts of things tend to just linger forever out of fear of fixing them). But if we hit a couple real-world cases of breakage then we may need to scale back to a subset and/or add some metrics to be more scientific about it.

 I’ve been working on this intent with that approach in mind, so after my first couple of patches [1][2] I put together a breakdown of Blink’s deprecated string operations:

Hi Delan,

Thanks for the very detailed breakdown.

          • 1 CSS syntax (needs further discussion)
          • 4 Chrome extension API
          • 1 Blink internal API

I guess these are a bit out of the scope of this intent-to. You could either send trivial replacement CLs without tests or just mention them in a summary comment on the broader tracking bug when you are done with the rest of the replacements.

      • 47 two arguments neither 8-bit literal
        • ? one of which is ASCII constant
      • 13 three arguments
What is the three-argument version of DeprecatedEqualIgnoringCase?
    • 46 DeprecatedUpper
      • ? result compared against ASCII constant
    • 0 DeprecatedLower
I think this is the other way around: DeprecatedUpper was removed in https://codereview.chromium.org/2821633002
-- 
Frédéric Wang

Delan Azabani

Feb 26, 2020, 7:23:26 PM
to blink-dev, yo...@yoav.ws, fw...@igalia.com
On 9 January, Rick Byers wrote:
So I'd be supportive of taking some risk and trying out a fairly mechanical replacement of DeprecatedEqualIgnoringCase (otherwise these sorts of things tend to just linger forever out of fear of fixing them). But if we hit a couple real-world cases of breakage then we may need to scale back to a subset and/or add some metrics to be more scientific about it.

 I’ve been working on this intent with that approach in mind, so after my first couple of patches [1][2] I put together a breakdown of Blink’s deprecated string operations:
  • 184 deprecated string operations (crbug 627682 [3])
    • 138 DeprecatedEqualIgnoringCase
      • 78 8-bit ASCII literal (roughly this intent but not quite)
          • 1 CSS syntax (needs further discussion)
          • 3 DeprecatedEqual unit tests (skip)
        • 66 s/k (needs WPT where visible to the web platform)
          • 38 HTML attribute values
          • 3 HTML <!DOCTYPE> identifiers
          • 1 CSS syntax
          • 7 other web platform features
          • 6 high-level fetching system
          • 4 Chrome extension API
          • 1 Blink internal API
          • 6 DeprecatedEqual unit tests (skip)
      • 47 two arguments neither 8-bit literal
        • ? one of which is ASCII constant
      • 13 three arguments
    • 46 DeprecatedUpper
      • ? result compared against ASCII constant
    • 0 DeprecatedLower
You can go to https://bucket.daz.cat/crbug-627682.html for the latest version with more detail. I’m currently working on migrating those 8-bit ASCII literal comparisons, with new WPT coverage where necessary and optimistic patches for the rest. Once those are done, I’ll skim the non-literal and DeprecatedUpper operations for anything that’s actually a comparison against an ASCII constant keyword, and I’ll fix those too. The rest should be fixed as part of the broader tracking bug.

Cheers,
Delan

Delan Azabani

Mar 18, 2020, 4:24:16 AM
to blink-dev, daza...@igalia.com, yo...@yoav.ws, DongJun Kim
On 25 February, Frédéric Wang wrote:

Thanks for the very detailed breakdown.

Happy to help!

I guess these are a bit out of the scope of this intent-to. You could either send trivial replacement CLs without tests or just mention them in a summary comment on the broader tracking bug when you are done with the rest of the replacements.

I’ve since discussed the CSS syntax comparisons with Yoav and uploaded a CL proposing that we make functions like EqualIgnoringASCIICase available by way of an implicit StringView constructor [1].

Many thanks to DongJun Kim, who migrated the comparisons around other web platform features, Chrome extension API, and Blink internal API [2][3].

All that remains for this intent is to migrate all other ASCII literal call sites [4] plus any relevant non-literal or DeprecatedLower call sites.

What is the three-argument version of DeprecatedEqualIgnoringCase?

There are a few that take a wtf_size_t length [5], but their only callers are other IgnoringCase utility functions. Such call sites are beyond the scope of this intent, but I’ve updated my analysis to help anyone working on the bug.
I think this is the other way around: DeprecatedUpper was removed in https://codereview.chromium.org/2821633002
     

Delan Azabani

Mar 27, 2020, 11:55:51 PM
to blink-dev
Now that my non-literal [1] and DeprecatedLower [2] patches have landed, this intent is complete! Many thanks to Frédéric Wang and DongJun Kim for their help.

This intent is only a subset of the broader tracking bug [3], which tracks the deprecation of all implicitly-Unicode string operations, rather than just comparisons against ASCII constants. Some additional work [4] is necessary to close the bug (note that these counts might not be current):
• 180 deprecated string operations
  • 48 DeprecatedLower
    • 33 remaining with unclear intent or not exclusively ASCII constant comparison
  • 132 DeprecatedEqualIgnoringCase
    • 41 two arguments neither 8-bit literal
      • 13 remaining with no clear evidence of ASCII constant argument
    • 13 three arguments
      • 1 used to implement FindIgnoringCaseInternal (string_impl.cc:887)
      • 4 used to implement StartsWithIgnoringCase (string_impl.cc:1083:1086:1090:1093)
      • 4 used to implement EndsWithIgnoringCase (string_impl.cc:1142:1145:1149:1152)
Cheers,
Delan


Joe Medley

Mar 30, 2020, 2:21:58 PM
to Delan Azabani, blink-dev
Can someone please create a Chrome Status entry for this?

Joe Medley | Technical Writer, Chrome DevRel | jme...@google.com | 816-678-7195
If an API's not documented it doesn't exist.

Frédéric Wang

Mar 31, 2020, 5:18:50 AM
to blin...@chromium.org
Hi Joe,

Sure, I was waiting for https://groups.google.com/a/chromium.org/forum/#!topic/blink-api-owners-discuss/oJsHeahMUfw but it seems the discussion has stalled. I'll create an entry today anyway.

Frédéric Wang

Mar 31, 2020, 8:04:28 AM
to blin...@chromium.org
On 31/03/2020 11:18, Frédéric Wang wrote:
Hi Joe,

Sure, I was waiting for https://groups.google.com/a/chromium.org/forum/#!topic/blink-api-owners-discuss/oJsHeahMUfw but it seems the discussion has stalled. I'll create an entry today anyway.

On 30/03/2020 20:21, 'Joe Medley' via blink-dev wrote:
Can someone please create a Chrome Status entry for this?
Joe Medley | Technical Writer, Chrome DevRel | jme...@google.com | 816-678-7195
If an API's not documented it doesn't exist.



There is also a google doc with more details of possible web-facing changes:


Please let me know if you need anything else.
-- 
Frédéric Wang
    