Intent to Ship: Intl BestFitMatcher using ICU LocaleMatcher


Frank Tang

Jun 10, 2021, 3:10:52 AM
to blink-dev, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens

Contact emails

ft...@google.com

Specification

https://tc39.es/ecma402/#sec-bestfitmatcher

Design docs


https://docs.google.com/document/d/1cPGfiihn76yj2iAomKcspPFyLLcnk3WkCiqceBQPQyk/edit#

Summary

Use the ICU LocaleMatcher to implement the ECMA-402 BestFitMatcher in the V8 JavaScript engine. ECMA-402 defines the BestFitMatcher abstract operation so that implementations can provide a better way to match locale data. UTS #35, section 4.4 "Language Matching", details a data-driven algorithm based on CLDR. ICU 67.1 (released in April 2020) comes with an improved icu::LocaleMatcher API and implementation. The design doc shows how we implement V8's BestFitMatcher on top of that API.



Blink component

Blink>JavaScript>Internationalization

TAG review



TAG review status

Not applicable

Risks



Interoperability and Compatibility

The locales affected by this better locale matcher are less commonly used ones, so the risk from changed matching results is small. Commonly used locales are not affected, since Chrome already ships data for them. For example, before this launch the "ab" (Abkhazian) locale falls back to the system locale, most likely "en" (English); after the launch it will fall back to "ru" (Russian), because Chrome ships Russian locale data and Abkhazian speakers are more likely to understand Russian than the language of the system locale.
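To make the Abkhazian example concrete, here is a toy model of the two fallback strategies. The shipped-locale set and the `ab`→`ru` / `co`→`fr` table are hypothetical stand-ins for Chrome's packaged data and for CLDR; the real matcher is icu::LocaleMatcher, which works from CLDR language-matching data rather than a hand-written table. The point of the sketch is only that best fit adds a step before the default-locale fallback:

```javascript
// Hypothetical data for illustration only: a few "shipped" locales and a
// hand-written language fallback table standing in for CLDR data.
const SHIPPED = new Set(["en", "en-US", "fr", "ru"]);
const LANGUAGE_FALLBACK = new Map([
  ["ab", "ru"], // Abkhazian -> Russian (the example from this intent)
  ["co", "fr"], // Corsican -> French
]);

// Lookup-style: strip trailing subtags until a shipped locale matches,
// otherwise return the given default (the system locale).
function lookupFallback(tag, defaultLocale) {
  for (let candidate = tag; ; ) {
    if (SHIPPED.has(candidate)) return candidate;
    const pos = candidate.lastIndexOf("-");
    if (pos === -1) return defaultLocale;
    candidate = candidate.slice(0, pos);
  }
}

// Best-fit-style (toy): same as lookup, but consult the language fallback
// table before giving up and using the default.
function bestFitFallback(tag, defaultLocale) {
  const direct = lookupFallback(tag, undefined);
  if (direct !== undefined) return direct;
  const alternative = LANGUAGE_FALLBACK.get(tag.split("-")[0]);
  if (alternative !== undefined && SHIPPED.has(alternative)) return alternative;
  return defaultLocale;
}

console.log(lookupFallback("ab", "en-US"));  // en-US
console.log(bestFitFallback("ab", "en-US")); // ru
console.log(bestFitFallback("xx", "en-US")); // en-US, same as lookup
```

When the table has nothing to offer, the toy best-fit path ends exactly where lookup does, at the default locale.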



Gecko: No signal

WebKit: No signal

Web developers: No signals

Ergonomics

The "best fit" localeMatcher is the default value for all Intl objects, so after the launch this behavior becomes the default for all ECMA-402 operations unless the caller passes {localeMatcher: "lookup"} to force the old behavior.
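A minimal sketch of the opt-out using only the standard `Intl.DateTimeFormat` constructor and `resolvedOptions()`. The helper name `resolvedLocale` is made up for the example, and the resolved values depend on the engine version and the locale data it ships, so no particular output is claimed:

```javascript
// Resolve the locale Intl.DateTimeFormat picks for a request,
// optionally forcing a particular localeMatcher.
function resolvedLocale(requested, localeMatcher) {
  const options = localeMatcher === undefined ? {} : { localeMatcher };
  return new Intl.DateTimeFormat(requested, options).resolvedOptions().locale;
}

// Omitting the option is the same as asking for "best fit".
const byDefault = resolvedLocale("co");
const bestFit = resolvedLocale("co", "best fit");
// Passing "lookup" opts back into the old truncation-based behavior.
const lookup = resolvedLocale("co", "lookup");

console.log({ byDefault, bestFit, lookup });
```

The same `localeMatcher` option is accepted by the other Intl service constructors (such as `Intl.NumberFormat` and `Intl.Collator`) and by their `supportedLocalesOf()` methods.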



Activation

The "best fit" localeMatcher is the default value for all Intl objects, so after the launch this behavior becomes the default for all ECMA-402 operations unless the caller passes {localeMatcher: "lookup"} to force the old behavior.



Security

There is no security risk in launching this.



Is this feature fully tested by web-platform-tests?

Yes

Flag name

--harmony_intl_best_fit_matcher

Tracking bug

https://bugs.chromium.org/p/v8/issues/detail?id=7051

Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5407573287108608

Links to previous Intent discussions

Ready for Trial: https://groups.google.com/a/chromium.org/g/blink-dev/c/W7TcX1tSHDI/m/1AthUhEWBAAJ


This intent message was generated by Chrome Platform Status.

Mike West

Jun 10, 2021, 4:42:43 AM
to Frank Tang, blink-dev, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens
On Thu, Jun 10, 2021 at 9:10 AM Frank Tang <ft...@chromium.org> wrote:

TAG review status

Not applicable


I think we're relying on the TC39 process here, and for signals below. Presumably this has been accepted by the group, since it's in the spec. :)
 

Gecko: No signal

WebKit: No signal

Web developers: No signals


Do we have any indication of a timeline along which other vendors will ship this as well?
 


Security

No security risk by launching this.


It's not clear to me what the delta is between information this mechanism reveals about a user's local system, and what's already available via existing i18n APIs. I think the claim here is that the ordering is defined by Chrome, not the local system, and that the fallback order is going to be the same for all users, regardless of the language they're using Chrome in, and the language preferences they may have adjusted via chrome://settings?
 

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAOcELL_J7yEvEbV08_Wp_6qVZzNcUx%2B6%3Dd4QBsp16aK%2BvZtEnQ%40mail.gmail.com.

Paul Irish

Jun 10, 2021, 1:01:51 PM
to Frank Tang, blink-dev, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens
I'm slightly familiar with locale-fitting from the application-developer POV and had some quick non-blocking questions.
I see ICU 67.1 introduced some improvements to the LocaleMatcher, and know that BestFitMatcher has flexibility in the UA implementation…

Does this matcher implementation handle the 3 operations mentioned here? Namely: macrolanguage replacements, alias replacements, and parent locale resolving.
In our app (Lighthouse), we ported this method to maximize our compatibility for requestedLocales, but would happily drop it to use the best-fit matcher instead.


Also, on testing:
The design doc mentioned possible tests in v8/test/intl/bestfitmatcher/, but I don't see anything there or in the original CL. Is there another place I can look?


Thanks for all your work on the Intl suite!


--

Frank Tang

Jun 10, 2021, 7:08:37 PM
to blink-dev, Mike West, blink-dev, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens, Frank Tang
On Thursday, June 10, 2021 at 1:42:43 AM UTC-7 Mike West wrote:
Do we have any indication of a timeline along which other vendors will ship this as well?

The "best fit" localeMatcher is specified in ECMA-402 as follows:

9.2.4 BestFitMatcher ( availableLocales, requestedLocales )
The BestFitMatcher abstract operation compares requestedLocales, which must be a List as returned by CanonicalizeLocaleList, against the locales in availableLocales and determines the best available language to meet the request. The algorithm is implementation dependent, but should produce results that a typical user of the requested locales would perceive as at least as good as those produced by the LookupMatcher abstract operation. Options specified through Unicode locale extension sequences must be ignored by the algorithm. Information about such subsequences is returned separately. The abstract operation returns a record with a [[locale]] field, whose value is the language tag of the selected locale, which must be an element of availableLocales. If the language tag of the request locale that led to the selected locale contained a Unicode locale extension sequence, then the returned record also contains an [[extension]] field whose value is the substring of the Unicode locale extension sequence within the request locale language tag. 

and
9.2.9 BestFitSupportedLocales ( availableLocales, requestedLocales )

The BestFitSupportedLocales abstract operation returns the subset of the provided BCP 47 language priority list requestedLocales for which availableLocales has a matching locale when using the Best Fit Matcher algorithm. Locales appear in the same order in the returned list as in requestedLocales. The steps taken are implementation dependent.

https://tc39.es/ecma402/#annex-implementation-dependent-behaviour
A Implementation Dependent Behaviour

The following aspects of the ECMAScript 2022 Internationalization API Specification are implementation dependent:

Therefore, this launch concerns "Implementation Dependent Behaviour", and other engines are not expected to ship the same matching results, so a timeline for other vendors does not really apply. The "lookup" localeMatcher has already been shipping for a long time.
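Since the spec only requires best fit to be "at least as good as" LookupMatcher, it may help to see what that baseline does. Below is a simplified JavaScript transcription of the spec steps BestAvailableLocale, LookupMatcher and LookupSupportedLocales. It ignores Unicode extension sequences, returns plain strings instead of records, and the available-locale set is made up for the example:

```javascript
// Simplified sketch of ECMA-402 BestAvailableLocale / LookupMatcher /
// LookupSupportedLocales. Unicode extension (-u-) handling is omitted.

// Truncate from the right until an available locale matches. If truncation
// would leave a dangling single-character subtag (a singleton such as "x"),
// drop it as well.
function bestAvailableLocale(available, locale) {
  let candidate = locale;
  for (;;) {
    if (available.has(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
}

// LookupMatcher: the first requested locale with an available match wins;
// otherwise fall back to the default (system) locale.
function lookupMatcher(available, requested, defaultLocale) {
  for (const locale of requested) {
    const match = bestAvailableLocale(available, locale);
    if (match !== undefined) return match;
  }
  return defaultLocale;
}

// LookupSupportedLocales: keep the requested locales that have a match,
// in their original order.
function lookupSupportedLocales(available, requested) {
  return requested.filter((locale) => bestAvailableLocale(available, locale) !== undefined);
}

const available = new Set(["en", "en-US", "fr", "ru", "zh-Hant"]);
console.log(lookupMatcher(available, ["ab", "fr-CA"], "en-US")); // fr
console.log(lookupSupportedLocales(available, ["ab", "fr-CA", "zh-Hant-TW"]));
```

Note that nothing here knows that "ab" is related to "ru"; purely syntactic truncation is the whole algorithm, which is the gap the CLDR-based best fit matcher fills.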

 


It's not clear to me what the delta is between information this mechanism reveals about a user's local system, and what's already available via existing i18n APIs.

The pre-existing "lookup" localeMatcher, per ECMA-402, continues to fall back to the user's system locale, and this launch does NOT change that.
The "best fit" localeMatcher, on the other hand, may fall back to a better locale, based on the CLDR data mentioned above, before considering the user's system locale. Therefore the security risk is NOT greater than the current behavior: in the worst case it reveals the user's system locale, which is already revealed by the pre-existing "lookup" localeMatcher, and this launch neither removes nor adds to that.
 
I think the claim here is that the ordering is defined by Chrome, not the local system, and that the fallback order is going to be the same for all users, regardless of the language they're using Chrome in, and the language preferences they may have adjusted via chrome://settings?

Under the "best fit" localeMatcher in this launch, the CLDR data (i.e. "the ordering is defined by Chrome") is considered BEFORE falling back to the user's system locale. If the data does not provide a meaningful fallback, it falls back to the user's system locale, the SAME AS the pre-existing "lookup" localeMatcher. So the result is NOT 100% decided by "the ordering is defined by Chrome"; it may still fall back to the user's system locale. But that condition already exists in the status quo. This launch makes it neither worse nor better.
 
 


Frank Tang

Jun 10, 2021, 7:58:57 PM
to blink-dev, Paul Irish, blink-dev, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens, Frank Tang
On Thursday, June 10, 2021 at 10:01:51 AM UTC-7 Paul Irish wrote:
I'm slightly familiar with locale-fitting from the application-developer POV and had some quick non-blocking questions.
I see ICU 67.1 introduced some improvements to the LocaleMatcher, and know that BestFitMatcher has flexibility in the UA implementation…

Does this matcher implementation handle the 3 operations mentioned here? Namely: macrolanguage replacements, alias replacements, and parent locale resolving.

I am not 100% sure how these three things came to be mentioned in the file you quoted. Some of the issues mentioned there are ALREADY resolved before reaching the localeMatcher, regardless of this launch.
1). CLDR macrolanguage replacements are done ( i.e. "cmn" becomes "zh" )

The "cmn" to "zh" replacement is already handled by locale canonicalization, regardless of whether the "best fit" or the pre-existing "lookup" localeMatcher is used.
d8> Intl.getCanonicalLocales(["cmn", "arb", "zsm", "swh", "uzn", "knn", "kmr"])
["zh", "ar", "ms", "sw", "uz", "kok", "ku"]

So both "lookup" and "best fit" get macrolanguage replacement. (There was previously a locale canonicalization bug that did not resolve this, but we fixed it a while ago. The locale matcher depends on that fix, but the fix is independent of this launch: it is part of the locale canonicalization that happens before this stage, not part of the matching algorithm.)

2). Known locale aliases, such as zh-TW = zh-Hant-TW, are resolved,

I am not sure what "Known locale aliases" refers to. I guess the author of that line means locale canonicalization, which was fixed independently a while ago.
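For reference on item 2, one place the zh-TW / zh-Hant-TW equivalence is visible from JavaScript is CLDR's likely-subtags data, which `Intl.Locale` exposes through `maximize()` and `minimize()`. This only illustrates that data; it is not a claim about how the matcher itself consumes it:

```javascript
// Expand a tag with CLDR likely subtags, then strip the redundant ones again.
const taiwan = new Intl.Locale("zh-TW");
const maximal = taiwan.maximize(); // adds the likely script (Hant)
const minimal = maximal.minimize(); // removes subtags implied by the rest

console.log(maximal.toString(), maximal.script, minimal.toString());
```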
 
3). Explicit parent locales from CLDR's supplemental data are also considered.


For example, with the "best fit" localeMatcher, "co" (Corsican), for which Chrome does not ship locale data, will now fall back to "fr" (French) instead of to the user's system locale as with the pre-existing "lookup" localeMatcher.

d8> (new Intl.DateTimeFormat("co", {localeMatcher: "best fit"})).resolvedOptions().locale
"fr"
d8> (new Intl.DateTimeFormat("co", {localeMatcher: "lookup"})).resolvedOptions().locale
"en-US"

With the way we currently package locale data for Chrome, parent locale resolution is not an extra benefit of the "best fit" localeMatcher: we package all the child locale data with Chrome anyway, so it is resolved (and shadowed) by how we package the locale data, whether or not the "best fit" localeMatcher is launched. (Both options have this issue addressed.)
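For readers unfamiliar with item 3: CLDR's parentLocales data lets a locale inherit from a parent other than its truncated form; for example, en-IN inherits from en-001 ("international English") rather than directly from en. A toy sketch of building such an inheritance chain follows. The three-entry table is a hand-copied, illustrative subset of CLDR data, not what V8 or ICU actually load, and `parentChain` is a hypothetical helper:

```javascript
// Illustrative subset of CLDR parentLocales (not the full data set).
const PARENT_LOCALES = new Map([
  ["en-IN", "en-001"],
  ["es-MX", "es-419"],
  ["pt-AO", "pt-PT"],
]);

// Build the inheritance chain: use an explicit parent when one exists,
// otherwise truncate the last subtag, until only the language remains.
function parentChain(tag) {
  const chain = [tag];
  let current = tag;
  for (;;) {
    let parent = PARENT_LOCALES.get(current);
    if (parent === undefined) {
      const pos = current.lastIndexOf("-");
      if (pos === -1) break;
      parent = current.slice(0, pos);
    }
    chain.push(parent);
    current = parent;
  }
  return chain;
}

console.log(parentChain("en-IN")); // [ 'en-IN', 'en-001', 'en' ]
console.log(parentChain("fr-CA")); // [ 'fr-CA', 'fr' ]
```

Because Chrome ships the child locales themselves, a request such as en-IN usually matches directly, which is why the explicit-parent step rarely changes the outcome there.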
 

Mike West

Jun 16, 2021, 5:29:22 AM
to Frank Tang, blink-dev, Paul Irish, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens
LGTM1.

Thanks for following up on the concerns I raised. I think your responses sufficiently address them, as do the pointers to conversations with our friends at Mozilla, who wrote up a nice analysis of the privacy properties in https://docs.google.com/document/d/1Zw6cYNJpL69HtQfA4-S7bKlCPywhhmoF6Mja-qy-JpU/edit?usp=sharing.

As noted above, skipping TAG review and signals requests is reasonable, given that this is a stage 3 proposal through the TC39 process, so you're good to go from my perspective.

-mike



Chris Harrelson

Jun 16, 2021, 11:36:50 AM
to Mike West, Frank Tang, blink-dev, Paul Irish, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens

Yoav Weiss

Jun 16, 2021, 12:51:15 PM
to Chris Harrelson, Mike West, Frank Tang, blink-dev, Paul Irish, Mark Davis, Markus Scherer, Shane Carr, Nebojša Ćirić, Jakob Kummerow, Shu-yu Guo, Mathias Bynens