Changes to likely subtags in CLDR v46

Edward Welbourne

Nov 29, 2024, 11:49:11 AM
to cldr-...@unicode.org, Mark Davis
In

commit 1a914d1369f216122ef174589c8b210f6175ca5b
Author: Mark Davis <ma...@macchiato.com>
Date: Mon Aug 19 09:11:06 2024 -0700

CLDR-17535 Update likely subtags data (#3966)

I see many und_* likely subtag rules removed, notably

- <likelySubtag from="und_GB" to="en_Latn_GB"/>

and

- <likelySubtag from="und_US" to="en_Latn_US"/>

The code I maintain has previously relied on these to know that the
default language for GB and US is en. I can guess that this means the
code is oblivious to some quirk of the rules [0] - possibly newer than
the code - that ensures we should be finding our way from these
und_{GB,US} to en without needing those rules.

[0] https://www.unicode.org/reports/tr35/#Likely_Subtags
unless I'm missing something.

Can anyone enlighten me as to the path by which the remaining data
should lead to the conclusions the removed data above used to
supply?

With any luck I should then be able to work out what the code's doing
wrong ...

Eddy.

Mark Davis Ⓤ

Nov 29, 2024, 7:39:59 PM
to Edward Welbourne, cldr-...@unicode.org, Mark Davis
Some redundancies in the likely subtags were eliminated. However, if you follow the algorithm in https://unicode.org/reports/tr35/#likely-subtag you'll get the same answer.

Note the following line:

<likelySubtag from="und" to="en_Latn_US"/> <!--?‧?‧? ➡ English‧Latin‧United States-->


When you apply that algorithm to und_GB (for example), you will match in step 2 (Lookup) on item 2.4, the language alone, namely "und". The region GB from the input is kept, and you substitute in "en" and "Latn" from the matched rule, getting en_Latn_GB.
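
As a rough illustration of the lookup just described, here is a minimal Python sketch. It is a simplification under stated assumptions, not the ICU or CLDR implementation: the names LIKELY, parse and add_likely_subtags are hypothetical, the table holds only the single "und" rule quoted above, and canonicalization, variants and extensions are ignored.

```python
# Simplified sketch of the "add likely subtags" lookup; not ICU's or CLDR's code.
# LIKELY contains only the one rule quoted in this thread; real data has many more.
LIKELY = {"und": "en_Latn_US"}


def parse(tag):
    """Split a simple lang[_Script][_REGION] tag into its three subtags."""
    parts = tag.split("_")
    lang, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part
    return lang, script, region


def add_likely_subtags(tag, likely=LIKELY):
    lang, script, region = parse(tag)
    lang = lang or "und"

    # Lookup: try keys from most to least specific, stop on the first match.
    candidates = []
    if script and region:
        candidates.append(f"{lang}_{script}_{region}")
    if script:
        candidates.append(f"{lang}_{script}")
    if region:
        candidates.append(f"{lang}_{region}")
    candidates.append(lang)

    match = next((likely[key] for key in candidates if key in likely), None)
    if match is None:
        return None  # lookup failed

    # Return: keep the non-empty subtags of the input, fill the rest from the match.
    m_lang, m_script, m_region = parse(match)
    result = [m_lang if lang == "und" else lang, script or m_script, region or m_region]
    return "_".join(part for part in result if part)


print(add_likely_subtags("und_GB"))  # en_Latn_GB
print(add_likely_subtags("und_US"))  # en_Latn_US
```

With und_GB, the key "und_GB" is tried first and misses (that rule was removed), then "und" matches, so only the empty language and script are filled in and GB survives.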

Note there are various ways to optimize the Lookup so that it takes only a single pass; the ICU code uses them.

I hope this helps.

Mark

