RE: CLDR v44.1 available

Doug Ewell

Dec 14, 2023, 4:14:00 PM
to cldr-...@unicode.org
Peter Edberg wrote:

> CLDR v44.1 is available today with fixes for a few specific issues
> present in CLDR v44, see Version 44.1 Changes
> <https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx>;

Currently that page shows “n/a” in the Data column for 44.1. Will there eventually be a normal data release with the usual zip files, given that there were some data changes?

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

Peter Edberg

Dec 15, 2023, 2:21:11 AM
to Doug Ewell, cldr-...@unicode.org
Hi Doug,


On Dec 14, 2023, at 1:13 PM, Doug Ewell <do...@ewellic.org> wrote:

> Peter Edberg wrote:
>
>> CLDR v44.1 is available today with fixes for a few specific issues
>> present in CLDR v44, see Version 44.1 Changes
>> <https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx>;
>
> Currently that page shows “n/a” in the Data column for 44.1. Will there eventually be a normal data release with the usual zip files, given that there were some data changes?

We had not been planning to release the data/tools zip files for CLDR v44.1; we also did not release them for CLDR v43.1. Of the clients who consume CLDR data directly and not through ICU, most use the JSON data. The thinking was that others could use the artifacts from the github tag “release-44-1” in the cldr-staging repository for production data, and in the cldr repository for tools.

However if it would be useful to have the zip files as provided for CLDR v44, please let us know, it would not be difficult to post them.

Thanks!
- Peter

Edward Welbourne

Jan 9, 2024, 11:21:40 AM
to Peter Edberg, Doug Ewell, cldr-...@unicode.org
Peter Edberg wrote:
>>> CLDR v44.1 is available today with fixes for a few specific issues
>>> present in CLDR v44,

I don't seem to have received a similar mail about the v44 release itself.

>>> ... see Version 44.1 Changes

On Dec 14, 2023, at 1:13 PM, Doug Ewell <do...@ewellic.org> wrote:
>> Currently that page shows “n/a” in the Data column for 44.1. Will
>> there eventually be a normal data release with the usual zip files,
>> given that there were some data changes?

Peter Edberg (15 December 2023 08:20) wrote:
> We had not been planning to release the data/tools zip files for CLDR
> v44.1; we also did not release them for CLDR v43.1. Of the clients who
> consume CLDR data directly and not through ICU, most use the JSON
> data. The thinking was that others could use the artifacts from the
> github tag “release-44-1” in the cldr-staging repository for
> production data, and in the cldr repository for tools.
>
> However if it would be useful to have the zip files as provided for
> CLDR v44, please let us know, it would not be difficult to post them.

This is the first time I've used the github repositories for the data,
and I'm surprised to find many entries saying ↑↑↑. These seem to
correspond to places where the zip file's version relies on inheritance.
I failed to find anything documenting that, though. So, as a heads-up to
anyone else who ends up using the github data for the first time as a
result of this: treat that special value for a data field as "inherit
from parent or root locale".

Example, from ak.xml:
    <symbols numberSystem="latn">
        <decimal>↑↑↑</decimal>
        <group>↑↑↑</group>
    </symbols>
which tripped an assertion that decimal and group should be distinct,
https://github.com/qt/qtbase/blob/7d1f29df795e3e1635204b656b368582ed6942ea/util/locale_database/ldml.py#L278
thereby bringing this to my attention.
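
For anyone adapting a similar check, here is a minimal Python sketch of
skipping the marker before comparing values. It assumes the marker is
exactly the three-arrow string shown in the example; the helper name
explicit_values and the inline sample are hypothetical, and nothing here
resolves the inherited value, it only recognises that one is needed.

```python
import xml.etree.ElementTree as ET

# Assumption: the inheritance marker is exactly three U+2191 arrows.
INHERIT = "\u2191\u2191\u2191"

# Inline sample modelled on the ak.xml fragment quoted above.
SAMPLE = """<ldml><numbers>
  <symbols numberSystem="latn">
    <decimal>↑↑↑</decimal>
    <group>↑↑↑</group>
  </symbols>
</numbers></ldml>"""

def explicit_values(xml_text):
    """Map each leaf element's tag to its text, or to None if it is marked as inherited."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        if len(elem) == 0 and elem.text is not None:
            text = elem.text.strip()
            result[elem.tag] = None if text == INHERIT else text
    return result

values = explicit_values(SAMPLE)
decimal, group = values["decimal"], values["group"]
# Only insist on distinct separators when both are given explicitly.
if decimal is not None and group is not None:
    assert decimal != group
```

A None entry here means "look the value up by inheritance", which this
sketch leaves to whatever resolution code is already in place.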

Eddy.

Edward Welbourne

Jan 9, 2024, 12:29:29 PM
to Steven R. Loomis, Peter Edberg, Doug Ewell, cldr-...@unicode.org
Steven R. Loomis (9 January 2024 18:19) wrote:
> The data in the repository contains inheritance markers. The java
> tool GenerateProductionData can be used to output production data,
> this is what is used for CLDR release zipfiles.

Thanks for that explanation. It turned out to be easy enough to amend
the scripts the Qt project uses to extract what it needs, so I've simply
done that - but nice to know how the production data is generated.

In looking at which of our tests have changed results, I notice that the
Tamil locales, based on common/main/ta.xml, have somewhat inconsistent
markers for AM/PM: commit ee245adab1d9116498f614d6a0a67634b0efd576
changed
<dayPeriodContext type="format">
<dayPeriodWidth type="abbreviated">

to inherit AM while retaining its localized noon and PM, while changing

<dayPeriodContext type="stand-alone">
<dayPeriodWidth type="abbreviated">

to inherit noon and PM while retaining its localized AM; meanwhile the
wide forms all went to inherit. This inevitably looks like something
got mixed up,

Eddy.
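
The kind of inconsistency described in the message above can be spotted
mechanically. The following is a rough Python sketch, not the Qt
scripts' actual logic: it lists each context/width pair in which some
dayPeriod entries carry the ↑↑↑ marker while others are localized. The
function name mixed_widths is hypothetical, and the sample uses
placeholder strings ("local-am" and so on) rather than real Tamil data.

```python
import xml.etree.ElementTree as ET

# Assumption: the inheritance marker is exactly three U+2191 arrows.
INHERIT = "\u2191\u2191\u2191"

# Placeholder data shaped like the pattern described for ta.xml; not real CLDR content.
SAMPLE = """<dayPeriods>
  <dayPeriodContext type="format">
    <dayPeriodWidth type="abbreviated">
      <dayPeriod type="am">↑↑↑</dayPeriod>
      <dayPeriod type="noon">local-noon</dayPeriod>
      <dayPeriod type="pm">local-pm</dayPeriod>
    </dayPeriodWidth>
  </dayPeriodContext>
  <dayPeriodContext type="stand-alone">
    <dayPeriodWidth type="abbreviated">
      <dayPeriod type="am">local-am</dayPeriod>
      <dayPeriod type="noon">↑↑↑</dayPeriod>
      <dayPeriod type="pm">↑↑↑</dayPeriod>
    </dayPeriodWidth>
  </dayPeriodContext>
</dayPeriods>"""

def mixed_widths(xml_text):
    """Return {(context, width): inherited types} where only some entries inherit."""
    root = ET.fromstring(xml_text)
    mixed = {}
    for ctx in root.iter("dayPeriodContext"):
        for width in ctx.iter("dayPeriodWidth"):
            periods = list(width.iter("dayPeriod"))
            inherited = sorted(
                p.get("type") for p in periods
                if (p.text or "").strip() == INHERIT
            )
            # Partly inherited, partly localized: worth a second look.
            if 0 < len(inherited) < len(periods):
                mixed[(ctx.get("type"), width.get("type"))] = inherited
    return mixed

report = mixed_widths(SAMPLE)
for key, types in sorted(report.items()):
    print(key, "inherits", types)
```

A mixed result is not necessarily an error, since a locale may
legitimately override only some entries; the sketch only flags
candidates for review.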

Steven R. Loomis

Jan 9, 2024, 12:33:23 PM
to Edward Welbourne, Peter Edberg, Doug Ewell, cldr-...@unicode.org
You have to inherit according to all of the rules, including parent locales and aliases. 

On Dec 15, Peter Edberg wrote:  "However if it would be useful to have the zip files as provided for CLDR v44, please let us know, it would not be difficult to post them."

--
Steven R. Loomis
Code Hive Tx, LLC


Mark Davis Ⓤ

Jan 9, 2024, 1:35:46 PM
to Steven R. Loomis, Edward Welbourne, Peter Edberg, Doug Ewell, cldr-...@unicode.org
Edward, can you file a ticket on the Tamil issue?

Note: if you don't want to handle the intricacies of inheritance yourself (and they can get tricky), then you can look at the json format files, which are fully resolved.
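
As a rough illustration of reading the resolved JSON, here is a small Python sketch. The key layout ("main", then the locale, then "numbers", then "symbols-numberSystem-latn") is an assumption about how the JSON files are organized and should be checked against the files you actually download; the helper name number_symbols and the inline sample are hypothetical.

```python
import json

# Inline stand-in for a resolved numbers file; the layout is an assumption.
SAMPLE = """{
  "main": {
    "en": {
      "numbers": {
        "symbols-numberSystem-latn": {"decimal": ".", "group": ","}
      }
    }
  }
}"""

def number_symbols(doc, locale, system="latn"):
    """Return the symbols mapping for one locale and numbering system."""
    numbers = doc["main"][locale]["numbers"]
    return numbers[f"symbols-numberSystem-{system}"]

symbols = number_symbols(json.loads(SAMPLE), "en")
# With resolved data there is no marker to skip before comparing.
assert symbols["decimal"] != symbols["group"]
```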

--
You received this message because you are subscribed to the Google Groups "CLDR Users Public Mail List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cldr-users+...@unicode.org.
To view this discussion on the web visit https://groups.google.com/a/unicode.org/d/msgid/cldr-users/07D6979D-BA7E-4215-B940-5D2AD38C39C5%40gmail.com.

Edward Welbourne

Jan 9, 2024, 2:34:37 PM
to Mark Davis Ⓤ, Steven R. Loomis, Peter Edberg, Doug Ewell, cldr-...@unicode.org
Mark Davis (9 January 2024 19:35) wrote:
> Edward, can you file a ticket on the Tamil issue?

I'd really rather not create an account on yet another bug-tracker.
I've commented on github where the change was made; perhaps that can be
turned into a github issue?

> Note: if you don't want to handle the intricacies of inheritance
> yourself (and they can get tricky), then you can look at the json
> format files, which are fully resolved.

I solved that problem (I believe correctly - yes, it was intricate, but
it was clearly documented in the LDML spec) a few years back, so the
scripts take care of it all for me already. We are looking at
potentially using the JSON form, in due course, but it's not urgent
given that we have a working solution already.

Eddy.

Doug Ewell

Jan 9, 2024, 4:48:20 PM
to Steven R. Loomis, Mark Davis, Edward Welbourne, Peter Edberg, cldr-...@unicode.org
Steven R. Loomis wrote:

> To follow up on that last point, cldr-json may be obtained via:
>
> https://github.com/unicode-org/cldr-json
>
> Releases, zip files, and npm packages are available.

Is JSON the canonical distribution format for CLDR? In the past I had thought the XML files were canonical, and the JSON files were derived from them.

Edward Welbourne

Jan 10, 2024, 12:47:31 PM
to Peter Edberg, Mark Davis Ⓤ, cldr-...@unicode.org
Background: I'm working [0] on enabling Qt to localize names of
timezones, even when not using ICU. In the course of this I'm including
assertions in my scanning code to verify parts of the LDML spec that I
(think I may need to) rely on.

Among the consistency constraints on the metaZone information [1], I
find:
* A golden zone in mapTimezones must have reverse mapping in
metazoneInfo.
* A preferred zone in mapTimezones must have reverse mapping in
metazoneInfo

(Minor typographic inconsistency in spec: missing full-stop at end of
the latter.)

These are violated by the metazones Macquarie (for territory 001) and
Mountain_Time (for MX), respectively. The former gives
Antarctica/Macquarie which, since [3], is always Australia_Eastern; the
latter gives America/Hermosillo which, since [2], no longer has a
Mountain_Time period in its history. I have commented on the github
changes that introduced these inconsistencies.
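
For illustration, the reverse-mapping constraint can be checked along
these lines. This is a minimal Python sketch, not the actual Qt scanning
code; the element and attribute names (timezone, usesMetazone with
mzone, mapZone with other/territory/type) are assumptions about the
supplemental metaZones data, the function name is hypothetical, and the
zone and metazone IDs in the sample are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Invented placeholder data; element and attribute names are assumptions.
SAMPLE = """<supplementalData><metaZones>
  <metazoneInfo>
    <timezone type="Example/Alpha">
      <usesMetazone mzone="Example_A"/>
    </timezone>
    <timezone type="Example/Beta">
      <usesMetazone mzone="Example_Other"/>
    </timezone>
  </metazoneInfo>
  <mapTimezones type="metazones">
    <mapZone other="Example_A" territory="001" type="Example/Alpha"/>
    <mapZone other="Example_B" territory="001" type="Example/Beta"/>
  </mapTimezones>
</metaZones></supplementalData>"""

def missing_reverse_mappings(xml_text):
    """List (metazone, territory, zone) entries whose zone never uses that metazone."""
    root = ET.fromstring(xml_text)
    # Every metazone each zone has used at any point in its history.
    history = {
        tz.get("type"): {u.get("mzone") for u in tz.iter("usesMetazone")}
        for tz in root.iter("timezone")
    }
    problems = []
    for entry in root.iter("mapZone"):
        meta, zone = entry.get("other"), entry.get("type")
        if meta not in history.get(zone, set()):
            problems.append((meta, entry.get("territory"), zone))
    return problems

problems = missing_reverse_mappings(SAMPLE)
for meta, territory, zone in problems:
    print(f"{meta} ({territory}) maps to {zone}, which never uses it")
```

In the sample, Example/Beta is reported because its history only ever
mentions Example_Other, which is the same shape of mismatch as the two
cases described above.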

[0] https://bugreports.qt.io/browse/QTBUG-115158
[1] https://www.unicode.org/reports/tr35/tr35-68/tr35-dates.html#Time_Zone_Format_Terminology
[2] commit b7f0f8eb443d2f55aef6448b95afbadb84752ea5 (2014, v25)
[3] commit 78fa9dc6a034237873e656231cbf6300d37c0733 (2020, v39)

It seemed prudent to report, so that those who know better can work out
what is the right way to fix,

Eddy.

Steven R. Loomis

Jan 12, 2024, 11:33:37 AM
to Mark Davis, Edward Welbourne, Peter Edberg, Doug Ewell, cldr-...@unicode.org
To follow up on that last point, cldr-json may be obtained via:

https://github.com/unicode-org/cldr-json

Releases, zip files, and npm packages are available.

--
Steven R. Loomis
Code Hive Tx, LLC
https://codehivetx.us



Steven R. Loomis

Jan 12, 2024, 11:33:37 AM
to Edward Welbourne, Peter Edberg, Doug Ewell, cldr-...@unicode.org
The data in the repository contains inheritance markers.  The java tool GenerateProductionData can be used to output production data, this is what is used for CLDR release zipfiles.


--
Steven R. Loomis
Code Hive Tx, LLC



Markus Scherer

Jan 12, 2024, 5:33:49 PM
to Edward Welbourne, Peter Edberg, Mark Davis Ⓤ, cldr-...@unicode.org
On Wed, Jan 10, 2024 at 9:47 AM 'Edward Welbourne' via CLDR Users Public Mail List <cldr-...@unicode.org> wrote:
> It seemed prudent to report, so that those who know better can work out
> what is the right way to fix,
