Would like to accept ticket to remove standard patterns from DTPG


Shane Carr

Sep 10, 2025, 2:19:39 AM
to ICU team, icu-design, Peter Edberg
Dear ICU Team and users, 

I would like to accept ICU-23209 for ICU 78.


I am proposing that DateTimePatternGenerator consider only availableFormats patterns. Currently it also looks at the length patterns (full/long/medium/short) when matching on a skeleton. This behavior is not explicitly stated in the spec, and it is a behavior difference from ICU4X.
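To make the difference concrete, here is a toy model in Python. It is a sketch under loud assumptions, not ICU's matcher: the real DateTimePatternGenerator uses a distance-based skeleton match and then adjusts field widths, while this version only tries an exact skeleton match followed by a width-insensitive one. The function names and the sample patterns are hypothetical, and letting availableFormats win when both sources supply the same skeleton is an assumption of the sketch.

```python
def skeleton_of(pattern):
    """Keep only the unquoted field letters of a pattern, e.g. "H.mm 'h'" -> "Hmm"."""
    letters, quoted = [], False
    for ch in pattern:
        if ch == "'":
            quoted = not quoted
        elif not quoted and ch.isascii() and ch.isalpha():
            letters.append(ch)
    return "".join(letters)


def field_set(skeleton):
    """Collapse repeated letters so "Hmmss" and "Hms" compare equal (widths ignored)."""
    return "".join(dict.fromkeys(skeleton))


def best_pattern(requested, available_formats, length_patterns=()):
    """Exact skeleton match first, then a width-insensitive match; None if nothing fits."""
    candidates = dict(available_formats)
    for pattern in length_patterns:
        # Length patterns are indexed under the skeleton derived from the pattern itself.
        # Assumption of this sketch: an availableFormats entry wins on a collision.
        candidates.setdefault(skeleton_of(pattern), pattern)
    if requested in candidates:
        return candidates[requested]
    for skeleton, pattern in candidates.items():
        if field_set(skeleton) == field_set(requested):
            return pattern
    return None


# Hypothetical locale data, for illustration only.
AVAILABLE_FORMATS = {"Hms": "HH:mm:ss"}
LENGTH_PATTERNS = ["H.mm.ss"]

# Current behavior: the length pattern is an exact match for the skeleton "Hmmss".
with_lengths = best_pattern("Hmmss", AVAILABLE_FORMATS, LENGTH_PATTERNS)
# Proposed behavior: only availableFormats is consulted.
without_lengths = best_pattern("Hmmss", AVAILABLE_FORMATS)
```

With this sample data, `with_lengths` is `"H.mm.ss"` and `without_lengths` is `"HH:mm:ss"`: the same request resolves differently depending on whether the length patterns are in the candidate pool.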

ICU has long had a technical preview API with the behavior I desire to make the default behavior. The difference is illustrated in this PR here:


Thank you, 
Shane
 

Markus Scherer

Sep 10, 2025, 12:35:55 PM
to Shane Carr, Mark Davis, ICU team, icu-design, Peter Edberg
I suspect that if the ICU DTPG uses the standard patterns, then +Mark Davis did so intentionally.
Mark?

tnx
markus

Shane Carr

Sep 10, 2025, 2:07:01 PM
to Markus Scherer, Mark Davis, ICU team, icu-design, Peter Edberg
Mark or Peter should confirm, but my suspicion is that the length patterns were added to the DTPG at a time when the availableFormats patterns were not as thorough as they are now. We have a separate long-standing open issue in CLDR to align availableFormats with length patterns and have made some progress. I don't want to wait for that issue before landing this ICU change, though.

Mark Davis Ⓤ

Sep 10, 2025, 2:21:22 PM
to Shane Carr, Markus Scherer, ICU team, icu-design, Peter Edberg
That's correct. And I think we probably still want to use them if there is no available format data.

Markus Scherer

Sep 10, 2025, 2:50:36 PM
to Mark Davis Ⓤ, Shane Carr, ICU team, icu-design, Peter Edberg
On Wed, Sep 10, 2025 at 11:21 AM Mark Davis Ⓤ <ma...@unicode.org> wrote:
> That's correct. And I think we probably still want to use them if there is no available format data.

Please clarify. Should the DTPG use the length patterns if there are no availableFormats at all?
Or none for the relevant skeletons?
Or when?

Shane Carr

Sep 10, 2025, 3:11:38 PM
to Markus Scherer, Mark Davis Ⓤ, ICU team, icu-design, Peter Edberg
I'm happy to say that DTPG uses length patterns if availableFormats is empty.
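As a minimal sketch of that rule in Python (the function name and the dictionaries are hypothetical, and both inputs are assumed to already be keyed by skeleton), the choice is made once per locale rather than per skeleton:

```python
def patterns_to_index(available_formats, length_patterns):
    """Pick the pattern source for a locale: availableFormats, or length patterns if it is empty."""
    if available_formats:
        return dict(available_formats)
    # Fallback: only taken when the locale has no availableFormats data at all.
    return dict(length_patterns)


# Hypothetical data, for illustration only.
sparse_locale = patterns_to_index({}, {"Hms": "H.mm.ss"})
normal_locale = patterns_to_index({"Hm": "HH:mm"}, {"Hms": "H.mm.ss"})
```

Here `sparse_locale` contains only the length pattern and `normal_locale` contains only the availableFormats entry; the two sources are never mixed.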

"Or none for the relevant skeletons" is a heuristic that seems brittle. I'm not sure how I would check for it.

Mark Davis Ⓤ

Sep 10, 2025, 5:08:06 PM
to Shane Carr, Markus Scherer, ICU team, icu-design, Peter Edberg
I agree.

What we could investigate on the CLDR side is whether we can find ways to "derive" reasonable available patterns from the stock formats that would be better than nothing, e.g., from "HH:mm:ss" pull out "HH:mm" and "mm:ss"; those could be suggestions to vetters to help kick-start them.
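A rough sketch of that derivation in Python, assuming the simplest possible rule: take every contiguous run of two or more fields, together with the literals between them, that is shorter than the full pattern. The function name is hypothetical, quoted literal text is not handled, and nothing here reflects an existing CLDR tool.

```python
import re

# A field is a run of one repeated ASCII letter; everything else is literal text.
FIELD_OR_LITERAL = re.compile(r"([A-Za-z])\1*|[^A-Za-z]+")


def derive_subpatterns(pattern):
    """Return contiguous multi-field slices of a pattern, excluding the full field range."""
    matches = list(FIELD_OR_LITERAL.finditer(pattern))
    tokens = [m.group(0) for m in matches]
    # Group 1 is only set for the field branch of the regex.
    fields = [i for i, m in enumerate(matches) if m.group(1)]
    derived = []
    for a in range(len(fields)):
        for b in range(a + 1, len(fields)):
            if a == 0 and b == len(fields) - 1:
                continue  # this slice spans every field, so it is not a new suggestion
            derived.append("".join(tokens[fields[a]:fields[b] + 1]))
    return derived


suggestions = derive_subpatterns("HH:mm:ss")
```

For `"HH:mm:ss"` this yields `["HH:mm", "mm:ss"]`. The output would only be a starting point for vetters, since a slice of a valid pattern is not guaranteed to be natural in every locale.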

Rich Gillam

Sep 10, 2025, 8:39:00 PM
to Mark Davis Ⓤ, Shane Carr, Markus Scherer, ICU team, icu-design, Peter Edberg
Weren’t we planning to get rid of the “length” patterns altogether and make sure they all occur in availableFormats and can be selected with appropriate skeleton strings?  Is that still going to happen?  Do we have any kind of read on how many locales have things in the “length” patterns that aren’t duplicated somewhere in availableFormats?  I’d like to think that by now, DTPG shouldn’t have to look at the “length” patterns anymore, and that if that isn’t true, we should figure out how to plug the holes.

—Rich


Shane Carr

Sep 10, 2025, 8:50:25 PM
to Rich Gillam, Mark Davis Ⓤ, Markus Scherer, ICU team, icu-design, Peter Edberg
> Weren’t we planning to get rid of the “length” patterns altogether and make sure they all occur in availableFormats and can be selected with appropriate skeleton strings?  Is that still going to happen?


I already fixed some low-hanging fruit last release. The task to fix the issue more generally has been punted a few releases and seems likely to be punted again.

> Do we have any kind of read on how many locales have things in the “length” patterns that aren’t duplicated somewhere in availableFormats?

See the spreadsheet in the above bug. I fixed many of them in the Gregorian calendar, but there are quite a few in non-Gregorian calendars. Note also that the skeletons most impacted by my PR are time patterns, not date patterns.

> I’d like to think that by now, DTPG shouldn’t have to look at the “length” patterns anymore, and that if that isn’t true, we should figure out how to plug the holes.

Great, I take that as a vote to move forward.

There are very few test expectations I needed to change. See the PR: https://github.com/unicode-org/icu/pull/3641


Rich Gillam

Sep 10, 2025, 9:03:42 PM
to Shane Carr, Mark Davis Ⓤ, Markus Scherer, ICU team, icu-design, Peter Edberg
Shane—

>> I’d like to think that by now, DTPG shouldn’t have to look at the “length” patterns anymore, and that if that isn’t true, we should figure out how to plug the holes.
>
> Great, I take that as a vote to move forward.

I don’t know; I kind of think we should plug those holes NOW.  I don’t feel good about a change in observed behavior.  But if that’s what you’ve got in mind, great.  And if other people think the change in behavior won’t really hurt anybody, I’m willing to be convinced.

—Rich

Shane Carr

Sep 10, 2025, 9:13:34 PM
to Rich Gillam, Mark Davis Ⓤ, Markus Scherer, ICU team, icu-design, Peter Edberg
> I don’t know; I kind of think we should plug those holes NOW. I don’t feel good about a change in observed behavior.  But if that’s what you’ve got in mind, great.  And if other people think the change in behavior won’t really hurt anybody, I’m willing to be convinced.

I understand; here are my counter-arguments for why the change seems safe:
  1. I previously fixed the highest-impact low-hanging fruit in the issue (for date patterns in Gregorian calendars)
  2. Most locales have thorough availableFormats data. Those that don't will fall back to the root patterns, which is fine (patterns, not symbols). There is exactly one such test expectation in the PR that I needed to update, in locale ha: https://github.com/unicode-org/icu/pull/3641/commits/b779baa09337609436f087e53b6db5da119da4f3
  3. The motivating locale for the change was Thai, which I believe is improving; you can see the diff in https://github.com/unicode-org/icu/pull/3641/commits/75fa7ae0e3f62b0e45bb144004bbc9e7a5732a5f
  4. The CLDR issue continues to be "blocks release" priority (the highest level in Jira), as it has been for the last few releases

Rich Gillam

Sep 10, 2025, 9:16:32 PM
to Shane Carr, Mark Davis Ⓤ, Markus Scherer, ICU team, icu-design, Peter Edberg
Mark and Markus have previously weighed in on this issue; if they’re okay with doing this, I’m okay with doing it.

—Rich

Annemarie Apple

Sep 10, 2025, 10:26:36 PM
to Rich Gillam, Shane Carr, Mark Davis Ⓤ, Markus Scherer, ICU team, icu-design, Peter Edberg
Re #4: I think Shane should drive resolving this if we think it's a priority; otherwise we should lower the priority of https://unicode-org.atlassian.net/browse/CLDR-14993, since, as he has rightly pointed out, it has been pushed out more than one release.

Best,

Annemarie



Mark Davis Ⓤ

Sep 11, 2025, 9:39:24 AM
to Annemarie Apple, Rich Gillam, Shane Carr, Markus Scherer, ICU team, icu-design, Peter Edberg
We've started down that path by having skeletons for the lengths. A further step is to have the vetters change the skeletons rather than the patterns, and to generate the patterns from availableFormats. We can only do that for locales that have availableFormats data, so we'd have to shift the way we gather the data. Right now the only data gathered at basic are the four stock formats, so the availableFormats aren't available at basic.

And for migration, we'd want to keep the length patterns so the clients would still function, but make sure that they are always just generated from the skeletons.
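One way to picture that migration, as a sketch only: each length keeps a skeleton, and the length pattern that clients see is looked up from availableFormats instead of being stored independently. The names and sample data below are hypothetical, and the exact-key lookup stands in for whatever matching the real generation step would use.

```python
def generate_length_patterns(length_skeletons, available_formats):
    """Resolve each length's skeleton through availableFormats; skip lengths with no data."""
    generated = {}
    for length, skeleton in length_skeletons.items():
        pattern = available_formats.get(skeleton)
        if pattern is not None:
            generated[length] = pattern
    return generated


# Hypothetical data, for illustration only.
LENGTH_SKELETONS = {"short": "Hm", "medium": "Hms"}
AVAILABLE_FORMATS = {"Hm": "HH:mm", "Hms": "HH:mm:ss"}

length_patterns = generate_length_patterns(LENGTH_SKELETONS, AVAILABLE_FORMATS)
```

With this data, `length_patterns` is `{"short": "HH:mm", "medium": "HH:mm:ss"}`, so existing clients still get length patterns, but those patterns can no longer drift away from availableFormats.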

Anyway, there's still some work to be done.