Overriding number formatters in SimpleDateFormat


Rich Gillam

Sep 8, 2025, 9:06:49 PM
to icu-design, design-wg (CLDR)
Dear icu-d...@unicode.org and desi...@unicode.org

In CLDR, the dateFormats and timeFormats elements have an optional “numbers” attribute that can be used to specify alternate numbering systems for any numeric fields in a formatted date.  The Chinese and Hebrew calendars, for example, use this to get an alternate numbering system for the day and year fields in some date formats.  The values of the “numbers” attribute eventually end up in the fDateOverride and fTimeOverride fields of SimpleDateFormat.

The availableFormats and intervalFormats elements do _not_ (as far as I know) have a similar attribute.  So that raises a question: If all formats (or all formats of a particular type) for a calendar in a locale should have a particular non-default numbering system, how do you get that?

I kind of thought there was code in ICU that would infer the correct numbering systems from the dateFormats elements, but it looks like I’m wrong about that.  So what’s the party line here (if there is one)?  You can kind of fake the desired behavior by initializing a date formatter with one of the standard formats and then applying the pattern you want on top of that.  There’s also a constructor for SimpleDateFormat that lets the client pass in a number-override string.  But in either case, you’re relying on the client code to know what they’re asking for— it’s not an inherent part of the pattern you’re asking for (which you hopefully got out of a DateTimePatternGenerator).  Am I missing something, or is that the answer— that the client has to know which overrides they want and ask for the right thing?  Is there a better answer?
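For readers unfamiliar with the number-override string mentioned above, here is a minimal sketch of how such a string can be interpreted. It assumes the form described in the ICU API documentation: either a bare numbering-system name that applies to every numeric field, or semicolon-separated `field=name` pairs. The function names and the `"*"` key are hypothetical, chosen for illustration; this is not ICU's implementation.

```python
from typing import Dict


def parse_override(override: str) -> Dict[str, str]:
    """Map pattern letters to numbering-system names.

    A bare name such as "hebr" is stored under "*" (hypothetical key
    meaning "all numeric fields").
    """
    result: Dict[str, str] = {}
    for part in override.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            field, numsys = part.split("=", 1)
            result[field.strip()] = numsys.strip()
        else:
            result["*"] = part
    return result


def numbering_system_for(field: str, overrides: Dict[str, str],
                         default: str = "latn") -> str:
    """Pick the numbering system for one pattern letter."""
    # A field-specific entry wins over a blanket entry, which wins
    # over the default.
    return overrides.get(field, overrides.get("*", default))
```

The point of the sketch is that the override lives outside the pattern: the same pattern string formats differently depending on which override the caller happens to supply.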

If that’s the party line, I think we need something better.  I’d propose something like this:

1. In the long term, I think we need some kind of enhancement to the pattern language for date/time formats that lets the numbering system be part of the pattern.

2. In the medium term, the area where my team is feeling the pain right now is with day-of-the-month names.  We have calendars that use names, not numbers, for days of the month, and the current mechanisms suck for that.  I think we should have day-name resources that work like the current month-name and day-of-the-week-name resources and use “ddd” and “dddd” in patterns to get them.

3. In the short term, we can probably add code to SimpleDateFormat to infer the correct override string.  I’m thinking something similar to what Peter did for date+time patterns, where he used the length of the month field in the date pattern to decide which length of date+time pattern to use.  We’d use the length of the month field to decide which dateFormats element to get the “numbers” attribute from.
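The short-term idea in item 3 could be sketched roughly as follows. The message only says that the length of the month field would select which dateFormats element supplies the “numbers” attribute; the exact mapping below (MMMM with a weekday gives full, MMMM gives long, MMM gives medium, anything else gives short) is an assumption, and `DATE_FORMAT_NUMBERS` is a hypothetical stand-in for locale data with illustrative values.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the "numbers" attributes on one calendar's
# dateFormats elements, keyed by format length. Values are illustrative.
DATE_FORMAT_NUMBERS: Dict[str, Optional[str]] = {
    "full": "d=hanidays",
    "long": "d=hanidays",
    "medium": None,
    "short": None,
}


def field_lengths(pattern: str) -> Dict[str, int]:
    """Longest run of each ASCII pattern letter outside quoted literals."""
    lengths: Dict[str, int] = {}
    in_quote = False
    prev, run = "", 0
    for ch in pattern:
        if ch == "'":
            in_quote = not in_quote
            prev, run = "", 0
            continue
        if in_quote or not (ch.isascii() and ch.isalpha()):
            prev, run = "", 0
            continue
        run = run + 1 if ch == prev else 1
        prev = ch
        lengths[ch] = max(lengths.get(ch, 0), run)
    return lengths


def format_length_for(pattern: str) -> str:
    """Assumed mapping from month-field length to a dateFormats length."""
    lengths = field_lengths(pattern)
    month = max(lengths.get("M", 0), lengths.get("L", 0))
    weekday = max(lengths.get(c, 0) for c in "Eec")
    if month >= 4:
        return "full" if weekday else "long"
    if month == 3:
        return "medium"
    return "short"


def inferred_override(pattern: str) -> Optional[str]:
    """Override string borrowed from the matching dateFormats element."""
    return DATE_FORMAT_NUMBERS[format_length_for(pattern)]
```

With this table, `inferred_override("MMMM d")` borrows the override of the long format, while a numeric pattern such as `"M/d"` gets none.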

I’ll file one or more tickets, but I wanted to start the discussion in email first to get a sense of how people were feeling about this issue.  Please let me know what you think…

—Rich

Shane Carr

Sep 8, 2025, 9:31:30 PM
to Rich Gillam, icu-design, design-wg (CLDR)
Applying the attribute to the whole pattern seems like the wrong thing. It should be a field-specific override: it makes sense that certain fields (like month) would use a different numbering system than other fields (like year). I think I recall us discussing syntax such as "y{hanidec}M{latn}d{latn}". This might be what you're referring to in your "long term" solution.
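To make the field-specific idea concrete, here is a small sketch that tokenizes the recalled syntax. The brace notation is only what was remembered from an earlier discussion, not an adopted standard, and the `Field` type and `parse_fields` name are hypothetical. Quoted literals are not handled, to keep the sketch short.

```python
import re
from typing import List, NamedTuple, Optional


class Field(NamedTuple):
    letter: str            # pattern letter, e.g. "y"
    length: int            # number of repeated letters, e.g. 4 for "MMMM"
    numsys: Optional[str]  # numbering system from a {...} suffix, if any


# A run of one repeated ASCII letter, optionally followed by {numsys}.
_TOKEN = re.compile(r"(([A-Za-z])\2*)(?:\{([a-z0-9]+)\})?")


def parse_fields(pattern: str) -> List[Field]:
    """Extract fields and their per-field numbering systems."""
    return [
        Field(m.group(2), len(m.group(1)), m.group(3))
        for m in _TOKEN.finditer(pattern)
    ]
```

Fields without a suffix come back with `numsys=None`, which a formatter could treat as "use the locale default".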

In ICU4X, we model this as a private-use field length. For example, we might define field length 16 to be numeric hanidec. Note: ICU4X doesn't ship RBNF for date formatting; we instead pre-compute the strings at build time (for days and months, there aren't very many strings, and I believe no one currently uses an algorithmic numbering system for years). This sounds a bit like your "medium term" solution, just formalizing the specific widths into the CLDR spec.

As far as it not being currently supported in interval formats, I think it's probably just an oversight.

--
You received this message because you are subscribed to the Google Groups "design-wg (CLDR)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to design-wg+...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/design-wg/3AECC6BB-133C-4E27-959E-61BA9390F697%40apple.com.

Markus Scherer

Sep 9, 2025, 4:10:23 PM
to Rich Gillam, icu-design, design-wg (CLDR)
On Mon, Sep 8, 2025 at 6:06 PM 'Rich Gillam' via icu-design <icu-d...@unicode.org> wrote:
> 2. In the medium term, the area where my team is feeling the pain right now is with day-of-the-month names.  We have calendars that use names, not numbers, for days of the month, and the current mechanisms suck for that.  I think we should have day-name resources that work like the current month-name and day-of-the-week-name resources and use “ddd” and “dddd” in patterns to get them.

Can you give an example for “calendars that use names, not numbers, for days of the month”?

Rich Gillam

Sep 9, 2025, 5:15:14 PM
to Shane Carr, icu-design, design-wg (CLDR)
Shane—

> Applying the attribute to the whole pattern seems like the wrong thing. It should be a field-specific override

That’s exactly what I had in mind.

> it makes sense that certain fields (like month) would use a different numbering system than other fields (like year). I think I recall us discussing syntax such as "y{hanidec}M{latn}d{latn}". This might be what you're referring to in your "long term" solution.

Yes.

> In ICU4X, we model this as a private-use field length. For example, we might define field length 16 to be numeric hanidec.

That could work.  I think we’ve reached the limit of the current date pattern syntax and might need to move to something more like the MessageFormat syntax: “{day-of-week,long}, {month,long} {day-of-month,numeric}, {year,numeric}”.
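As a rough illustration of what a MessageFormat-like date pattern might look like to a parser, the sketch below splits the example string into literal text and (field, style) pairs. The placeholder grammar is inferred from that single example, and `parse_template` is a hypothetical name; nothing here reflects an existing ICU or CLDR syntax.

```python
import re
from typing import List, Tuple, Union

Part = Union[str, Tuple[str, str]]

# {field-name,style}, e.g. {day-of-month,numeric}
_PLACEHOLDER = re.compile(r"\{([a-z-]+),([a-z]+)\}")


def parse_template(template: str) -> List[Part]:
    """Split a template into literal strings and (field, style) pairs."""
    parts: List[Part] = []
    pos = 0
    for m in _PLACEHOLDER.finditer(template):
        if m.start() > pos:
            parts.append(template[pos:m.start()])  # literal text
        parts.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(template):
        parts.append(template[pos:])
    return parts
```

A syntax of this shape leaves room for further arguments per placeholder, such as a numbering system, without inventing new pattern letters.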

> Note: ICU4X doesn't ship RBNF for date formatting; we instead pre-compute the strings at build time (for days and months, there aren't very many strings, and I believe no one currently uses an algorithmic numbering system for years). This sounds a bit like your "medium term" solution, just formalizing the specific widths into the CLDR spec.

For the day-of-month names, anyway, yeah.

> As far as it not being currently supported in interval formats, I think it's probably just an oversight.

I’m sure you’re right.  And TBH, I haven’t actually taken a close look at interval formats, so I might be wrong about what they do and do not support.

—Rich

Rich Gillam

Sep 9, 2025, 5:23:04 PM
to icu-design, design-wg (CLDR)
Forgot to copy the lists on the email below…

—Rich

Begin forwarded message:

From: Rich Gillam <richard...@apple.com>
Subject: Re: Overriding number formatters in SimpleDateFormat
Date: September 9, 2025 at 2:08:24 PM PDT
To: Markus Scherer <marku...@gmail.com>

Markus--

> Can you give an example for “calendars that use names, not numbers, for days of the month”?

Yes I can, now that it’s been in the Apple public betas for a couple of months.

The examples that are causing us pain right now are the Hindu lunisolar calendars.  All of them have two issues:

1. They divide each month into two “lunar fortnights” (“pakshas”) of 14 or 15 days, one corresponding to the waxing moon that culminates in a full moon, and one corresponding to the waning moon that culminates in a new moon (which comes first depends on the calendar).  The full moon and new moon are just referred to as “full moon” and “new moon”, and the earlier days in each fortnight have to be mapped to the range of 1 to 15 and be prepended with the fortnight name (e.g., “waxing 12”).

2. The individual days in each fortnight are never written with numerals; the numbers are always spelled out: That is, “waxing 12” is “Shukla Dvadashi”, never “Shukla 12”.

When you multiply all these names by all the Hindu calendars in all the Indian languages, the RBNF-based approach is just awful, and it gets even worse if you want to support multiple levels of abbreviation.  It’d be much easier to just handle the day names the same way we handle month names.
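A table-driven version of the day-name lookup described above might look like the sketch below. It assumes the calendar reports the day of the month as 1 to 30 with 15 slots per fortnight, and it uses common English transliterations of the names (only “Shukla Dvadashi” comes from this thread). A real implementation would load localized names from locale data, the way month names are loaded today; all identifiers are hypothetical.

```python
from typing import Tuple

# Common transliterations for days 1..14 of a fortnight (assumed
# spellings; localized data would replace this table).
DAY_NAMES = [
    "Pratipada", "Dvitiya", "Tritiya", "Chaturthi", "Panchami",
    "Shashthi", "Saptami", "Ashtami", "Navami", "Dashami",
    "Ekadashi", "Dvadashi", "Trayodashi", "Chaturdashi",
]
WAXING, WANING = "Shukla", "Krishna"
FULL_MOON, NEW_MOON = "Purnima", "Amavasya"


def split_day(day_of_month: int, waxing_first: bool = True) -> Tuple[str, int]:
    """Map a day of the month (1..30) to (fortnight, day in fortnight)."""
    if not 1 <= day_of_month <= 30:
        raise ValueError("day_of_month must be in 1..30")
    first, second = (WAXING, WANING) if waxing_first else (WANING, WAXING)
    if day_of_month <= 15:
        return first, day_of_month
    return second, day_of_month - 15


def day_name(day_of_month: int, waxing_first: bool = True) -> str:
    """Spelled-out day name; day 15 of a fortnight is the full or new moon."""
    paksha, day = split_day(day_of_month, waxing_first)
    if day == 15:
        return FULL_MOON if paksha == WAXING else NEW_MOON
    return f"{paksha} {DAY_NAMES[day - 1]}"
```

The `waxing_first` flag covers the point that which fortnight comes first depends on the calendar. Abbreviated widths would just be additional tables.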

If we did this, I also think it’d be better than RBNF for most of the cases where we currently have custom day names, such as the “hanidays” stuff in Chinese and Japanese and saying “1er” for the first day of the month in French.  And I actually think it wouldn’t really be any worse than using RBNF for traditional Hebrew or Greek numerals in those calendars, or ordinal numbers in UK English.

Granted, this doesn’t help us if we want an alternate numbering system for the year or some other field, but I do think it’d make life better for the day field.

—Rich


Mihai Niță Ⓤ

Sep 9, 2025, 6:46:04 PM
to Rich Gillam, Shane Carr, icu-design, design-wg (CLDR)
> I believe no one currently uses an algorithmic numbering system for years

So I can't do "Sep. 9, MMXXV"? :-)

But seriously, before computers "invaded everything" and made it impractical, Romanian used Roman numerals for numeric months.
Today would be 9-Ⅸ-2025.
That style was unavailable for so many years that it is now out of fashion.

M

Shane Carr

Sep 9, 2025, 6:48:12 PM
to Mihai Niță Ⓤ, Rich Gillam, icu-design, design-wg (CLDR)
> I believe no one currently uses an algorithmic numbering system for years

I should have said "no locale".

There is a locale that uses Roman numerals for months (Hawaiian).