Korean AM/PM symbols not localized in ICU 78 (ICU-23332)


TAMURA, Kent

Mar 9, 2026, 1:40:51 AM (14 days ago) Mar 9
to icu-s...@unicode.org
Dear ICU Support Team,

I am writing to report a change in behavior regarding time formatting for the Korean locale (ko_KR) in ICU 78.2, which appears to be a regression from ICU 77.

I have already filed a bug report for this issue:
https://unicode-org.atlassian.net/browse/ICU-23332

Issue Description
In ICU 77 and earlier versions, DateFormat::createDateTimeInstance with the Korean locale correctly produced localized AM/PM symbols ("오전" / "오후"). Starting with ICU 78, however, these symbols are rendered as "AM" / "PM" instead of the localized Korean strings.

Environment
ICU Version: 78.2 (Behavior changed from 77.1)
Locale: ko_KR (or ko)
OS: Observed on Linux/x64 (I think this is platform-independent)

Reproduction Code
The following code snippet demonstrates the issue:

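```
#include <iostream>
#include <string>

#include <unicode/calendar.h>
#include <unicode/datefmt.h>
#include <unicode/locid.h>
#include <unicode/unistr.h>

// Formats the current time (no date part) for ko_KR using the given time style.
void PrintDateInKorean(icu::DateFormat::EStyle style, const char* label) {
    icu::DateFormat* df = icu::DateFormat::createDateTimeInstance(
        icu::DateFormat::kNone, style, icu::Locale("ko", "KR"));
    if (df == nullptr) {
        std::cerr << label << ": failed to create DateFormat" << std::endl;
        return;
    }
    UDate now = icu::Calendar::getNow();
    icu::UnicodeString myString;
    df->format(now, myString);
    // Convert to UTF-8 so the result can be written to std::cout.
    std::string s8;
    myString.toUTF8String(s8);
    std::cout << label << ": " << s8 << std::endl;
    delete df;
}

int main() {
    PrintDateInKorean(icu::DateFormat::kFull,   "FULL  ");
    PrintDateInKorean(icu::DateFormat::kLong,   "LONG  ");
    PrintDateInKorean(icu::DateFormat::kMedium, "MEDIUM");
    PrintDateInKorean(icu::DateFormat::kShort,  "SHORT ");
    return 0;
}
```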

Observed Output (ICU 78.2)
```
FULL  : PM 2시 15분 14초 일본 표준시
LONG  : PM 2시 15분 14초 GMT+9
MEDIUM: PM 2:15:14
SHORT : PM 2:15
```
(Expected "오후" instead of "PM")

Analysis
As noted in the Jira ticket, it seems the interpretation of AmPmMarkers within the locale data files has changed in ICU 78. This affects not only Korean but potentially other locales that rely on specific localized markers.

Question
Is this change in behavior an intentional design choice for ICU 78 (e.g., a move toward ASCII defaults for certain styles), or is this an unintended side effect of the recent updates to the AmPmMarkers?

I would appreciate any insights or guidance on whether this is a permanent change we should adapt to, or if a fix is planned.

Best regards,

TAMURA Kent
Software Engineer, Google


Mihai Niță Ⓤ

Mar 9, 2026, 6:17:22 PM (13 days ago) Mar 9
to TAMURA, Kent, icu-s...@unicode.org
After some digging, this is a summary:

I think it is caused by the code change that added support for get/setAmPmStrings with context / width.

More precisely this:
- ampms = arrays.get("AmPmMarkers");
- ampmsNarrow = arrays.get("AmPmMarkersNarrow");
+ ampms = arrays.get("AmPmMarkersAbbr");
+ ampmsWide = arrays.get("AmPmMarkers");
+ ampmsNarrow = arrays.get("AmPmMarkersNarrow");

But not all locales have data for the Abbr format.
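To make the effect of that diff concrete, here is a minimal sketch in plain C++ that models the lookup with a toy data table. It does not use ICU at all: the types, the function names (`sampleData`, `lookup`, `defaultMarkers77`, `defaultMarkers78`) and the two-entry data set are illustrative assumptions, and only the resource key names and the Korean strings come from this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for ICU resource data:
// locale -> resource key -> {am, pm}.
using Markers = std::pair<std::string, std::string>;
using Bundle = std::map<std::string, Markers>;
using Data = std::map<std::string, Bundle>;

// Toy data set: "root" carries the English defaults, "ko" only has the
// wide markers (no AmPmMarkersAbbr), as in the list of affected locales.
inline Data sampleData() {
    Data data;
    data["root"]["AmPmMarkers"] = Markers("AM", "PM");
    data["root"]["AmPmMarkersAbbr"] = Markers("AM", "PM");
    data["ko"]["AmPmMarkers"] = Markers("오전", "오후");
    return data;
}

// Look up `key` in `locale`; fall back to "root" when the locale lacks it.
inline Markers lookup(const Data& data, const std::string& locale,
                      const std::string& key) {
    assert(data.count("root") == 1);  // the toy model always needs a root
    Data::const_iterator loc = data.find(locale);
    if (loc != data.end()) {
        Bundle::const_iterator hit = loc->second.find(key);
        if (hit != loc->second.end()) return hit->second;
    }
    return data.at("root").at(key);
}

// Before the change: the default markers were read from "AmPmMarkers".
inline Markers defaultMarkers77(const Data& data, const std::string& locale) {
    return lookup(data, locale, "AmPmMarkers");
}

// After the change: the default markers are read from "AmPmMarkersAbbr".
inline Markers defaultMarkers78(const Data& data, const std::string& locale) {
    return lookup(data, locale, "AmPmMarkersAbbr");
}
```

With this toy data, `defaultMarkers77(sampleData(), "ko")` yields 오전 / 오후, while `defaultMarkers78(sampleData(), "ko")` falls through to root and yields AM / PM.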

Here is the list of locale files (icu4c/source/data/locales/*.txt) that have AmPmMarkers but no AmPmMarkersAbbr:
    ast, bgc, bho, cy, en_IE, fr_MA, gaa, ko, kok, kok_Latn, ks, ks_Deva, lmo, os, pa, prg, ps, raj, sa, sat, sd_Deva, st, th, tok, vmw, za
All of these locales are now showing AM/PM instead.
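A list like the one above could be produced with a simple text scan. The sketch below is only one possible approach, not necessarily how this list was generated: it assumes that a resource shows up in the locale .txt source as its name immediately followed by `{`, and it takes file contents from an in-memory map (reading the files from disk is left out). The names `hasResource` and `wideOnlyLocales` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// True if `text` contains an opener of the form `name{`.
// Assumption: resources appear in the .txt source as `Name{ ... }`.
inline bool hasResource(const std::string& text, const std::string& name) {
    assert(!name.empty());
    return text.find(name + "{") != std::string::npos;
}

// Input: locale name -> contents of its .txt file.
// Output: locales that define AmPmMarkers but not AmPmMarkersAbbr,
// in the map's (alphabetical) order.
inline std::vector<std::string> wideOnlyLocales(
        const std::map<std::string, std::string>& files) {
    std::vector<std::string> result;
    std::map<std::string, std::string>::const_iterator it;
    for (it = files.begin(); it != files.end(); ++it) {
        if (hasResource(it->second, "AmPmMarkers") &&
            !hasResource(it->second, "AmPmMarkersAbbr")) {
            result.push_back(it->first);
        }
    }
    return result;
}
```

Note that `"AmPmMarkers{"` is not a substring of `"AmPmMarkersAbbr{"`, so the two checks do not interfere. The scan does not look at which calendar the resource belongs to, so a real check would likely need to be stricter.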

It looks like the survey tool is missing that data, perhaps it was added later?
But the fields there are not translated:
    https://st.unicode.org/cldr-apps/v#/ko/Gregorian/471a794c61b793f4
    https://st.unicode.org/cldr-apps/v#/pa/Gregorian/471a794c61b793f4
    https://st.unicode.org/cldr-apps/v#/th/Gregorian/471a794c61b793f4

Some locales are probably better, since the new form is shorter:
    gsw  [am Vormittag, am Namittag] // 77
    gsw  [vorm., nam.] // 78
    hsb [dopołdnja, popołdnju] // 77
    hsb [dop., pop.] // 78
    tk [günortadan öň, günortadan soň] // 77
    tk  [go.öň, go.soň] // 78
    ug [چۈشتىن بۇرۇن, چۈشتىن كېيىن] // 77
    ug [چ.ب, چ.ك] // 78

It might also be that the change to AM/PM for the locales in India is OK.
Since they don't normally use am/pm, they don't have dedicated abbreviated forms; they default to English, and that's OK.

But apparently that's not OK for Korean.
And probably for Thai, which was the bug that triggered the change
(ICU-23177 "Why does ICU format with the long Thai day period?")

So I think that a good fix would be to update the locales with an abbreviated form, if possible.

Cheers,
Mihai

--
You received this message because you are subscribed to the Google Groups "icu-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu-support...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-support/CAGH7WqF3Wx49Rp61zseFLuf2uzsFEyXsiD1bTzg5oMhhn0A5aQ%40mail.gmail.com.


Mark Davis Ⓤ

Mar 9, 2026, 6:19:58 PM (13 days ago) Mar 9
to Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
It sounds like that ticket should be transferred to CLDR, and be scheduled for v49 (ICU 79).

Shane Carr

Mar 9, 2026, 10:24:15 PM (13 days ago) Mar 9
to Mark Davis Ⓤ, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Hmm. We should establish which day period data path is the one that is guaranteed to be populated (between wide/abbr/narrow) and use it as the fallback target. I seem to remember the code that landed fixing bugs, but if it created new bugs, we should add test cases and revisit.
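One way to picture the "fallback target" idea is a lookup that tries the other widths inside the same locale before inheriting from root. This is a purely hypothetical sketch of that idea in plain C++, not ICU's actual behavior or a proposed patch. The function name `resolveWithLocalFallback`, the width order, and the table type are all assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Markers = std::pair<std::string, std::string>;   // {am, pm}
using WidthTable = std::map<std::string, Markers>;     // width -> markers

// Try the requested width in the locale first, then the other widths in a
// fixed (assumed) order, and only then fall back to root's abbreviated form.
inline Markers resolveWithLocalFallback(const WidthTable& locale,
                                        const WidthTable& root,
                                        const std::string& width) {
    std::vector<std::string> order;
    order.push_back(width);
    order.push_back("abbreviated");
    order.push_back("wide");
    order.push_back("narrow");
    for (std::size_t i = 0; i < order.size(); ++i) {
        WidthTable::const_iterator hit = locale.find(order[i]);
        if (hit != locale.end()) return hit->second;
    }
    WidthTable::const_iterator inherited = root.find("abbreviated");
    assert(inherited != root.end());  // root is expected to define this width
    return inherited->second;
}
```

With a locale table that only has a "wide" entry, a request for "abbreviated" returns the wide strings instead of root's AM / PM. A caveat of this design: if a locale's abbreviated form is intentionally identical to root and is therefore absent from the locale file, this lookup cannot tell that apart from data that was never provided.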



Mark Davis Ⓤ

Mar 9, 2026, 11:47:36 PM (13 days ago) Mar 9
to Shane Carr, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Here is the root. So stand-alone falls back to format, and narrow and wide fall back to abbreviated (normally various widths fall back to 'wide', but this and units are unusual). Note also that all non-gregorian falls back eventually to gregorian (for time, that generally works well without overrides, since the format is generally shared across multiple calendars)

<dayPeriods>
    <dayPeriodContext type="format">
        <dayPeriodWidth type="abbreviated">
            <dayPeriod type="am">AM</dayPeriod>
            <dayPeriod type="pm">PM</dayPeriod>
        </dayPeriodWidth>
        <dayPeriodWidth type="narrow">
            <alias source="locale" path="../dayPeriodWidth[@type='abbreviated']"/>
        </dayPeriodWidth>
        <dayPeriodWidth type="wide">
            <alias source="locale" path="../dayPeriodWidth[@type='abbreviated']"/>
        </dayPeriodWidth>
    </dayPeriodContext>
    <dayPeriodContext type="stand-alone">
        <dayPeriodWidth type="abbreviated">
            <alias source="locale" path="../../dayPeriodContext[@type='format']/dayPeriodWidth[@type='abbreviated']"/>
        </dayPeriodWidth>
        <dayPeriodWidth type="narrow">
            <alias source="locale" path="../dayPeriodWidth[@type='abbreviated']"/>
        </dayPeriodWidth>
        <dayPeriodWidth type="wide">
            <alias source="locale" path="../dayPeriodWidth[@type='abbreviated']"/>
        </dayPeriodWidth>
    </dayPeriodContext>
</dayPeriods>
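The alias chain in that root data can be traced with a small model. The sketch below, in plain C++ without any CLDR or ICU code, follows the aliases as quoted (narrow and wide go to abbreviated in the same context, stand-alone abbreviated goes to format abbreviated) and re-checks the requesting locale's own data at each step, which is how I read `source="locale"`. It ignores the parent-locale chain (for example ko_KR to ko), and the names `Path`, `nextAlias` and `resolve` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Path = std::pair<std::string, std::string>;     // {context, width}
using Markers = std::pair<std::string, std::string>;  // {am, pm}

// Advance `p` along the root aliases quoted above.
// Returns false once `p` is format/abbreviated, where root holds real values.
inline bool nextAlias(Path& p) {
    assert(p.first == "format" || p.first == "stand-alone");
    if (p.second != "abbreviated") {
        p.second = "abbreviated";  // narrow/wide -> abbreviated, same context
        return true;
    }
    if (p.first == "stand-alone") {
        p.first = "format";        // stand-alone -> format
        return true;
    }
    return false;
}

// Resolve a path for one locale: use the locale's explicit value if present,
// otherwise follow the alias and try again; end at root's AM / PM.
inline Markers resolve(const std::map<Path, Markers>& localeData, Path p) {
    for (;;) {
        std::map<Path, Markers>::const_iterator hit = localeData.find(p);
        if (hit != localeData.end()) return hit->second;
        if (!nextAlias(p)) return Markers("AM", "PM");
    }
}
```

In this model, a locale that supplies only format/wide still resolves format/abbreviated to AM / PM, because nothing aliases from abbreviated back to wide. The same holds for stand-alone/wide, which goes through stand-alone/abbreviated and format/abbreviated without ever visiting format/wide.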


The coverage is the following (in coverageLevels.xml); so all paths in gregorian for am/pm should be at least at Moderate.

395: <coverageLevel value="basic"   match="dates/calendars/calendar[@type='gregorian']/dayPeriods/dayPeriodContext[@type='format']/dayPeriodWidth[@type='wide']/dayPeriod[@type='(am|pm)']"/>
1,036: <coverageLevel value="moderate" match="dates/calendars/calendar[@type='gregorian']/dayPeriods/dayPeriodContext[@type='%anyAlphaNum']/dayPeriodWidth[@type='%allWidths']/dayPeriod[@type='%anyAlphaNum']"/>
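Read together, the two rules say that only the format/wide am and pm values are required at Basic, and every other gregorian day-period path is required from Moderate. A minimal sketch of that reading in plain C++ follows. It models only the two rules quoted here (the real coverageLevels.xml may contain others that apply), and the names `Coverage` and `dayPeriodCoverage` are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class Coverage { Basic, Moderate };

// Coverage level for a gregorian day-period path, based only on the two
// rules quoted above: format/wide am|pm is Basic, everything else Moderate.
inline Coverage dayPeriodCoverage(const std::string& context,
                                  const std::string& width,
                                  const std::string& type) {
    assert(!type.empty());
    const bool amOrPm = (type == "am" || type == "pm");
    if (context == "format" && width == "wide" && amOrPm) {
        return Coverage::Basic;
    }
    return Coverage::Moderate;
}
```

Under this reading, `dayPeriodCoverage("format", "abbreviated", "am")` is Moderate, so a locale that targets only Basic coverage would not be asked for the abbreviated form at all.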


The one other possibility is PathHeader, which determines what is shown at all in the ST. It looks like we show everything from gregorian to the vetters.

//ldml/dates/calendars/calendar[@type="gregorian"]/dayPeriods/dayPeriodContext[@type="%A"]/dayPeriodWidth[@type="%A"]/dayPeriod[@type="%A"] ; DateTime ; &calendar(gregorian) ; &calField(DayPeriods:$2:$1) ; &dayPeriod($3)

//ldml/dates/calendars/calendar[@type="%A"]/dayPeriods/dayPeriodContext[@type="%A"]/dayPeriodWidth[@type="%A"]/dayPeriod[@type="%A"] ; Special ; Suppress ; &calendar($1) ; &calField(DayPeriods:$3:$2)-&dayPeriod($4) ; HIDE


So far, I am a bit puzzled as to why they are missing. It will take a bit of debugging to figure out why the AM/PM values are not being flagged as missing in the Survey Tool.

...But the fields there are not translated:
    https://st.unicode.org/cldr-apps/v#/ko/Gregorian/471a794c61b793f4
...

Mark

Annemarie Apple

Mar 10, 2026, 1:12:58 AM (13 days ago) Mar 10
to Mark Davis Ⓤ, Shane Carr, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Did the data which is being used change in ICU 78.1?

There should be a CLDR issue and we should review and update the data if needed.
  • Korean language specialists discussed this in CLDR 28 and decided to keep them in English ("Since this is 'abbreviated' form I left this in English"). That is somewhat strange, since the Korean is not significantly longer than the English and none of the other AM/PM forms are in English.
  • Thai decided to update this in CLDR 42/43 per the forum posts, which apparently reference CLDR-8810. Link to XML (last modified 4 years ago)
Either way it sounds like we should verify the right format and potentially clarify instructions and have the language specialists re-review the data.

Question - Is this important enough to fix for ICU 78.3 for at least Korean / Thai considering how commonly they're used?

Best,

Annemarie

~~~~~~~~~~
If you received this communication by mistake, please don't forward it to anyone else, please erase all copies of it, including all attachments, and please let the sender know it went to the wrong person. Thank you!


Shane Carr

Mar 10, 2026, 2:10:06 PM (12 days ago) Mar 10
to Annemarie Apple, Mark Davis Ⓤ, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
In ICU 77 we were loading the long type as the default (abbreviated) form. Now we load the abbreviated type as the default form, more in line with CLDR.

Annemarie Apple

Mar 10, 2026, 2:17:25 PM (12 days ago) Mar 10
to Shane Carr, Mark Davis Ⓤ, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Should we move https://unicode-org.atlassian.net/browse/ICU-23332 to CLDR, or would you prefer I file a new ticket to fix the data in CLDR?

Best,

Annemarie


Markus Scherer

Mar 10, 2026, 4:32:43 PM (12 days ago) Mar 10
to Annemarie Apple, Shane Carr, Mark Davis Ⓤ, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
On Tue, Mar 10, 2026 at 11:17 AM Annemarie Apple <anne...@unicode.org> wrote:
Should we move https://unicode-org.atlassian.net/browse/ICU-23332 to CLDR, or would you prefer I file a new ticket to fix the data in CLDR?

Moving to CLDR is good if we don't need ICU code changes.

Annemarie Apple

Mar 10, 2026, 4:35:55 PM (12 days ago) Mar 10
to Markus Scherer, Shane Carr, Mark Davis Ⓤ, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Okay, will do. I think we just need to update the AM/PM data for consistency.

Best,

Annemarie


Mark Davis Ⓤ

Mar 10, 2026, 4:36:30 PM (12 days ago) Mar 10
to Markus Scherer, Annemarie Apple, Shane Carr, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
That makes sense. It sounds like the reason this showed up for users was a change in ICU to "do the right thing", which exposed a problem in CLDR. So the ticket should go to CLDR and be fixed there, because that will fix the problem that people see in ICU.


Annemarie Apple

Mar 10, 2026, 5:35:54 PM (12 days ago) Mar 10
to Mark Davis Ⓤ, Markus Scherer, Shane Carr, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Okay, I've parsed the list of locales Mihai provided and here is the summary of what we will check for CLDR. I'll send out an email to the CLDR TC so other people can check with their language specialists as well.

Languages we'd need to check:
  • ko - Checking with native speakers. The localized version is not much longer than English, and it's strange that the AM/PM appears before the time? Usually when languages keep AM/PM it tends to follow the time like in English.
  • kok, kok_Latn - Likely use the en abbreviation; this may be intentional.
  • ps - This looks weird: only the abbreviated format form is unlocalized; all other forms are localized.
  • th - Thai looks to intentionally be AM/PM since they don't have a shorter localized version, but I will double check with language specialists.
  • cy - Seems weird since the wide format versions are the same length or shorter than AM/PM.
Languages where this is expected:
  • pa is consistently localized as AM/PM - so this is 90% likely to be intentional
  • Languages at Basic or below aren't required to localize this: ast, bgc, bho, gaa, ks, ks_Deva, lmo, os, prg, raj, sa, sat, sd_Deva, st, tok, vmw, za
  • en_IE can reasonably inherit from en-001 and doesn't need this to be specified unless it is different
  • fr_MA can reasonably inherit from fr and doesn't need this to be specified unless it is different

Best,

Annemarie


Mark Davis Ⓤ

Mar 10, 2026, 5:47:00 PM (12 days ago) Mar 10
to Annemarie Apple, Markus Scherer, Shane Carr, Mihai Niță Ⓤ, TAMURA, Kent, icu-s...@unicode.org, cldr-core2
Thanks for the analysis and followup