personName: deriving the effective formatting locale

Kip Cole

Apr 24, 2023, 2:08:50 AM

to CLDR Users Public Mail List

Judging from a test data example [3] and the icu4j implementation of getNameLocale [1], the intent when resolving the formatting locale for a personName appears to be that if the personName object has an associated locale, that locale is used unmodified.

In that example the personName locale is “ja-AQ”, which when maximally expanded includes the script `Jpan`. Using that locale returns a `surnameFirst` format with no spaces in the template, and therefore an incorrect test result of “MüllerKäthe”.

TR35[2] says:

> If the PersonName object can provide a name locale, return a locale formed from it by replacing its script by the name script.

This doesn’t seem consistent with the icu4j implementation [1] (but I’m not a Java guy, so I may be misreading it).
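One way to read the TR35 sentence above, as a minimal sketch in Python and not a description of what icu4j does: keep the language and region of the name locale and swap in the script derived from the name itself. The `Locale` tuple and the `with_name_script` and `to_tag` helpers are hypothetical names introduced for illustration, and the `Jpan` and `Latn` values are supplied by hand here, not computed from likely-subtags data.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    """Hypothetical minimal locale record: language, script, region."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None


def with_name_script(name_locale: Locale, name_script: str) -> Locale:
    """Return the name locale with its script replaced by the name's script."""
    return name_locale._replace(script=name_script)


def to_tag(locale: Locale) -> str:
    """Join the non-empty fields with underscores, e.g. ja_Latn_AQ."""
    return "_".join(part for part in locale if part)


# "ja-AQ" maximally expanded carries the script Jpan (assumed, entered by hand).
expanded = Locale("ja", "Jpan", "AQ")
# The name in the test data is written in Latin script (assumed, entered by hand).
name_locale = with_name_script(expanded, "Latn")
```

Under this reading, `to_tag(name_locale)` gives `ja_Latn_AQ`, so the example would not be formatted with the `Jpan` pattern.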

My questions:

1. When the personName object has an associated locale, is that locale expected to be used without further modification? If not then ….

2. If the derived script of the personName differs from the script of the personName's associated locale, is the nameLocale formed from the base language of the associated locale, plus the derived personName script, plus the region of the personName locale?

3. If (2) is correct, is the fallback chain then the same as that described for the derived name order [4]? In this case that would be:

ja_Latn_AQ, und_Latn_AQ, ja_Latn, und_Latn, ja_AQ, und_AQ, ja, und

4. Is it correct to say that if a personName has an associated locale then the formatting locale is not used to influence formatting?

5. Is it correct to say that if there is NOT an associated locale for the personName, then the effective formatting locale is basically the base language and region of the formatting locale combined with the derived script of the personName, to which the fallback chain from [4] is then applied?
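To make the chain in question 3 concrete, here is a minimal Python sketch that generates it from a language, script, and region. It reproduces the order as written in that question (with the script spelled `Latn`), which is only a reading of [4] and not a confirmed algorithm; `fallback_chain` is a hypothetical helper, not a CLDR or ICU API.

```python
from typing import List


def fallback_chain(language: str, script: str, region: str) -> List[str]:
    """Build the candidate locale list in the order given in question 3.

    Fields are dropped in this order: nothing, region, script, both.
    At each step the real language is tried first, then "und".
    """
    chain: List[str] = []
    steps = [(True, True), (True, False), (False, True), (False, False)]
    for keep_script, keep_region in steps:
        for lang in (language, "und"):
            parts = [lang]
            if keep_script:
                parts.append(script)
            if keep_region:
                parts.append(region)
            tag = "_".join(parts)
            # Skip repeats, which occur when the language is already "und".
            if tag not in chain:
                chain.append(tag)
    return chain


chain = fallback_chain("ja", "Latn", "AQ")
```

Here `chain` comes out as ja_Latn_AQ, und_Latn_AQ, ja_Latn, und_Latn, ja_AQ, und_AQ, ja, und.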

Many thanks for any guidance,

—Kip

References:

[1] icu4j personname formatter: https://github.com/unicode-org/icu/blob/bfa5f4e6ae177860d867af047d759a88076d7c38/icu4j/main/classes/core/src/com/ibm/icu/impl/personname/PersonNameFormatterImpl.java#L278-L310

[2] TR35 PersonName formatting http://www.unicode.org/reports/tr35/tr35-personNames.html#formatting-process

[3] personName referenced test data example: https://github.com/unicode-org/cldr/blob/main/common/testData/personNameTest/en.txt#L336-L345

[4] personName derived name order: http://www.unicode.org/reports/tr35/tr35-personNames.html#derive-the-name-order

Mark Davis Ⓤ

Apr 24, 2023, 1:56:09 PM

to Kip Cole, CLDR Users Public Mail List

Please check out https://cldr-smoke.unicode.org/spec/maint-43/ldml/tr35-personNames.html#formatting-process to see if that needs clarification or fixes. The intent is that a person name record normally be formatted only by a locale with the name's script. But the logic might need tuning.

Also see:

https://unicode-org.atlassian.net/browse/ICU-22304

https://unicode-org.atlassian.net/browse/ICU-22362

--
You received this message because you are subscribed to the Google Groups "CLDR Users Public Mail List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cldr-users+...@unicode.org.
To view this discussion on the web visit https://groups.google.com/a/unicode.org/d/msgid/cldr-users/E59A849D-42C1-4F9B-BEE3-5BD50A00AF84%40gmail.com.

Mark Davis

May 2, 2023, 9:08:54 PM

to CLDR Users Public Mail List, Mark Davis, CLDR Users Public Mail List, kipc...@gmail.com

We reviewed and discussed this in the person name formatting subcommittee, and came up with modified text. See https://unicode-org.atlassian.net/browse/CLDR-16623 . Comments are welcome there.
