Collation with upper and numeric comparing in unexpected way

Matt Magoffin

unread,

Jan 21, 2026, 4:26:40 PMJan 21

to icu-support

I am trying to configure a collation to sort upper case before lower case with numeric ordering. I am getting some unexpected results, so hoped to find a way to configure ICU to sort in a way that matches my expectations.

I originally posted this question on the pgsql-general mailing list, as I am working with Postgres to implement the sorting of a database column. Hopefully the SQL statements make it obvious how the ICU collation is configured (und-u-kf-upper-kn):

CREATE COLLATION testsort (provider = icu, locale = 'und-u-kf-upper-kn’);

These comparisons are working as I expected:

SELECT 'ID' < 'id' COLLATE testsort; -- true (upper before lower)

SELECT '45' < '123' COLLATE testsort; --true (45 before 123)

However combining them like the following resulted in an unexpected result:

SELECT 'ID123' < 'id45' COLLATE testsort; -- false, expected true

I thought that last one would be false because “ID” would come before “id”. Is there a way to configure the collation to achieve that? I’m trying to match the sorting behaviour in external application code.

Thanks for any help,
Matt

Rich Gillam

unread,

Jan 27, 2026, 6:08:08 PMJan 27

to Matt Magoffin, icu-support

No one else has answered, so I’ll take a shot and maybe somebody who knows this stuff better than I do can chime in and correct me/

I’m going to guess that what you’re running into here is that the case difference is a tertiary difference and the numeric value difference is a primary difference. Tertiary differences only count if there are no primary differences between the strings, so if “45” < “123” counts as a primary difference (and I’m guessing it does), that’ll swamp “ID” < “id”. To get the effect you’re after, you’d either have to elevate the case difference to the primary level or split the string into two pieces and compare them separately. I’m guessing you can’t easily do either of those in a regular SQL query.

—Rich Gillam

Vice chair, ICU Technical Committee

--
You received this message because you are subscribed to the Google Groups "icu-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu-support...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-support/3d6a375b-e698-4cce-a7f7-8a494a2cd86cn%40unicode.org.

--
You received this message because you are subscribed to the Google Groups "ICU - Team" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu-team+u...@unicode.org.
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-team/3d6a375b-e698-4cce-a7f7-8a494a2cd86cn%40unicode.org.

Markus Scherer

unread,

Jan 27, 2026, 6:28:44 PMJan 27

to Rich Gillam, Matt Magoffin, icu-support

On Tue, Jan 27, 2026 at 3:08 PM 'Rich Gillam' via icu-support <icu-s...@unicode.org> wrote:

I’m going to guess that what you’re running into here is that the case difference is a tertiary difference and the numeric value difference is a primary difference. Tertiary differences only count if there are no primary differences between the strings, so if “45” < “123” counts as a primary difference (and I’m guessing it does), that’ll swamp “ID” < “id”.

Correct. Collation is a multi-level comparison.

See

https://www.unicode.org/reports/tr10/#Multi_Level_Comparison

https://cldr.unicode.org/index/cldr-spec/collation-guidelines#determining-level-differences

Thanks Rich :-)

markus

Matt Magoffin

unread,

Jan 27, 2026, 6:47:01 PMJan 27

to Markus Scherer, Rich Gillam, icu-support

Thank you Rich and Markus,

I missed the part in the docs that mention numericOrdering is a primary difference, that all makes sense now. Unfortunately it does not look like I can elevate the case difference to the primary level in Postgres, so I will explore other options.

Thank you for your help!

Cheers,

Matt

Reply all

Reply to author

Forward