SpecialCasing.txt, under "Unconditional Mappings", says:

# IMPORTANT-when iota-subscript (0345) is uppercased or titlecased,
# the result will be incorrect unless the iota-subscript is moved to the end
# of any sequence of combining marks. Otherwise, the accents will go on the capital iota.
# This process can be achieved by first transforming the text to NFC before casing.
# E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA>

But it appears that ICU only follows that rule when uppercasing in a Greek locale. Is that right?

There seems to be some specific tailoring for the Greek result as well.
Yes. Normally, the uppercase functions do what the spec says:

R1 toUppercase(X): Map each character C in X to Uppercase_Mapping(C).
So in order to get Greek right when there is an implicit or explicit iota subscript (ypogegrammeni) followed by another combining mark, you need to normalize first.
I don't actually think that NFC will do the trick; it might even make things worse, because it pulls the iota subscript into a composite letter despite a following lower-ccc combining mark. NFD would work, or (with ICU) FCD.
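For anyone who wants to see the effect without ICU, here is a minimal sketch using only Python's standard library. It assumes that Python's str.upper() applies the untailored per-character mappings (as in R1) and does no normalization of its own; that is an assumption about CPython, not a statement about ICU's API. The helper names cps and upper_after are made up for illustration.

```python
import unicodedata

ALPHA = "\u03b1"            # GREEK SMALL LETTER ALPHA
YPOGEGRAMMENI = "\u0345"    # COMBINING GREEK YPOGEGRAMMENI (iota subscript), ccc=240
ACUTE = "\u0301"            # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"        # COMBINING DOT BELOW, ccc=220


def cps(text):
    """Render a string as space-separated code points, for readable output."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def upper_after(form, text):
    """Normalize to the given form ("NFC" or "NFD"), then uppercase (hypothetical helper)."""
    return unicodedata.normalize(form, text).upper()


sample = ALPHA + YPOGEGRAMMENI + ACUTE

# No normalization: the iota is uppercased in place, so the accent follows it.
print(cps(sample.upper()))               # 0391 0399 0301
# NFD reorders the marks by ccc first, so the capital iota ends up last.
print(cps(upper_after("NFD", sample)))   # 0391 0301 0399

# A mark with a lower ccc than the iota subscript: NFC composes alpha and
# ypogegrammeni into a single letter and leaves the dot after it.
tricky = ALPHA + YPOGEGRAMMENI + DOT_BELOW
print(cps(upper_after("NFC", tricky)))   # 0391 0399 0323
print(cps(upper_after("NFD", tricky)))   # 0391 0323 0399
```

In the NFC line the dot below lands after the capital iota, which is the "might make it worse" case; the NFD line keeps it on the alpha.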
You might want to submit a bug report about the misleading text in SpecialCasing.txt, via https://www.unicode.org/reporting.html
I have filed the report as "An Error in Publications/Data", which linked to this discussion, but I did not receive an issue number or confirmation email so I can't link to it from here. Hopefully I have included the right context so they understand the source of the confusion.
> So in order to get Greek right when there is an implicit or explicit iota subscript (ypogegrammeni) followed by another combining mark, you need to normalize first.

You mean it's essentially a warning that you should normalize before uppercasing? And they put it in "Unconditional mappings" because that's always an OK thing to do?
Hmm. I thought that NFC(NFD(X)) = NFC(X). Is that not the case?
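That identity can be spot-checked with a short sketch in Python's standard library; it is a check on a few hand-picked strings, not a proof. As far as I can tell the identity holds, but it concerns the normalized result only. It says nothing about what the text looks like at the moment the per-character uppercase mapping is applied, which is where composed and decomposed input differ. The helper names nfc, nfd and cps are made up for illustration.

```python
import unicodedata


def nfc(text):
    """Normalization Form C (hypothetical shorthand helper)."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Normalization Form D (hypothetical shorthand helper)."""
    return unicodedata.normalize("NFD", text)


def cps(text):
    """Render a string as space-separated code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


samples = [
    "\u03b1\u0345\u0301",   # alpha, ypogegrammeni, acute
    "\u03b1\u0345\u0323",   # alpha, ypogegrammeni, dot below
    "\u1fb4",               # precomposed alpha with oxia and ypogegrammeni
]

for s in samples:
    # The identity from the question holds for each sample.
    assert nfc(nfd(s)) == nfc(s)

# Uppercasing sees different input depending on the form it is handed:
x = samples[1]
print(cps(nfc(x)))   # 1FB3 0323  (iota subscript folded into the letter)
print(cps(nfd(x)))   # 03B1 0323 0345  (iota subscript last)
```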