On Friday, 11 March 2022 at 04:10:06 UTC-7, madMorty wrote:
> - 'ü' = should count as two separate characters
You must be referring to the statement about diacritics at the end of PS 3.5, Section 6.2.1.2:
> Each combining character (e.g., diacritics or vowel marks) shall be considered a separate character for this maximum length, regardless of how an application may display such combining characters (i.e., combined into the glyph for the base character, or rendered separately).
So 'ü' should only be counted as two characters if the letter u and the diacritic ¨ are encoded as separate code points (a base character followed by a combining diacritic). But in ISO_IR 100, ü is encoded as a single code point. The same holds for NFC-normalized UTF-8: even though ü is encoded as two bytes there, it is still a single code point.
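To make the distinction concrete, here is a small Python sketch. It assumes that counting code points is the right reading of the standard, as argued here, and `dicom_char_count` is a hypothetical helper name of my own, not something from any DICOM toolkit. Python's `latin-1` codec stands in for ISO_IR 100.

```python
import unicodedata

# Precomposed form: a single code point, U+00FC.
composed = unicodedata.normalize("NFC", "u\u0308")
# Decomposed form: 'u' followed by U+0308 (combining diaeresis).
decomposed = unicodedata.normalize("NFD", "\u00fc")


def dicom_char_count(value: str) -> int:
    """Hypothetical helper: count characters the way the quoted text
    describes, i.e. one per code point. len() on a Python str already
    counts code points, not bytes or glyphs."""
    return len(value)


print(dicom_char_count(composed))         # 1 code point
print(dicom_char_count(decomposed))       # 2 code points
print(len(composed.encode("utf-8")))      # 2 bytes, still 1 code point
print(len(composed.encode("latin-1")))    # 1 byte in ISO 8859-1
print(len(decomposed.encode("utf-8")))    # 3 bytes, 2 code points
```

Both strings render as the same glyph, but only the decomposed one counts as two characters toward the maximum length.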
Further clarification is available in CP 964 "Correct alphabetic name encoding for Unicode", which states:
> The definition: Combining characters (e.g., diacritics or vowel marks) separately encoded from base characters shall be considered separate characters for this maximum length was chosen to be consistent with Unicode and GB18030 definition of character code points.
So there you go. When DICOM says "character", it means "code point", regardless of the number of bytes, and regardless of how the code points might be combined to form glyphs.