Character count within PN attributes

Mar 11, 2022, 6:10:06 AM
Hi all,
so for PN attributes the maximum value length of one component group (CG) is limited to 64 characters including delimiters. With user input of only last name and first name, this leaves me with 63 characters for the two names combined, since the single "^" delimiter between the two components takes up one. Furthermore, considering a Specific Character Set (0008,0005) of "ISO_IR 100":

- 'ü' = should count as two separate characters

But then again what about:
- '°' = this should still count as one character, right (and not as two bytes)?

Or should all characters from 0xA1 onward count as 2 characters? This is currently a bit confusing to me, since I thought that the text/string VR limit within DICOM really means characters and never bytes.

Best regards,

David Gobbi

Mar 11, 2022, 1:33:11 PM
On Friday, 11 March 2022 at 04:10:06 UTC-7, madMorty wrote:
> - 'ü' = should count as two separate characters

You must be referring to the statement about diacritics at the end of PS 3.5:

> Each combining character (e.g., diacritics or vowel marks) shall be considered a separate character for this maximum length, regardless of how an application may display such combining characters (i.e., combined into the glyph for the base character, or rendered separately).

So 'ü' should only be counted as two characters if the letter u and the diacritic ¨ are encoded as separate code points (a base character and a combining diacritic). But in ISO_IR 100, ü is encoded as a single code point. Ditto for NFC UTF-8, where ü is encoded as two bytes but is still a single code point.
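
To make the difference between bytes and code points concrete, here is a small Python sketch (not from the standard, just an illustration). It relies on Python's `len()` of a `str` counting code points, and it uses the `latin-1` codec as a stand-in for ISO_IR 100. The variable names are made up for this example.

```python
import unicodedata

# 'ü' in its two Unicode forms, written with escapes so the form is unambiguous.
nfc = unicodedata.normalize("NFC", "u\u0308")  # precomposed U+00FC
nfd = unicodedata.normalize("NFD", "\u00fc")   # 'u' followed by combining U+0308
degree = "\u00b0"                              # the degree sign

for label, text in (("NFC u-umlaut", nfc), ("NFD u-umlaut", nfd), ("degree sign", degree)):
    # len(text) counts code points; len(text.encode(...)) counts bytes.
    print(label, "code points:", len(text), "UTF-8 bytes:", len(text.encode("utf-8")))

# In ISO_IR 100 (Latin-1) both precomposed characters occupy a single byte.
print(len(nfc.encode("latin-1")), len(degree.encode("latin-1")))
```

Only the NFD form counts as two characters for the length limit. The combining U+0308 has no Latin-1 encoding at all, so that form cannot occur in an ISO_IR 100 value.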

Further clarification is available in CP 964 "Correct alphabetic name encoding for Unicode", which states:
> The definition: Combining characters (e.g., diacritics or vowel marks) separately encoded from base characters shall be considered separate characters for this maximum length was chosen to be consistent with Unicode and GB18030 definition of character code points.

So there you go. When DICOM says "character", it means "code point", regardless of the number of bytes, and regardless of how the code points might be combined to form glyphs.
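
As a rough sketch of how an input check could apply this, the following Python counts code points per component group of an already-decoded PN value. The function names and the constant are hypothetical, chosen for this example; the 64-character limit per component group is the one discussed in this thread, and "=" is assumed to be the delimiter between component groups.

```python
MAX_PN_GROUP_CHARS = 64  # limit per component group, "^" delimiters included


def pn_group_lengths(value: str) -> list[int]:
    """Return the code point count of each '='-separated component group."""
    # len() on a Python str counts code points, not bytes or glyphs.
    return [len(group) for group in value.split("=")]


def pn_within_limit(value: str) -> bool:
    """True if every component group has at most 64 code points."""
    return all(n <= MAX_PN_GROUP_CHARS for n in pn_group_lengths(value))


# Precomposed u-umlaut (U+00FC): each name is 6 code points, plus one "^".
print(pn_group_lengths("M\u00fcller^J\u00fcrgen"))
print(pn_within_limit("A" * 32 + "^" + "B" * 31))  # exactly 64 characters
print(pn_within_limit("A" * 32 + "^" + "B" * 32))  # 65 characters
```

The check has to run on the decoded string rather than on the encoded bytes, otherwise a UTF-8 value would be rejected too early.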