Shall DICOM define an additional Specific Character Set (0008,0005) value for ISO/IEC 8859-15?

gunter zeilinger

Jul 7, 2020, 8:51:47 AM

DICOM does not define the use of ISO/IEC 8859-15 (Latin-9) for encoding string and text attributes. ISO/IEC 8859-15 (Latin-9) contains the characters Š/š and Ž/ž, which are used in Estonian and Finnish for transcribing foreign names and are not included in ISO/IEC 8859-1 (Latin-1).

We encounter PACS/RIS implementations that use it, or alternatively Windows-1252 (CP-1252).
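To illustrate the difference between these encodings, here is a minimal Python sketch. It relies only on the standard codecs `latin_1`, `iso8859_15` and `cp1252`; the helper name `encodable` is made up for this example. It shows that Latin-1 cannot represent the four characters at all, and that Latin-9 and CP-1252 place them at different byte values, so a receiver that guesses the wrong encoding silently displays the wrong character.

```python
# The four characters in question: present in Latin-9, absent from Latin-1.
CHARS = "ŠšŽž"


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("latin_1", "iso8859_15", "cp1252", "utf_8"):
    if encodable(CHARS, codec):
        print(f"{codec:11} {CHARS.encode(codec).hex()}")
    else:
        print(f"{codec:11} cannot encode {CHARS}")

# A Latin-9 byte read as Latin-1 decodes without error, but to a different
# character: 0xA6 is "Š" in Latin-9 and the broken bar "¦" in Latin-1.
misread = "Š".encode("iso8859_15").decode("latin_1")
print(repr(misread))
```

Because the mis-decoding raises no error, the problem tends to show up only as wrong characters on the display of the receiving system.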

There is a code for ISO/IEC 8859-15 in HL7 Table 0211: Alternate character sets since v2.6 - http://www.hl7.eu/HL7v2x/v26/hl7v26tab0211.htm

Just for discussion...

Gunter

Markus Sabin

Jul 10, 2020, 7:33:58 AM

No reactions yet, but I think this is an interesting topic in general. From today's perspective, DICOM's way of character encoding appears quite messy to me.

I have met several device manufacturers lately who thought that supporting ISO_IR 192 (= UTF-8) would be the one solution that fits all situations. This is a good idea in theory but not in practice, since there are many manufacturers developing for local markets and supporting only their specific local character encoding (e.g. ISO_IR 100 = Latin-1). Code Extension Techniques drive the matter to the extreme IMHO: rather than relying on one universal character set, there are rules for switching between different specific ones.

I wish those device manufacturers were right - why not always make use of the one universal character encoding that includes all character sets in the world? So I would rather not increase the number of specific character sets, and instead hope for UTF-8 to evolve into the universal character set available in all DICOM implementations. Overcoming such problems (without extending the number of character sets supported by DICOM) may be an occasion to drive the market in this direction.

Sorry for being that bold. But hey, it is Friday so why not dream of a better future :-)

Victor Derks

Jul 12, 2020, 7:06:54 AM

The values of (0008,0005) are Defined Terms, so defining and using your own value is valid. The question is, does it make sense?

Given that UTF-8 is the leading text encoding on the Web (95% of all web sites use it as of July 2020), has native support in almost all programming languages, and is fully supported by DICOM (ISO_IR 192), I would always advise using ISO_IR 192 for new implementations.

Using a Defined Term that is not defined by the DICOM standard will sooner or later cause interoperability problems with systems from other vendors. Maybe a great solution for the vendor as a vendor lock-in, but not a great option for the end user (hospital).

In general, the DICOM standard defines how things can be done, but doesn't make recommendations. Maybe something like an IHE integration profile "internationalization" would help to push vendors to always support UTF-8 out of the box and to get rid of this problem.

David Gobbi

Jul 15, 2020, 1:36:22 AM

I'm playing devil's advocate here, but UTF-8 has its own problems. Normalization form is important. There are different characters that have identical glyphs. It has an ever-expanding repertoire of characters that may or may not be supported by the fonts on the target system. According to PS 3.5.6.1, DICOM's "ISO_IR 192" corresponds to Unicode 3.2 (2002), but how to guarantee that a block of UTF-8 text uses characters only from that version of the standard?
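Two of these points can be made concrete with Python's standard `unicodedata` module. The sketch below is only an illustration: the function names are invented here, and treating "unassigned in Unicode 3.2" as the test for conformance is an assumption for the example, not something the DICOM standard prescribes. `unicodedata.ucd_3_2_0` is the frozen Unicode 3.2 database that ships with Python.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if `text` is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def chars_unassigned_in_unicode_3_2(text: str) -> list:
    """Return the characters of `text` that had no assignment in Unicode 3.2.

    Assumption for this sketch: general category "Cn" (unassigned) in the
    3.2 database is taken to mean "newer than Unicode 3.2".
    """
    ucd = unicodedata.ucd_3_2_0
    return [ch for ch in text if ucd.category(ch) == "Cn"]


# The same visible name in two different code point sequences.
composed = "Jos\u00e9"      # "é" as one precomposed code point
decomposed = "Jose\u0301"   # "e" followed by a combining acute accent

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(is_nfc(composed), is_nfc(decomposed))                  # True False

# U+1F600 was added to Unicode long after version 3.2.
print(chars_unassigned_in_unicode_3_2("M\u00fcller \U0001F600"))
```

A matching key such as Patient's Name that is compared byte by byte would treat the two spellings of "José" as different values unless both sides normalize first.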

Of course these problems are mainly theoretical. In practice they rarely occur, and when they do, they're more of an inconvenience than a disaster.

I have to say, though, that I have a fondness for ISO 8859 because it is stable (no changes since 2001, with only 15 defined character sets in total). From an interoperability perspective, it seems strange that DICOM defines terms for some ISO 8859 character sets but not others.

gunter zeilinger

Jul 23, 2020, 9:35:53 AM

Just a related thought. Unlike HTTP, the DUL services do not support negotiation of accepted character sets. So even if an archive supports conversion between ISO Latin 1 (which is still quite common in objects received from modalities) and UTF-8 (increasingly seen in received HL7 v2 messages), it would have to be configured which character set it shall use in query (C-FIND) responses and retrieved (C-STORE RQ) objects, depending on (the AE Title of) the other DICOM peer application.
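A rough sketch of such a per-peer configuration, in Python, might look as follows. Everything named here is hypothetical: the AE Titles, the `PEER_CHARSET` table, the default, and the helper functions are assumptions for illustration, and only a handful of single-valued Defined Terms are mapped (no code extensions).

```python
# Subset of DICOM Defined Terms mapped to Python codec names.
CODECS = {
    "": "ascii",               # default repertoire (ISO_IR 6)
    "ISO_IR 100": "latin_1",
    "ISO_IR 101": "iso8859_2",
    "ISO_IR 144": "iso8859_5",
    "ISO_IR 192": "utf_8",
}

# Hypothetical local configuration: character set to use per peer AE Title.
PEER_CHARSET = {
    "MODALITY1": "ISO_IR 100",
    "VIEWER_A": "ISO_IR 192",
}


def transcode(value: bytes, source_term: str, target_term: str) -> bytes:
    """Re-encode one attribute value from one character set to another.

    Raises UnicodeEncodeError if the target cannot represent a character.
    """
    text = value.decode(CODECS[source_term])
    return text.encode(CODECS[target_term])


def value_for_peer(value: bytes, stored_term: str, peer_ae_title: str,
                   default_term: str = "ISO_IR 192"):
    """Return (Specific Character Set, encoded value) to send to a peer."""
    target_term = PEER_CHARSET.get(peer_ae_title, default_term)
    return target_term, transcode(value, stored_term, target_term)


stored = "M\u00fcller^J\u00fcrgen".encode("latin_1")
term, sent = value_for_peer(stored, "ISO_IR 100", "VIEWER_A")
print(term, sent)
```

The conversion is lossless in the Latin-1 to UTF-8 direction only. Going the other way, `transcode` raises `UnicodeEncodeError` as soon as a value contains a character outside Latin-1 (for example "Š"), so the archive still needs a policy for that case.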

Markus Sabin

Jul 23, 2020, 11:08:41 AM

On Thursday, July 23, 2020 at 15:35:53 UTC+2, gunter zeilinger wrote:

> Just a related thought. Unlike HTTP, the DUL services do not support negotiation of accepted character sets. So even if an archive supports conversion between ISO Latin 1 (which is still quite common in objects received from modalities) and UTF-8 (increasingly seen in received HL7 v2 messages), it would have to be configured which character set it shall use in query (C-FIND) responses and retrieved (C-STORE RQ) objects, depending on (the AE Title of) the other DICOM peer application.

True. However, I think the Specific Character Set value used in the C-FIND-RQ/C-MOVE-RQ is quite a good indication of what the requesting system is capable of handling when no character set has been explicitly configured.

And agreed, I have wondered several times why support for character sets is not subject to association negotiation - this would help a lot. I assume that association negotiation was specified at a time when IR 6 was the only character set considered.

As far as I understand, it is DICOM conformant to receive a (C-FIND) request in IR 100 encoding and send each reply with a different character set (IR 144, UTF-8, Code extensions,...). No status code is suitable to convey the information that the receiver was unable to handle the character sets. I think a generic "Unable to Process" is the closest match.

Jörg Riesmeier

Jul 23, 2020, 3:19:10 PM

> And agreed, I have wondered several times why support for character sets is not subject to association negotiation - this would help a lot. I assume that association negotiation was specified at a time when IR 6 was the only character set considered.

Maybe you could come up with a proposal and submit a CP to DICOM WG-06?

> As far as I understand, it is DICOM conformant to receive a (C-FIND) request in IR 100 encoding and send each reply with a different character set (IR 144, UTF-8, Code extensions,...).

Correct. And, if there are multiple matches, each C-FIND Response Data Set can use a different Specific Character Set.
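For the receiving side this means the decoder has to be chosen per response, not per association. The following Python sketch shows the idea under simplifying assumptions: responses are modelled as plain dictionaries with raw bytes (a hypothetical representation, not the API of any DICOM toolkit), only single-valued Specific Character Set values are handled, and the receiver is assumed to support just the three character sets in `CODECS`.

```python
# Character sets this hypothetical receiver can decode.
CODECS = {
    "": "ascii",             # default repertoire (ISO_IR 6)
    "ISO_IR 100": "latin_1",
    "ISO_IR 192": "utf_8",
}


def decode_responses(responses):
    """Decode Patient's Name of each response with that response's charset.

    Returns (decoded_names, unsupported_terms). Responses whose Specific
    Character Set is not in CODECS are skipped and their term is reported.
    """
    decoded, unsupported = [], []
    for dataset in responses:
        term = dataset.get("SpecificCharacterSet", "")
        codec = CODECS.get(term)
        if codec is None:
            unsupported.append(term)
            continue
        decoded.append(dataset["PatientName"].decode(codec))
    return decoded, unsupported


responses = [
    {"SpecificCharacterSet": "ISO_IR 100", "PatientName": b"M\xfcller"},
    {"SpecificCharacterSet": "ISO_IR 192",
     "PatientName": "\u0160imi\u0107".encode("utf_8")},
    {"SpecificCharacterSet": "ISO_IR 144", "PatientName": b"\xb8\xd2\xd0\xdd"},
]
names, unsupported = decode_responses(responses)
print(names, unsupported)
```

The third response is exactly the situation discussed in this thread: the receiver notices that it cannot decode the data set but has no dedicated way to tell the sender.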

> No status code is suitable to convey the information that the receiver was unable to handle the character sets. I think a generic "Unable to Process" is the closest match.

Defining a specific DIMSE Status Code (Failure and/or Warning) for this purpose could be part of your CP :-)

Regards,
Jörg

Markus Sabin

Jul 24, 2020, 1:33:06 AM

This is an interesting thought. Never considered that since I was thinking that association negotiation is kind of written in stone. But I will seriously think about whether I feel capable of doing this.

I would have to think about it, but I think that it would be sufficient (preferred) to _either_ negotiate the supported character sets in advance _or_ define a status code for "Cannot handle Specific Character Set"

Regards,

Markus

Jörg Riesmeier

Jul 24, 2020, 9:21:57 AM

> This is an interesting thought. Never considered that since I was thinking that association negotiation is kind of written in stone. But I will seriously think about whether I feel capable of doing this.

It is not written in stone but, of course, any change should be backward compatible. In fact, there have been numerous extensions to the association negotiation protocol over time (e.g. fuzzy name matching or extended negotiation of user identity), but all of them are optional.

> I would have to think about it, but I think that it would be sufficient (preferred) to _either_ negotiate the supported character sets in advance _or_ define a status code for "Cannot handle Specific Character Set"

Good luck! If you want somebody to have a look at your CP before you actually submit it to WG-06, you could send it to me (if you think that this would be helpful). My contact details (including email address) are here: https://www.jriesmeier.com/contact/

Regards,
Jörg

Victor Derks

Aug 1, 2020, 7:38:18 AM

Extending the association negotiation with "Supported Character Set" information would be a welcome improvement.
It may take some time, however, before the CP is approved and, more importantly, implemented by a large group of vendors. But I think it would be a step in the right direction.

Currently, an alternative way to find out whether a DICOM system supports a certain character set is to query the LDAP server on the network that implements the Application Configuration Management Profile for that network. See: http://dicom.nema.org/medical/dicom/current/output/html/part15.html#sect_H.1.1.2
However, I have not seen many of these systems in the field.

Such configuration information can, however, also be stored locally in your own DICOM system. The AE Title can then be used during the association negotiation to look up that info and add it to the "association session object". It is even possible to use the Implementation Class UID information from the association request to fingerprint the requesting system and retrieve its capabilities from a local configuration file.
Of course these options can be chained in a fallback order.
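Such a chain could be sketched in Python roughly as follows. The tables, the AE Title, the UID and the dictionary that stands in for the association request are all hypothetical placeholders. An LDAP lookup would be one more function in the `RESOLVERS` tuple; it is left out so that the example stays self-contained.

```python
# Hypothetical local configuration: supported character sets per AE Title.
AE_TITLE_CONFIG = {
    "VIEWER_A": ["ISO_IR 192", "ISO_IR 100"],
}

# Hypothetical fingerprints: supported character sets per Implementation
# Class UID (the UID below is a made-up placeholder).
IMPLEMENTATION_CONFIG = {
    "1.2.3.4.5.6": ["ISO_IR 100"],
}

# Assumed last resort when nothing is known about the peer.
DEFAULT_CHARSETS = ["ISO_IR 100"]


def by_ae_title(assoc_request):
    """Look up capabilities by the calling AE Title, or return None."""
    return AE_TITLE_CONFIG.get(assoc_request.get("calling_ae_title"))


def by_implementation_class_uid(assoc_request):
    """Look up capabilities by Implementation Class UID, or return None."""
    return IMPLEMENTATION_CONFIG.get(
        assoc_request.get("implementation_class_uid"))


# Order defines the fallback chain: most specific source first.
RESOLVERS = (by_ae_title, by_implementation_class_uid)


def supported_charsets(assoc_request):
    """Return the first non-empty answer from the chain, else the default."""
    for resolver in RESOLVERS:
        charsets = resolver(assoc_request)
        if charsets:
            return charsets
    return DEFAULT_CHARSETS


known = supported_charsets({"calling_ae_title": "VIEWER_A",
                            "implementation_class_uid": "1.2.3.4.5.6"})
fingerprinted = supported_charsets({"calling_ae_title": "UNKNOWN_AE",
                                    "implementation_class_uid": "1.2.3.4.5.6"})
unknown = supported_charsets({"calling_ae_title": "UNKNOWN_AE"})
print(known, fingerprinted, unknown)
```

The result would be stored on the association session object once, at negotiation time, and consulted whenever a response or object is encoded for that peer.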