Bug#988644: Defined Term: ISO 2022 IR 87 is not supported

Mathieu Malaterre

May 17, 2021, 6:40:03 AM

Source: dcmtk
Version: 3.6.5-1

The dcmtk source package is compiled against libicu. However, this gives a
false sense of support for character encodings. The general support appears
to be written specifically for the libiconv implementation, with only
partial effort made to support libicu.

For reference:

* https://forum.dcmtk.org/viewtopic.php?t=4566

[...]
for ex atm dcmtk/libicu generates errors on ISO 2022 IR 87/ISO 2022 IR
159 dicoms conversion
[...]

It would make sense to document this limitation in the README file of the
Debian binary package.

Mathieu Malaterre

May 17, 2021, 7:40:03 AM

For instance:

% curl -s --output test.dcm \
  "https://sourceforge.net/p/gdcm/gdcmdata/ci/master/tree/NM-PAL-16-PixRep1.dcm?format=raw"
% dcmconv +U8 test.dcm testU8.dcm
E: DcmSpecificCharacterSet: 'ISO 2022 IR 87' is not supported by the
utilized character set conversion library 'ICU, Version 63.1.0'
F: Cannot open character encoding, ICU error name:
U_FILE_ACCESS_ERROR: processing file: test.dcm

while:

% dicomdump test.dcm| grep PatientName
(0010,0010) PN "PatientName" : [テストです] (16 bytes)

With:

% apt-cache policy vtk-dicom-tools
vtk-dicom-tools:
  Installed: 0.8.9-1
  Candidate: 0.8.9-1
  Version table:
 *** 0.8.9-1 500
        500 http://deb.debian.org/debian buster/main amd64 Packages
        100 /var/lib/dpkg/status
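
The 16-byte length reported by dicomdump is consistent with an ISO 2022
encoding of the five-character name: a 3-byte escape sequence switching to
JIS X 0208, ten bytes of two-byte characters, and a 3-byte escape sequence
switching back. A quick check with Python's built-in iso2022_jp codec is
sketched below. It assumes the value ends with a return to ASCII (ESC ( B);
the actual file might instead return to JIS X 0201 romaji (ESC ( J), which
has the same length.

```python
name = "テストです"

# Python's iso2022_jp codec emits ESC $ B before the JIS X 0208 characters
# and ESC ( B afterwards to return to ASCII.
raw = name.encode("iso2022_jp")

print(raw)       # escape sequences appear as \x1b$B and \x1b(B
print(len(raw))  # 3 + 5 * 2 + 3 bytes
```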

Jörg Riesmeier

May 17, 2021, 8:40:03 AM

I can confirm that DCMTK with ICU enabled does not support the full set of
DICOM character sets, in contrast to the original implementation based on
libiconv, which supports all of them. That's the reason why I personally
would still prefer libiconv over the other two character set options in DCMTK.

By the way, this is a well-known issue for the character set conversion
support of DCMTK. I hope that the author of the ICU and stdlibc (iconv)
support will fix it soon.

Mathieu Malaterre

May 17, 2021, 9:00:05 AM

Hi Jörg !

Could you please clarify the following statement then:

[...]
As far as I understand it ISO 2022 JP is also a set containing
multiple character sets that can be switched via escape sequences. The
ICU handles these escape sequences internally whereas the libiconv
doesn't. This is why the existing code in DCMTK that was originally
written for libiconv parses these escape sequences itself, therefore,
the ICU does not perceive them and cannot choose the correct character
set. The only way to fix this would be to disable parsing the escape
sequences when the ICU is used and then set all character sets similar
to your proposition.
[...]

Thanks,

ref:
* https://forum.dcmtk.org/viewtopic.php?p=18480&sid=c3f13bb9c9ae0e54bef6276a5d337980#p18480

Jörg Riesmeier

May 17, 2021, 10:00:03 AM

> Could you please clarify the following statement then:
>
> [...]
> As far as I understand it ISO 2022 JP is also a set containing
> multiple character sets that can be switched via escape sequences. The
> ICU handles these escape sequences internally whereas the libiconv
> doesn't. This is why the existing code in DCMTK that was originally
> written for libiconv parses these escape sequences itself, therefore,
> the ICU does not perceive them and cannot choose the correct character
> set. The only way to fix this would be to disable parsing the escape
> sequences when the ICU is used and then set all character sets similar
> to your proposition.
> [...]

I personally implemented support for the character set conversion based on
libiconv into the DCMTK. So, the original implementation of the OFCharacterSet
and DcmSpecificCharacterSet classes was designed in a way that allowed for
using libiconv for the various (i.e. all) specific character sets defined in the
DICOM standard, i.e. including ISO 2022 switching of character sets.

The ICU and stdlibc (iconv) support was added later by a colleague from the
OFFIS institute, based on this original implementation. Unfortunately, the
parsing approach (e.g. searching for the escape sequences used for ISO 2022)
was not adapted in such a way that all DICOM character sets could be used in
the same manner as before when ICU support is enabled (also see comments in
the respective DCMTK class).
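
The difference between the two designs can be sketched in Python. This is
not DCMTK code: Python's stateful iso2022_jp codec stands in for a converter
that handles escape sequences itself, and a hypothetical
decode_with_own_parser() stands in for the approach of splitting at escape
sequences and converting each segment with a stateless converter. The escape
table is a small illustrative subset. The last line only illustrates the
general hazard of hiding the escape sequences from a stateful converter; the
failure reported in this bug is an error on opening the converter, not a
silent mis-decoding.

```python
import re

# Escape sequences and their DICOM defined terms (illustrative subset).
ESCAPES = {
    b"\x1b(B": "ISO 2022 IR 6",   # ASCII
    b"\x1b$B": "ISO 2022 IR 87",  # JIS X 0208
}
ESCAPE_RE = re.compile(rb"\x1b(?:\(B|\$B)")


def decode_segment(segment: bytes, charset: str) -> str:
    """Decode one escape-free segment with a stateless conversion."""
    if charset == "ISO 2022 IR 87":
        # JIS X 0208 byte pairs with the high bit set are valid EUC-JP.
        return bytes(b | 0x80 for b in segment).decode("euc_jp")
    return segment.decode("ascii")


def decode_with_own_parser(value: bytes) -> str:
    """Split at escape sequences, then convert each segment separately."""
    parts, charset, pos = [], "ISO 2022 IR 6", 0
    for match in ESCAPE_RE.finditer(value):
        parts.append(decode_segment(value[pos:match.start()], charset))
        charset = ESCAPES[match.group()]
        pos = match.end()
    parts.append(decode_segment(value[pos:], charset))
    return "".join(parts)


raw = b"\x1b$B%F%9%H$G$9\x1b(B"

# A stateful converter sees the escapes and switches character sets itself.
print(raw.decode("iso2022_jp"))
# The own parser with per-segment conversion yields the same text.
print(decode_with_own_parser(raw))
# With the escapes already stripped, a stateful converter stays in ASCII.
print(b"%F%9%H$G$9".decode("iso2022_jp"))
```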

Mathieu Malaterre

May 20, 2021, 5:50:04 AM

Hi Jörg,

Thanks for the clarification !

I've removed explicit usage of ICU, since support will be equivalent
when using stdlibc (iconv), with the added bonus that we remove a
dependency on ICU.

https://salsa.debian.org/med-team/dcmtk/-/commit/666129093bee6d907ef763324835209e4416ff10

and documentation has been added for Debian users:

https://salsa.debian.org/med-team/dcmtk/-/commit/a533a4acb57242e8c9e9f011e3ba083c8382da97

-M

Jörg Riesmeier

May 20, 2021, 6:40:03 AM

Hi Mathieu

> I've removed explicit usage of ICU, since support will be equivalent
> when using stdlibc (iconv), with the added bonus that we remove a
> dependency to ICU.

I'm not sure whether "stdlibc (iconv)" really supports all DICOM character
sets, e.g. "ISO 2022 IR 87" and "ISO 2022 IR 159" are not tested by the
regression test case "dcmdata_specificCharacterSet_1" (see
"dcmdata/tests/tspchrs.cc").
As far as I know, _all_ defined DICOM character sets only work as expected when
using "libiconv", i.e. the original implementation (as explained before).

Regards,
Jörg

Mathieu Malaterre

May 20, 2021, 7:40:03 AM

Jörg,

Yes, this is what I defined as "support will be equivalent".

Further detailed at:

* https://salsa.debian.org/med-team/dcmtk/-/commit/a533a4acb57242e8c9e9f011e3ba083c8382da97#92c53ca292f6a209c5328fba1c6a1801e28e51c3_151_163

Thanks again

Jörg Riesmeier

May 20, 2021, 8:00:04 AM

Mathieu,

> Yes, this is what I defined as "support will be equivalent".
>
> Further detailed at:
>
> https://salsa.debian.org/med-team/dcmtk/-/commit/a533a4acb57242e8c9e9f011e3ba083c8382da97#92c53ca292f6a209c5328fba1c6a1801e28e51c3_151_163

Thank you for pointing me (again) to the relevant documentation. I should have
checked your link before. I will continue to try to convince the responsible
developer to fix the existing limitations for both stdlibc (iconv) and ICU.

Regards,
Jörg

Mathieu Malaterre

Nov 7, 2022, 5:20:04 AM

Version: 3.6.8~git20221013.51be018-1

% dcmconv +U8 /tmp/test.dcm /tmp/utf8.dcm && echo "success"
W: DcmSpecificCharacterSet: Escape sequences shall not be used in the
first component group of a Person Name (PN), using them anyway
W: DcmSpecificCharacterSet: Escape sequences shall not be used in the
first component group of a Person Name (PN), using them anyway
success
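
The warning refers to the layout of a PN value: component groups are
separated by "=", and the first group is expected to use the default
character set without escape sequences. A rough Python sketch of such a
check follows. The function name and the small table of escape sequences
are illustrative assumptions, not DCMTK's implementation.

```python
ESC = 0x1B
MULTI_BYTE = {b"$B"}          # ESC $ B selects JIS X 0208 (ISO 2022 IR 87)
SINGLE_BYTE = {b"(B", b"(J"}  # ESC ( B: ASCII; ESC ( J: JIS X 0201 romaji


def first_group_has_escape(value: bytes) -> bool:
    """Report whether an escape sequence occurs before the first '='."""
    i, multi_byte, seen_escape = 0, False, False
    while i < len(value):
        if value[i] == ESC:
            designator = value[i + 1:i + 3]
            if designator in MULTI_BYTE:
                multi_byte = True
            elif designator in SINGLE_BYTE:
                multi_byte = False
            else:
                raise ValueError(f"unhandled escape sequence: {designator!r}")
            seen_escape = True
            i += 3
        elif multi_byte:
            i += 2  # skip a two-byte character; its bytes may equal "="
        elif value[i:i + 1] == b"=":
            break   # end of the first component group
        else:
            i += 1
    return seen_escape


# A value that starts directly with an escape sequence is flagged.
print(first_group_has_escape(b"\x1b$B%F%9%H$G$9\x1b(B"))
# A value whose first group is plain ASCII is not.
print(first_group_has_escape(b"Yamada^Tarou=\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"))
```

The scan tracks the active character set because a byte equal to "=" inside
a two-byte character is not a group delimiter.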