
C-FIND and Character Set clarification

Mathieu Malaterre

Jan 25, 2012, 3:52:48 AM

Dear all,

I am trying to understand the behavior for a C-FIND SCU
implementation. According to PS-3.4-2011 we have:

------
K.4.1.1.3.1 Request Identifier Structure
...
Conditionally, the Attribute Specific Character Set (0008,0005). This
Attribute shall be included if
expanded or replacement character sets may be used in any of the
Attributes in the Request
Identifier. It shall not be included otherwise.
------

To make the required behavior concrete, I'll use the syntax of findscu
(from the excellent DCMTK package). If I understand the standard
correctly, if a user sends:

$ findscu --patient --key 8,52=PATIENT --key 10,10="*Jérôme*" dicom.example.com 11112

The C-FIND message is required to contain (assuming UTF-8 is used in
our case):

(0008,0005) CS [ISO_IR 192]   # 10, 1 SpecificCharacterSet
(0008,0052) CS [PATIENT]      #  8, 1 QueryRetrieveLevel
(0010,0010) PN [*Jérôme*]     # 10, 1 PatientName

What I am trying to understand is when a C-FIND SCU implementation
should explicitly insert the SpecificCharacterSet attribute and,
above all, when it should not.

My naive implementation in GDCM was to always add the
SpecificCharacterSet, which seems to be an issue for some SCP
implementations.

Thanks very much for your guidance,
-M

David Clunie

Jan 25, 2012, 2:51:19 PM

Hi Mathieu

Interesting question.

One of the problems with DICOM and character sets is that there is
no negotiation. I.e., the SCU does not know which, if any beyond
the default, character sets are supported by the SCP, and there
is no requirement that the SCP support any beyond the default.

So, the SAFEST query is one using the default character set (i.e.,
not sending Specific Character Set (0008,0005)) AND of course, not
using anything beyond 7-bit US-ASCII in any string in the identifier,
e.g., "*J?r?me*" rather than "*Jérôme*".

Note that in the absence of accent-insensitive matching, it is safer
to request "*J?r?me*"; if accent-insensitive matching were in use by
the SCP, then "*Jerome*" should match, and can be encoded in the
default character set. There is no negotiation of accent-insensitive
matching (although it is implicit in fuzzy semantic matching).
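
As a rough illustration of the "safest query" idea above, the following
Python sketch replaces every character outside 7-bit US-ASCII with the
"?" single-character wildcard. The function name is hypothetical, and
the sketch assumes the pattern is first normalized to NFC so that an
accented letter is one code point and maps to exactly one "?".

```python
import unicodedata


def to_default_repertoire_pattern(pattern: str) -> str:
    """Return a C-FIND pattern that uses only 7-bit US-ASCII.

    Hypothetical helper: each non-ASCII character is replaced by the
    DICOM single-character wildcard '?'.
    """
    # NFC composes base letter + combining accent into one code point,
    # so "e" + U+0301 becomes a single character before replacement.
    composed = unicodedata.normalize("NFC", pattern)
    return "".join(ch if ord(ch) < 128 else "?" for ch in composed)


if __name__ == "__main__":
    print(to_default_repertoire_pattern("*Jérôme*"))  # *J?r?me*
```

A pattern produced this way can be sent without Specific Character Set
(0008,0005), at the cost of possibly matching more names than intended.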

Some SCPs may support Specific Character Set "properly", but there may
be a limited number of choices. For example, support for Latin 1 may
be more common than UTF-8.

So the query you proposed with "*Jérôme*" and "ISO_IR 100" might
succeed, whereas "ISO_IR 192" might not.

Note also that the SCP in its response identifier may use a different
Specific Character Set than that used by the SCU in the request, or
none at all.

E.g., the SCP might always return "ISO_IR 192" even if the request
was "ISO_IR 100", or vice versa.

For example, a request of "*Jérôme*" and "ISO_IR 100" might return:

- nothing if the SCP doesn't support the character set or the accents
- "Buc^Jerome" and no Specific Character Set
- "Buc^Jérôme" and "ISO_IR 100"
- "Buc^Jérôme" and "ISO_IR 192"
- all sorts of other permutations and combinations
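
Because the response may arrive in a different character set than the
request, an SCU has to decode each response identifier according to the
Specific Character Set found in that response, not the one it sent. A
minimal Python sketch follows; the function name and the codec table
are assumptions for illustration, the table is deliberately incomplete,
and only single-valued Specific Character Set is handled.

```python
from typing import Optional

# Hypothetical, incomplete mapping from DICOM defined terms to Python codecs.
# An absent or empty (0008,0005) means the default repertoire (US-ASCII).
CODECS = {
    "": "ascii",
    "ISO_IR 100": "latin_1",
    "ISO_IR 192": "utf_8",
}


def decode_response_value(raw: bytes, specific_character_set: Optional[str]) -> str:
    """Decode one string value using the character set of the response."""
    term = (specific_character_set or "").strip()
    if term not in CODECS:
        raise ValueError(f"unsupported Specific Character Set: {term!r}")
    return raw.decode(CODECS[term])


if __name__ == "__main__":
    print(decode_response_value(b"Buc^Jerome", None))
    print(decode_response_value(b"Buc^J\xe9r\xf4me", "ISO_IR 100"))
    print(decode_response_value("Buc^Jérôme".encode("utf-8"), "ISO_IR 192"))
```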

My personal approach when deciding which specific character set to
use for a dataset (whether it be storing a composite object, or
generating a query request or response identifier), is to use the
least necessary character set (e.g., none, ISO_IR 100 if needed,
then ISO_IR 192 if ISO_IR 100 is insufficient, for example). See
"com.pixelmed.dicom.AttributeList.getSuitableSpecificCharacterSetForAllStringValues()".

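The "least necessary character set" approach can be sketched in Python
as follows. This is not the PixelMed implementation; the function name
and candidate list are assumptions, and Python's latin_1 and utf_8
codecs stand in for ISO_IR 100 and ISO_IR 192.

```python
from typing import Iterable, Optional

# (DICOM defined term, Python codec), from least to most capable.
# None means "do not send Specific Character Set (0008,0005) at all".
CANDIDATES = (
    (None, "ascii"),
    ("ISO_IR 100", "latin_1"),
    ("ISO_IR 192", "utf_8"),
)


def least_necessary_charset(values: Iterable[str]) -> Optional[str]:
    """Return the first candidate term able to encode every value."""
    values = list(values)
    for term, codec in CANDIDATES:
        try:
            for value in values:
                value.encode(codec)
        except UnicodeEncodeError:
            continue  # this candidate is insufficient, try the next one
        return term
    raise ValueError("no candidate character set can encode all values")


if __name__ == "__main__":
    print(least_necessary_charset(["PATIENT", "*Jerome*"]))   # None
    print(least_necessary_charset(["PATIENT", "*Jérôme*"]))   # ISO_IR 100
    print(least_necessary_charset(["PATIENT", "*山田*"]))      # ISO_IR 192
```

With this rule, an SCU omits (0008,0005) whenever every string in the
request identifier fits the default repertoire, which is also what the
K.4.1.1.3.1 text quoted in the original question asks for.
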
The history of all this is that the original standard did not define
explicit behavior when Specific Character Set was present, which might
theoretically lead really old SCPs to attempt to MATCH on Specific
Character Set, if they had supported this attribute as an optional
matching key. This was corrected in CP 199 (2002/01/14).

David

A couple of relevant quotes from PS 3.4 C.2.2.2 Attribute Matching:

"If the SCP does not support the value(s) of Specific Character Set
(0008,0005) in the Request Identifier, then the manner in which matching
is performed is undefined and shall be specified in the conformance
statement.

Notes: 1. If an SCU sends a Request Identifier with a single byte
character set not supported by the SCP, then it is likely, but not
required, that the SCP will treat unrecognized characters as wildcards
and match only on characters in the default repertoire, and return a
response in the default repertoire."

"For Attributes with a PN Value Representation (e.g., Patient Name (0010,0010)),
an application may perform literal matching that is either case-sensitive,
or that is insensitive to some or all aspects of case, position, accent, or
other character encoding variants."

"Matching of PN Attributes may be accent-insensitive, as specified in the
conformance statement..."

"An Identifier in a C-FIND response shall contain ... Conditionally, the
Attribute Specific Character Set (0008,0005). This Attribute shall be
included if expanded or replacement character sets may be used in any of
the Attributes in the Response Identifier. It shall not be included
otherwise. The C-FIND SCP is not required to return responses in
the Specific Character Set requested by the SCU if that character set
is not supported by the SCP. The SCP may return responses with a different
Specific Character Set."

Victor Derks

Jan 25, 2012, 3:11:04 PM

Hi Mathieu,

My 2 cents

It is always a challenge to select the ‘best’ (0008,0005) for the
request when the SCP doesn’t support all the character sets of the
SCU. And IMHO it would be great if all DICOM systems supported
IR-192 out of the box.

If your implementation has access to device configuration parameters
such as ‘Supported Character Set’, as documented in H.1.1.2 of PS-3.15,
it can leverage them to select the best character set (whether that is
implemented internally or in some other way). If several options are
available, IR-192 or GB18030 is often a good choice: most modern
applications use Unicode strings internally, which makes serialization
simple and effective.
If the SCP doesn't support Unicode, then PS 3.5, 6.1.2.2 defines the
preferred order for Western and Eastern Europe as: ISO-IR 100,
ISO-IR 101, ISO-IR 144, ISO-IR 126.

If you need to implement a C-FIND SCP, you can use the (0008,0005)
sent in the request as an indication of what the SCU understands. For
example, if IR-192 is used in the request, it is safe to assume that
it is also a valid character set for the response (a kind of
capability auto-detection).
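
The auto-detect idea could look roughly like this on the SCP side, as a
Python sketch. The function and parameter names are hypothetical, and
only single-valued Specific Character Set is considered.

```python
from typing import Collection, Optional


def pick_response_charset(
    request_term: Optional[str],
    scp_supported: Collection[str],
    scp_fallback: Optional[str] = None,
) -> Optional[str]:
    """Choose the Specific Character Set for a C-FIND response.

    If the SCU sent a term that this SCP can also produce, echo it, on
    the assumption that the SCU can decode what it just sent. Otherwise
    use the SCP's configured fallback (None = default repertoire).
    """
    if request_term and request_term in scp_supported:
        return request_term
    return scp_fallback


if __name__ == "__main__":
    supported = {"ISO_IR 100", "ISO_IR 192"}
    print(pick_response_charset("ISO_IR 192", supported))          # ISO_IR 192
    print(pick_response_charset(None, supported, "ISO_IR 100"))    # ISO_IR 100
```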