Hi Mathieu
Interesting question.
One of the problems with DICOM and character sets is that there is
no negotiation. I.e., the SCU does not know which character sets, if
any, beyond the default are supported by the SCP, and there is no
requirement that the SCP support any beyond the default.
So, the SAFEST query is one using the default character set (i.e.,
not sending Specific Character Set (0008,0005)) AND of course, not
using anything beyond 7-bit US-ASCII in any string in the identifier,
e.g., "*J?r?me*" rather than "*Jérôme*".
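A minimal sketch of producing such a default-repertoire matching key,
in Java to match the PixelMed reference below. The class and method
names are hypothetical, and it assumes the SCP treats "?" as matching
one character (not one byte) of the stored value, which is worth
checking in its conformance statement.

```java
import java.text.Normalizer;

// Hypothetical helper (not from any toolkit): builds a matching key that uses
// only the default repertoire (printable 7-bit US-ASCII), so the request can
// be sent without Specific Character Set (0008,0005).
final class DefaultRepertoireKey {

    private DefaultRepertoireKey() {}

    /** Replaces every character outside printable US-ASCII with the '?' wildcard. */
    static String toDefaultRepertoire(String value) {
        // Compose first, so that "e" + COMBINING ACUTE ACCENT becomes a single
        // "é" and therefore a single '?' rather than "e?".
        String composed = Normalizer.normalize(value, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); ) {
            int cp = composed.codePointAt(i);
            sb.append(cp >= 0x20 && cp < 0x7F ? (char) cp : '?');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDefaultRepertoire("*J\u00e9r\u00f4me*")); // *J?r?me*
    }
}
```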
Note that in the absence of accent-insensitive matching, it is safer
to request "*J?r?me*"; if accent-insensitive matching were in use by
the SCP, then "*Jerome*" should match, and can be encoded in the
default character set. There is no negotiation of accent-insensitive
matching (although it is implicit in fuzzy semantic matching).
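If the SCP's conformance statement does say that PN matching is
accent-insensitive, a key such as "*Jerome*" can be derived by
stripping the accents. A sketch with hypothetical names, using Unicode
decomposition from java.text.Normalizer. Letters with no decomposition
(e.g., "ø" or "ß") are left unchanged, so the result still has to be
checked against the default repertoire before sending it without
Specific Character Set.

```java
import java.text.Normalizer;

// Hypothetical helper: folds accented letters to unaccented base letters,
// intended only for an SCP that performs accent-insensitive PN matching.
final class AccentFolding {

    private AccentFolding() {}

    static String stripAccents(String value) {
        // NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT ...
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        // ... and \p{M} removes every combining mark.
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** True if the value can be sent without Specific Character Set (0008,0005). */
    static boolean isDefaultRepertoire(String value) {
        return value.chars().allMatch(c -> c >= 0x20 && c < 0x7F);
    }

    public static void main(String[] args) {
        String key = stripAccents("*J\u00e9r\u00f4me*");
        System.out.println(key);                      // *Jerome*
        System.out.println(isDefaultRepertoire(key)); // true
        // "ø" has no canonical decomposition, so it survives folding:
        System.out.println(isDefaultRepertoire(stripAccents("Bj\u00f8rn"))); // false
    }
}
```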
Some SCPs may support Specific Character Set "properly", but there may
be a limited number of choices. For example, support for Latin 1 may
be more common than UTF-8.
So the query you proposed with "*Jérôme*" and "ISO_IR 100" might
succeed, whereas "ISO_IR 192" might not.
Note also that the SCP in its response identifier may use a different
Specific Character Set than that used by the SCU in the request, or
none at all.
E.g., the SCP might always return "ISO_IR 192" even if the request
was "ISO_IR 100", or vice versa.
For example, a request of "*Jérôme*" and "ISO_IR 100" might return:
- nothing if the SCP doesn't support the character set or the accents
- "Buc^Jerome" and no Specific Character Set
- "Buc^Jérôme" and "ISO_IR 100"
- "Buc^Jérôme" and "ISO_IR 192"
- all sorts of other permutations and combinations
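In practice this means an SCU should decode each response identifier
using the Specific Character Set (0008,0005) found in that response,
not the one it sent. A hypothetical sketch that handles only three
cases (absent, "ISO_IR 100", "ISO_IR 192"); code extensions such as
"ISO 2022 IR 100", multi-valued Specific Character Set, and the other
defined terms are deliberately left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical SCU-side helper: picks the Java charset from the Specific
// Character Set the SCP actually returned in the response identifier.
final class ResponseDecoder {

    private ResponseDecoder() {}

    static Charset charsetFor(String specificCharacterSet) {
        if (specificCharacterSet == null || specificCharacterSet.trim().isEmpty()) {
            return StandardCharsets.US_ASCII; // attribute absent: default repertoire
        }
        switch (specificCharacterSet.trim()) {
            case "ISO_IR 100": return StandardCharsets.ISO_8859_1; // Latin 1
            case "ISO_IR 192": return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException(
                        "Not handled in this sketch: " + specificCharacterSet);
        }
    }

    static String decode(byte[] value, String specificCharacterSet) {
        return new String(value, charsetFor(specificCharacterSet));
    }

    public static void main(String[] args) {
        // The same request ("ISO_IR 100") might come back in either encoding:
        byte[] latin1 = "Buc^J\u00e9r\u00f4me".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "Buc^J\u00e9r\u00f4me".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(latin1, "ISO_IR 100").equals(decode(utf8, "ISO_IR 192"))); // true
    }
}
```

Decoding UTF-8 bytes as Latin 1 (i.e., trusting the request rather than
the response) would silently produce "JÃ©rÃ´me", which is why the
response value is the one to use.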
My personal approach when deciding which Specific Character Set to
use for a dataset (whether storing a composite object, or
generating a query request or response identifier) is to use the
least necessary character set (i.e., none if the default suffices,
ISO_IR 100 if needed, then ISO_IR 192 only if ISO_IR 100 is insufficient). See
"com.pixelmed.dicom.AttributeList.getSuitableSpecificCharacterSetForAllStringValues()".
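A simplified sketch of that "least necessary" choice, limited to the
three cases just mentioned. It is not the PixelMed method, whose actual
logic may differ and may consider more character sets; the class and
method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: none if every value fits in US-ASCII, "ISO_IR 100"
// if every value fits in ISO 8859-1, otherwise "ISO_IR 192" (UTF-8).
final class LeastCharacterSet {

    private LeastCharacterSet() {}

    /** Returns null when no Specific Character Set (0008,0005) is needed. */
    static String choose(List<String> values) {
        boolean ascii = true;
        boolean latin1 = true;
        for (String v : values) {
            // A fresh encoder per check, since CharsetEncoder is stateful.
            ascii &= StandardCharsets.US_ASCII.newEncoder().canEncode(v);
            latin1 &= StandardCharsets.ISO_8859_1.newEncoder().canEncode(v);
        }
        if (ascii) return null;
        if (latin1) return "ISO_IR 100";
        return "ISO_IR 192";
    }

    public static void main(String[] args) {
        System.out.println(choose(Arrays.asList("*J?r?me*")));                    // null
        System.out.println(choose(Arrays.asList("Buc^J\u00e9r\u00f4me")));          // ISO_IR 100
        System.out.println(choose(Arrays.asList("Buc^Jerome", "\u5c71\u7530^\u592a\u90ce"))); // ISO_IR 192
    }
}
```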
The history of all this is that the original standard did not define
explicit behavior when Specific Character Set was present, which might,
theoretically, lead really old SCPs to attempt to MATCH on Specific
Character Set if they supported this attribute as an optional
matching key. This was corrected in CP 199 (2002/01/14).
David
A couple of relevant quotes from PS 3.4 C.2.2.2 Attribute Matching:
"If the SCP does not support the value(s) of Specific Character Set
(0008,0005) in the Request Identifier, then the manner in which matching
is performed is undefined and shall be specified in the conformance
statement.
Notes: 1. If an SCU sends a Request Identifier with a single byte
character set not supported by the SCP, then it is likely, but not
required, that the SCP will treat unrecognized characters as wildcards
and match only on characters in the default repertoire, and return a
response in the default repertoire."
"For Attributes with a PN Value Representation (e.g., Patient Name (0010,0010)),
an application may perform literal matching that is either case-sensitive,
or that is insensitive to some or all aspects of case, position, accent, or
other character encoding variants."
"Matching of PN Attributes may be accent-insensitive, as specified in the
conformance statement..."
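To make the matching options in these quotes concrete, a hypothetical
SCP-side sketch of PN wildcard matching, where "*" matches any sequence
of characters and "?" exactly one. Case and accent folding are shown as
switches an implementation might document in its conformance
statement; nothing here is taken from the standard's text beyond the
wildcard semantics.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical SCP-side sketch: wildcard matching of a PN key against a
// stored value, with optional case and accent insensitivity.
final class PersonNameMatcher {

    private PersonNameMatcher() {}

    private static String fold(String s, boolean ignoreCase, boolean ignoreAccents) {
        String out = Normalizer.normalize(s, Normalizer.Form.NFC);
        if (ignoreAccents) {
            out = Normalizer.normalize(out, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        return ignoreCase ? out.toUpperCase(Locale.ROOT) : out;
    }

    static boolean matches(String key, String value, boolean ignoreCase, boolean ignoreAccents) {
        if (key.isEmpty()) return true; // zero-length key: universal matching
        String k = fold(key, ignoreCase, ignoreAccents);
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : k.toCharArray()) {
            if (c == '*' || c == '?') {
                // Quote the pending literal run so regex metacharacters stay literal.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                .matcher(fold(value, ignoreCase, ignoreAccents))
                .matches();
    }

    public static void main(String[] args) {
        String stored = "Buc^J\u00e9r\u00f4me";
        System.out.println(matches("*Jerome*", stored, false, false)); // false
        System.out.println(matches("*J?r?me*", stored, false, false)); // true
        System.out.println(matches("*jerome*", stored, true, true));   // true
    }
}
```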
"An Identifier in a C-FIND response shall contain ... Conditionally, the
Attribute Specific Character Set (0008,0005). This Attribute shall be
included if expanded or replacement character sets may be used in any of
the Attributes in the Response Identifier. It shall not be included
otherwise. The C-FIND SCP is not required to return responses in
the Specific Character Set requested by the SCU if that character set
is not supported by the SCP. The SCP may return responses with a different
Specific Character Set."