
Padding Character for CS type


kellyat...@gmail.com
Mar 29, 2005, 1:55:26 PM
According to my reading of the standard:

>>>>>>>>>>
PS 3.5-2003
Page 18
6.1.2.3 Encoding of character repertoires
...
Two character codes of the single-byte character sets invoked in the GL
area of the code table, 02/00
and 05/12, have special significance in the DICOM Standard. The
character SPACE, represented by bit
combination 02/00, shall be used for the padding of Data Element Values
that are character strings.
<<<<<<<<<<

The padding on the end of odd length strings should be a SPACE
character. However, when I used a SPACE character at the end of my
strings, both the AGFA DICOM validator and the Osiris DICOM Viewer
choked on the file. Upon further investigation of other DICOM files and
trial and error, I found that if I put a NULL char ('\0') at the end of
the odd length strings for padding, all of a sudden everything worked
right.
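For concreteness, here is the SPACE-padding rule as I read it from PS 3.5, as a minimal sketch (my own illustration, not taken from any DICOM toolkit):

```python
# Minimal sketch of PS 3.5 string padding: Data Element Values must have
# even length, and an odd-length character string gets one trailing
# SPACE (bit combination 02/00, i.e. 0x20).
def pad_string_value(value: bytes) -> bytes:
    """Pad an odd-length string value with SPACE to reach even length."""
    if len(value) % 2 != 0:
        return value + b" "
    return value

print(pad_string_value(b"DERIVED"))   # 7 bytes -> b'DERIVED '
print(pad_string_value(b"ORIGINAL"))  # already even, unchanged
```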

My question is, Am I misinterpreting the standard, or are the other
tools? If I'm wrong (which seems likely to me frankly) then where in
the standard does it say to use '\0' for padding?

If I'm right, is it standard non-standard practice to use '\0' instead
of ' '?

-Kelly

Ralph
Mar 29, 2005, 4:30:45 PM

According to my reading you are correct, and SPACE (0x20) is the appropriate
padding character for CS elements. In fact, I think (from memory) that NULL
(0x00) padding is only allowed in elements of type UI, i.e. UIDs.

Are you sure that the padding is the culprit causing the AGFA to choke? I
haven't seen that happen before. Is it maybe some other part of your CS
string format? E.g. component length? Correct separator? Invalid
component value for the SOP class you're using? Etc.

Anyway, good luck - DICOM has many foibles and is not at all as robust as it
should be.

Ralph


<kellyat...@gmail.com> wrote in message
news:1112122526.7...@l41g2000cwc.googlegroups.com...

kellyat...@gmail.com
Mar 30, 2005, 12:13:29 PM
Ok, that's weird: you're saying that the padding type for UIDs (which
are also basically strings) is different from that of the other strings?
That's kind of goofy. Sigh.

-Kelly

kellyat...@gmail.com
Mar 30, 2005, 12:19:48 PM
Ok, I found it...

PS 3.5-2003
Page 50
If ending on an odd byte boundary, except when used for network
negotiation (See PS 3.8),
one trailing NULL (00H), as a padding character, shall follow the last
component in order to
align the UID on an even byte boundary.
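So the UID rule from that passage would look something like this (again just a sketch of my reading, not library code):

```python
# UI padding per PS 3.5 p.50: an odd-length UID is followed by one
# trailing NULL (00H) to align it on an even byte boundary.
def pad_uid(uid: bytes) -> bytes:
    """Pad an odd-length UID with a single NULL byte."""
    if len(uid) % 2 != 0:
        return uid + b"\x00"
    return uid

print(pad_uid(b"1.2.840.10008.1.2"))  # 17 bytes -> 18, NULL-terminated
```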

Are all the other VR Types to be padded with space or are there other
exceptions as well?

-Kelly

Ralph
Mar 30, 2005, 1:45:08 PM

Sorry Kelly not on here much.

All strings are supposed to be SPACE padded. UI is not really treated as a
string within DICOM even though it "looks" like a string; hence one reason
for the difference. The other is probably historical (as a guess).

Most of the non-string elements don't need to be padded to even length
because they are already multiples of an even increment, e.g. US - 2 bytes,
UL - 4 bytes. The obvious exceptions are UN and OB, which again use NULL,
not being strings.
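Putting that together, a pad-byte chooser might look like this (the VR groupings are my own reading of PS 3.5, so double-check them against the text):

```python
# Sketch: choose the padding byte by VR. UI and the byte-stream VRs (OB, UN)
# pad with NULL; the character-string VRs pad with SPACE; fixed-size binary
# VRs (US, UL, ...) are always even-length and need no padding at all.
NULL_PADDED = {"UI", "OB", "UN"}
SPACE_PADDED = {"AE", "AS", "CS", "DA", "DS", "DT", "IS",
                "LO", "LT", "PN", "SH", "ST", "TM", "UT"}

def pad_for_vr(vr: str, value: bytes) -> bytes:
    if len(value) % 2 == 0:
        return value  # already even, nothing to do
    if vr in NULL_PADDED:
        return value + b"\x00"
    if vr in SPACE_PADDED:
        return value + b" "
    raise ValueError(f"odd-length value for fixed-size VR {vr!r}")

print(pad_for_vr("CS", b"DERIVED"))          # -> b'DERIVED '
print(pad_for_vr("UI", b"1.2.840.10008.1"))  # -> NULL-padded
```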

As you say, "sigh" .... DICOM is a real "mongrel" standard (despite the
best efforts of a lot of people) and has inconsistencies, oddities, etc. all
through it, in order to cope with various parties' disparate demands. All I
can really say is tread carefully, be very familiar with PS3.5 and ...
expect the unexpected.

Ralph


<kellyat...@gmail.com> wrote in message
news:1112203188.7...@f14g2000cwb.googlegroups.com...

kellyat...@gmail.com
Apr 7, 2005, 11:16:38 AM
Good advice that "expect the unexpected"... Perhaps that should be in
the Preface to the DICOM standard. ;-)

Rather than calling DICOM a "mongrel", I would prefer to think of it as
evolutionary madness without much natural selection: just DNA
morphing and branching over time with no predators to prune the bush.
The sort of evolution one would expect on a distant isolated island.

It certainly was not designed by anyone with extensive computer science
training. If there was someone with those skills, their opinions were
obviously overridden by those without them.

As a small example, the only difference between Long Text LT and Long
String LO is whether or not you can have a '\' character in the text.
Anyone with experience in how this is dealt with in the computer world
would have realized that this could be easily overcome with a simple
escape sequence, rather than creating two separate types.
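The escape-sequence idea could be as simple as this (purely hypothetical, of course; DICOM defines no such mechanism):

```python
# Hypothetical, NOT part of DICOM: if the backslash delimiter were
# escapable, a single text VR could carry both multi-valued fields and
# free text containing literal backslashes.
def escape(value: str) -> str:
    """Double each backslash so it survives as data."""
    return value.replace("\\", "\\\\")

def unescape(value: str) -> str:
    """Reverse the escaping on read."""
    return value.replace("\\\\", "\\")

print(escape(r"C:\data"))  # -> C:\\data
```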

A good design is more easily extensible. It took over 300 edits to the
DICOM standard to add Clinical Trial information, and much of that
information is highly redundant, just added to various IEs and Modules.
An inheritance model would have allowed this to be added in a "base"
definition and then "inherited" into all derived IEs and Modules, which
would have required a much smaller change to the standard. Unfortunately,
there is no concept of "this module is just like X, except it adds Y".
It just shows that you pay a heavy price for backwards compatibility, and
the debt to history meant that such a system could not be put into place.

I think that for future versions of DICOM, such a system could be put
into place, if only in the documentation.

Mouse - An elephant designed by committee.

-Kelly

Joerg Riesmeier
Apr 7, 2005, 11:44:42 AM
kellyat...@gmail.com wrote:

> Good advice that "expect the unexpected"... Perhaps that should be in
> the Preface to the DICOM standard. ;-)

Sure, there are implementations that do not conform 100% to the DICOM
standard. Welcome to the Real World ... It's always a good idea to be
as standard compliant as possible when writing DICOM objects and to be
as lenient as possible when reading DICOM objects. But this is true for
other "data formats" and "communication protocols" as well.

> As a small example, the only difference between Long Text LT and Long
> String LO is whether or not you can have a '\' character in the text.

This is not the full truth. Read the specifications of LT and LO in Part
5 of the standard and you'll see that LO allows 64 characters maximum
whereas LT allows 10240. Furthermore, LT allows control characters like
CR, LF and FF whereas LO does not. Finally, leading spaces are
significant for LT whereas they are not for LO.

So the backslash is not the only difference between the two VRs: LO also
allows multiple values (VM >= 1) whereas LT does not (VM = 1).
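To make the comparison concrete, those constraints might be checked like this (limits taken from the figures above; the helpers themselves are just an illustration, not a complete VR validator):

```python
# LO vs LT constraints per PS 3.5-2003:
# LO: <= 64 chars per value, VM >= 1 (backslash-delimited), no CR/LF/FF.
# LT: <= 10240 chars, VM = 1 (backslash is ordinary data), CR/LF/FF allowed.
def check_lo(raw: str) -> None:
    for v in raw.split("\\"):  # backslash separates multiple values
        assert len(v) <= 64, "LO value exceeds 64 characters"
        assert not set(v) & {"\r", "\n", "\f"}, "control character in LO"

def check_lt(raw: str) -> None:
    assert len(raw) <= 10240, "LT value exceeds 10240 characters"
    # backslash, CR, LF and FF are all legal data here

check_lo("FIRST\\SECOND")  # two values, both fine
check_lt("line one\r\nline two \\ with a backslash")
```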

> Unfortunately, there is no concept of "this module is just like X,
> except it adds Y".

That's not true. The DICOM standard does use the concept of
specialization. See part 3 for details: "3.8.12 Specialization".

Regards,
Joerg Riesmeier

Laurent Lecomte
Apr 7, 2005, 12:03:29 PM
Hi,

I don't quite agree with you concerning the difference between LT and LO;
they have many differences:
- Length (LO: 64, LT: 10240)
- VM (LO: n, LT: only one)
- etc.

The purposes of these items are not the same.


<kellyat...@gmail.com> wrote in message
news:1112886998.1...@z14g2000cwz.googlegroups.com...

Razvan Costea-Barlutiu
Apr 8, 2005, 2:06:41 AM
Hi Joerg...

I agree in part with Kelly or, at least, I do understand his point of
view.

Wouldn't you agree that the differences between LO and LT are, still,
minor?

The problem is that, for readers of the standard (at least for newcomers
and for those who do have software design experience), the confusing part
is not (only) the subtle differences themselves but the _reason_ for
these differences.

I was trying to make sense of this reason, and the only things I could
think of were that it was driven by transcription-system companies or by
the way reporting was done in past days, and that maybe LO was thought
to be a big memory saver and a hint for memory allocation for IOD
decoders yet to be written.
Or it was simply... a compromise made between various heavyweight
vendors at the time of the standard inception.


So, while the differences between LO and LT can be pointed out, the
reason for them remains a mystery to me.
For a design-stuff-efficiently purist, having LT and LO collapsed into
a single VR=TX (text :-) ) makes much more sense.


Regards,
Razvan

kellyat...@gmail.com
Apr 12, 2005, 1:15:06 PM
Joerg Riesmeier wrote:
> kellyat...@gmail.com wrote:
>
> > Good advice that "expect the unexpected"... Perhaps that should be in
> > the Preface to the DICOM standard. ;-)
>
> Sure, there are implementations that do not conform 100% to the DICOM
> standard. Welcome to the Real World ... It's always a good idea to be
> as standard compliant as possible when writing DICOM objects and to be
> as lenient as possible when reading DICOM objects. But this is true for
> other "data formats" and "communication protocols" as well.

That's not exactly what I was getting at. My point was that WITHIN the
standard itself, expect the unexpected.

What you are saying is clearly true, and useful, but was not the point
I was making.

A REALLY solid standard, like say the ANSI standard for C++ or the
standards for email protocols, is more amenable to validation than
DICOM. A C++ file either compiles or it doesn't. Not so with a DICOM
dataset. If validation in DICOM were more straightforward, more
implementations would be closer to the standard.

In general, if a standard is too difficult to interpret for
implementers of the standard to produce consistent output conforming to
the standard, then is that a problem with the implementer or the
standard? I think that while the implementer can take his lumps, the
standard should as well.

In addition, if the standard is vague, then schisms in interpretation
are going to pop up. Just look at the "New Testament" standard for
Christianity for example :-) (I know, that's kind of an extreme case,
but it makes the point)

-Kelly

kellyat...@gmail.com
Apr 12, 2005, 1:21:48 PM
Thank you for pointing that out Laurent. I will make sure that my
implementation takes these issues into account.

Generally, an implementation should be liberal about what it accepts as
input, and conservative in what it outputs. In this case, I would
assume that if I encountered an LO string longer than 64 bytes, I'd
read it, display it, or whatever. If I wrote a DICOM file, I would make
sure it was NEVER greater than 64 bytes. This is a general rule with
standards...
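That read-liberal/write-strict split might look like this (64-byte LO limit assumed from the earlier posts; a sketch only):

```python
MAX_LO = 64  # PS 3.5 limit for a single LO value

def read_lo(raw: str) -> str:
    """Be liberal: accept even over-long input, just strip trailing padding."""
    return raw.rstrip(" ")

def write_lo(value: str) -> str:
    """Be conservative: refuse to emit values the standard disallows."""
    if len(value) > MAX_LO:
        raise ValueError("LO value exceeds 64 characters")
    return value + (" " if len(value) % 2 else "")  # even-length SPACE pad

print(write_lo("ACME1"))  # odd length -> 'ACME1 '
```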

I'm sure there is some debt to history being paid here; nevertheless, in
today's environment of plentiful memory, etc., these differences make
almost no logical sense.

Does anyone have a clue where the number 10240 comes from?

-Kelly

Joerg Riesmeier
Apr 12, 2005, 1:31:35 PM
kellyat...@gmail.com wrote:

> A REALLY solid standard, like say the ANSI standard for C++ or the
> standards for email protocols, is more amenable to validation than
> DICOM. A C++ file either compiles or it doesn't. Not so with a DICOM
> dataset. If validation in DICOM were more straightforward, more
> implementations would be closer to the standard.

I understand your point, and in principle I agree. However, I would not
subscribe to the statement that it's much easier with standards like C++.
I know a lot of compilers that behave very differently on the same source
file, i.e. compile without warnings, compile with warnings, fail to compile
because of errors, abort with an internal compiler error, etc. Not to
mention the various problems that might occur while the compiled program
runs ...

And btw, there are of course validation tools for DICOM implementations.
If everybody used them, things would be much better ... Unfortunately,
there is no formal requirement to do so. Anybody can claim to be DICOM
conformant. That's probably an implication of being a mainly industry-
driven standard. So the IHE initiative, with its MESA tools and Connect-
athons, is really appreciated.

Regards,
Joerg Riesmeier
