64 bytes for SOP Instance UID

Mathieu Malaterre

unread,

Jan 21, 2005, 8:50:37 PM1/21/05

to

Hello,

I have just finished the implementation of the generation of UID in
gdcm(*), and I wanted to share my experience with you.

The main problem I had here was not really the cross plateform
implementation on different -broken- plateforms but rather dealing with
the extremely limited size of UIDs.

Explanation of the size. GDCM is used with ITK(**) and thus I had to use
the DICOM prefix we had after David Clunie send us the link for a free
one(***). Already the size of this prefix was 27 bytes. AS we wanted a
subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.

Then -still following David advices- I need to incode a spacial
identifier: the mac address of the machine (and not IP address). After
some google reserch I came to a final implementation. And the result is
coded on unsigned char[6] this can then be represented as a number up to
256^6 = 281474976710656 which can fit on 15 bytes.

After the spatial identifier I needed a temporal identifier. This is far
easier. I just used regular time function and coded something like:

"Year Month Day Hour Minutes Second Millisecond"
which fits on
4 + 2 + 2 + 2 + 2 + 2 + 3 = 17 bytes

Therefore I have 29 + 15 + 17 = 62 bytes !

So I only have two bytes left for the random number.

Implementing a software generation of unique identifier that fits on 64
bytes is clearly impossible.

Any advices to squeeze all this *without* loosing the uniqueness ?

Thanks
Mathieu
(*) http://www.creatis.insa-lyon.fr/Public/Gdcm/Main.html
(**) http://www.itk.org/
(***)
http://www.dclunie.com/medical-image-faq/html/part8.html#UIDRegistration

Razvan

unread,

Jan 22, 2005, 5:24:01 AM1/22/05

to

Hi Mathieu.

The assigned UID root and the MAC together make a pretty good UID
scheme. From what's left, adding the date/time down to the millisecond
is good.
The worst part seems to me to be the random number.
Keeping its impact in the UID schema down to a minimum looks like a
good thing to do, so I won't try to create a larger number of bytes to
be filled in by pseudo-random values.

I'd always rely to a maximum on organization-managed UIDs, if possible
and keep the random parts as low as possible.
By the way, what random-number generator are you using?
~~Razvan

Doug Sluis

unread,

Jan 22, 2005, 6:28:32 AM1/22/05

to

Hi Mathieu,

I don't see any compelling reason for the random number.
Your scheme fits within 64 bytes and is unique assuming
the application never attempts to generate more than one UID
with the value returned from the time function.
With the two extra digits available, 100 UID's are possible
using one time value.

You can save digits by taking advantage of the fact that
the number of milliseconds in a year easily fits within 11 digits
rather than the 13 you now allocate. Adopting the year
the ISO prefix was granted as year zero, saves two more
digits over the next few hundred years even allowing for
period delimiters to bracket the time.

Don't forget (as many do) that leading zeros are not allowed in the UID.

I advise that GDCM allow its client to configure a UID prefix
for users that already have an ISO prefix, and perhaps also
to allow the application to set the UID with its own scheme.

- Doug

"Mathieu Malaterre" <mmalat...@Mnycap.rr.com> wrote in message
news:NjiId.36678$Xs6....@twister.nyroc.rr.com...

David Clunie

unread,

Jan 22, 2005, 4:18:46 PM1/22/05

to

Mathieu Malaterre wrote:

> The main problem I had here was not really the cross plateform

> implementation on different -broken- platforms ...

Actually, I thought your work on getting a MAC address regardless
of what platform one is running on was most excellent, and must
have taken forever to research and test.

> Explanation of the size. GDCM is used with ITK(**) and thus I had to use
> the DICOM prefix we had after David Clunie send us the link for a free
> one(***). Already the size of this prefix was 27 bytes. AS we wanted a
> subspace of it (ex: ".1") the prefix ITK + GDCM is now 29 bytes.

As I think I pointed out earlier on the ITK mailing list, you could
choose a root from a source that is shorter.

You (well, ITK) is using one from Dave Harvey, which is long. Specifically
"1.2.826.0.1.3680043.2.1125", which is 27 bytes.

I suggested you use one from the IANA registry, which is relatively
short. E.g. mine is 17 characters before I start to subdivide the space.

The highest number in the IANA registry right now is 22233, so they
are only into 5 digits on top of the 12+period prefix, so if you
act quickly (some time in the next decade or so), you should probably
be able to get an 18 byte root.

That is 9 bytes better than what you are currently using.

> After the spatial identifier I needed a temporal identifier. This is far
> easier. I just used regular time function and coded something like:

...

> So I only have two bytes left for the random number.

Safest not ever to assume time is sufficient; if the granularity of time
is milliseconds (and it may not be that granular even if the return
value is in milliseconds), this could be too short on a modern fast
machine that can generate a lot of UIDs in a millisecond.

For example, I just ran a quick check on the Java UID generator in
one of my toolkits and it generates over 10 UIDs per millisecond if
the same system and time based prefix is used, and still over 5
UIDs per second if a new system and time based prefix is generated,
and that's on my laptop. Once it has been running for a while and
the JIT optimizer gets to it, it actually exceeds 20 UIDs per
millisecond for the latter scenario.

So a random number, or cache of counts per time increment is
definitely required I think (java.rmi.server.UID does the latter
for you).

David

Doug Sluis

unread,

Jan 22, 2005, 6:47:22 PM1/22/05

to

Why use a random number generator at all?
A counter is guaranteed not to repeat a prior value
but an RNG might.

"David Clunie" <dcl...@dclunie.com> wrote in message news:41F2C335...@dclunie.com...

Mathieu Malaterre

unread,

Jan 22, 2005, 8:21:15 PM1/22/05

to

Doug Sluis wrote:
> Why use a random number generator at all?
> A counter is guaranteed not to repeat a prior value
> but an RNG might.

I have full spatial resolution (MAC Address) + full temporal precision
(well almost). Assuming I would generate my UID slowly it is impossible
to find twice the same one. The problem only comes with fast machine/
multi process / multi thread, that's why RNG comes handy.

If you are saying that I could use a static counter while generating the
UID then I have to deal will mutex locks to garantee I am not writting
(=incrementing) this counter at the same time from two threads. Which
seems to me a lot more troubles (cross plateform speaking) than the
simple call to 'rand()' ...

Thanks anyway,
Mathieu

Mathieu Malaterre

unread,

Jan 22, 2005, 8:27:44 PM1/22/05

to

David Clunie wrote:
> Mathieu Malaterre wrote:
>
> > The main problem I had here was not really the cross plateform
> > implementation on different -broken- platforms ...
>
> Actually, I thought your work on getting a MAC address regardless
> of what platform one is running on was most excellent, and must
> have taken forever to research and test.

Thank you. The thing I am happy with is that it compiled right away on
all machines without errors. I am not sure it works (in fact it does not
on old HPUX if you were to copy/paste it).

> I suggested you use one from the IANA registry, which is relatively
> short. E.g. mine is 17 characters before I start to subdivide the space.

Ok I'll try to convince people to ask for a new one...

> Safest not ever to assume time is sufficient; if the granularity of time
> is milliseconds (and it may not be that granular even if the return
> value is in milliseconds), this could be too short on a modern fast
> machine that can generate a lot of UIDs in a millisecond.
>
> For example, I just ran a quick check on the Java UID generator in
> one of my toolkits and it generates over 10 UIDs per millisecond if
> the same system and time based prefix is used, and still over 5
> UIDs per second if a new system and time based prefix is generated,
> and that's on my laptop. Once it has been running for a while and
> the JIT optimizer gets to it, it actually exceeds 20 UIDs per
> millisecond for the latter scenario.
>
> So a random number, or cache of counts per time increment is
> definitely required I think (java.rmi.server.UID does the latter
> for you).

Yeah I know, we have some fancy machines here which were generating
DICOM like lightspeed, and the UID ended up being mainly the same ( I
was generating 5-10 DICOM within a second before I start using the
milliseconds).

Mathieu

Mathieu Malaterre

unread,

Jan 22, 2005, 8:32:58 PM1/22/05

to

> You can save digits by taking advantage of the fact that
> the number of milliseconds in a year easily fits within 11 digits
> rather than the 13 you now allocate.

Very nice. I'll try this right away.

> Adopting the year
> the ISO prefix was granted as year zero, saves two more
> digits over the next few hundred years even allowing for
> period delimiters to bracket the time.

That's also very smart. Plus in 100 years I won't be blamed for the
implementation :P

> Don't forget (as many do) that leading zeros are not allowed in the UID.

D. Clunie remind me this - and I was doing it :( -

> I advise that GDCM allow its client to configure a UID prefix
> for users that already have an ISO prefix, and perhaps also
> to allow the application to set the UID with its own scheme.

that the easiest part and is already done. There is also a special
easter egg. By default the function doesn't take any parameter and set
the prefix as "GDCM" in binary:

$ echo "GDCM" | od -b

107 104 103 115 012

:)
Mathieu

Mathieu Malaterre

unread,

Jan 22, 2005, 8:36:50 PM1/22/05

to

> I'd always rely to a maximum on organization-managed UIDs, if possible
> and keep the random parts as low as possible.
> By the way, what random-number generator are you using?

The C rand() function:

CONFORMING TO
The functions rand() and srand() conform to SVID 3, BSD 4.3, ISO
9899, POSIX 1003.1-2003. The function rand_r() is from POSIX 1003.1-2003.

I am doing a cross plateform software so this is very hard to do a nice
random generator.

Mathieu
If everybody was using linux it would be easy to read /dev/random :)

Doug Sluis

unread,

Jan 23, 2005, 1:14:32 AM1/23/05

to

I presume then, that your RNG is a service that runs in its
own thread. Otherwise it suffers from the same weakness as
the static counter you cite. From my experience, a threadsafe
counter is trivial to implement, and the one that I tend to prefer,
because it NEVER returns the same value within the
allocated range it cycles through.

"Mathieu Malaterre" <mala...@Mfree.fr> wrote in message
news:f_CId.159060$Uf.1...@twister.nyroc.rr.com...

Gurikar

unread,

Jan 23, 2005, 1:40:47 AM1/23/05

to

HI,
Is adding MAC number is needed in generating UID. I have read in
dicom standard that suffix can be generated using device serial number,
study ID, serial and timestamp. This suffix along with root UID is used
to generate UID appropriately, i mean provided your having your own
root UID. Is this right?

Razvan

unread,

Jan 23, 2005, 5:02:31 AM1/23/05

to

Hi Gurikar.

>From my reading of the standard, I cannot see a constraint imposed by
the Dicom standard itself on how the uid suffix part is generated.

The MAC address is a good thing to have as part of the UID, because it
is guaranteed to be unique, being managed globally, as opposed to a
randomly generated number.

So the MAC is not needed nor required, but it is a good path to follow
in constructing the UID. You can add on top on that numbers generated
by some incremental schemes, random numbers or combinations of the two,
there doesn't seem to be a best way to do this.

~~Razvan

Joerg Riesmeier

unread,

Jan 23, 2005, 7:29:00 AM1/23/05

to

Razvan wrote:

> The MAC address is a good thing to have as part of the UID, because it
> is guaranteed to be unique, being managed globally, as opposed to a
> randomly generated number.

I agree that the MAC address is a good candidate to be used for UID
creation. However, the MAC address is _not_ guaranteed to be world-wide
unique. The uniqueness is only required within a (local) network. For
example most onboard Ethernet adapters allow to change the MAC address.
Furthermore, most OS also allow to change the MAC address (under Linux
"ifconfig eth0 hw ether 00:01:02:03:04:05").

See http://en.wikipedia.org/wiki/MAC_address for details.

Regards,
Joerg Riesmeier

Joerg Riesmeier

unread,

Jan 23, 2005, 7:45:25 AM1/23/05

to

Mathieu Malaterre wrote:

> Implementing a software generation of unique identifier that fits
> on 64 bytes is clearly impossible.

I agree that it is impossible to _guarantee_ that a software generated
UID is really world-wide unique unless you have a unique identifier for
each system where your software runs on. As I wrote in another posting
the MAC address is _not_ such an identifier, but e.g. the "host id" of
Sun Sparc processors should be. Unfortunately, there are many other
systems ...

Btw, the approach we've taken for the DCMTK can be found in file
"dcmdata/libsrc/dcuid.cc". In order to make sure that the generated
UID fits into the 64 bytes restriction we create a hash code from
many system dependent parameters, add the process id, a time stamp
and a threadsafe counter.

Regards,
Joerg Riesmeier

Peter B. Schmidt

unread,

Jan 23, 2005, 8:32:18 AM1/23/05

to

Joerg Riesmeier schrieb:

> As I wrote in another posting
> the MAC address is _not_ such an identifier, but e.g. the "host id" of
> Sun Sparc processors should be. Unfortunately, there are many other
> systems ...

Joerg,

I used to work for a SPARC house, and one of the topics the staff
regularly did is change the host ID in hardware-upgraded systems. So no
uniqueness guaranteed, either.

Matthieu,

the way I found to be the most reliable is to register each single
implementation - a php script on your webserver could do - and put it
into you UID root. Unfortuately, some users may fear a loss of privacy
if the product is open source, but you may explain the item in product
documentation.

A scheme I deployed assumed the following:
* I have a UID root for the software producer
* There are less than 1000 installations in one ZIP Postcode area.
=> Deployed:
"UID root (company specific)."
+ "companies software code (2 digits)."
+ "Zip (5 digit) plus 4 digits installation counter."
+ "further stuff (2-6 digits)."
+ "YYYYMMDDHHMMSS plus running counter."

which gave enough space within the 64 byte to encode different types of
UIDs (done by the "further stuff" to distinguish different UID types.

If you forget about the ZIP-Code stuff, you may use a running counter of
some digits for an "Installation Counter" that can be povided by a php /
cgi script where users can download the "system specific UID".

DICOM states that suppliers _must_ guarantee uniqueness of the
generatesd UIDs, so any statistical guessing and Random number stuff is
*not* sufficient.

My two cents.

HTH,

Peter