May I suggest separating DICOM meta-info reading from pixel data reading?


yunzhi

Dec 1, 2008, 2:35:07 PM
to pydicom
Hi all

As I understand it, dicom.ReadFile() reads all of the information in
a DICOM file at once, even for an RTDOSE file, and this turns out to be
slow. But sometimes people want to read thousands of DICOM files as fast
as possible, just for the meta-info, to sort them, and then read the
pixel data later.

So I think it would be better to have separate routines for reading
the meta-info and the pixel data (like dicominfo and dicomread in
MATLAB).

Yunzhi

Darcy Mason

Dec 1, 2008, 5:28:35 PM
to pydicom
Hi Yunzhi,

There is a ReadFileMetaInfo function in dicom.filereader that can be
called to read only the file meta information ("from dicom.filereader
import ReadFileMetaInfo"). But a better solution would be to add the
ability to delay reading large values (especially pixels) until they
are actually accessed. Then one could get almost all data elements
with a fast read. I'll add this to the issue list and think about
how this could best be coded.
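To illustrate the delayed-reading idea, here is a minimal Python 3 sketch (not pydicom code): a hypothetical DeferredValue class records only where a value sits in the file (offset and length) and reads the bytes the first time .value is accessed. An in-memory io.BytesIO stands in for a real DICOM file.

```python
import io


class DeferredValue:
    """Hypothetical stand-in for a large data element value.

    Only the file offset and length are stored; the bytes are read on the
    first access to .value and cached afterwards.
    """

    def __init__(self, fp, offset, length):
        self._fp = fp
        self._offset = offset
        self._length = length
        self._value = None  # nothing read yet

    @property
    def is_loaded(self):
        return self._value is not None

    @property
    def value(self):
        if self._value is None:
            saved = self._fp.tell()  # restore the caller's position afterwards
            self._fp.seek(self._offset)
            self._value = self._fp.read(self._length)
            self._fp.seek(saved)
        return self._value


# Pretend the "pixel data" is the 6 bytes that start at offset 4.
fp = io.BytesIO(b"HDR:" + b"\x01\x02\x03\x04\x05\x06")
pixels = DeferredValue(fp, offset=4, length=6)
print(pixels.is_loaded)   # False
print(len(pixels.value))  # 6
print(pixels.is_loaded)   # True
```

One consequence of this design is that the file has to stay open (or be reopened by name) until the value is actually accessed.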

Thanks
Darcy

yunzhi

Dec 1, 2008, 6:00:29 PM
to pydicom
Hi Darcy

ReadFileMetaInfo proves to be fast, but the information it extracts is
limited and not very useful in practice. The following is what I get
from this function call:

>>> from dicom.filereader import ReadFileMetaInfo
>>> meta = ReadFileMetaInfo('RD.1.2.246.352.71.7.1114.242998.20080110184104.dcm')
>>> meta
(0002, 0001) File Meta Information Version OB: '\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: '1.2.840.10008.5.1.4.1.1.481.2'
(0002, 0003) Media Storage SOP Instance UID UI: '1.2.246.352.71.7.1114.242998.20080110184104'
(0002, 0010) Transfer Syntax UID UI: '1.2.840.10008.1.2'
(0002, 0012) Implementation Class UID UI: '1.2.246.352.70.2.1.7'

With this information, how could one classify DICOM files by fields
like PatientsName or Modality? What I actually meant by "meta-info" in
my post is all the information except the pixel data, such as
PatientsName, Modality, etc. There is an essential difference between
this information and the pixel data, because extracting the pixel data
takes time.

So if there were another version of ReadFileMetaInfo that returned
information like PatientsName and Modality, it would be great: very
useful and fast.
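A minimal sketch of such a reader, assuming the data set uses the implicit VR little endian transfer syntax (1.2.840.10008.1.2, as in the file above), contains no undefined-length sequences, and that the file meta group has already been skipped. The hypothetical read_until_pixel_data() reads one element header at a time and stops at the Pixel Data tag (7FE0,0010), so the large pixel value is never read. The test data is made up.

```python
import io
import struct

PIXEL_DATA_TAG = (0x7FE0, 0x0010)


def read_until_pixel_data(fp):
    """Read implicit VR little endian elements, stopping at Pixel Data.

    Returns a dict mapping (group, element) -> raw value bytes.
    Assumes no undefined-length (0xFFFFFFFF) items such as sequences.
    """
    elements = {}
    while True:
        header = fp.read(8)  # group (2 bytes), element (2), length (4)
        if len(header) < 8:
            break  # end of data set
        group, elem, length = struct.unpack("<HHI", header)
        if (group, elem) == PIXEL_DATA_TAG:
            break  # stop here: the pixel bytes are never read
        elements[(group, elem)] = fp.read(length)
    return elements


def encode(group, elem, value):
    """Encode one implicit VR little endian element (for the example)."""
    return struct.pack("<HHI", group, elem, len(value)) + value


fp = io.BytesIO(
    encode(0x0008, 0x0060, b"RTDOSE")         # Modality
    + encode(0x0010, 0x0010, b"Doe^John")     # Patient's Name
    + encode(0x7FE0, 0x0010, b"\x00" * 1000)  # Pixel Data (skipped)
)
info = read_until_pixel_data(fp)
print(info[(0x0008, 0x0060)])  # b'RTDOSE'
print(fp.tell())               # 38: 30 bytes of elements + one 8-byte header
```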

Yunzhi

MMoury

Dec 3, 2008, 11:45:54 AM
to pydicom
Hi,

What is the meta information, and how does it differ from the other
information?

Marc

Darcy Mason

Dec 7, 2008, 7:51:19 PM
to pydicom
The File Meta information is defined in the DICOM standard, Part 10
"Media Storage and File Format..." (ftp://medical.nema.org/medical/dicom/2008/),
section 7.1. It is a special block of information specific to DICOM
files. The meta info includes a mandatory length field (the File Meta
Information Group Length, tag (0002,0000)), making it easy to read
without reading anything else, but as that section shows, it does not
have a lot of information; for example, it does not have patient
information or any details about the image.
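Because of that length field, finding the end of the meta group is cheap. A minimal sketch, assuming a standard Part 10 layout (128-byte preamble, the "DICM" prefix, then explicit VR little endian meta elements starting with (0002,0000) File Meta Information Group Length, VR UL). skip_file_meta() is a hypothetical helper, not a pydicom function, and the file contents below are made up.

```python
import io
import struct


def skip_file_meta(fp):
    """Return the byte offset where the main data set begins.

    Assumes: 128-byte preamble, b"DICM", then (0002,0000) encoded as
    explicit VR little endian: tag, VR "UL", 2-byte length (4), 4-byte value.
    """
    fp.seek(128)
    if fp.read(4) != b"DICM":
        raise ValueError("not a DICOM Part 10 file")
    group, elem, vr, length = struct.unpack("<HH2sH", fp.read(8))
    if (group, elem, vr) != (0x0002, 0x0000, b"UL") or length != 4:
        raise ValueError("missing File Meta Information Group Length")
    (group_length,) = struct.unpack("<I", fp.read(4))
    # group_length counts the bytes of the meta elements that follow it.
    return fp.tell() + group_length


# A made-up file: group length element + one Transfer Syntax UID element.
uid = b"1.2.840.10008.1.2\x00"  # padded to an even length
ts_elem = struct.pack("<HH2sH", 0x0002, 0x0010, b"UI", len(uid)) + uid
meta = struct.pack("<HH2sHI", 0x0002, 0x0000, b"UL", 4, len(ts_elem)) + ts_elem
fp = io.BytesIO(b"\x00" * 128 + b"DICM" + meta + b"<data set>")
start = skip_file_meta(fp)
print(start)  # 170
```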

As I mentioned in the other reply, the more useful information
(patient info and details about the image) would be better served by
changing the pydicom code to delay reading big data like images or
dose grids.

yunzhi

Dec 7, 2008, 8:28:27 PM
to pydicom
Hi Marc,

Sorry, I was not very clear about the term 'Meta Info' in DICOM.

But I agree with you that delayed reading is of big benefit. Actually,
I changed your code a little and got separate reading of pixel data
such as the dose grid. Although I don't think my change is smart (it is
even kind of stupid), it turned out that delayed reading saves a lot of
memory (though not much CPU time).

I read 1280 frames of DICOM images (CT + PET + DOSE + PLAN) with the
aim of scanning the folder and sorting the files into their different
studies, series and modalities. With the old code, the memory used by
this scanning process was up to about 1.9 GB (total memory, including
the operating system). In contrast, when reading just the patient info
and not the pixel data, the total memory was only 0.99 GB.
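Once the headers are in memory, the sorting step itself is simple. A sketch, assuming each file's header has already been read (without pixel data) into a plain dict keyed by DICOM keywords; group_headers() and the sample UIDs and file names are hypothetical.

```python
from collections import defaultdict


def group_headers(headers):
    """Nest headers as study UID -> (modality, series UID) -> file names."""
    tree = defaultdict(lambda: defaultdict(list))
    for h in headers:
        series_key = (h["Modality"], h["SeriesInstanceUID"])
        tree[h["StudyInstanceUID"]][series_key].append(h["filename"])
    return tree


# Hypothetical headers, as a header-only reader might return them.
headers = [
    {"filename": "ct1.dcm", "StudyInstanceUID": "1.1",
     "SeriesInstanceUID": "1.1.1", "Modality": "CT"},
    {"filename": "ct2.dcm", "StudyInstanceUID": "1.1",
     "SeriesInstanceUID": "1.1.1", "Modality": "CT"},
    {"filename": "pet1.dcm", "StudyInstanceUID": "1.1",
     "SeriesInstanceUID": "1.1.2", "Modality": "PT"},
    {"filename": "dose.dcm", "StudyInstanceUID": "1.1",
     "SeriesInstanceUID": "1.1.3", "Modality": "RTDOSE"},
]
tree = group_headers(headers)
for study, series in tree.items():
    for (modality, uid), files in sorted(series.items()):
        print(study, modality, uid, files)
```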

I think it is a good idea to retrieve the patient information as well
as information about the pixel data (not the pixel data itself), such
as how much pixel data there is, its numpy data type, and the shape of
the pixel cube, because this is useful for reading the pixel data
later. This is actually what I did.
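For uncompressed pixel data, the shape, numpy dtype and byte count follow from a few header attributes (Rows, Columns, NumberOfFrames, SamplesPerPixel, BitsAllocated, PixelRepresentation). A sketch with a hypothetical pixel_layout() helper, assuming little-endian data, BitsAllocated of 8, 16 or 32, and interleaved samples (PlanarConfiguration 0); the dose-grid dimensions are illustrative, not from the thread.

```python
def pixel_layout(rows, columns, bits_allocated, pixel_representation,
                 number_of_frames=1, samples_per_pixel=1):
    """Return (shape, dtype string, byte count) for uncompressed pixels.

    dtype strings use NumPy notation, e.g. '<u2' = little-endian uint16.
    PixelRepresentation 1 means signed integers, 0 means unsigned.
    """
    if bits_allocated not in (8, 16, 32):
        raise ValueError("unsupported BitsAllocated: %d" % bits_allocated)
    nbytes = bits_allocated // 8
    kind = "i" if pixel_representation == 1 else "u"
    dtype = "<%s%d" % (kind, nbytes)
    shape = (number_of_frames, rows, columns)
    if samples_per_pixel > 1:
        shape += (samples_per_pixel,)  # interleaved samples, e.g. RGB
    size = nbytes * samples_per_pixel * number_of_frames * rows * columns
    return shape, dtype, size


# An illustrative dose grid: 100 frames of 128 x 128, 32-bit unsigned.
shape, dtype, size = pixel_layout(rows=128, columns=128, bits_allocated=32,
                                  pixel_representation=0,
                                  number_of_frames=100)
print(shape, dtype, size)  # (100, 128, 128) <u4 6553600
```

With these values, the raw bytes read later can be turned into an array with numpy.frombuffer(raw, dtype=dtype).reshape(shape).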


Yunzhi

Darcy Mason

Dec 9, 2008, 1:42:58 AM
to pydicom
This discussion got me thinking and I decided to play around with
reading DICOM files using python's iterator concept, which is a very
nice feature in modern python. I've uploaded a demonstration file
"fileiter.py" to this group. It is a copy of some parts of the
filereader module, but modified to return ("yield" in a python
generator) each attribute (data element) as it is read. Using this
concept, you can read and store only certain items as you wish. The
file can be run as a script (example code in the "if __name__ == '__main__':"
section at the end) -- the example stops reading after it
has found a certain list of items (needs python 2.5 for the "all"
function, or see the url in comments for an equivalent all() from the
python help).
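The generator pattern can be sketched as follows. This is not the fileiter.py code; it is a simplified Python 3 illustration assuming implicit VR little endian data without sequences. The hypothetical iter_elements() yields one element at a time, and read_wanted() uses all() to stop as soon as every wanted tag has been seen.

```python
import io
import struct


def iter_elements(fp):
    """Yield ((group, element), value_bytes) for each element in turn.

    Simplified sketch: implicit VR little endian, no undefined lengths.
    """
    while True:
        header = fp.read(8)
        if len(header) < 8:
            return  # end of data set
        group, elem, length = struct.unpack("<HHI", header)
        yield (group, elem), fp.read(length)


def read_wanted(fp, wanted):
    """Keep only the wanted tags; stop once all of them have been seen."""
    found = {}
    for tag, value in iter_elements(fp):
        if tag in wanted:
            found[tag] = value
        if all(t in found for t in wanted):
            break  # remaining elements (e.g. pixel data) are never read
    return found


def encode(group, elem, value):
    """Encode one implicit VR little endian element (for the example)."""
    return struct.pack("<HHI", group, elem, len(value)) + value


fp = io.BytesIO(
    encode(0x0008, 0x0060, b"RTDOSE")         # Modality
    + encode(0x0010, 0x0010, b"Doe^John")     # Patient's Name
    + encode(0x7FE0, 0x0010, b"\x00" * 1000)  # Pixel Data
)
found = read_wanted(fp, {(0x0008, 0x0060), (0x0010, 0x0010)})
print(sorted(found))  # [(8, 96), (16, 16)]
print(fp.tell())      # 30: stopped before the Pixel Data header
```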

This still reads the whole value of any data element it reads (rather
than delaying reading pixel data as we had talked about -- that will
come in later code) but at least you have the ability to stop before
getting to those big items if you wish.

fileiter.py is rough code, not properly tested, and I'm not sure if
files will always be closed properly, but it shows how this could be
done. I will be thinking about how to incorporate something like this
into the pydicom code. The current ReadFile function, for example,
could behave exactly as it does now, but be implemented by calling an
iterator until all items were read.
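A sketch of that idea with hypothetical names: read_file() opens the file in a with block and simply exhausts the element iterator into a dict. Keeping the open() in the caller, rather than inside the generator, also addresses the file-closing concern: a generator that opened the file itself would keep it open until the suspended generator was closed or garbage collected.

```python
import os
import struct
import tempfile


def iter_elements(fp):
    """Yield ((group, element), value_bytes); implicit VR little endian."""
    while True:
        header = fp.read(8)
        if len(header) < 8:
            return
        group, elem, length = struct.unpack("<HHI", header)
        yield (group, elem), fp.read(length)


def read_file(path):
    """ReadFile-style wrapper: read every element by exhausting the iterator.

    The with block closes the file even if iteration raises part-way.
    """
    with open(path, "rb") as fp:
        return dict(iter_elements(fp))


# Write a tiny made-up data set to a temporary file and read it back.
data = (struct.pack("<HHI", 0x0008, 0x0060, 2) + b"CT"
        + struct.pack("<HHI", 0x0010, 0x0010, 4) + b"Doe^")
fd, path = tempfile.mkstemp(suffix=".dcm")
with os.fdopen(fd, "wb") as out:
    out.write(data)
try:
    ds = read_file(path)
    print(ds[(0x0008, 0x0060)])  # b'CT'
finally:
    os.remove(path)
```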