dicom reading meta data only (without the Pixel values)

Romain Valabregue

Nov 11, 2013, 4:49:42 PM
to pyd...@googlegroups.com
Hello

First, thanks for this nice work.
I gave it a try and it is quite efficient, but I need to read a few terabytes of MRI DICOM files to fill an SQL database with the acquisition parameters.
So I am sure I would save a lot of time if I did not read the 'raw' data.
Is there a simple way to achieve that? (Will it be compatible with dcmstack?)

Many thanks for your help

Romain

Darcy Mason

Nov 12, 2013, 11:42:05 AM
to pyd...@googlegroups.com
On Monday, November 11, 2013 4:49:42 PM UTC-5, Romain Valabregue wrote:
...
 
I gave it a try and it is quite efficient, but I need to read a few terabytes of MRI DICOM files to fill an SQL database with the acquisition parameters.
So I am sure I would save a lot of time if I did not read the 'raw' data.
Is there a simple way to achieve that? (Will it be compatible with dcmstack?)


Hi Romain,

I assume by the raw data you mean the pixel data?  Typically, everything else is very small compared to pixel data. If you do not need the pixel data at all, then you can use the stop_before_pixels=True argument to read_file(). That avoids loading the pixel data into memory, and also does not even seek past the pixel data in the file, but instead closes the file immediately. It is the fastest read you can do with pydicom, and the parsing into dicom structures adds very little time compared with just reading the same bytes with a python read() statement. I'm not sure if that would be compatible with dcmstack unless it specifically checks for that case.
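
For illustration, here is a minimal sketch of a header-only read that stores a few acquisition parameters in SQLite. It assumes a recent pydicom release, where read_file() is available as pydicom.dcmread() with the same stop_before_pixels argument. The helper names, the table layout and the chosen keywords are hypothetical, and the reader parameter exists only so the sketch can be tried without real files.

```python
import sqlite3

# Illustrative choice of acquisition parameters; adapt to your own schema.
KEYWORDS = ("SeriesInstanceUID", "RepetitionTime", "EchoTime")


def read_header(path, reader=None):
    """Read a DICOM file's data elements, stopping before the pixel data."""
    if reader is None:
        # Assumption: a recent pydicom, where read_file() is named dcmread().
        from pydicom import dcmread as reader
    return reader(path, stop_before_pixels=True)


def store_header(conn, path, ds):
    """Insert one row of acquisition parameters taken from dataset `ds`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acquisition "
        "(path TEXT, SeriesInstanceUID TEXT, RepetitionTime TEXT, EchoTime TEXT)"
    )
    # Elements missing from the file become NULL; values are stored as text.
    values = [getattr(ds, keyword, None) for keyword in KEYWORDS]
    values = [None if v is None else str(v) for v in values]
    conn.execute("INSERT INTO acquisition VALUES (?, ?, ?, ?)", [path, *values])
```

In a real run you would open the database once with sqlite3.connect(), call read_header() and store_header() for each file, and commit at the end.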

If you will also need the pixel data (to convert to another file, for example), then I recommend you try using the defer_size argument to read_file(). When the pixel data is needed, it will be read from disk transparently by pydicom, so that should work with any code. But it avoids the memory use for as long as possible, so you could read a large number of files, filter them by some of the dicom information, and only write out the ones you need, for example.
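
The filtering idea with defer_size could be sketched roughly as follows. Again this assumes a recent pydicom where read_file() is available as pydicom.dcmread(). The function name, the "1 KB" threshold and the choice of SeriesDescription as the filter are only examples, and the reader parameter is there so the sketch can be exercised without real files.

```python
def iter_matching(paths, wanted_description, reader=None, defer_size="1 KB"):
    """Yield (path, dataset) pairs whose SeriesDescription matches."""
    if reader is None:
        # Assumption: a recent pydicom, where read_file() is named dcmread().
        from pydicom import dcmread as reader
    for path in paths:
        # Elements larger than defer_size are left on disk until first accessed,
        # so the pixel data is not loaded while filtering.
        ds = reader(path, defer_size=defer_size)
        if getattr(ds, "SeriesDescription", None) == wanted_description:
            yield path, ds
```

Only the datasets that pass the filter are handed back, and their pixel data is read from disk when your code first touches it.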

You might also want to look at the time_test.py script in the test/performance subdirectory. You could edit that to run some timing tests with a subset of your files, to assess what works best. 

Regards,
Darcy

Romain Valabregue

Nov 13, 2013, 3:56:43 PM
to pyd...@googlegroups.com
Hi Darcy

This is very useful indeed. Unfortunately, it is not easily compatible with the dcmstack function I use.
Right now I need dcmstack to get the number of slices and/or the number of volumes.
Too bad that this information is not always stored in the DICOM metadata.

Thanks

Romain

Darcy Mason

Nov 13, 2013, 6:48:56 PM
to pyd...@googlegroups.com
Well, the source for dcmstack is available, so perhaps it is possible to adapt it?  Out of curiosity I just had a quick look, and it seems the DicomStack object can be passed custom dicom data elements to group by.   It also appears that what they are calling meta data is not the dicom standard meta information header, but any info in the dicom file. And there is a flag to allow adding "dummies" -- datasets not containing the pixel data. Maybe those pieces can be put together to do what you need?