Read dicom header information into pandas dataframe

Neuro_rad

May 26, 2016, 8:22:31 AM
to pydicom
Hi pydicom-users,

I would like to do some analysis on the header data in DICOM files. To do so, I'm trying to read the header information (without the pixel data) into a pandas DataFrame, and I'm running into trouble.
I'm quite new to pydicom and I think I don't really understand the data structure of the FileDataset.

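import dicom  # pydicom releases before 1.0 are imported as "dicom"

scans = []
for file in filelocations:
    # stop_before_pixels=True reads the header only and skips the pixel data
    dicomds = dicom.read_file(file, stop_before_pixels=True)
    scans.append(dicomds)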


The pandas DataFrame constructor should take a list of dicts and convert it into a DataFrame, but it seems to be a bit more complicated in this case. Is the problem maybe that scans (the list of FileDatasets) isn't a pure list of dicts but a more nested data structure?
When I run this code I don't get the DICOM tags as column headers but some arbitrary numbers. The rows are correct: there is one row for each DICOM file found by the script.

pandas.DataFrame(scans)

Maybe there's a different way to solve this problem?

Any help is very much appreciated.


Darcy Mason

May 26, 2016, 9:49:09 AM
to pydicom
I think what you are getting as column headers is the numerical (group, element) dicom tag (what pydicom actually stores as the dictionary key).

Try something like this:

scans = []
for file in filelocations:
    dicomds = dicom.read_file(file, stop_before_pixels=True)
    # Map each element's keyword to its bare value
    mod_ds = {data_elem.keyword: data_elem.value for data_elem in dicomds.values()}
    scans.append(mod_ds)

That should give a cleaner looking result with the dicom keyword for column names, and just the value (rather than the whole data element repr) as the cell value.

Note also that the dicom file meta information is not in the main dataset, so if you want to see that information, you should use something like dict's update() to combine dicomds with dicomds.file_meta.
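
As a rough illustration of the merge described above, the sketch below builds one row per file from both the file meta information and the main dataset. The helper name dataset_to_row is hypothetical, and the code assumes that iterating a dataset yields elements exposing .keyword and .value and that the dataset carries a file_meta attribute, as in pydicom 1.0 and later.

```python
def dataset_to_row(ds):
    """Build a {keyword: value} dict from a dataset and its file meta.

    Assumption: `ds` iterates over elements exposing `.keyword` and
    `.value`, and may carry a `file_meta` attribute (pydicom >= 1.0 style).
    """
    row = {}
    file_meta = getattr(ds, "file_meta", None)
    if file_meta is not None:
        # File meta elements (group 0002) live outside the main dataset.
        row.update({elem.keyword: elem.value for elem in file_meta})
    # Main dataset elements are added last, so they win on a name clash.
    row.update({elem.keyword: elem.value for elem in ds})
    return row
```

Appending dataset_to_row(dicomds) to scans in place of mod_ds should then give pandas.DataFrame(scans) extra columns for the file meta elements, such as TransferSyntaxUID.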

Regards,
Darcy

Darcy Mason

May 26, 2016, 11:13:20 AM
to pydicom
I just realized there are some issues with my proposed solution:
- data_elem.keyword is only in the repository code, not in the last pydicom release. You would instead have to use "from dicom.datadict import dictionary_keyword", and then use "mod_ds = {dictionary_keyword(data_elem.tag): data_elem.value ..."
- the solution won't work well for datasets with Sequences - it won't iterate into them.  And if it did, each dataset in the sequence would have the same keywords, so each item in the sequence would overwrite the previous values by using the same dictionary key.  Actually, this is true even if you don't convert the tag to a keyword -- each item in a sequence would generally have the same tags.
- it may have trouble with private data elements, which also may have repeated keywords

A more general solution would build the mod_ds dictionary using Dataset.walk(), and perhaps where needed modify keywords to add something (e.g. a sequence item index) to make them unique when adding to the dictionary.

However, for simple datasets with no sequences, I think what I wrote should work reasonably well, if the keyword lookup is fixed.
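
One way to picture the more general approach is sketched below. It uses plain recursion rather than Dataset.walk(), so that the index of each sequence item is at hand when building unique column names. The function name flatten_dataset and the "Keyword[i]." naming scheme are made up for illustration, and the code assumes elements expose .keyword, .tag, .VR and .value as in pydicom 1.0 and later.

```python
def flatten_dataset(ds, prefix=""):
    """Flatten a pydicom-style dataset into a single {column: value} dict.

    Assumption: iterating `ds` yields elements exposing `.keyword`,
    `.tag`, `.VR` and `.value` (pydicom >= 1.0 style).
    """
    row = {}
    for elem in ds:
        # Private or unknown elements have an empty keyword; use the tag instead.
        name = prefix + (elem.keyword or str(elem.tag))
        if elem.VR == "SQ":
            # Prefix nested names with the sequence keyword and item index so
            # that items sharing the same keywords do not overwrite each other.
            for i, item in enumerate(elem.value):
                row.update(flatten_dataset(item, prefix=f"{name}[{i}]."))
        else:
            row[name] = elem.value
    return row
```

Each file then contributes one flat dict, so a list of flatten_dataset(dicomds) results can be passed to pandas.DataFrame as before. Files with sequences of different lengths will simply leave the missing columns empty.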

Hopefully what I've written above this time is correct -- nothing is based on tested code.

-Darcy

Neuro_rad

May 26, 2016, 3:39:00 PM
to pydicom
Thanks for your help.
I will try this out tomorrow and see how it works.

/landge

Dov Grobgeld

May 9, 2017, 3:39:50 PM
to pydicom
It's been a while, but if it is still relevant you might check out my library https://github.com/dov/dcmpandas, which scrapes a directory structure of DICOM files and builds a pandas DataFrame.

Regards,
Dov