Searching for a string in the DICOM header


D .Samber

Nov 11, 2022, 3:13:26 PM
to pydicom

Hi guys!

I'm trying to search all the tags in a DICOM file for a string.
I thought this would be simple thanks to the iterall() function so I wrote a few lines of code:

    # Iterate through the entire header
    for elemCurrent in filedatasetHeader.iterall():

        # Look for string tags
        if (isinstance(elemCurrent.value, str)):
            print(elemCurrent.name + ' ' + elemCurrent.value)

I was surprised that the code doesn't print the Patient's Name (0x0010, 0x0010)!!
This is because the type of that element's value is NOT str; it is "pydicom.valuerep.PersonName". I thought this wouldn't be an issue since iterall would recurse into this type, but this type doesn't have a value representation!

I could just write some code specific to this situation, but that is inelegant AND (worse) how do I know this type of weirdness doesn't extend to other tags?

I'm about to just give up and scan the whole header byte by byte....

Help?

Thanks!

Dan

Darcy Mason

Nov 11, 2022, 3:24:23 PM
to pydicom
`isinstance` can take a tuple of classes so you could do:

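    from pydicom.valuerep import PersonName

    for elemCurrent in filedatasetHeader.iterall():
        # PersonName is not a str subclass, so test for both types
        if isinstance(elemCurrent.value, (str, PersonName)):
            print(elemCurrent.name + ' ' + str(elemCurrent.value))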

PersonName used to be derived from `str`, but for some reason I can't recall, it is not any longer.

Another thought, if you are only looking for a few specific tags, is to read only those by calling `dcmread` using the `specific_tags` parameter.
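
Rather than listing every value class in the `isinstance` call, one option is to convert each value with `str()` and search the result. The following is only a sketch under stated assumptions: `find_in_header`, `SKIPPED_VRS`, `FakePersonName` and `FakeDataset` are hypothetical names, and the two fake classes are stand-ins so the example runs without pydicom or a DICOM file. The function relies only on `iterall()` and on the `VR`, `value`, `tag` and `name` attributes of each element.

```python
from types import SimpleNamespace

# VRs skipped by this sketch: SQ (iterall() already descends into sequence
# items) and the binary VRs, whose values are raw bytes rather than text.
SKIPPED_VRS = ("SQ", "OB", "OW", "OF", "OD", "OL", "OV", "UN")


def find_in_header(ds, needle):
    """Return (tag, name, value) for each element whose value contains needle."""
    needle = needle.lower()
    hits = []
    for elem in ds.iterall():
        if elem.VR in SKIPPED_VRS:
            continue
        # str() is applied to every remaining value, so no per-type
        # isinstance check is needed (plain strings, numbers, PersonName).
        text = str(elem.value)
        if needle in text.lower():
            hits.append((str(elem.tag), elem.name, text))
    return hits


# Stand-ins (assumptions) so the sketch runs without pydicom. With pydicom,
# pass the dataset returned by pydicom.dcmread(path) instead.
class FakePersonName:
    """Mimics a value that is not a str but converts cleanly to one."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        return self._name


class FakeDataset:
    def __init__(self, elements):
        self._elements = elements

    def iterall(self):
        yield from self._elements


demo = FakeDataset([
    SimpleNamespace(tag="(0010, 0010)", name="Patient's Name", VR="PN",
                    value=FakePersonName("Obama^Barack")),
    SimpleNamespace(tag="(0008, 0060)", name="Modality", VR="CS", value="MR"),
    SimpleNamespace(tag="(0008, 1030)", name="Study Description", VR="LO",
                    value="Brain MRI for B. OBAMA"),
    SimpleNamespace(tag="(7fe0, 0010)", name="Pixel Data", VR="OW",
                    value=b"\x00Obama\x00"),
])

hits = find_in_header(demo, "obama")
for tag, name, text in hits:
    print(tag, name, text)
```

Note the trade-off: because the binary VRs are skipped, text stored in an OB or UN element (which private tags sometimes use) would be missed by this sketch.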

D .Samber

Nov 11, 2022, 5:37:00 PM
to pydicom

Thank you for replying.

I saw that PersonName was not derived from str after I tried calling issubclass() on it.
I really want to search the ENTIRE header for any occurrences of certain strings (just making sure no PHI snuck in ANYWHERE).
I can do what you suggest but I worry that other (perhaps even private) tags may also hide data in this way.

I'm thinking that maybe an ugly way to do it is to just do a binary read of the whole header, cast it into a string and search that string. The downside of this brute-force approach is that I will be including tag names (and other stuff) in the search space, so that searching for (for example) "Patient" will ALWAYS find it since it is in the names of several tags.

I would think this should be a solved problem....
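
On the tag-name worry: a DICOM file stores each element's numeric tag, not its name, and the names are looked up from the data dictionary when the file is read, so a raw byte search should not match "Patient" just because a tag is called Patient's Name. A minimal sketch of the brute-force scan follows; `bytes_contain` is a hypothetical helper, the list of encodings is an assumption, and the demo bytes are a hand-built single element rather than a real file.

```python
def bytes_contain(data, needle, encodings=("ascii", "utf-8", "latin-1")):
    """Search raw bytes for needle, ignoring the case of ASCII letters."""
    haystack = data.lower()  # bytes.lower() only changes ASCII letters
    for encoding in encodings:
        try:
            pattern = needle.encode(encoding).lower()
        except UnicodeEncodeError:
            # The needle cannot be written in this encoding; try the next.
            continue
        if pattern in haystack:
            return True
    return False


# One element as it would appear in an explicit VR little endian file:
# tag (0010,0010), VR "PN", 2-byte length (12), then the value itself.
raw = b"\x10\x00\x10\x00" + b"PN" + b"\x0c\x00" + b"Obama^Barack"

print(bytes_contain(raw, "obama"))    # True
print(bytes_contain(raw, "Patient"))  # False
```

For a real file, the bytes could come from `pathlib.Path(path).read_bytes()`. The caveats are different from the one above: a deflated transfer syntax compresses the dataset, values in other character sets will not match these encodings, and a short search string can match pixel data by chance.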

Darcy Mason

Nov 12, 2022, 9:39:42 AM
to pydicom
If it is full de-identification you are trying to do, then that is a very difficult process.  Even pixels can have PHI 'burned in'.  For private tags I would just delete all of them (Dataset.remove_private_tags()) unless there are ones you know you absolutely must keep and you also know exactly what they can contain.

There are some libraries out there to help - including the `deid` library in the pydicom github organization, that you could look at to assist in the process.

D .Samber

Nov 12, 2022, 10:08:04 AM
to pydicom

Perhaps I overstated my goal.

My idea was that AFTER I do my de-id/anonymization, I do a final check (limited to the header) to make certain key bits of PHI didn't evade detection. For example: After I do the standard de-identification of President Obama's MRI, I want to search the entire header for the word "Obama". I thought "iterall" would expose all the data elements and I could just check those data elements of type "string" but this approach now seems questionable and even unreliable.

The deid project you suggested ( https://pydicom.github.io/deid/ ) may help.
Thank You

Brad Trumpower

Nov 14, 2022, 10:45:43 AM
to pydicom
Hi Dan,

The attached is what I used when reviewing my de-identification efforts. I never had any issue with 'burn-in' as the DICOM files I used were all in their original DICOM format with the tags displayed over top, not integrated into the image. This program pulls all the tags from all DICOM files in a directory and puts them into a text file for manual review. You could also review it by keyword, using regular expressions or whatever you prefer.

listTags.txt

D .Samber

Nov 14, 2022, 3:07:49 PM
to pydicom

Hi Brad!

Thanks for the code, which was very helpful. I see that when you simply cast the entire data element to a string, the result provides all I need to know: the tag number, the description, and (most importantly) the data element's value. I'm guessing that casting to a string at the top level invokes an overloaded string-conversion method that conveniently handles all my problems. For example, it intelligently gives the string representation of PersonName, and it does NOT give me a million bytes of non-printable "characters" for Pixel Data; instead it gives a nice summary: (7fe0, 0010) Pixel Data OW: Array of 1632960 elements.

Thanks again!

Dan
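
A text dump like this can then be searched by value only, so that a keyword such as "Patient" does not match merely because it appears in a tag description. The sketch below assumes each line looks like the summary quoted above, "(gggg, eeee) Name VR: value"; the regular expression and the `search_values` helper are hypothetical, and the sample lines are made up for illustration.

```python
import re

# Assumed layout of one dumped element: "(gggg, eeee) Name   VR: value".
ELEMENT_LINE = re.compile(
    r"^\s*\((?P<group>[0-9a-fA-F]{4}), (?P<element>[0-9a-fA-F]{4})\)\s+"
    r"(?P<name>.*?)\s+(?P<vr>[A-Z]{2}): (?P<value>.*)$"
)


def search_values(lines, needle):
    """Return the lines whose value part contains needle (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for line in lines:
        match = ELEMENT_LINE.match(line)
        # Fall back to the whole line if it does not look like an element.
        searched = match.group("value") if match else line
        if needle in searched.lower():
            hits.append(line)
    return hits


dump = [
    "(0010, 0010) Patient's Name                      PN: 'Obama^Barack'",
    "(0010, 0020) Patient ID                          LO: 'ANON123'",
    "(0008, 1030) Study Description                   LO: 'Patient follow-up'",
    "(7fe0, 0010) Pixel Data                          OW: Array of 1632960 elements",
]

for line in search_values(dump, "patient"):
    print(line)
```

Here only the Study Description line is reported for "patient", because the match is restricted to the text after the VR. Lines that do not fit the assumed layout are searched in full, which errs on the side of reporting too much rather than too little.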

Suman Gautam

Nov 14, 2022, 8:14:51 PM
to pyd...@googlegroups.com
Hi guys 

How can I plot a DVH from an rtstructure and an rtdose file? I want to plot and compare the DVHs from the ground truth and predicted dose distributions.

I have both the ground truth and the predicted dose distribution.

----------------
Regards,
Suman Gautam 
PhD Student
Department of Radiation Oncology.
Division of Medical Physics
VIRGINIA COMMONWEALTH UNIVERSITY
