Hi Eva
Here are some general thoughts on DICOM de-identification, which are not XNAT specific ...
I suggest that it is good practice to follow the standard:
- make sure you address all the specified data elements [1][2]
- make sure your de-identified images remain compliant [3] esp. wrt. replacement values
The standard may not be perfect, but at least it is maintained and represents the consensus
of a lot of people working in this space. Doing less may be exposing yourself to unnecessary
risk (on the subject of which, document your decisions in a risk analysis). There should be
enough "options" to cover most use cases (if not, let me know).
When you need to replace rather than remove values, take care that the dummy values are
compliant with the VR (e.g., the string "anonymized" is not a valid date, nor a valid code
string, UID, etc.). When a data element can be removed or its value made zero length, do so,
rather than inserting a dummy value.
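
To make the point about VR compliance concrete, here is a minimal sketch (plain Python, no DICOM toolkit) that checks candidate dummy values against simplified rules for three VRs: DA, CS and UI. The function names are made up for illustration, the rules are abbreviated from PS3.5, and treating an empty string as acceptable assumes the data element is allowed to be zero length. It is not a substitute for running a real validator such as [3].

```python
import re
from datetime import datetime

# Simplified per-VR rules (abbreviated from DICOM PS3.5; not exhaustive).
_CS = re.compile(r"[A-Z0-9_ ]{1,16}")        # uppercase, digits, underscore, space
_UI_COMPONENT = re.compile(r"0|[1-9][0-9]*")  # numeric component, no leading zeros


def _valid_da(value: str) -> bool:
    """DA: exactly YYYYMMDD and a real calendar date."""
    if re.fullmatch(r"[0-9]{8}", value) is None:
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def _valid_cs(value: str) -> bool:
    """CS: at most 16 characters from a restricted uppercase repertoire."""
    return _CS.fullmatch(value) is not None


def _valid_ui(value: str) -> bool:
    """UI: at most 64 characters, dot-separated numeric components."""
    if len(value) > 64:
        return False
    return all(_UI_COMPONENT.fullmatch(part) for part in value.split("."))


_CHECKS = {"DA": _valid_da, "CS": _valid_cs, "UI": _valid_ui}


def is_compliant(vr: str, value: str) -> bool:
    """Return True if value looks acceptable for the given VR.

    Assumption: a zero-length value is acceptable (true only where the
    data element's type permits an empty value).
    """
    if vr not in _CHECKS:
        raise ValueError(f"no rule implemented for VR {vr!r}")
    if value == "":
        return True
    return _CHECKS[vr](value)


# The dummy string from the text fails all three checks.
for vr in ("DA", "CS", "UI"):
    print(vr, is_compliant(vr, "anonymized"))
```

A check like this could be run over every replacement value before writing the file, so that a non-compliant dummy is caught in the script rather than by a downstream application.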
Do not attempt to de-identify the File Meta Information (group 0x0002); recreate it entirely
[4], which makes the question of OB (0002,0102) moot. In general, you cannot replace OB data
elements; they need to be removed if they may contain PHI and aren't checked (e.g., Overlay
Data), or retained if they are safe (e.g., VOI LUT Data).
Removing all private data elements rather than keeping those known to be safe (e.g., [5])
may lead to unhappiness (esp. failure of quantitative downstream apps that depend on them).
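
As a rough illustration of "keep those known to be safe", here is a sketch in plain Python that filters private data elements against an allowlist keyed by group, private creator and the low byte of the element number. The data set is modelled as a simple dict of (group, element) to value, and the creator name and tag in SAFE_PRIVATE are invented placeholders, not entries from [5]; a real list would have to be curated from a source like that one.

```python
from typing import Dict, Set, Tuple

Tag = Tuple[int, int]

# Hypothetical allowlist: (group, private creator, low byte of element).
SAFE_PRIVATE: Set[Tuple[int, str, int]] = {
    (0x0019, "EXAMPLE_VENDOR", 0x0C),
}


def filter_private(elements: Dict[Tag, object]) -> Dict[Tag, object]:
    """Keep standard elements and allowlisted private ones; drop the rest."""
    kept: Dict[Tag, object] = {}
    for (group, elem), value in elements.items():
        if group % 2 == 0:
            # Even group: standard data element, not handled here.
            kept[(group, elem)] = value
            continue
        if 0x0010 <= elem <= 0x00FF:
            # Private creator: kept only if one of its elements survives.
            continue
        block = elem >> 8  # creator lives at (group, 00xx) for block xx
        creator = elements.get((group, block))
        if (group, creator, elem & 0xFF) in SAFE_PRIVATE:
            kept[(group, elem)] = value
            kept[(group, block)] = creator
    return dict(sorted(kept.items()))


example = {
    (0x0010, 0x0010): "DOE^JANE",          # standard element, passed through
    (0x0019, 0x0010): "EXAMPLE_VENDOR",    # private creator for block 0x10
    (0x0019, 0x100C): "1000",              # allowlisted private element
    (0x0019, 0x100D): "free text",         # not allowlisted, dropped
    (0x0029, 0x0010): "OTHER_VENDOR",      # creator with nothing kept
    (0x0029, 0x1001): "unknown",           # dropped
}
filtered = filter_private(example)
```

Matching on the private creator string rather than on the raw tag matters because the same block number can be claimed by different creators in different files.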
Using a "remove values" approach such as blankValues on free text strings is probably not
very robust, compared with performing some kind of more sophisticated analysis of the text.
If you can't do that, then it may be safer to use a "keep values" approach instead (esp. for
Study Description, Series Description, Protocol Name, etc., which may be more important than
other "descriptors" listed in [1] or determined from the VR).
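
One way to read the "keep values" idea is as an allowlist: a description is retained only if it matches a value already reviewed as safe, and is emptied otherwise. The sketch below assumes that reading; the function name and the allowlist contents are hypothetical and would need to be curated per site.

```python
from typing import Set

# Hypothetical, site-curated list of descriptions reviewed as free of PHI.
SAFE_DESCRIPTIONS: Set[str] = {
    "T1 AXIAL",
    "T2 FLAIR",
    "CT CHEST WITHOUT CONTRAST",
}


def keep_known_value(value: str, allowed: Set[str]) -> str:
    """Return value unchanged if its normalized form is allowlisted, else ''.

    Normalization (uppercase, collapsed whitespace) only affects matching;
    the retained value is the original string.
    """
    normalized = " ".join(value.upper().split())
    return value if normalized in allowed else ""


kept = keep_known_value("T1  axial", SAFE_DESCRIPTIONS)
blanked = keep_known_value("T1 axial, repeat for J. Smith", SAFE_DESCRIPTIONS)
```

Unlike blanking a list of known names, this fails safe: a typo or an unexpected encoding of a name simply does not match the allowlist and is dropped.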
See also the discussion of using XNAT in the recent MIDI-B challenge [6][7] and specifically
the comments wrt. handling unstructured text.
David
[1] https://dicom.nema.org/medical/dicom/current/output/chtml/part15/chapter_E.html#sect_E.1.1
[2] https://link.springer.com/article/10.1007/s10278-024-01182-y
[3] https://www.dclunie.com/dicom3tools/dciodvfy.html
[4] https://dicom.nema.org/medical/dicom/current/output/chtml/part15/chapter_E.html#para_41258edc-da3a-43bb-ae04-3734051a876b
[5] https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview#:~:text=Private%20Tag%20Dictionary
[6] https://wiki.nci.nih.gov/display/MIDI/2024+MIDI-B+Challenge+Workshop
[7] https://docs.google.com/presentation/d/1Y4jiDIJDVl-vMwUuVGzvVPzcktP8jwTb/edit#slide=id.p1

On 11/6/24 7:04 AM, Eva Herbst wrote:
> Hi everyone,
>
> We are currently writing an anonymisation script and wrote some code to replace all tags with anonymised info ("anon" or hashUIDs or numbers) while maintaining the correct data structure.
>
> The next step is to determine which tags we actually want to keep and which we want to overwrite.
> However, we were working with this list <https://www.dicomlibrary.com/dicom/dicom-tags/>, which has over 2000 tags.
> So manually checking which of these possibly has patient info will be quite tedious.
>
> *Does anyone already have a list of PHI tags vs imaging data tags for MRI and CT?*
> *I guess the list contains all possible DICOM tags, are there any that are _never_ used in MRI and CT? And has anyone already made a list of which imaging tags need to be kept for MRI and CT (not overwritten because they contain image info)?*
>
> We are also thinking of using blankValues on patient name, address and birth date to cover there being multiple tags containing these, but there are still several fields that contain other patient info (i.e. pregnancy status, date of visit etc.). blankValues will likely not be sufficient, also due to the possibility of typos and alternate data encodings.
>
> We are already using removeAllPrivateTags.
>
> *Another question:* for OB tags, which can be various data types, can we overwrite these with anything? E.g. a string? Some OB info is important for the image AFAIK, but others might include patient info (e.g. (0002,0102) = OB Private Information)
>
>
> Thank you very much!
> Eva
>
>