Questions about Anonymization


David Gutman

Jun 14, 2012, 8:11:19 AM
to xnat_di...@googlegroups.com
I'm starting an aging & memory clinic this summer, and many of my patients come from outside hospitals/clinics and already have a CD/DVD with neuroimaging data on it. We currently store this data in a "shadow folder", i.e. a manila folder that gets stored somewhere, so the neuroimaging data is essentially inaccessible.

Not surprisingly, I thought XNAT would be useful in this case, but I wanted to see if people had a similar use case, and also set of requirements.  I was thinking something along these lines:

I have three main requirements:
   A) I do not keep ANY names or other PHI in my XNAT instance
   B) Each patient gets a random ID number (not their MR number) when I do the XNAT insert. I am OK with having a name-->key database somewhere else
   C) Data is uploaded to XNAT, but I also archive the original non-deidentified data on a "secure" encrypted server. I've had problems in the past where de-identification scrubbers are too aggressive and wind up stripping out certain private tags (e.g. for DTI and perfusion data), which makes certain analyses really difficult


1)  In the clinic, I set up a shared/protected folder (we have HIPAA/audit-trail-friendly Windows shares available). As I review the CD for patient care, I simultaneously copy it to a folder on this share. I presumably also need to come up with a naming structure for the folder itself so I don't lose the metadata, PATIENT_NAME_MR_NUMBER or something like that.

2)  A "secure" server somewhere monitors the contents of that folder and daily/hourly/whatever runs a job that moves the data to some sort of archive (I don't want to keep everything in the shared folder; it is just a staging ground) and does the XNAT upload/de-identification. I guess it would also have to do some sort of lookup to assign patient JOHN_DOE_MR18888888 a unique key. If the patient and/or MR number already has a key/unique patient ID, it would just return that instead, so I could potentially look at longitudinal data.
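
The lookup in step 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the table name `subject_key`, the `SUBJ_` prefix, and the function name `get_or_create_key` are all hypothetical, and a real key database would live on the secure server rather than in memory.

```python
import secrets
import sqlite3


def get_or_create_key(conn, patient_name, mr_number):
    """Return the existing study ID for this MR number, or mint a new random one."""
    # Hypothetical schema: one row per patient, keyed on the MR number.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subject_key ("
        "mr_number TEXT PRIMARY KEY, patient_name TEXT, study_id TEXT UNIQUE)"
    )
    row = conn.execute(
        "SELECT study_id FROM subject_key WHERE mr_number = ?", (mr_number,)
    ).fetchone()
    if row:
        # Known patient: reuse the key so longitudinal sessions line up.
        return row[0]
    # New patient: the ID is random, so it carries no trace of the name or MR number.
    study_id = "SUBJ_" + secrets.token_hex(4).upper()
    conn.execute(
        "INSERT INTO subject_key VALUES (?, ?, ?)",
        (mr_number, patient_name, study_id),
    )
    conn.commit()
    return study_id


# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
first = get_or_create_key(conn, "JOHN_DOE", "MR18888888")
again = get_or_create_key(conn, "JOHN_DOE", "MR18888888")
```

Here `first` and `again` are the same ID, which is what makes the repeat-visit case work.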


Any feedback would be appreciated....




--
David A Gutman, M.D. Ph.D.
Assistant Professor of Biomedical Informatics
Senior Research Scientist, Center for Comprehensive Informatics
Emory University School of Medicine

Daniel Marcus

Jun 14, 2012, 10:37:04 AM
to xnat_discussion
David,

I'm thinking you should skip all of the staging area, etc. Instead,
it seems easiest to just send directly to XNAT over DICOM. If the
software you're using to review the cases clinically has a DICOM send
function, you could use that. Or you could use DICOM Browser to do
the send or something similar. You'll need to set up a solid
anonymization profile on your XNAT. In your anon profile you can
include a custom function to call out to a service that replaces the
patient name with a coded ID. In order to get the same ID for a
patient each time, you could use a one-way hash algorithm to generate
the coded IDs. That should pretty much do it.
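
A minimal sketch of the hashing idea follows. It swaps in a keyed hash (HMAC-SHA256) rather than a bare one-way hash, because names and MR numbers are easy to guess and an unkeyed hash of them can be reversed by brute force. The function name `coded_id`, the normalization rules, and the 12-character length are assumptions for illustration, not anything XNAT provides.

```python
import hashlib
import hmac


def coded_id(patient_name, mr_number, secret_key, length=12):
    """Derive a repeatable coded ID from patient identifiers and a secret key."""
    # Normalize so trivial differences in case or spacing map to the same ID.
    message = f"{patient_name.strip().upper()}|{mr_number.strip().upper()}"
    digest = hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating shortens the ID at the cost of a higher collision chance.
    return digest[:length].upper()


# Placeholder only: a real key should be long, random, and stored outside XNAT.
SECRET_KEY = b"replace-with-a-long-random-secret"
print(coded_id("John Doe", "MR18888888", SECRET_KEY))
```

The same patient yields the same ID on every upload without a lookup table, but anyone holding the key can regenerate the mapping, so the key needs the same protection as a name-->key database.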

I recommend using 1.6 for this. It includes a couple of features that
you'll want:
- Default anonymization -- you can install a default anonymization
profile that is applied to ALL incoming cases, regardless of what
project they land in.
- DICOM header review -- there's a function that allows you to view
DICOM metadata for studies that arrived over a specified date range.
It's a good way to quickly scroll through to see if any PHI leaked in.

-Dan

David Gutman

Jun 14, 2012, 11:00:57 AM
to xnat_di...@googlegroups.com

Thanks.. Will discuss at the workshop.

The main issue in this case is that most if not all cases that come in are on these random burned DVDs, which have some weird built-in viewer... For data already on our PACS I would definitely push directly.

--
You received this message because you are subscribed to the Google Groups "xnat_discussion" group.
To post to this group, send email to xnat_di...@googlegroups.com.
To unsubscribe from this group, send email to xnat_discussi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/xnat_discussion?hl=en.

Daniel Marcus

Jun 14, 2012, 12:07:08 PM
to xnat_di...@googlegroups.com
Got it.  Then you'll need to use something like DicomBrowser.
-D

bennett landman

Jun 14, 2012, 9:30:43 PM
to xnat_di...@googlegroups.com
For outside studies, we load the PHI into REDCap when we receive the CDs via FedEx. Then, we push the DVD contents to a dcm4che listener node using dcm4che on the client side. The listener looks up the not-safe-for-XNAT ID, replaces all keys with safe matching codes from REDCap, and sets the comment field appropriately. Then the listener auto-pushes the resulting file to the standard XNAT input port. This setup allows us to put almost any logic into the pipeline easily without dealing with specialized upload programs. This almost works perfectly... pending some "features" in that occasionally sessions don't merge quite right. Scott will be at the workshop and can discuss as well.
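
The re-keying step inside the listener can be sketched as below. This is not the dcm4che or REDCap API: headers are modelled as a plain dict, `code_lookup` stands in for whatever the REDCap query returns, and the function name `recode_headers` is hypothetical. The `Project:... Subject:...` layout of the comment field is an assumption about what the XNAT receiver expects and should be checked against the XNAT documentation.

```python
def recode_headers(headers, code_lookup, project):
    """Return a copy of the headers with patient identifiers replaced by a safe code."""
    original_id = headers["PatientID"]
    if original_id not in code_lookup:
        # Fail closed, and keep the identifier itself out of the error message.
        raise LookupError("no safe code registered for this patient ID")
    safe_code = code_lookup[original_id]
    recoded = dict(headers)  # leave the caller's dict untouched
    recoded["PatientName"] = safe_code
    recoded["PatientID"] = safe_code
    # Assumed routing hint for XNAT, carried in Patient Comments (0010,4000).
    recoded["PatientComments"] = f"Project:{project} Subject:{safe_code}"
    return recoded


incoming = {"PatientName": "DOE^JOHN", "PatientID": "MR18888888", "Modality": "MR"}
lookup = {"MR18888888": "SUBJ_0001"}  # stand-in for the REDCap lookup
outgoing = recode_headers(incoming, lookup, "MEMCLINIC")
```

Raising on an unknown ID means a study that was never registered stops at the listener instead of reaching XNAT with its original identifiers.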

David Gutman

Jun 15, 2012, 12:03:38 AM
to xnat_di...@googlegroups.com
Ok--- I will be chatting with one or both of you then about that... we are in the process of also setting up a REDCap instance, so that would probably solve two problems at once.  

Thanks


Simon Doran

Jun 16, 2012, 5:06:08 AM
to xnat_di...@googlegroups.com
David,

  Your experience is not unlike what we have encountered when uploading Phase 1 clinical trials to our XNAT system, particularly those done in the "bad old days".

  Although everyone appreciated the need for anonymisation, they all went about it in different ways with, essentially, no standardisation of which DICOM tags were kept or removed. Quite often, the anonymisation was done by hand with manufacturer tools at the scanner, so before moving data to the XNAT repository, I am very careful to check for odd bits of PHI in unexpected places. Furthermore, we have quite a heterogeneous portfolio of MR scanners, and the different manufacturers include different fields (these differ even between models from the same manufacturer). Add to this the differences we see when we attempt to archive the data generated in multi-centre trials, where every participating team has a different local policy for clinical data.

  We spent quite a while thinking about how we manage this situation. Prospectively, things will be easier, because we are conscious of these considerations in the design of new multi-centre trials now. However, for retrospective data, we are proceeding as follows:

1. Use DicomBrowser.

2. Where whole trials have the same behaviour, set up an anonymisation script - very quick to do once you have a few examples - based on a test example of one of the files. If there are any existing data in the file (e.g., previous clinical trial patient codes) that can be used to assign an anonymised name, then put this into the script. Otherwise, assign an anonymised name via the DicomBrowser GUI.

3. Create a separate spreadsheet to record the mapping between the true name and anonymised name, print it out and physically lock it away somewhere.  

4. Regardless of whether you think you know what is happening when you run the script, always look at the DICOM metadata for each patient session before and after running the DicomBrowser anonymisation script and scan for unexpected PHI.

5. Use the "Send" feature of DicomBrowser to upload the data into XNAT, having previously used the script to set up the Patient Comments tag (0010,4000) appropriately.

Although this method won't be applicable to situations with thousands of patients, I find that it only takes about two or three minutes to process a single patient session and I managed to upload around 8 entire Phase 1 clinical trials over a weekend (about 120 imaging sessions in total), so it wasn't too onerous.
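
The before-and-after check in step 4 can be partly automated. The sketch below is an illustration only: it assumes the metadata has already been exported to a plain dict of tag name to value (by whatever DICOM tool is at hand), and the function name `find_phi` is hypothetical. It searches every field, including free text, for strings known to identify the patient.

```python
def find_phi(headers, known_phi):
    """Return (tag, matched string) pairs where a known identifier appears in a value."""
    needles = [s.lower() for s in known_phi if s]
    hits = []
    for tag, value in headers.items():
        text = str(value).lower()
        for needle in needles:
            if needle in text:
                hits.append((tag, needle))
    return hits


before = {
    "PatientName": "DOE^JOHN",
    "StudyDescription": "Brain MRI for John Doe",
    "Modality": "MR",
}
# Suppose the script replaced the name but left the free-text description alone.
after = dict(before, PatientName="SUBJ_0001")

leaks_before = find_phi(before, ["Doe", "18888888"])
leaks_after = find_phi(after, ["Doe", "18888888"])
```

In this example `leaks_after` still flags `StudyDescription`, the kind of unexpected place where PHI tends to survive. A check like this only finds identifiers it is told about, so it supplements the manual review rather than replacing it.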

Simon


David Gutman

Jun 18, 2012, 7:47:04 AM
to xnat_di...@googlegroups.com
Ok--  that seems reasonable.

The only issue I've had with DicomBrowser is that loading a session (and populating the tags) can be quite slow, depending on the number of files in the DICOM session. Are there any Java tricks you've used to speed this up?

I'll probably pick your brain a bit more at the workshop....





Archie, Kevin

Jun 18, 2012, 12:50:03 PM
to xnat_di...@googlegroups.com, dicombrow...@googlegroups.com

Hi, David,

 

I haven’t made any recent changes to the DicomBrowser internals, though I am hoping to make some improvements in the next few months (mostly to support sequences) and could address performance issues along the way. Any details you can provide would be helpful: what version of DicomBrowser you’re using, your OS and version, and where the files you’re loading are: on a local drive, removable media, NFS mount, etc.

 

Thanks!

- Kevin





David Gutman

Jun 18, 2012, 1:39:27 PM
to dicombrow...@googlegroups.com, xnat_di...@googlegroups.com
Generally running 64-bit Windows 7 or 64-bit Ubuntu 10.04. I think the issue may be related to me trying to load too many sessions at once, and it's usually from Samba/Windows shares.

I'll keep track of the specifics going forward...