FOIA request for a deidentified VISTA database

shyma...@gmail.com

Apr 3, 2024, 7:07:59 PM

to Hardhats

Hello everyone!

I am interested in obtaining some basic fields of a real deidentified VISTA database:

Patient: DOB, sex
Patient + Date: symptom, diagnosis, treatment
I will choose the fields from the FileMan definitions.

Does anyone have some actual experience in doing this?

Thanks,

Charlie

Nancy Anthracite

Apr 3, 2024, 7:52:25 PM

to Hardhats, shyma...@gmail.com

Truly deidentified databases are mostly fiction. Have you taken a look at
Mitre's Synthea project? Work has been done to import these patients into
VistA.

https://synthea.mitre.org

--
Nancy Anthracite

shyma...@gmail.com

Apr 4, 2024, 7:52:33 PM

to Hardhats

Hi Nancy,

Glad to see that you're still always there. I noticed you in the current conversations as well.
You are the ChatGPT of the MUMPS world. (A very high compliment, indeed.)

The Mitre project seems to be randomly generated (but reasonable) data. Not what I seek.

I'm not quite sure what you mean by "Truly deidentified databases are mostly fiction." Is it hard to hide identities? Do people not do it right? Actually, I once wrote a very nice automatic link between a generic deidentification app that Partners bought and their 114 medical applications. It ran all night and created 96,000 SQL table definitions for their MUMPS globals and umpteen fields covered by HIPAA. I have no idea exactly what the app did with that info.

Maybe I should explain more about what I'm thinking. I would like to have a database of medical histories to look for real life patterns. There are numerous possible applications that I could imagine.

Do people do that? Then don't they get a deidentified database?

Charlie

Kevin Toppenberg

Apr 4, 2024, 8:51:44 PM

to Hardhats

Charlie,

There have been many examples of people thinking that they de-identified data, only to have clever people figure out how to re-identify people. This is the first result that Google gives about failed deidentification: https://news.uchicago.edu/story/common-deidentification-methods-dont-fully-protect-data-privacy-study-finds

When it comes to patient info, it is all the more important because not only is it bad for patient care, there are also reporting requirements and fines to be paid.

Imagine a patient record contains a patient's name, DOB, address, SSN etc. All of that is removed. But then in the narrative, there is mention of a brother living in Colorado. And mention of buying a car, and having 2 sons and 1 daughter (names removed). And perhaps it can be figured out what town the patient lives in. Even from this limited info, it is readily possible to use this information to figure out the true name of the patient. And then the part about them having lung cancer, which they were trying to keep private, is now revealed to everyone. My examples might not be the best, but think of all the other facts that seem anonymous, but when put together actually tell a tale.

The end result is that it's just not safe to try to release real patients' records. And ESPECIALLY not by relying on some automated process.

Best wishes

Kevin T

Matthew R. Wilson

Apr 4, 2024, 9:27:56 PM

to hard...@googlegroups.com

On 04.04.2024 16:52, shyma...@gmail.com wrote:
>The Mitre project seems to be randomly generated (but reasonable) data.
> Not what I seek.
>

>Maybe I should explain more about what I'm thinking. I would like to have
>a database of medical histories to look for real life patterns. There are
>numerous possible applications that I could imagine.

HIPAA's requirements for deidentification under the safe harbor method (i.e.
the standard you can satisfy to keep you in the clear legally) would mean that
all dates would be reduced to only the year, and on top of that, for anyone
over 89 you can't even include the year, only a generic "90 years old and
over" indication. Admission dates, discharge dates, etc. (at my previous
employer we took that to mean ANY date of service, whether inpatient or
outpatient) also have to be reduced to 1-year resolution (drop the month and
day).

That alone would get in the way of any kind of fine-grained pattern analysis
unless you're looking for very broad strokes ("someone born sometime in 1950
had a diagnosis of X in 1978 and procedure Y sometime in 1983"... but if 8
procedures, diagnoses, etc were made throughout 1985, you wouldn't know when in
that year they happened, just that they happened all in that one year).
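The date reduction described above can be sketched in a few lines of Python. This is only an illustration of the date and age elements as summarized in this message; the function names are hypothetical, the real safe harbor rule covers many other identifiers, and nothing here is legal guidance.

```python
from datetime import date


def generalize_date(d: date) -> int:
    """Keep only the year of a date (drop month and day)."""
    return d.year


def age_on(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    before_birthday = (on.month, on.day) < (dob.month, dob.day)
    return on.year - dob.year - (1 if before_birthday else 0)


def generalize_age(age_years: int) -> str:
    """Report every age of 90 or more as one generic category."""
    return "90+" if age_years >= 90 else str(age_years)


# Three events spread across 1985 all collapse to the same year,
# so their order and spacing within that year is no longer recoverable.
events = [date(1985, 1, 14), date(1985, 6, 2), date(1985, 12, 30)]
print([generalize_date(d) for d in events])
```

After this step, anything that depends on the spacing of events within a year cannot be analyzed from the released data.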

So even if the VA has some process to release real de-identified data (I highly
doubt they'd agree to that, though; I agree with the assertion that the idea of
truly infallible de-identified data is a myth), they would at a minimum meet
the HIPAA safe harbor standard, and I'm not sure that gives you the granularity
of data you'd be looking for.

>Do people do that? Then don't they get a deidentified database?

I'm guessing a lot of statistical analysis is done on the *real* (or
deidentified) data by the covered entity themselves, or another organization
working under a BAA with the source of the data. We deidentified data
internally to pass to the analysis folks, but we would NEVER release that data
externally because we didn't feel comfortable with that liability; as a covered
entity or a business associate, if one of our people analyzing the data DID
"see through" the deidentification and accidentally come across a link between
PHI and PII, it wasn't necessarily a breach. Everyone getting anywhere close to
that data, even de-identified, was always up-to-date on mandatory HIPAA
training, working on computer systems/networks that were managed under our
HIPAA security policies, etc. But if that happened after we passed the data
along outside the scope of the covered entity or outside the permissible uses
of a BAA that we received the data under, it'd be a much bigger deal. So no way
we'd let out that data even if we thought we had de-identified it sufficiently.

-Matthew (been out of the industry for a few years, so take what I say with
an appropriate quantity of salt grains)

Nancy Anthracite

Apr 4, 2024, 9:47:41 PM

to Hardhats, shyma...@gmail.com

I agree with Kevin. You may be able to use the Synthea patients for a proof
of concept so that someone with access to patient data and an institutional
review board blessing would be willing to try it on their patient database.

--
Nancy Anthracite

shyma...@gmail.com

Apr 6, 2024, 11:49:05 PM

to Hardhats

Thanks a lot, Kevin. Interesting problem. I have no interest in (would not request) any narrative (free-text) data. Just the basics that I mentioned.

Take care,

Charlie

shyma...@gmail.com

Apr 7, 2024, 12:36:39 AM

to Hardhats

Hi Matthew,

Thanks for the analysis. I am not interested in the actual date something occurred - just the age of the patient at the time. If they were born in year x and the event year is y, they are y-x-1 or y-x years old. The ages would be in ranges so e.g. 10-19 would include everyone from 10-11 through 18-19. Likewise for any other number of years between events.
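The year arithmetic in this paragraph can be checked with a small Python sketch. The function names and the decade-wide buckets are assumptions made for illustration; the message does not fix a bucket width.

```python
def possible_ages(birth_year: int, event_year: int) -> tuple[int, int]:
    """With only years known, the age at the event is y-x-1 or y-x."""
    diff = event_year - birth_year
    return (max(diff - 1, 0), diff)


def decade_bucket(age: int) -> str:
    """Label an age with its ten-year range, e.g. 27 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def possible_buckets(birth_year: int, event_year: int) -> set[str]:
    """All age ranges the patient could fall in, given only the two years."""
    return {decade_bucket(a) for a in possible_ages(birth_year, event_year)}


print(possible_ages(1950, 1978))     # (27, 28)
print(possible_buckets(1950, 1978))  # a single range
print(possible_buckets(1950, 1970))  # ages 19 or 20 straddle two ranges
```

The last call shows the one case that needs a decision: when the two candidate ages fall on either side of a range boundary, the record cannot be assigned to a single range from years alone.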

The nice thing is that I don't have to check any particular disease or treatment. I simply look for the efficacy of everything that occurs. This also gives me a read on what is being prescribed. The question of what prompted what decision or created which result would be a very interesting problem of logic and statistics. As always, start with the simple (unambiguous) cases then try to extend (generalize) it to the more complex scenarios. And then the best part: comparing it to more conventional clinical trials.

Always interested in your NaCl.

Take care,

Charlie

Nancy Anthracite

Apr 7, 2024, 7:48:52 AM

to Hardhats, shyma...@gmail.com

HIPAA's deidentification is totally inadequate and has been known to be inadequate since HIPAA was passed. If you want to make a FOIA request to the VA, which I hope they will deny, the FOIA site is here:

https://department.va.gov/foia/

--

Nancy Anthracite

shyma...@gmail.com

Apr 7, 2024, 9:14:26 AM

to Hardhats

Where is the breach in my request?

Patient: DOB, sex
Patient + Date: symptom, diagnosis, treatment

I say above that Year of Birth would also suffice. Another alternative would be to leave out DOB and replace any other date with the date minus the DOB. That would be even better than YOB to some extent.
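The "date minus DOB" alternative could look roughly like the following Python sketch. The function name and the record layout (a list of date and label pairs) are hypothetical, chosen only to make the idea concrete.

```python
from datetime import date


def to_age_offsets(dob: date, events: list[tuple[date, str]]) -> list[tuple[int, str]]:
    """Replace each event date with the patient's age in days at that event.

    The DOB itself is not part of the output; only the offsets are kept.
    """
    return [((when - dob).days, what) for when, what in sorted(events)]


history = [
    (date(2001, 1, 1), "treatment"),
    (date(2000, 1, 11), "diagnosis"),
]
print(to_age_offsets(date(2000, 1, 1), history))
```

The output carries no calendar dates at all, only the patient's age in days at each event, in chronological order.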

Thanks for the link.

Charlie

Nancy Anthracite

Apr 7, 2024, 9:58:16 AM

to Hardhats, shyma...@gmail.com

People get reidentified from voting records with YOB. Believe me, you are a
babe in the woods. Databases can't get deidentified sufficiently in today's
world with so much data available. Unfortunately, HIPAA allows much too much
data out, and people don't care to block it sufficiently, including the federal
government. Take a look at http://protecthealtcareprivacy.net to get an
idea how much is shared about you and everyone else, not even counting the
hackers.

--
Nancy Anthracite

shyma...@gmail.com

Apr 7, 2024, 10:54:02 AM

to Hardhats

There are 19 million veterans in the United States and 152 VA hospitals, for 125,000 veterans per VA hospital. If we assume no more than 100 years account for almost all of them, that's an average of 1,250 veterans born in each year within each VA hospital database. Not to mention the nonveterans (16 for each veteran) who vote.
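The arithmetic can be reproduced directly. The inputs below are the figures stated in this message, not independently verified, and the result is an average over all hospitals and birth years, so individual hospital and birth-year groups may be larger or smaller.

```python
# Figures as stated in the message above.
veterans = 19_000_000
va_hospitals = 152
birth_years = 100

per_hospital = veterans / va_hospitals
per_hospital_per_birth_year = per_hospital / birth_years

print(per_hospital)                 # 125000.0
print(per_hospital_per_birth_year)  # 1250.0
```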

How do you tell which of them is the patient?

I also said that Date minus DOB would suffice without including the DOB. What would be the problem then? Then I have only their sex.

Charlie

Nancy Anthracite

Apr 7, 2024, 4:24:21 PM

to Hardhats, shyma...@gmail.com

I have no control over whether or not you manage to get your hands on a
database, but if I did, it would not happen. Things like this get done with
employees of organizations, within the organization, with tight controls on
what can be accessed when it is done right.

Perhaps you would like to look at OHDSI.org.

--
Nancy Anthracite

Nancy Anthracite

Apr 7, 2024, 4:47:51 PM

to Hardhats, shyma...@gmail.com, Nancy Anthracite

PS, talk to LexisNexis. They will be happy to sell it to you.

--
Nancy Anthracite

shyma...@gmail.com

Apr 7, 2024, 5:40:35 PM

to Hardhats

Nobody has shown me how you can take a sequence of symptoms, diagnoses and treatments with the time from each to the next, and map it to a unique male or to a unique female who had that. It seems you would have to have a database of that information plus the identity of the patient to match it to - in short, already have that information. This is because I am getting nothing about the person per se except their sex. It is possible for any number of different people to have the same sequence. It's more like, "How many people had this sequence?"
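The question "How many people had this sequence?" can be asked of a dataset directly. The Python sketch below uses made-up toy records and hypothetical function names. It only counts matches inside the dataset and says nothing about what outside information someone else might hold.

```python
from collections import Counter

# Each record is (sex, events, gaps in days between consecutive events).
Record = tuple[str, tuple[str, ...], tuple[int, ...]]


def group_sizes(records: list[Record]) -> Counter:
    """Count how many records share each exact (sex, events, gaps) combination."""
    return Counter(records)


def singletons(records: list[Record]) -> list[Record]:
    """Records whose combination appears exactly once in the dataset."""
    counts = Counter(records)
    return [r for r, n in counts.items() if n == 1]


toy = [
    ("M", ("symptom A", "diagnosis B"), (10,)),
    ("M", ("symptom A", "diagnosis B"), (10,)),
    ("F", ("symptom A", "treatment C"), (3,)),
]
print(group_sizes(toy))
print(singletons(toy))
```

A sequence shared by many records answers the question with a large count, while the singletons are the records that match nobody else in the dataset.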

But I am still open to suggestions, of course.

So all of this is great, helping me tighten up my specification. I have several people to thank for this, especially the ChatGPT of the MUMPS world.

Many thanks.

People new to MUMPS call it MUMPS.
People who have used it for a while call it Cache.
People who have used it for decades call it MUMPS.

All the best,
Charlie Volkstorf

Kevin Toppenberg

Apr 10, 2024, 4:46:30 PM

to Hardhats

Slight typo. URL should be: https://protecthealthcareprivacy.net/

shyma...@gmail.com

Apr 12, 2024, 4:46:04 PM

to Hardhats

Well thanks, Kevin. Shame on them.

I'm not sure what the HIPAA standard has to do with me - I only include the sex.
Are you saying that the only standard that anyone recognizes is HIPAA but that's not good enough so nobody lets anything out?

Do you think my sex-only dataset could be reidentified?

Charlie Volkstorf

Kevin Toppenberg

Apr 25, 2024, 10:17:47 AM

to Hardhats

Charlie,

Sorry for the late reply.

"The devil's in the details," is how I think about this. If I tell you that I think you will be fine with including only gender (sex), can I really be sure of what you are revealing? I would have to look at specifics.

Another thing that came to mind is that you might be thinking that if you have a database of 10,000 persons, how could all those persons be re-identified? But I think what Nancy and I are saying is that perhaps only 3 persons from that database could be re-identified. That would seem great, a very high statistical success. But it would still be a failure in our estimation. It needs to have a 0% re-identification rate.

Best wishes,

Kevin

shyma...@gmail.com

May 3, 2024, 10:53:26 AM

to Hardhats

Kevin,

Nice analysis. I am not thinking in terms of probability. (If I did, I would present that probability, but would be remiss to think in those terms.) I am thinking of logical possibilities.

The unit of information is a sex and a list of events (symptom, diagnosis or treatment), with the number of days between each event and the next (so every event except the last has a day count after it).

Example (3 events): SEX + (E1,N1,E2,N2,E3) where Ei=event, Ni=number of days.
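One possible encoding of this unit in Python is sketched below. The class and function names are hypothetical, and the input is assumed to be a list of (day offset, event) pairs measured from any fixed starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class History:
    """SEX + (E1, N1, E2, N2, ..., Ek): events with day gaps between them."""
    sex: str
    events: tuple[str, ...]  # E1 .. Ek
    gaps: tuple[int, ...]    # N1 .. N(k-1), days from each event to the next

    def __post_init__(self) -> None:
        # Every event except the last has exactly one gap after it.
        if len(self.gaps) != max(len(self.events) - 1, 0):
            raise ValueError("need one gap per event except the last")


def from_offsets(sex: str, offsets: list[tuple[int, str]]) -> History:
    """Build a History from (day offset, event) pairs, keeping only the gaps."""
    ordered = sorted(offsets)
    days = [d for d, _ in ordered]
    events = tuple(what for _, what in ordered)
    gaps = tuple(later - earlier for earlier, later in zip(days, days[1:]))
    return History(sex, events, gaps)


h = from_offsets("F", [(100, "E1"), (130, "E2"), (400, "E3")])
print(h.events, h.gaps)
```

Because the class is frozen, two histories with the same sex, events, and gaps compare equal and hash alike, which makes it straightforward to count how many patients share a sequence.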

How could we derive (infer) other information from this?
I see one principle.

Charlie

Nancy Anthracite

May 3, 2024, 11:16:10 AM

to Hardhats, shyma...@gmail.com

You should give this up and use the Synthea patients to make your point for
the capability of your software or whatever it is you have in mind.

--
Nancy Anthracite

David Whitten

May 3, 2024, 6:17:50 PM

to hard...@googlegroups.com, shyma...@gmail.com

The Synthea patients are nice because they are driven by disease models and corresponding probabilities from the Centers for Disease Control and Prevention, and generate visits, meds, lab results and such. Download link: https://synthea.mitre.org/downloads

A paper about this is at https://www.osti.gov/servlets/purl/1507868

There is a model for the state of Massachusetts called SynthMass.

The effort is developed through MITRE.
The code is on GitHub at https://github.com/synthetichealth/synthea/wiki/Basic-Setup-and-Running

I hope this helps

Dave Whitten

713-870-3834

PS Hi Charley !

shyma...@gmail.com

May 15, 2024, 4:51:50 PM

to Hardhats

If the Synthea data is random (vs known correlations), then presumably that would only indicate that I know how to program in MUMPS.

If, OTOH, it is guided by known correlations, then I should be able to derive some of those correlations and it is a very nice idea (proof of concept) and orders of magnitude less costly and more promising, and I owe you a heartfelt "Thanks Nancy!!! Good one." (as opposed to "Nice try.") That also brings up an interesting tangential question of the source(s) of the correlations used.

As is, I'll be waiting with bated breath. (I warned that I am less into probability/statistics and more into logical possibilities.)

Charlie

shyma...@gmail.com

May 15, 2024, 5:48:20 PM

to Hardhats

"He's alive!" (w/ apologies to Boris Karloff)

Dave,

I was about to start graphing the probability of your having met your ultimate demise (you have abandoned MUMPS) as a function of time. I think it had reached the 50-50 point. Ewww

My tunnel vision has done me in yet again. You have answered the long-winded reply to Nancy that I just finished. But first . . .

Thanks Nancy!!! Good one.

And thanks Dave!!! X-O-lent.

I should apologize for being too lazy (or is it too dumb? and which is worse?) to put more energy into research and less into asking for a hand-out. As the republicans say, "The sign at the zoo says do not feed the monkeys lest they become dependent on hand-outs."

I see your wealth of knowledge and generosity continue unabated. :)

I am running out of excuses for not implementing a prototype.

Stay tuned (if anybody is interested).

Charlie

P.S. Hmmm . . . You seem to be more interested in MUMPS than G '31 and T '37. But I understand your misguided belief that you are saving lives (as opposed to making billing companies rich.) I once thought that. Congratulations, you have kept your youthful naivete intact.
