Thoughts on using Codex in a HIPAA / PHI environment.

Kevin Toppenberg

6:38 PM
to Hardhats
Hey all,

Some of you may know that we have a custom HL7 interface implemented on our system.  We have interfaced with 3 or 4 entities now and all has gone fairly well for many years. 

Recently our lab provider, PathGroup, decided we were too small a fish to provide phlebotomy services for, and is pulling out.  So we are switching over to LabCorp.  That should be fine.  But whereas PathGroup just gave us their HL7 messages and let us do as we wished, LabCorp has a many-step dance they want us to go through.  It is going to be a LONG process (probably 6 months or more, based on their current speed).

So as an interim solution, we are going to get the results from their web portal, convert those results to an HL7 message, and then ingest that.

This could have been a huge job, but I have been very happy with how fast we have been able to get this almost done.  Friday we developed a script for logging in and getting an unformatted block of data.  Today, I was able to refine this to formatted .json output, and then take that .json and create an output .hl7 message.
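
For anyone curious what the .json to .hl7 step can look like, here is a minimal sketch, not the actual script. The JSON keys, the MSH application and facility names, and the 2.3 version string are all assumptions made up for illustration, and the sample data is synthetic. It builds a bare-bones ORU^R01 result message with one OBX segment per result.

```python
import json
from datetime import datetime

def escape_hl7(value):
    """Escape HL7 v2 delimiter characters inside a single field value."""
    text = str(value)
    text = text.replace("\\", "\\E\\")  # escape character first
    text = text.replace("|", "\\F\\")   # field separator
    text = text.replace("^", "\\S\\")   # component separator
    text = text.replace("&", "\\T\\")   # subcomponent separator
    text = text.replace("~", "\\R\\")   # repetition separator
    return text

def json_to_oru(report_json, now=None):
    """Build a minimal ORU^R01 message from a JSON report (hypothetical schema)."""
    report = json.loads(report_json)
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    pt = report["patient"]
    name = escape_hl7(pt["last"]) + "^" + escape_hl7(pt["first"])
    segments = [
        # MSH-3..6 (PORTAL, LAB, EHR, CLINIC) are placeholder names.
        "|".join(["MSH", "^~\\&", "PORTAL", "LAB", "EHR", "CLINIC", stamp, "",
                  "ORU^R01", escape_hl7(report["message_id"]), "P", "2.3"]),
        "|".join(["PID", "1", "", escape_hl7(pt["id"]), "", name, "",
                  escape_hl7(pt["dob"]), escape_hl7(pt["sex"])]),
        "|".join(["OBR", "1", "", escape_hl7(report["order_id"]),
                  escape_hl7(report["panel"])]),
    ]
    for i, res in enumerate(report["results"], start=1):
        # NM for numeric values, ST (string) for anything else.
        value_type = "NM" if isinstance(res["value"], (int, float)) else "ST"
        obs_id = escape_hl7(res["code"]) + "^" + escape_hl7(res["name"])
        segments.append("|".join([
            "OBX", str(i), value_type, obs_id, "", escape_hl7(res["value"]),
            escape_hl7(res["units"]), escape_hl7(res["ref_range"]),
            "", "", "", "F"]))
    # HL7 v2 segments are separated by carriage returns.
    return "\r".join(segments)

# Synthetic example data only; no real patient is represented here.
SAMPLE = json.dumps({
    "message_id": "MSG0001",
    "order_id": "ORD42",
    "panel": "BMP",
    "patient": {"id": "TEST123", "last": "DOE", "first": "JANE",
                "dob": "19700101", "sex": "F"},
    "results": [{"code": "K", "name": "Potassium", "value": 3.5,
                 "units": "mmol/L", "ref_range": "3.5-5.2"}],
})

if __name__ == "__main__":
    print(json_to_oru(SAMPLE).replace("\r", "\n"))
```

A real interface would need whatever fields the receiving system insists on, so treat the segment layout above as a starting point only.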

But what I wanted to talk about is the solutions we used to maintain privacy and not expose PHI (protected health information) data.  Here are my takeaways. 

1) The AI is very familiar with the need to keep PHI private.  I explicitly put everything private inside a "private" subfolder, and tell Codex to never go in there.  It follows these instructions and reiterates repeatedly while working that it did NOT output any PHI to log files etc., and that it did not inspect file names that might include patient names, DOB, etc.

2) When I first started, I kind of thought that Codex ITSELF was interacting with the web page.  While I think it can do that if directly instructed to do so, it generally just makes scripts and runs them.  So there are several actors at play: (a) me the developer, (b) Codex the AI, (c) the scripts running on the computer, and (d) the browser responding to the script.

3) So how to have the AI know what to expect when it is blinded to web content with PHI?  There are a couple of techniques.  I would tell it, "OK, after you click this button, the following page will contain PHI.  You should NOT inspect that.  Instead, pause the process after selecting the button, and I will get the HTML for you."   It would then make the script pause at that point until ENTER had been struck to continue.  I could then either download the entire HTML and manually redact the PHI, or I could use the developer tools to find a particular table and show it that HTML fragment.  Often these fragments will have classes or identifiers that allow the script to locate them in the future.  Also, I would have it generate an output, and I would manually redact that and paste that back into the conversation.
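
The pause-and-redact idea above can be sketched roughly as follows. This is not the script Codex wrote; the function names and the list of kept attributes are made up for illustration, and the browser-driving part is left out because it depends on whichever automation library is in use. The sketch pauses until ENTER is pressed, and separately reduces an HTML fragment to its tags plus a few structural attributes, replacing all visible text with a placeholder. Redaction was done by hand in the workflow described; a helper like this would only be a first pass, and the result still needs a human look, since an id or class value could itself contain an identifier.

```python
import html as html_lib
from html.parser import HTMLParser

# Attributes that usually describe page structure rather than content
# (an assumption; adjust for the page at hand).
KEEP_ATTRS = {"id", "class", "name", "type", "role"}

class SkeletonBuilder(HTMLParser):
    """Rebuild markup with tags and structural attributes only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        kept = "".join(
            ' {}="{}"'.format(key, html_lib.escape(val, quote=True))
            for key, val in attrs
            if key in KEEP_ATTRS and val is not None
        )
        self.parts.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Every run of visible text is replaced, whether or not it is PHI.
        if data.strip():
            self.parts.append("[REDACTED]")

def html_skeleton(markup):
    """Return the fragment with all text and non-structural attributes removed."""
    builder = SkeletonBuilder()
    builder.feed(markup)
    builder.close()
    return "".join(builder.parts)

def pause_for_operator(prompt="PHI page is open. Press ENTER to continue... ",
                       wait=input):
    """Block the script until the human operator presses ENTER."""
    wait(prompt)

if __name__ == "__main__":
    fragment = ('<table id="results"><tr><td class="name">Jane Doe</td>'
                '<td class="val">3.5</td></tr></table>')
    pause_for_operator()
    print(html_skeleton(fragment))
```

HTML comments are dropped entirely, since they can carry text too. The skeleton keeps exactly the classes and identifiers the script needs to find the table again later.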

Anyway, I had thought that I would need a very robust computer and local AI to be able to do anything with health information.   And I guess that is still true for patient notes etc.  But for lab results, there is no risk in letting the AI see that the potassium = 3.5 and Hgb is 10.7 as long as it is not linked to an identifiable patient.  
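
One way to get a shareable version of a result set is an allow-list: copy only the fields known to be safe and drop everything else. The sketch below assumes a made-up JSON layout (a "results" list with code, name, value, units and ref_range keys) and uses synthetic data. It is an illustration, not a complete de-identification procedure.

```python
# Only these per-result keys are ever copied out (hypothetical schema).
ALLOWED_RESULT_KEYS = ("code", "name", "value", "units", "ref_range")

def shareable_results(report):
    """Return lab values only, with every patient-level field left behind."""
    return [
        {key: result[key] for key in ALLOWED_RESULT_KEYS if key in result}
        for result in report.get("results", [])
    ]

# Synthetic example data only.
REPORT = {
    "patient": {"last": "DOE", "first": "JANE", "dob": "19700101"},
    "results": [
        {"code": "K", "name": "Potassium", "value": 3.5, "units": "mmol/L",
         "ref_range": "3.5-5.2", "comment": "free text is not copied"},
        {"code": "HGB", "name": "Hemoglobin", "value": 10.7, "units": "g/dL",
         "ref_range": "12.0-16.0"},
    ],
}

if __name__ == "__main__":
    for row in shareable_results(REPORT):
        print(row)
```

An allow-list fails safe: a new field added by the portal is dropped by default, where a deny-list would let it through.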

I hope this helps someone. 

Kevin T