Thoughts on using Codex in a HIPAA / PHI environment.

Kevin Toppenberg

Jun 14, 2026, 6:38:23 PM

to Hardhats

Hey all,

Some of you may know that we have a custom HL7 interface implemented on our system. We have interfaced with 3 or 4 entities now and all has gone fairly well for many years.

Recently our lab provider, PathGroup, decided we were too small a fish to provide phlebotomy services for, and is pulling out. So we are switching over to LabCorp. That should be fine. But whereas PathGroup just gave us their HL7 messages and let us do as we wished, LabCorp has a many-step dance they want us to go through. It is going to be a LONG process (probably 6 months or more, based on their current speed).

So as an interim solution, we are going to get the results from their web portal, convert those results to an HL7 message, and then ingest that.

This would normally be a huge job, but I have been very happy with how quickly we have gotten it nearly done. On Friday we developed a script that logs in and retrieves an unformatted block of data. Today I refined that to produce formatted .json output, and then used that .json to create an output .hl7 message.
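Here is a minimal sketch of the .json-to-.hl7 step, assuming Python. Every specific in it is an assumption: the JSON field names (accession, panel, collected, results), the HL7 version (2.3), and the application and facility names in MSH are all placeholders. A real message would have to match whatever the existing HL7 interface expects. The sketch builds a bare ORU^R01 (unsolicited observation result) message with one MSH header, a PID whose identifier is passed in separately, one OBR for the order, and one OBX per analyte. HL7 delimiter characters inside values are escaped.

```python
import json
from datetime import datetime

# HL7 v2 field separator (MSH-1) and encoding characters (MSH-2).
FIELD_SEP = "|"
ENCODING_CHARS = "^~\\&"


def hl7_escape(text: str) -> str:
    """Escape HL7 v2 delimiters that occur inside a data value."""
    # The escape character itself must be replaced first.
    return (
        text.replace("\\", "\\E\\")
        .replace("|", "\\F\\")
        .replace("^", "\\S\\")
        .replace("&", "\\T\\")
        .replace("~", "\\R\\")
    )


def value_type(value: str) -> str:
    """OBX-2: NM for numeric results, ST (string) for anything else."""
    try:
        float(value)
        return "NM"
    except ValueError:
        return "ST"


def json_to_oru(report: dict, patient_id: str, control_id: str, now: datetime) -> str:
    """Build a minimal ORU^R01 message from one scraped report (hypothetical JSON layout)."""
    segments = [
        # Application and facility names (MSH-3..6) are placeholders.
        FIELD_SEP.join(["MSH", ENCODING_CHARS, "SCRAPER", "LABPORTAL", "VISTA", "CLINIC",
                        now.strftime("%Y%m%d%H%M%S"), "", "ORU^R01", control_id, "P", "2.3"]),
        FIELD_SEP.join(["PID", "1", "", hl7_escape(patient_id)]),
        # OBR-3 filler order number, OBR-4 panel, OBR-7 collection date/time.
        FIELD_SEP.join(["OBR", "1", "", hl7_escape(report["accession"]),
                        hl7_escape(report["panel"]), "", "", report["collected"]]),
    ]
    for set_id, r in enumerate(report["results"], start=1):
        value = str(r["value"])
        segments.append(FIELD_SEP.join([
            "OBX", str(set_id), value_type(value),
            hl7_escape(r["code"]) + "^" + hl7_escape(r["name"]),  # OBX-3 identifier^text
            "", hl7_escape(value), hl7_escape(r.get("units", "")),
            hl7_escape(r.get("ref_range", "")), hl7_escape(r.get("flag", "")),
            "", "", "F",  # OBX-11: final result
        ]))
    # HL7 v2 segments are terminated by a carriage return.
    return "\r".join(segments) + "\r"


# Example input; field names, units, and flags here are illustrative only.
report = json.loads("""{
  "accession": "A0001", "panel": "LAB PANEL", "collected": "202606121030",
  "results": [
    {"code": "K", "name": "Potassium", "value": 3.5, "units": "mmol/L"},
    {"code": "HGB", "name": "Hgb", "value": 10.7, "units": "g/dL", "flag": "L"}
  ]
}""")
msg = json_to_oru(report, patient_id="TEST-0001", control_id="MSG0001",
                  now=datetime(2026, 6, 15, 9, 0, 0))
print(msg.replace("\r", "\n"))
```

The patient identifier is a separate argument rather than a field in the same JSON. That is one way to develop and test the formatting logic entirely against de-identified sample data.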

But what I wanted to talk about are the techniques we used to maintain privacy and avoid exposing PHI (protected health information). Here are my takeaways.

1) The AI is very familiar with the need to keep PHI private. I explicitly put everything private inside a "private" subfolder and tell Codex never to go in there. It follows these instructions, and while working it repeatedly confirms that it did NOT output any PHI to log files, etc., and that it did not inspect file names that might include patient names, DOBs, etc.

2) When I first started, I kind of thought that Codex ITSELF was interacting with the web page. While I think it can do that if directly instructed to, it generally just writes scripts and runs them. So there are several actors at play: (a) me, the developer; (b) Codex, the AI; (c) the scripts running on the computer; and (d) the browser responding to the scripts.

3) So how can the AI know what to expect when it is blinded to web content containing PHI? There are a couple of techniques. I would tell it, "OK, after you click this button, the following page will contain PHI. You should NOT inspect that. Instead, pause the process after selecting the button, and I will get the HTML for you." It would then write the script to pause at that point until ENTER was pressed. I could then either download the entire HTML and manually redact the PHI, or use the developer tools to find a particular table and show it just that HTML fragment. Often these fragments have classes or identifiers that allow the script to locate them in the future. I would also have it generate output, which I would manually redact and paste back into the conversation.
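The pause-and-locate technique might look something like this, assuming Python scripts. The post does not say which browser-automation library is used, so the browser calls are omitted. The hypothetical checkpoint helper would be called right after the click that opens the PHI page. The TableRows parser, built on the standard library's html.parser, shows how a class name spotted in a redacted fragment (a made-up "results" class here) lets the script extract the same table from the live HTML later, without the AI ever seeing that page. It assumes the target table contains no nested tables.

```python
from html.parser import HTMLParser


def checkpoint(message: str, input_fn=input) -> None:
    """Block until the developer presses ENTER (e.g. after capturing and redacting the page)."""
    input_fn(f"{message} Press ENTER to continue...")


class TableRows(HTMLParser):
    """Collect cell text from the first <table> whose class list contains target_class.

    Assumes the target table contains no nested tables.
    """

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.inside = False  # currently within the target table
        self.done = False    # target table has already been parsed
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.inside = True
        elif self.inside and tag == "tr":
            self._row = []
        elif self.inside and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self.inside = False
            self.done = True


# A redacted fragment of the kind pasted into the conversation (made-up values).
fragment = """
<table class="grid results">
  <tr><th>Test</th><th>Result</th></tr>
  <tr><td>Potassium</td><td> 3.5 </td></tr>
</table>"""
parser = TableRows("results")
parser.feed(fragment)
print(parser.rows)  # [['Test', 'Result'], ['Potassium', '3.5']]
```

Taking input_fn as a parameter is only there so the pause can be exercised without a keyboard. In a real run, the default input() does the waiting.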

Anyway, I had thought that I would need a very robust computer and local AI to be able to do anything with health information. And I guess that is still true for patient notes etc. But for lab results, there is no risk in letting the AI see that the potassium = 3.5 and Hgb is 10.7 as long as it is not linked to an identifiable patient.
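One way a script could enforce this split is to keep the raw scraped page in the private folder from point 1 and pass on only analyte-level values. Below is a minimal sketch assuming Python 3.9+. The folder handling, the JSON field names, and the helpers save_phi, inside_private, and deidentify are all hypothetical. It uses an allowlist rather than a denylist. Under HIPAA's Safe Harbor de-identification method, dates directly related to an individual (other than the year) and unique identifying numbers count as identifiers. So collection dates and accession numbers are dropped along with names. Free-text comments are dropped too, because they can mention anything.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("scraper")

# Hypothetical allowlist of per-analyte fields that are shared only after being
# detached from the patient. Anything not listed (names, dates, accession numbers,
# free-text comments) is dropped, so new fields from the portal stay hidden by default.
SAFE_FIELDS = {"code", "name", "value", "units", "ref_range", "flag"}


def inside_private(path: Path, private_dir: Path) -> bool:
    """True if path resolves to a location inside private_dir (Python 3.9+)."""
    return path.resolve().is_relative_to(private_dir.resolve())


def save_phi(name: str, data: bytes, private_dir: Path) -> Path:
    """Write raw, PHI-bearing data under private_dir, logging no identifying details."""
    target = private_dir / name
    # Reject names such as "../page.json" that would escape the private folder.
    if not inside_private(target, private_dir):
        raise ValueError("refusing to write PHI outside the private folder")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    # The file name may contain a patient name or DOB, so it is not logged.
    log.info("saved %d bytes to the private folder", len(data))
    return target


def deidentify(report: dict) -> list:
    """Keep only allowlisted result fields, discarding all report-level data."""
    return [
        {key: val for key, val in result.items() if key in SAFE_FIELDS}
        for result in report.get("results", [])
    ]


# Made-up report; a real one would come from the scraper.
report = {
    "patient_name": "REDACTED", "dob": "REDACTED",
    "accession": "A0001", "collected": "202606121030",
    "results": [
        {"code": "K", "name": "Potassium", "value": 3.5, "units": "mmol/L"},
        {"code": "HGB", "name": "Hgb", "value": 10.7, "comment": "free text"},
    ],
}
print(json.dumps(deidentify(report), indent=2))
```

Because of the allowlist, any field the portal adds later is withheld until someone decides it is safe. The filter fails closed rather than open.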

I hope this helps someone.

Kevin T

Sam Habiel

Jun 15, 2026, 9:12:03 AM

to hard...@googlegroups.com

You kind of buried the lede: you wrote a web scraper using AI in almost no time.

--Sam

--
--
http://groups.google.com/group/Hardhats
To unsubscribe, send email to Hardhats+u...@googlegroups.com

---
You received this message because you are subscribed to the Google Groups "Hardhats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hardhats+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hardhats/9465e64b-91c5-4f14-9cbf-3aee84fb2ae5n%40googlegroups.com.

Kevin Toppenberg

Jun 16, 2026, 9:03:25 AM

to Hardhats

Sam,

Yes, we did it quickly. As of this morning, we have a complete cycle: scraping data from the web, converting it to HL7, and then ingesting the HL7 so that lab results appear in CPRS. We achieved this in 3 business days, which I think is an amazing feat. But this kind of story is becoming old hat these days. What I thought would be useful to share is that, with proper precautions, AI can still be used with PHI.

Kevin

Sam Habiel

Jun 16, 2026, 9:31:24 AM

to hard...@googlegroups.com

> But this kind of story is becoming old hat these days.

Only if you are following Silicon Valley closely. Many of us don't work there, and we mostly take what people say with a grain of salt: there is one truth in every 10 lies. Even so, the way we have been programming will now change forever.

--Sam
