Practical limits to bundle processing


John Moehrke

Jan 12, 2024, 10:49:59 AM
to HAPI FHIR
Is there a practical limit on bundle processing in the HAPI server? For example, if a transaction bundle has 50,000 resource updates in it, is that going to work? What about 100,000, or 50,000,000?

John Moehrke 🔥 Architect: Healthcare Informatics Standards - Interoperability, Privacy, and Security
IHE Co-Chair IT Infrastructure Planning and Technical
HL7 Co-Chair Security WG, FHIR FMG, FHIR facilitator, and 
FHIR Foundation founding member
Employee of By Light -- Contractor to VHA MyHealtheVet
JohnM...@gmail.com  |  M +1 920-564-2067  |  John.M...@bylight.com
 https://healthcaresecprivacy.blogspot.com

James Agnew

Jan 12, 2024, 11:33:47 AM
to John Moehrke, HAPI FHIR
This is a fun question.

Obviously there's no conclusive answer and it depends on your specific infrastructure, data, etc etc etc...

But with that said, I've found that if your aim is maximizing how quickly you get data into the system, aiming for transaction bundles of around 1000 resources each seems to be the sweet spot (this assumes that you're firing many bundles in parallel and what you actually care about is how quickly you are getting your overall number of resources into the database).
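As a rough illustration of the batching approach described above, here is a minimal Python sketch that splits a list of resources into transaction bundles of at most 1000 entries and hands them to a thread pool. The function names, the `send` callable, and the worker count are all assumptions made for the example; this is not HAPI's client API, and each resource is assumed to be a dict carrying `resourceType` and `id`.

```python
from concurrent.futures import ThreadPoolExecutor


def to_transaction_bundles(resources, bundle_size=1000):
    """Split resources into FHIR transaction bundles of at most bundle_size entries."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    bundles = []
    for start in range(0, len(resources), bundle_size):
        chunk = resources[start:start + bundle_size]
        entries = [
            {
                "resource": r,
                # PUT to ResourceType/id makes each entry an update.
                "request": {"method": "PUT", "url": f"{r['resourceType']}/{r['id']}"},
            }
            for r in chunk
        ]
        bundles.append({"resourceType": "Bundle", "type": "transaction", "entry": entries})
    return bundles


def submit_all(bundles, send, max_workers=8):
    """Call send(bundle) for every bundle on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bundles))
```

In practice `send` (hypothetical here) would POST each bundle as JSON to the server's base URL. Splitting one logical transaction into several bundles gives up atomicity across them, so resources that reference each other are probably best kept in the same bundle.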

In terms of how high can you go, the limiting factors are: 

- How long does the database itself allow a transaction to stay open (typically a very long time and generally not an issue)
- How long does the HTTP infrastructure allow a transaction to happen, including the client, the server, and any intermediates. This is almost always an issue when you try to handle very large transactions. A 50k transaction bundle is going to take a while to process, and many parts of your HTTP stack aren't going to like having an HTTP request sit open but with no traffic for 10 minutes.
- How much memory do you have on your client/server, since we need to be able to fit the entire transaction bundle into RAM both on the client (usually) and on the server (always).

From what I've seen on various HAPI/Smile implementations I'd say:

- 1k is optimal (as I said above)
- 5k is a very soft limit where you don't usually have to do much work in order to support it
- 25k is a soft limit where you're almost certainly going to have to make environment changes (network, memory) in order to get there
- I've never really seen anyone exceed 50k (although this doesn't mean people haven't)

Cheers,
James


John Moehrke

Jan 12, 2024, 11:36:01 AM
to James Agnew, HAPI FHIR
Excellent way to address an unanswerable question. You have gone above and beyond in providing me useful detail. Thanks.



jo...@vermonster.com

Jan 12, 2024, 1:48:46 PM
to HAPI FHIR
Then there's also the case where cloud providers limit the size of the data being sent; e.g. AWS has an API Gateway limit of 6MB, I believe. I suspect Azure and Google Cloud have similar limitations.
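One way to stay under a payload cap like this is to pack entries by serialized size rather than by count. The Python sketch below is only an illustration: the helper names are made up, the 6 MB default simply mirrors the figure mentioned above (check your provider's documented limit), and the size is measured on compact JSON, which may differ from what your client actually sends.

```python
import json


def serialized_size(obj) -> int:
    """Size in bytes of the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def make_bundle(entries):
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


def split_by_size(entries, max_bytes=6 * 1024 * 1024):
    """Greedily pack entries into bundles whose compact JSON stays within max_bytes."""
    overhead = serialized_size(make_bundle([]))  # bundle wrapper with an empty entry list
    bundles, current, current_size = [], [], overhead
    for entry in entries:
        entry_size = serialized_size(entry)
        if overhead + entry_size > max_bytes:
            raise ValueError("a single entry exceeds the payload limit")
        # One extra byte for the comma separating this entry from the previous one.
        added = entry_size + (1 if current else 0)
        if current_size + added > max_bytes:
            bundles.append(make_bundle(current))
            current, current_size = [], overhead
            added = entry_size
        current.append(entry)
        current_size += added
    if current:
        bundles.append(make_bundle(current))
    return bundles
```

As with splitting by count, the resulting bundles are independent transactions, so this only fits cases where atomicity across the whole data set is not required.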

  John

James Agnew

Jan 12, 2024, 2:45:56 PM
to jo...@vermonster.com, HAPI FHIR
That's a great point. It occurs to me that I've also seen people have issues with autoscaling cloud infrastructure killing off processes when any single HTTP request takes too long, because it assumes that the process is hung.

AWS Fargate is one I know I've seen people have this issue with.

Cheers,
James

Peter Micuch

Jul 15, 2024, 11:09:30 AM
to HAPI FHIR
Just a question, James: are the numbers in your reply to John with or without profile validation enabled? It seems I am hitting HTTP infrastructure limits with far fewer than 1000 resources in the bundle. In my case it is around 100 resources with profile validation turned on, and I already hit the default HTTP client limit of 100 seconds for one connection.

Thanks & Regards,
Peter

Date: Friday, January 12, 2024, time: 20:45:56 UTC+1, sender: james...@gmail.com

James Agnew

Jul 15, 2024, 11:20:39 AM
to Peter Micuch, HAPI FHIR
The numbers I gave were definitely not including profile validation. Once you're doing that, the processing becomes much more CPU bound.

The FhirValidator has a setting called "concurrent bundle validation" which validates resources within a bundle in parallel (as opposed to treating the bundle as one large atomic thing to validate) which can speed up processing. If your aim is to get data into the system as quickly as possible though, setting up a pipeline where your data is validated before ingestion, or adding way more CPU power to the pipeline are really your only options.
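To illustrate the general idea of validating entries independently before ingestion, here is a small Python sketch. It is not HAPI's implementation or API: `validate_patient` is a hypothetical stand-in for a real profile validator, and the function names and worker count are assumptions. Note that in CPython, threads will not speed up CPU-bound validation by themselves; a process pool or an external validation service would be needed for that, so treat this only as the shape of a pre-ingestion step.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_patient(resource):
    """Hypothetical stand-in validator: returns a list of issue strings."""
    issues = []
    if resource.get("resourceType") != "Patient":
        issues.append("unexpected resourceType")
    if not resource.get("id"):
        issues.append("missing id")
    return issues


def validate_entries(bundle, validator, max_workers=4):
    """Validate each entry's resource independently.

    Returns a dict mapping entry index to its issues, for failing entries only.
    """
    resources = [e.get("resource", {}) for e in bundle.get("entry", [])]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(validator, resources))  # map keeps input order
    return {i: issues for i, issues in enumerate(results) if issues}
```

A pipeline could run this step ahead of time, drop or repair the failing entries, and then send only clean bundles to a server that has validation turned off.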

Cheers,
James
