Analyzing Out Of Memory exceptions in HAPI FHIR


Shlomy Reinstein

Sep 28, 2017, 2:45:16 AM
to HAPI FHIR
Hi,

We're running HAPI FHIR 2.5 as a cloud service on Cloud Foundry. Each instance has 2GB of memory, and recently we started experiencing daily crashes of these instances due to Out Of Memory exceptions. The crashes frequently occur during idle times (e.g. at night, for instances that are only used during the day for performance analysis).
Can you provide some tips on how to analyze these crashes?
  • Are there official memory requirements for the HAPI server? (e.g. depending on DB size, number of patients, or anything like that)
  • Is there some optional logging that can be turned on to help analyze the issue? (We currently have default logging.)
Some additional notes about these instances that may shed some light on the memory issues:
  • Cloud Foundry arranges all of the instance's memory (heap, stack, code, ...) using heuristics, leaving less than 1.5GB for the heap, if I interpret the logs correctly.
  • The instances are monitored using New Relic, which adds some memory overhead (hopefully negligible).
  • Our HAPI server contains a few small features in addition to the open source code: security (API signing) and event notification on resource updates (sent to RabbitMQ).
  • For now, we have disabled paging, so query results are returned all at once in a single bundle. This could be a problem for large queries, but even running such queries manually through the HAPI admin UI doesn't cause an OOM, and many of the OOM crashes occur at idle times.
  • We use MySQL (also in the cloud) as the DB.
Thanks,
Shlomy

James Agnew

Oct 4, 2017, 11:27:25 AM
to Shlomy Reinstein, HAPI FHIR
Hi Shlomy,

That's interesting for sure. 2GB should be enough RAM for HAPI to run comfortably; that's the same amount of RAM the fhirtest.uhn.ca server has, actually. We generally allocate 4GB in our own production instances though, FWIW.

There are no official requirements for HAPI, though. Usage patterns are so different between systems that it's hard to come up with blanket numbers.

Are you in a position to get a heap dump when it runs out, so you can run it through something like Eclipse MAT? I've done this before with some success to figure out memory leaks.
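
If you can control the JVM arguments, the standard HotSpot flags for this are below (the dump path is just an illustration; it needs to point somewhere that survives the instance being killed):

    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/path/to/dumps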

Cheers,
James

Shlomy Reinstein

Oct 4, 2017, 11:41:53 AM
to James Agnew, HAPI FHIR
Hi,
Thanks for your reply. We're looking into the option of generating a heap dump on OOM. However, we're running on Cloud Foundry without a volume service, so for now we're unable to create dumps on OOM. Even if they were created, the dump would be lost, since the instance is killed along with its internal storage.
We disabled the paging provider, but this by itself is not the reason for the OOM. HAPI seems to work fine even with queries that return 100,000 diagnostic reports in a single bundle.
The strange thing is that the issue is not related to load. It sometimes happens after the instance has been idle for hours, so something else must be running even when the instance is idle.
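One thing we might still try, assuming we can ssh into a live container (e.g. with cf ssh) before it dies: taking a dump manually with jmap and copying it off before the container is recycled, along the lines of

    jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>

(<pid> being the Java process ID inside the container).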

Finally, we tried with a 4GB instance, and it hit an OOM too. The 4GB instance had about 2.7GB of Java heap space (the rest was allocated for code, native heap, etc.).

Shlomy

On Oct 4, 2017 at 18:27, "James Agnew" <james...@gmail.com> wrote:

James Agnew

Oct 4, 2017, 4:28:02 PM
to Shlomy Reinstein, HAPI FHIR
Wow, yeah, 2.7GB definitely does not seem like a reasonable amount of heap for it to need; that does sound like there must be a memory leak somewhere.

Is there any way of deploying the same binary (or something very similar) on a non-Cloud Foundry server for the purpose of getting a heap dump? I feel like that would be the easiest way to proceed, if it's possible.

I guess one other thing to consider: how did you disable paging? I'm assuming you nulled the paging provider entirely, but if you swapped in the FifoMemoryPagingProvider, that would mean that potentially lots of results are being kept in memory. One search might not exhaust your RAM, but lots of searches could.
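
For reference, the two setups look roughly like this (just a sketch against the RestfulServer API; the constructor argument to FifoMemoryPagingProvider is the number of result sets it holds in memory, and the numbers here are only illustrative):

    import ca.uhn.fhir.rest.server.FifoMemoryPagingProvider;

    // No paging at all: every search returns one complete bundle,
    // and nothing is cached between requests.
    restfulServer.setPagingProvider(null);

    // In-memory paging: HAPI keeps the last N search results in RAM so
    // that later pages can be served. Large result sets add up quickly.
    FifoMemoryPagingProvider pagingProvider = new FifoMemoryPagingProvider(10);
    pagingProvider.setDefaultPageSize(10);
    pagingProvider.setMaximumPageSize(100);
    restfulServer.setPagingProvider(pagingProvider);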

Cheers,
James

Shlomy Reinstein

Oct 8, 2017, 1:56:59 AM
to James Agnew, HAPI FHIR
Thanks for the help. To disable paging, I simply commented out the line in JpaServerDemo that sets the paging provider; the result is a null provider.
Are there periodic tasks running in HAPI FHIR that could cause it to crash with an OOM while the FHIR service is idle (receiving no calls for several hours)?

Thanks,
Shlomy

James Agnew

Oct 8, 2017, 9:07:17 AM
to Shlomy Reinstein, HAPI FHIR
Hi Shlomy,

There are a few scheduled tasks that run in the background: one looks for resources that need reindexing, one expires old searches, and one polls for subscriptions that have fired (this last one is gone as of HAPI FHIR 3.0.0).
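
If you want to rule those out, DaoConfig has some switches for this, something like the following (going from memory here, so double-check the DaoConfig javadoc for your version; I'm not certain both setters are in 2.5):

    DaoConfig daoConfig = new DaoConfig();
    // Turn off the scheduled background tasks entirely.
    daoConfig.setSchedulingDisabled(true);
    // Or just shorten how long cached search results are kept (milliseconds).
    daoConfig.setExpireSearchResultsAfterMillis(60 * 60 * 1000L);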

None of these should cause memory issues though. We run lots of production servers with all of these enabled for long periods of time without running out of memory, including on our fhirtest.uhn.ca server.

I'm totally curious what's causing your issue, that's for sure.

Cheers,
James