HAPI server suddenly grows memory usage nonstop


ca...@alis-sol.com.br

Jul 25, 2018, 12:35:55 PM
to HAPI FHIR
Hi all,

We're experiencing an issue that started out of nowhere on July 22nd. The server stopped responding, and we saw that it was out of memory, with high CPU usage from the garbage collector. It was sudden: the server had been stable for months, and the last change was made more than a month ago.


The problem starts by itself about 5 minutes after startup, with no clients connected, and is always reproducible (the time to failure grows if I raise the number of database connections). So, after excluding a lot of variables, like clients and other things, I tried reverting to a month-old backup of the database and the problem was gone. Since there are no clients, the problem seems to be data-related, triggered by one of the routines that HAPI executes from time to time.

We receive batch loads from partners from time to time, and it seems some data that entered that way is the cause. Has anyone ever come across this?

Sometimes there are connection resets from the database, or timeouts, when the problem occurs. I have a feeling that calls are made to the database but take a long time to return, so things start to queue up, and that causes the high memory usage.

Details about our setup:
HAPI FHIR version: 2.5
Java 1.8
Deployed on Amazon
12 GB of RAM allocated to HAPI's JVM
4 vCores
Oracle database (the hardware shows no sign of being maxed out; on the contrary)

Thanks,

Carlos

James Agnew

Jul 25, 2018, 6:33:20 PM
to Carlos Eduardo Lara Augusto, HAPI FHIR
It might be worth grabbing a heap dump and having a look at what Eclipse MAT has to say about it. If you're chewing through 12 GB of RAM, that does sound like a memory leak of some sort.

All of HAPI's scheduled jobs are set up to not run multiple times in parallel, so things shouldn't be backing up to the point of overloading, but who knows.

Note: I'm happy to have a peek at a heap dump (let me know if you need a way to transfer a large file to me), but of course a heap dump will likely contain copies of your data, so this is probably not something you want to do if this is a production system.

Cheers,
James



--
You received this message because you are subscribed to the Google Groups "HAPI FHIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hapi-fhir+...@googlegroups.com.
To post to this group, send email to hapi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hapi-fhir/57033f09-5fe0-4afc-8b42-0eed78dbdc36%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ca...@alis-sol.com.br

Jul 25, 2018, 7:49:46 PM
to HAPI FHIR
Hi James,

We've found the cause. At the beginning of this month I'd changed the lifetime of the searches from 1h to 6h, because I had to go through all of the patients and practitioners and 1h was not enough. This led to A LOT of searches being inserted and not removed when we had a few days of batch loads recently.

Since the stale-search delete routine in the HAPI code loads all the deletable content of the search table and translates every row into Java objects, memory use climbed fast. The search table had about 10 million entries, and the search result table... more than 400 million. I think this must be tweaked, if it isn't already in the newer versions.
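For reference, a memory-safe cleanup would delete in bounded batches instead of materializing every deletable row. The sketch below is hypothetical and uses an in-memory list to stand in for HFJ_SEARCH_RESULT; in a real fix the inner step would be a single bounded SQL DELETE (e.g. with Oracle's ROWNUM), so heap use stays flat regardless of table size:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedCleanupSketch {

    // Stand-in for one bounded DELETE, e.g.:
    //   DELETE FROM HFJ_SEARCH_RESULT WHERE <stale> AND ROWNUM <= :batchSize
    // Returns the number of rows removed in this batch.
    static int deleteBatch(List<Long> table, int batchSize) {
        int n = Math.min(batchSize, table.size());
        table.subList(0, n).clear();
        return n;
    }

    // Repeat bounded deletes until nothing deletable is left; at no point
    // are all deletable rows held in memory at once.
    static long purgeStale(List<Long> table, int batchSize) {
        long total = 0;
        int deleted;
        while ((deleted = deleteBatch(table, batchSize)) > 0) {
            total += deleted;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> rows = new ArrayList<>();
        for (long i = 0; i < 10_500; i++) {
            rows.add(i);
        }
        System.out.println(purgeStale(rows, 1_000) + " rows deleted, " + rows.size() + " left");
    }
}
```

Whatever the exact SQL, the point is that the working set per iteration is bounded by the batch size, not by how many stale searches accumulated.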

On our side, we're probably going to truncate the table at the database level daily. James, do you have any input or ideas on this? Since HAPI works with a lot of databases, I understand the use of JDBC/Hibernate, but maybe there's a way to install these scheduled tasks inside the database when creating it? Running them there would be much faster and would prevent these memory problems.

Cheers,

Carlos

ca...@alis-sol.com.br

Jul 25, 2018, 7:58:00 PM
to HAPI FHIR
I just remembered that the old backup, from June 21st, is from before the change from 1h to 6h. Analyzing it, the search table had about 2 million entries and the search result table about 100 million. The problem was probably already there; we just weren't seeing it because it didn't yet amount to 12 GB of Java objects. It may be worth taking a look at that to see if something makes some entries live forever.

Cheers,

Carlos

James Agnew

Jul 26, 2018, 4:48:03 AM
to Carlos Eduardo Lara Augusto, HAPI FHIR
Ahh interesting.

We actually rewrote the search cleanup routines this year in order to avoid issues like this one. Which version of HAPI FHIR are you using?

Cheers,
James


Carlos Eduardo Lara Augusto

Jul 26, 2018, 10:02:11 AM
to James Agnew, HAPI FHIR
2.5.

We're having a hard time getting an OK to upgrade, unfortunately. It'll be a good argument for upgrading if this has changed in the newer versions, though.

Thanks,

Carlos


ca...@alis-sol.com.br

Jul 26, 2018, 4:33:46 PM
to HAPI FHIR
Hi James,

We won't be able to upgrade from 2.5 anytime soon. Could you help me understand the search tables? I see that there are four of them:

HFJ_SEARCH
HFJ_SEARCH_INCLUDE
HFJ_SEARCH_PARM
HFJ_SEARCH_RESULT

HFJ_SEARCH_PARM holds search parameters and doesn't seem related to the temporary searches.

From the code, I gather that the other 3 tables have their content deleted by the stale search routine.

So, do you see any problem if I create a database routine that, say, truncates these 3 tables when HFJ_SEARCH_RESULT reaches 10 million or 100 million entries? Do you have a better suggestion? I just want to prevent the same thing from happening again, with minimal changes.

One more thing: we're working on being able to raise multiple FHIR servers automagically if there's need. I assume this is possible, but these scheduled routines would run on all servers. Have you ever come across a problem with this, or should it be safe?

Thanks a lot!

Carlos

na...@riseup.net

Oct 8, 2018, 6:28:16 AM
to HAPI FHIR
Hi

I am facing a similar issue: the Tomcat memory grows after each search query and is never cleaned up.

At the beginning of this month I'd changed the lifetime of the searches from 1h to 6h

Apparently my search lifetime is not well configured. Sadly, I have no idea how to configure it.

Carlos Eduardo Lara Augusto

Oct 8, 2018, 10:19:20 AM
to na...@riseup.net, HAPI FHIR
Hello natus,

In the HAPI JPA Server, file FhirServerConfig.java, method daoConfig(), you can set the search lifetime with:

retVal.setExpireSearchResultsAfterMillis()
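In context, the whole bean would look roughly like this (a sketch of a Spring @Bean in FhirServerConfig.java, assuming HAPI FHIR's JPA DaoConfig class; the 6-hour value mirrors the change described earlier in this thread, and defaults may differ between versions):

```java
// Sketch only: the search result lifetime is given in milliseconds.
@Bean
public DaoConfig daoConfig() {
    DaoConfig retVal = new DaoConfig();
    // Keep cached search results for 6 hours.
    retVal.setExpireSearchResultsAfterMillis(6 * 60 * 60 * 1000L);
    return retVal;
}
```

A longer lifetime keeps more rows alive in HFJ_SEARCH and HFJ_SEARCH_RESULT, so it trades table growth (and cleanup cost) against how long clients can keep paging through old results.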

If you have not changed this before, however, it doesn't seem likely to me that the 1h default would give you problems, unless you have a huge number of incoming searches and/or insufficient memory. Check the amount of memory on your server machine AND on the JVM, and the number of entries in the 3 tables mentioned before. If you're in the hundreds of thousands, try removing all entries. If you can't make the problem go away and need a temporary solution until you find a better one, you can set up a database routine to delete everything in these tables when they reach a certain size.

Cheers,

Carlos



na...@riseup.net

Oct 8, 2018, 11:16:27 AM
to HAPI FHIR
Hi Carlos


In the HAPI JPA Server,


Thank you so much for your answer. However, I am not using the JPA server but the PLAIN server, with my own database server implementation and a FifoMemoryPagingProvider.

So I am unsure the parameters you mention apply to me, since one hour after a bunch of search queries the overall RAM used by Tomcat is still the same: growing and growing after each search hit.
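For the plain server the JPA setting indeed doesn't apply: cached search results live in the paging provider itself. A FifoMemoryPagingProvider holds a fixed number of result sets and evicts the oldest once it's full, so its capacity (and the size of each cached result list) is what bounds memory. A sketch of the relevant knobs, where restfulServer is a placeholder for your RestfulServer instance:

```java
// Sketch only: bounds how many search result sets the plain server retains.
// FifoMemoryPagingProvider evicts the oldest cached search beyond 'size'.
FifoMemoryPagingProvider pagingProvider = new FifoMemoryPagingProvider(10); // keep at most 10 searches
pagingProvider.setDefaultPageSize(20);   // results per page if the client doesn't specify
pagingProvider.setMaximumPageSize(100);  // hard cap on requested page size
restfulServer.setPagingProvider(pagingProvider);
```

If memory still grows with a small capacity configured here, the retained objects are probably being held by the custom provider implementation rather than the paging cache, which a heap dump (as suggested earlier in the thread) would confirm.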

Carlos Eduardo Lara Augusto

Oct 8, 2018, 12:33:32 PM
to na...@riseup.net, HAPI FHIR
Sorry, I can't help you with this, then. I suggest you create a new thread to gather more attention. Try to provide more information, too, about the version you're using, your implementation, and anything else you can; it should be helpful.

Good luck!



