Lucene indexes and search tables

dark...@gmail.com

Nov 17, 2017, 11:56:36 AM
to HAPI FHIR
Hey,

I've been running the JPA server (v2.5 most recently) in a container-based hosting environment (I believe they are using Docker). I've recently noticed that every time I redeploy the server, the Lucene indexes get wiped out and need to be rebuilt, which causes a fairly large heap spike and thereby requires extra heap space to bring the server back up.

One tactic I've used to get the server back up without increasing the container's memory limit (which is already at 2 GB) is to truncate the HFJ_SEARCH and HFJ_SEARCH_RESULT tables. Once the records are cleared out of those tables, the spike is small enough to handle. This leads me to a few questions:

1) Is there a reason why all of those records are in the HFJ_SEARCH* tables? As far as I knew, the cached search results should be expiring after a minute by default. Is it safe to have a background process delete the records out of those tables once they are more than a minute old?
2) Do the HFJ_SEARCH* tables get indexed for any particular reason or does Hibernate/Lucene just index everything by default?
3) Is it possible to completely disable Lucene indexing? I've read conflicting things about whether it is possible and what is the correct approach. Does any functionality break if the indexing is disabled?

Thanks,
Kyle

James Agnew

Nov 18, 2017, 5:59:55 AM
to Kyle Meadows, HAPI FHIR
Hi Kyle,

I'm not sure what is going on, but Lucene does not index either of those tables (it only indexes HFJ_RESOURCE and a few terminology tables). Are you perhaps referring to Derby? If so, you almost certainly want to migrate to a more scalable database platform (e.g. Postgres) if you're doing "real things" with the server.
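
On your third question: to turn Lucene off entirely, one approach (a sketch only; the property name comes from Hibernate Search 5.x, and the bean shown is a stand-in for however your JPA config exposes its extra Hibernate properties) is to stop Hibernate Search from registering its indexing listeners. Note that _content/_text searches and some terminology operations would be expected to stop working:

import java.util.Properties;

public class JpaExtraProperties {

    // Sketch: extra Hibernate properties handed to the JPA server's entity
    // manager factory. Turning autoregister_listeners off prevents Hibernate
    // Search from indexing anything, so no Lucene index is built (or rebuilt
    // on startup).
    public Properties extraProperties() {
        Properties extraProperties = new Properties();
        extraProperties.put("hibernate.search.autoregister_listeners", "false");
        return extraProperties;
    }
}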

There is already a scheduled process that expires old search result cache entries from the tables you list; it can be configured using the ExpireSearchResultsAfterMillis property.
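
If it helps, this is roughly what that looks like in code (a sketch; the DaoConfig package shown is from the 2.x line, and the one-minute value is just an illustration):

import ca.uhn.fhir.jpa.dao.DaoConfig;

public class SearchCacheConfig {

    // Sketch: a DaoConfig with the stale-search purge enabled. The scheduled
    // process then deletes HFJ_SEARCH / HFJ_SEARCH_RESULT rows older than the
    // configured age.
    public DaoConfig daoConfig() {
        DaoConfig config = new DaoConfig();
        config.setExpireSearchResults(true);              // enable the purge job
        config.setExpireSearchResultsAfterMillis(60000L); // expire after 1 minute
        return config;
    }
}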

Cheers,
James



dark...@gmail.com

Nov 27, 2017, 9:42:48 AM
to HAPI FHIR
Hey James, thanks for the reply. 

I'll implement that property and see if it works- the docs indicate that it should already be enabled by default (purging after 1 hour), but I'll set both expire-search-results properties manually to see if it helps. Also, since my Lucene assumption was wrong, can you think of any other reason why having all those records in the DB would cause our container to blow its memory limit on application startup? I'll see if I can yank a memory dump out of the hosting environment when it's in this state, but I don't have direct control over the env, ugh.

I am using MySQL for a DB at the moment. I'm only using the server to host test data, but automated processes are running a large number of searches against the server throughout the day. Are there any known issues with the JPA server and MySQL that would make migrating to Postgres worthwhile? I think both the Postgres and MySQL JDBC drivers have an unlimited default fetch size- I can also try configuring the fetch size to see if it helps.
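
For concreteness, this is the kind of change I have in mind (a sketch; hibernate.jdbc.fetch_size is a standard Hibernate setting, and useCursorFetch is the Connector/J URL flag that makes MySQL honor a positive fetch size):

import java.util.Properties;

public class FetchSizeTuning {

    // Sketch: ask Hibernate to call Statement#setFetchSize(500) on its
    // statements so the driver pulls rows in batches instead of all at once.
    public Properties hibernateProperties() {
        Properties props = new Properties();
        props.put("hibernate.jdbc.fetch_size", "500");
        return props;
    }

    // Note: MySQL Connector/J ignores a positive fetch size unless cursor
    // fetching is enabled on the connection URL, e.g.
    // jdbc:mysql://dbhost:3306/hapi?useCursorFetch=true
}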

Thanks again,
Kyle


James Agnew

Nov 27, 2017, 11:47:32 PM
to Kyle Meadows, HAPI FHIR
Hi Kyle,

There shouldn't be any need to switch from MySQL to Postgres- I have worked with both fairly extensively and I've found that both work well. Postgres does tend to perform slightly better- I believe this is because MySQL has no native support for database sequences for ID generation, so the way Hibernate works, this has to be done with an extra call to a sequence table. But this isn't a huge hit, so it's probably not an issue for you.
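
To illustrate (a sketch; the entity and generator names here are made up, but the mapping is standard JPA):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class ExampleEntity {

    // On Postgres this maps to a native sequence (a single nextval() call).
    // On MySQL, which has no sequences, Hibernate emulates the generator with
    // a separate table, costing extra round trips per batch of IDs; the
    // allocationSize amortizes that cost across 50 inserts.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "example_gen")
    @SequenceGenerator(name = "example_gen", sequenceName = "SEQ_EXAMPLE", allocationSize = 50)
    private Long id;
}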

In terms of the memory issues, if you are able to grab a heap dump that would definitely be helpful. I haven't seen memory issues on startup before; that is definitely something new.

Cheers,
James


dark...@gmail.com

Nov 28, 2017, 1:50:16 PM
to HAPI FHIR
I updated the server config to use the following (expiring search results after 5 minutes):

daoConfig.setExpireSearchResults(true);               // enable the scheduled purge of stale searches
daoConfig.setExpireSearchResultsAfterMillis(300000L); // expire after 5 minutes (300,000 ms)

And then I watched as the HFJ_SEARCH table steadily shrank from over 300k records down to fewer than 50. Now when I hit a GC overhead limit error I am able to restart the server without issue (before, I had to manually truncate those search tables before restarting).

I'll still work on grabbing a heap dump to see if I can debug my heap memory issues- my guess is that the heap just needs to be bigger. Currently I'm only able to set the max heap to 1024m. Do you have any guidance on JVM tuning for particular use cases?

Thanks again,
Kyle