hapi-fhir-jpaserver BLOBs preventing PostgreSQL database upgrade


Caleb Steele-Lane

Jul 24, 2023, 11:40:48 PM
to HAPI FHIR
Hi all,

We have had great success using the base hapi-fhir-jpaserver, but are now looking at upgrading our DB from Postgres 9.5 -> 14. Because the resources are stored as BLOBs (large objects), running pg_upgrade is challenging, as Postgres doesn't handle them very efficiently (more detailed explanation of our problem).
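For context, the number of large objects involved can be counted from the `pg_largeobject_metadata` catalog. A quick check, assuming psql access and a database named `hapi` (adjust to your setup):

```
# Each row is one large object that pg_upgrade has to transfer.
psql -d hapi -c "SELECT count(*) FROM pg_largeobject_metadata;"
```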

Has anyone encountered this issue and found a way around it? We're hoping for something faster than querying all the resources through the FHIR endpoints and loading them into a secondary server running the newer DB version.

Cheers,
Caleb

rrha...@gmail.com

Jul 24, 2023, 11:48:07 PM
to HAPI FHIR
I just ran pg_upgrade and converted my PostgreSQL FHIR DB from 12 -> 14 yesterday, and it worked fine (though it took quite a while to run). But upgrading from 9.5 -> 14 is quite a bit more of a stretch, and I don't know what issues (and potential inefficiencies) there may be with that.
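For reference, the direct pg_upgrade invocation looks roughly like this. The paths are illustrative for a Debian/Ubuntu layout with both server versions installed; adjust to your install:

```
# Run as the postgres user. --link hard-links data files instead of
# copying them (faster, but the old cluster must not be started again
# afterwards); drop it to get the slower copy behaviour.
/usr/lib/postgresql/14/bin/pg_upgrade \
  -b /usr/lib/postgresql/12/bin \
  -B /usr/lib/postgresql/14/bin \
  -d /var/lib/postgresql/12/main \
  -D /var/lib/postgresql/14/main \
  --link
```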

Rob

rrha...@gmail.com

Jul 25, 2023, 12:00:05 AM
to HAPI FHIR
Actually, I just double-checked and it was pg_upgradecluster that I used to do the upgrade from 12 -> 14 (on Ubuntu 22.04). I'm not sure whether that makes a significant difference, but it could be worth trying (likely depending on the OS).

Rob

Caleb Steele-Lane

Jul 25, 2023, 12:18:16 AM
to HAPI FHIR
Hi Rob,

We've similarly succeeded with the upgrade for some of our smaller DBs (it took a few hours, which was acceptable), but our larger database is taking far too long. Do you know how many resources are in your DB? More context would be a big help in telling whether our server is simply too weak to handle the upgrade efficiently. We're exploring ways of improving the process by running it on a more powerful server.

Looking ahead, we'd still like a strategy that can handle this DB efficiently as it grows. Some of the research we've done suggests that pg_upgrade can fail if there are too many large objects (LOBs), which is a concern.

I believe pg_upgradecluster is a wrapper that uses either pg_upgrade or pg_dump/pg_restore depending on how it's configured, but I haven't tried it to confirm that.
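As I understand it, the Debian/Ubuntu invocation would be roughly the following (cluster name `main` assumed; `-m upgrade` selects the pg_upgrade path instead of the default pg_dump/pg_restore method):

```
# Upgrades cluster 12/main to the newest installed PostgreSQL version.
sudo pg_upgradecluster -m upgrade 12 main
```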

Cheers,
Caleb

rrha...@gmail.com

Jul 25, 2023, 12:40:49 AM
to HAPI FHIR
It looks like my HAPI jpaserver instance has a total of 155,731 Observations and a bit over 300,000 resources in total (I don't have the exact count). The data is mostly synthetic or otherwise example data (no PHI, of course).

Rob

James Agnew

Jul 25, 2023, 8:35:15 AM
to rrha...@gmail.com, HAPI FHIR
FWIW: As of HAPI FHIR 6.6.0 we no longer use those largeobject BLOBs for resources smaller than 4000 characters, and you can safely increase that threshold too. One thing you could do is: make sure you're up to date (both HAPI FHIR and the starter project), run a resource reindex on the server with the optimizeStorage flag set to true (which will migrate data out of the large objects and into a text column), run a vacuumlo on the database to clean up the old LOBs, and then proceed with upgrading Postgres.
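A sketch of that cleanup step, assuming the PostgreSQL contrib/client tools are installed and the database is named `hapi`:

```
# Dry run: -n only reports which orphaned large objects would be removed.
vacuumlo -n -v hapi
# Real run: delete large objects no longer referenced by any column.
vacuumlo -v hapi
```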

Cheers,
James



Caleb Steele-Lane

Jul 26, 2023, 1:51:48 AM
to HAPI FHIR
Thanks for the additional information, Rob; an approximate number is all I need. Our DB has around 17 million resources in it (20 million counting old versions of resources).

That's awesome to hear, James! If my SQL is correct, that should cover 95% of our objects. I'm trying to run that reindex, but it doesn't seem to be making a change. I'm assuming it's supposed to make resources appear in the `res_text_vc` column of the `hfj_res_ver` table instead of as an OID in the `res_text` column?
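For what it's worth, the check I mean was along these lines. This is a rough sketch: it assumes the resource bodies live as large objects, and the stored bytes may be compressed, so comparing against the 4000-character threshold is only approximate:

```
-- Share of large objects below the 4000-byte inline threshold.
-- Reading pg_largeobject directly normally requires superuser rights.
SELECT count(*) FILTER (WHERE sz < 4000) * 100.0 / count(*) AS pct_under_4k
FROM (
  SELECT loid, sum(octet_length(data)) AS sz
  FROM pg_largeobject
  GROUP BY loid
) AS lob_sizes;
```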

The steps I took have been:
  • change the docker image to `hapiproject/hapi:v6.6.0-tomcat`
  • add `inline_resource_storage_below_size: 4000` to the application.yaml under `hapi: fhir:` (see the YAML sketch below)
  • make a `POST` request to `https://server_address/fhir/$reindex?optimizeStorage=true&everything=true`, which logs:
    • ```OptimizeStorage[NONE] - ReindexSearchParameters[ALL]```
  • make a `POST` request to `https://server_address/fhir/$reindex?optimizeStorage=all&everything=true`, which logs the same:
    • ```OptimizeStorage[NONE] - ReindexSearchParameters[ALL]```
I assume `OptimizeStorage[NONE]` in those log lines means the optimizeStorage option isn't being picked up? Are there any other steps that need to be done, or am I not making the request properly?
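For completeness, the YAML nesting I ended up with (structure per the starter project's application.yaml, as I understand it):

```
# application.yaml for hapi-fhir-jpaserver-starter
hapi:
  fhir:
    inline_resource_storage_below_size: 4000
```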

Cheers,
Caleb

Kevin Dougan

Jul 26, 2023, 8:16:20 AM
to HAPI FHIR
Hi Caleb,

Please try putting the optimizeStorage Parameter inside the BODY instead of in the URL:

```
POST /$reindex
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [ {
    "name": "url",
    "valueString": "Patient?"
  }, {
    "name": "optimizeStorage",
    "valueString": "ALL_VERSIONS"
  } ]
}
```
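As a concrete way to send that request (reusing the server address from earlier in the thread; the filename is just an example):

```
curl -X POST 'https://server_address/fhir/$reindex' \
  -H 'Content-Type: application/fhir+json' \
  --data @reindex-parameters.json
```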

It looks like our documentation needs to be updated to provide the acceptable values for this Parameter: 
  • NONE
  • CURRENT_VERSION
  • ALL_VERSIONS

Please let us know if this works for you!
I will fix the documentation today.

Thanks!
Kevin

Caleb Steele-Lane

Jul 27, 2023, 8:14:18 PM
to HAPI FHIR
Hi Kevin,

Thanks! This will help us out a lot. I was able to start the optimize successfully, although the job never seems to fully complete; it keeps logging
`Job 7c460f26-2900-4d7a-a443-ae43d09d60d1 of type REINDEX has status IN_PROGRESS - 1235906 records processed (317.0409111361138/sec) - ETA: 1575ms`
regularly for hours on end. Most of the objects have been migrated into `res_text_vc` anyway, but is there something that would stop the job from finishing?
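In case it's useful, I've been watching the migration progress with a query along these lines (column semantics assumed from the discussion above):

```
-- Versions still referencing a large object vs. already inlined.
SELECT count(*) FILTER (WHERE res_text IS NOT NULL)    AS still_in_lob,
       count(*) FILTER (WHERE res_text_vc IS NOT NULL) AS inlined
FROM hfj_res_ver;
```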

Caleb

Kevin Dougan

Jul 28, 2023, 7:58:25 AM
to HAPI FHIR
Hi Caleb,

What criteria are you using for your $reindex job?
If you are reindexing an entire resource type (e.g. `Patient?`), my suggestion would be to try a subset of the data (e.g. `Patient?family=A`) to build smaller reindex jobs, and see whether those complete for you without a problem (see the example below).
Also, check your server logs (smile.log) for any ERROR messages or stack traces.
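For example, a narrower job would use the same Parameters shape with a tighter url:

```
{
  "resourceType": "Parameters",
  "parameter": [ {
    "name": "url",
    "valueString": "Patient?family=A"
  }, {
    "name": "optimizeStorage",
    "valueString": "ALL_VERSIONS"
  } ]
}
```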

Kevin