HAPI FHIR Database Size increasing 7x compared to input data size


Touseef Dev

May 15, 2024, 4:25:14 AM
to HAPI FHIR
Hi all,


I am trying to evaluate how much storage space input NDJSON data of a given size occupies in the HAPI FHIR database.
In my testing on HAPI FHIR 6.10.3 with a Postgres DB, I posted 1000 Patient resources (2.3 MB on disk), and the difference in database storage size before and after posting came out to 17 MB (a ~7x increase). I tried two more times with the same data and saw similar behavior.

I am calculating the database size with the following query on Postgres:

SELECT pg_size_pretty(pg_database_size('hapi'));


The per-table data sizes inside the database are given below:

Table                       | Data Size (MB)
----------------------------+---------------
hfj_spidx_string            |  5.27
hfj_res_ver                 |  3.24
hfj_spidx_token             |  2.91
hfj_res_ver_prov            |  1.40
hfj_spidx_date              |  1.18
hfj_res_link                |  1.18
hfj_resource                |  0.80
hfj_res_tag                 |  0.39
hfj_history_tag             |  0.38
hfj_search                  |  0.01
hfj_resource_modified       |  0.01
hfj_subscription_stats      |  0.01
hfj_search_result           |  0.01
hfj_revinfo                 |  0.01
hfj_spidx_uri               |  0.01
mpi_link_aud                |  0.01
mpi_link                    |  0.01
hfj_tag_def                 |  0.01
hfj_forced_id               |  0.01
hfj_blk_import_jobfile      |  0.00
hfj_spidx_coords            |  0.00
hfj_search_include          |  0.00
hfj_idx_cmp_string_uniq     |  0.00
hfj_blk_export_job          |  0.00
trm_valueset_concept        |  0.00
trm_concept_map_grp_element |  0.00
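For anyone reproducing this, a per-table breakdown like the one above can be obtained with a query along these lines (a sketch; note that `pg_total_relation_size` includes indexes and TOAST data, so its totals will be larger than heap-only figures, and the `public` schema name is an assumption about the HAPI deployment):

```sql
-- Per-table size in the 'public' schema, largest first.
-- pg_total_relation_size counts heap + indexes + TOAST;
-- use pg_relation_size instead for the heap alone.
SELECT c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```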


Can anybody provide some guidance here? Thanks in advance.


Best regards,
Touseef

James Agnew

May 15, 2024, 4:36:46 AM
to HAPI FHIR
Hi Touseef,

The main driver of this is the huge number of search parameters (SPs) that FHIR enables by default. The FHIR specification defines a very large number of SPs, each of which requires index space if it is going to be supported. I've found this is particularly acute on the Patient and Observation resource types, so if your dataset consists mostly of one or both of these, you'll see a particularly large size multiplier unless you disable SPs.

In HAPI FHIR, the default SPs are automatically supported unless they are explicitly overridden by SearchParameter resources uploaded to the repository, so you can disable them by either:

- Uploading each default search parameter that you want to disable, with a status of "retired", or
- Disabling the setting that auto-supports the default search parameters, and only uploading the SPs you actually want to support
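As an illustration of the first option, here is a sketch of a retired override for the built-in `address` string SP (one of the parameters feeding `hfj_spidx_string`). The canonical `url`, `base` list, and `expression` shown are assumptions and should be checked against the FHIR R4 definition of that parameter, since the override needs to match the built-in SP for the server to treat it as a replacement:

```json
{
  "resourceType": "SearchParameter",
  "id": "individual-address",
  "url": "http://hl7.org/fhir/SearchParameter/individual-address",
  "name": "address",
  "status": "retired",
  "description": "Retired to disable string indexing of addresses",
  "code": "address",
  "base": ["Patient", "Person", "Practitioner", "RelatedPerson"],
  "type": "string",
  "expression": "Patient.address | Person.address | Practitioner.address | RelatedPerson.address"
}
```

You would upload this to the server like any other resource (e.g. a PUT to `[base]/SearchParameter/individual-address`), then reindex if existing data should shrink.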

Cheers,
James

Touseef Dev

May 21, 2024, 7:47:28 AM
to HAPI FHIR
Thank you, James, for the prompt response.

I will try disabling the unnecessary search parameters and see how it plays out.

Best,
Touseef
