HAPI FHIR Database Size increasing 7x compared to input data size


Touseef Dev

May 15, 2024, 4:25:14 AM
to HAPI FHIR
Hi all,


I am trying to evaluate how much storage space input NDJSON data of a given size occupies in the HAPI FHIR database.
In my testing on HAPI FHIR 6.10.3 with a Postgres DB, I posted 1000 Patient resources (2.3 MB on disk), and the difference in database storage size before and after posting came out to 17 MB (a ~7x increase). I tried two more times with the same data and saw similar behavior.

I am calculating the database size with the following query on Postgres:

SELECT pg_size_pretty(pg_database_size('hapi'));


The per-table data sizes inside the database are given below:

Table                       | Data Size (MB)
----------------------------+---------------
hfj_spidx_string            |  5.27
hfj_res_ver                 |  3.24
hfj_spidx_token             |  2.91
hfj_res_ver_prov            |  1.40
hfj_spidx_date              |  1.18
hfj_res_link                |  1.18
hfj_resource                |  0.80
hfj_res_tag                 |  0.39
hfj_history_tag             |  0.38
hfj_search                  |  0.01
hfj_resource_modified       |  0.01
hfj_subscription_stats      |  0.01
hfj_search_result           |  0.01
hfj_revinfo                 |  0.01
hfj_spidx_uri               |  0.01
mpi_link_aud                |  0.01
mpi_link                    |  0.01
hfj_tag_def                 |  0.01
hfj_forced_id               |  0.01
hfj_blk_import_jobfile      |  0.00
hfj_spidx_coords            |  0.00
hfj_search_include          |  0.00
hfj_idx_cmp_string_uniq     |  0.00
hfj_blk_export_job          |  0.00
trm_valueset_concept        |  0.00
trm_concept_map_grp_element |  0.00
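For anyone reproducing this, a per-table breakdown like the one above can be obtained with a query along these lines (a sketch; note that `pg_total_relation_size` includes indexes and TOAST data, so its totals will be larger than heap-only figures, and the `public` schema name is an assumption about the HAPI deployment):

```sql
-- Per-table size in the 'public' schema, largest first.
-- pg_total_relation_size counts heap + indexes + TOAST;
-- use pg_relation_size instead for the heap alone.
SELECT c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```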


Can anybody provide some guidance here? Thanks in advance.


Best regards,
Touseef

James Agnew

May 15, 2024, 4:36:46 AM
to HAPI FHIR
Hi Touseef,

The main driver of this is the huge number of search parameters (SPs) that FHIR enables by default. The FHIR specification defines a very large number of SPs, each of which requires index space if it is going to be supported. I've found this is particularly acute on the Patient and Observation resource types, so if your dataset consists mostly of one or both of these, you'll see a particularly large size multiplier unless you disable SPs.

In HAPI FHIR, the default SPs are automatically supported unless they are explicitly overridden by SearchParameter resources uploaded to the repository, so you can disable them by either:

- Uploading each default search parameter that you want to disable, with a status of "retired", or
- Disabling the setting that auto-supports the default search parameters, and only uploading the SPs you actually want to support
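As an illustration of the first option, here is a sketch of a retired override for the built-in `address` string SP (one of the parameters feeding `hfj_spidx_string`). The canonical `url`, `base` list, and `expression` shown are assumptions and should be checked against the FHIR R4 definition of that parameter, since the override needs to match the built-in SP for the server to treat it as a replacement:

```json
{
  "resourceType": "SearchParameter",
  "id": "individual-address",
  "url": "http://hl7.org/fhir/SearchParameter/individual-address",
  "name": "address",
  "status": "retired",
  "description": "Retired to disable string indexing of addresses",
  "code": "address",
  "base": ["Patient", "Person", "Practitioner", "RelatedPerson"],
  "type": "string",
  "expression": "Patient.address | Person.address | Practitioner.address | RelatedPerson.address"
}
```

You would upload this to the server like any other resource (e.g. a PUT to `[base]/SearchParameter/individual-address`), then reindex if existing data should shrink.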

Cheers,
James

Touseef Dev

May 21, 2024, 7:47:28 AM
to HAPI FHIR
Thank you, James, for the prompt response.

I will try disabling the unnecessary search parameters and see how it plays out.

Best,
Touseef
