Hi,
BACKGOUND:
Our organization has been using the 5.6.3 HAPI-FHIR JPA server for storage / analysis of patient generated health data (PGHD). This PGHD comes from a variety of source .. Fitbits, Garmin, Dexcom, and other wearable devices. It is streamed into the HAPI-FHIR server throughout the day. Independent and/or sample series measurements are stored as FHIR R4 observations that reference back to the patient they are associated with. This productized operation has been happening for almost 2 years to-date Today there are 30M+ observations spread across 30K+ patients in our HAPI-FHIR implementation. Not all patients have the same count of observations .. it varies widely.. some patients have handful of observations and some have 20K+ attached.
THE PROBLEM:
It has recently come to our attention that one of our applications that injects measurements has been recording them with the wrong measurement collection date-time. The application has fixed the flaw and would like to replay what measurement data it can from archived sources. In preparation for this replay… we are trying to completely remove the bad observations from our HAPI-FHIR database. The staff and researchers using the HAPI-FHIR data feel it best that ALL the known to be bad measurements be completely scrubbed from the system. We have a complete list of observations by resource id to be removed. The list is 10M observations in length. You read this correct... this is 1/3 of the present observations that we are trying to remove from an active production HAPI-FHIR service. We have a script that calls the HAPI-FHIR API for each individual bad observation resource to soft delete and then hard delete it (i.e. expunge). The expunge action is done exactly as documented in 5.15.1 and 5.15.5… see instance-level expunge. The complexities of the HAPI-FHIR JPA scheme appear that it would be better to do this thru HAPI-FHIR mechanics .. than for my team to hack away at the underpinning resource, version, and spidx database tables… and .. hope for the best.
IT IS VERY SLOW:
We began the delete operation last Friday. Over 2+ days .. the deletion / expunge rate of execution was roughly 65K observations per day. Deletion and expunge is occurring… however, it is unbearably slow. We have a 4 node HAPI-FHIR cluster backed by a shared Oracle database (i.e. not Oracle RAC). The FHIR and Oracle nodes are not under stress … we see no signs that any system or database resources are being exhausted. We know that the delete / expunge API calls are the cause of the slowness. We have a dry-run mechanism built into our script that goes thru every step .. except for the delete / expunge API calls. Dry-run mode executes quickly.
TODAY we stopped the job this morning and altered it to push 5X more delete / expunge operations in parallel at the HAPI-FHIR service. This has been running for 6+ hours.. so far this change does not look to have improved anything. I will post more tomorrow.
If anyone has any thoughts / suggestions.. that would be much appreciated… even if the comments are confirming that it is expected to be slow.
An alternate way to accomplish the above may be to extract / copy out the good observation data and insert it into a new HAPI-FHIR R4 service / database set-up… and .. then switch the new copy into production operation once the copy out action is completed. We have migrated 20M+ resources before in this way to upgrade existing FHIR resource data from DSTU2 to R4 format. This took roughly 3.5 days. In comparison… the current in-place delete / expunge rate of 65K per day for 10M resource .. is projected to be complete in .. wait for it .. 10M / 65 / day = 153+ days .. :^(.
It appears to me that the operation of HAPI-FHIR delete and expunge is burdened with far more things to do than insertion is.
Regards
Chuck
--

--
You received this message because you are subscribed to the Google Groups "HAPI FHIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hapi-fhir+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hapi-fhir/AD8284D4-2251-4A18-AE3D-DD08D13952C1%40aboutobjects.com.