Migrating data from MySQL to PostgreSQL

340 views
Skip to first unread message

Joel Schneider

unread,
Aug 30, 2023, 11:05:05 PM8/30/23
to HAPI FHIR
Considering that MySQL is no longer a recommended database platform for HAPI FHIR, I'm interested in advice or recommendations on how to migrate our (HAPI FHIR) MySQL data into a (HAPI FHIR) PostgreSQL database that could be used going forward.

Some ideas/alternatives are ...

1. Generate an empty PostgreSQL hapi-fhir schema, then develop ETL process(es) to extract data from MySQL and load it into PostgreSQL.

2. Generate a mysqldump file. then develop a process to convert it into a format that can be bulk loaded into PostgreSQL.

3. Evaluate one or more of the database conversion tools listed on the PostgreSQL wiki (https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL).

Does anyone here have experience / advice / recommendations to offer about how to best perform this migration?

Joel Schneider

unread,
Oct 31, 2023, 8:20:38 PM10/31/23
to HAPI FHIR
Here's a long post containing a few learnings from further exploration of the MySQL to PostgreSQL migration topic.  Would welcome comments, feedback, or further discussion.

The PostgreSQL wiki has an article about converting from other databases.

- https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL

The open source pgloader tool looked promising, but I encountered problems when attempting to use it.  Not sure if pgloader is still actively maintained.  With Postgres 15 it reported a connection error, but it was able to connect to Postgres 12.  With MySQL it reported a connection error unless the MySQL server used the mysql_native_password authentication plugin.  I gave up after that, when it reported the error "ERROR mysql: 76 fell through ECASE expression" after connecting to a MySQL 8 database.

My efforts with the open source NMIG tool were more successful.  This tool was able to migrate a MySQL 8 hapi-fhir database to Postgres 15.

- https://github.com/AnatolyUss/NMIG

However, there are noteworthy differences between the Postgres schema created by NMIG and the schema created by hapi-fhir-cli (version 6.6.2).  Two of the notable differences are:

- For BLOB and CLOB data fields, the Postgres schema created by hapi-fhir-cli uses the oid data type, whereas the NMIG schema uses the bytea and text data types.  NMIG converted MySQL longblob fields to Postgres bytea and MySQL longtext to Postgres text.
- For sequence number generators, the Postgres schema created by hapi-fhir-cli uses CREATE SEQUENCE, but the NMIG schema only contains Postgres versions of the MySQL seq_* tables.

Looking further into the bytea and text data types (for very large field values), Postgres uses "The Oversized-Attribute Storage Technique" (TOAST) for these types.

- https://www.postgresql.org/docs/current/storage-toast.html
- https://wiki.postgresql.org/wiki/TOAST

TOAST offers some advantages over the older method of using the oid data type with Postgres large objects.

- TOAST only generates an OID for large values moved "out of line" (e.g. larger than 2 Kb).
- TOAST can store 2^32 (4 billion) out-of-line values per table, but regular OIDs are limited to 2^32 per database.

The limit of 2^32 OIDs per database has been identified as a potential issue when using Smile CDR (or hapi-fhir) with a Postgres database.

- https://smilecdr.com/docs/database_administration/troubleshooting_postgresql.html#managing-fhir-resource-body-storage
- https://support.smilecdr.com/space/SK/2703556748/Postgres+Database+OID+Issue+-+Potential+Server+Unresponsive+Problem+With+Very+Large+Databases

Considering the advantages offered by the bytea and text data types, I decided to explore the possibility of using a custom Hibernate Dialog to override the default PostgreSQL10Dialog behavior of using the oid data type (and large objects) for BLOB and CLOB data.  With the help of information from the following Stack Overflow article, I created a (Hibernate 5, hapi-fhir 6.6.2) PostgreSQL10ToastDialog that maps Types.BLOB to "bytea" and Types.CLOB to "text".

- https://stackoverflow.com/questions/75042081/hibernate-6-postgres-and-bytea

This approach looks promising.  A test hapi-fhir server using the PostgreSQL10ToastDialog was able to start up and pass a variety of smoke tests.

The test hapi-fhir (v6.6.2) server was based on hapi-fhir-jpaserver-starter, using a PostgreSQL database schema initialized by hapi-fhir-cli but modified to use the bytea data type for the @Lob byte[] and @Lob Blob fields (blob_data, job_contents, res_text, search_param_map, prop_val_lob) and to use the text data type for the @Lob String fields (params_json_lob, report, chunk_data, search_query_string, parent_pids, source_direct_parent_pids).

Going forward, I intend to proceed with a Postgres database that uses the bytea and text data types, instead of the (Hibernate default) oid data type.  Not sure if this is a viable future option for hapi-fhir in general, but if there's interest I'd be happy to submit a pull request containing my PostgreSQL10ToastDialog code and unit test.

Joel Schneider

unread,
Oct 31, 2023, 8:28:56 PM10/31/23
to HAPI FHIR
One more detail:  It may be best to also set hibernate.jdbc.use_streams_for_binary to true when using bytea.

Joel Schneider

unread,
Jun 28, 2024, 10:32:12 AM (8 days ago) Jun 28
to HAPI FHIR
Attached is some code related to the PostgreSQL migration:  a SQL script to create a HAPI-FHIR 6.10.2 PostgreSQL schema using TOAST data types, and Java code for a Hibernate 5 PostgreSQL10ToastDialect.
hapi-fhir-6.10.2-cli-baseline-postgres-toast.sql
PostgreSQL10ToastDialect.java
Reply all
Reply to author
Forward
0 new messages