Here's a long post containing a few learnings from further exploration of the MySQL to PostgreSQL migration topic. Would welcome comments, feedback, or further discussion.
The PostgreSQL wiki has an article about converting from other databases.
-
https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQLThe open source pgloader tool looked promising, but I encountered problems when attempting to use it. Not sure if pgloader is still actively maintained. With Postgres 15 it reported a connection error, but it was able to connect to Postgres 12. With MySQL it reported a connection error unless the MySQL server used the mysql_native_password authentication plugin. I gave up after that, when it reported the error "ERROR mysql: 76 fell through ECASE expression" after connecting to a MySQL 8 database.
My efforts with the open source NMIG tool were more successful. This tool was able to migrate a MySQL 8 hapi-fhir database to Postgres 15.
-
https://github.com/AnatolyUss/NMIGHowever, there are noteworthy differences between the Postgres schema created by NMIG and the schema created by hapi-fhir-cli (version 6.6.2). Two of the notable differences are:
- For BLOB and CLOB data fields, the Postgres schema created by hapi-fhir-cli uses the oid data type, whereas the NMIG schema uses the bytea and text data types. NMIG converted MySQL longblob fields to Postgres bytea and MySQL longtext to Postgres text.
- For sequence number generators, the Postgres schema created by hapi-fhir-cli uses CREATE SEQUENCE, but the NMIG schema only contains Postgres versions of the MySQL seq_* tables.
Looking further into the bytea and text data types (for very large field values), Postgres uses "The Oversized-Attribute Storage Technique" (TOAST) for these types.
-
https://www.postgresql.org/docs/current/storage-toast.html-
https://wiki.postgresql.org/wiki/TOASTTOAST offers some advantages over the older method of using the oid data type with Postgres large objects.
- TOAST only generates an OID for large values moved "out of line" (e.g. larger than 2 Kb).
- TOAST can store 2^32 (4 billion) out-of-line values per table, but regular OIDs are limited to 2^32 per database.
The limit of 2^32 OIDs per database has been identified as a potential issue when using Smile CDR (or hapi-fhir) with a Postgres database.
-
https://smilecdr.com/docs/database_administration/troubleshooting_postgresql.html#managing-fhir-resource-body-storage-
https://support.smilecdr.com/space/SK/2703556748/Postgres+Database+OID+Issue+-+Potential+Server+Unresponsive+Problem+With+Very+Large+DatabasesConsidering the advantages offered by the bytea and text data types, I decided to explore the possibility of using a custom Hibernate Dialog to override the default PostgreSQL10Dialog behavior of using the oid data type (and large objects) for BLOB and CLOB data. With the help of information from the following Stack Overflow article, I created a (Hibernate 5, hapi-fhir 6.6.2) PostgreSQL10ToastDialog that maps Types.BLOB to "bytea" and Types.CLOB to "text".
-
https://stackoverflow.com/questions/75042081/hibernate-6-postgres-and-byteaThis approach looks promising. A test hapi-fhir server using the PostgreSQL10ToastDialog was able to start up and pass a variety of smoke tests.
The test hapi-fhir (v6.6.2) server was based on hapi-fhir-jpaserver-starter, using a PostgreSQL database schema initialized by hapi-fhir-cli but modified to use the bytea data type for the @Lob byte[] and @Lob Blob fields (blob_data, job_contents, res_text, search_param_map, prop_val_lob) and to use the text data type for the @Lob String fields (params_json_lob, report, chunk_data, search_query_string, parent_pids, source_direct_parent_pids).
Going forward, I intend to proceed with a Postgres database that uses the bytea and text data types, instead of the (Hibernate default) oid data type. Not sure if this is a viable future option for hapi-fhir in general, but if there's interest I'd be happy to submit a pull request containing my PostgreSQL10ToastDialog code and unit test.