Migrating data from MySQL to PostgreSQL

Joel Schneider

Aug 30, 2023, 11:05:05 PM
to HAPI FHIR
Considering that MySQL is no longer a recommended database platform for HAPI FHIR, I'm interested in advice or recommendations on how to migrate our (HAPI FHIR) MySQL data into a (HAPI FHIR) PostgreSQL database that could be used going forward.

Some ideas/alternatives are ...

1. Generate an empty PostgreSQL hapi-fhir schema, then develop ETL process(es) to extract data from MySQL and load it into PostgreSQL.

2. Generate a mysqldump file, then develop a process to convert it into a format that can be bulk-loaded into PostgreSQL.

3. Evaluate one or more of the database conversion tools listed on the PostgreSQL wiki (https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL).

Does anyone here have experience / advice / recommendations to offer about how to best perform this migration?

Joel Schneider

Oct 31, 2023, 8:20:38 PM
to HAPI FHIR
Here's a long post containing a few learnings from further exploration of the MySQL to PostgreSQL migration topic.  I'd welcome comments, feedback, or further discussion.

The PostgreSQL wiki has an article about converting from other databases.

- https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL

The open source pgloader tool looked promising, but I ran into problems when attempting to use it, and I'm not sure whether pgloader is still actively maintained.  With Postgres 15 it reported a connection error, though it was able to connect to Postgres 12.  With MySQL it reported a connection error unless the MySQL server used the mysql_native_password authentication plugin.  I gave up when, after connecting to a MySQL 8 database, it reported the error "ERROR mysql: 76 fell through ECASE expression".
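For reference, the basic pgloader invocation for this kind of migration takes source and target connection strings (the host names, credentials, and database names below are placeholders, not the actual setup from this thread):

```shell
# Migrate a MySQL database into an (empty) PostgreSQL database.
# Connection strings are illustrative placeholders.
pgloader mysql://hapi:SECRET@mysql-host/hapi_fhir \
         pgsql://hapi:SECRET@pg-host:5432/hapi_fhir
```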

My efforts with the open source NMIG tool were more successful.  This tool was able to migrate a MySQL 8 hapi-fhir database to Postgres 15.

- https://github.com/AnatolyUss/NMIG

However, there are noteworthy differences between the Postgres schema created by NMIG and the schema created by hapi-fhir-cli (version 6.6.2).  Two of the notable differences are:

- For BLOB and CLOB data fields, the Postgres schema created by hapi-fhir-cli uses the oid data type, whereas the NMIG schema uses the bytea and text data types.  NMIG converted MySQL longblob fields to Postgres bytea and MySQL longtext to Postgres text.
- For sequence number generators, the Postgres schema created by hapi-fhir-cli uses CREATE SEQUENCE, but the NMIG schema only contains Postgres versions of the MySQL seq_* tables.

Looking further into the bytea and text data types: for very large field values of these types, Postgres uses "The Oversized-Attribute Storage Technique" (TOAST).

- https://www.postgresql.org/docs/current/storage-toast.html
- https://wiki.postgresql.org/wiki/TOAST

TOAST offers some advantages over the older method of using the oid data type with Postgres large objects.

- TOAST only generates an OID for large values moved "out of line" (e.g. larger than 2 Kb).
- TOAST can store 2^32 (4 billion) out-of-line values per table, but regular OIDs are limited to 2^32 per database.

The limit of 2^32 OIDs per database has been identified as a potential issue when using Smile CDR (or hapi-fhir) with a Postgres database.

- https://smilecdr.com/docs/database_administration/troubleshooting_postgresql.html#managing-fhir-resource-body-storage
- https://support.smilecdr.com/space/SK/2703556748/Postgres+Database+OID+Issue+-+Potential+Server+Unresponsive+Problem+With+Very+Large+Databases

Considering the advantages offered by the bytea and text data types, I decided to explore the possibility of using a custom Hibernate Dialect to override the default PostgreSQL10Dialect behavior of using the oid data type (and large objects) for BLOB and CLOB data.  With the help of information from the following Stack Overflow article, I created a (Hibernate 5, hapi-fhir 6.6.2) PostgreSQL10ToastDialect that maps Types.BLOB to "bytea" and Types.CLOB to "text".

- https://stackoverflow.com/questions/75042081/hibernate-6-postgres-and-bytea
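A minimal Hibernate 5 dialect along those lines could look like the sketch below (a reconstruction of the approach described above, not the actual attached code):

```java
import java.sql.Types;

import org.hibernate.dialect.PostgreSQL10Dialect;

// Map LOB columns to TOAST-able types (bytea/text) instead of the
// default oid (large object) mapping.
public class PostgreSQL10ToastDialect extends PostgreSQL10Dialect {
    public PostgreSQL10ToastDialect() {
        super();
        registerColumnType(Types.BLOB, "bytea");
        registerColumnType(Types.CLOB, "text");
    }
}
```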

This approach looks promising.  A test hapi-fhir server using the PostgreSQL10ToastDialect was able to start up and pass a variety of smoke tests.

The test hapi-fhir (v6.6.2) server was based on hapi-fhir-jpaserver-starter, using a PostgreSQL database schema initialized by hapi-fhir-cli but modified to use the bytea data type for the @Lob byte[] and @Lob Blob fields (blob_data, job_contents, res_text, search_param_map, prop_val_lob) and to use the text data type for the @Lob String fields (params_json_lob, report, chunk_data, search_query_string, parent_pids, source_direct_parent_pids).
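For an existing hapi-fhir-cli-generated schema, the per-column modification amounts to statements of the following form (a sketch using hfj_search.search_param_map, one of the columns listed above; on a freshly initialized, empty schema the USING clause is moot):

```sql
-- Change an oid (large object) column to bytea; lo_get() dereferences
-- existing large-object OIDs, which only matters if rows already exist.
ALTER TABLE hfj_search
    ALTER COLUMN search_param_map TYPE bytea USING lo_get(search_param_map);
```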

Going forward, I intend to proceed with a Postgres database that uses the bytea and text data types, instead of the (Hibernate default) oid data type.  Not sure if this is a viable future option for hapi-fhir in general, but if there's interest I'd be happy to submit a pull request containing my PostgreSQL10ToastDialog code and unit test.

Joel Schneider

Oct 31, 2023, 8:28:56 PM
to HAPI FHIR
One more detail:  It may be best to also set hibernate.jdbc.use_streams_for_binary to true when using bytea.
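In hapi-fhir-jpaserver-starter both settings can be supplied through Spring's JPA property pass-through, e.g. (the dialect's package/class name here is a placeholder for wherever your custom dialect lives):

```properties
# Pass-through Hibernate settings (sketch; adjust to your configuration style)
spring.jpa.properties.hibernate.dialect=com.example.PostgreSQL10ToastDialect
spring.jpa.properties.hibernate.jdbc.use_streams_for_binary=true
```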

Joel Schneider

Jun 28, 2024, 10:32:12 AM
to HAPI FHIR
Attached is some code related to the PostgreSQL migration:  a SQL script to create a HAPI-FHIR 6.10.2 PostgreSQL schema using TOAST data types, and Java code for a Hibernate 5 PostgreSQL10ToastDialect.
hapi-fhir-6.10.2-cli-baseline-postgres-toast.sql
PostgreSQL10ToastDialect.java

Chit Zin Win

Nov 4, 2024, 4:19:27 PM
to HAPI FHIR
hi Joel,

thank you for sharing this. I am in the same boat, migrating HAPI 5.4 on MySQL 5.7 to Aurora Postgres 15.7.

I used pgloader with the attached config. The MySQL schema to Postgres database migration went okay. Then I executed the following PL/pgSQL to convert the seq_* tables to Postgres sequence objects:

DO $$
DECLARE
    seq_table RECORD;
    new_sequence_name TEXT;
BEGIN
    FOR seq_table IN
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'public' AND table_name LIKE 'seq_%'
    LOOP
        -- Define the new sequence name
        new_sequence_name := seq_table.table_name;

        -- Create a new sequence using the max value from the seq_ table
        EXECUTE format('
            CREATE SEQUENCE public.%I
            START WITH 1
            INCREMENT BY 1
            MINVALUE 1
            OWNED BY NONE;', new_sequence_name);

        -- Set the sequence to start from the max value in the seq_ table
        EXECUTE format('
            SELECT setval(''%I'', (SELECT COALESCE(MAX(next_val), 1) FROM public.%I));',
            new_sequence_name, seq_table.table_name);

        -- Drop the original seq_ table
        EXECUTE format('DROP TABLE public.%I;', seq_table.table_name);
    END LOOP;
END $$;



When I hit any resource with a GET request (localhost:8080/fhir/<ResourceType>), I get the following OperationOutcome. It appears to be a data type incompatibility between MySQL and Postgres. Is that something I can tackle with your PostgreSQL10ToastDialect.java?


{
  "resourceType": "OperationOutcome",
  "text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\"><h1>Operation Outcome</h1><table border=\"0\"><tr><td style=\"font-weight: bold;\">ERROR</td><td>[]</td><td><pre>ERROR: column &quot;search_param_map&quot; is of type bytea but expression is of type oid\n  Hint: You will need to rewrite or cast the expression.\n  Position: 407</pre></td>\n\t\t\t\t\t\n\t\t\t\t\n\t\t\t</tr>\n\t\t</table>\n\t</div>"
  },
  "issue": [
    {
      "severity": "error",
      "code": "processing",
      "diagnostics": "ERROR: column \"search_param_map\" is of type bytea but expression is of type oid\n  Hint: You will need to rewrite or cast the expression.\n  Position: 407"
    }
  ]
}

Thanks,
Chit

Joel Schneider

Nov 5, 2024, 12:10:04 AM
to HAPI FHIR
Hi Chit,

The hapi-fhir-6.10.2-cli-baseline-postgres-toast.sql PostgreSQL schema I posted above is slightly different from the schema generated by hapi-fhir-cli.  It changes the data types of some columns from oid to bytea or text (TOAST data types).  Migration tools such as pgloader and NMIG seem to prefer using the TOAST data types.  However, the default Hibernate PostgreSQL dialect used by hapi-fhir expects those columns to use the oid data type.  To have Hibernate use TOAST data types for those columns, you could modify hapi-fhir to use the above PostgreSQL10ToastDialect (for Hibernate 5).  For newer versions of hapi-fhir, see the above linked Stack Overflow article for info about how to create a similar Hibernate dialect for Hibernate 6.
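For Hibernate 6, the equivalent approach (an untested sketch along the lines of that Stack Overflow discussion; double-check the method signature against your Hibernate version) is to override the dialect's column type mapping:

```java
import org.hibernate.dialect.PostgreSQLDialect;
import org.hibernate.type.SqlTypes;

// Hibernate 6 sketch: map BLOB/CLOB columns to bytea/text instead of oid.
public class PostgreSQLToastDialect extends PostgreSQLDialect {
    @Override
    protected String columnType(int sqlTypeCode) {
        switch (sqlTypeCode) {
            case SqlTypes.BLOB: return "bytea";
            case SqlTypes.CLOB: return "text";
            default: return super.columnType(sqlTypeCode);
        }
    }
}
```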

Joel

Chit Zin Win

Nov 5, 2024, 2:52:08 PM
to HAPI FHIR
Hi Joel,

Thanks again for the quick response.

I looked carefully and found exactly what you mentioned, e.g. the hfj_search.search_param_map field populated by hapi-fhir is oid, while the one migrated via pgloader is bytea. I incorporated your PostgreSQL10ToastDialect into my HAPI JPA config. Btw, my Hibernate version is 5.4.14.Final.

Right away the API worked as expected for the first call, but subsequent calls for any resource failed with the error below:


{
  "resourceType": "OperationOutcome",
  "text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\"><h1>Operation Outcome</h1><table border=\"0\"><tr><td style=\"font-weight: bold;\">ERROR</td><td>[]</td><td><pre>ERROR: duplicate key value violates unique constraint &quot;idx_77121_primary&quot;\n  Detail: Key (pid)=(-41) already exists.</pre></td>\n\t\t\t\t\t\n\t\t\t\t\n\t\t\t</tr>\n\t\t</table>\n\t</div>"

  },
  "issue": [
    {
      "severity": "error",
      "code": "processing",
      "diagnostics": "ERROR: duplicate key value violates unique constraint \"idx_77121_primary\"\n  Detail: Key (pid)=(-41) already exists."
    }
  ]
}

I did preserve index names during migration, but it appears the indexes are gone. So I reindexed with the legacy reindex operation, $mark-all-resources-for-reindexing, which is the only method available in HAPI 5.2 ($reindex was added in HAPI 5.5). I confirmed the reindexing completed by checking hfj_res_reindex_job.reindex_count. Now the same request ends up with the following error:

{
  "resourceType": "OperationOutcome",
  "text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\"><h1>Operation Outcome</h1><table border=\"0\"><tr><td style=\"font-weight: bold;\">ERROR</td><td>[]</td><td><pre>A different object with the same identifier value was already associated with the session : [ca.uhn.fhir.jpa.entity.SearchResult#-30]</pre></td>\n\t\t\t\t\t\n\t\t\t\t\n\t\t\t</tr>\n\t\t</table>\n\t</div>"

  },
  "issue": [
    {
      "severity": "error",
      "code": "processing",
      "diagnostics": "A different object with the same identifier value was already associated with the session : [ca.uhn.fhir.jpa.entity.SearchResult#-30]"
    }
  ]
}

Chit

Joel Schneider

Nov 5, 2024, 4:15:34 PM
to HAPI FHIR
To configure the PostgreSQL sequences after migrating the MySQL data, I'd suggest running the attached script under MySQL and then running the script's output under PostgreSQL.
my-migrate-sequences.sql
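The idea is to have MySQL emit the matching setval() calls. A sketch of the pattern for a single sequence table (seq_resource_id shown as one example; the attached script presumably generalizes across all seq_* tables):

```sql
-- Run under MySQL; the emitted statement is then executed under PostgreSQL
-- to advance the corresponding sequence to the MySQL next_val.
SELECT CONCAT('SELECT setval(''seq_resource_id'', ', next_val, ');')
FROM seq_resource_id;
```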

Chit Zin Win

Nov 6, 2024, 4:00:20 PM
to HAPI FHIR
Hi Joel, 

I ran your script against MySQL and used its output to populate the Postgres sequence objects in my PG datasource with the next_val numbers from the MySQL seq_* tables.

Still, none of the resource endpoints respond consistently. I kept getting these errors, with different numbers in (pid)=(<>) and SearchResult#<>:

  • ERROR: duplicate key value violates unique constraint \"idx_79006_primary\"\n  Detail: Key (pid)=(-15) already exists."
  • "A different object with the same identifier value was already associated with the session : [ca.uhn.fhir.jpa.entity.SearchResult#-4]"  

I'm not sure whether I should upgrade my HAPI version to 5.5 on the MySQL side and migrate from there, so I can use $reindex on Postgres. I even tried with hapi-fhir-cli 5.5 and ended up getting this:

Caused by: org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [create sequence SEQ_VALUESET_PID start 1 increment 50]; nested exception is org.postgresql.util.PSQLException: ERROR: relation "seq_valueset_pid" already exists

Thanks,
Chit

Joel Schneider

Nov 6, 2024, 11:23:26 PM
to HAPI FHIR
Hi Chit,

I can't comment much on the specific details of your setup, but can offer a few thoughts or ideas that may be helpful ...
  • Use a migration tool such as pgloader or NMIG to migrate MySQL data into a new, empty PostgreSQL database.
  • Use hapi-fhir-cli to initialize a hapi-fhir PostgreSQL schema in a different, new and empty PostgreSQL database.  Newer versions of hapi-fhir-cli may work better for this, because the hapi-fhir team eventually reworked the migrate-database logic to use custom code instead of Flyway.  E.g.:
    ./hapi-fhir-cli migrate-database --strict-order -d POSTGRES_9_4 -n myuser -p SECRET -u "jdbc:postgresql://localhost:5432/fhir"
  • Verify the PostgreSQL database created by hapi-fhir-cli is usable with hapi-fhir server.
  • Evaluate discrepancies between the PostgreSQL schema created by the migration tool (e.g., pgloader or NMIG) and the PostgreSQL schema created by hapi-fhir-cli.
    • Consider revising the PostgreSQL schema created by hapi-fhir-cli to use bytea and text (TOAST) data types instead of the oid data type (and then use custom "TOAST" Hibernate dialect with hapi-fhir).
  • Use pg_dump to create a "custom" archive of the PostgreSQL database created by the migration tool.  E.g.:
    pg_dump --username=pguser --host $PG_HOSTNAME --format custom --file fhir-migrated-from-mysql.pgdump fhir_migrated_from_mysql
  • Use pg_restore to load data from dump file into the (target) PostgreSQL database initialized by hapi-fhir-cli ...
    • Use pg_restore --list to produce file containing list of TABLE DATA elements to be restored from the archive.  E.g.:
      pg_restore --list fhir-migrated-from-mysql.pgdump | grep 'TABLE DATA' | grep -v fly_hfj_migration | grep -v seq_ > fhir-data-list.txt
    • Manually edit the list file, moving the lines for the following 7 items to the top of the file:  hfj_blk_export_job, hfj_blk_export_collection, hfj_resource, hfj_res_ver, hfj_tag_def, trm_valueset, trm_valueset_concept.
    • (If needed, drop circular foreign key references in target PostgreSQL database.)
    • Use pg_restore to load data (only) from pg_dump archive into target PostgreSQL database, as specified by the manually edited list file.  E.g.:
      pg_restore --username pguser --host $PG_HOSTNAME --data-only --use-list fhir-data-list.txt --dbname fhir fhir-migrated-from-mysql.pgdump
    • (If needed, recreate foreign key references that were dropped from the target PostgreSQL database.)
  • Update sequence values in target PostgreSQL database to match values from MySQL seq_* tables.
  • Compare MySQL and PostgreSQL databases - sequence values and number of rows in each data table should match.
I hope that helps.

Joel
