My target DW is Redshift so I was thinking of the following:
- Compatibility NONE - file rolls every time there's a schema change.
- A modified version of the S3 Sink Connector that:
  - adds the schema version to the filename.
  - has one or more loader threads that periodically look for new files and sort them by table name and offset for loading. Once a schema change is observed, the loader looks up the DDL in the "schema change" topic.
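To make the loader idea concrete, here is a rough sketch of the sorting step. It assumes a hypothetical object key layout of `<topic>+<partition>+<startOffset>+v<schemaVersion>.<ext>` (the stock S3 sink naming with a version suffix added by the modified connector) and one partition per table topic. The names `parse_key` and `load_plan` are made up for illustration, and the first version seen for a table is treated as the baseline.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Hypothetical key layout: <topic>+<partition>+<startOffset>+v<schemaVersion>.<ext>
KEY_RE = re.compile(
    r"^(?P<topic>.+)\+(?P<partition>\d+)\+(?P<offset>\d+)\+v(?P<version>\d+)\.\w+$"
)


class DataFile(NamedTuple):
    topic: str
    partition: int
    offset: int
    schema_version: int
    key: str


def parse_key(key: str) -> DataFile:
    """Parse the last path segment of an S3 object key into its parts."""
    name = key.rsplit("/", 1)[-1]
    m = KEY_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected object key: {key!r}")
    return DataFile(
        m["topic"], int(m["partition"]), int(m["offset"]), int(m["version"]), key
    )


def load_plan(keys: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Order files by table and offset; flag a DDL step when the version changes."""
    files = sorted(
        (parse_key(k) for k in keys),
        key=lambda f: (f.topic, f.partition, f.offset),
    )
    last_version: Dict[str, int] = {}
    plan: List[Tuple[str, str, str]] = []
    for f in files:
        prev = last_version.get(f.topic)
        # Assumption: the first version seen for a table is already in Redshift.
        if prev is not None and f.schema_version != prev:
            plan.append(("apply_ddl", f.topic, f"v{f.schema_version}"))
        last_version[f.topic] = f.schema_version
        plan.append(("load", f.topic, f.key))
    return plan
```

Each `apply_ddl` step marks where the loader would have to consult the "schema change" topic before loading the next file.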
To map the MySQL DDL to Redshift DDL, I plan on using the
MySqlAntlrDdlParser and mapping column types according to
this by Amazon.
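The type-mapping step could look roughly like the sketch below. The table is illustrative only and is not Amazon's mapping; the function name `to_redshift_type`, the choice to widen unsigned integers, and the 4-bytes-per-character sizing for strings are all my own assumptions. It works on a type string such as `int(10) unsigned`, however that string is obtained from the parsed DDL.

```python
import re

# Illustrative mapping only; check it against Amazon's guidance before use.
SIMPLE_TYPES = {
    "TINYINT": "SMALLINT",  # Redshift has no TINYINT
    "SMALLINT": "SMALLINT",
    "MEDIUMINT": "INTEGER",
    "INT": "INTEGER",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "FLOAT": "REAL",
    "DOUBLE": "DOUBLE PRECISION",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP",
    "TEXT": "VARCHAR(65535)",
}

# Assumption: unsigned columns are widened so the full range still fits.
UNSIGNED_TYPES = {
    "SMALLINT": "INTEGER",
    "MEDIUMINT": "INTEGER",
    "INT": "BIGINT",
    "INTEGER": "BIGINT",
    "BIGINT": "DECIMAL(20,0)",
}

TYPE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(([^)]*)\))?")


def to_redshift_type(mysql_type: str) -> str:
    """Translate a MySQL column type string into a Redshift type string."""
    m = TYPE_RE.match(mysql_type)
    if m is None:
        raise ValueError(f"cannot parse type: {mysql_type!r}")
    base = m.group(1).upper()
    args = m.group(2)
    if "UNSIGNED" in mysql_type.upper() and base in UNSIGNED_TYPES:
        return UNSIGNED_TYPES[base]
    if base in ("CHAR", "VARCHAR"):
        # MySQL lengths count characters, Redshift lengths count bytes.
        # Assumption: reserve 4 bytes per character, capped at 65535.
        chars = int(args) if args else 1
        return f"VARCHAR({min(chars * 4, 65535)})"
    if base in ("DECIMAL", "NUMERIC"):
        # MySQL defaults to DECIMAL(10,0) when no precision is given.
        return f"DECIMAL({args.replace(' ', '')})" if args else "DECIMAL(10,0)"
    if base in SIMPLE_TYPES:
        return SIMPLE_TYPES[base]
    raise ValueError(f"no mapping for MySQL type: {mysql_type!r}")
```

Raising on unknown types is deliberate here: a loader that silently guesses a column type is harder to debug than one that stops.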
The only thing that worries me now is "rapid" schema updates, or how to avoid synchronizing the table "version" with the events version...
I could maintain a "counter" of sorts that tells me the number of schema changes the table topic "saw" and apply only that number of DDLs from the schema change topic, but I'd rather avoid state, and I'm not sure if the Confluent Schema guarantees a sequence.
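For reference, the counter idea would need roughly the state sketched below. The class `DdlCounter` and its method names are hypothetical. It leans on two assumptions that may not hold in practice: DDLs for a table arrive in order on the schema change topic, and every schema change seen on the table topic corresponds to exactly one DDL.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class DdlCounter:
    """Apply only as many DDLs as the table topic has actually 'seen'."""

    def __init__(self) -> None:
        # DDLs read from the schema change topic but not yet applied, per table.
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)
        # Number of schema changes observed in the table topic, per table.
        self._seen: Dict[str, int] = defaultdict(int)
        # Number of DDLs already applied to Redshift, per table.
        self._applied: Dict[str, int] = defaultdict(int)

    def on_ddl(self, table: str, ddl: str) -> None:
        """Record a DDL read from the schema change topic."""
        self._pending[table].append(ddl)

    def on_schema_change(self, table: str) -> None:
        """Record that the table topic rolled to a new schema version."""
        self._seen[table] += 1

    def ddls_to_apply(self, table: str) -> List[str]:
        """Return, in order, the DDLs that are due and available right now."""
        due = self._seen[table] - self._applied[table]
        batch: List[str] = []
        while due > 0 and self._pending[table]:
            batch.append(self._pending[table].popleft())
            due -= 1
        self._applied[table] += len(batch)
        return batch
```

With "rapid" updates, several DDLs may queue up before the loader catches up; this only releases those the data has reached. The three dictionaries are exactly the state that would have to be persisted across restarts.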
Any suggestions?