Hi,
I just came across Debezium via the Kafka Connect Hub.
I'm really interested in the project because I've built many prototypes of MySQL CDC over the years and never been satisfied with any of them for production use.
So my question is: what are your current plans for supporting HA MySQL setups where the master can fail over (or, more generally, where the host Debezium is slaved to can go away)?
From looking at the code so far (I have not tried it yet), it seems you only support a single source server.
Note that LinkedIn, in their original Databus, solved this only by patching MySQL to get monotonically increasing transaction IDs in commit order. They have an interesting discussion of some of the pros and cons of possible approaches here:
https://github.com/linkedin/databus/wiki/Databus-for-MySQL
In recent MySQL/MariaDB versions, GTIDs help somewhat: they make it easier to uniquely identify transactions across replicas. There are still many subtleties to supporting failover robustly, though, and I've not seen a good solution to that yet.
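To make the GTID point a bit more concrete, here is a rough sketch of the kind of check I have in mind: before reconnecting to a different server after failover, confirm that everything already consumed is contained in that server's executed GTID set. This is only an illustration under assumptions. The function names are my own invention and not anything from Debezium, it handles only the MySQL textual format (`uuid:1-5:7-9,uuid2:1-3`, not MariaDB's domain-server-sequence GTIDs), and it ignores purged binlogs entirely. MySQL can do the same comparison server-side with `GTID_SUBSET()`.

```python
def _merge(intervals):
    """Sort inclusive (start, end) intervals and merge overlapping/adjacent ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3-4' into {uuid: [(start, end), ...]} (inclusive)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return {uuid: _merge(iv) for uuid, iv in result.items()}


def is_gtid_subset(consumed, available):
    """True if every transaction in `consumed` also appears in `available`."""
    have = parse_gtid_set(available)
    for uuid, intervals in parse_gtid_set(consumed).items():
        for start, end in intervals:
            if not any(s <= start and end <= e for s, e in have.get(uuid, [])):
                return False
    return True


# Hypothetical values: what the CDC reader has consumed vs. what a
# failover candidate reports as its executed GTID set.
consumed = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5"
candidate = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10"
safe_to_switch = is_gtid_subset(consumed, candidate)
```

Even when this check passes, it only says the candidate has not lost anything the reader already saw; it does not address the other subtleties mentioned above.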
We use MHA4MySQL to manage failover of MySQL masters and reconfiguration of replication slaves. However, this solution does not play well with tools like Debezium, because it expects slaves to respond correctly to queries that stop replication and reconfigure the replication master, and it also expects to find their relay log files on disk...
In short, it's not an impossible problem, but I've yet to see a good plan for how to do this in open source, and it appears to be a deal breaker for tools like this one: no one is actually going to be able to use CDC tools in production if they can't work with HA MySQL setups.
Would love to hear your thoughts.
Paul