Handling HA MySQL setups


Paul Banks

Mar 10, 2016, 10:54:16 AM
to debezium
Hi,

Just came across debezium from the Kafka Connect Hub.

I'm really interested in the project because I've built many prototypes of MySQL CDC over the years and never been satisfied with any of them for production use.

So my question is, what are your current plans for supporting HA MySQL setups where the master can failover (or in general the host debezium is slaved to can go away)?

So far from looking at code (not tried it yet) it seems you only support a single source server.

Note that LinkedIn, in their original Databus, solved this only by patching MySQL to get monotonically increasing transaction ids in commit order. They have an interesting discussion of some of the pros and cons of possible approaches here: https://github.com/linkedin/databus/wiki/Databus-for-MySQL

In recent MySQL/MariaDB, GTIDs help somewhat: they make it easier to uniquely identify transactions across replicas, although there are still many subtleties to supporting failover robustly, and I've not seen a good solution to that yet.

We use MHA4MySQL to manage failover of MySQL masters and reconfiguration of replication slaves; however, this solution does not play well with tools like Debezium, because it expects slaves to respond correctly to queries that stop and reconfigure the replication master, and expects to find their relay log files on disk...

In short, it's not an impossible problem, but I've yet to see a good plan for how to do this in open source, and it appears to be a deal-breaker for tools like this one: no one is going to be able to use CDC tools in production if they can't work with HA MySQL setups.

Would love to hear your thoughts.

Paul

Randall Hauch

Mar 15, 2016, 11:40:29 AM
to debezium
Hi, Paul. Thanks for asking. (And sorry for the delay; been recovering from a very long weekend of robotics. Anyway...)

You are correct that currently Debezium only deals with a single MySQL server, and that this will need to change to be able to ingest from HA clusters. The challenge will boil down to handling (at least) the following:
  • Identifying the set of servers that make up the cluster. An easy way to do this is to configure the MySQL connector with a list (perhaps ordered) of servers that it can use or at least bootstrap.
  • Upon failure while reading the binlog from server X, the connector should trigger a Kafka Connect reconfiguration so that the current task is gracefully shut down and a new one is started, which can then try the bootstrap server list sequentially until a connection is made.
  • Upon connection to a different server, attempt to locate the position in that server's binlog that represents the last recorded source offset. How we can do that will require investigation, and it sounds like there's no single way to do this effectively for all HA clusters given the various tools. Once identified, we should be able to seek to that position and resume reading.
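To make the second point concrete, here is a minimal sketch of the sequential-bootstrap idea; the server names and the `connect` callback are purely illustrative, not Debezium's or Kafka Connect's actual API:

```python
# Hypothetical sketch: on task (re)start, try each configured bootstrap
# server in order until a binlog connection succeeds.

class NoServerAvailable(Exception):
    """Raised when every bootstrap server was tried and failed."""

def connect_to_any(bootstrap_servers, connect):
    """Return (server, connection) for the first reachable server."""
    errors = {}
    for server in bootstrap_servers:
        try:
            return server, connect(server)
        except ConnectionError as exc:
            errors[server] = exc  # remember why this server failed
    raise NoServerAvailable(errors)

# Usage with a fake connect() where only "db2" is reachable:
def fake_connect(server):
    if server != "db2":
        raise ConnectionError("unreachable")
    return "binlog-stream"

server, conn = connect_to_any(["db1", "db2", "db3"], fake_connect)
# server == "db2", conn == "binlog-stream"
```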
My goal with Debezium is to make connectors that people can and will use, and that needs a community of people with the varied expertise to identify the problems, determine possible approaches, and then implement and test a solution. Even though Debezium's MySQL connector might not yet be sufficient for HA situations, I hope you and others are willing to help get it there.

> We use MHA4MySQL to manage failover of MySQL masters and reconfiguration of replication slaves; however, this solution does not play well with tools like Debezium, because it expects slaves to respond correctly to queries that stop and reconfigure the replication master, and expects to find their relay log files on disk...

I guess it depends on what (if anything) the replica records in its binlog. If we can decipher that the replica has been reconfigured, then we can trigger a Kafka Connect reconfiguration to signal a "restart" of the task.

Any comments or thoughts?

Randall

Paul Banks

Mar 17, 2016, 3:08:44 PM
to debezium
> How we can do that will require investigation, and it sounds like there's no single way to do this effectively for all HA clusters given the various tools.

Yeah, that's the issue. For example, GTIDs are the most promising option, but I've not found clear guarantees of certain important properties, like all binlogs of all replicas having exactly the same order with respect to GTIDs from different masters. Reading the source code and becoming even more intimately familiar with the fine details of the replication system than I am now seems necessary to reason correctly about that stuff.

Maybe you have a clearer understanding of that than I do, though?

The other issue with GTIDs is that they're only supported in MySQL 5.6+, and even then they must be explicitly enabled, which makes for an incredibly painful upgrade dance on a production cluster. Facebook wrote a whole blog post about how they ended up contributing ten or so patches to MySQL in the course of finding a way to upgrade to GTIDs without stopping the world, but not all of those patches were in the mainline last I checked, and all official migration guides require you to stop all replicas and restart them, which is not a small thing for most production setups.
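For what it's worth, the containment check that GTIDs enable (the idea behind MySQL's `GTID_SUBSET()` function) can be sketched as follows; the parser below is a deliberate simplification that only handles the basic `uuid:first-last` interval syntax, and the UUIDs are made up:

```python
# Sketch of comparing two GTID sets to decide whether a replica has
# executed everything the connector last recorded, mirroring the idea
# behind MySQL's GTID_SUBSET(). Simplified: real GTID sets allow more
# interval forms than handled here.

def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-10' into {uuid: set of txn numbers}."""
    result = {}
    for part in text.split(","):
        fields = part.strip().split(":")
        uuid, intervals = fields[0], fields[1:]
        txns = result.setdefault(uuid, set())
        for iv in intervals:
            lo, _, hi = iv.partition("-")
            txns.update(range(int(lo), int(hi or lo) + 1))
    return result

def gtid_subset(small, big):
    """True if every transaction in `small` is also in `big`."""
    a, b = parse_gtid_set(small), parse_gtid_set(big)
    return all(txns <= b.get(uuid, set()) for uuid, txns in a.items())

# A replica that has executed 1-20 covers an offset recorded at 1-15:
ok = gtid_subset("3E11FA47:1-15", "3E11FA47:1-20")   # True
gap = gtid_subset("3E11FA47:1-15", "3E11FA47:1-10")  # False
```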

> If we can decipher that the replica has been reconfigured

I'm not clear on how unambiguously that can be done from the binlog. If replication statements (CHANGE MASTER) are logged, then it should be possible without outside intervention. MHA4MySQL also has another option: you can configure it to trigger custom scripts during failover, passing information about the old and new masters, which could hook into Debezium to trigger reconfiguration (and possibly even hint at the correct new config).
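As a rough illustration of that hook idea: Kafka Connect does expose a REST API for updating a connector's configuration (`PUT /connectors/{name}/config`), so an MHA-style failover script could in principle point the connector at the newly promoted master. The connector name, port, and property keys below are illustrative assumptions, and this only builds the request rather than sending it:

```python
# Hypothetical MHA failover hook: MHA can invoke a custom script with
# the old and new master hosts; that script could update the connector
# via the Kafka Connect REST API. Names below are illustrative only.
import json
from urllib import request

def build_reconfig_request(connect_url, connector, new_host, new_port):
    """Build (but do not send) the REST call that repoints the
    connector at the newly promoted master."""
    config = {
        "database.hostname": new_host,   # illustrative property keys
        "database.port": str(new_port),
    }
    return request.Request(
        url=f"{connect_url}/connectors/{connector}/config",
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_reconfig_request("http://connect:8083", "mysql-cdc", "db2", 3306)
# A real hook would then send it with urllib.request.urlopen(req).
```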

MHA4MySQL possibly buys you some other things, though: it ensures that the slaves' replication logs are identical and fills in gaps automatically during failover. Even then, it relies on some assumptions that can go badly wrong if you break them (like never, ever executing binlogged queries on slaves).

Overall, I'm sure that with enough attention to detail there are ways to make it work. But HA MySQL setups are fairly hard to do right already, so coming up with something generic in Debezium that works correctly with many of the other options out there seems like a very hard problem!

Just some thoughts.

> I hope you and others are willing to help get it there.

Would love to help where I can. It depends on whether I can get this to a stage where my employer agrees the project is worthwhile; if not, I'm pretty limited in the time I can give. Even then, I hope I can help anyone with more time to spend benefit from the small amount of experience I've gained from researching options in the past!

Thanks

Randall Hauch

Mar 18, 2016, 2:00:07 PM
to debezium


On Thursday, March 17, 2016 at 2:08:44 PM UTC-5, Paul Banks wrote:
> How we can do that will require investigation, and it sounds like there's no single way to do this effectively for all HA clusters given the various tools.

> Yeah, that's the issue. For example, GTIDs are the most promising option, but I've not found clear guarantees of certain important properties, like all binlogs of all replicas having exactly the same order with respect to GTIDs from different masters. Reading the source code and becoming even more intimately familiar with the fine details of the replication system than I am now seems necessary to reason correctly about that stuff.

> Maybe you have a clearer understanding of that than I do, though?

Currently the connector assumes the order of the events in the log is correct. I can envision situations involving multiple masters with partitioned data where the binlogs might differ, though for each partition the order should still be maintained. Regardless, if it is possible upon failure to switch to a different replica and see a different order in the binlog, there is the potential to reprocess events for a transaction that we'd already seen prior to the switchover, and we'd have to decide whether that needs to be addressed. On one hand, this might not be that different from how the Kafka architecture generally provides *exactly once* delivery of messages during nominal operation and drops to *at least once* while recovering from a failure. On the other hand, if we ultimately need to minimize these duplicates, then perhaps probabilistic data structures (e.g., an inverse bloom filter) might help us skip transactions we've already processed.
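For illustration, the inverse bloom filter idea might look like this sketch: a fixed-size table that can yield false negatives (occasionally re-delivering an already-seen transaction) but never false positives (never wrongly dropping an unseen one), which is exactly the safe direction for at-least-once delivery. The class and sizing are hypothetical:

```python
# Sketch of an "inverse bloom filter" for skipping transactions already
# processed after a switchover. Each slot stores the last id hashed to
# it, so a True answer is definite, while a collision merely causes a
# harmless re-delivery.
class InverseBloomFilter:
    def __init__(self, size=1 << 16):
        self.slots = [None] * size
        self.size = size

    def observe(self, txn_id):
        """Record txn_id; return True iff it was definitely seen before."""
        slot = hash(txn_id) % self.size
        seen = self.slots[slot] == txn_id
        self.slots[slot] = txn_id
        return seen

f = InverseBloomFilter()
first = f.observe("gtid:42")   # False: not seen before, so process it
dup = f.observe("gtid:42")     # True: definitely a duplicate, skip it
```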


> The other issue with GTIDs is that they're only supported in MySQL 5.6+, and even then they must be explicitly enabled, which makes for an incredibly painful upgrade dance on a production cluster. Facebook wrote a whole blog post about how they ended up contributing ten or so patches to MySQL in the course of finding a way to upgrade to GTIDs without stopping the world, but not all of those patches were in the mainline last I checked, and all official migration guides require you to stop all replicas and restart them, which is not a small thing for most production setups.

If there's no information in the binlogs to help correlate replicas, then a binlog-based approach like Debezium's MySQL connector is going to be a tough slog. If there's a non-binlog-based approach (even if it is more complicated but works for HA situations), then we can consider a second connector that uses this different approach.
 

> If we can decipher that the replica has been reconfigured

> I'm not clear on how unambiguously that can be done from the binlog. If replication statements (CHANGE MASTER) are logged, then it should be possible without outside intervention. MHA4MySQL also has another option: you can configure it to trigger custom scripts during failover, passing information about the old and new masters, which could hook into Debezium to trigger reconfiguration (and possibly even hint at the correct new config).

The binlog protocol documentation (https://dev.mysql.com/doc/internals/en/replication-protocol.html) doesn't show any kind of event for a change in master, though I've not gone through the MySQL source code to see whether it might be there but simply undocumented.

Using a trigger will be difficult, since there isn't necessarily a single well-known service endpoint where the Debezium MySQL connector instance can be reached. That's because Kafka Connect is a distributed service that automatically distributes the connector jobs (i.e., tasks) as required. However, if (and this is a big if) a trigger could write to the binlog (perhaps an INCIDENT_EVENT, if that's not abusing it too much), then we're golden.
 

> MHA4MySQL possibly buys you some other things, though: it ensures that the slaves' replication logs are identical and fills in gaps automatically during failover. Even then, it relies on some assumptions that can go badly wrong if you break them (like never, ever executing binlogged queries on slaves).

By "never ever executing binlogged queries on slaves", do you mean that Debezium would need to read the master's binlog and never that of a replica? That should be possible (we don't really care which server we're reading the binlog from), assuming we can figure out how to handle master changeovers.


> Overall, I'm sure that with enough attention to detail there are ways to make it work. But HA MySQL setups are fairly hard to do right already, so coming up with something generic in Debezium that works correctly with many of the other options out there seems like a very hard problem!

Absolutely. But the fact that they're hard to get right is a good reason why the project needs those with experience like yours.
 

> Just some thoughts.

> > I hope you and others are willing to help get it there.

> Would love to help where I can. It depends on whether I can get this to a stage where my employer agrees the project is worthwhile; if not, I'm pretty limited in the time I can give. Even then, I hope I can help anyone with more time to spend benefit from the small amount of experience I've gained from researching options in the past!

Yeah, that's a catch-22. The more help we can get, the more valuable Debezium will be for you and others.

Best regards,

Randall
