Advice needed: Wagon Wheel Replication Topology


KTWalrus

Mar 30, 2016, 6:41:58 PM
to Percona Discussion
Since 5.7 has added multi-source replication, I'm thinking of deploying my database using a Wagon Wheel replication topology. I haven't found any blogs on Wagon Wheels, but such a topology seems to make the most sense for my application, which is geographically dispersed (a headquarters node with multiple regional nodes).

To form the replication topology, the headquarters node will have a "hub" MySQL instance, and each regional site will deploy 2 "rim" MySQL instances.

The "hub" instance will be configured to bi-directionally replicate to each "rim" instance (master-master using GTIDs). Each "rim" instance will use circular replication to replicate the database to an adjacent "rim" instance. The first "rim" instance in a region will be the primary instance that all clients in the region connect to (using haproxy), with the second "rim" instance being a backup in case the first instance temporarily fails for some reason. The primary instance in each region will have a replication channel to the backup instance in the nearest adjacent region and a second replication channel to the headquarters "hub" instance. The backup instance in each region will have a replication channel to the primary instance in the same region and a second replication channel to the headquarters instance.

So, a 2N + 1 cluster for N regions will have a "hub" with 2N replication channels and each "rim" instance will have 2 replication channels.

The idea is that each region will survive the temporary failure of 1 instance (including the failure of the "hub"). If the adjacent region's backup instance temporarily fails to replicate to the region's primary instance, the cluster-wide updates will all come from the "hub" instance. If the "hub" instance fails temporarily, the adjacent regions will asynchronously replicate cluster-wide updates around the rim. So, there are two paths to replicate an update through the cluster: 2 hops (through the "hub") or up to a maximum of 2N - 1 hops (around the "rim"). If both paths are severed, the application will simply stop seeing updates from other regions until at least one path is restored (which is okay for my application, where users mostly access/update data specific to their home region).
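To double-check those hop counts, the wheel can be modeled as a directed graph: hub-rim links in both directions (master-master), plus a one-way circular-replication ring around the 2N rim instances. A toy sketch with made-up node names, not anything MySQL-specific:

```python
from collections import deque

def wagon_wheel(n_regions):
    """Adjacency map for the wheel: a hub in master-master replication
    with every rim instance (links in both directions), plus a one-way
    circular-replication ring around the 2N rims. Node names are made
    up for the sketch."""
    n_rim = 2 * n_regions
    adj = {"hub": set()}
    for i in range(n_rim):
        node = f"rim{i}"
        adj[node] = {"hub", f"rim{(i + 1) % n_rim}"}  # to hub + next rim
        adj["hub"].add(node)
    return adj

def hops(adj, src, dst, down=()):
    """Fewest replication hops from src to dst by BFS, skipping any
    failed instances listed in `down`; None if dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

For N = 4 regions this confirms the claim above: any update reaches any rim in 2 hops through the hub, and in at most 2N - 1 = 7 hops around the rim when the hub is down.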

Anyway, is this a workable replication topology for higher availability of a distributed database? The "hub" instance will be used to distribute global admin-type updates, while the "rim" instances provide high availability and fast access to the clients in each region. The "hub" will be on a big server to handle 2N replication channels, while the "rim" servers will be sized appropriately to each region's needs.

How well will this scale? I assume the "hub" server could become a replication bottleneck, since it has 2N channels to keep up to date. I also assume the network traffic to transfer the updates won't be much of a bottleneck, even though each "rim" instance will run two redundant channels (one to the "hub", the other to an adjacent "rim" instance). The database will eventually grow to several TBs over time before settling down to a long-term size. I plan to use fast SSDs on all servers so updates may be applied relatively quickly.

Any advice on deploying such a topology in production? I really want input from those who have experience scaling MySQL to a geographically dispersed cluster with a growing number of regions (in my case, 4 regions to start, with growth over time up to 99 regions). I may eventually need to split the cluster into multiple wheels (with the hubs of each wheel replicating to 1 or more other hubs).

Peter Zaitsev

Mar 31, 2016, 8:55:36 PM
to percona-discussion
Hi,

One thing I recommend considering: even though MySQL 5.7 has multi-source replication, it does not come with meaningful conflict detection and resolution. As such, the more complicated a topology you come up with, the more potential data flows you need to consider for possible conflicts.

Another thing that makes this use case unconventional is multi-path. Typical multi-source replication is implemented as fan-in, so there is only one active path from the masters to the slave.

--
You received this message because you are subscribed to the Google Groups "Percona Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to percona-discuss...@googlegroups.com.
To post to this group, send email to percona-d...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Peter Zaitsev, CEO, Percona
Tel: +1 888 401 3401 ext 7360   Skype:  peter_zaitsev



KTWalrus

Apr 1, 2016, 6:49:01 PM
to Percona Discussion
By multi-path, I take it you mean that every rim server has two active channels (one to the hub server and the other to an adjacent rim server). In my proposed topology, every server is a master, so the entire cluster should be able to handle replicating all updates throughout the cluster.

Is multi-path an issue? I figure that I need multi-path for high availability. So, if the hub server temporarily fails, the cluster functions normally because replication continues around the rim. And, if multiple rim servers fail, the hub server will distribute the updates to all remaining rim servers.
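The behavior I'm counting on is GTID-based duplicate suppression: a transaction that arrives on a second channel is skipped because its GTID is already in the executed set. A toy model of that idea (a simplification for illustration, not MySQL's actual implementation):

```python
class TwoChannelReplica:
    """Toy model of why two active paths shouldn't double-apply work:
    each transaction carries a GTID, and a replica that has already
    recorded that GTID skips the duplicate arriving on the other
    channel. A simplification of MySQL's gtid_executed tracking,
    not the real implementation."""

    def __init__(self):
        self.executed = set()   # GTIDs already applied
        self.applied = []       # payloads actually executed

    def receive(self, gtid, payload):
        if gtid in self.executed:
            return False        # duplicate from the second path: skipped
        self.executed.add(gtid)
        self.applied.append(payload)
        return True

replica = TwoChannelReplica()
# The same transaction arrives once via the hub and once via the rim:
replica.receive("hub-uuid:17", "UPDATE accounts ...")
replica.receive("hub-uuid:17", "UPDATE accounts ...")
```

The second delivery is a no-op, which is why redundant channels cost bandwidth but not correctness, as long as everything is keyed by GTID.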

Since each region will have 2 rim servers, I should have a very reliable distributed database service within the region's data center. 

And, in my case, updates from outside the region can experience significant delays without negatively affecting the users of the region's database much. In fact, it would be okay (but not ideal) if replication took place only at night, with daytime replication being suspended.

I think that as the number of regions grows over time, I can upgrade the Wagon Wheel topology into a full Wagon topology: 4 Wagon Wheels with the hubs replicating to each other to distribute updates in one Wheel to the other Wheels (think two axles and a drivetrain connecting the 4 Wagon Wheels).

Peter Zaitsev

Apr 1, 2016, 8:51:17 PM
to percona-discussion
Hi,

My main point is that the setup you're looking at is atypical and is not commonly used in production at this point.

Someone always has to go first and polish the rough corners. If you're looking for an "off the shelf" HA solution proven by thousands of deployments, this is perhaps not it.




--
Peter Zaitsev, CEO, Percona
Tel: +1 888 401 3401 ext 7360   Skype:  peter_zaitsev




KTWalrus

Apr 1, 2016, 10:10:03 PM
to Percona Discussion
What part of my proposed topology is "unpolished"?
  1. Multi-source replication for the "hub" from the "rim" servers?
  2. Replication from "hub" back to "rim" servers? 
  3. Circular replication around the "rim"?
  4. Each "rim" server having dual channels to get (duplicate) updates from both the "hub" and an adjacent "rim" server?
Do you mean that since my topology uses multi-source replication, which is new in 5.7, this might be the main source of "rough corners"? Or, is it that every update transaction is sent in two directions (one bouncing through the "hub" and the other traveling around the "rim") that will be the untested part of the topology?

I assume that circular replication (multi-master) is a well-tested area (even if the main use case is master-master replication between 2 servers). The star topology should also be in production on 5.7 in a fair number of deployments by now, I would think (maybe not).

Anyway, my deployment is fairly complex, as I will use three different public cloud providers (AWS, Google, and another OpenStack public cloud provider) and will have VMs deployed in 5 separate regions, each with 2 "rim" servers. So, to start, 10 "rim" servers and 1 "hub" server will be deployed to handle MySQL replication using this Wagon Wheel topology.

If this deployment topology turns out to be too cutting edge, I might start with just a Star topology and accept the risk that a temporary "hub" failure (either network connectivity issues or local MySQL failures) would break cluster replication, instead of having 2 replication paths providing the possibility of higher availability.

Thanks for your input.
 

Bryan O'Neal

Apr 3, 2016, 3:23:37 PM
to percona-d...@googlegroups.com

I believe the truly unique part of this is that you have multiple masters feeding the same data set to a slave. In addition, the "long" path to that data set is quite long, and you haven't indicated how you intend to manage key conflicts. This is where I personally would spin up at least 3 nodes (and probably scale to a few hundred VMs later), apply artificial latency to each, and stress test the ring, paying close attention to what you implement for insert/update collision management.
You will know your data going in, so you can test how well it performs live on each node as well as verifying the full data set at the end. I personally would worry about unique key conflicts and updates not being applied in the correct order.
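One common mitigation for the insert side of that (it does nothing for update ordering) is interleaved auto-increment ranges via auto_increment_increment and auto_increment_offset, so concurrently writable masters never generate the same primary key. A minimal sketch of the idea:

```python
import itertools

def key_stream(offset, increment):
    """Primary keys generated the way auto_increment_offset and
    auto_increment_increment keep concurrently writable masters
    out of each other's key space."""
    n = offset
    while True:
        yield n
        n += increment

# Three writable masters with increment = 3 and offsets 1..3:
node1 = list(itertools.islice(key_stream(1, 3), 5))   # 1, 4, 7, 10, 13
node2 = list(itertools.islice(key_stream(2, 3), 5))   # 2, 5, 8, 11, 14
node3 = list(itertools.islice(key_stream(3, 3), 5))   # 3, 6, 9, 12, 15
```

The three key spaces are disjoint by construction, which removes one whole class of replication conflicts; application-level unique keys and out-of-order updates still need the kind of stress testing described above.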


KTWalrus

Apr 6, 2016, 2:31:05 PM
to Percona Discussion
Thanks Bryan. I'm starting to get a better idea of the approach I want to pursue.

I'm now thinking of not replicating around the wheel. Each region will have 2 MySQL 5.7 instances in master-master async replication (one as primary, the other as a hot backup). Each of these servers will also be in master-master async replication with the "hub" server. So, each "rim" instance will use multi-source replication with 2 channels. The "hub" server will have to handle multi-source replication from every deployed "rim" instance, so updates from other regions always go through the "hub".

I can tolerate this topology since even if replication stalls for updates from outside a region due to "hub" failure, my apps can tolerate any extended latency (as long as the updates are fully applied overnight).
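That failure mode is easy to sanity-check by modeling the revised design as a graph: with the hub down, a region's two instances keep replicating to each other, but cross-region updates stall. A sketch with made-up node names:

```python
from collections import deque

def hub_and_spokes(n_regions):
    """Adjacency map for the revised design: per region, a primary and
    a backup in master-master replication, each also in master-master
    replication with the hub; no rim-to-rim links between regions.
    Node names are made up for the sketch."""
    adj = {"hub": set()}
    for r in range(n_regions):
        primary, backup = f"r{r}-primary", f"r{r}-backup"
        adj[primary] = {backup, "hub"}
        adj[backup] = {primary, "hub"}
        adj["hub"] |= {primary, backup}
    return adj

def reachable(adj, src, down=()):
    """All instances that updates from src can reach, skipping any
    failed instances listed in `down`."""
    seen, queue = {src}, deque([src])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With the hub up, every instance reaches every other in at most 2 hops; with the hub down, each region is isolated but still replicating internally, which is exactly the tolerated outcome.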

Of course, I am still relying on MySQL replication to properly handle multi-path updates, since every update generated by a "rim" server will be sent to both the "hub" and the other instance within the region. If this doesn't function properly, then MySQL 5.7 multi-source replication would not be ready for production use, and I believe it should be by now.

Eventually, I can probably replace the 2 instances in each region with a 3-instance Percona Cluster (when Percona Cluster supports MySQL 5.7) to scale any region that outgrows a single primary instance with a high-availability backup instance. With Percona Cluster handling replication within the region and MySQL handling replication between regions (through the "hub"), the two paths for an update would be handled by two separate replication approaches. Hopefully this is well tested, since most geographically distributed topologies using Percona Cluster do depend on a mix of async and cluster sync replication.