Recommendation for WAN replication

Jay Pipes

Mar 18, 2013, 12:38:32 PM
to codersh...@googlegroups.com
We have a situation where latency between two datacenters is not particularly good (about 75ms for TCP roundtrip). Having the one datacenter directly access the MySQL cluster running on the other datacenter is way too slow.

The data in datacenter A is the "master" data for our authentication system. The applications in datacenter B need fast read access to the authentication data but, importantly, they do NOT need write access to the authentication database in datacenter A.

I'm looking to implement one of two solutions, and I'm looking for advice on which solution makes the most sense.

Solution A:

Use standard MySQL read-only slave replication. In datacenter B, set up one or more read-only MySQL slave servers that read from one or more of the Galera cluster nodes in datacenter A using statement-based replication (roundtrip TCP performance is horrible, so the fewer roundtrips, the better). Allow for high latency/timeouts for the slaves reading from datacenter A's cluster nodes.

Solution B:

Spin up more Galera cluster nodes in datacenter B. Can I use the rsync_wan send method if I use the rsync send method for all other cluster nodes in datacenter A? I worry that the latency of writing from datacenter A's cluster nodes to datacenter B's cluster nodes will be too high for acceptable performance.

I am leaning towards solution A because I think it will be the best performing but would like to hear from the group about their suggestions.

Best,
-jay

Jay Janssen

Mar 18, 2013, 2:21:07 PM
to Jay Pipes, codersh...@googlegroups.com
On Mar 18, 2013, at 12:38 PM, Jay Pipes <jayp...@gmail.com> wrote:


> Solution A:
>
> Use standard MySQL read-only slave replication. In datacenter B, set up one or more read-only MySQL slave servers that read from one or more of the Galera cluster nodes in datacenter A using statement-based replication (roundtrip TCP performance is horrible, so the fewer roundtrips, the better). Allow for high latency/timeouts for the slaves reading from datacenter A's cluster nodes.

Galera nodes require binlog-format=ROW, so the async replication to datacenter B will be row-based as well.



> Solution B:
>
> Spin up more Galera cluster nodes in datacenter B. Can I use the rsync_wan send method if I use the rsync send method for all other cluster nodes in datacenter A? I worry that the latency of writing from datacenter A's cluster nodes to datacenter B's cluster nodes will be too high for acceptable performance.
>
> I am leaning towards solution A because I think it will be the best performing but would like to hear from the group about their suggestions.

B can work. Commit latency should be the only bottleneck, and that should work out to about 1 ping RTT in a two-colo setup.

For HA, if A and B have an equal number of nodes, the loss of one DC would leave the remaining nodes non-primary, and they would go offline. Having 1 extra node (or an arbitrator) in A will keep A up even if B goes down. If B did go down (in any case), A would also be prevented from writing until B's nodes exceed the suspect_timeout and the process of removing them from the cluster has started.
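
To make the quorum arithmetic concrete, here is a minimal sketch in Python. It assumes every node carries equal weight and that a partition stays primary only with a strict majority of the previous membership; the function name and the node counts (3+3 versus 4+3) are made up for illustration.

```python
def is_primary(surviving_nodes: int, total_nodes: int) -> bool:
    """Return True if the surviving partition holds a strict majority.

    Assumption: all nodes (and any arbitrator) count with equal weight.
    """
    return 2 * surviving_nodes > total_nodes


# Equal split: 3 nodes in A and 3 in B. B is lost, A keeps 3 of 6.
equal_split = is_primary(surviving_nodes=3, total_nodes=6)

# One extra node (or an arbitrator) in A: A keeps 4 of 7.
extra_in_a = is_primary(surviving_nodes=4, total_nodes=7)

print(equal_split)  # False: exactly half is not a majority
print(extra_in_a)   # True: A stays primary without B
```

The same check shows why the extra node belongs in A rather than B: if A is the datacenter that must keep accepting writes, A is the side that needs the majority.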


Jay Janssen, MySQL Consulting Lead, Percona
Percona Live in Santa Clara, CA  April 22nd-25th 2013

Jay Pipes

Mar 18, 2013, 2:47:46 PM
to Jay Janssen, codersh...@googlegroups.com
On 03/18/2013 02:21 PM, Jay Janssen wrote:
> On Mar 18, 2013, at 12:38 PM, Jay Pipes <jayp...@gmail.com> wrote:
>
>>
>> Solution A:
>>
>> Use standard MySQL read-only slave replication. In datacenter B, set
>> up one or more read-only MySQL slave servers that read from one or
>> more of the Galera cluster nodes in datacenter A using statement-based
>> replication (roundtrip TCP performance is horrible, so the fewer
>> roundtrips, the better). Allow for high latency/timeouts for the
>> slaves reading from datacenter A's cluster nodes.
>
> Galera nodes require binlog-format=ROW, so your replication with async
> replication to datacenter B will be the same.

Ah, good point, thx for reminding me.

>> Solution B:
>>
>> Spin up more Galera cluster nodes in datacenter B. Can I use the
>> rsync_wan send method if I use the rsync send method for all other
>> cluster nodes in datacenter A? I worry that the latency of writing
>> from datacenter A's cluster nodes to datacenter B's cluster nodes will
>> be too high for acceptable performance.
>>
>> I am leaning towards solution A because I think it will be the best
>> performing but would like to hear from the group about their suggestions.
>
> B can work, commit latency should be the only bottleneck, that should
> work out to about 1 ping RTT in a two colo setup.
>
> For HA, if A and B have equal nodes, the loss of one DC would mean the
> remaining nodes would be non-primary and go offline. Having 1 extra
> node (or an arbitrator) in A will keep A up even if B goes down. If B
> did go down (in any case), A would also be prevented from writing
> until B's nodes exceed the suspect_timeout and the process to remove
> them from the cluster started.

As I mentioned, though, the latency is horrendous between the two DCs
and we do not need HA across the datacenters.

I think I'm going to go with read-only slaves in datacenter B slaved
from one of the Galera nodes in datacenter A.

Best,
-jay

Henrik Ingo

Mar 18, 2013, 7:00:54 PM
to Jay Pipes, codersh...@googlegroups.com
On Mon, Mar 18, 2013 at 6:38 PM, Jay Pipes <jayp...@gmail.com> wrote:
> Solution B:
>
> Spin up more Galera cluster nodes in datacenter B. Can I use the rsync_wan
> send method if I use the rsync send method for all other cluster nodes in
> datacenter A? I worry that the latency of writing from datacenter A's cluster
> nodes to datacenter B's cluster nodes will be too high for acceptable performance.
>
> I am leaning towards solution A because I think it will be the best
> performing but would like to hear from the group about their suggestions.
>

Hi Jay

I realize this goes against everything you ever learned about
synchronous replication. Even for myself this was initially difficult
to accept. But really, many people run Galera Clusters over 2 or 3
continents. Your sub 100 ms latency isn't considered bad in any way.
Yes, try to make a client-server connection over it and then it's bad.
But Galera replication will actually be quite a good option here.

http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/
http://www.codership.com/content/synchronous-replication-loves-you-again

Vadim only measured latencies, but Alex' latter post also reports
sysbench tps results. Apologies on his behalf for the messy graphs,
but if you stare at them for 15 minutes or so, you will see that
Galera handles this trans-atlantic latency pretty much without any
degradation at all. Of course, with only a few client threads the
latency is there, but with enough concurrency the throughput is the
same both for LAN and WAN clusters. (And yes, this is very
surprising...)
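
A rough way to see why concurrency hides the latency is the toy model below. It is only a sketch under stated assumptions: each client thread has one commit in flight at a time, every commit waits about one round trip, and the server tops out at some capacity. The 1000 tps capacity and the function name are hypothetical; the 75 ms figure is the round trip quoted earlier in the thread.

```python
def max_commit_throughput(client_threads: int,
                          commit_latency_s: float,
                          server_capacity_tps: float) -> float:
    """Upper bound on commits per second in a toy latency model.

    Each thread completes at most 1 / commit_latency_s commits per second,
    so the latency-bound total grows linearly with the thread count until
    the (assumed) server capacity becomes the limit instead.
    """
    latency_bound = client_threads / commit_latency_s
    return min(latency_bound, server_capacity_tps)


RTT_S = 0.075          # ~75 ms round trip between the datacenters
CAPACITY_TPS = 1000.0  # hypothetical capacity of the cluster on a LAN

for threads in (1, 16, 128):
    tps = max_commit_throughput(threads, RTT_S, CAPACITY_TPS)
    print(f"{threads:>3} threads: up to {tps:.0f} commits/s")
```

In this model a single thread is limited to roughly 13 commits per second, while 128 threads are limited by the server rather than by the WAN link, which matches the shape of the result described above.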

Of course, there will be applications for which the commit latency is
a show stopper (those doing 15 autocommit queries in a row are a good
candidate...) but for many apps this is not a problem.
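
As a back-of-the-envelope illustration of that show-stopper case, the sketch below compares 15 autocommit writes issued one after another with the same 15 writes grouped into a single transaction. It assumes each commit waits about one round trip (75 ms, the figure from the original question) and ignores all other costs; the function name is made up.

```python
RTT_S = 0.075  # ~75 ms round trip, taken from the original question


def replication_wait_s(commits: int, rtt_s: float = RTT_S) -> float:
    """Total time spent waiting on replication for sequential commits.

    Assumption: every commit pays roughly one round trip and nothing else.
    """
    return commits * rtt_s


autocommit_wait = replication_wait_s(commits=15)  # 15 separate commits
grouped_wait = replication_wait_s(commits=1)      # one commit for all 15 writes

print(f"15 autocommit writes: about {autocommit_wait:.2f} s")
print(f"one transaction:      about {grouped_wait:.3f} s")
```

Under these assumptions the autocommit sequence waits more than a second in total, while the grouped transaction waits for a single round trip.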

Since using only one kind of replication is a much simpler design, and
managing Galera in my opinion is simpler than MySQL replication, I
would in your case seriously consider alternative B.

That said, there's nothing wrong in setting up 2 separate Galera
clusters and then connecting them with MySQL replication. (I've done
that too, and recommend it for really crappy WAN links, but my
definition of really crappy is much crappier than yours.)

henrik




--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Jay Pipes

Mar 25, 2013, 11:32:01 AM
to henri...@avoinelama.fi, codersh...@googlegroups.com
On 03/18/2013 07:00 PM, Henrik Ingo wrote:
> Hi Jay
>
> I realize this goes against everything you ever learned about
> synchronous replication. Even for myself this was initially difficult
> to accept. But really, many people run Galera Clusters over 2 or 3
> continents. Your sub 100 ms latency isn't considered bad in any way.
> Yes, try to make a client-server connection over it and then it's bad.
> But Galera replication will actually be quite a good option here.
>
> http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/
> http://www.codership.com/content/synchronous-replication-loves-you-again
>
> Vadim only measured latencies, but Alex' latter post also reports
> sysbench tps results. Apologies on his behalf for the messy graphs,
> but if you stare at them for 15 minutes or so, you will see that
> Galera handles this trans-atlantic latency pretty much without any
> degradation at all. Of course, with only a few client threads the
> latency is there, but with enough concurrency the throughput is the
> same both for LAN and WAN clusters. (And yes, this is very
> surprising...)
>
> Of course, there will be applications for which the commit latency is
> a show stopper (those doing 15 autocommit queries in a row are a good
> candidate...) but for many apps this is not a problem.
>
> Since using only one kind of replication is a much simpler design, and
> managing Galera in my opinion is simpler than MySQL replication, I
> would in your case seriously consider alternative B.

Hi again,

So, we went ahead and implemented the WAN replication with Galera
cluster and everything is working smoothly at the moment. I'll post back
with any issues we might encounter as well as any performance numbers we
can gather.

Thanks much for everyone's help :)

-jay

Oleksandr Drach

Mar 28, 2013, 9:23:44 AM
to codersh...@googlegroups.com, henri...@avoinelama.fi
Hello Jay,

You may be interested in Galera parameters for WAN replication.
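
As a rough illustration of what such tuning can look like, the Python sketch below renders a `wsrep_provider_options` line for my.cnf from a dictionary of EVS timeouts. The option names are Galera provider options; the values are illustrative assumptions for a high-latency link, not recommendations from this thread, and the helper function is made up.

```python
# Illustrative timeout values (ISO 8601 durations); tune them for your own link.
wan_options = {
    "evs.keepalive_period": "PT3S",
    "evs.suspect_timeout": "PT30S",
    "evs.inactive_timeout": "PT1M",
    "evs.install_timeout": "PT1M",
}


def render_provider_options(options: dict) -> str:
    """Build a my.cnf line setting wsrep_provider_options from a dict."""
    joined = "; ".join(f"{name} = {value}" for name, value in options.items())
    return f'wsrep_provider_options = "{joined}"'


print(render_provider_options(wan_options))
```

Longer suspect and inactive timeouts make the cluster more tolerant of WAN hiccups, at the cost of taking longer to evict nodes that really are down.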

Thanks!