Date: Wed, 19 Sep 2012 07:53:16 -0700 (PDT) From: KTWalrus To: percona-discussion@googlegroups.com Message-Id: <22b14a88-8999-43cb-bf67-66154bdf8c30@googlegroups.com> In-Reply-To: <51e523a0-e7a7-4ed9-9150-7f1eb7707c35@googlegroups.com> References: <9847cc74-1192-4e73-ba78-abc8f4520e44@googlegroups.com> <51e523a0-e7a7-4ed9-9150-7f1eb7707c35@googlegroups.com> Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_223_13581104.1348066397278" ------=_Part_223_13581104.1348066397278 Content-Type: multipart/alternative; boundary="----=_Part_224_20374347.1348066397278" ------=_Part_224_20374347.1348066397278 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Thanks. I intend to handle conflicts by treating them as bugs in my applications that use the database. I believe there is no reason for conflicts since my applications are designed to not require updates until they sync over night and I believe the rows that are updated in a distributed database are not updated in any other database during the day. I believe my data is naturally shard'ed in regards to updates. I intend to eventually use Galera for in-datacenter clustering and Tungsten for asynchronous replication between datacenters. I believe this architecture will allow "huge" scalability and high availability. The only scalability issue I see is when the asynchronous updates overwhelm the ability to apply them within 24 hours to all distributed databases, but I can't imagine this ever really happening since slaves will "catch up" during off hours at night. Also, this architecture requires complete copies of the central DB on all servers which might add to the system storage costs over time. Hopefully, I can prune the central DB so it doesn't grow too fast. On Wednesday, September 19, 2012 3:18:52 AM UTC-4, neila wrote: > > Tungsten Replicator would help and you could delay replication by just > taking individual replicators offline. The biggest problem as Peter and Jay > have already pointed out is conflict resolution. The replicator does not > help with that it is still up to your application to handle conflicts as > the replicators just take the transactions from server A and apply them to > server B. > > Neil > > > On Tuesday, 18 September 2012 21:30:11 UTC+1, KTWalrus wrote: >> >> Just discovered Tungsten Replicator and, according to this Link, >> the Replicator can set up a Star Topology. For my purposes, I want the >> updates to flow from the distributed DBs into the central DB all the time. >> For the updates in the other direction, I can live with them being applied >> continuously. But, I hope that I can delay the updates to the distributed >> DBs until off hours (when very few, if any, users are using the distributed >> DBs). >> >> On Tuesday, September 18, 2012 1:07:40 PM UTC-4, KTWalrus wrote: >>> >>> My approach of using mysqlbinlog to apply transactions from multiple >>> distributed DBs asynchronously to a central Master really doesn't involve >>> using the built-in MySQL replication (other than relying on binlogs for row >>> updates). >>> >>> I've been thinking I should look into modifying the Slave IO Thread >>> logic to accept relay log updates from multiple Masters. Then, I could >>> configure my central DB as a true MySQL Master and each of the distributed >>> DBs as true Slaves. In addition, I could make the central DB a multi-slave >>> to each of the distributed DBs using my modifications to the Slave IO >>> Thread logic. >>> >>> Basically, the modification to the Slave IO Thread logic would better >>> simulate what I was thinking of doing with mysqlbinlog except that >>> transactions would be ordered closer to chronological order. But, because >>> I would now be using MySQL replication to apply updates from the central DB >>> to the distributed DBs (skipping applying the updates that originated from >>> a distributed DB to itself), I avoid having to rsync at night and the >>> updates are applied much sooner (minimizing some collisions). >>> >>> This approach is kind of a mixture of MySQL ring replication and an >>> asynchronous Galera cluster (where only a single DB receives all updates >>> and all other DBs get their non-local updates from the central DB). >>> >>> How hard do you think it would be to modify the Slave IO Thread logic to >>> read updates from multiple masters? >>> >>> Has anyone done this before and were there any lessons learned? >>> >>> I realize that my needs are simpler than implementing this for the >>> general case. For one, I don't care if my non-local updates take a while >>> to be seen in the distributed DBs. Also, admin updates can be applied to >>> the central DB for normal MySQL replication so the only updates originating >>> from the distributed DBs are simple row updates (insert, update, and delete >>> all by primary key). And, I don't care if there is a large window where >>> some user sees "stale" data (that has been updated at a separate location, >>> but not yet locally applied). >>> >> ------=_Part_224_20374347.1348066397278 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable Thanks.  I intend to handle conflicts by treating them as bugs in my a= pplications that use the database.  I believe there is no reason for c= onflicts since my applications are designed to not require updates until th= ey sync over night and I believe the rows that are updated in a distributed= database are not updated in any other database during the day.  I bel= ieve my data is naturally shard'ed in regards to updates.

I intend t= o eventually use Galera for in-datacenter clustering and Tungsten for async= hronous replication between datacenters.  I believe this architecture = will allow "huge" scalability and high availability.  The only scalabi= lity issue I see is when the asynchronous updates overwhelm the ability to = apply them within 24 hours to all distributed databases, but I can't imagin= e this ever really happening since slaves will "catch up" during off hours = at night.  Also, this architecture  requires complete copies of t= he central DB on all servers which might add to the system storage costs ov= er time.  Hopefully, I can prune the central DB so it doesn't grow too= fast.

On Wednesday, September 19, 2012 3:18:52 AM UTC-4, neila wrot= e:
Tungsten Replicator would he= lp and you could delay replication by just taking individual replicators of= fline. The biggest problem as Peter and Jay have already pointed out is con= flict resolution. The replicator does not help with that it is still up to = your application to handle conflicts as the replicators just take the trans= actions from server A and apply them to server B.

Neil


On Tuesday, 18 September 2012 21:30:11 UTC+1, KTWalrus wr= ote:
Just discovered Tungsten Replic= ator and, according to this Link, th= e Replicator can set up a Star Topology.  For my purposes, I want the = updates to flow from the distributed DBs into the central DB all the time.&= nbsp; For the updates in the other direction, I can live with them being ap= plied continuously.  But, I hope that I can delay the updates to the d= istributed DBs until off hours (when very few, if any, users are using the = distributed DBs).

On Tuesday, September 18, 2012 1:07:40 PM UTC-4, K= TWalrus wrote:
My approach of using = mysqlbinlog to apply transactions from multiple distributed DBs asynchronou= sly to a central Master really doesn't involve using the built-in MySQL rep= lication (other than relying on binlogs for row updates).

I've been = thinking I should look into modifying the Slave IO Thread logic to accept r= elay log updates from multiple Masters.  Then, I could configure my ce= ntral DB as a true MySQL Master and each of the distributed DBs as true Sla= ves.  In addition, I could make the central DB a multi-slave to each o= f the distributed DBs using my modifications to the Slave IO Thread logic.<= br>
Basically, the modification to the Slave IO Thread logic would bette= r simulate what I was thinking of doing with mysqlbinlog except that transa= ctions would be ordered closer to chronological order.  But, because I= would now be using MySQL replication to apply updates from the central DB = to the distributed DBs (skipping applying the updates that originated from = a distributed DB to itself), I avoid having to rsync at night and the updat= es are applied much sooner (minimizing some collisions).

This approa= ch is kind of a mixture of MySQL ring replication and an asynchronous Galer= a cluster (where only a single DB receives all updates and all other DBs ge= t their non-local updates from the central DB).

How hard do you thin= k it would be to modify the Slave IO Thread logic to read updates from mult= iple masters?

Has anyone done this before and were there any lessons= learned?

I realize that my needs are simpler than implementing this= for the general case.  For one, I don't care if my non-local updates = take a while to be seen in the distributed DBs.  Also, admin updates c= an be applied to the central DB for normal MySQL replication so the only up= dates originating from the distributed DBs are simple row updates (insert, = update, and delete all by primary key).  And, I don't care if there is= a large window where some user sees "stale" data (that has been updated at= a separate location, but not yet locally applied).
------=_Part_224_20374347.1348066397278-- ------=_Part_223_13581104.1348066397278--