Message from discussion Consolodating into Master DB from multiple Geographically Distributed DBs
From: KTWalrus <ktwal...@gmail.com>
Date: Wed, 19 Sep 2012 07:53:16 -0700 (PDT)
Local: Wed, Sep 19 2012 10:53 am
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Thanks.  I intend to handle conflicts by treating them as bugs in my
applications that use the database.  I believe there is no reason for
conflicts, since my applications are designed not to require updates until
they sync overnight, and I believe the rows that are updated in one
distributed database are not updated in any other database during the day.
I believe my data is naturally sharded with regard to updates.
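
To make "conflicts are bugs" checkable, here is a minimal sketch of a
nightly check that flags any row touched by more than one distributed DB
on the same day.  The function name, the site names, and the
(site, table, primary_key) input shape are all hypothetical; how those
tuples get extracted from each site's binlog is left out.

```python
from collections import defaultdict


def find_conflicts(changes):
    """Return {(table, pk): [sites]} for rows changed at more than one site.

    `changes` is an iterable of (site, table, primary_key) tuples, one per
    row change collected from the distributed DBs for the same day.
    This input shape is an assumption made for illustration.
    """
    sites_by_row = defaultdict(set)
    for site, table, pk in changes:
        sites_by_row[(table, pk)].add(site)
    # A row is a conflict only if two or more distinct sites touched it.
    return {
        row: sorted(sites)
        for row, sites in sites_by_row.items()
        if len(sites) > 1
    }


if __name__ == "__main__":
    day = [
        ("nyc", "users", 1),
        ("nyc", "users", 1),  # repeated change at the same site is fine
        ("lon", "users", 2),
        ("lon", "users", 1),  # same row as nyc: reported as a bug
    ]
    print(find_conflicts(day))
```

If the data really is sharded by update site, the result is always empty,
and a non-empty result points at the application bug to fix.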

I intend to eventually use Galera for in-datacenter clustering and Tungsten
for asynchronous replication between datacenters.  I believe this
architecture will allow "huge" scalability and high availability.  The only
scalability issue I see arises if the volume of asynchronous updates exceeds
what can be applied to all distributed databases within 24 hours, but I
can't imagine this ever really happening, since slaves will "catch up"
during off hours at night.  Also, this architecture requires a complete
copy of the central DB on every server, which might add to the system
storage costs over time.  Hopefully, I can prune the central DB so it
doesn't grow too fast.
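
The 24-hour limit above comes down to simple arithmetic: the backlog
drains only as fast as the apply rate exceeds the arrival rate.  A rough
sketch, where the function name and every number are made-up
illustrations rather than measurements:

```python
def hours_to_catch_up(backlog_events, apply_rate_per_hour,
                      arrival_rate_per_hour=0.0):
    """Hours needed to drain a replication backlog.

    Returns float('inf') when new events arrive at least as fast as they
    can be applied, i.e. the slave never catches up.
    """
    net_rate = apply_rate_per_hour - arrival_rate_per_hour
    if net_rate <= 0:
        return float("inf")
    return backlog_events / net_rate


if __name__ == "__main__":
    # Hypothetical night: 1,000,000 queued events, 250,000 applied per hour,
    # and no new traffic arriving during off hours.
    hours = hours_to_catch_up(1_000_000, 250_000)
    print(hours, "hours; fits in 24 hours:", hours <= 24)
```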

On Wednesday, September 19, 2012 3:18:52 AM UTC-4, neila wrote:

> Tungsten Replicator would help, and you could delay replication by just
> taking individual replicators offline. The biggest problem, as Peter and Jay
> have already pointed out, is conflict resolution. The replicator does not
> help with that; it is still up to your application to handle conflicts, as
> the replicators just take the transactions from server A and apply them to
> server B.

> Neil

> On Tuesday, 18 September 2012 21:30:11 UTC+1, KTWalrus wrote:

>> Just discovered Tungsten Replicator and, according to this Link<http://datacharmer.blogspot.com/2011/11/replication-multiple-masters-...>,
>> the Replicator can set up a Star Topology.  For my purposes, I want the
>> updates to flow from the distributed DBs into the central DB all the time.  
>> For the updates in the other direction, I can live with them being applied
>> continuously.  But, I hope that I can delay the updates to the distributed
>> DBs until off hours (when very few, if any, users are using the distributed
>> DBs).

>> On Tuesday, September 18, 2012 1:07:40 PM UTC-4, KTWalrus wrote:

>>> My approach of using mysqlbinlog to apply transactions from multiple
>>> distributed DBs asynchronously to a central Master really doesn't involve
>>> using the built-in MySQL replication (other than relying on binlogs for row
>>> updates).

>>> I've been thinking I should look into modifying the Slave IO Thread
>>> logic to accept relay log updates from multiple Masters.  Then, I could
>>> configure my central DB as a true MySQL Master and each of the distributed
>>> DBs as true Slaves.  In addition, I could make the central DB a multi-slave
>>> to each of the distributed DBs using my modifications to the Slave IO
>>> Thread logic.

>>> Basically, the modification to the Slave IO Thread logic would better
>>> simulate what I was thinking of doing with mysqlbinlog except that
>>> transactions would be ordered closer to chronological order.  But, because
>>> I would now be using MySQL replication to apply updates from the central DB
>>> to the distributed DBs (skipping applying the updates that originated from
>>> a distributed DB to itself), I avoid having to rsync at night and the
>>> updates are applied much sooner (minimizing some collisions).

>>> This approach is kind of a mixture of MySQL ring replication and an
>>> asynchronous Galera cluster (where only a single DB receives all updates
>>> and all other DBs get their non-local updates from the central DB).

>>> How hard do you think it would be to modify the Slave IO Thread logic to
>>> read updates from multiple masters?

>>> Has anyone done this before and were there any lessons learned?

>>> I realize that my needs are simpler than implementing this for the
>>> general case.  For one, I don't care if my non-local updates take a while
>>> to be seen in the distributed DBs.  Also, admin updates can be applied to
>>> the central DB for normal MySQL replication so the only updates originating
>>> from the distributed DBs are simple row updates (insert, update, and delete
>>> all by primary key).  And, I don't care if there is a large window where
>>> some user sees "stale" data (that has been updated at a separate location,
>>> but not yet locally applied).
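
The merged update stream described in the earliest quoted message (updates
from several masters interleaved in roughly chronological order, with each
distributed DB skipping the updates it originated) can be sketched as
below.  This only illustrates the ordering and skip logic; it is not how
the Slave IO Thread is implemented, and the Event type, its fields, and
merged_for are hypothetical names.  It assumes each input stream is
already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: float  # commit time recorded at the originating DB
    origin: str       # identifier of the DB where the change was made
    statement: str    # placeholder for the row change itself


def merged_for(destination: str,
               streams: Iterable[Iterable[Event]]) -> Iterator[Event]:
    """Yield events from all streams in timestamp order for one destination.

    Events that originated at `destination` are skipped so a DB never
    re-applies its own changes.  Each stream must already be sorted.
    """
    merged = heapq.merge(*streams, key=lambda e: e.timestamp)
    return (e for e in merged if e.origin != destination)


if __name__ == "__main__":
    site_a = [Event(1.0, "A", "a1"), Event(4.0, "A", "a2")]
    site_b = [Event(2.0, "B", "b1"), Event(3.0, "B", "b2")]
    # The central DB applies everything; site A skips its own events.
    print([e.statement for e in merged_for("central", [site_a, site_b])])
    print([e.statement for e in merged_for("A", [site_a, site_b])])
```

Ordering by timestamp is only as good as the clock agreement between
sites, which is one more reason to keep updates sharded by site.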


 