Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Consolodating into Master DB from multiple Geographically Distributed DBs
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
KTWalrus  
View profile  
 More options Sep 17 2012, 6:10 pm
From: KTWalrus <ktwal...@gmail.com>
Date: Mon, 17 Sep 2012 15:10:46 -0700 (PDT)
Local: Mon, Sep 17 2012 6:10 pm
Subject: Consolodating into Master DB from multiple Geographically Distributed DBs

I have 4 geographically distributed DBs.  The DBs only need to be sync'd
once a night with all changes made to each DB during the day.

I'm thinking of keeping a Master DB up to date and rsync'ing to each of the
geographically distributed DBs in the early morning.  Each DB would record
its changes to its binlog and I would apply the changes to the Master DB
every few hours using mysqlbinlog piped to mysql.

Any problems with this approach?  

I would do the autoincrement trick and I don't think the same rows would be
updated in more than one of the distributed DBs (so out of sequence updates
should be okay).  In the middle of the night, I would have a cron job sync
all of the distributed DBs to the Master DB and then perform some
maintenance updates on the master, and rsync the master to each distributed
DB.  All DBs may be taken offline during this synchronization time.

I've been thinking, even if I think it is okay to not keep the temporal
order of the updates when applying to the master, that I might need to
modify mysqlbinlog to open all the input binlog files on start up and
replay them in timestamp order (or implement some sort of binlog timestamp
merge so the transactions are stored in chronological order).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jay Janssen  
View profile  
 More options Sep 18 2012, 7:50 am
From: Jay Janssen <jay.jans...@percona.com>
Date: Tue, 18 Sep 2012 07:49:59 -0400
Local: Tues, Sep 18 2012 7:49 am
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Coincidentally, I'm doing a talk on this underlying question at our conference in a few weeks: http://www.percona.com/live/nyc-2012/sessions/hazards-multi-writing-d..., and I spent a good amount of time yesterday thinking about it.

Technically what you are suggesting is possible, but it's impossible to say for sure that it's bullet proof because so much depends on the actual schema, and how the data inside it is used.  

One of the points I'll be making in the talk is that the data we store in and RDBMS tends to be, well, relational.  Writing across multiple masters simultaneously can be done if all the modifications are completely isolated from the other masters until at least replication occurs.    By "isolated", I mean not modified *or* read elsewhere, or if read then an out of date version of the data doesn't matter to associated writes (this is probably a crude definition).  

Because relational data implies relationships and multi-writing depends on isolation (read: the absence of relationships), then whether or not this architecture will "work" depends entirely on where on the continuum your data falls between relational and non-relational.  

That's my take on the subject (whether useful or not, I'm not sure).  

On Sep 17, 2012, at 6:10 PM, KTWalrus <ktwal...@gmail.com> wrote:

Jay Janssen, Senior MySQL Consultant, Percona Inc.
http://about.me/jay.janssen
Percona Live in NYC Oct 1-2nd: http://www.percona.com/live/nyc-2012/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zaitsev  
View profile  
 More options Sep 18 2012, 8:27 am
From: Peter Zaitsev <p...@percona.com>
Date: Tue, 18 Sep 2012 08:26:59 -0400
Local: Tues, Sep 18 2012 8:26 am
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Jay,

Yep.  I thing the examples could be in the Sharded world different
shards... or  Different Applications, like  Forums going in one direction
while Blog in other.
The other practical issue is recovery. What if replication between nodes
becomes badly broken ?  Single direction replica allows reclone as last
resort activity while bi-directional might be a lot harder to
recover as you need to decide where data is authoritative. It is easiest if
it  can be done per table/schema or something like it.

One myth you might want to dispell is what auto_increment_increment solves
all the problems....  it helps inserts but does nothing for other
potentially conflicting update.

If you look at the general "thery" of such disconnected operationsyou may
think about "eventually consistent" model - after applying all events the
data should be same on all nodes... which often would require what events
from both direction streams can be applied in any order between them with
end result being the same.  Working on different data as stated above is a
simple subset of such requirement.

If you can't get such property when you get into conflict detection and
resolution piece which gets pretty tricky as application itself can't
detect the conflict because replication is asynchronous and you need
detection and also resolution to happen
at different level.  This is the theory of it but there is not much support
for it in conventional MySQL (there is some in cluster)

On Tue, Sep 18, 2012 at 7:49 AM, Jay Janssen <jay.jans...@percona.com>wrote:

--
Peter Zaitsev, CEO, Percona Inc.
Tel: +1 888 401 3401 ext 501   Skype:  peter_zaitsev
24/7 Emergency Line +1 888 401 3401 ext 911

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
KTWalrus  
View profile  
 More options Sep 18 2012, 1:07 pm
From: KTWalrus <ktwal...@gmail.com>
Date: Tue, 18 Sep 2012 10:07:40 -0700 (PDT)
Local: Tues, Sep 18 2012 1:07 pm
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

My approach of using mysqlbinlog to apply transactions from multiple
distributed DBs asynchronously to a central Master really doesn't involve
using the built-in MySQL replication (other than relying on binlogs for row
updates).

I've been thinking I should look into modifying the Slave IO Thread logic
to accept relay log updates from multiple Masters.  Then, I could configure
my central DB as a true MySQL Master and each of the distributed DBs as
true Slaves.  In addition, I could make the central DB a multi-slave to
each of the distributed DBs using my modifications to the Slave IO Thread
logic.

Basically, the modification to the Slave IO Thread logic would better
simulate what I was thinking of doing with mysqlbinlog except that
transactions would be ordered closer to chronological order.  But, because
I would now be using MySQL replication to apply updates from the central DB
to the distributed DBs (skipping applying the updates that originated from
a distributed DB to itself), I avoid having to rsync at night and the
updates are applied much sooner (minimizing some collisions).

This approach is kind of a mixture of MySQL ring replication and an
asynchronous Galera cluster (where only a single DB receives all updates
and all other DBs get their non-local updates from the central DB).

How hard do you think it would be to modify the Slave IO Thread logic to
read updates from multiple masters?

Has anyone done this before and were there any lessons learned?

I realize that my needs are simpler than implementing this for the general
case.  For one, I don't care if my non-local updates take a while to be
seen in the distributed DBs.  Also, admin updates can be applied to the
central DB for normal MySQL replication so the only updates originating
from the distributed DBs are simple row updates (insert, update, and delete
all by primary key).  And, I don't care if there is a large window where
some user sees "stale" data (that has been updated at a separate location,
but not yet locally applied).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
KTWalrus  
View profile  
 More options Sep 18 2012, 4:30 pm
From: KTWalrus <ktwal...@gmail.com>
Date: Tue, 18 Sep 2012 13:30:11 -0700 (PDT)
Local: Tues, Sep 18 2012 4:30 pm
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Just discovered Tungsten Replicator and, according to this Link<http://datacharmer.blogspot.com/2011/11/replication-multiple-masters-...>,
the Replicator can set up a Star Topology.  For my purposes, I want the
updates to flow from the distributed DBs into the central DB all the time.  
For the updates in the other direction, I can live with them being applied
continuously.  But, I hope that I can delay the updates to the distributed
DBs until off hours (when very few, if any, users are using the distributed
DBs).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
neila  
View profile  
 More options Sep 19 2012, 3:18 am
From: neila <neil.armita...@gmail.com>
Date: Wed, 19 Sep 2012 00:18:51 -0700 (PDT)
Local: Wed, Sep 19 2012 3:18 am
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Tungsten Replicator would help and you could delay replication by just
taking individual replicators offline. The biggest problem as Peter and Jay
have already pointed out is conflict resolution. The replicator does not
help with that it is still up to your application to handle conflicts as
the replicators just take the transactions from server A and apply them to
server B.

Neil


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
KTWalrus  
View profile  
 More options Sep 19 2012, 10:53 am
From: KTWalrus <ktwal...@gmail.com>
Date: Wed, 19 Sep 2012 07:53:16 -0700 (PDT)
Local: Wed, Sep 19 2012 10:53 am
Subject: Re: Consolodating into Master DB from multiple Geographically Distributed DBs

Thanks.  I intend to handle conflicts by treating them as bugs in my
applications that use the database.  I believe there is no reason for
conflicts since my applications are designed to not require updates until
they sync over night and I believe the rows that are updated in a
distributed database are not updated in any other database during the day.  
I believe my data is naturally shard'ed in regards to updates.

I intend to eventually use Galera for in-datacenter clustering and Tungsten
for asynchronous replication between datacenters.  I believe this
architecture will allow "huge" scalability and high availability.  The only
scalability issue I see is when the asynchronous updates overwhelm the
ability to apply them within 24 hours to all distributed databases, but I
can't imagine this ever really happening since slaves will "catch up"
during off hours at night.  Also, this architecture  requires complete
copies of the central DB on all servers which might add to the system
storage costs over time.  Hopefully, I can prune the central DB so it
doesn't grow too fast.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »