Migration of a large database (~26GB) to a replica set on AWS
From: Andrew Bonventre <andyb...@gmail.com>
Date: Tue, 11 Sep 2012 10:15:36 -0700 (PDT)
Local: Tues, Sep 11 2012 1:15 pm
Subject: Migration of a large database (~26GB) to a replica set on AWS

I have one mongod instance with about 26GB of data in a single db that I want
to migrate to another set of servers (I'm moving from a hosting provider to AWS).

My topology:
C: The current running mongod instance that holds everything (yeah, I know).

AWS is set up as a three-node replica set:
P1_AWS: Primary AWS mongod instance.
S1_AWS: Secondary AWS mongod instance.
S2_AWS: Secondary AWS mongod instance.

There are a few different ways I could migrate the database over (all with
caveats). Some advice on the best method (or a better one) would be great.

1) Run db.copyDatabase on P1_AWS. This seems like a better fit for a much
smaller database, since it doesn't show progress and I couldn't find
anything in the docs about recovery if the connection fails. I'm also not
clear on how much this would increase read/write latency on C.

2) db.fsyncLock() on C --> snapshot --> rsync snapshot to P1_AWS --> load
into P1_AWS mongod. This would incur downtime since I would have to stop
all writes to C to make sure it's properly mirrored on P1_AWS. I would
probably just put a "scheduled maintenance" page up. I also don't know the
connectivity speed of C's provider to AWS, so transfer speed would be the
main variable in the amount of downtime. If I load the same copy of the db
onto P1_AWS, S1_AWS, and S2_AWS, does anyone foresee complications? Should I
copy the data first and then switch on the replica set configuration, or does
it Just Work with replication already turned on? I ask because once the db is
on P1_AWS, I don't want to incur additional latency while it replicates to
S1_AWS and S2_AWS.

3) Use C as the primary node in the replica set, with P1_AWS, S1_AWS, and
S2_AWS being secondaries. As with db.copyDatabase, I'm not clear on the
severity of read/write latency increases that this would incur. Network
latency is also a concern if it takes a significant amount of time to
replicate while also slowing down the read/write ops. Also, is there a
proper way to determine progress of the replication?

Thanks and any advice would be greatly appreciated.
Andy
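
For option 2, the downtime is dominated by how long the snapshot takes to
cross the network. A rough back-of-the-envelope sketch, assuming "~26GB"
means 26 x 10^9 bytes and using made-up link speeds (the real throughput
between C's provider and AWS is unknown, as noted above). The function name
estimateTransferSeconds and its efficiency parameter are hypothetical, for
illustration only:

```javascript
// Rough transfer-time estimate for copying a snapshot between hosts.
// All link speeds below are assumptions; measure the real path first.
function estimateTransferSeconds(sizeBytes, linkMbps, efficiency = 1) {
  if (sizeBytes < 0 || linkMbps <= 0 || efficiency <= 0 || efficiency > 1) {
    throw new RangeError("size must be >= 0, link speed > 0, 0 < efficiency <= 1");
  }
  // Megabits per second -> bytes per second, scaled by the usable fraction.
  const bytesPerSecond = (linkMbps * 1e6 / 8) * efficiency;
  return sizeBytes / bytesPerSecond;
}

const snapshotBytes = 26e9; // assumed: 26 GB in decimal units

for (const mbps of [20, 100, 500]) { // hypothetical sustained rates
  const minutes = estimateTransferSeconds(snapshotBytes, mbps) / 60;
  console.log(`${mbps} Mbit/s: about ${Math.round(minutes)} minutes`);
}
```

At an assumed sustained 100 Mbit/s the copy alone comes to roughly 35
minutes; taking the snapshot and loading it on the AWS side would add to that.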


 
From: Yong Ouyang <yong.ouy...@gmail.com>
Date: Wed, 12 Sep 2012 03:19:37 -0700 (PDT)
Local: Wed, Sep 12 2012 6:19 am
Subject: Re: Migration of a large database (~26GB) to a replica set on AWS

I would go with 3), but possibly tweak it a bit. Set up S1_AWS and S2_AWS as
arbiters and let P1_AWS sync from C. I think replication will be slow in
your case, but acceptable, as it won't affect reads and writes on C too much.
Once P1_AWS becomes a secondary (after the initial sync), step down C so
that P1_AWS becomes primary. At that point, you can turn S1_AWS into a
normal data node instead of an arbiter. Repeat for S2_AWS once S1_AWS
becomes a secondary.
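
The first stage of the plan above can be written down as a replica set
configuration document. A minimal sketch, assuming a set name of rs0 and
placeholder hostnames (neither appears in the thread). stagedConfig is a
hypothetical helper that only builds the document; the result would be
passed to rs.initiate() in the mongo shell on C, after C has been restarted
with --replSet:

```javascript
// Builds the first-stage config for the plan above: C and P1_AWS hold data,
// S1_AWS and S2_AWS start out as arbiters. Hostnames are placeholders.
function stagedConfig(setName, dataHosts, arbiterHosts) {
  const members = [];
  let id = 0;
  for (const host of dataHosts) {
    members.push({ _id: id++, host });
  }
  for (const host of arbiterHosts) {
    members.push({ _id: id++, host, arbiterOnly: true });
  }
  return { _id: setName, members };
}

const config = stagedConfig(
  "rs0", // assumed set name
  ["c.example.com:27017", "p1-aws.example.com:27017"],
  ["s1-aws.example.com:27017", "s2-aws.example.com:27017"]
);

console.log(JSON.stringify(config, null, 2));
// In the mongo shell on C: rs.initiate(config)
```

Note that arbiterOnly cannot be flipped on an existing member through a
reconfig, so turning an arbiter into a data node later means removing it
from the set and adding it back as a regular member.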


 
From: Tyler Brock <ty...@10gen.com>
Date: Wed, 12 Sep 2012 11:38:40 -0700 (PDT)
Local: Wed, Sep 12 2012 2:38 pm
Subject: Re: Migration of a large database (~26GB) to a replica set on AWS

Hey Andrew,

First, I would make sure you are running the latest patch level of mongo
(2.0.7 or 2.2.0).

Then, I would do as Yong suggested and simply set up a secondary on AWS that
replicates from the primary in the original data center.

This replication will, of course, cause the entire data set on C to be paged
into memory over the course of the sync, so it may compete for resources
with your working set.

When the initial sync completes simply step down the primary in the old
data center.

It might be a good idea to stand up additional secondaries on AWS (which
would replicate from the nearby member) before stepping down the old primary.
This is easy to do from a snapshot once the initial sync has completed.

-Tyler
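
One rough way to tell when it is safe to step down the old primary is to
look at the members array returned by rs.status(), which carries a name,
stateStr, and optimeDate for each member. The sketch below works on a
document of that shape; the helper names, hostnames, and timestamps are
made up for illustration, and the lag threshold is an assumption to tune.
The shell's db.printSlaveReplicationInfo() gives a similar view without any
extra code.

```javascript
// Summarizes an rs.status()-shaped document: each member's state and how far
// its last applied operation trails the primary's. Pure function, no I/O.
function summarizeMembers(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return status.members.map((m) => {
    const comparable = primary !== undefined && m.stateStr === "SECONDARY";
    return {
      name: m.name,
      state: m.stateStr,
      // Lag is only reported once a member has reached SECONDARY.
      lagSeconds: comparable ? (primary.optimeDate - m.optimeDate) / 1000 : null,
    };
  });
}

// True when every listed host is a SECONDARY within maxLagSeconds of the primary.
function readyForStepDown(status, hosts, maxLagSeconds) {
  const byName = new Map(summarizeMembers(status).map((s) => [s.name, s]));
  return hosts.every((h) => {
    const s = byName.get(h);
    return s !== undefined && s.lagSeconds !== null && s.lagSeconds <= maxLagSeconds;
  });
}

// Sample shaped like rs.status() output; hosts and timestamps are made up.
const sample = {
  set: "rs0",
  members: [
    { name: "c.example.com:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-09-12T18:00:10Z") },
    { name: "p1-aws.example.com:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-12T18:00:08Z") },
    { name: "s1-aws.example.com:27017", stateStr: "RECOVERING", optimeDate: new Date(0) },
  ],
};

console.log(summarizeMembers(sample));
console.log(readyForStepDown(sample, ["p1-aws.example.com:27017"], 10));
```

A member that is still syncing shows up here with a null lag, so it never
counts as ready regardless of the threshold.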


 
From: Andrew Bonventre <andyb...@gmail.com>
Date: Wed, 12 Sep 2012 19:53:55 -0700 (PDT)
Local: Wed, Sep 12 2012 10:53 pm
Subject: Re: Migration of a large database (~26GB) to a replica set on AWS

So this actually turned out to be quite painless.

I configured a replica set with C as the primary and the three AWS
servers as secondaries. It took a few hours to sync, but without much
impact on performance. I ran into a harmless bug that confused me a bit
(https://jira.mongodb.org/browse/SERVER-6911), but it was otherwise
an easy switch. Once the replicas were synced, I shut down the app, then
mongod on C, and once a new primary was elected I removed C from the
replica set config.

I was running v2.2.0.

Thanks, Tyler and Yong, for your advice.

Andy
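
The last step above, dropping C once a new primary has been elected, amounts
to a reconfig with C's entry removed. A minimal sketch, assuming a set name
of rs0 and placeholder hostnames that are not from the thread; withoutMember
is a hypothetical helper that only builds the new document. In practice the
rs.remove() shell helper, run on the new primary, does the equivalent in one
call.

```javascript
// Returns a copy of a replica set config without the given host.
// The version is bumped because a reconfig needs a higher version number.
function withoutMember(config, host) {
  const members = config.members.filter((m) => m.host !== host);
  if (members.length === config.members.length) {
    throw new Error(`host not in config: ${host}`);
  }
  return { ...config, members, version: (config.version || 1) + 1 };
}

// Shaped like rs.conf() output; every value here is a placeholder.
const current = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "c.example.com:27017" },
    { _id: 1, host: "p1-aws.example.com:27017" },
    { _id: 2, host: "s1-aws.example.com:27017" },
    { _id: 3, host: "s2-aws.example.com:27017" },
  ],
};

const next = withoutMember(current, "c.example.com:27017");
console.log(JSON.stringify(next, null, 2));
// In the mongo shell on the new primary: rs.reconfig(next)
```

Building the document first makes it easy to confirm that three members
remain before applying the change.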


 