Re: Migration of a large database (~26GB) to a replica set on AWS

Yong Ouyang

Sep 12, 2012, 6:19:37 AM

to mongod...@googlegroups.com

I would go with option 3), but tweak it a bit. Set up S1_AWS and S2_AWS as arbiters and let P1_AWS sync from C. I expect the replication to be slow in your case, but acceptable, since it won't affect reads and writes on C too much. Once P1_AWS reaches the Secondary state (after the initial sync), step down C so that P1_AWS becomes Primary. At that point you can turn S1_AWS from an arbiter into a normal data-bearing node. Repeat for S2_AWS once S1_AWS has become Secondary.
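To make the sequence of configurations concrete, here is a minimal sketch that only builds the config documents; it does not talk to mongod. The set name "rs0" and all hostnames are hypothetical placeholders. One assumption worth checking against the docs for your version: as far as I know, MongoDB refuses a reconfig that flips `arbiterOnly` on an existing member, so "turning an arbiter into a data node" is modeled as `rs.remove()` followed by `rs.add()` (two separate reconfigs, the host re-added under a fresh `_id`). The arbiter's mongod may also need a restart with a clean dbpath before it is re-added.

```python
# Sketch of the replica-set configuration sequence described above.
# Assumptions: the set name "rs0" and every hostname are hypothetical. The dicts
# only mirror the shape of the document given to rs.initiate()/rs.reconfig();
# nothing here connects to a server.
from copy import deepcopy

HOSTS = {
    "C": "old-dc.example.com:27017",       # hypothetical
    "P1_AWS": "p1.aws.example.com:27017",  # hypothetical
    "S1_AWS": "s1.aws.example.com:27017",  # hypothetical
    "S2_AWS": "s2.aws.example.com:27017",  # hypothetical
}


def initial_config(set_name="rs0"):
    """C and P1_AWS hold data; S1_AWS and S2_AWS start out as arbiters."""
    return {
        "_id": set_name,
        "members": [
            {"_id": 0, "host": HOSTS["C"]},
            {"_id": 1, "host": HOSTS["P1_AWS"]},
            {"_id": 2, "host": HOSTS["S1_AWS"], "arbiterOnly": True},
            {"_id": 3, "host": HOSTS["S2_AWS"], "arbiterOnly": True},
        ],
    }


def promote_arbiter(config, name):
    """Return the two configs, in order, that turn arbiter `name` into a data node.

    Assumption: arbiterOnly cannot be flipped in place, so this models
    rs.remove() and then rs.add() as two separate reconfigs, re-adding the
    host under a fresh _id. The input config is not modified.
    """
    host = HOSTS[name]
    removed = deepcopy(config)
    removed["members"] = [m for m in removed["members"] if m["host"] != host]
    readded = deepcopy(removed)
    next_id = max(m["_id"] for m in config["members"]) + 1
    readded["members"].append({"_id": next_id, "host": host})
    return [removed, readded]


if __name__ == "__main__":
    start = initial_config()
    # After P1_AWS has finished its initial sync and taken over as primary:
    _, after_s1 = promote_arbiter(start, "S1_AWS")
    # After S1_AWS has itself become SECONDARY:
    _, after_s2 = promote_arbiter(after_s1, "S2_AWS")
    for member in after_s2["members"]:
        print(member)
```

Keeping the remove and the re-add as separate steps also makes it easy to pause between them and confirm the set is healthy.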

On Wednesday, September 12, 2012 1:15:36 AM UTC+8, Andrew Bonventre wrote:

I have one mongod instance with a 26 GB database that I want to migrate to another set of servers (I'm moving from a hosting provider to AWS).

My topology:
C: The currently running mongod instance that holds everything (yeah, I know).

AWS is set up as a three-node replica set:
P1_AWS: Primary AWS mongod instance.
S1_AWS: Secondary AWS mongod instance.
S2_AWS: Secondary AWS mongod instance.

There are a few different ways I could migrate the database over (all with caveats). Some advice on the best method (or a better one) would be great.

1) Run db.copyDatabase on P1_AWS. This seems better suited to a much smaller database: it doesn't show progress, and I couldn't find anything in the docs about recovery if the connection fails. I'm also not clear on how much read/write latency this would add on C.

2) db.fsyncLock() on C --> snapshot --> rsync the snapshot to P1_AWS --> load it into the P1_AWS mongod. This would incur downtime, since I would have to stop all writes to C to make sure P1_AWS ends up as an exact mirror. I would probably just put up a "scheduled maintenance" page. I also don't know the connection speed between C's provider and AWS, so transfer speed would be the main variable in the amount of downtime. If I mirror the same db onto P1_AWS, S1_AWS, and S2_AWS, does anyone foresee any complications? Should I mirror first and then enable the replica set configuration, or does it Just Work with replication turned on? I ask because if I mirror the db onto P1_AWS, I don't want to incur additional latency while it replicates to S1_AWS and S2_AWS.
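Since transfer speed dominates the downtime in option 2, a quick back-of-the-envelope estimate helps. Only the 26 GB figure comes from the thread; the link speeds are hypothetical because the provider-to-AWS bandwidth is unknown. Decimal units (1 GB = 10^9 bytes, 1 Mbit/s = 10^6 bit/s) and the `efficiency` factor for protocol overhead are assumptions of this sketch.

```python
# Rough downtime estimate for option 2 (lock, snapshot, rsync).
# Assumptions: decimal units; `efficiency` is a hypothetical fudge factor for
# protocol overhead (1.0 = the link is fully saturated by payload).


def transfer_seconds(size_gb, link_mbit_s, efficiency=1.0):
    """Seconds needed to push size_gb gigabytes over a link_mbit_s link."""
    if link_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 1e9 * 8
    return bits / (link_mbit_s * 1e6 * efficiency)


if __name__ == "__main__":
    # Hypothetical link speeds between the old provider and AWS.
    for mbit in (10, 100, 1000):
        minutes = transfer_seconds(26, mbit) / 60
        print(f"{mbit:>5} Mbit/s -> {minutes:7.1f} min")
```

At a hypothetical 100 Mbit/s the raw copy alone is about 35 minutes; at 10 Mbit/s it is closer to six hours, before counting the time to load the data into P1_AWS.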

3) Use C as the primary node in the replica set, with P1_AWS, S1_AWS, and S2_AWS as secondaries. As with db.copyDatabase, I'm not clear on how severe the read/write latency increase would be. Network latency is also a concern: replication could take a long time while also slowing down reads and writes on C. Also, is there a proper way to monitor replication progress?
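One way to watch progress is to read the output of the replSetGetStatus command (what `rs.status()` prints in the shell). The sketch below assumes the status document has already been fetched, for example with a driver, and only uses the `members` entries' `name`, `stateStr`, and `optimeDate` fields; the sample data is invented. It does not measure how far a member is through its initial sync: such a member reports STARTUP2 until the clone finishes, and its mongod log is probably the better place to follow that phase.

```python
# Rough replication-progress report from replSetGetStatus-shaped output.
# Assumption: `status` was fetched elsewhere; the sample data below is invented.
from datetime import datetime


def replication_report(status):
    """Map member name -> state, plus lag behind the primary for secondaries."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    report = {}
    for m in members:
        state = m["stateStr"]
        if state == "SECONDARY" and primary is not None:
            lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
            report[m["name"]] = f"SECONDARY, {lag:.0f}s behind primary"
        else:
            # STARTUP2 means the initial sync is still running; ARBITER holds no data.
            report[m["name"]] = state
    return report


if __name__ == "__main__":
    now = datetime(2012, 9, 12, 12, 0, 0)  # invented sample data
    sample = {"members": [
        {"name": "C:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "P1_AWS:27017", "stateStr": "STARTUP2"},
        {"name": "S1_AWS:27017", "stateStr": "ARBITER"},
    ]}
    for name, line in replication_report(sample).items():
        print(name, "->", line)
```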

Thanks and any advice would be greatly appreciated.
Andy

Tyler Brock

Sep 12, 2012, 2:38:40 PM

to mongod...@googlegroups.com

Hey Andrew,

First, I would make sure you are running the latest patch level of mongo (2.0.7 or 2.2.0).

Then I would do as Yong suggested and simply set up a secondary on AWS that replicates from the primary in the original data center.

This replication will, of course, cause the entire data set to be paged into memory over the course of the sync, so it may compete for resources with your working set.

When the initial sync completes, simply step down the primary in the old data center.
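Before stepping down, it is worth confirming that the AWS member really has finished its initial sync and caught up. This is a minimal sketch of such a check, assuming `status` has the replSetGetStatus shape; the member names and the 10-second default threshold are hypothetical choices, not anything MongoDB prescribes. The actual step-down would then be `rs.stepDown()` run on the old primary.

```python
# Pre-step-down check for the cut-over. Assumptions: `status` has the
# replSetGetStatus shape; member names and the 10 s threshold are hypothetical.
from datetime import datetime, timedelta


def ready_to_step_down(status, target, max_lag_s=10.0):
    """True if `target` is SECONDARY and at most max_lag_s behind the primary."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    candidate = next((m for m in members if m["name"] == target), None)
    if primary is None or candidate is None:
        return False
    if candidate["stateStr"] != "SECONDARY":
        return False  # e.g. still STARTUP2, i.e. initial sync not finished
    lag = (primary["optimeDate"] - candidate["optimeDate"]).total_seconds()
    return lag <= max_lag_s


if __name__ == "__main__":
    now = datetime(2012, 9, 12, 14, 0, 0)  # invented sample data
    status = {"members": [
        {"name": "old-primary:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "p1-aws:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=3)},
    ]}
    if ready_to_step_down(status, "p1-aws:27017"):
        print("OK to run rs.stepDown() on the old primary")
    else:
        print("Wait: the AWS member has not caught up yet")
```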

It might be a good idea to stand up additional secondaries on AWS (which would replicate from the nearby member) before stepping down the old primary. This is easy to do with a snapshot taken after the initial sync has completed.
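The "replicate from the nearby member" part relies on mongod choosing its own sync source, which favors members with low ping times. The sketch below is a deliberately simplified model of that preference: it only ranks healthy data-bearing members by a hypothetical ping table, whereas the server's real rules weigh more factors. It also builds the `replSetSyncFrom` command document (MongoDB 2.2+; `rs.syncFrom()` in the shell) that could pin the choice manually. All hostnames and ping values are invented.

```python
# Simplified model of sync-source choice for a new AWS secondary.
# Assumptions: hostnames and ping times are hypothetical; mongod's actual
# selection logic considers more than ping time and member state.


def nearest_sync_source(members, pings_ms, self_name):
    """Pick the lowest-ping PRIMARY/SECONDARY member other than self_name."""
    candidates = [
        m["name"] for m in members
        if m["name"] != self_name
        and m["stateStr"] in ("PRIMARY", "SECONDARY")
        and m["name"] in pings_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: pings_ms[name])


def sync_from_command(host):
    """Command document for db.adminCommand() to pin a sync source."""
    return {"replSetSyncFrom": host}


if __name__ == "__main__":
    members = [
        {"name": "old-dc:27017", "stateStr": "PRIMARY"},
        {"name": "p1-aws:27017", "stateStr": "SECONDARY"},
        {"name": "s3-aws:27017", "stateStr": "STARTUP2"},  # the new member
    ]
    pings = {"old-dc:27017": 80.0, "p1-aws:27017": 1.5}  # invented values
    source = nearest_sync_source(members, pings, "s3-aws:27017")
    print(source, sync_from_command(source))
```

In this invented example the new member would pull from the AWS secondary rather than across the WAN link to the old data center.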