I have 1 mongod instance with 26GB of data in a db that I want to migrate to another set of servers (I'm moving from a hosting provider to AWS).
My topology:
C: The currently running mongod instance that holds everything (yeah, I know).
AWS is set up as a three-node replica set:
P1_AWS: Primary AWS mongod instance.
S1_AWS: Secondary AWS mongod instance.
S2_AWS: Secondary AWS mongod instance.
There are a few different ways I could migrate the database over (all with caveats). Some advice on the best method (or a better one) would be great.
1) Run db.copyDatabase on P1_AWS. This seems like a better fit for a much smaller database, since it doesn't show progress and I couldn't find anything in the docs about recovery if the connection fails. I'm also not clear on how much read/write latency this would add on C.
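Since db.copyDatabase gives no progress output, one rough way to watch it is to compare per-collection document counts on C and on P1_AWS from a second shell. The helper below, copyProgress, is hypothetical (not a MongoDB API) and assumes the counts have already been collected into plain objects, for example by calling count() on each collection on both hosts. It weights every document equally and ignores index builds, so treat the result as an approximation.

```javascript
// Hypothetical helper: rough progress of a copy, measured by document counts.
// sourceCounts / targetCounts map collection name -> document count,
// e.g. { users: 120000, events: 4500000 }.
function copyProgress(sourceCounts, targetCounts) {
  let total = 0;
  let copied = 0;
  for (const [name, srcCount] of Object.entries(sourceCounts)) {
    total += srcCount;
    // A collection not yet created on the target counts as 0;
    // cap at the source count in case the target briefly reports more.
    copied += Math.min(targetCounts[name] || 0, srcCount);
  }
  // An empty source is trivially "done".
  return total === 0 ? 1 : copied / total;
}

// Example with made-up counts: 100 of 100 "users" and 150 of 300 "events".
const fraction = copyProgress({ users: 100, events: 300 }, { users: 100, events: 150 });
console.log((fraction * 100).toFixed(1) + "% copied");
```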
2) db.fsyncLock() on C --> snapshot --> rsync snapshot to P1_AWS --> load into P1_AWS mongod. This would incur downtime, since I would have to stop all writes to C to make sure it's properly mirrored on P1_AWS. I would probably just put a "scheduled maintenance" page up. I also don't know the connection speed from C's provider to AWS, so transfer speed would be the main variable in the amount of downtime. If I mirror the same db on P1_AWS, S1_AWS, and S2_AWS, does anyone foresee any complications? Should I mirror first and then switch on the replica set configuration, or does it Just Work with replication turned on? The reason I ask is that once the db is mirrored on P1_AWS, I don't want to incur additional latency while it replicates to S1_AWS and S2_AWS.
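Because the link speed between C's provider and AWS is the unknown, a back-of-the-envelope estimate can bracket the rsync window. This sketch assumes 26 GB means 26 × 10^9 bytes, a sustained throughput you would have to measure yourself, and no compression or protocol overhead. The speeds in the loop are placeholders, not measurements.

```javascript
// Rough transfer-time estimate for copying a snapshot over the network.
// sizeGB: data size in decimal gigabytes (1 GB = 1e9 bytes).
// mbps: sustained throughput in megabits per second (1 Mb = 1e6 bits).
function transferHours(sizeGB, mbps) {
  const bits = sizeGB * 1e9 * 8;
  const seconds = bits / (mbps * 1e6);
  return seconds / 3600;
}

// Placeholder link speeds; substitute a measured value.
for (const mbps of [10, 50, 100]) {
  console.log(`${mbps} Mbps: ~${transferHours(26, mbps).toFixed(1)} h`);
}
```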
3) Use C as the primary node in the replica set, with P1_AWS, S1_AWS, and S2_AWS being secondaries. As with db.copyDatabase, I'm not clear on the severity of read/write latency increases that this would incur. Network latency is also a concern if it takes a significant amount of time to replicate while also slowing down the read/write ops. Also, is there a proper way to determine progress of the replication?
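For monitoring, rs.status() lists each member's stateStr and optimeDate, and comparing a secondary's optimeDate with the primary's gives its lag. The function below is a minimal sketch of that comparison, written as plain JavaScript so it can be pasted into mongosh or run in Node; replicationLag and the sample document are illustrative only. It does not measure initial-sync progress, since a member still in STARTUP2 has no meaningful optime yet.

```javascript
// Sketch: per-member replication lag computed from rs.status() output.
// Only members in SECONDARY state get a lag figure; anything else
// (STARTUP2 during initial sync, ARBITER, ...) gets null.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY in status document");
  }
  return status.members
    .filter((m) => m !== primary)
    .map((m) => ({
      name: m.name,
      state: m.stateStr,
      lagSeconds:
        m.stateStr === "SECONDARY"
          ? (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
          : null,
    }));
}

// In the shell this would be called as replicationLag(rs.status()).
// Hand-built example document (hostnames are placeholders):
const sample = {
  members: [
    { name: "c.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-09-12T12:00:30Z") },
    { name: "p1-aws.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-12T12:00:00Z") },
    { name: "s1-aws.example.net:27017", stateStr: "STARTUP2", optimeDate: new Date(0) },
  ],
};
console.log(replicationLag(sample));
```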
Thanks, and any advice would be greatly appreciated.
Andy
I would go for 3), but possibly tweak it a bit: set up S1_AWS and S2_AWS as arbiters and let P1_AWS sync from C. I think the replication will be slow in your case, but acceptable, as it won't affect reads and writes on C too much. Once P1_AWS becomes Secondary (after the initial sync), step down C so that P1_AWS becomes Primary. At this point, you can turn S1_AWS into a normal data node instead of an arbiter. Repeat for S2_AWS once S1_AWS becomes Secondary.
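The arbiter variant can be written down as a replica set config document (the shape passed to rs.initiate() or rs.reconfig(), or built up with rs.add() and rs.addArb()). In this sketch the set name rs0, the hostnames, and both helper functions are placeholders. The point of the majority check: with four voting members a primary needs three votes, so after C steps down, P1_AWS plus the two arbiters can still elect P1_AWS.

```javascript
// Sketch of the member layout in the arbiter variant of option 3:
// C keeps the data, P1_AWS syncs from it, S1_AWS and S2_AWS start as arbiters.
function buildMigrationConfig(setName, oldHost, syncHost, arbiterHosts) {
  const members = [
    { _id: 0, host: oldHost },
    { _id: 1, host: syncHost },
  ];
  arbiterHosts.forEach((host, i) => {
    members.push({ _id: 2 + i, host: host, arbiterOnly: true });
  });
  return { _id: setName, members: members };
}

// A primary needs votes from a strict majority of voting members
// (members without an explicit votes field default to 1 vote).
function majorityNeeded(config) {
  const voters = config.members.filter((m) => m.votes !== 0).length;
  return Math.floor(voters / 2) + 1;
}

const cfg = buildMigrationConfig(
  "rs0",
  "c.example.net:27017",
  "p1-aws.example.net:27017",
  ["s1-aws.example.net:27017", "s2-aws.example.net:27017"]
);
console.log(majorityNeeded(cfg)); // 4 voting members -> 3 votes needed
```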
On Wednesday, September 12, 2012 1:15:36 AM UTC+8, Andrew Bonventre wrote:
First, I would make sure you are running the latest patch level of mongo (2.0.7 or 2.2.0).
Then, I would do as Yong suggested and simply set up a secondary on AWS that replicates from the primary in the original data center.
This replication will, of course, cause the entire data set to be paged into memory over the course of the sync, so it may compete for resources with your working set.
When the initial sync completes, simply step down the primary in the old data center.
It might be a good idea to stand up additional secondaries on AWS (which would replicate from the nearby member) before stepping down the old primary. This is easy to do with a snapshot after the initial sync has completed.
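Before running rs.stepDown() on the old primary, it is worth confirming that every AWS member has finished its initial sync. A minimal sketch, assuming the status document comes from rs.status(); readyToStepDown and the hostnames are hypothetical:

```javascript
// Sketch: check rs.status() output before stepping down the old primary.
// Returns true only when every listed AWS host is healthy and reports
// SECONDARY, i.e. it has left STARTUP2 (initial sync).
function readyToStepDown(status, awsHosts) {
  return awsHosts.every((host) => {
    const m = status.members.find((member) => member.name === host);
    return m !== undefined && m.health === 1 && m.stateStr === "SECONDARY";
  });
}

// Hand-built example; in the shell: readyToStepDown(rs.status(), [...]).
const status = {
  members: [
    { name: "c.example.net:27017", health: 1, stateStr: "PRIMARY" },
    { name: "p1-aws.example.net:27017", health: 1, stateStr: "SECONDARY" },
    { name: "s1-aws.example.net:27017", health: 1, stateStr: "STARTUP2" },
  ],
};
console.log(readyToStepDown(status, ["p1-aws.example.net:27017"])); // true
console.log(readyToStepDown(status, ["p1-aws.example.net:27017", "s1-aws.example.net:27017"])); // false
```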
On Wednesday, September 12, 2012 6:19:38 AM UTC-4, Yong Ouyang wrote:
I configured a replica set with C as the primary and the three AWS servers as secondaries. It took a few hours to sync, but without much of a performance hit. I ran into this harmless bug, which confused me a bit: https://jira.mongodb.org/browse/SERVER-6911. Otherwise it was an easy switch. Once the replicas were synced, I shut down the app, then mongod on C; once a new primary was elected, I removed C from the replica set config.
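The cut-over above relies on the AWS members still having a voting majority once C is shut down, and on the config version being incremented when C is removed (roughly what the rs.remove() shell helper does for you). The sketch below checks both with plain config objects; canElect, withoutMember, and the hostnames are hypothetical, and unset votes/priority are treated as the default of 1.

```javascript
// Can the members that are still up elect a primary?
// Needs a strict majority of voting members up, plus at least one
// up member that is data-bearing (not an arbiter) with priority != 0.
function canElect(config, downHosts) {
  const isUp = (m) => !downHosts.includes(m.host);
  const voters = config.members.filter((m) => m.votes !== 0);
  const upVoters = voters.filter(isUp).length;
  const electable = config.members.some(
    (m) => isUp(m) && !m.arbiterOnly && m.priority !== 0
  );
  return electable && upVoters >= Math.floor(voters.length / 2) + 1;
}

// Return a new config without `host`, with the version bumped so a
// reconfig would accept it. The input config is not modified.
function withoutMember(config, host) {
  return {
    ...config,
    version: (config.version || 1) + 1,
    members: config.members.filter((m) => m.host !== host),
  };
}

const finalCfg = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "c.example.net:27017" },
    { _id: 1, host: "p1-aws.example.net:27017" },
    { _id: 2, host: "s1-aws.example.net:27017" },
    { _id: 3, host: "s2-aws.example.net:27017" },
  ],
};
console.log(canElect(finalCfg, ["c.example.net:27017"])); // true: 3 of 4 voters up
const trimmed = withoutMember(finalCfg, "c.example.net:27017");
console.log(trimmed.version, trimmed.members.length); // 4 3
```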
On Wednesday, September 12, 2012 2:38:40 PM UTC-4, Tyler Brock wrote: