I would go with 3), with a small tweak. Set up S1_AWS and S2_AWS as arbiters and let P1_AWS sync from C. I expect the initial sync to be slow in your case, but that should be acceptable because it won't affect reads and writes on C too much. Once P1_AWS becomes Secondary (after the initial sync), step C down so that P1_AWS becomes Primary. At that point you can turn S1_AWS into a normal data node instead of an arbiter. Repeat for S2_AWS once S1_AWS becomes Secondary.
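To judge when P1_AWS has finished its initial sync and caught up enough to be promoted, something along these lines could be run against the output of rs.status(). Treat it as a rough sketch only: the host name p1-aws.example.net:27017, the helper names, and the 10-second lag threshold are placeholders I made up, and the sample document only mimics the members[].name, stateStr, and optimeDate fields that rs.status() reports.

```javascript
// Hypothetical helpers; `status` is assumed to look like the rs.status() document.

// Find a member by its "host:port" name, or return null.
function findMember(status, memberName) {
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].name === memberName) return status.members[i];
  }
  return null;
}

// Seconds the given member's last applied op is behind the primary's, or null.
function memberLagSeconds(status, memberName) {
  var primary = null;
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") primary = status.members[i];
  }
  var member = findMember(status, memberName);
  if (primary === null || member === null) return null;
  return (primary.optimeDate.getTime() - member.optimeDate.getTime()) / 1000;
}

// True only if the member is a SECONDARY and within maxLagSeconds of the primary.
function readyForStepDown(status, memberName, maxLagSeconds) {
  var member = findMember(status, memberName);
  if (member === null || member.stateStr !== "SECONDARY") return false;
  var lag = memberLagSeconds(status, memberName);
  return lag !== null && lag <= maxLagSeconds;
}

// Made-up sample data standing in for rs.status(); host names are placeholders.
var sampleStatus = {
  members: [
    { name: "c.example.net:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2013-01-01T00:00:10Z") },
    { name: "p1-aws.example.net:27017", stateStr: "SECONDARY",
      optimeDate: new Date("2013-01-01T00:00:04Z") },
    // Arbiters hold no data, so they are never candidates for promotion.
    { name: "s1-aws.example.net:27017", stateStr: "ARBITER",
      optimeDate: new Date(0) }
  ]
};
```

In the mongo shell on C you would pass rs.status() instead of sampleStatus, and only call rs.stepDown() once readyForStepDown(...) returns true for P1_AWS. While P1_AWS is still doing its initial sync it will not report SECONDARY, so the check stays false until the copy is complete.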
I have one mongod instance with 26 GB of data in a db that I want to migrate to another set of servers (I'm moving from a hosting provider to AWS).
My topology:
C: The current running mongod instance that holds everything (yeah, I know).
AWS is set up as a three-node replica set:
P1_AWS: Primary AWS mongod instance.
S1_AWS: Secondary AWS mongod instance.
S2_AWS: Secondary AWS mongod instance.
There are a few different ways I could migrate the database over (all with caveats). Some advice on the best method (or a better one) would be great.
1) Run db.copyDatabase on P1_AWS. This seems like a better fit for a much smaller database, since it doesn't show progress and I couldn't find anything in the docs about recovery if the connection fails. I'm also not clear on how much this would increase read/write latency on C.
2) db.fsyncLock() on C --> snapshot --> rsync snapshot to P1_AWS --> load into P1_AWS mongod. This would incur downtime since I would have to stop all writes to C to make sure it's properly mirrored on P1_AWS. I would probably just put a "scheduled maintenance" page up. I also don't know the connectivity speed of C's provider to AWS, so transfer speed would be the main variable in the amount of downtime. If I mirror the same db on P1_AWS, S1_AWS, and S2_AWS, does anyone foresee any complications that would arise in doing so? Should I mirror and then switch on the replica set configuration or does it Just Work with replication turned on? The reason I ask is that if I mirror the db on P1_AWS, then I don't want to incur an additional latency increase as it mirrors on S1_AWS and S2_AWS.
3) Use C as the primary node in the replica set, with P1_AWS, S1_AWS, and S2_AWS being secondaries. As with db.copyDatabase, I'm not clear on the severity of read/write latency increases that this would incur. Network latency is also a concern if it takes a significant amount of time to replicate while also slowing down the read/write ops. Also, is there a proper way to determine progress of the replication?
Thanks and any advice would be greatly appreciated.
Andy