MongoDB replica set across AWS regions, subnets, or data centers


Archanaa Panda

Apr 28, 2015, 4:10:47 PM
to mongod...@googlegroups.com
Hi,

I seem to have run into a blocking problem. I have a collection that needs to be replicated across multiple AWS regions/subnets/data centers (e.g. a global lookup table): all data replicated to all regions, and updatable by the application layer in any region. To do that, do I need to give all my MongoDB EC2 nodes public IP addresses, and therefore put every replica member on a public network, so that replication can run across regions and so that, on writes, my application layer can reach a master node sitting in a different region/subnet?

Thanks and Regards,
Archanaa Panda

s.molinari

Apr 29, 2015, 3:42:19 AM
to mongod...@googlegroups.com
I believe this won't work, at least not the way you are describing it with "it can be updated by application layer of any region" or "a master node in a different region". In MongoDB replication there is only ever a single primary (the master) in any replica set configuration, so the primary will always live in one (network) region. You can, however, have replica set members (the secondaries) with copies of your data spread all over the world and read from them, with the caveat that the data may be stale or not yet up-to-date, which your application will need to account for.
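For example (collection and hostnames made up), local reads from a secondary look roughly like this in the shell, accepting replication lag:

    // allow reads on the secondary you are connected to
    rs.slaveOk();
    db.lookup.find({ code: "UK" });   // may be slightly stale

    // or, per query, let the client pick the lowest-latency member
    db.lookup.find({ code: "UK" }).readPref("nearest");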

A feature you might be able to use is tag aware sharding, where data is kept on certain shards (locations) and your clients are placed in those same locations. However, I am uncertain this would be an efficient use case for tag aware sharding. The idea of sharding is to distribute data over multiple replica sets in order to increase compute, storage, and I/O/network resources. It isn't really made to offer an (efficient) world-wide database, because you'll need centrally located config servers in one region, which means the latency of communicating with any "long distance" replica set is still there (which is why you want a multi-master system in the first place, right?). So, if efficiency/performance isn't a major concern, tag aware sharding could be a possible solution.
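Roughly, tag aware sharding is set up like this (the database, collection, shard names, and the region-prefixed shard key are all made up for illustration):

    sh.enableSharding("mydb");
    sh.shardCollection("mydb.users", { region: 1, userId: 1 });

    sh.addShardTag("shardUS", "US");   // shardUS's members live in the US DC
    sh.addShardTag("shardEU", "EU");   // shardEU's members live in the EU DC

    // pin each region's chunk range to the matching shard
    sh.addTagRange("mydb.users",
                   { region: "US", userId: MinKey },
                   { region: "US", userId: MaxKey }, "US");
    sh.addTagRange("mydb.users",
                   { region: "EU", userId: MinKey },
                   { region: "EU", userId: MaxKey }, "EU");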

Scott

Archanaa Panda

Apr 29, 2015, 4:45:29 AM
to mongod...@googlegroups.com


Hi, thanks for your reply. Suppose I am not too bothered about latency across regions and just want the data replicated. My bigger concern is whether I would have to allocate a static public IP address for each mongod process on each node, since if the primary fails in one region and a secondary in another region becomes the primary, applications in other regions must still be able to reach it.

In other words, read local, write global, as shown in the diagrams below.


s.molinari

Apr 29, 2015, 5:52:16 AM
to mongod...@googlegroups.com
I am no expert, but the first image, with the primary and distributed secondaries, would be a normal replica set: you would give the replicas that are not in the same DC as the primary priority 0 (and optionally make them non-voting), meaning they never have the ability to become the primary. You would also have secondaries alongside your primary in the same DC; those would be voting, primary-eligible members, and only they can become a primary.
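A rough sketch of that config (hostnames invented), with primary-eligible members in the home DC plus remote copies that can never become primary:

    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "us-1.example.com:27017" },  // voting, primary-eligible
        { _id: 1, host: "us-2.example.com:27017" },  // voting, primary-eligible
        { _id: 2, host: "us-3.example.com:27017" },  // voting, primary-eligible
        { _id: 3, host: "eu-1.example.com:27017", priority: 0, votes: 0 },  // remote copy only
        { _id: 4, host: "ap-1.example.com:27017", priority: 0, votes: 0 }   // remote copy only
      ]
    });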

Your second image can only work with the tag aware sharding I mentioned, as MongoDB doesn't have multi-master replication (and I am not sure it is a proper solution either).

Maybe someone smarter can chime in?

Scott

Archanaa Panda

Apr 29, 2015, 8:10:36 AM
to mongod...@googlegroups.com
Hi,

Just to clarify my question and understanding further: even if I do enable tag aware sharding, or keep replicas in a different DC as permanently hidden or non-voting members, it is the networking access aspect that bothers me...
1. For diagram 1, if my applications are in Asia or the UK, they still need to write to the primary, which is on the US east coast. So the primary has to be reachable over the public network (write global). Even if tunnelling / port forwarding can get the writes to the primary, I would still want to know what to give as the replica set member names when forming the replica set.
2. Same for diagram 2: I understand it can be achieved via tag aware sharding. However, how will I -
a) form a replica set by listing all the members, even those in a remote DC? (See the sketch after this list.)
b) let my mongos and config servers, which have to be in each DC, know and access the replica members of each shard? Best practice is to run mongos on all application server machines, or at least locally in that geography.
c) handle a customer who travels from the US to the UK while his primary shard replica is in the US? He won't be able to write any data to the US; e.g. he can't change his credentials or make reservations, whether for places in the UK or back home in the US, while he is in a different DC.
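For (a) and (b), my understanding of the syntax (hostnames below are invented) is that each shard is itself a replica set, and mongos is told a shard's members with a "setName/seedlist" string, so every listed name must resolve and be reachable from wherever mongos runs:

    // sketch only, hostnames made up
    sh.addShard("rsUS/us-1.example.com:27017,us-2.example.com:27017");
    sh.addShard("rsEU/eu-1.example.com:27017,eu-2.example.com:27017");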

The only straightforward way I can think of is to give a public EIP to each and every node in every shard and replica set, so that mongos or the application tier can route requests to the appropriate location, plus public EIPs for all the config servers as well.
If I grow to hundreds or thousands of MongoDB instances, it doesn't seem right to need that many EIPs.
The other straightforward way is to put everything, i.e. all mongod, mongos, and application servers, in one DC / geography inside one large VPC. But then I am not really taking advantage of multiple DCs and high availability; and how would I decide whether requests from mobile users in Singapore should go to us-east-1a or us-east-1b? Where do I draw the boundary between a continent/geographical region and a country?

I am not quite sure how large installations like Foursquare, which have grown beyond the boundaries of a single VPC and a single region, solve this problem.





Asya Kamsky

Apr 29, 2015, 8:32:13 PM
to mongodb-user
Any node in your cluster that a client may need to "talk" to must have a reachable IP address.

What's the issue with having thousands of addresses if you have thousands of nodes? Aren't they all supposed to have different names/addresses anyway?
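For instance (hostnames invented), a driver's connection string only seeds a couple of members; the driver then discovers every other member from the host names stored in the replica set config, so all of those names must be reachable from the client side too:

    mongodb://us-1.example.com:27017,eu-1.example.com:27017/?replicaSet=rs0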

Asya






Archanaa Panda

Apr 30, 2015, 2:46:51 AM
to mongod...@googlegroups.com
Yes, they are, but public IP addresses on AWS change whenever a machine restarts. I believe that to form replica sets I need static addresses (or at least stable names), so that they don't change on restart and the individual members can rejoin the cluster if the machine does restart.
So a couple of things I will need to try out are -
a) Check VPC peering and NAT port forwarding: whether it can work with multiple machines on either side, and whether it allows access/integration from our MMS account for monitoring, backup, etc.
b) If all else fails, check what it would cost the business to request as many EIPs as there are individual nodes plus config servers; there is currently a limit of only 5 per account.
c) Keep everything in one VPC and region.
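Coming back to the static-address point: as far as I understand, members locate each other by the host strings stored in the replica set config, so registering each node under a stable DNS name I control (the mongo-1.mydomain.internal below is hypothetical) should survive an IP change on restart:

    // sketch: check what each member is registered as
    rs.conf().members.forEach(function(m) { print(m.host); });
    // stable names like "mongo-1.mydomain.internal:27017" (rather than raw
    // IPs) mean a restart that changes the IP needs only a DNS update,
    // not an rs.reconfig()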

I wanted to know the best practice followed by large MongoDB installations on AWS, but there doesn't seem to be enough information available, so we will have to find out for ourselves.

Archanaa Panda

Apr 30, 2015, 2:58:16 AM
to mongod...@googlegroups.com
Also, public static IP addresses on every node would not be best practice from a security standpoint either...