Sharding vs. replication from a Java app's point of view


Rakesh

Aug 28, 2012, 10:20:33 AM
to mongodb-user
Hi,

According to our MongoDB expert, if we use sharding, then as far as the Java application is concerned, it sees one address to communicate with.

We don't need sharding and have set up a replica set with 3 instances. This means that the Java app has been configured with the URLs of all 3 servers.

Is that right? Why can't I just see one instance, like with sharding? As it stands, if I want to add another instance I have to stop, update and restart the Java apps.

Rakesh

Oz

Aug 28, 2012, 3:14:27 PM
to mongod...@googlegroups.com

When sharding, data is split across different instances. Knowing how that data is split and how to retrieve it involves a routing table for the data. This is where the one address you communicate with when sharding comes into play: the 'mongos'. The 'mongos' is essentially a router that knows which shards hold the data you're looking for and where writes should go. More information on the mongos can be found here: http://www.mongodb.org/display/DOCS/Configuring+Sharding#ConfiguringSharding-%7B%7Bmongos%7D%7DRouter
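To make the routing-table idea concrete, here is a toy Java sketch of range-based routing: each chunk covers a range of shard-key values and belongs to one shard, and a lookup picks the chunk with the greatest lower bound not exceeding the key. The class `ToyRouter`, the chunk boundaries, and the shard names are all hypothetical, and this is not how mongos is implemented internally; it only illustrates the kind of lookup a router performs on the application's behalf.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a router's chunk table (hypothetical; NOT the real mongos code).
class ToyRouter {
    // Maps the inclusive lower bound of each chunk to the shard that owns it.
    private final TreeMap<Integer, String> chunks = new TreeMap<>();

    ToyRouter() {
        // Hypothetical chunk boundaries on an integer shard key.
        chunks.put(Integer.MIN_VALUE, "shardA"); // [-inf, 1000)
        chunks.put(1000, "shardB");              // [1000, 5000)
        chunks.put(5000, "shardC");              // [5000, +inf)
    }

    // Returns the shard that should receive a read or write for this key.
    String route(int shardKey) {
        // floorEntry: the chunk with the greatest lower bound <= shardKey.
        // Never null here, because the first chunk starts at Integer.MIN_VALUE.
        Map.Entry<Integer, String> chunk = chunks.floorEntry(shardKey);
        return chunk.getValue();
    }

    public static void main(String[] args) {
        ToyRouter router = new ToyRouter();
        System.out.println(router.route(42));   // shardA
        System.out.println(router.route(1000)); // shardB
        System.out.println(router.route(9999)); // shardC
    }
}
```

The application only ever talks to the router; which shard serves a given key is hidden behind that single address.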

When running a replica set, all of the data is duplicated on every node, so this router is not needed: any given node can become primary and contains all the data. Once you create a replica set, all of the instances are aware of each other via the replica set config you created or added them to. In practice, your application only needs to know about 1 address; once connected to that initial address, it can auto-discover the other nodes in the replica set through the replica set config.

Under the hood, the driver creates a replica set connection from this one address; that connection keeps track of the state of the primary and secondaries and their addresses, automatically fails over to a new primary, and auto-discovers new nodes added to the replica set. We generally advise against listing only 1 node address in your application configuration: if that 1 node happens to fail, your application won't be able to connect to your replica set when it restarts, because it won't know any other addresses (even if the majority or all of the other nodes are up).
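The bootstrap behaviour described above can be simulated in plain Java, with no networking and no MongoDB driver involved. In the real protocol a contacted node reports the set's members (the `isMaster` reply, renamed `hello` in newer servers, includes a `hosts` list); here that is modelled by a map. The class `SeedDiscovery`, its method, and the host names are hypothetical. The sketch shows both points: one reachable seed is enough to learn every member, but a single configured seed that happens to be down makes startup fail even though the other members are healthy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulation only: models how a replica-set-aware client bootstraps from a seed list.
class SeedDiscovery {
    // seeds:         addresses listed in the application config, tried in order
    // reportedHosts: the member list each node would report once contacted
    // up:            addresses that are currently reachable
    static Set<String> discover(List<String> seeds,
                                Map<String, List<String>> reportedHosts,
                                Set<String> up) {
        for (String seed : seeds) {
            List<String> hosts = reportedHosts.get(seed);
            if (up.contains(seed) && hosts != null) {
                // One reachable seed is enough to learn the whole set.
                return new LinkedHashSet<>(hosts);
            }
        }
        // Every configured seed is down: the client cannot find the set,
        // even if other members are healthy.
        throw new IllegalStateException("no reachable seed among " + seeds);
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("a:27017", "b:27017", "c:27017", "d:27017");
        Map<String, List<String>> reported = new HashMap<>();
        for (String m : members) reported.put(m, members); // every node knows the full config
        Set<String> up = new HashSet<>(Arrays.asList("b:27017", "c:27017", "d:27017")); // a is down

        // Three seeds: a is down, so b answers and all four members are discovered.
        System.out.println(discover(Arrays.asList("a:27017", "b:27017", "c:27017"), reported, up));

        // One seed that happens to be down: startup fails although b, c, d are up.
        try {
            discover(Arrays.asList("a:27017"), reported, up);
        } catch (IllegalStateException e) {
            System.out.println("startup failed: " + e.getMessage());
        }
    }
}
```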

In general, if you have 3 nodes preset in the application configuration and are fairly confident that all 3 will not fail at the same time, you can add members to the replica set without adding them to the app config, and the app will auto-discover them on startup or as they are added. One way to minimize the number of servers in the application config while staying safe about connection availability is to put the nodes listed in the app config in separate data centers. This reduces the chance that they will all go down simultaneously.
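A small sketch of the "one seed per data center" idea, assuming the member-to-data-center mapping is known to whoever maintains the app config. Current MongoDB drivers, including the Java driver, accept a standard connection string of the form `mongodb://host1:port,host2:port,.../?replicaSet=name`; the helper names `pickSeeds` and `connectionString`, the host names, and the set name `rs0` are hypothetical. The code only builds the string and does not open a connection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: chooses app-config seeds, at most one per data center.
class SeedListBuilder {
    // memberToDc must preserve insertion order (e.g. a LinkedHashMap) so that
    // the first member seen in each data center is the one chosen.
    static List<String> pickSeeds(Map<String, String> memberToDc, int maxSeeds) {
        Map<String, String> dcToSeed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : memberToDc.entrySet()) {
            if (dcToSeed.size() >= maxSeeds) break;
            dcToSeed.putIfAbsent(e.getValue(), e.getKey()); // keep first member per DC
        }
        return new ArrayList<>(dcToSeed.values());
    }

    // Standard connection-string shape: comma-separated seeds plus the set name.
    static String connectionString(List<String> seeds, String replicaSetName) {
        return "mongodb://" + String.join(",", seeds) + "/?replicaSet=" + replicaSetName;
    }

    public static void main(String[] args) {
        Map<String, String> members = new LinkedHashMap<>();
        members.put("db1.east.example.com:27017", "east");
        members.put("db2.east.example.com:27017", "east");
        members.put("db3.west.example.com:27017", "west");
        members.put("db4.central.example.com:27017", "central");
        members.put("db5.west.example.com:27017", "west");

        List<String> seeds = pickSeeds(members, 3);
        // mongodb://db1.east.example.com:27017,db3.west.example.com:27017,db4.central.example.com:27017/?replicaSet=rs0
        System.out.println(connectionString(seeds, "rs0"));
    }
}
```

Members not in the string (db2, db5) are still used by the application once discovered; they just aren't needed to bootstrap.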

Long story short, it is entirely possible to have the maximum number of nodes in a replica set but only 3 nodes in your application config. The others will be auto-discovered on startup or as they are added to the set. More information can be found at http://www.mongodb.org/display/DOCS/Replica+Set+Tutorial#ReplicaSetTutorial-Changingthereplicasetconfiguration under the drivers section.

It should be noted, however, that you would probably want to keep the full list of nodes reasonably up to date on the application side, in case of failures or new additions. But you shouldn't lose any functionality if you only do this periodically or with new application releases.
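One way to do that periodic refresh is to compare the configured seed list with the members currently known to be in the set and report the differences before the next release. This is a minimal sketch; the class `SeedListAudit`, its methods, and the host names are hypothetical, and obtaining the current member list (from the driver or from the replica set config) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical maintenance helper for refreshing the app's seed list.
class SeedListAudit {
    // Members of the set that the app config does not list yet.
    static List<String> missingFromConfig(List<String> configured, Set<String> current) {
        List<String> out = new ArrayList<>();
        for (String h : current) if (!configured.contains(h)) out.add(h);
        return out;
    }

    // Configured seeds that are no longer members of the set.
    static List<String> staleInConfig(List<String> configured, Set<String> current) {
        List<String> out = new ArrayList<>();
        for (String h : configured) if (!current.contains(h)) out.add(h);
        return out;
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("a:27017", "b:27017", "c:27017");
        Set<String> current = new LinkedHashSet<>(Arrays.asList("b:27017", "c:27017", "d:27017", "e:27017"));
        System.out.println("add to config:      " + missingFromConfig(configured, current)); // [d:27017, e:27017]
        System.out.println("remove from config: " + staleInConfig(configured, current));     // [a:27017]
    }
}
```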
