Re: [mongodb-user] How does a replica set exactly works?

Max Schireson

unread,

Jul 31, 2012, 11:28:41 AM7/31/12

to mongod...@googlegroups.com

If you want some of the data on machine a and some on machine b, that is the purpose of sharding.

In replica sets each machine in a replica set has the whole data set.

So you would want:
2 shards, each of which has one primary, one secondary, and one arbiter

The arbiters can be tiny or can be on one of the nodes that is a primary or.secondary for the other shard. Their only purpose is to vote.

You'd need config servers as well. They track which data is on which server and can be very small.

See http://www.mongodb.org/display/DOCS/Sharding+Introduction for more info.

-- Max

On Jul 31, 2012 8:11 AM, "Tim" <tim.a.zi...@googlemail.com> wrote:

Hello I read all the documentation concerning mongoDB/replica sets and so one.

I still have a question. How is a dataset exactly stored? It says, that the sec. nodes copy the oplog in order to execute the operations theirselves.
Does every sec. node copy all the oplog?
For example, a client makes a write operation (lets say 5GB vol.)
are these 5 GB stored on the prim. node and every sec. node? every time?

For example, I have 4 server each with 32GB storage. Actual I want
to store 64GB data and replicate that due to the other 64 GB.
I would suggest to use 1 prim. node, 3 sec. nodes and 1 arbiter (for breaking ties).
Would I have "only" 32 GB storage and the other 96GB storage would only replicate? Or
does the data volume gets split somehow?

Best regards

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Tim

unread,

Jul 31, 2012, 6:43:40 PM7/31/12

to mongod...@googlegroups.com

hi,
thank you for your reply.
I nearly read everything about replica sets, but I wasn't quite sure if I understood that correctly ;)
(e.g. http://docs.mongodb.org/manual/core/replication-internals/#replica-set-implementation)

Now it's very clear to me :)

I think I'll need 3 config server- right?

best regards

Am Dienstag, 31. Juli 2012 17:28:41 UTC+2 schrieb Max Schireson:

If you want some of the data on machine a and some on machine b, that is the purpose of sharding.

In replica sets each machine in a replica set has the whole data set.

So you would want:
2 shards, each of which has one primary, one secondary, and one arbiter

The arbiters can be tiny or can be on one of the nodes that is a primary or.secondary for the other shard. Their only purpose is to vote.

You'd need config servers as well. They track which data is on which server and can be very small.

See http://www.mongodb.org/display/DOCS/Sharding+Introduction for more info.

-- Max

On Jul 31, 2012 8:11 AM, "Tim" <tim.a.zimmermann@googlemail.com> wrote:

Hello I read all the documentation concerning mongoDB/replica sets and so one.

I still have a question. How is a dataset exactly stored? It says, that the sec. nodes copy the oplog in order to execute the operations theirselves.
Does every sec. node copy all the oplog?
For example, a client makes a write operation (lets say 5GB vol.)
are these 5 GB stored on the prim. node and every sec. node? every time?

For example, I have 4 server each with 32GB storage. Actual I want
to store 64GB data and replicate that due to the other 64 GB.
I would suggest to use 1 prim. node, 3 sec. nodes and 1 arbiter (for breaking ties).
Would I have "only" 32 GB storage and the other 96GB storage would only replicate? Or
does the data volume gets split somehow?

Best regards

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

mongodb-user+unsubscribe@googlegroups.com

Tim

unread,

Aug 1, 2012, 10:14:44 AM8/1/12

to mongod...@googlegroups.com

Without Sharding it wouldn't be possible to adress one logical server, would it? Due to replica sets there would be only one cluster (one prim. / one sec.) a operation would connect to, wouldn't it?

Or it is possible to send an operation to "one logical server" and an instance (like the config server at sharding) would route that command to one of two replica sets / a read command would read from the two replica sets.

Like using only two replica sets as one logical server but don't split the data for performance and so on. Is that possible or do I definitely need sharding?
How exactly does sharding split the data? due to the sharding key?

markh

unread,

Aug 1, 2012, 11:14:56 AM8/1/12

to mongod...@googlegroups.com

Hi Tim,

There's lots of questions there so I will do my best to answer all of them :)

Yes, for sharding, you require three config servers. The config server data is the most important in your entire cluster and you should back it up. The config servers have a two-phase commit, which is a standard method for performing reliable distributed transactions. You can read more about two-phase commit here and here.

If you want your application to see "one logical server" then you are talking about sharding and a "mongos'. I think there may be a SERVER ticket for this functionality within a replica set but I'm unable to find it at the minute, I will keep looking and update you asap.

The decision to shard is not simple and requires a lot of thought and preparation. If your data set is sufficiently small, you may not need to shard this collection at all - it all depends on your use case. Here is a more in-depth sharding article also that I highly recommend reading. This section on our "docs" website explains how to administer shard clusters and check the balancing.

Regarding the "splitting", MongoDB's sharding is range-based, meaning that every document in a sharded collection must fall within a range of values for a given key. MongoDB uses a shard key to place each document within one of these ranges.

MongoDB uses two key operations to facilitate sharding - split and migrate. Migrate moves a chunk (the data associated with a key range) to another shard. This is done as needed to rebalance. Split splits a chunk into two ranges; this is done to assure no one chunk is unusually large. Split is an inexpensive metadata operation, while migrate is expensive as large amounts of data may be moving server to server.

It is very important to choose the correct shard key (please note that it is immutable) and a poorly chosen shard key will prevent your application from realising the benefits provided by a sharded cluster. Choosing a shard key is discussed here and one of my colleagues (Kristina) has written an excellent blog post on the topic.

Think of a replica set is a cluster of MongoDB servers. The writes will be sent to the primary and you will have one primary with one or more secondaries. This link describes the possible node configurations as well as the arbiter role. You've probably read this but, just in case you haven't, the fundamentals behind replication are discussed at length here.

With regard to "routing reads", that is possible using read preference. In terms of implement the routing of reads to secondaries, you should read this documentation and this also. MongoDB drivers allow client applications to configure a read preference on a per-connection or per-operation basis. MongoDB does not guarantee that secondary members will be consistent with the state of the primary member in a replica set. As a result, setting a read preference that allows reading from secondary members means that you are accepting the possibility of eventually consistent .

Mark

Tim

unread,

Aug 2, 2012, 4:16:50 AM8/2/12

to mongod...@googlegroups.com

Hi Mark,

thanks for your fast reply :)
I read the articles you gave me. I lean towards a replica set now, due to your post and the articles.
I have 4 Servers each with 32 TB storage. 2 of them should store data and the other two should replicate the data.
therefore 2 replica sets with each 1 prim., 1 sec. and 1 arbiter should be a good choice.
Is that amount of data "sufficent small" then? The use case is mostly testing.

By having two replica sets I (/the application) would have to store the data manually in one of the two clusters - right?
Are there any disadvantages by only choosing replica sets and no sharding? I would have to operate with two single databases- right?

Best regards, Tim

markh

unread,

Aug 2, 2012, 5:45:39 AM8/2/12

to mongod...@googlegroups.com

Hi Tim,

A replica set is the recommended implementation with or without sharding for reliability and resilience reasons. Your replica set design (1 primary, 1 secondary and 1 arbiter) seems sound for your use case.

I don't understand why you want two separate replica sets.

If you are only testing, my suggestion would be to use one replica set and given your use case, this would most likely be sufficient.

Sharding is typically done when you need to spread data across multiple machines (i.e. scale out), which may be because the size of your data has grown and there's not enough RAM, sufficient disk space/speed or even CPU cores on your database server or you require greater read/write throughput.

Replica sets and sharding are not mutually exclusive, if you shard, your sharded cluster will be made up of multiple replica sets as indicated here.

If you decide to go with one replica set without sharding then you can always shard at a later date.

Mark

Tim

unread,

Aug 2, 2012, 6:14:26 AM8/2/12

to mongod...@googlegroups.com

Hi Mark,

great, so I'll use a replica set. I thought that 1 replica set consists of 1 prim. node and several sec. nodes (up to 7) (and furhter nodes like a arbiter-node (up to overall 12 nodes)) .

So when 1 Server (with 32 TB) is the prim. node and the other server (with 32 TB) is the sec. node the first replica set is defined. Adding more sec. would "waste" diskspace, because I want to mirror the data only once.
So I need a second replica set again with 1 prim. node etc. in order to use all my 4 servers and have two times 32TB diskspace for the prim. node.

Or did I unterstand something wrong?

markh

unread,

Aug 2, 2012, 7:30:19 AM8/2/12

to mongod...@googlegroups.com

Hi Tim,

A replica set consists of 1 primary and 1 or more secondaries. If there are an even number of nodes (e.g. 2) then an arbiter is adding for voting purposes so a primary can be elected. There can be up to 12 members in a replica set but only a maximum of 7 can vote.

Adding more secondaries doesn't "waste" disk space so-to-speak. Folk add more secondaries to have more resilience or to scale their reads across the whole replica set.

You should follow this tutorial to test setting up your replica set.

From you use case, I don't understand why you need a second replica set but it's your decision based on how your application works,

Mark

Tim

unread,

Aug 2, 2012, 8:15:39 AM8/2/12

to mongod...@googlegroups.com

Hi Mark,

hm.. I read all the documentation concerning replica sets, but I think i didn't get this point.
Is it possible to conclude two servers to one primary node?
Is this one primary node one server or does it consists of multiple server?
I would like to use all my 4 server with each 32TB.

If one node is one server I would have one replica set by using 2 of 4 Server. So I would use two times 32TB (for prim. node and sec. node) but
there are still 2 server left with each 32TB I would like to use.

Do you know what I mean?^^

Best Regards,

markh

unread,

Aug 2, 2012, 8:59:15 AM8/2/12

to mongod...@googlegroups.com

Hi Tim,

No problem.

You have a few options -

1. Two Shards: 2 replica sets of (1 x Primary 1 x Secondary) , which is probably the best compromise between redundancy and capacity. For each replica set, use an arbiter for elections and put the arbiter on a small server elsewhere on the network ( the arbiter does not have any data so anywhere with a network connection to the others would be fine).

2. One replica set: (1 x Primary 1 x Secondary) is the safest option, but has the lowest capacity. You will need an arbiter (used for elections) on a small server elsewhere on the network.

3. 4 x single node: which has the highest capacity but is very, very risky.

Does that help?

Mark

Tim

unread,

Aug 2, 2012, 9:10:04 AM8/2/12

to mongod...@googlegroups.com

Hi Mark,

thank you, this was very helpful,
if I get that right with "option 2" I can only use 2 of my 4 server - right? (low capacity) One server as prim node the other server as sec. node.

markh

unread,

Aug 2, 2012, 10:01:03 AM8/2/12

to mongod...@googlegroups.com

Hi Tim,

Sorry, that was a typo. Option 2 should have said (1 x Primary 3 x Secondary) so you can use 4 of your servers. All options were based on you using all 4 servers,

Mark

Tim

unread,

Aug 2, 2012, 12:50:56 PM8/2/12

to mongod...@googlegroups.com

Hi Mark,

so with option 2 I would have "~32 TB for storing data and ~96TB for replicating it".
In Order to have "64TB / 64 TB" (in a recommended way) you would propose option 1 - right?

Best regards, Tim

markh

unread,

Aug 2, 2012, 1:11:39 PM8/2/12

to mongod...@googlegroups.com

H Tim,

Option 2 means that your primary has 32TB for data storage and each secondary has 32 TB. With replication, the data on the primary is on each of the secondaries. Therefore, your data has a maximum of 32TB.

Option 1 means that you can have a total of 64TB for data storage, 32TB on each shard with each shard being a replica set.

In turn, each replica set will have a primary node, a secondary node and an arbiter. 32TB of data will reside on the primary, it will be replicated to the secondary and remember that no data lives on the arbiter.

Thanks

Mark

Tim

unread,

Aug 2, 2012, 1:37:50 PM8/2/12

to mongod...@googlegroups.com

Thank you :)
Now everything is clear to me :)
thanks a lot :)

Дмитрий Максименко

unread,

Aug 3, 2012, 5:09:52 AM8/3/12

to mongod...@googlegroups.com

Please tell me what will happen if secondary server will go down during read operation? Will mongos switch automatically to another secondary node (suppose it exists) and transparently to user app?

четверг, 2 августа 2012 г., 21:11:39 UTC+4 пользователь markh написал:

markh

unread,

Aug 6, 2012, 8:43:41 AM8/6/12

to mongod...@googlegroups.com

Hi,

This read operation will fail and you will most likely receive a connection exception error.

Such exceptions should be handled in your code and I'd recommend some testing if you have any doubts about how your application will handle this error.

In future, it is better to start a new thread for such questions. This thread was about something different.