It is basically random which connections/queries go to which secondaries relative to the working set of data; the reads could be spread out differently. Currently the goal is to route based on proximity to the client, but having an algorithm that used the chunk (shard key) ranges could provide better cache efficiency across the replica set (when reading from secondaries).
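To make the idea concrete, here is a minimal sketch of what range-aware routing could look like. This is purely illustrative, not how mongos works today: the class name, the chunk bounds, and the secondary names are all made up for the example. The point is that pinning each shard-key range to one secondary keeps that range's working set warm in a single node's cache.

```python
# Hypothetical sketch of chunk-range-based read routing for cache
# affinity. All names and values here are illustrative assumptions.
import bisect

class RangeAffinityRouter:
    """Pin each shard-key chunk range to one secondary for cache locality."""

    def __init__(self, chunk_bounds, secondaries):
        # chunk_bounds: sorted upper bounds of the chunk ranges,
        # e.g. [100, 200, 300] means chunks (-inf,100), [100,200), [200,300), [300,inf)
        self.chunk_bounds = chunk_bounds
        self.secondaries = secondaries

    def pick_secondary(self, shard_key):
        # Find which chunk the key falls into, then map chunks onto
        # secondaries round-robin, so the same range always hits the
        # same node and stays warm in its cache.
        chunk = bisect.bisect_right(self.chunk_bounds, shard_key)
        return self.secondaries[chunk % len(self.secondaries)]

router = RangeAffinityRouter([100, 200, 300], ["sec-a", "sec-b"])
print(router.pick_secondary(42))   # chunk 0 -> sec-a
print(router.pick_secondary(150))  # chunk 1 -> sec-b
print(router.pick_secondary(199))  # same chunk as 150, same secondary
```

Note the trade-off mentioned below: if one range is much hotter than the others, this scheme sacrifices even load balancing for cache efficiency.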
One way to get something like the effect you want is to run 4 shards; you could run multiple mongod processes on each box and thus get redundancy as well. Then you would not use slaveOk, and in a read-heavy application your caches would be reasonably well segmented. There are other pros and cons to this approach; I just wanted to point it out.
If you can get by with 2 replicas and an arbiter, you'd pick up even more efficiency.
But yes, different mongos behavior could achieve better caching, though perhaps at the expense of an evenly balanced load in some cases.
-- Max
There is no clear way to consistently spread the load across all the
replicas, and if you depend on this behavior, then losing a single
replica may degrade your application's performance considerably
and/or horribly. The goal of replica sets is to be reliable and to
provide redundancy; spreading reads this way might support that goal
in some ways, but it undermines it in others.
Sharding is the solution to the problem you have described.