Advice for MongoDB cluster configuration


Thibault Dory

unread,
Mar 8, 2011, 9:39:31 AM3/8/11
to mongodb-user
Hello,

I'm benchmarking various NoSQL databases (see www.nosqlbenchmarking.com
for the current results and configurations used) for my master's thesis,
and I'm going to apply this benchmark to bigger clusters. So far I have
only used a small cluster of 8 servers with a very small data set
(20,000 articles from Wikipedia) to run those tests.

I will use up to 100 servers (2 GB RAM, 4 CPUs, 80 GB HDD) from the Rackspace
cloud, and the new data set is the entire English version of Wikipedia.
Each article is stored as a single document with a unique integer-based
ID; you can see the implementation here:
https://github.com/toflames/Wikipedia-noSQL-Benchmark/blob/master/src/implementations/mongoDB.java
and the benchmark methodology here: http://www.slideshare.net/ThibaultDory/a-new-methodology-for-large

MongoDB worked quite well on the 8-server cluster, and for now the
only thing I plan to change is to keep the default chunk size. I'm
sharding on the _id key, which is a simple increasing integer.
Remember that I'm making fully random requests, so using an
increasing integer does not create a hot spot.

I would like to know if any of you has advice on how to get the best
performance out of MongoDB on this kind of cluster (servers and data
set). For example, should I change settings concerning memory usage
and caching to reflect the servers' capacity?

In my tests on 8 servers I observed better performance with as many
mongos processes as there were threads making requests. Should I keep
using a lot of mongos processes with a bigger cluster?

Thank you in advance for your input and critiques.


Thibault Dory

Eliot Horowitz

unread,
Mar 8, 2011, 11:57:45 AM3/8/11
to mongod...@googlegroups.com
The most important thing to consider is that bulk loading with a
sequential key is not well optimized at this point.
The easy workaround is pre-splitting, or hashing your key.
Pre-splitting definitely helps for bulk insertions.
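To make the pre-splitting idea concrete, here is a minimal sketch of computing the _id boundaries you would split at before starting a bulk load. This is purely illustrative: `split_points` is a hypothetical helper, not a MongoDB API, and the document total and chunk count are made-up numbers. In practice each boundary would be fed to the shard admin's split command from the mongo shell.

```python
# Sketch: choose evenly spaced _id values at which to pre-split a
# collection sharded on an increasing integer _id, so that a bulk load
# spreads across chunks instead of hammering the last one.

def split_points(total_docs, n_chunks):
    """Return the _id values to pre-split at (n_chunks - 1 boundaries)."""
    step = total_docs // n_chunks
    return [i * step for i in range(1, n_chunks)]

# e.g. roughly 3.5M articles spread over 100 chunks -> 99 boundaries
points = split_points(3_500_000, 100)
print(points[:3])  # first few split points
```

Each returned value marks the start of a new chunk, so consecutive inserts with sequential _ids land in pre-existing chunks rather than all hitting the open-ended last chunk.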

You definitely want a fair number of mongos.
Probably 1 per client machine.

Generally you don't need to tune much; it's hard to tell without more data.

One general question: you compare map/reduce indexing, but with
mongo you can create a regular index rather than using map/reduce.
Have you tried that vs map/reduce where it's the requirement?


Eliot Horowitz

unread,
Mar 8, 2011, 2:11:37 PM3/8/11
to mongod...@googlegroups.com
> In fact I don't really care about the time needed for the bulk
> loading; the performance I'm testing is random reads/updates and
> MapReduce. Unless doing so would give me better elasticity,
> meaning that the cluster would stabilize faster if I add new nodes. But
> I don't think so.

Ok, I would just make sure the balancing is done before beginning load
testing then.

>>
>> One general question: you compare map/reduce indexing, but with
>> mongo you can create a regular index rather than using map/reduce.
>> Have you tried that vs map/reduce where it's the requirement?
>

> Well, to be honest, I don't really care about this kind of specific
> functionality. Building the index is just a way of computing something
> heavy with MapReduce, and something that I can easily port to another
> NoSQL DB. The only thing I want is to see how MapReduce works on the
> different NoSQL DBs.

Ok. Just as a note: map/reduce in mongo is slow, mostly because we've
focused on creating other features so that you don't need map/reduce
for as many things.

Thibault Dory

unread,
Mar 8, 2011, 2:00:07 PM3/8/11
to mongodb-user
On 8 mar, 17:57, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> The most important thing to consider is that bulk loading with a
> sequential key is not well optimized at this point.
> The easy workaround is pre-splitting, or hashing your key.
> Pre-splitting definitely helps for bulk insertions.

In fact I don't really care about the time needed for the bulk
loading; the performance I'm testing is random reads/updates and
MapReduce. Unless doing so would give me better elasticity,
meaning that the cluster would stabilize faster if I add new nodes. But
I don't think so.

>
> You definitely want a fair number of mongos.
> Probably 1 per client machine.

Ok, that's what I was planning to do.

>
> Generally you don't need to tune much; it's hard to tell without more data.
>
> One general question: you compare map/reduce indexing, but with
> mongo you can create a regular index rather than using map/reduce.
> Have you tried that vs map/reduce where it's the requirement?

Well, to be honest, I don't really care about this kind of specific
functionality. Building the index is just a way of computing something
heavy with MapReduce, and something that I can easily port to another
NoSQL DB. The only thing I want is to see how MapReduce works on the
different NoSQL DBs.
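For illustration, the kind of portable "heavy computation" workload described above can be sketched as a toy in-process map/reduce word count. This is not any database's API (`map_phase` and `reduce_phase` are made-up names, and the two sample articles are invented); the real benchmark runs the equivalent job inside each database engine.

```python
# Toy map/reduce sketch: count word occurrences across article bodies,
# the same shape of job as building an inverted index with MapReduce.
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit a (word, 1) pair for every word in the article body.
    for word in text.lower().split():
        yield word, 1

def reduce_phase(emitted):
    # Sum the emitted counts per word.
    counts = defaultdict(int)
    for word, n in emitted:
        counts[word] += n
    return dict(counts)

articles = {1: "mongodb shards data", 2: "data moves between shards"}
emitted = [kv for doc_id, text in articles.items()
           for kv in map_phase(doc_id, text)]
print(reduce_phase(emitted))
```

Because the map and reduce functions are the only database-specific pieces, the same job ports easily between stores, which is exactly what makes it a convenient cross-database benchmark.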


Thank you for your input.
