cluster with two shards - 5k/s ?


oferfort

Sep 3, 2010, 5:54:56 AM
to mongodb-user
About 5k inserts/s seems to be the maximum rate i can reach on my cluster,
but i read that in benchmarks you've gotten to 20k/s.
My documents are rather small: an id and one or two other fields, one
around 10 words and the other around 500 words.
i'm using 16GB RAM with mongo 1.6.2

My shard key is the _id, which is an int (sequential numbers), and now
i read (in http://groups.google.com/group/mongodb-user/msg/1ce2b92dae5fed4f)
that "for high write throughput you should use a non-sequential shard
key."

can you elaborate on that?
I'm using it as a key/value store, and the id is the only unique field i
have. if that's not good for sharding, what should i do?

thanks
ofer

Dwight Merriman

Sep 3, 2010, 6:14:22 AM
to mongod...@googlegroups.com
_id for shard key is ok

(1) are you calling getlasterror()? if so you need a large thread
pool for the client/server turnaround times
(2) try your test on a single mongod and see if it is faster or not?
if not, the client side could be the bottleneck
(3) what is the client side of your test?
(4) what is the % cpu for each process : your client, each mongos, each mongod.

tx
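
Point (1) above can be illustrated with a small sketch. When every write waits for an acknowledgment (as with getlasterror), a single thread spends most of its time waiting on round trips, so throughput grows with the number of threads. Everything below is an assumption for illustration: `AckedWriter` is a hypothetical stand-in for the driver call, and the 1 ms sleep only simulates the turnaround time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class PooledInsertSketch {
    /** Hypothetical stand-in for "insert one document and wait for the acknowledgment". */
    interface AckedWriter {
        void write(long id) throws Exception;
    }

    /** Spreads ids [0, total) across `threads` workers and returns how many writes succeeded. */
    static long run(AckedWriter writer, int threads, long total) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong next = new AtomicLong();
        AtomicLong done = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                long id;
                // Each worker pulls the next unclaimed id until all ids are taken.
                while ((id = next.getAndIncrement()) < total) {
                    try {
                        writer.write(id);
                        done.incrementAndGet();
                    } catch (Exception e) {
                        // A real client would log or retry; the sketch just skips the document.
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        AckedWriter simulated = id -> Thread.sleep(1); // pretend each round trip takes about 1 ms
        for (int threads : new int[] {1, 8}) {
            long start = System.nanoTime();
            long written = run(simulated, threads, 400);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + written + " writes in " + ms + " ms");
        }
    }
}
```

If the writes are fire-and-forget (no getlasterror), the round-trip wait largely disappears and a pool like this matters much less.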


Ofer Fort

Sep 3, 2010, 7:27:28 AM
to mongod...@googlegroups.com
1. i don't call getlasterror()
2. on a single mongod it is about the same
3. the client is a single-threaded java process that just queries the data from my db and puts it into mongo (the time measurement covers only the mongo insert time)
4. resource usage is rather low: the highest is the client (10%-20%), then mongos (8%-15%), mongod (5%-10%) and the config server (2%-5%)

maybe the bottleneck is in my client. i'll add more clients, and if they are the bottleneck and i can insert 5k/s from each, then i shouldn't have any problem

thanks

Dwight Merriman

Sep 3, 2010, 9:00:26 AM
to mongod...@googlegroups.com
it shouldn't be that slow to a single mongod. i would try to make it
fast to a single mongod first. i am guessing it is something client
related.

suggest you make a trivial test that only does inserts to mongod (no
other parts) and get that fast, then add other things back.
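
A minimal sketch of such a trivial test, under stated assumptions: `insertOne` is a hypothetical placeholder for the real driver call, and the harness only times the loop. Running it first with a no-op shows the ceiling of the client loop itself; swapping in the real insert to a single mongod then shows how much of the 5k/s comes from the database side.

```java
import java.util.function.LongConsumer;

class InsertRateSketch {
    /** Calls insertOne for ids [0, count) and returns the observed rate in operations per second. */
    static double opsPerSecond(LongConsumer insertOne, long count) {
        long start = System.nanoTime();
        for (long id = 0; id < count; id++) {
            insertOne.accept(id);
        }
        // Guard against a zero elapsed time on very short runs.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return count * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder: replace the body with the real insert to a single mongod.
        LongConsumer noOp = id -> { };
        System.out.printf("client-side ceiling: %.0f ops/s%n", opsPerSecond(noOp, 1_000_000));
    }
}
```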

Ofer Fort

Sep 3, 2010, 10:36:01 AM
to mongod...@googlegroups.com
Ok, thanks , will do

Ankur

Sep 4, 2010, 2:16:23 PM
to mongodb-user
Did you try a bulk insert?


Ofer Fort

Sep 5, 2010, 7:10:38 AM
to mongod...@googlegroups.com
how do i do bulk inserts? i didn't see it in the documentation, i'm using the Java driver


Kyle Banker

Sep 5, 2010, 7:44:59 AM
to mongod...@googlegroups.com
Just pass a list of DBObjects to the insert method.
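
As a rough illustration of that suggestion: the documents are grouped into batches and each batch is handed to insert in one call. The helper below only does the grouping. The batch size of 1000 is an arbitrary assumption, and the driver call appears only in a comment, so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSketch {
    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        for (List<Integer> batch : partition(ids, 1000)) {
            // With the Java driver, each batch of DBObjects would be one call: collection.insert(batch)
            System.out.println("would insert " + batch.size() + " documents in one call");
        }
    }
}
```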

Ofer Fort

Sep 5, 2010, 9:29:04 AM
to mongod...@googlegroups.com
i'm using the update and not the insert (upserting), and i don't see an option to update multiple objects.

does upsert vs insert have a performance impact?

Kyle Banker

Sep 5, 2010, 9:36:23 AM
to mongod...@googlegroups.com
The API docs are here:
http://api.mongodb.org/java/2.1/index.html

The third parameter to update is a boolean that allows you to specify that the op is an upsert (the fourth is the multi flag). Is that what you're looking for?

Kyle Banker

Sep 5, 2010, 11:54:42 AM
to mongod...@googlegroups.com
Are you by any chance trying to do a bulk upsert? If so, that's not currently supported. You'll have to upsert each document individually.

Ofer Fort

Sep 5, 2010, 12:01:28 PM
to mongod...@googlegroups.com
no, insert can accept a List, but i'm using update, which doesn't accept a List, so i don't see how i can do bulk inserts.

Ankur

Sep 5, 2010, 12:25:49 PM
to mongodb-user
As you said you can't do a bulk update.

Do you need to do upserts instead of inserts or are you trying to
create new records?

Ankur


Ofer Fort

Sep 5, 2010, 12:43:59 PM
to mongod...@googlegroups.com
i used the same method in my wrapper class, so my app won't need to know whether this object already exists or not.

is there a difference in performance?
what will happen if i do an insert to an object that has an _id that already is taken? will it rewrite? fail silently?


Scott Hernandez

Sep 5, 2010, 12:52:40 PM
to mongod...@googlegroups.com
btw, the reason not to use a sequential key to shard on is that all writes will go to a single shard; if you are trying to spread writes out to multiple shards you should use a key that does that (non-increasing values).
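
A toy model of that effect, with everything in it assumed for illustration: the key range is cut into equal chunks that are assigned to shards round-robin, which is far simpler than what mongos and the balancer actually do. `scramble` is a hypothetical way of deriving a non-sequential key from a sequential id. It is not a MongoDB feature, and a scrambled key would have to be stored and recomputed on every lookup.

```java
import java.util.Arrays;

class ShardKeySketch {
    /** Simplified routing: [0, keySpace) is cut into equal chunks, chunks go to shards round-robin. */
    static int shardFor(long key, long keySpace, int chunks, int shards) {
        long chunkWidth = keySpace / chunks;
        int chunk = (int) Math.min(chunks - 1, key / chunkWidth);
        return chunk % shards;
    }

    /** Illustrative multiplicative hash so that consecutive ids land far apart in the key space. */
    static long scramble(long id, long keySpace) {
        return Math.floorMod(id * 0x9E3779B97F4A7C15L, keySpace);
    }

    /** Counts how many of the ids [firstId, firstId + n) each shard receives. */
    static int[] distribution(long firstId, int n, boolean scrambled,
                              long keySpace, int chunks, int shards) {
        int[] perShard = new int[shards];
        for (long id = firstId; id < firstId + n; id++) {
            long key = scrambled ? scramble(id, keySpace) : id;
            perShard[shardFor(key, keySpace, chunks, shards)]++;
        }
        return perShard;
    }

    public static void main(String[] args) {
        long keySpace = 1L << 20;
        // The newest sequential ids all fall into the same (top) chunk, hence the same shard.
        System.out.println("sequential: "
                + Arrays.toString(distribution(1_000_000L, 10_000, false, keySpace, 8, 2)));
        // Scrambled keys are spread over all chunks, hence over both shards.
        System.out.println("scrambled:  "
                + Arrays.toString(distribution(1_000_000L, 10_000, true, keySpace, 8, 2)));
    }
}
```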

(more inline below)

On Sun, Sep 5, 2010 at 9:43 AM, Ofer Fort <ofe...@gmail.com> wrote:
> i used the same method in my wrapper class, so my app won't need to know whether this object already exists or not.

That is basically what the save method is for... (see below)

> is there a difference in performance?

There is a difference if you are doing a query + insert/update versus just an insert/save.

> what will happen if i do an insert to an object that has an _id that already is taken? will it rewrite? fail silently?

With insert (without the safe param, or a getLastError call to check the status) it will fail silently and not overwrite.

If you use save (provided in most drivers), it will insert if the _id value isn't yet set, and do an update by _id if it exists (with upsert set to true).

Perhaps you could post (gist/pastie/etc) a sample of your code so we can better understand what is going on..
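
The difference between the three write styles discussed above can be modelled in memory. This is only a sketch of the semantics as described in this thread, not driver code: the class and method names are made up, and a plain map stands in for the collection. The boolean returned by `insert` plays the role of the status that getLastError would report, and an unchecked insert simply ignores it.

```java
import java.util.HashMap;
import java.util.Map;

class WriteSemanticsSketch {
    // Stand-in for a collection: _id -> document fields.
    private final Map<Long, Map<String, String>> docs = new HashMap<>();

    /** insert: rejects a duplicate _id and leaves the stored document untouched. */
    boolean insert(long id, Map<String, String> doc) {
        return docs.putIfAbsent(id, new HashMap<>(doc)) == null;
    }

    /** save: replaces the whole document for this _id, or creates it. */
    void save(long id, Map<String, String> doc) {
        docs.put(id, new HashMap<>(doc));
    }

    /** upsert with $set: creates the document if missing, otherwise merges the given fields. */
    void upsertSet(long id, Map<String, String> fields) {
        docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(fields);
    }

    Map<String, String> find(long id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        WriteSemanticsSketch c = new WriteSemanticsSketch();
        Map<String, String> first = new HashMap<>();
        first.put("title", "first");
        Map<String, String> second = new HashMap<>();
        second.put("title", "second");

        System.out.println("first insert accepted:  " + c.insert(1L, first));
        System.out.println("second insert accepted: " + c.insert(1L, second));
        c.upsertSet(1L, second);
        System.out.println("after upsert with $set: " + c.find(1L));
    }
}
```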

Ofer Fort

Sep 5, 2010, 1:01:56 PM
to mongod...@googlegroups.com
thanks,
but if i use it as a key/value, i don't have any other key for the sharding.

the way i'm inserting data is as follows:
i get an ID and a Map of key/value pairs
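// Upsert by _id: creates the document if it is missing, otherwise $set merges the given fields.
public void put(String dbName, String table, long id, Map<String, String> data) throws Exception
{
    DB db = getDB(dbName);
    BasicDBObject query = new BasicDBObject("_id", id);
    BasicDBObject change = new BasicDBObject("$set", new BasicDBObject(data));
    // update(query, change, upsert, multi)
    db.getCollection(table).update(query, change, true, false);
}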

thanks

Sergei Tulentsev

Sep 6, 2010, 11:31:01 AM
to mongod...@googlegroups.com
I have a similar problem. I have set up a cluster of two shards (each one is a replica set of one node), and I am importing large chunks of data with mongoimport. The rate fluctuates: it can drop as low as 700 records per second and go as high as 25k/sec. Most of the time, the top command on the shards shows load average = ~5 and %wa = ~20.
The shard machines each have three discs in a RAID1.
What could be the weak link here? :-)

--
Best regards,
Sergei Tulentsev

Sergei Tulentsev

Sep 6, 2010, 11:32:22 AM
to mongod...@googlegroups.com
Import to an unsharded collection (and without an index on "uiq", the shard key to be) goes at a pretty constant rate of 50k/sec

Eliot Horowitz

Sep 6, 2010, 11:58:27 AM
to mongod...@googlegroups.com
What version are you running?
Can you try 1.7.0?  There are some large performance improvements in there that should go into 1.6.3

Sergei Tulentsev

Sep 6, 2010, 5:24:19 PM
to mongod...@googlegroups.com
Have tried 1.7.0. No significant changes. Performance dies the moment I shard the collection.
Back to 1.6.2 now. Right now I am importing 800M rows without indexes or sharding. Will add that after and see if it helps.

Sergei Tulentsev

Sep 7, 2010, 5:46:33 AM
to mongod...@googlegroups.com
Ok, so I've imported all of my data within 7 hours and started building the index. It's been 8 hours by now and still counting. I thought that building an index was a faster operation than inserting data. :-)

Is it not a good idea to have a collection with 800M documents (on one shard)? I could imagine that B-Tree insertion speed decreases logarithmically.

Can I somehow track the progress of synchronous index construction?

Eliot Horowitz

Sep 7, 2010, 8:46:31 AM
to mongod...@googlegroups.com
Looking at the web console will give you a % completion for index building.

Sergei Tulentsev

Sep 7, 2010, 8:50:11 AM
to mongod...@googlegroups.com
but what if the server is launched without --rest? no web console?

Eliot Horowitz

Sep 7, 2010, 8:51:01 AM
to mongod...@googlegroups.com
Try it :)
It is in the basic version.
Also db.currentOp()

Sergei Tulentsev

Sep 7, 2010, 9:04:40 AM
to mongod...@googlegroups.com
When I try it on mongos it gives me an almost blank page. No data.
When I talk directly to the server, it fails with "could not acquire read lock"

db.currentOp shows quite a number of operations waiting for lock. They are probably the commands I launched from the console and then Ctrl+C'ed them. And then there's this entry:
        {
            "opid" : "moscow:827615036",
            "active" : true,
            "lockType" : "write",
            "waitingForLock" : false,
            "secs_running" : 36804,
            "op" : "insert",
            "ns" : "pravdorub_production.system.indexes",
            "query" : {
               
            },
            "client" : "10.0.0.6:57116",
            "desc" : "conn",
            "msg" : "index: (2/3) btree bottom up 375917877/815878749 46%"
        },

It's the only one that looks like progress of something. So, it's been 11 hours and not even half done! Data insertion itself took only 7 hours. Does this look like a normal situation?

Eliot Horowitz

Sep 7, 2010, 10:38:25 AM
to mongod...@googlegroups.com
It depends on a lot of factors and certainly can be normal.
Also - that % is on part 2/3, so probably closer to 75% done of total time
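
For anyone polling db.currentOp() from a script, the progress message can be picked apart as sketched below. The format is assumed from the single sample quoted in this thread and may differ in other versions. The sketch reports only the phase number and the percentage within that phase. It does not estimate overall progress, because, as noted above, the phases do not take equal time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexProgressSketch {
    // Matches messages shaped like the sample in this thread:
    // "index: (2/3) btree bottom up 375917877/815878749 46%"
    private static final Pattern MSG = Pattern.compile("\\((\\d+)/(\\d+)\\).*?(\\d+)/(\\d+)");

    /** Returns {phase, phaseCount, percentOfCurrentPhase}, or null if the message has another shape. */
    static long[] parse(String msg) {
        Matcher m = MSG.matcher(msg);
        if (!m.find()) {
            return null;
        }
        long done = Long.parseLong(m.group(3));
        long total = Long.parseLong(m.group(4));
        long percent = total == 0 ? 0 : done * 100 / total;
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), percent};
    }

    public static void main(String[] args) {
        long[] p = parse("index: (2/3) btree bottom up 375917877/815878749 46%");
        System.out.println("phase " + p[0] + " of " + p[1] + ", " + p[2] + "% through this phase");
    }
}
```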

Harvey Liu

Sep 7, 2010, 11:50:02 AM
to mongod...@googlegroups.com
Bulk insertion into a distributed ordered table, as in mongodb and
yahoo's sherpa, is a difficult problem to solve. Maybe we shouldn't
expect metrics that are too high :-)

FYI, there is a research paper about that,
http://research.yahoo.com/files/bulkload.pdf

Sergei Tulentsev

Sep 7, 2010, 12:56:00 PM
to mongod...@googlegroups.com
It's an interesting paper, thanks.

But I have another question on the topic. I have a collection of objects, which have a property named 'uiq'. This property will have a secondary index on it (to speed up the retrieval). Would it help the bulk import time if I pre-ordered input files by that field? I am not very proficient with B-Trees, but I have an impression it could help.
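
For reference, pre-ordering itself is simple to sketch. Whether it actually shortens the import or the index build is the open question here, and nothing in this thread settles it. The `Row` shape and the String type of 'uiq' are assumptions, and an input of 800M rows would need an external (on-disk) sort instead of the in-memory sort shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PreSortSketch {
    /** Hypothetical record shape: only the 'uiq' field named in the post matters here. */
    static final class Row {
        final String uiq;
        final String payload;

        Row(String uiq, String payload) {
            this.uiq = uiq;
            this.payload = payload;
        }
    }

    /** Returns a copy of rows ordered by uiq, leaving the input list untouched. */
    static List<Row> sortedByUiq(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparing((Row r) -> r.uiq));
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row("b", "2"), new Row("a", "1"), new Row("c", "3"));
        for (Row r : sortedByUiq(rows)) {
            System.out.println(r.uiq + " -> " + r.payload);
        }
    }
}
```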