Re: [mongodb-user] pymongo update method is very slow, comparing to find_and_modify command


Bernie Hackett

Sep 11, 2012, 12:04:11 PM
to mongod...@googlegroups.com
The problem is your use of j=True. The first thing to understand is
that find_and_modify() is a command, and that command does not support
the j option, so passing that parameter has no effect.

Passing j=True on every update() is causing *at least* a 33
millisecond pause (to wait for the next group commit) for each
operation before the server acknowledges the write. That is why you
are seeing this slowdown in your update operations in a single thread.

More information on j is available here:

http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-%7B%7Bj%7D%7D
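To see how that wait translates into the numbers below, here is a back-of-envelope sketch. The 100 ms default commit interval and the commit-early-at-1/3-of-the-interval behaviour are assumptions taken from the getLastError/journaling docs, not measurements:

```python
# Back-of-envelope math for the j=True slowdown described above.
# Assumption: mongod's default journal commit interval is 100 ms, and
# when a j=True acknowledgement is pending the server commits early,
# at roughly 1/3 of the interval.
journal_commit_interval_ms = 100
wait_with_j_ms = journal_commit_interval_ms / 3.0  # ~33 ms per acknowledged write

# On a single connection those waits are serial, so throughput is
# capped at roughly:
max_ops_per_sec = 1000.0 / wait_with_j_ms
print(max_ops_per_sec)  # ~30, matching the reported update rate
```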

On Tue, Sep 11, 2012 at 5:40 AM, arjun kumar <narju...@gmail.com> wrote:
> We are trying to measure the speed of reads/writes/updates etc. with pymongo
> as the client driver, and we found that the Collection.update() function
> is incredibly slow compared to Collection.find_and_modify() function.
>
> http://api.mongodb.org/python/current/api/pymongo/collection.html
>
> The speed in terms of reads/writes per sec are
> Read - 2800 records per sec
> Update using update function - 30 records per sec
> Update using find_and_modify function - 1500 records per sec
>
> We can update only one document at a time, so we used the update function
> with multi=False. We also had journaling enabled, so we were passing j=True
> as an argument. To summarise, we were using both functions in this
> manner:
>
> collection.update(query_criteria, {"$set": {"data": my_data}},
>                   multi=False, j=True)
> collection.find_and_modify(query=query_criteria,
>                            update={"$set": {"data": my_data}}, j=True)
>
> We don't understand why update is almost 40 times as slow as
> find_and_modify.
>
> Any help or any sort of documentation on this would be much appreciated.
>
> Thanks,
> Arjun
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Scott Hernandez

Sep 11, 2012, 1:17:50 PM
to mongod...@googlegroups.com
On Tue, Sep 11, 2012 at 12:04 PM, Bernie Hackett <ber...@10gen.com> wrote:
> The problem is your use of j=True. The first thing to understand is
> that find_and_modify() is a command, and that command does not support
> the j option, so passing that parameter has no effect.
>
> Passing j=True on every update() is causing *at least* a 33
> millisecond pause (to wait for the next group commit) for each
> operation before the server acknowledges the write. That is why you
> are seeing this slowdown in your update operations in a single thread.

This is not an "at least" number but "the best of the worst case" (and
only for the default 100 ms interval). The point is that there will be
some (possibly significant) wait time piling up for journaling if you
are only using a single client thread/connection. If you happen to ask
for a journal commit in the first 33 ms (1/3 of the interval), you have
to wait until the journal commit happens, sometime shortly after that
1/3 mark (~33 ms), which can be a large amount of time relative to the
journal write itself. So if you are doing this on a single connection,
all of those waits stack up, and throughput is slowed down.

If you test with more client threads, or processes, you will see an
increase in throughput proportional to that number.

arjun kumar

Sep 12, 2012, 3:37:06 AM
to mongod...@googlegroups.com
That makes perfect sense and kind of explains the 30 records per sec value we get too.

My question is: how do I update 1000 records and then call getLastError once
to ensure all of them got committed to the journal? And if I update 1000
records and then call getLastError once, does it guarantee the successful
update of only the 1000th record, or of all the previous records as well?

arjun kumar

Sep 12, 2012, 3:41:59 AM
to mongod...@googlegroups.com
As you said updates scale with more clients - with 10 clients we were able to update around 280 records per second.
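That ~280/sec figure lines up with the group-commit model: each connection waits out its own ~33 ms journal commit, but the waits on different connections overlap rather than stack. A rough sketch of that arithmetic (assumed numbers, not a benchmark):

```python
# Rough scaling model for j=True updates across concurrent clients.
# Assumption: each acknowledged write waits ~33 ms for a journal group
# commit, and waits on separate connections overlap in parallel.
wait_per_op_s = 0.033
per_connection_ops = 1.0 / wait_per_op_s      # ~30 ops/sec per connection

clients = 10
aggregate_ops = clients * per_connection_ops  # ~300 ops/sec in theory

# The observed ~280/sec falls a bit below the ideal, which is expected
# once real network round-trips and locking overhead are added.
print(aggregate_ops)
```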

However, we found that MongoDB writes (i.e. the collection.insert function)
don't scale as well with more clients. Is this because of the lock MongoDB
takes, since we are writing to a single collection?

arjun kumar

Sep 12, 2012, 4:32:21 AM
to mongod...@googlegroups.com
One more question: does findAndModify guarantee persistence to disk (journal/data files) even without the j=True option?

Scott Hernandez

Sep 22, 2012, 8:59:11 AM
to mongod...@googlegroups.com
On Wed, Sep 12, 2012 at 4:32 AM, arjun kumar <narju...@gmail.com> wrote:
> One more question is - Does findAndModify guarantee persistence to disk
> (journal/data files) level even without a j=True option?

No, it just requires the server to finish the update (in memory) and
acknowledge that. You will need to make an extra call to wait for the
journal write.
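That pattern, acknowledge the in-memory update and then separately wait for the journal, could be sketched like this with the pymongo API of that era (Connection, find_and_modify, and the getLastError command are all long since deprecated; the host, database, and collection names are illustrative):

```python
# Sketch of "update now, confirm the journal commit separately".
# Assumes a local mongod and the pymongo 2.x-era API; guarded so the
# snippet is harmless where pymongo or the server is unavailable.
try:
    from pymongo import Connection  # pymongo 2.x legacy entry point

    db = Connection("localhost", 27017).test

    # find_and_modify only acknowledges the in-memory update.
    db.things.find_and_modify(query={"_id": 1},
                              update={"$set": {"data": "x"}})

    # A separate getLastError with j waits for the journal commit; the
    # journal is written sequentially, so this also covers earlier
    # writes made on this connection.
    status = db.command("getlasterror", j=True)
except Exception:
    pass  # illustrative only; no pymongo / no mongod in this environment
```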

arjun kumar

Sep 24, 2012, 3:28:12 AM
to mongod...@googlegroups.com
Thanks, Scott, for your reply. What we are wondering is why there is no bulk update feature, i.e. a way to update
multiple unique documents at once and verify with one getLastError call that all of them were journalled
successfully, something like the bulk insert. Otherwise, updating documents one at a time with j=True is a big
performance killer; it brings us down to a throughput of around 30 records/second. Similar threads - 


It would be great to have a usable option: something for decently fast updates, at a minimum of 700 records/sec, but also with confirmation that the updates succeeded.

Gianfranco

Oct 10, 2012, 11:37:18 AM
to mongod...@googlegroups.com
Bulk updates in the JavaScript shell execute many individual update commands;
this is how the oplog entries are added and then replicated.

Unfortunately there is no way to have great performance while ensuring 100% concurrency.

Scott Hernandez

Oct 10, 2012, 11:43:42 AM
to mongod...@googlegroups.com
On Wed, Oct 10, 2012 at 11:37 AM, Gianfranco <gianf...@10gen.com> wrote:
> Bulk updates in the JavaScript shell are executing many update commands.
> This is how the oplog entries are added and then replicated.
>
> Unfortunately there is no way to have great performance while ensuring 100%
> concurrency.

In order to improve throughput you will want to have many more clients
(or threads). That is what you want to do if you are doing safe writes
for each update.