Insert vs Upsert Performance


Jeffrey Jenkins

Feb 17, 2011, 2:44:13 PM
to mongodb-user
I have a capped collection with two indices, one on _id and one which
is a unique index on the fields (A,B,C,D) together. In the course of
some refactoring I accidentally changed an upsert to a save (which
does an insert in this case, since there is no _id). The change I
made was incorrect, and when I discovered the issue I fixed it.

What's interesting is that I started looking for the issue because
the other applications connected to this database hit a significant
performance problem whenever enough load was applied to the broken
code (but not to the fixed code). The Python driver has a socket
timeout of 6 seconds, and it kept firing when trying to access this
database. The DB itself (via mongostat and other Linux system stats)
didn't indicate that anything was wrong, and certainly nothing to the
degree I was seeing. The mongo logs showed a few long requests, but
not as many as the number of timeouts that were happening.

What I'm curious about is the performance difference between an
insert operation and an upsert operation on a collection with a
unique index. My intuition is that the insert should be faster, since
it would just check the unique index, find an existing document, and
fail. The opposite seems to have happened. I had thought insert might
be slower because it's a blocking operation, but that didn't seem
like it should affect the other clients connecting to the database.
Any help explaining why this happened would be greatly appreciated,

-Jeff

Eliot Horowitz

Feb 17, 2011, 4:44:45 PM
to mongod...@googlegroups.com
It's a little hard to say without specifics.
First, are you sure there is an index on _id?
If there isn't, then save() will be really slow, because the driver
will add an _id client-side and the server will have to look for it.
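To make that concrete, here is a minimal sketch of what save() does (illustration only, not pymongo's or the server's actual code; a plain Python list stands in for a collection with no _id index, so the lookup degenerates to a full scan):

```python
import uuid

collection = []  # stand-in for a collection with NO _id index

def save(doc):
    # The driver generates an _id client-side if the doc lacks one.
    if "_id" not in doc:
        doc["_id"] = uuid.uuid4().hex
    # Without an index on _id, finding the doc means scanning
    # every document in the collection.
    for existing in collection:
        if existing["_id"] == doc["_id"]:
            existing.update(doc)   # found: overwrite in place
            return doc["_id"]
    collection.append(doc)         # not found: insert at the end
    return doc["_id"]

first = save({"A": 1})             # no _id: becomes an insert
save({"_id": first, "A": 2})       # has _id: scans, then overwrites
```

Every save of a document that already has an _id pays that scan, which is why the missing index would hurt so much.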


Scott Hernandez

Feb 17, 2011, 4:41:01 PM
to mongod...@googlegroups.com
On Thu, Feb 17, 2011 at 11:44 AM, Jeffrey Jenkins <je...@qcircles.net> wrote:
> I have a capped collection with two indices, one on _id and one which
> is a unique index of the fields (A,B,C,D) together.  In the course of
> some refactoring I had accidentally changed an upsert to a save (which
> would do an insert in this case since there is no _id).  The change I
> made was incorrect, and when I discovered the issue I fixed it.

There are some restrictions on updates in capped collections, since
you can't grow documents in a capped collection or remove them.

> What's interesting is that I started looking for the issue because
> there was a significant performance issue in the other applications
> which connected to this database if enough load was applied to the
> broken code (but not the fixed code).  There is a socket timeout of 6
> seconds in the python driver and it kept going off when trying to
> access this database.  The DB itself (via mongostat and other linux
> system stats) didn't indicate that anything was wrong, and certainly
> not anything to the degree I was seeing.  The mongo logs showed a few
> long requests, but not as many as the number of timeouts which were
> happening.
>
> What I'm curious about is what the difference is performance-wise
> between an insert operation and an upsert operation on a collection
> with a unique index.  My intuition is that the insert should be faster
> since it would check the unique index, find an object there, and
> fail.  The opposite is what seems to have happened.  I had thought
> insert would be slower because it's a blocking operation, but that
> didn't seem like it should affect the other clients connecting to the
> database.  Any help on explaining why this happened would be greatly
> appreciated,

All write operations are blocking, which is to say they take a lock
for the duration of the write. Both an insert and an update require
the lock, but the update also does a query to find the document(s)
to be affected, whereas an insert will just put the doc at the end of
the collection (or in any hole in which it can fit).
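A rough sketch of that difference, with a plain Python list standing in for the collection (illustration only, not MongoDB's actual code; the return value counts the operations done while "holding the lock"):

```python
collection = []

def insert(doc):
    # Insert: append at the end (or first hole); no lookup needed.
    collection.append(doc)
    return 1                                    # one write, no reads

def upsert(query, doc):
    reads = 0
    for i, existing in enumerate(collection):   # the query phase
        reads += 1
        if all(existing.get(k) == v for k, v in query.items()):
            collection[i] = doc                 # matched: replace
            return reads + 1
    collection.append(doc)                      # no match: insert
    return reads + 1

insert({"A": 1})
cost = upsert({"A": 1}, {"A": 1, "B": 2})       # scans, then writes
```

The upsert's cost grows with how much scanning the query phase has to do (an index shortens it but doesn't eliminate it), while the insert's append is constant, and all of it happens under the write lock.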

Updating the indexes will also cost more IO/CPU for an update than
for an insert: for each index, an update may have to remove the old
index entry and add a new one, where an insert just adds an entry.
That difference may not be significant, though.
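As an illustration of the index-maintenance point (again a sketch, with a dict standing in for a single index mapping an indexed value to a document position):

```python
index = {}  # stand-in for one index: indexed value -> doc position

def index_insert(value, pos):
    # Insert: one entry added per index.
    index[value] = pos

def index_update(old_value, new_value, pos):
    # Update: if the indexed value changed, remove the old entry
    # and add a new one -- two index writes instead of one.
    if old_value != new_value:
        del index[old_value]
        index[new_value] = pos

index_insert("x", 0)
index_update("x", "y", 0)  # removes the "x" entry, adds "y"
```

Multiply that by the number of indexes on the collection to see the per-operation difference Scott describes.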
