Question: is insert on a collection and update of unique indexes atomic?

John Matthews

Oct 7, 2011, 1:13:48 PM
to mongodb-user, tsan...@redhat.com
Will mongodb handle an insert on a collection as an atomic operation
from the perspective of the insert completing and the unique indexes
being updated?

The behavior I am seeing suggests this is not the case (assume two threads
both inserting documents with identical parameters).
1) Thread-A inserts a new document into mongodb
2) guess: mongodb updates unique indexes on collection
3) Thread-B attempts to insert an identical document and gets a
DuplicateKeyError
4) Thread-B then attempts a find() on same parameters and sees no
matches
5) guess: mongodb completes insert of document from step #1
6) Now Thread-B could continue with a find() using the same
parameters and it will work


Our problem:
A DuplicateKeyError exception is raised from pymongo when inserting a new
document. We catch the exception and attempt a find() on the same
parameters, expecting a match, yet no match is returned.

We have several threads looping over a collection of rpms, storing
information about each rpm in mongo.
We have a unique index on (name, version, epoch, release, arch); these
are fields of each document we create to represent the rpm.
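
In case it helps, here is a simplified sketch of the pattern we follow (not
our exact code; assume `doc` is the dict we build for one rpm):

# Simplified sketch (pymongo 1.x style API), not our exact code.
from pymongo import Connection
from pymongo.errors import DuplicateKeyError

packages = Connection()["pulp_database"]["packages"]

def store_rpm(doc):
    try:
        # safe=True waits for the server's response, so a unique-index
        # violation is surfaced as DuplicateKeyError.
        packages.insert(doc, safe=True)
        return doc
    except DuplicateKeyError:
        # Another thread already inserted this rpm; look it up using
        # only the unique-index fields.
        return packages.find_one(dict(
            (f, doc[f]) for f in ("name", "version", "epoch", "release", "arch")))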

Versions we are using:
# rpm -qa | grep mongo
mongodb-1.6.4-3.el6_0.x86_64
mongodb-server-1.6.4-3.el6_0.x86_64
pymongo-1.9-8.el6_1.x86_64
pymongo-debuginfo-1.9-8.el6_1.x86_64

More details about our issue can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=734782

Bernie Hackett

Oct 7, 2011, 5:12:13 PM
to mongodb-user
MongoDB 1.6.4 is pretty old, and a lot of bug fixes and improvements
have been made since then. It's possible you are hitting a bug in that
branch. Can you upgrade to 1.8.3 and try again?

Scott Hernandez

Oct 8, 2011, 2:37:17 PM
to mongod...@googlegroups.com, tsan...@redhat.com


On Oct 7, 2011 1:15 PM, "John Matthews" <jwmat...@gmail.com> wrote:
>
> Will mongodb handle an insert on a collection as an atomic operation
> from the perspective of the insert completing and the unique indexes
> being updated?

Yes, the document is inserted and the indexes are updated at the same time.

> The behavior I am seeing looks like this is not true (Assume 2 threads
> both inserting identical parameters).
>  1) Thread-A inserts a new document into mongodb
>  2)     guess: mongodb updates unique indexes on collection
>  3) Thread-B attempts to insert an identical document and gets a
> DuplicateKeyError
>  4) Thread-B then attempts a find() on same parameters and sees no
> matches
>  5)    guess: mongodb completes insert of document from step #1
>  6) Now Thread-B could continue with a find() using the same
> parameters and it will work
>
>
> Our problem:
> A DuplicateKeyError exception is raised from pymongo when inserting a
> new document,
> the exception is caught and we attempt to execute a find() on the same
> parameters, expecting a match, yet no match is returned.

Do you have more than one unique index on that collection? If so, could you be checking the wrong one?

You need to search on just the values in the unique index being violated, not the other parts of the doc.
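
For example, something like this in pymongo (a hypothetical sketch; `collection` and `doc` stand in for your collection object and the document whose insert failed):

# Confirm which unique indexes exist and what their key fields are.
from pprint import pprint
pprint(collection.index_information())

# Query on exactly the key fields of the violated index -- not the full
# document, whose other fields may differ between the two threads.
existing = collection.find_one({"name": doc["name"], "version": doc["version"],
                                "epoch": doc["epoch"], "release": doc["release"],
                                "arch": doc["arch"]})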

> We have several threads looping over a collection of rpms, storing
> information about the rpm in mongo.
> We have a unique index on (name, version, epoch, release, arch), these
> are fields of each document we create to represent the rpm.
>
> Versions we are using
> # rpm -qa | grep mongo
> mongodb-1.6.4-3.el6_0.x86_64
> mongodb-server-1.6.4-3.el6_0.x86_64
> pymongo-1.9-8.el6_1.x86_64
> pymongo-debuginfo-1.9-8.el6_1.x86_64
>
> More details about our issue can be found here:
> https://bugzilla.redhat.com/show_bug.cgi?id=734782
>

John Matthews

Oct 10, 2011, 11:06:23 AM
to mongodb-user


On Oct 8, 2:37 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> On Oct 7, 2011 1:15 PM, "John Matthews" <jwmatth...@gmail.com> wrote:
> > Will mongodb handle an insert on a collection as an atomic operation
> > from the perspective of the insert completing and the unique indexes
> > being updated?
>
> Yes, the document is inserted and the indexes are updated at the same time.
>

Thank you for replying and confirming this should be a safe
assumption.

> > The behavior I am seeing looks like this is not true (Assume 2 threads
> > both inserting identical parameters).
> >  1) Thread-A inserts a new document into mongodb
> >  2)     guess: mongodb updates unique indexes on collection
> >  3) Thread-B attempts to insert an identical document and gets a
> > DuplicateKeyError
> >  4) Thread-B then attempts a find() on same parameters and sees no
> > matches
> >  5)    guess: mongodb completes insert of document from step #1
> >  6) Now Thread-B could continue with a find() using the same
> > parameters and it will work
>
> > Our problem:
> > A DuplicateKeyError exception is raised from pymongo when inserting a
> > new document,
> > the exception is caught and we attempt to execute a find() on the same
> > parameters, expecting a match, yet no match is returned.
>
> Do you have more than one unique index on that collection? If so, could you be
> checking the wrong one?
>

Below is the only unique index we create:
{
    "unique" : true,
    "name" : "name_-1_epoch_-1_version_-1_release_-1_arch_-1_filename_-1_checksum_-1",
    "key" : {
        "name" : -1,
        "epoch" : -1,
        "version" : -1,
        "release" : -1,
        "arch" : -1,
        "filename" : -1,
        "checksum" : -1
    },
    "ns" : "pulp_database.packages",
    "background" : true
}
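
For reference, it is created with roughly the following pymongo call (a sketch of the equivalent ensure_index(), not the exact code we run):

from pymongo import DESCENDING

collection.ensure_index(
    [("name", DESCENDING), ("epoch", DESCENDING), ("version", DESCENDING),
     ("release", DESCENDING), ("arch", DESCENDING), ("filename", DESCENDING),
     ("checksum", DESCENDING)],
    unique=True, background=True)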

> You need to search on just the values in the unique index being violated.
> Not the other parts of the doc.
>

We are following this advice: we use the fields from the unique
index, and only those fields, when doing this lookup.

We have only seen this problem a few times and I have been unable to
replicate it consistently with any test programs.

Scott Hernandez

Oct 10, 2011, 11:11:16 AM
to mongod...@googlegroups.com

Are these fields very large? There is a limit to how large an index
entry can be: roughly 800 bytes to 1 KB.
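
A quick way to sanity-check that (a rough sketch using the bson package that ships with pymongo; `doc` stands in for one of your package documents):

import bson

key_fields = ("name", "epoch", "version", "release", "arch", "filename", "checksum")
key_doc = dict((f, doc[f]) for f in key_fields)
# The BSON size of just the key fields is a rough upper bound on the
# size of the index entry built from this document.
print(len(bson.BSON.encode(key_doc)))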

If you are able to reproduce this please file an issue on jira with
that reproduction case.

John Matthews

Oct 10, 2011, 11:19:46 AM
to mongodb-user


On Oct 10, 11:11 am, Scott Hernandez <scotthernan...@gmail.com> wrote:
I think each index entry is under 800 bytes; below is the specific
index entry we saw a problem with:

DuplicateKeyError: E11000 duplicate key error index: pulp_database.packages.$name_-1_epoch_-1_version_-1_release_-1_arch_-1_filename_-1_checksum_-1
dup key: { : "selinux-policy-devel", : "0", : "2.4.6", : "255.el5_4.1", : "noarch", : "selinux-policy-devel-2.4.6-255.el5_4.1.noarch.rpm", : { sha256: "85d4a7922bb47963b7bb6f5f423bc601b299afad" } }

> If you are able to reproduce this please file an issue on jira with
> that reproduction case.
>

If we are able to reproduce, I will file a JIRA.

Thank you for your help.

Jeff

Oct 11, 2011, 9:35:36 AM
to mongodb-user
Are you running with a replica set, and if so are you using
slaveOk()? That would explain the lack of results from the subsequent
find(), if the insert had not been replicated yet to all secondaries.
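
If so, a quick way to test is to force the follow-up read to hit the primary (a sketch assuming the pymongo 1.x API, where slave_okay controls whether reads may go to a secondary; "your-mongo-host" and `query` are placeholders):

from pymongo import Connection

# A connection opened with slave_okay=True may serve queries from a
# secondary that has not replicated the insert yet.
stale_conn = Connection("your-mongo-host", slave_okay=True)

# A plain connection reads from the primary, which sees the document as
# soon as the insert has completed there.
primary_conn = Connection("your-mongo-host")
existing = primary_conn.pulp_database.packages.find_one(query)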