MongoDB duplicate documents

Aparna Kulkarni

Feb 28, 2018, 8:32:46 AM
to MongoEngine Users

For a very limited duration, MongoDB in my case receives a lot of connections. If any insert happens during this time span, multiple duplicate documents get created.


Please note that, from the code, I'm saving only one document. However, MongoDB stores duplicates with different _ids. Many solutions suggest adding a unique index on the collection, but that isn't possible in my case.


What would be the best way to make sure that, for one insert operation, there is only one entry in the database?
Also, why does it create duplicate documents? I use mongoengine. Is it MongoDB, mongoengine, or the web server (Apache) that is retrying the insert calls, and why?


Versions used:
mongoengine 0.8.7
pymongo 2.8.1
mongodb 2.6.12
Python 2.7.12

JohnAD

Mar 8, 2018, 5:41:20 PM
to MongoEngine Users
Getting duplicates is a serious problem, but there is not enough information here to know which part of the system is causing it.

I recommend adding a reference OID as a temporary measure to locate duplicate entries. For example:

import mongoengine as me
import bson

class MyDoc(me.Document):
    text = me.StringField()
    year = me.IntField()
    # Temporary field used to tie duplicate copies of the same save together.
    reference_oid = me.ObjectIdField()


a = MyDoc()
a.text = "something"
a.year = 2018
a.reference_oid = bson.ObjectId()
a.save()



The call to bson.ObjectId() will generate a pseudo-random OID just once.

So, if the same document is saved multiple times, the copies may have different "_id"s, but they will share the same "reference_oid". In fact, the list of "_id"s will provide proof of the duplication and gives you details to search the logs with.
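
Once a few duplicates have built up, an aggregation that groups on reference_oid and keeps only groups with more than one document will list the colliding "_id"s for you. This is only a sketch: it assumes the MyDoc model above, reaches the raw pymongo collection through mongoengine's internal _get_collection() helper, and allows for the fact that pymongo 2.x returns the aggregation result as a dict while pymongo 3+ returns a cursor.

# Sketch: list reference_oid values that appear on more than one document,
# along with the duplicated _ids. Assumes the MyDoc model defined above.
pipeline = [
    {"$group": {
        "_id": "$reference_oid",       # one group per logical save
        "ids": {"$push": "$_id"},      # collect the _ids of the copies
        "count": {"$sum": 1},
    }},
    {"$match": {"count": {"$gt": 1}}}, # keep only real duplicates
]

raw = MyDoc._get_collection().aggregate(pipeline)
groups = raw["result"] if isinstance(raw, dict) else raw  # pymongo 2.x vs 3.x

for group in groups:
    print("reference_oid %s -> duplicate _ids %s" % (group["_id"], group["ids"]))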

Was this helpful?

John
