Duplicate Mongo Records


timfulmer

Mar 13, 2012, 9:29:48 PM
to mongodb-user

Hi All,

We've been using Mongo to store the results of some analytic
calculations, and have noticed that we occasionally get duplicate
documents in Mongo. We do the calculations in client code, then look
up any pre-existing document in Mongo once we've calculated the new
numbers. If an existing document is found, we copy its id value into
the new document and pass that to Spring's Mongo template. If no
existing document is found, we pass the new document to Mongo template
without an id attribute.

Most of the documents we process are not new; we're simply updating
existing documents with new numbers. However, there are times when we
get duplicate entries for a document, one for the old document and one
for the new. This happens intermittently, and when it does happen it
seems to affect all documents in the collection.

It could be something Spring is doing; Mongo template seems to use
upsert under the covers. It may also be something in the Mongo
driver. We've implemented a workaround to remove duplicates on read,
but it seems like strange behavior. Has anyone else encountered
duplicate Mongo documents where there should be none?

Thanks,

-- Tim

Akbar Gadhiya

Mar 14, 2012, 12:23:09 AM
to mongod...@googlegroups.com
Do both documents have the same ObjectId? I would suggest fetching the existing document, applying the new numbers to it, and saving that document.

Thanks
Akbar.


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Bill Hayward

Mar 14, 2012, 7:00:00 AM
to mongod...@googlegroups.com
I suspect that you will find the documents are the same in the database with the exception of the _id. Can you run a quick query to check this out?

timfulmer

Mar 14, 2012, 2:17:22 PM
to mongodb-user

Yes, this is the case.

rum verse

Mar 14, 2012, 2:51:03 PM
to mongod...@googlegroups.com
Explore using upsert (update and insert in one update command). 


timfulmer

Mar 14, 2012, 3:09:57 PM
to mongodb-user

Yes, Spring's Mongo template uses upsert when it talks to the Mongo
driver if it finds an _id. The application sets _id before calling the
save method on Mongo template. We verified this in the debugger.

rum verse

Mar 15, 2012, 12:55:50 AM
to mongod...@googlegroups.com
Create unique indexes on the fields that uniquely identify each record, and set the dropDups option to true. This isn't something you should do per upsert, but rather during db preparation or application init when your system runs for the first time. That said, there is certainly a bug in your driver or code. Mongo doesn't create duplicates on its own.
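
As a rough illustration of what a unique index does, here is a small in-memory sketch in Java. It is not the Mongo driver: the class `UniqueIndexSketch` and the key values are made up for the example. In the mongo shell the index itself would be created with something like `ensureIndex({...}, {unique: true, dropDups: true})` on whichever fields identify a record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a collection with a unique compound index.
// This is NOT the Mongo driver API; it only mimics the "reject duplicates" rule.
class UniqueIndexSketch {
    private final Set<List<String>> indexedKeys = new HashSet<>();

    // Inserts a record identified by the given key fields.
    // Throws if a record with the same key fields is already present.
    void insert(String... keyFields) {
        List<String> key = new ArrayList<>(Arrays.asList(keyFields));
        if (!indexedKeys.add(key)) {
            throw new IllegalStateException("duplicate key: " + key);
        }
    }

    int size() {
        return indexedKeys.size();
    }

    public static void main(String[] args) {
        UniqueIndexSketch coll = new UniqueIndexSketch();
        coll.insert("ref-1", "choice-a", "choice-b");
        try {
            // Same identifying fields again: the index refuses the insert.
            coll.insert("ref-1", "choice-a", "choice-b");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(coll.size());
    }
}
```

With the index in place, a second insert for the same key fails loudly instead of silently producing a duplicate, which should at least point you at the code path that is doing it.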

Scott Hernandez

Mar 15, 2012, 9:08:49 AM
to mongod...@googlegroups.com
Can you post the code you are using to persist the entity? Are you
using save(...)?


timfulmer

Mar 15, 2012, 2:30:17 PM
to mongodb-user

Yes, here's the code we're using to populate the id field on a new
entity instance:

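for (PreCalcRecord pcr : pcrList) {

    try {
        // Look up an existing record by its three identifying UUIDs.
        PreCalcRecord existingRecord =
                preCalcDao.getCorrespondingRecord(
                        pcr.getReferenceUuid(),
                        pcr.getFirstChoiceUuid(),
                        pcr.getSecondChoiceUuid());
        // Reuse the existing id so that save updates instead of inserting.
        pcr.setId(existingRecord.getId());
    } catch (NoResultException e) {
        // No existing record: leave the id unset so save inserts a new one.
    }

    this.preCalcDao.save(pcr);
}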

And here's the save method:

public final PreCalcRecord save(PreCalcRecord entity) {
    if (entity.getId() == null) {
        entity.setCreationDate(new Date());
        entity.setModifiedDate(new Date());
        this.mongoTemplate.save(entity);
    } else {
        entity.setModifiedDate(new Date());
        this.mongoTemplate.save(entity);
    }
    return entity;
}



timfulmer

Mar 15, 2012, 2:34:50 PM
to mongodb-user

I don't think there's a way to give a unique index a sort parameter to
make sure the latest record stays. Deleting the latest data would not
be very helpful :) It is definitely something between the application
code and Mongo.

Scott Hernandez

Mar 15, 2012, 2:54:13 PM
to mongod...@googlegroups.com
You will want to do an update with upsert, using a query on your
unique values, so that the document is either inserted or updated
(that is what upsert does). Save is not the correct way to do this, as
documented here:
http://www.mongodb.org/display/DOCS/Updating#Updating-%7B%7Bsave%28%29%7D%7Dinthemongoshell

Save does an update with upsert, using only the _id as the query.
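
To make the difference concrete, here is a minimal in-memory sketch in Java. It is not the Mongo driver or the Spring API: the class `UpsertSketch`, its list-of-maps "collection" and the `value` field are hypothetical, and the key fields are only guessed from the getters in the code posted above. It assumes two writes for the same record where the lookup found nothing, so neither carries an _id. With Spring's Mongo template the second form would presumably go through its `upsert(Query, Update, Class)` method, but check that against the version you are running.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of a collection; NOT the real driver API.
class UpsertSketch {
    final List<Map<String, Object>> docs = new ArrayList<>();

    // Models save(): an upsert whose query is only the _id.
    // A document without an _id always becomes a fresh insert.
    void saveById(Map<String, Object> doc) {
        Object id = doc.get("_id");
        if (id == null) {
            doc.put("_id", UUID.randomUUID().toString());
            docs.add(doc);
            return;
        }
        for (int i = 0; i < docs.size(); i++) {
            if (id.equals(docs.get(i).get("_id"))) {
                docs.set(i, doc); // replace the stored document
                return;
            }
        }
        docs.add(doc);
    }

    // Models an update with upsert whose query is the identifying fields.
    void upsertByKey(String ref, String first, String second, double value) {
        for (Map<String, Object> d : docs) {
            if (ref.equals(d.get("referenceUuid"))
                    && first.equals(d.get("firstChoiceUuid"))
                    && second.equals(d.get("secondChoiceUuid"))) {
                d.put("value", value); // match found: update in place
                return;
            }
        }
        Map<String, Object> d = record(ref, first, second, value);
        d.put("_id", UUID.randomUUID().toString());
        docs.add(d); // no match: insert
    }

    static Map<String, Object> record(String ref, String first, String second, double value) {
        Map<String, Object> d = new HashMap<>();
        d.put("referenceUuid", ref);
        d.put("firstChoiceUuid", first);
        d.put("secondChoiceUuid", second);
        d.put("value", value);
        return d;
    }

    public static void main(String[] args) {
        // Assumption: both writes happen after a lookup that found nothing.
        UpsertSketch bySave = new UpsertSketch();
        bySave.saveById(record("r", "a", "b", 1.0));
        bySave.saveById(record("r", "a", "b", 2.0));
        System.out.println(bySave.docs.size()); // prints 2

        UpsertSketch byKey = new UpsertSketch();
        byKey.upsertByKey("r", "a", "b", 1.0);
        byKey.upsertByKey("r", "a", "b", 2.0);
        System.out.println(byKey.docs.size()); // prints 1
    }
}
```

Note that on a real server two upserts racing on the same key can still both insert unless a unique index covers those fields, so the two suggestions in this thread go together.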
