Checking for duplicates


Trevor Oakley

Apr 16, 2015, 12:40:50 AM
to mongod...@googlegroups.com
Is there any quick way of preventing duplicates on an insert?

At the moment I check with $exists before each insert, and the whole system is slow; it runs thousands of checks per file. I thought the insert would automatically handle this on an index defined as unique, but that did not work. It works in Cassandra, for example.

What is the best way in Mongo?

Derick Rethans

Apr 16, 2015, 3:34:23 AM
to mongod...@googlegroups.com
Using a unique index.

If it does not "work", you should investigate why. Sometimes it's not
clear whether you stored a number as a number (4) or as a string ("4") -
those are different values, so they do not violate uniqueness, for
example.

See the following run on the shell:

> db.unique.drop();
true

> db.unique.createIndex( { value: 1 }, { unique: true } );
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

> db.unique.insert( { value: 42 } );
WriteResult({ "nInserted" : 1 })

> db.unique.insert( { value: 42 } );
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: demo.unique.$value_1 dup key: { : 42.0 }"
    }
})

> db.unique.insert( { value: '42' } );
WriteResult({ "nInserted" : 1 })


cheers,
Derick

--
{
website: [ "http://mongodb.org", "http://derickrethans.nl" ],
twitter: [ "@derickr", "@mongodb" ]
}

Trevor Oakley

Apr 16, 2015, 3:49:25 AM
to mongod...@googlegroups.com
Here is the Java code I used, which definitely did not work: it inserted countless documents all with the same RawUrl. I expected the insert to fail automatically when a duplicate existed; that is how Cassandra behaved when I tested it.


  BasicDBObject index = new BasicDBObject("RawUrl", 1)
      .append("unique", true).append("dropDups", true);

  if (r.RawUrl != null) {
      BasicDBList urlList = mapRaw(r.RawUrls);

      BasicDBObject dbObj = new BasicDBObject("RawUrl", r.RawUrl)
          .append("RawContent", r.RawContent)
          .append("RawStatus", r.RawStatus);
      dbObj.append("RawUrls", new BasicDBObject("Urls", urlList));
      coll.save(dbObj);
  }

  coll.createIndex(index);

Do I need to call createIndex every time, or will Mongo detect the existing index?

Derick Rethans

Apr 17, 2015, 4:19:51 AM
to mongod...@googlegroups.com
On Thu, 16 Apr 2015, Trevor Oakley wrote:

> Here is the code (Java) I used which definitely did not work. It inserted
> countless urls all the same (RawUrl). What I expected was that the insert
> would automatically fail when a duplicate existed. That is how Cassandra
> worked when I tested it.
>
>
> BasicDBObject index = new BasicDBObject("RawUrl", 1).append("unique",
> true).append("dropDups", true);

You're setting:

append("dropDups", true);

which means that duplicate entries get silently discarded instead of
raising an error.

>
> if (r.RawUrl != null) {
> BasicDBList urlList = mapRaw(r.RawUrls);
>
> BasicDBObject dbObj = new BasicDBObject("RawUrl",
> r.RawUrl).append("RawContent", r.RawContent).append("RawStatus",
> r.RawStatus);
> dbObj.append("RawUrls", new BasicDBObject("Urls", urlList));
> coll.save(dbObj);
> }
>
>
> coll.createIndex(index );
>
> Do I need to state the createIndex everytime or will Mongo detect the index?

You don't need to call createIndex more than once; the index is
persisted in the database. However, you do need to create the index
*before* you insert any documents. In your code you create it at the
end, after inserting the URLs, and with dropDups set it then simply
removes all the duplicate entries.
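
On the shell, the corrected order would look something like this (just a
sketch - the collection name and URL are made up; the unique index comes
first and dropDups is gone, so the second insert fails loudly):

```
> db.pages.createIndex( { RawUrl: 1 }, { unique: true } );

> db.pages.insert( { RawUrl: "http://example.com/" } );
WriteResult({ "nInserted" : 1 })

> db.pages.insert( { RawUrl: "http://example.com/" } );
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, ... } })
```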

cheers,
Derick