Checking for duplicates


Trevor Oakley

Apr 16, 2015, 12:40:50 AM
to mongod...@googlegroups.com
Is there any quick way of preventing duplicates on an insert?

At the moment I check with $exists before each insert, and the whole system is slow; it runs thousands of checks per file. I thought the insert would automatically handle this on an index defined as unique, but that did not work. It works in Cassandra, for example.

What is the best way in Mongo?

Derick Rethans

Apr 16, 2015, 3:34:23 AM
to mongod...@googlegroups.com
Using a unique index.

If it does not "work", you should investigate why. Sometimes it's not
clear whether you stored a number as a number (4) or as a string ("4") -
those are different values, so they do not violate uniqueness, for
example.

See the following run on the shell:

> db.unique.drop();
true

> db.unique.createIndex( { value: 1 }, { unique: true } );
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

> db.unique.insert( { value: 42 } );
WriteResult({ "nInserted" : 1 })

> db.unique.insert( { value: 42 } );
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: demo.unique.$value_1 dup key: { : 42.0 }"
    }
})

> db.unique.insert( { value: '42' } );
WriteResult({ "nInserted" : 1 })


cheers,
Derick

--
{
website: [ "http://mongodb.org", "http://derickrethans.nl" ],
twitter: [ "@derickr", "@mongodb" ]
}

Trevor Oakley

Apr 16, 2015, 3:49:25 AM
to mongod...@googlegroups.com
Here is the Java code I used, which definitely did not work: it inserted countless documents all with the same RawUrl. I expected the insert to fail automatically when a duplicate existed; that is how Cassandra behaved when I tested it.


  BasicDBObject index = new BasicDBObject("RawUrl", 1)
      .append("unique", true).append("dropDups", true);

  if (r.RawUrl != null) {
      BasicDBList urlList = mapRaw(r.RawUrls);

      BasicDBObject dbObj = new BasicDBObject("RawUrl", r.RawUrl)
          .append("RawContent", r.RawContent)
          .append("RawStatus", r.RawStatus);
      dbObj.append("RawUrls", new BasicDBObject("Urls", urlList));
      coll.save(dbObj);
  }

  coll.createIndex(index);

Do I need to call createIndex every time, or will Mongo detect the existing index?

Derick Rethans

Apr 17, 2015, 4:19:51 AM
to mongod...@googlegroups.com
On Thu, 16 Apr 2015, Trevor Oakley wrote:

> Here is the code (Java) I used which definitely did not work. It inserted
> countless urls all the same (RawUrl). What I expected was that the insert
> would automatically fail when a duplicate existed. That is how Cassandra
> worked when I tested it.
>
>
> BasicDBObject index = new BasicDBObject("RawUrl", 1).append("unique",
> true).append("dropDups", true);

You're setting:

append("dropDups", true);

which means that duplicate entries get silently discarded instead of
raising an error.

>
> if (r.RawUrl != null) {
> BasicDBList urlList = mapRaw(r.RawUrls);
>
> BasicDBObject dbObj = new BasicDBObject("RawUrl",
> r.RawUrl).append("RawContent", r.RawContent).append("RawStatus",
> r.RawStatus);
> dbObj.append("RawUrls", new BasicDBObject("Urls", urlList));
> coll.save(dbObj);
> }
>
>
> coll.createIndex(index );
>
> Do I need to state the createIndex everytime or will Mongo detect the index?

You don't need to call createIndex more than once; the index is
persisted in the database. However, you do need to create the index
*before* you insert any documents. In your code you create it at the
end, after inserting the URLs, and with dropDups set it then simply
removes all the duplicate entries.
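
On the shell, the corrected order would look something like this (just a
sketch - the collection name and URL are made up; the unique index comes
first and dropDups is gone, so the second insert fails loudly):

```
> db.pages.createIndex( { RawUrl: 1 }, { unique: true } );

> db.pages.insert( { RawUrl: "http://example.com/" } );
WriteResult({ "nInserted" : 1 })

> db.pages.insert( { RawUrl: "http://example.com/" } );
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, ... } })
```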

cheers,
Derick