too may dups on index build with dropDups=true

Daniel

Oct 17, 2012, 6:24:40 AM
to mongod...@googlegroups.com
Hi,

I am trying to remove duplicates from a collection with 60 million documents. But when I call ensureIndex with dropDups, I get:

too may dups on index build with dropDups=true.

Is there no way to use dropDups on a large collection?

Thanks & Regards

Daniel

Daniel

Oct 17, 2012, 6:31:46 AM
to mongod...@googlegroups.com
BTW: I am running MongoDB 2.2.0 on a machine with 32 GB of RAM.

Daniel

Oct 17, 2012, 6:40:30 AM
to mongod...@googlegroups.com
Is the best approach to dump the collection and import it again after creating the index with ensureIndex?

Stephen Lee

Oct 17, 2012, 11:09:24 AM
to mongod...@googlegroups.com
Hi Daniel,

The assertion indicates that the number of duplicates met or exceeded 1,000,000. In addition, a comment in the source says, "we could queue these on disk, but normally there are very few dups, so instead we keep in ram and have a limit" (where the limit is 1,000,000). So it might be best to start with an empty collection, call ensureIndex with {dropDups: true}, and then reimport the actual documents.
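
To illustrate the difference between the two paths, here is a minimal Python sketch. It is a toy model under stated assumptions, not MongoDB's actual implementation: `build_unique_index`, `reimport_with_unique_index`, and `DUP_LIMIT` are hypothetical names, and plain dicts and lists stand in for the index and the collection. The first function remembers duplicates in memory and gives up once the limit is reached; the second rejects duplicates as they arrive, so nothing accumulates.

```python
# Toy model only: plain Python stand-ins, not MongoDB's real code or API.
DUP_LIMIT = 1_000_000  # the limit quoted from the source comment


def build_unique_index(docs, key, dup_limit=DUP_LIMIT):
    """Index an existing collection, remembering duplicates in RAM to drop later."""
    index = {}          # key value -> position of the first document seen
    dups_to_drop = []   # positions of duplicate documents, held in memory
    for pos, doc in enumerate(docs):
        if doc[key] in index:
            dups_to_drop.append(pos)
            # Fail once the in-memory list meets the limit.
            if len(dups_to_drop) >= dup_limit:
                raise RuntimeError("too may dups on index build with dropDups=true")
        else:
            index[doc[key]] = pos
    return index, dups_to_drop


def reimport_with_unique_index(docs, key):
    """Insert into an empty collection that already has the unique index."""
    index, kept = {}, []
    for doc in docs:
        if doc[key] in index:
            continue  # duplicate rejected at insert time; nothing is queued
        index[doc[key]] = len(kept)
        kept.append(doc)
    return kept
```

In a real reimport the server would report a duplicate key error for each rejected insert rather than skipping it silently; the sketch only shows why no in-memory list of duplicates is needed on that path.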

Let us know if that works better for you.

-Stephen

Daniel

Oct 17, 2012, 12:06:00 PM
to mongod...@googlegroups.com
Hi Stephen,

Thanks, I am currently doing exactly this, which leads to the following post in this group: https://groups.google.com/forum/#!topic/mongodb-user/GszyFLAAhOk

nilskp

Oct 18, 2012, 2:32:35 PM
to mongod...@googlegroups.com
I wonder why dupes aren't just deleted as the index is built? I don't see how or why this is a memory or queuing issue. Presumably an index is built sequentially by going through every document. Those documents that collide with an existing index should be deleted, no?