On Wednesday, October 17, 2012 5:09:24 PM UTC+2, Stephen Lee wrote:
> Hi Daniel,
> The assertion indicates that the number of duplicates met or exceeded 1000000. In addition, a comment in the source says, "we could queue these on disk, but normally there are very few dups, so instead we keep in ram and have a limit." That limit is 1000000, so it might be best to start with an empty collection, run ensureIndex with {unique: true, dropDups: true}, and then reimport the actual documents.
> Let us know if that works better for you.
> -Stephen
> On Wednesday, October 17, 2012 6:40:30 AM UTC-4, Daniel wrote:
>> Is the best way to dump the collection and import it again after creating the index with ensureIndex?
>> On Wednesday, October 17, 2012 12:31:46 PM UTC+2, Daniel wrote:
>>> BTW: I am running Mongo 2.2.0 on a machine with 32 GB of RAM.
>>> On Wednesday, October 17, 2012 12:24:40 PM UTC+2, Daniel wrote:
>>>> Hi,
>>>> I am trying to remove duplicates from a collection with 60 million documents. But when I call ensureIndex with dropDups, I get:
>>>> too may dups on index build with dropDups=true
>>>> Is there no way to use dropDups on a large collection?
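Stephen's suggestion works because a unique index that already exists on an empty collection checks each document as it is inserted, so duplicates are rejected one at a time instead of being collected during an index build. The Python below is only a simulation of that behavior, not MongoDB code: a dict stands in for the unique index, and the function name reimport_with_unique_key, the "url" field used in the example, and the assumption that key values are hashable are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def reimport_with_unique_key(docs: Iterable[Dict[str, Any]],
                             key: str) -> Tuple[List[Dict[str, Any]], int]:
    """Simulate importing `docs` into an empty collection that already has a
    unique index on `key`. The first document with a given key value is
    stored; every later one is rejected, the way a duplicate-key insert
    would be. Returns (stored documents, number rejected)."""
    index: Dict[Any, int] = {}  # stand-in for the unique index: value -> slot
    stored: List[Dict[str, Any]] = []
    rejected = 0
    for doc in docs:
        value = doc.get(key)  # in this toy model a missing field counts as None
        if value in index:
            rejected += 1  # duplicate: the insert fails, nothing is queued
            continue
        index[value] = len(stored)
        stored.append(doc)
    return stored, rejected
```

For example, importing the documents {"_id": 1, "url": "a"}, {"_id": 2, "url": "b"} and {"_id": 3, "url": "a"} with key "url" stores _id 1 and 2 and rejects one duplicate. Which copy survives depends only on the order in which the documents are imported.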
I wonder why dupes aren't just deleted as the index is built. I don't see why this should be a memory or queuing issue. Presumably an index is built sequentially by going through every document, so any document whose key collides with an entry already in the index could be deleted right away, no?
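The question above can be made concrete with a toy model of the two strategies. This is a hedged Python sketch, not MongoDB's index-build code: build_queued mirrors the quoted source comment (the _id of every duplicate is held in RAM, and the build gives up once the queue reaches the 1000000 limit), while build_immediate deletes each duplicate the moment it collides, as proposed here. The function names, the TooManyDups exception, and the list-of-dicts "collection" are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set

# The limit quoted from the source comment in this thread.
DUP_LIMIT = 1000000


class TooManyDups(Exception):
    """Raised when the in-RAM duplicate queue reaches the limit."""


def build_queued(docs: Iterable[Dict[str, Any]], key: str,
                 limit: int = DUP_LIMIT) -> List[Any]:
    """Queue-then-delete model: scan every document, remember the _id of
    each duplicate in RAM, and fail once the queue reaches `limit`.
    Returns the queued _ids, which would be deleted after the scan."""
    seen: Set[Any] = set()
    dup_ids: List[Any] = []
    for doc in docs:
        value = doc[key]
        if value in seen:
            dup_ids.append(doc["_id"])
            if len(dup_ids) >= limit:
                raise TooManyDups(f"{len(dup_ids)} duplicates queued")
        else:
            seen.add(value)
    return dup_ids


def build_immediate(docs: List[Dict[str, Any]], key: str) -> int:
    """Delete-as-you-go model: remove each duplicate from `docs` as soon
    as it collides. Returns how many documents were deleted."""
    seen: Set[Any] = set()
    deleted = 0
    i = 0
    while i < len(docs):
        value = docs[i][key]
        if value in seen:
            # Deleting shifts the next document into slot i, so don't advance.
            del docs[i]
            deleted += 1
        else:
            seen.add(value)
            i += 1
    return deleted
```

In the queued model memory grows with the number of duplicates, which is what the 1000000 limit guards against; the immediate model only needs the set of keys seen so far. The immediate version does have to handle the scan position shifting under each deletion. Whether something like that is the reason the real build queues its deletions is not stated in the thread and is only a guess.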