too may dups on index build with dropDups=true
From: Daniel <daniel.brue...@gmail.com>
Date: Wed, 17 Oct 2012 03:24:40 -0700 (PDT)
Local: Wed, Oct 17 2012 6:24 am
Subject: too may dups on index build with dropDups=true

Hi,

I am trying to remove duplicates from a collection with 60 million documents.
But when I create the index with ensureIndex and dropDups, I get:

too may dups on index build with dropDups=true

Is there no way to use dropDups on a large collection?

Thanks & Regards

Daniel


 
From: Daniel <daniel.brue...@gmail.com>
Date: Wed, 17 Oct 2012 03:31:46 -0700 (PDT)
Local: Wed, Oct 17 2012 6:31 am
Subject: Re: too may dups on index build with dropDups=true

BTW: I am running MongoDB 2.2.0 on a machine with 32 GB of RAM.


 
From: Daniel <daniel.brue...@gmail.com>
Date: Wed, 17 Oct 2012 03:40:30 -0700 (PDT)
Local: Wed, Oct 17 2012 6:40 am
Subject: Re: too may dups on index build with dropDups=true

Is the best approach to dump the collection, create the index with
ensureIndex first, and then import the data again?


 
From: Stephen Lee <stephen....@10gen.com>
Date: Wed, 17 Oct 2012 08:09:24 -0700 (PDT)
Local: Wed, Oct 17 2012 11:09 am
Subject: Re: too may dups on index build with dropDups=true

Hi Daniel,

The assertion indicates that the number of duplicates met or exceeded
1,000,000. In addition, there's a comment in the source that says, "we could
queue these on disk, but normally there are very few dups, so instead
we keep in ram and have a limit." (where the limit is 1,000,000), so it might
be best to start with an empty collection, run ensureIndex with {dropDups:
true}, and then reimport the actual documents.
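
To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. It is a toy model, not MongoDB code: the names build_index_drop_dups, reimport_into_empty and TooManyDups, and the field "k", are made up for illustration. Only the 1,000,000 limit and the error text come from this thread. The first function keeps duplicates in an in-memory list and fails once that list reaches the limit. The second models loading into an empty collection that already has the unique index, where each colliding insert is rejected on the spot and nothing has to be remembered.

```python
# Toy model of the behaviour described in this thread; NOT MongoDB code.
# All names here are hypothetical and exist only for illustration.

DUP_LIMIT = 1_000_000  # limit quoted in the reply above


class TooManyDups(Exception):
    """Stands in for the 'too may dups' assertion."""


def build_index_drop_dups(docs, key, dup_limit=DUP_LIMIT):
    """Index an existing collection, remembering duplicates in RAM.

    Returns (index, dups), where dups lists the positions of the
    documents that would be deleted afterwards. Raises TooManyDups as
    soon as the in-memory list meets the limit.
    """
    index = {}
    dups = []
    for pos, doc in enumerate(docs):
        value = doc[key]
        if value in index:
            dups.append(pos)
            if len(dups) >= dup_limit:
                raise TooManyDups(
                    "too may dups on index build with dropDups=true"
                )
        else:
            index[value] = pos
    return index, dups


def reimport_into_empty(docs, key):
    """Insert documents one by one into an empty, uniquely indexed collection.

    A colliding insert is rejected immediately, so no duplicate list
    is kept and no limit applies.
    """
    index = {}
    collection = []
    rejected = 0
    for doc in docs:
        value = doc[key]
        if value in index:
            rejected += 1
            continue
        index[value] = len(collection)
        collection.append(doc)
    return collection, rejected
```

With a small dup_limit the first function raises TooManyDups on input that the second one handles without trouble, which is the situation described above at a much smaller scale.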

Let us know if that works better for you.

-Stephen


 
From: Daniel <daniel.brue...@gmail.com>
Date: Wed, 17 Oct 2012 09:06:00 -0700 (PDT)
Subject: Re: too may dups on index build with dropDups=true

Hi Stephen,

Thanks, I am currently doing exactly this, and it led to the following
post in this
group: https://groups.google.com/forum/#!topic/mongodb-user/GszyFLAAhOk


 
From: nilskp <nil...@gmail.com>
Date: Thu, 18 Oct 2012 11:32:35 -0700 (PDT)
Local: Thurs, Oct 18 2012 2:32 pm
Subject: Re: too may dups on index build with dropDups=true

I wonder why dupes aren't just deleted as the index is built? I don't see
how or why this is a memory or queuing issue. Presumably an index is built
sequentially by going through every document. Those documents that collide
with an existing index entry should be deleted, no?
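
The idea in this question can be sketched in plain Python as a single pass that drops each colliding document right away instead of queuing it. This is only an illustration of the proposal, under the assumption that documents can be visited in order and removed in place. The function name drop_dups_in_place and the field "k" are hypothetical, and nothing here reflects how MongoDB's index builder actually works.

```python
# Toy illustration of "delete dupes while building"; NOT MongoDB code.

def drop_dups_in_place(collection, key):
    """Keep the first document per key value and remove later ones.

    The list is compacted in place in one pass. The only extra state
    is the set of key values seen so far, which plays the role of the
    index; no list of duplicates is kept. Returns the number dropped.
    """
    seen = set()
    write = 0
    for read in range(len(collection)):
        doc = collection[read]
        if doc[key] in seen:
            continue  # duplicate: skipped, so it is overwritten or trimmed
        seen.add(doc[key])
        collection[write] = doc
        write += 1
    dropped = len(collection) - write
    del collection[write:]  # trim the leftover tail
    return dropped
```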


 