Duplicate documents in sharded environment
From: Patrick Scott <patr...@springmetrics.com>
Date: Wed, 26 Sep 2012 08:23:47 -0400
Local: Wed, Sep 26 2012 8:23 am
Subject: Duplicate documents in sharded environment

I have a 2 shard setup and I recently discovered duplicate documents
between shards. I have turned off the balancer so it is not an issue with
an in-progress balancer operation. Is there a tool that I can use to clean
up those duplicates? If not, is there a command that will determine which
shard is the owner of the document?

Thanks,
Patrick


 
From: Gianfranco <gianfra...@10gen.com>
Date: Wed, 26 Sep 2012 09:01:57 -0700 (PDT)
Local: Wed, Sep 26 2012 12:01 pm
Subject: Re: Duplicate documents in sharded environment

Hi,

I'm assuming that you have an index with *unique:true* and that the duplicates
exist because a migration from one shard to another failed.
This left 2 shards holding the same data, and the config servers didn't get
updated.

There isn't a single command which will fix this problem unfortunately.

If this is the case you'll need a script which finds and removes orphaned
documents.
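
As a rough illustration of what such a script has to decide, here is a minimal sketch in plain JavaScript. It assumes chunk metadata shaped like the documents in config.chunks (min, max, shard), a single numeric shard key, and chunk ranges that include min and exclude max. The field name userId, the shard names, and the sample documents are all hypothetical, and -Infinity/Infinity stand in for MinKey/MaxKey. It is not the logic of any particular tool.

```javascript
// Hypothetical chunk metadata, shaped like documents in config.chunks.
// Assumption: one numeric shard key field, here called "userId".
const chunks = [
  { min: { userId: -Infinity }, max: { userId: 1000 }, shard: "shard0000" },
  { min: { userId: 1000 }, max: { userId: Infinity }, shard: "shard0001" },
];

// Return the shard whose chunk range [min, max) contains the key value,
// or null if no chunk covers it.
function owningShard(chunks, key, value) {
  const chunk = chunks.find((c) => value >= c.min[key] && value < c.max[key]);
  return chunk ? chunk.shard : null;
}

// A document is an orphan when it is stored on a shard other than its owner.
function findOrphans(chunks, key, shardName, docs) {
  return docs.filter((d) => owningShard(chunks, key, d[key]) !== shardName);
}

// Hypothetical documents read directly from shard0000 (not through mongos).
const docsOnShard0 = [
  { _id: 1, userId: 10 },
  { _id: 2, userId: 1500 }, // belongs to shard0001, so it is an orphan here
];

console.log(findOrphans(chunks, "userId", "shard0000", docsOnShard0));
```

The same lookup answers "which shard owns this document": it is the shard of the chunk whose range contains the document's shard key.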


 
From: Patrick Scott <patr...@springmetrics.com>
Date: Wed, 26 Sep 2012 13:32:28 -0400
Local: Wed, Sep 26 2012 1:32 pm
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

So how can I find out which shard "owns" the document?


 
From: Gianfranco <gianfra...@10gen.com>
Date: Tue, 2 Oct 2012 02:12:26 -0700 (PDT)
Local: Tues, Oct 2 2012 5:12 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

Hi Patrick,

Sorry for the delay.

Could you load the attached script, orphanage.js, in the mongo shell, giving
the path to the file?

Note: The script must be run from a 2.x shell,
         and you must connect to the primary.

If it is in the current working directory, where you started the mongo shell,
it will be:
load("orphanage.js")

After loading it, you'll see a list of commands you can now run:

Balancer.stop() -- Do this first, if it's not stopped already
Orphans.find('db.collection') -- Find orphans in a given namespace
Orphans.findAll() -- Find orphans in all namespaces
Orphans.remove('db.collection') -- Remove all orphans in a namespace
Balancer.start()

Please follow the directions and make sure the output of documents to
delete is correct before running remove.

Attachment: orphanage.js (10K)

 
From: Patrick Scott <patr...@springmetrics.com>
Date: Tue, 2 Oct 2012 09:35:33 -0400
Local: Tues, Oct 2 2012 9:35 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

How is db.collection.count() computed? I noticed that it was decreasing as
orphaned documents were deleted. It scared me enough that I stopped the
script but then I checked each shard individually for the document count
and together they equaled the result of a call to db.collection.count()
from mongos.

My guess is that count() reflects the total count of objects in the
collection on each shard which may include orphaned documents.


 
From: Gianfranco <gianfra...@10gen.com>
Date: Tue, 2 Oct 2012 08:30:03 -0700 (PDT)
Local: Tues, Oct 2 2012 11:30 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

The db.collection.count() from mongoS is a global operation, so it has to
communicate with the shards containing that collection.

What version of mongo are you running? Is it the same on all nodes?


 
From: Patrick Scott <patr...@springmetrics.com>
Date: Tue, 2 Oct 2012 11:40:22 -0400
Local: Tues, Oct 2 2012 11:40 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

My shards and mongos instances are all running 2.0.6.


 
From: Gianfranco <gianfra...@10gen.com>
Date: Tue, 2 Oct 2012 09:10:04 -0700 (PDT)
Local: Tues, Oct 2 2012 12:10 pm
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

If you are doing updates with upserts, there is a fix in 2.1.0 that prevents
this from happening again:
https://jira.mongodb.org/browse/SERVER-4639

The latest release in the 2.1.x branch is 2.1.2.

If you want to look into upgrading to the latest version (2.2.0), please
read the release notes on how to proceed:
http://docs.mongodb.org/manual/release-notes/2.2/#upgrade-shard-cluster


 
From: Patrick Scott <patr...@springmetrics.com>
Date: Tue, 2 Oct 2012 12:17:22 -0400
Local: Tues, Oct 2 2012 12:17 pm
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

I'm doing updates but not with upserts. I just want to make sure I'm
deleting true orphaned documents. I have about 100000 out of ~83 million
which isn't a lot. If collection.count() includes orphaned items then it
makes perfect sense for the global count to decrease as I delete orphans. I
just want to verify that behavior.


 
From: Gianfranco <gianfra...@10gen.com>
Date: Wed, 3 Oct 2012 03:07:29 -0700 (PDT)
Local: Wed, Oct 3 2012 6:07 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

Sorry, I'm not sure which count() function you're referring to.
The normal one in the shell, or a similar one in the script? Which line?

If you want to be able to go back in case a non-duplicate is deleted, then,
as in similar situations, you should back up the data files or use
mongoexport, especially if it's a production system.


 
From: Patrick Scott <patr...@springmetrics.com>
Date: Wed, 3 Oct 2012 08:25:22 -0400
Local: Wed, Oct 3 2012 8:25 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

I'm referring to the shell command db.<collection>.count(). Does it include
orphaned documents?


 
From: Gianfranco <gianfra...@10gen.com>
Date: Wed, 3 Oct 2012 05:35:59 -0700 (PDT)
Local: Wed, Oct 3 2012 8:35 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

Yes it does. It counts all the documents across the shards for that
collection (when connected to the mongoS)
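
A small sketch of that behavior in plain JavaScript, assuming the global count is simply the sum of each shard's own count as described above. The shard names and documents are hypothetical, and this models the arithmetic only; it is not the server's code.

```javascript
// Hypothetical per-shard contents. _id 2 exists on both shards, as it would
// after a failed migration; assume the copy on shard0000 is the orphan.
const shards = {
  shard0000: [{ _id: 1 }, { _id: 2 }],
  shard0001: [{ _id: 2 }, { _id: 3 }],
};

// Model of count() through mongos: add up what each shard reports.
function globalCount(shards) {
  return Object.values(shards).reduce((sum, docs) => sum + docs.length, 0);
}

// Number of distinct documents, ignoring duplicate copies.
function distinctCount(shards) {
  const ids = new Set();
  for (const docs of Object.values(shards)) {
    docs.forEach((d) => ids.add(d._id));
  }
  return ids.size;
}

console.log(globalCount(shards));   // 4, the orphan copy is counted
console.log(distinctCount(shards)); // 3

// Deleting the orphan copy lowers the global count to the true total.
shards.shard0000 = shards.shard0000.filter((d) => d._id !== 2);
console.log(globalCount(shards));   // 3
```

Under this model the global count drops by exactly one for each orphan removed, while the number of distinct documents stays the same.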


 
From: Patrick Scott <patr...@springmetrics.com>
Date: Wed, 3 Oct 2012 08:56:23 -0400
Local: Wed, Oct 3 2012 8:56 am
Subject: Re: [mongodb-user] Re: Duplicate documents in sharded environment

Ok. Then that explains why the count was decreasing. Thanks!


 