|Duplicate documents in sharded environment||Patrick Scott||9/26/12 5:24 AM|
I have a 2 shard setup and I recently discovered duplicate documents between shards. I have turned off the balancer so it is not an issue with an in-progress balancer operation. Is there a tool that I can use to clean up those duplicates? If not, is there a command that will determine which shard is the owner of the document?
|Re: Duplicate documents in sharded environment||Gianfranco||9/26/12 9:01 AM|
I'm assuming that you have an index with unique:true and that the duplicates exist because a migration from one shard to another failed.
That would leave two shards holding the same data, without the config metadata having been updated.
Unfortunately, there isn't a single command that will fix this problem.
If this is the case, you'll need a script that finds and removes orphaned documents.
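To make the idea of an "owning" shard concrete, here is a minimal sketch in plain JavaScript. It assumes a single-field numeric shard key and uses made-up chunk entries shaped like the documents in config.chunks (ns, min, max, shard). The namespace test.users, the key uid, the shard names, and the helper findOwningShard are all hypothetical, and -Infinity/Infinity merely stand in for MinKey/MaxKey.

```javascript
// Hypothetical chunk metadata, shaped like documents in config.chunks.
// -Infinity and Infinity stand in for MinKey and MaxKey.
var chunks = [
  { ns: "test.users", min: { uid: -Infinity }, max: { uid: 1000 }, shard: "shard0000" },
  { ns: "test.users", min: { uid: 1000 }, max: { uid: Infinity }, shard: "shard0001" }
];

// Return the name of the shard whose chunk range contains the key value.
// A chunk covers [min, max): min is inclusive, max is exclusive.
function findOwningShard(chunks, ns, keyField, value) {
  for (var i = 0; i < chunks.length; i++) {
    var c = chunks[i];
    if (c.ns === ns && value >= c.min[keyField] && value < c.max[keyField]) {
      return c.shard;
    }
  }
  return null; // no chunk matched, so the metadata passed in is incomplete
}

var owner = findOwningShard(chunks, "test.users", "uid", 1500);
```

A copy of the document found on any shard other than the one returned here would be the orphan. Real shard keys can be compound or non-numeric, which this sketch does not handle.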
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||9/26/12 10:32 AM|
So how can I find out which shard "owns" the document?
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Gianfranco||10/2/12 2:12 AM|
Sorry for the delay.
Could you run this script, giving it the path to the orphanage.js file?
Note: The script must be run from a 2.x shell.
You must also connect to the primary.
If it is in the current working directory (where you started the mongo shell), it will be:
Afterwards, you'll see a series of options you can run:
Balancer.stop() -- Do this first, if it's not stopped already
Please follow the directions and make sure the list of documents to delete is correct before running remove.
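As an illustration of the "check before you remove" step, the following is a rough dry-run sketch in plain JavaScript. It is not the orphanage.js script; it only shows the kind of check involved. The chunk ranges, the sample documents, the uid key, and the helper listOrphans are all hypothetical, and it assumes a single-field numeric shard key.

```javascript
// Hypothetical chunk ranges, shaped like config.chunks documents (min, max, shard).
// -Infinity and Infinity stand in for MinKey and MaxKey.
var chunkMap = [
  { min: { uid: -Infinity }, max: { uid: 1000 }, shard: "shard0000" },
  { min: { uid: 1000 }, max: { uid: Infinity }, shard: "shard0001" }
];

// Hypothetical documents read directly from shard0000 (not through mongos).
var docsOnShard0000 = [
  { _id: 1, uid: 5 },
  { _id: 2, uid: 1500 },
  { _id: 3, uid: 999 }
];

// Collect documents whose key does not fall inside any chunk owned by shardName.
// Nothing is deleted here; the result is only a list to review.
function listOrphans(docs, chunkMap, shardName, keyField) {
  var orphans = [];
  for (var i = 0; i < docs.length; i++) {
    var owned = false;
    for (var j = 0; j < chunkMap.length; j++) {
      var c = chunkMap[j];
      if (c.shard === shardName &&
          docs[i][keyField] >= c.min[keyField] &&
          docs[i][keyField] < c.max[keyField]) {
        owned = true;
        break;
      }
    }
    if (!owned) {
      orphans.push(docs[i]);
    }
  }
  return orphans;
}

var candidates = listOrphans(docsOnShard0000, chunkMap, "shard0000", "uid");
```

Here candidates holds only the document with uid 1500, because that key belongs to a chunk owned by shard0001. Reviewing such a list first, and removing afterwards, keeps the destructive step separate from the detection step.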
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||10/2/12 6:35 AM|
How is db.collection.count() computed? I noticed that it was decreasing as orphaned documents were deleted. It scared me enough that I stopped the script but then I checked each shard individually for the document count and together they equaled the result of a call to db.collection.count() from mongos.
My guess is that count() reflects the total count of objects in the collection on each shard which may include orphaned documents.
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Gianfranco||10/2/12 8:30 AM|
The db.collection.count() from mongos is a global operation, so it has to communicate with the shards containing that collection.
What version of MongoDB are you running? Is it the same everywhere?
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||10/2/12 8:40 AM|
My shards and mongos instances are running 2.0.6.
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Gianfranco||10/2/12 9:10 AM|
If you are doing updates with upserts, there is a fix in 2.1.0 that prevents this from happening again.
The latest release in the 2.1.x branch is 2.1.2.
If you want to look into upgrading to the latest version (2.2.0), please read the release notes on how to proceed:
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||10/2/12 9:18 AM|
I'm doing updates but not with upserts. I just want to make sure I'm deleting true orphaned documents. I have about 100000 out of ~83 million which isn't a lot. If collection.count() includes orphaned items then it makes perfect sense for the global count to decrease as I delete orphans. I just want to verify that behavior.
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Gianfranco||10/3/12 3:07 AM|
Sorry, I'm not sure what count() function you're referring to.
The normal one in the shell, or a similar one in the script? Which line?
If you want to be able to go back in case a non-duplicate is deleted, you should, as in any similar situation, back up the data files or use mongoexport, especially if it's a production system.
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||10/3/12 5:25 AM|
I'm referring to the shell command db.<collection>.count(). Does it include orphaned documents?
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Gianfranco||10/3/12 5:35 AM|
Yes, it does. It counts all the documents across the shards for that collection (when connected to mongos).
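The arithmetic behind this can be sketched as follows. The per-shard numbers and the helper globalCount are made up for illustration; the only point is that the total is a plain sum of what each shard reports, so orphans are counted until they are removed.

```javascript
// Hypothetical per-shard counts; shard0000 still holds 5 orphaned documents.
var perShardCounts = { shard0000: 60, shard0001: 45 };

// Sum the counts reported by every shard, with no filtering of orphans.
function globalCount(counts) {
  var total = 0;
  for (var shard in counts) {
    if (counts.hasOwnProperty(shard)) {
      total += counts[shard];
    }
  }
  return total;
}

var before = globalCount(perShardCounts); // 60 + 45 = 105

// Removing the 5 orphans from shard0000 lowers that shard's count...
perShardCounts.shard0000 -= 5;

// ...and the global total drops by the same amount.
var after = globalCount(perShardCounts); // 55 + 45 = 100
```

Under these assumptions, a falling global count during cleanup is expected, and the sum of the per-shard counts should keep matching the total seen through mongos.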
|Re: [mongodb-user] Re: Duplicate documents in sharded environment||Patrick Scott||10/3/12 5:56 AM|
Ok. Then that explains why the count was decreasing. Thanks!