Duplicate documents in sharded environment

Patrick Scott

Sep 26, 2012, 8:23:47 AM
to mongodb-user
I have a 2 shard setup and I recently discovered duplicate documents between shards. I have turned off the balancer so it is not an issue with an in-progress balancer operation. Is there a tool that I can use to clean up those duplicates? If not, is there a command that will determine which shard is the owner of the document?

Thanks,
Patrick

Gianfranco

Sep 26, 2012, 12:01:57 PM
to mongod...@googlegroups.com
Hi,

I'm assuming that you have an index with unique:true and that the duplicates exist because a migration from one shard to another failed.
That left both shards holding the same data, and the config metadata didn't get updated.

Unfortunately, there isn't a single command that will fix this problem.

If this is the case, you'll need a script that finds and removes orphaned documents.
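
As a rough illustration of the ownership test such a script depends on, here is a minimal sketch. It is not the actual cleanup script. It assumes a single numeric shard key, uses a hand-written `chunks` array in place of the real chunk metadata held by the config servers, and the names `ownerOf`, `findOrphans`, `userId`, and the shard names are hypothetical. Each chunk range includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical, simplified chunk metadata: each chunk covers [min, max)
// of a single numeric shard key and is assigned to exactly one shard.
// -Infinity / Infinity stand in for MinKey / MaxKey (an assumption of this sketch).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: Infinity, shard: "shard0001" },
];

// Return the name of the shard whose chunk range contains the shard key value,
// or null if no chunk covers it.
function ownerOf(chunks, keyValue) {
  const chunk = chunks.find((c) => keyValue >= c.min && keyValue < c.max);
  return chunk ? chunk.shard : null;
}

// A document found on a shard is an orphan there if the chunk metadata
// says a different shard owns its shard key value.
function findOrphans(chunks, shardName, docs, keyField) {
  return docs.filter((doc) => ownerOf(chunks, doc[keyField]) !== shardName);
}

// Hypothetical documents returned by querying shard0000 directly (not via mongos).
const docsOnShard0000 = [
  { _id: 1, userId: 10 },
  { _id: 2, userId: 2500 },
];

const orphans = findOrphans(chunks, "shard0000", docsOnShard0000, "userId");
console.log(orphans.map((d) => d._id));
```

In this made-up data the document with userId 2500 falls in the range owned by shard0001, so the copy sitting on shard0000 is the one to review for removal.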

Patrick Scott

Sep 26, 2012, 1:32:28 PM
to mongod...@googlegroups.com
So how can I find out which shard "owns" the document?

Gianfranco

Oct 2, 2012, 5:12:26 AM
to mongod...@googlegroups.com
Hi Patrick,

Sorry for the delay.

Could you run the attached script, orphanage.js, by passing its path to load()?

Note: The script must be run from a 2.x shell, and you must connect to the primary.

If it is in the current working directory where you started the mongo shell, it will be:
load("orphanage.js")

After that, you'll see a list of commands you can now run:

Balancer.stop() -- Do this first, if it's not stopped already
Orphans.find('db.collection') -- Find orphans in a given namespace
Orphans.findAll() -- Find orphans in all namespaces
Orphans.remove('db.collection') -- Remove all orphans in a namespace
Balancer.start()

Please follow the directions and make sure the output of documents to delete is correct before running remove.

orphanage.js

Patrick Scott

Oct 2, 2012, 9:35:33 AM
to mongod...@googlegroups.com
How is db.collection.count() computed? I noticed that it was decreasing as orphaned documents were deleted. It scared me enough that I stopped the script, but then I checked the document count on each shard individually, and together they equaled the result of a call to db.collection.count() from mongos.

My guess is that count() reflects the total number of objects in the collection on each shard, which may include orphaned documents.

Gianfranco

Oct 2, 2012, 11:30:03 AM
to mongod...@googlegroups.com
db.collection.count() run from mongos is a global operation, so it has to communicate with the shards containing that collection.

What version of MongoDB are you running? Is everything on the same version?

Patrick Scott

Oct 2, 2012, 11:40:22 AM
to mongod...@googlegroups.com
My shards and mongos instances are all running 2.0.6.

Gianfranco

Oct 2, 2012, 12:10:04 PM
to mongod...@googlegroups.com
If you are doing updates with upserts, there is a fix in 2.1.0 that prevents this from happening again.

The latest release in the 2.1.x branch is 2.1.2.

If you want to look into upgrading to the latest version (2.2.0), please read the release notes on how to proceed:

Patrick Scott

Oct 2, 2012, 12:17:22 PM
to mongod...@googlegroups.com
I'm doing updates, but not with upserts. I just want to make sure I'm deleting truly orphaned documents. I have about 100,000 of them out of ~83 million, which isn't a lot. If collection.count() includes orphaned documents, then it makes perfect sense for the global count to decrease as I delete orphans. I just want to verify that behavior.

Gianfranco

Oct 3, 2012, 6:07:29 AM
to mongod...@googlegroups.com
Sorry, I'm not sure which count() function you're referring to.
The normal one in the shell, or a similar one in the script? Which line?

If you want to be able to go back in case a non-duplicate is deleted, you should, as in any similar situation, back up the data files or use mongoexport, especially if it's a production system.

Patrick Scott

Oct 3, 2012, 8:25:22 AM
to mongod...@googlegroups.com
I'm referring to the shell command db.<collection>.count(). Does it include orphaned documents?

Gianfranco

Oct 3, 2012, 8:35:59 AM
to mongod...@googlegroups.com
Yes, it does. When you are connected to mongos, it counts all the documents for that collection across the shards.
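
To make that concrete, here is a small sketch of the arithmetic, modeled in plain JavaScript rather than against a real cluster. The shard names and documents are made up, and `globalCount` is only a stand-in for the behavior described in this thread (a sum of the per-shard counts), not the server's implementation.

```javascript
// Hypothetical per-shard contents. The document with _id 2 exists on both
// shards: the copy on shard0000 is an orphan left by a failed migration.
const shards = {
  shard0000: [{ _id: 1 }, { _id: 2 }],
  shard0001: [{ _id: 2 }, { _id: 3 }],
};

// Stand-in for count() through mongos as described above:
// the sum of each shard's own count, orphans included.
function globalCount(shards) {
  return Object.values(shards).reduce((sum, docs) => sum + docs.length, 0);
}

// Number of distinct documents, ignoring duplicate copies.
function distinctCount(shards) {
  const ids = new Set();
  for (const docs of Object.values(shards)) {
    for (const doc of docs) ids.add(doc._id);
  }
  return ids.size;
}

const countBefore = globalCount(shards); // 4: the orphan is counted
const distinct = distinctCount(shards);  // 3

// Remove the orphaned copy from shard0000.
shards.shard0000 = shards.shard0000.filter((doc) => doc._id !== 2);

const countAfter = globalCount(shards);  // 3: now matches the distinct count
console.log(countBefore, distinct, countAfter);
```

The total drops by exactly the number of orphans removed, while the number of distinct documents stays the same.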

Patrick Scott

Oct 3, 2012, 8:56:23 AM
to mongod...@googlegroups.com
Ok. Then that explains why the count was decreasing. Thanks!