Duplicates when sharding collection


Daniel Schlegel

Oct 1, 2013, 7:54:44 AM
to mongod...@googlegroups.com
Hi
I have a sharded collection on _id with the unique constraint set on the shard key.

mongos> use config
switched to db config
mongos> db.collections.find()
{ "_id" : "production.people", "lastmod" : ISODate("1970-01-16T13:58:02.263Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : true, "lastmodEpoch" : ObjectId("503e9d5ef940d75c2de07f8e") }


When I start sharding to a second shard I get duplicate documents in the database; some documents end up on both shards. I really don't get why this should happen, since the _id key is unique.

Could one of these things be the problem?
- The unique key in the _id field is an integer.
- We do lots of bulk inserts.
- We used autobalancing. Can I avoid the problem by splitting manually (see the sketch below)?
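
For reference, manual splitting and moving would look roughly like this (a sketch only; the split value and shard name are made up for illustration):

mongos> sh.splitAt("production.people", { _id: 5000000 })                  // split the chunk containing this _id (hypothetical value)
mongos> sh.moveChunk("production.people", { _id: 5000000 }, "shard0001")   // move the resulting chunk to the second shard (name assumed)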

I would be glad if someone could help me with this.

dani

 

Jeff Lee

Oct 1, 2013, 5:01:07 PM
to mongod...@googlegroups.com
Hey Daniel,

How are you determining that there are duplicates?

If you're querying the shards directly (which you really shouldn't be doing), the duplicate docs may be remnants of a failed migration. Running the query through a mongos instance should resolve the issue by routing the query to the correct shard.
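
As a hedged illustration (the collection name comes from this thread, everything else is assumed), the difference usually shows up like this:

mongos> use production
mongos> db.people.find().itcount()   // routed through mongos: results are filtered by chunk ownership
// Connecting straight to a shard's primary and running the same query there (again, not
// recommended) can return more documents, because leftovers from failed or aborted
// migrations are still physically present on that shard.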

Having said that, I believe there is a known issue where doing secondary reads through a mongos can result in orphaned documents being returned.  If that's the case, you may need to switch to primary reads.
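
If secondary reads turn out to be the cause, one quick check from the shell (a sketch; it only applies if you normally read from secondaries) is to compare the two read preferences through the same mongos:

mongos> db.getMongo().setReadPref("secondary")
mongos> db.people.find().itcount()   // secondary reads through mongos may include orphaned documents
mongos> db.getMongo().setReadPref("primary")
mongos> db.people.find().itcount()   // primary reads are filtered by chunk ownership, so orphans drop out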




Asya Kamsky

Oct 1, 2013, 8:53:00 PM
to mongodb-user
Can you please clarify what you mean by "When I start sharding to a second shard I get duplicate documents in the database" - are you getting an error message? What's the error?

This should not happen - i.e. if you are sharding on _id then you cannot have duplicate '_id' values in your collection. You would need to provide more information about what you are observing - then we can hopefully help you interpret what's actually going on.

Asya



Daniel Schlegel

Oct 2, 2013, 4:54:51 AM
to mongod...@googlegroups.com
Hi
We always connect through mongos.
"I believe there is a known issue where doing secondary reads through a mongos can result in orphaned documents being returned" sounds like our problem.
I found a ticket about that, but it should already be fixed, since we run MongoDB version 2.4.6: https://jira.mongodb.org/browse/SERVER-3319

We don't get any error messages, but we detected the duplicates in our code.
It's like this:

Example (with Mongoid):

Person.where(:_id.in => [1])    # finds only one document, from the first shard
Person.where(:_id.in => [1, 2]) # returns three documents: _id 1 is on both shards, _id 2 only on shard two

The first document only shows up twice when the query also matches other docs on the second shard.
When I add a save => true option everything seems to be OK, but it's not, because there are still orphaned docs I don't want in my database.

Thanks, Dani

Jeff Lee

Oct 2, 2013, 8:05:54 AM
to mongod...@googlegroups.com
The problem that I was thinking of is SERVER-9858 / SERVER-5931, which I believe is still an issue in 2.4, but which should only occur if both of the following are true:

a) the queries are not directed
b) the queries are against the secondaries

Are you sure that Mongoid is actually using the shard key for the lookup? I would try replicating the issue in the shell and then working back from there. If this is being caused by the issues linked above, the only workaround that I'm aware of is to do primary reads.
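
A sketch of what that shell check could look like (the _id values mirror the Mongoid example above; the readPref() call is only relevant if you normally read from secondaries):

mongos> use production
mongos> db.people.find({ _id: { $in: [1, 2] } })                         // default (primary) reads: two documents expected
mongos> db.people.find({ _id: { $in: [1, 2] } }).readPref("secondary")   // may return _id 1 twice if an orphan copy exists

A single-value lookup on the shard key is routed only to the shard that owns that value, while the $in spanning both shards is sent to both, which would explain why the duplicate only shows up in the second query.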

Asya may have other suggestions - I'm not aware of any other situation where it would be possible to get duplicates.

Daniel Schlegel

Oct 2, 2013, 9:35:28 AM
to mongod...@googlegroups.com
Many thanks for your help! We have now configured strong consistency (primary reads) and are sharding again. After sharding is complete, we will try to remove the orphaned data.
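
As far as I know 2.4 has no built-in command for cleaning up orphans, so locating them is a manual job. A rough sketch (shard names are assumptions), best done with the balancer disabled (sh.setBalancerState(false)) so chunk ownership doesn't move during the check:

mongos> use config
mongos> db.chunks.find({ ns: "production.people" }, { shard: 1, min: 1, max: 1 }).sort({ min: 1 })
// Then connect directly to each shard's primary: any document whose _id falls outside the
// ranges that config.chunks assigns to that shard is an orphan there. Verify that the owning
// shard still holds its own copy before removing anything.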

Asya Kamsky

Oct 2, 2013, 9:50:19 PM
to mongodb-user
The bug you linked was fixed in 2.1 - what version are you currently running?

Daniel Schlegel

Oct 3, 2013, 3:26:42 AM
to mongod...@googlegroups.com
I have MongoDB 2.4.6. Yes, you are right that the one issue was fixed in 2.1, but this one is not: https://jira.mongodb.org/browse/SERVER-5931

Asya Kamsky

Oct 3, 2013, 8:36:16 PM
to mongodb-user
Right - I'm glad you were able to resolve this by using default (primary) reads. The problem with reading from secondaries is of course that they may be lagging by an unknown amount of time and therefore be "out-of-sync" with the current state of the world. Hopefully we will come up with some solution for this. The bug you linked is the right one to watch.

Asya