updating one collection in mongos blocked by duplicate key error in another collection


Mark

Aug 15, 2012, 4:23:29 PM8/15/12
to mongod...@googlegroups.com
We had an application that wouldn't start because of dup key errors. 
We had 5 of them, and only 3 are needed at any time. 
I usually would wait an hour, and then the dup key errors would go away. 

Well, this time, I shut down the application and executed a query to free the records it was working on.
Then I got this weird error message:

mongos> db.JQ.update( {host: "c01"}, { $set : { inprogress : false } } ,false, true );
E11000 duplicate key error index: CUP.UPI.$_id_  dup key: { : "78b8abe8-bbe7-44d7-9b75-e22040e56e26" }

This same id was in the error message of the application. 
The application ensures indexes when it starts. 

Exception in thread "main" com.mongodb.MongoException$DuplicateKey: E11000 duplicate key error index: CUP.UPI.$_id_  dup key: { : "78b8abe8-bbe7-44d7-9b75-e22040e56e26" }
        at com.mongodb.CommandResult.getException(CommandResult.java:80)
        at com.mongodb.CommandResult.throwOnError(CommandResult.java:116)
        at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:126)
        at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:148)
        at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:132)
        at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:262)
        at com.mongodb.DBApiLayer$MyCollection.createIndex(DBApiLayer.java:354)
        at com.mongodb.DBCollection.ensureIndex(DBCollection.java:444)
        at com.mongodb.DBCollection.ensureIndex(DBCollection.java:421)
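
For context, the startup path looks roughly like this (a sketch, not the actual code; the hostname and field name are illustrative):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

// On startup, ensure the indexes exist. In the 2.x driver, ensureIndex
// performs an insert into system.indexes, which is where the DuplicateKey
// exception in the trace above is thrown from.
static void ensureIndexes() throws Exception {
    Mongo mongo = new Mongo("mongos-host");             // hostname is illustrative
    DBCollection upi = mongo.getDB("CUP").getCollection("UPI");
    upi.ensureIndex(new BasicDBObject("someField", 1)); // field name is illustrative
}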

CUP.UPI is a sharded collection, while JQ is not. 

It didn't make sense that one collection would block another, unless mongos detected a general error and wouldn't let other operations happen -- including the manual update query I ran.
What is weird is that a few minutes later the update command worked -- as if the mongos state got refreshed and the problem cleared.

The reason I think it might (somehow) be a mongos problem is that when my application won't start, all I have to do is wait a few minutes and the problem goes away.

We are using MongoDB 2.0.6. 

Ian Daniel

Aug 20, 2012, 2:25:07 AM8/20/12
to mongod...@googlegroups.com
Hi Mark,

Please can I ask some questions to clarify.

You wrote:

We had an application that wouldn't start because of dup key errors. 
We had 5 of them, and only 3 are needed at any time. 
I usually would wait an hour, and then the dup key errors would go away. 

Are the duplicate key errors valid -- that is, are there actually duplicate keys in your database?

What does, "We had 5 of them, and only 3 are needed at any time." mean? 5 of what; 3 of what needed?

From the information you have presented, it looks as if perhaps the shell command is (erroneously) getting back the getLastError output from the application's failed ensureIndex. 

We would like to follow this up further, and as part of this it would be useful to see some of your log files. Please could you create a ticket in the Community Private project of the MongoDB JIRA instance: https://jira.mongodb.org/browse/support/, and attach there the log files of your mongos instances and config server instances covering the time that you were getting the error messages you described. Please then add a comment here with the ticket number.

Kind regards,
Ian

Mark

Aug 22, 2012, 8:07:59 PM8/22/12
to mongod...@googlegroups.com
Sorry, it turns out we had issues with our program. We dug into the code, and the problem had nothing to do with Mongo. 
I had 5 applications, and 3 were quitting because of an error. 
Sorry to bother you with programming issues. 
Not capturing errors led us to believe it had something to do with Mongo, when it didn't at all. 

Ian Daniel

Aug 22, 2012, 10:08:27 PM8/22/12
to mongod...@googlegroups.com
Hi Mark,

Thanks for the update, and I'm glad you were able to sort out the problem.

Kind regards,
Ian


Mark Niel

Oct 23, 2012, 3:30:33 PM10/23/12
to mongod...@googlegroups.com
Hello!
The code attempts an insert; we catch it if it errors out and continue. 
It tries to insert data twice that it shouldn't. 
Basically, once the application had started it was fine, because the catch was in the code that ran afterwards, but not in the startup path. 
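
In outline, the fix was something like this (a sketch; the names are illustrative):

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoException;

// Attempt the insert; if the document already exists, swallow the
// duplicate key error and continue instead of crashing.
static void insertIgnoringDuplicate(DBCollection coll, DBObject doc) {
    try {
        coll.insert(doc);
    } catch (MongoException.DuplicateKey e) {
        // Already inserted earlier (or by another process) -- carry on.
    }
}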



Mikhail Strebkov

Oct 23, 2012, 5:06:17 PM10/23/12
to mongod...@googlegroups.com
Mark, thanks a lot for the answer!
Unfortunately our app doesn't have inserts without a catch.
In your initial message, you said that this error message appears even when you are doing an update inside the mongo console.
We are observing the same problem.
So I can't understand what fixed the problem in your case. Did you remove the second insert? Or did you add a catch to each insert statement?
Or are you avoiding a situation where two independent threads insert the same id through mongos at the same time?

Mark Niel

Oct 23, 2012, 5:20:40 PM10/23/12
to mongod...@googlegroups.com
We just caught the 2nd insert and ignored the error. 
If you have threads in the same program, perhaps you could let them communicate with each other to make sure they don't do the same thing. 


Ian Daniel

Oct 24, 2012, 3:05:48 AM10/24/12
to mongod...@googlegroups.com
Hi Mikhail,

You wrote:

Sometimes the duplicate key errors from inserts get stuck in mongos, so that an update on one collection ends with a duplicate key error for another collection.

Please can you describe in more detail what you mean by "an update on one collection ends with a duplicate key error for another collection"? Please can you post example updates and the resulting error messages.

Cheers,
Ian

On Tuesday, October 23, 2012 3:46:26 AM UTC+11, Mikhail Strebkov wrote:
Hi Mark,

We are facing the same error in production now, and we can't reproduce it locally or on staging.
We have 6 database servers: 3 masters (sharded) and 3 replicas, one for each master.
On each machine we have mongos running; our app connects to mongos, and mongos forwards requests to the mongod instances on the database servers.
Sometimes the duplicate key errors from inserts get stuck in mongos, so that an update on one collection ends with a duplicate key error for another collection.
We don't handle duplicate key exceptions in updates, so our app crashes. When it is restarted automatically, it crashes again, in ensureIndex, with the same duplicate key error.
Just like in your case.
You said that in the end it was an error in your code; you fixed it and now there are no more such issues.
Can you share some details? We can't find an issue in our code.

Thanks,
Mikhail

Mikhail Strebkov

Oct 25, 2012, 12:36:35 PM10/25/12
to mongod...@googlegroups.com
Hi Ian,

News:
We have removed "slaveOk" option from our connection string and now we can't reproduce the problem.
If we put this option back - then the error is easily reproducible (now in staging too).
The error reproduces when we try to insert to the sharded collection.
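
In the Java driver, the change amounts to a connection string tweak like this (a sketch; the hostname is illustrative, and the "slaveok" URI option spelling is my assumption for the 2.x driver):

import com.mongodb.Mongo;
import com.mongodb.MongoURI;

static Mongo connect() throws Exception {
    // Before (reads could be routed to secondaries):
    //   return new Mongo(new MongoURI("mongodb://mongos1:27017/?slaveok=true"));
    // After (slaveOk removed, so reads go through the primary path):
    return new Mongo(new MongoURI("mongodb://mongos1:27017/"));
}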

> Please can you describe in more detail what you mean by "update on some collection ends with duplicate key error for another collection".

In our code, different processors update different parts of a document in MongoDB.
The document may not exist, so each processor first tries to insert the document. If the insertion fails with a duplicate key error on the _id field, it falls back to an update where _id is the document id and the update statement is $set -> { <some json> }.
We do not use auto-generated ObjectIds; we use meaningful _id field values.
Ok, that was the background. Now we have two collections: Users and Posts.
There are two processors working in parallel (different OS processes, same machine). Sometimes there is a situation like this:
* First, processor1 inserts a new User.
* Then processor2 tries to do the same, but fails, and does an update for that User (successfully).
* Then processor2 tries to insert the Post. But this insert fails with a duplicate key error, and the error message contains the id of the User (not the Post id).
* Processor2 crashes, is restarted automatically, and tries to do ensureIndex -> it fails with the same error message (a duplicate key error with the same User id).
* The previous step can repeat several times.
* Then processor2 crashes once more, is restarted automatically, and tries to do ensureIndex -> this time it starts OK, and the error is gone.
This is only reproducible when the Users and Posts collections are sharded on the _id key.
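
In code, each processor does roughly this (a sketch; the "part" field name is illustrative):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoException;

// Try to insert the document first; on a duplicate key error, fall back
// to an update on _id with a $set, as described above.
static void insertOrUpdate(DBCollection coll, String id, BasicDBObject part) {
    try {
        coll.insert(new BasicDBObject("_id", id).append("part", part));
    } catch (MongoException.DuplicateKey e) {
        coll.update(new BasicDBObject("_id", id),
                    new BasicDBObject("$set", new BasicDBObject("part", part)));
    }
}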

Thanks,
Mikhail

Ian Daniel

Oct 29, 2012, 1:43:04 AM10/29/12
to mongod...@googlegroups.com
Hi Mikhail,

When you had "slaveOk" enabled, were you ensuring that you had a "consistent request" by wrapping your described sequence of reads and writes in a db.requestStart() and db.requestDone(), as described in http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency ? If not, I suggest you do that and see if that fixes the problem.

Kind regards,
Ian

Mikhail Strebkov

Oct 29, 2012, 2:45:28 AM10/29/12
to mongod...@googlegroups.com
Hi Ian,

We always use WriteConcern.SAFE, so the Java driver should do that for us (as mentioned in the docs here http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency).

Thanks,
Mikhail

Ian Daniel

Nov 1, 2012, 1:12:03 AM11/1/12
to mongod...@googlegroups.com
Hi Mikhail,

Using WriteConcern.SAFE ensures that for a single request (i.e. a read or write), the driver calls getLastError() on the same connection used for the request. WriteConcern.SAFE does not ensure that same connection is used across multiple requests. db.requestStart() tries to ensure that a single connection is used until the db.requestDone().

From the description of your problem, it looks like more than one connection is being used across your requests, and that you are doing a getLastError() call on a different connection than the one used for the previous request. This can happen when slaveOk is set to true, because a read can be directed to a secondary, whereas the write must be directed to a primary.
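
In other words (a sketch; coll and doc stand for your collection and document):

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.WriteConcern;

static void safeInsert(DBCollection coll, DBObject doc) {
    // SAFE checks getLastError on the connection used for this write...
    coll.insert(doc, WriteConcern.SAFE);
    // ...but the next operation may go out on a different connection
    // unless the sequence is wrapped in requestStart()/requestDone().
}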

There is a broader issue here related to your trying to use slaveOk in a sharded environment...

One intent of the db.requestStart() and db.requestDone() calls is to try to address the problem where a write followed by a read can be sent to different servers when slaveOk is enabled. This only works, however, when your client is talking directly to the replica set mongod instances. It will not work when you are sharding, and hence the client is talking with mongos instances. Requests will be handled sequentially by a mongos, but there is no guarantee that writes and reads will be directed to the same underlying mongod node if you have slaveOk enabled. Consequently there is no guarantee that you will read back the same data that you have just written.

You were wise to disable slaveOk, and should continue to leave it disabled in a sharded environment in which it is important that you have consistent results across read and write operations (e.g. that you can read back data that you have just written).

Kind regards,
Ian

Mark Niel

Nov 2, 2012, 7:30:36 PM11/2/12
to mongod...@googlegroups.com
I think someone was looking into it, and it was the application trying to insert the same data twice. If that was the case, then it's pretty silly. 