Hi,
we reproduced it against latest stable version 0.11.14. Same problem occurs (see attached log for more details). But this time with more information about concurrent access to requestId counters.
Connection-3 is assigning requestIDs: 5489,5490, 5491 and 5492
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Received a request expecting a response (5489): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=35, cap=64),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Sending request (5489) expecting response by connection Connection([id: 0xd76f71cf, /10.20.12.84:51088 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5489, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5489, awaitingResponses: ...
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Received a request expecting a response (5490): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=2563, cap=4096),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Sending request (5490) expecting response by connection Connection([id: 0x53d13fb4, /10.20.12.84:51089 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5490, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5490, awaitingResponses: Map(5487 -> AwaitingResponse(5487,-1258257590,List(scala.concurrent.impl.CallbackRunnable@1de1f67a),false,false), 5488 -> AwaitingResponse(5488,683033596,List(scala.concurrent.impl.CallbackRunnable@3b6f807a),false,false), 5489 -> AwaitingResponse(5489,-680562225,List(scala.concurrent.impl.CallbackRunnable@3b886218),false,false), 5490 -> AwaitingResponse(5490,1406222260,List(scala.concurrent.impl.CallbackRunnable@331616e2),false,false))
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Received a request expecting a response (5491): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=168, cap=256),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Sending request (5491) expecting response by connection Connection([id: 0xe9e760c2, /10.20.12.84:51090 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5491, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5491, awaitingResponses: ...
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Received a request expecting a response (5492): RequestMakerExpectingResponse(RequestMaker(Query(0,database.ZS05_12-2014,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=83, cap=128),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Sending request (5492) expecting response by connection Connection([id: 0x60c2b0b1, /10.20.12.84:51039 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5492, 0, Query(0,database.ZS05_12-2014,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5492, awaitingResponses: ...Concurrently
Connection-2 is using
RequestId. Id 5489 is used again in another connection (thread was probably suspended while
Connection-3 generated ids 5489-5492).
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-2] Received a request expecting a response (5489): RequestMakerExpectingResponse(RequestMaker(Query(0,database.userRoles,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=72, cap=128),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-2] Sending request (5489) expecting response by connection Connection([id: 0x78acf10f, /10.20.12.84:51078 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5489, 0, Query(0,database.userRoles,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-2] Registering awaiting response for requestID 5489, awaitingResponses: Map(5482 -> AwaitingResponse(5482,2019600610,List(scala.concurrent.impl.CallbackRunnable@2122246c),false,false), 5483 -> AwaitingResponse(5483,-165384027,List(scala.concurrent.impl.CallbackRunnable@34fd02ad),false,false), 5486 -> AwaitingResponse(5486,-1222446930,List(scala.concurrent.impl.CallbackRunnable@39c98f46),false,false), 5489 -> AwaitingResponse(5489,2024599823,List(scala.concurrent.impl.CallbackRunnable@2cede403),false,false))But requesting ID 5489 from
Connection-2 resets
RequestId.common counter back, so
Connection-3 assigns id
5490 again for another request (one millisecond after first request with same id was sent) and overwrite first request in
awaitingResponses:
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Received a request expecting a response (5490): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=168, cap=256),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Sending request (5490) expecting response by connection Connection([id: 0x85194670, /10.20.12.84:51082 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5490, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5490, awaitingResponses: Map(5488 -> AwaitingResponse(5488,683033596,List(scala.concurrent.impl.CallbackRunnable@3b6f807a),false,false), 5489 -> AwaitingResponse(5489,-680562225,List(scala.concurrent.impl.CallbackRunnable@3b886218),false,false), 5490 -> AwaitingResponse(5490,-2061941136,List(scala.concurrent.impl.CallbackRunnable@74bd3caa),false,false), 5491 -> AwaitingResponse(5491,-370712382,List(scala.concurrent.impl.CallbackRunnable@10b8aad4),false,false), 5492 -> AwaitingResponse(5492,1623371953,List(scala.concurrent.impl.CallbackRunnable@6955fd6),false,false))Two responses for MongoDB arrived. One completes
Future and second fails with oups:
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Got a response from 1406222260 to 5490! Will give back message=Response(MessageHeader(71,4217058,5490,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=71, cap=71),ResponseInfo(1406222260)) to promise 215812231
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] {5490} [MongoDB26 Write Op response] sending a success!2016-10-04 11:16:12,892 ERROR r.core.actors.MongoDBSystem - [Supervisor-1/Connection-3] Oups. 5490 not found! complete message is Response(MessageHeader(71,4217062,5490,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=71, cap=71),ResponseInfo(-2061941136))I think these logs provided enough information what is causing oups-es under heavy load when multiple connections are used and confirmed mi suspicion.