Sporadic errors "oups. [0-9]+ not found!"

280 views
Skip to the first unread message

Matej Krchniak

unread,
3 Oct 2016, 10:13:3003/10/2016
to ReactiveMongo - http://reactivemongo.org
Hi,

we recently encountered multiple failures when accessing MongoDB under heavy load. All failures have this error message in common "oups. [0-9]+ not found!". Our configuration:
  1. MongoDB 3.2.9
  2. ReactiveMongo 0.11.7
  3. Usage of multiple connections MongoConnection-s to MongoDB.
When investigating this issue I bumped into this issue: https://github.com/ReactiveMongo/ReactiveMongo/issues/441 (possibly related) and stackoverflow question (where the issue was not reproduced). With the bit of luck we bumped into this issue again with trace logs turned on (see attached file). All failures have in common, that multiple requests were registered with same requestID within same connection:

2016-09-30 09:18:24,165 TRACE r.core.actors.MongoDBSystem  - registering awaiting response for requestID 9579, awaitingResponses: Map(9569 -> AwaitingResponse(9569,-1977729056,List(scala.concurrent.impl.CallbackRunnable@57db3e7c),false,false), 9571 -> AwaitingResponse(9571,1638072636,List(scala.concurrent.impl.CallbackRunnable@24f5a9e1),false,false), 9574 -> AwaitingResponse(9574,721036470,List(scala.concurrent.impl.CallbackRunnable@3e2a9d8a),false,false), 9576 -> AwaitingResponse(9576,1982732805,List(scala.concurrent.impl.CallbackRunnable@2882450d),false,false), 9578 -> AwaitingResponse(9578,-733406577,List(scala.concurrent.impl.CallbackRunnable@2624a6bf),false,false), 9579 -> AwaitingResponse(9579,-371543197,List(scala.concurrent.impl.CallbackRunnable@2f97b93b),false,false))
2016-09-30 09:18:24,165 TRACE r.core.actors.MongoDBSystem  - registering awaiting response for requestID 9579, awaitingResponses: Map(9569 -> AwaitingResponse(9569,-1977729056,List(scala.concurrent.impl.CallbackRunnable@57db3e7c),false,false), 9571 -> AwaitingResponse(9571,1638072636,List(scala.concurrent.impl.CallbackRunnable@24f5a9e1),false,false), 9574 -> AwaitingResponse(9574,721036470,List(scala.concurrent.impl.CallbackRunnable@3e2a9d8a),false,false), 9576 -> AwaitingResponse(9576,1982732805,List(scala.concurrent.impl.CallbackRunnable@2882450d),false,false), 9578 -> AwaitingResponse(9578,-733406577,List(scala.concurrent.impl.CallbackRunnable@2624a6bf),false,false), 9579 -> AwaitingResponse(9579,-791409649,List(scala.concurrent.impl.CallbackRunnable@7fab2819),false,false), 9577 -> AwaitingResponse(9577,1469185972,List(scala.concurrent.impl.CallbackRunnable@45bbb78e),false,false))

First Future was replaced by Future for second request. So there is no surprise what happened next: First request will be completed successfully, because there is registered AwaitingResponse for this requestID

2016-09-30 09:18:24,169 DEBUG r.core.actors.MongoDBSystem  - Got a response from -371543197! Will give back message=Response(MessageHeader(926,1632446,9579,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=926, cap=926),ResponseInfo(-371543197)) to promise List(scala.concurrent.impl.CallbackRunnable@7fab2819)
2016-09-30 09:18:24,169 TRACE r.core.actors.MongoDBSystem  - {9579} [MongoDB26 Write Op response] sending a success!

and the second response from MongoDB logs error, because the Future with that requestID is no longer registered
:

2016-09-30 09:18:24,171 ERROR r.core.actors.MongoDBSystem  - oups. 9579 not found! complete message is Response(MessageHeader(71,1632448,9579,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=71, cap=71),ResponseInfo(-791409649))


These error messages pointed me firstly to concurrent usage of awaitingResponses, but it looks ok. Then it took me to object RequestId which is used for generating request ids. In the case of one MongoConnection in application everything works fine. But when multiple MongoConnection-s are in mix, counters are modified concurrently from multiple connections. Depending on your luck it can fail (or not):
  1. Same requestID is generated for both connections (250 times from 831 048 requests)
    • This is not fatal, because each connection has it`s own awaitingResponses.
  2. Same requestID is generated for one connection (because second connection is meddling with counter concurrently)(7 times from 831 048 requests)
    • It`s pretty rare, because somehow second connection damages counter, so first connection will emit another request with same requestID. I cannot wrap my head around, how counter must be accessed concurrently.
    • Common error in this case was, that find result was expected, but write result arrived (e.g. The key '_id' could not be found in this document or array).
    • Overwritten Future will never be completed. When akka stream is used to read data, than whole stream will stuck (which is fatal for our stream processing system).

Is there some reason why RequestId is global instead of local to each MongoDBSystem as awaitingResponses?


Thanks.
9579

Cédric Chantepie

unread,
3 Oct 2016, 11:00:0103/10/2016
to ReactiveMongo - http://reactivemongo.org
Hi,

First try to reproduce it against the up-to-date release, either 0.11.14 or 0.12-RC4 .

Best regards

Matej Krchniak

unread,
4 Oct 2016, 08:10:5904/10/2016
to ReactiveMongo - http://reactivemongo.org
Hi,

we reproduced it against latest stable version 0.11.14. Same problem occurs (see attached log for more details). But this time with more information about concurrent access to requestId counters.

Connection-3 is assigning requestIDs: 5489,5490, 5491 and 5492

2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Received a request expecting a response (5489): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=35, cap=64),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Sending request (5489) expecting response by connection Connection([id: 0xd76f71cf, /10.20.12.84:51088 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5489, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5489, awaitingResponses: ...
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Received a request expecting a response (5490): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=2563, cap=4096),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Sending request (5490) expecting response by connection Connection([id: 0x53d13fb4, /10.20.12.84:51089 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5490, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5490, awaitingResponses: Map(5487 -> AwaitingResponse(5487,-1258257590,List(scala.concurrent.impl.CallbackRunnable@1de1f67a),false,false), 5488 -> AwaitingResponse(5488,683033596,List(scala.concurrent.impl.CallbackRunnable@3b6f807a),false,false), 5489 -> AwaitingResponse(5489,-680562225,List(scala.concurrent.impl.CallbackRunnable@3b886218),false,false), 5490 -> AwaitingResponse(5490,1406222260,List(scala.concurrent.impl.CallbackRunnable@331616e2),false,false))
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Received a request expecting a response (5491): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=168, cap=256),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Sending request (5491) expecting response by connection Connection([id: 0xe9e760c2, /10.20.12.84:51090 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5491, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,889 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5491, awaitingResponses: ...
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Received a request expecting a response (5492): RequestMakerExpectingResponse(RequestMaker(Query(0,database.ZS05_12-2014,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=83, cap=128),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Sending request (5492) expecting response by connection Connection([id: 0x60c2b0b1, /10.20.12.84:51039 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5492, 0, Query(0,database.ZS05_12-2014,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5492, awaitingResponses: ...


Concurrently Connection-2 is using RequestId. Id 5489 is used again in another connection (thread was probably suspended while Connection-3 generated ids 5489-5492).

2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-2] Received a request expecting a response (5489): RequestMakerExpectingResponse(RequestMaker(Query(0,database.userRoles,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=72, cap=128),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-2] Sending request (5489) expecting response by connection Connection([id: 0x78acf10f, /10.20.12.84:51078 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5489, 0, Query(0,database.userRoles,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-2] Registering awaiting response for requestID 5489, awaitingResponses: Map(5482 -> AwaitingResponse(5482,2019600610,List(scala.concurrent.impl.CallbackRunnable@2122246c),false,false), 5483 -> AwaitingResponse(5483,-165384027,List(scala.concurrent.impl.CallbackRunnable@34fd02ad),false,false), 5486 -> AwaitingResponse(5486,-1222446930,List(scala.concurrent.impl.CallbackRunnable@39c98f46),false,false), 5489 -> AwaitingResponse(5489,2024599823,List(scala.concurrent.impl.CallbackRunnable@2cede403),false,false))


But requesting ID 5489 from Connection-2 resets RequestId.common counter back, so Connection-3 assigns id 5490 again for another request (one millisecond after first request with same id was sent) and overwrite first request in awaitingResponses:

2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Received a request expecting a response (5490): RequestMakerExpectingResponse(RequestMaker(Query(0,database.$cmd,0,1),BufferSequence(DynamicChannelBuffer(ridx=0, widx=168, cap=256),WrappedArray()),Primary,None),false)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Sending request (5490) expecting response by connection Connection([id: 0x85194670, /10.20.12.84:51082 => mongodb/10.20.6.76:27017],Connected,Set(Authenticated(database,database)),None) of node mongodb:27017: Request(5490, 0, Query(0,database.$cmd,0,1), Primary, None)
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Registering awaiting response for requestID 5490, awaitingResponses: Map(5488 -> AwaitingResponse(5488,683033596,List(scala.concurrent.impl.CallbackRunnable@3b6f807a),false,false), 5489 -> AwaitingResponse(5489,-680562225,List(scala.concurrent.impl.CallbackRunnable@3b886218),false,false), 5490 -> AwaitingResponse(5490,-2061941136,List(scala.concurrent.impl.CallbackRunnable@74bd3caa),false,false), 5491 -> AwaitingResponse(5491,-370712382,List(scala.concurrent.impl.CallbackRunnable@10b8aad4),false,false), 5492 -> AwaitingResponse(5492,1623371953,List(scala.concurrent.impl.CallbackRunnable@6955fd6),false,false))


Two responses for MongoDB arrived. One completes Future and second fails with oups:
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Got a response from 1406222260 to 5490! Will give back message=Response(MessageHeader(71,4217058,5490,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=71, cap=71),ResponseInfo(1406222260)) to promise 215812231
2016-10-04 11:16:12,890 TRACE r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] {5490} [MongoDB26 Write Op response] sending a success!

2016-10-04 11:16:12,892 ERROR r.core.actors.MongoDBSystem  - [Supervisor-1/Connection-3] Oups. 5490 not found! complete message is Response(MessageHeader(71,4217062,5490,1),Reply(8,0,0,1),LittleEndianHeapChannelBuffer(ridx=36, widx=71, cap=71),ResponseInfo(-2061941136))

I think these logs provided enough information what is causing oups-es under heavy load when multiple connections are used and confirmed mi suspicion.

5490.partial.log

Cédric Chantepie

unread,
5 Oct 2016, 15:37:0705/10/2016
to ReactiveMongo - http://reactivemongo.org
You can try with 0.12.0-SNAPSHOT

markus.sa...@sharpershape.com

unread,
12 Dec 2016, 04:36:5012/12/2016
to ReactiveMongo - http://reactivemongo.org
I'm getting this with 0.12.0.

Retried requests are using channelId as key when it should be requestId.

Victor Caballero

unread,
26 Jul 2017, 02:50:5626/07/2017
to ReactiveMongo - http://reactivemongo.org
I am having the same problem. Did you solve this issue?

Mehmet Ali GÖZAYDIN

unread,
9 Mar 2018, 04:43:0009/03/2018
to ReactiveMongo - http://reactivemongo.org
Anyone looking into this? I am having the same error too

Reactive Mongo Version: 0.13
Scala Version: 2.11.8
Mongodb Version: 3.4.13

Karl Kröber

unread,
19 Nov 2018, 11:20:4819/11/2018
to ReactiveMongo - http://reactivemongo.org
Same thing here. This needs a fix.

Cédric Chantepie

unread,
19 Nov 2018, 13:05:0519/11/2018
to ReactiveMongo - http://reactivemongo.org
If using the same versions as the previous messages, note that the latest is 0.16 and some fixes as been applied since 0.13.
Reply all
Reply to author
Forward
0 new messages