I process a collection using subscriptions, pretty much exactly as described in Listing 5.15 in .
The processing modifies the documents, which are 10-20 kB each. Each batch (4096 documents) is around 60 MB.
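As a sanity check, the batch figure is consistent with the document sizes. Here is a quick Python check. It assumes binary units (1 MB = 1024 kB), which is my assumption and not something I confirmed against how RavenDB reports sizes. `batch_size_mb` is just an illustrative helper:

```python
# Rough size of one subscription batch: 4096 documents of 10-20 kB each.
# Assumption: binary units, 1 MB = 1024 kB.
BATCH_DOCS = 4096

def batch_size_mb(doc_kb: float, docs: int = BATCH_DOCS) -> float:
    """Approximate payload of one batch in MB (hypothetical helper)."""
    return docs * doc_kb / 1024

low, high = batch_size_mb(10), batch_size_mb(20)
print(f"{low:.0f}-{high:.0f} MB per batch")  # 40-80 MB per batch

# Average document size implied by the observed ~60 MB batches.
print(f"{60 * 1024 / BATCH_DOCS:.0f} kB average")  # 15 kB average
```

So ~60 MB per batch corresponds to an average document of roughly 15 kB, in the middle of the stated range.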
The processing works for a while, and then the program fails with the following exception:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred. ---> Raven.Client.Exceptions.Documents.Subscriptions.SubscriptionDoesNotBelongToNodeException: Subscription With Id '2059' cannot be processed by current node, it will be redirected to
at Raven.Client.Documents.Subscriptions.Subscription`1.AssertConnectionState(SubscriptionConnectionServerMessage connectionStatus)
at Raven.Client.Documents.Subscriptions.Subscription`1.<ReadSingleSubscriptionBatchFromServer>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at Raven.Client.Documents.Subscriptions.Subscription`1.<ProcessSubscriptionAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Raven.Client.Documents.Subscriptions.Subscription`1.<RunSubscriptionAsync>d__36.MoveNext()
--- End of inner exception stack trace ---
at Raven.Client.Documents.Subscriptions.Subscription`1.ShouldTryToReconnect(Exception ex)
at Raven.Client.Documents.Subscriptions.Subscription`1.<RunSubscriptionAsync>d__36.MoveNext()
--- End of inner exception stack trace ---
at Raven.Client.Documents.Subscriptions.Subscription`1.<RunSubscriptionAsync>d__36.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
at my code
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Wait()
at my code
The cluster observer log from the same period (I added the "Comment" fields):
{
"LeaderNode": "A",
"Term": 7,
"Suspended": false,
"Iteration": 2400,
"ObserverLog": [
{
"Date": "2017-10-20T07:38:27.9245983Z",
"Iteration": 6,
"Database": "data-prod",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:38:27.9261508Z",
"Iteration": 6,
"Database": "data-subset",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:38:27.9263065Z",
"Iteration": 6,
"Database": "data-subsub",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:38:27.9263554Z",
"Iteration": 6,
"Database": "TestFacetPerformance",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:38:27.9264052Z",
"Iteration": 6,
"Database": "TestInvalidIndex",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:38:32.7425663Z",
"Iteration": 13,
"Database": "data-subsub",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:38:32.7430925Z",
"Iteration": 13,
"Database": "TestFacetPerformance",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:38:32.7437067Z",
"Iteration": 13,
"Database": "TestInvalidIndex",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:38:33.2240591Z",
"Iteration": 14,
"Database": "data-subsub",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:38:33.2242940Z",
"Iteration": 14,
"Database": "TestFacetPerformance",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:38:33.2245054Z",
"Iteration": 14,
"Database": "TestInvalidIndex",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:38:34.7067482Z",
"Iteration": 17,
"Database": "data-subset",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:38:35.2091272Z",
"Iteration": 18,
"Database": "data-subset",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:39:11.7202646Z",
"Iteration": 40,
"Database": "data-prod",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:39:12.2193681Z",
"Iteration": 41,
"Database": "data-prod",
"Message": "Node A is online",
"Comment": "End of database startup, all databases loaded. Processing started (subscription created) at 07:39:57Z"
},
{
"Date": "2017-10-20T07:47:59.9453917Z",
"Iteration": 1092,
"Database": "data-prod",
"Message": "Node A is currently not responding and moved to rehab",
"Comment": "This is when the processing stops."
},
{
"Date": "2017-10-20T07:47:59.9457484Z",
"Iteration": 1092,
"Database": "data-subset",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:47:59.9458942Z",
"Iteration": 1092,
"Database": "data-subsub",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:47:59.9459562Z",
"Iteration": 1092,
"Database": "TestFacetPerformance",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:47:59.9460398Z",
"Iteration": 1092,
"Database": "TestInvalidIndex",
"Message": "Node A is currently not responding and moved to rehab"
},
{
"Date": "2017-10-20T07:48:04.1991157Z",
"Iteration": 1100,
"Database": "data-prod",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:48:04.1998809Z",
"Iteration": 1100,
"Database": "data-subset",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:48:04.2004433Z",
"Iteration": 1100,
"Database": "data-subsub",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:48:04.2009312Z",
"Iteration": 1100,
"Database": "TestFacetPerformance",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:48:04.2014025Z",
"Iteration": 1100,
"Database": "TestInvalidIndex",
"Message": "All nodes are not responding, promoting A from rehab"
},
{
"Date": "2017-10-20T07:48:04.6984335Z",
"Iteration": 1101,
"Database": "data-prod",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:48:04.6987047Z",
"Iteration": 1101,
"Database": "data-subset",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:48:04.6988461Z",
"Iteration": 1101,
"Database": "data-subsub",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:48:04.6989036Z",
"Iteration": 1101,
"Database": "TestFacetPerformance",
"Message": "Node A is online"
},
{
"Date": "2017-10-20T07:48:04.6989633Z",
"Iteration": 1101,
"Database": "TestInvalidIndex",
"Message": "Node A is online"
}
]
}
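It seems that node A has some kind of connectivity problem: at 07:47:59 the observer moves it to rehab for every database, it is back online at 07:48:04, and that is exactly when my processing stops and the subscription client is disconnected. Any ideas about how to investigate this without creating a synthetic test case?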
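One cheap first step is to pull the rehab/online windows out of the observer log so they can be lined up with the time the client exception is thrown. Below is a minimal Python sketch. It assumes the observer JSON above is available as a string or file. `parse_ts` and `rehab_windows` are hypothetical helpers, not part of RavenDB.

```python
from __future__ import annotations

import json
from datetime import datetime

def parse_ts(s: str) -> datetime:
    """Parse an observer timestamp such as '2017-10-20T07:47:59.9453917Z'.

    The log uses 7 fractional digits and a trailing 'Z'; strptime's %f takes
    at most 6, so the fraction is truncated to microseconds (treated as UTC).
    """
    s = s.rstrip("Z")
    if "." in s:
        base, frac = s.split(".")
        return datetime.strptime(f"{base}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")

def rehab_windows(log: dict) -> list[tuple[str, datetime, datetime]]:
    """Pair each 'moved to rehab' entry with the next 'is online' entry per database."""
    open_since: dict[str, datetime] = {}
    windows = []
    # ISO timestamps in the same format sort correctly as strings.
    for entry in sorted(log["ObserverLog"], key=lambda e: e["Date"]):
        db, ts, msg = entry["Database"], parse_ts(entry["Date"]), entry["Message"]
        if "moved to rehab" in msg:
            open_since.setdefault(db, ts)
        elif msg.endswith("is online") and db in open_since:
            windows.append((db, open_since.pop(db), ts))
    return windows

# Two entries copied from the log above.
sample = json.loads("""{"ObserverLog": [
 {"Date": "2017-10-20T07:47:59.9453917Z", "Database": "data-prod",
  "Message": "Node A is currently not responding and moved to rehab"},
 {"Date": "2017-10-20T07:48:04.6984335Z", "Database": "data-prod",
  "Message": "Node A is online"}
]}""")

for db, start, end in rehab_windows(sample):
    print(f"{db}: {start:%H:%M:%S} -> {end:%H:%M:%S} ({(end - start).total_seconds():.1f} s)")
# data-prod: 07:47:59 -> 07:48:04 (4.8 s)
```

With the full log (e.g. loaded via `json.load` from a hypothetical `observer.json`), this also lists the longer startup windows at 07:38-07:39. Any client-side failure timestamp that falls inside a window points to the node being unreachable rather than to the subscription code itself.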