NServiceBus service won't stop occassionally

224 views
Skip to first unread message

Alexey Sednev

unread,
Dec 18, 2015, 6:55:30 AM12/18/15
to Particular Software
Helpful information to include
Product name: NServiceBus 5.2.9, NServiceBus.Host 6.0.0
Version:
Stacktrace:
Description:

Occasionally NServiceBus service running using NServiceBus.Host won't stop without killing the process. Here are the logs:

2015-12-17 22:05:21.3882|PAIAPPMV107|10000|12|Info|NServiceBus.Unicast.UnicastBus|Initiating shutdown.|

2015-12-17 22:05:21.4038|PAIAPPMV107|10000|12|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 1/5 NServiceBus.SecondLevelRetries.SecondLevelRetriesProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:05:21.4038|PAIAPPMV107|10000|52|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 3/5 NServiceBus.Timeout.Hosting.Windows.TimeoutMessageProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:05:21.4038|PAIAPPMV107|10000|34|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 4/5 NServiceBus.Distributor.MSMQ.DistributorReadyMessageProcessor, NServiceBus.Distributor.MSMQ, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:05:21.4038|PAIAPPMV107|10000|41|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 5/5 NServiceBus.Distributor.MSMQ.DistributorSatellite, NServiceBus.Distributor.MSMQ, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|


As you can see, step 2/5 isn't stopping. Here are logs from successful stop (same service few minutes later):


2015-12-17 22:32:15.2133|PAIAPPMV107|9500|20|Info|NServiceBus.Unicast.UnicastBus|Initiating shutdown.|

2015-12-17 22:32:15.2289|PAIAPPMV107|9500|20|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 1/5 NServiceBus.SecondLevelRetries.SecondLevelRetriesProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:32:15.2289|PAIAPPMV107|9500|5|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 3/5 NServiceBus.Timeout.Hosting.Windows.TimeoutMessageProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:32:15.2289|PAIAPPMV107|9500|9|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 4/5 NServiceBus.Distributor.MSMQ.DistributorReadyMessageProcessor, NServiceBus.Distributor.MSMQ, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:32:15.8373|PAIAPPMV107|9500|15|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 2/5 NServiceBus.Timeout.Hosting.Windows.TimeoutDispatcherProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:32:22.4050|PAIAPPMV107|9500|41|Info|NServiceBus.Distributor.MSMQ.MsmqWorkerAvailabilityManager|NextAvailableWorker Exception|

2015-12-17 22:32:22.4206|PAIAPPMV107|9500|38|Info|NServiceBus.Distributor.MSMQ.MsmqWorkerAvailabilityManager|NextAvailableWorker Exception|

2015-12-17 22:32:22.4206|PAIAPPMV107|9500|58|Info|NServiceBus.Distributor.MSMQ.MsmqWorkerAvailabilityManager|NextAvailableWorker Exception|

2015-12-17 22:32:22.4206|PAIAPPMV107|9500|11|Info|NServiceBus.Distributor.MSMQ.MsmqWorkerAvailabilityManager|NextAvailableWorker Exception|

2015-12-17 22:32:32.4203|PAIAPPMV107|9500|60|Info|NServiceBus.Distributor.MSMQ.MsmqWorkerAvailabilityManager|NextAvailableWorker Exception|

2015-12-17 22:32:32.4203|PAIAPPMV107|9500|13|Info|NServiceBus.Satellites.SatelliteLauncher|Stopped 5/5 NServiceBus.Distributor.MSMQ.DistributorSatellite, NServiceBus.Distributor.MSMQ, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite|

2015-12-17 22:32:32.4203|PAIAPPMV107|9500|20|Info|NServiceBus.Unicast.UnicastBus|Shutdown complete.|



Any suggestions would be welcomed.


Thanks

ramon...@particular.net

unread,
Dec 21, 2015, 3:22:12 AM12/21/15
to Particular Software


Hi Alexey,

I see that you are using MSMQ as the transport. Which persistence are you using?

We are trying to perform a graceful shutdown. Meaning that any handler that is currently being invoked will need to complete. We drain worker until no messages are being processed and then we quit. This can sometimes take a while.

How long do you usually wait until you decide to kill the service?


Based on your log file it seems that NServiceBus.Timeout.Hosting.Windows.TimeoutDispatcherProcessor is not stopping.

It is possible that you set the log level to DEBUG? It would be interested in seeing the shutdown section with this information as that might hint on what it blocking the shutdown. You can do this via configuration: http://docs.particular.net/nservicebus/logging/ 



Regards,
Ramon

Alexey Sednev

unread,
Dec 29, 2015, 1:47:03 AM12/29/15
to particula...@googlegroups.com
Hi Ramon,

I waited for close to 30 minutes before killing the process. I will try changing the log level to DEBUG when next opportunity presents itself and report back.

But you're right, it is the TimeoutDispatcher that is not stopping and there are bunch of messages in timeoutdispatcher queue on the distributor node, however there are no messages in the queue for messages on either distributor or workers.

Dave Cluderay

unread,
Jan 6, 2016, 6:51:06 AM1/6/16
to Particular Software
Hi,

I have a few services showing (I believe) the same issue.

It seems that the TimeoutPersisterReceiver starts polling for timeouts but never finishes.
The fact that timeout processing is stuck is not obvious until the service attempts to shut down (at which point it waits for the TimeoutPersisterProcessor, which in turn waits for the stuck TimeoutPersisterReceiver).
Ultimately the service process has to be killed and the service restarted.

Here's a short snippet from the log file:

2016-01-05T15:29:36.4418254+00:00 10/ Debug Polling for timeouts at 05/01/2016 15:29:36.
2016-01-05T15:29:37.7855122+00:00 10/ Debug Polling next retrieval is at 05/01/2016 15:30:37.
2016-01-05T15:30:38.7202241+00:00 10/ Debug Polling for timeouts at 05/01/2016 15:30:38.
2016-01-05T15:30:39.4545704+00:00 10/ Debug Polling next retrieval is at 05/01/2016 15:31:39.
2016-01-05T15:31:40.3893420+00:00 10/ Debug Polling for timeouts at 05/01/2016 15:31:40.
2016-01-05T15:40:44.4794787+00:00 37/ Information Initiating shutdown.
2016-01-05T15:40:44.4794787+00:00 37/ Debug Drain stopping 'Throttling Semaphore'.
2016-01-05T15:40:44.4794787+00:00 37/ Debug Claiming Semaphore thread 1/1.
2016-01-05T15:40:44.4794787+00:00 37/ Debug Releasing all claimed Semaphore threads.
2016-01-05T15:40:44.4794787+00:00 37/ Debug Ensuring IWantToRunWhenBusStartsAndStops.Start has been called.
2016-01-05T15:40:44.4794787+00:00 37/ Debug All IWantToRunWhenBusStartsAndStops.Start have completed now.
2016-01-05T15:40:44.4951051+00:00 50/ Debug Stopped NServiceBus.Unicast.Subscriptions.MessageDrivenSubscriptions.StorageInitializer, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c.
2016-01-05T15:40:44.4951051+00:00 23/ Debug Stopped NServiceBus.Features.FeatureActivator+Runner, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c.
2016-01-05T15:40:44.4951051+00:00 31/ Debug Stopping 2/2 NServiceBus.Timeout.Hosting.Windows.TimeoutMessageProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite
2016-01-05T15:40:44.4951051+00:00 37/ Debug Stopping 1/2 NServiceBus.Timeout.Hosting.Windows.TimeoutDispatcherProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite
2016-01-05T15:40:44.4951051+00:00 31/ Debug Drain stopping 'Throttling Semaphore'.
2016-01-05T15:40:44.4951051+00:00 37/ Debug Drain stopping 'Throttling Semaphore'.
2016-01-05T15:40:44.4951051+00:00 31/ Debug Claiming Semaphore thread 1/1.
2016-01-05T15:40:44.4951051+00:00 37/ Debug Claiming Semaphore thread 1/1.
2016-01-05T15:40:44.4951051+00:00 31/ Debug Releasing all claimed Semaphore threads.
2016-01-05T15:40:44.4951051+00:00 37/ Debug Releasing all claimed Semaphore threads.
2016-01-05T15:40:44.4951051+00:00 31/ Information Stopped 2/2 NServiceBus.Timeout.Hosting.Windows.TimeoutMessageProcessor, NServiceBus.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite

This service was idle throughout, and is deliberately not using Bus.Defer or SLR, so there wouldn't have been any actual timeout activity at the time (i'd expect the poll to return no items).
We are using NServiceBus 5.2.9 with RavenDB 3 (build 3785) for persistence.

One more thing - another completely separate service, but more or less identical and using the same RavenDB server, continued to function normally on this occasion.
It did, however, experience timeout polling problems at that time, which may be helpful in understanding what happened to the other service (since this is connecting to the same server):

2016-01-05T15:34:27.3044017+00:00 14/ Warning Failed to fetch timeouts from the timeout storage
Query failed. See inner exception for details. ---> Could not open database named: ServiceX.NServiceBusPersistence
System.TimeoutException: The database ServiceX.NServiceBusPersistence is currently being loaded, but after 5 seconds, this request has been aborted. Please try again later, database loading continues.
   at Raven.Database.Server.Controllers.RavenDbApiController.TrySetupRequestToProperResource(RequestWebApiEventArgs& args) in c:\Builds\RavenDB-Stable-3.0\Raven.Database\Server\Controllers\RavenDbApiController.cs:line 609
----------------------------------------------------------------------------------------------------
Exception of type System.InvalidOperationException:
    Message:     "Query failed. See inner exception for details."
    Source:      "Raven.Client.Lightweight"
    Target Site: "Raven.Abstractions.Data.QueryResult Query(System.String, Raven.Abstractions.Data.IndexQuery, System.String[], Boolean, Boolean)"
    Help Link:   ""
    Additional information specific to this exception:
    Stack Trace:
        at Raven.Client.Connection.ServerClient.Query(String index, IndexQuery query, String[] includes, Boolean metadataOnly, Boolean indexEntriesOnly) in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\ServerClient.cs:line 274
        at Raven.Client.Document.AbstractDocumentQuery`2.ExecuteActualQuery() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs:line 685
        at Raven.Client.Document.AbstractDocumentQuery`2.InitSync() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs:line 665
        at Raven.Client.Document.AbstractDocumentQuery`2.get_QueryResult() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Document\AbstractDocumentQuery.cs:line 654
        at Raven.Client.Linq.RavenQueryProviderProcessor`1.ExecuteQuery[TProjection]() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Linq\RavenQueryProviderProcessor.cs:line 1625
        at Raven.Client.Linq.RavenQueryInspector`1.GetEnumerator() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Linq\RavenQueryInspector.cs:line 101
        at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
        at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
        at NServiceBus.TimeoutPersisters.RavenDB.TimeoutPersister.GetCleanupChunk(DateTime startSlice) in c:\BuildAgent\work\c4d62ce02b983878\src\NServiceBus.RavenDB\Timeouts\TimeoutPersister.cs:line 151
        at NServiceBus.TimeoutPersisters.RavenDB.TimeoutPersister.GetNextChunk(DateTime startSlice, DateTime& nextTimeToRunQuery) in c:\BuildAgent\work\c4d62ce02b983878\src\NServiceBus.RavenDB\Timeouts\TimeoutPersister.cs:line 39
        at NServiceBus.Timeout.Hosting.Windows.TimeoutPersisterReceiver.Poll(Object obj) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Timeout\Hosting\Windows\TimeoutPersisterReceiver.cs:line 90
        at System.Threading.Tasks.Task.Execute()
    Inner Exception:
        Exception of type Raven.Abstractions.Connection.ErrorResponseException:
            Message:     "Could not open database named: ServiceX.NServiceBusPersistence
System.TimeoutException: The database ServiceX.NServiceBusPersistence is currently being loaded, but after 5 seconds, this request has been aborted. Please try again later, database loading continues.
   at Raven.Database.Server.Controllers.RavenDbApiController.TrySetupRequestToProperResource(RequestWebApiEventArgs& args) in c:\Builds\RavenDB-Stable-3.0\Raven.Database\Server\Controllers\RavenDbApiController.cs:line 609"
            Source:      "Raven.Client.Lightweight"
            Target Site: "Void MoveNext()"
            Help Link:   ""
            Stack Trace:
                at Raven.Client.Connection.Implementation.HttpJsonRequest.<CheckForErrorsAndReturnCachedResultIfAnyAsync>d__2d.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Implementation\HttpJsonRequest.cs:line 393
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at Raven.Client.Connection.Implementation.HttpJsonRequest.<>c__DisplayClass13.<<SendRequestInternal>b__12>d__15.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Implementation\HttpJsonRequest.cs:line 202
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
                at Raven.Client.Connection.Implementation.HttpJsonRequest.<RunWithAuthRetry>d__1a`1.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Implementation\HttpJsonRequest.cs:line 245
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
                at Raven.Client.Connection.Implementation.HttpJsonRequest.<ReadResponseJsonAsync>d__e.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Implementation\HttpJsonRequest.cs:line 173
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
                at Raven.Client.Connection.Async.AsyncServerClient.<>c__DisplayClass1a3.<<QueryAsyncAsGet>b__19f>d__1a8.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Async\AsyncServerClient.cs:line 1259
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
                at Raven.Client.Connection.ReplicationInformerBase`1.<TryOperationAsync>d__27`1.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\ReplicationInformerBase.cs:line 422
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
                at Raven.Client.Connection.ReplicationInformerBase`1.<ExecuteWithReplicationAsync>d__17`1.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\ReplicationInformerBase.cs:line 314
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
                at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
                at Raven.Client.Connection.Async.AsyncServerClient.<ExecuteWithReplication>d__28e`1.MoveNext() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\Async\AsyncServerClient.cs:line 2238
                --- End of stack trace from previous location where exception was thrown ---
                at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
                at Raven.Client.Extensions.TaskExtensions2.ResultUnwrap[T](Task`1 task) in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Extensions\TaskExtensions2.cs:line 59
                at Raven.Client.Connection.ServerClient.Query(String index, IndexQuery query, String[] includes, Boolean metadataOnly, Boolean indexEntriesOnly) in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Connection\ServerClient.cs:line 265

Cheers,
Dave

Alexey Sednev

unread,
Jan 25, 2016, 12:16:47 PM1/25/16
to Particular Software
Ticket for this issue has been created https://github.com/Particular/NServiceBus.Host/issues/51

Dave Cluderay

unread,
Jan 25, 2016, 6:23:21 PM1/25/16
to Particular Software
Hi,

Thanks for creating a ticket for the shutdown hang part of this problem.

I just wanted to mention again that the shutdown hang stems from the fact that the TimeoutPersisterReceiver is sometimes never returning from an attempt to poll the timeout (Raven) store. To me, this is the more serious issue, since the service continues to run and it is not really obvious that timeouts have ceased to function from that point onwards.

Cheers,
Dave

Alexey Sednev

unread,
Jan 25, 2016, 6:44:12 PM1/25/16
to Particular Software
Dave,

You're bringing up a great point. I did run into cases when NSB would cease delivering timeouts. It's definitely an issue. I just don't have enough details on this.

Mauro Servienti

unread,
Jan 26, 2016, 12:24:47 PM1/26/16
to particula...@googlegroups.com
Hi,

We seems that we have identified a possible cause. Can you please confirm that you all are using RavenDB as the persistence engine?

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
Visit this group at https://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/8a043dff-de4b-41ed-9647-e8ce7076dd96%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Mauro Servienti

Alexey Sednev

unread,
Jan 26, 2016, 12:33:05 PM1/26/16
to Particular Software
Hi Mauro, 

yes, we do use RavenDB for persistence. I assume you're talking about https://github.com/Particular/NServiceBus.RavenDB/issues/140?
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftware+unsub...@googlegroups.com.
--
Mauro Servienti

Mauro Servienti

unread,
Jan 26, 2016, 12:47:53 PM1/26/16
to particula...@googlegroups.com
Yes, the issue should be exactly that, the stream API hangs on shutdown and prevents the endpoint to shutdown properly.
We discussed the fixed and we are working on it.

To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
--
Mauro Servienti

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.

To post to this group, send email to particula...@googlegroups.com.
Visit this group at https://groups.google.com/group/particularsoftware.

For more options, visit https://groups.google.com/d/optout.
--
Mauro Servienti

Alexey Sednev

unread,
Jan 26, 2016, 1:28:42 PM1/26/16
to Particular Software
That's great, thank you Mauro. I'm looking forward to what you guys come up with.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftware+unsub...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
--
Mauro Servienti

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftware+unsub...@googlegroups.com.
--
Mauro Servienti

Dave Cluderay

unread,
Jan 26, 2016, 1:40:31 PM1/26/16
to Particular Software
Hi,

This is a nice find, and yes, we are using RavenDB too.
The issue doesn't seem like a perfect fit for my issue, though, since we have specifically disabled SLR and are deliberately not making use of Bus.Defer.
As a consequence I wouldn't expect there to be any actual timeout activity other than polling (which should always return empty results).

Cheers,
Dave
Reply all
Reply to author
Forward
0 new messages