Helpful information to include
Product name: NServiceBus
Version:
Stacktrace: None
Description:
We have many NServiceBus services running as Azure Worker Roles. Only one logical service uses NHibernate for saga persistence. For that logical service, we have 5 worker role instances (5 instances of the same cloud service, which translates to 5 PaaS instances).
All the services that do not use NHibernate saga persistence are working just fine. But this one service (we call it the Orchestrator service) constantly STOPS receiving messages from the Azure Service Bus queue. On a stalled instance, CPU usage in Task Manager shows 0%, occasionally ticking up to 1% before falling back to 0%. More curiously, there is always ONE instance of the Orchestrator that stays up and keeps receiving messages normally. The instance that stays up varies.
We write a log entry each time we receive a command and another when we exit the handler. In our log, we'll see things like:
command received
command processed
and then it just stops. Nothing more is written to the log (log4net). I've tried turning the logging level up to debug, and no new information is written to the event log. The only way to fix the issue is to restart the instance or to kill the Microsoft Windows Azure Worker Host (WAWORKERHOST). Windows Azure automatically detects that the worker host is down and restarts it. Normal operation resumes, but usually within the next 20 minutes or so the process drops back down to 0% CPU (the interval varies, and it can be stable for hours or days).
We're not really sure where to begin to test.