Scheduler stops running tasks for no apparent reason

437 views
Skip to first unread message

Marius Gundersen

unread,
May 20, 2014, 7:34:35 AM5/20/14
to particula...@googlegroups.com
We are developing a real-time system using NServiceBus 4.4 where our agents use the Scheduler to run a task every few seconds. Once in a while though the Scheduler just stops calling the action, with no exception and no error message in the Event viewer or the logs. The agent has another scheduled task, a heartbeat, which monitors the main task. The heartbeat scheduler continues running, but (obviously) reports that the main task is not running.

I have no stacktrace, no error message and the logs are not of any use, since nothing out of the ordinary is reported. The only way to get things to work again is to manually restart the agent. I had a hypothesis that this could be caused by the system time being adjusted, such that it skips a second. I haven't been able to reproduce this when manually adjusting the time though. Another possibility I can see is that the agent runs out of resources, and is not able to create a new thread. But I have absolutely no evidence for either of these hypothesis.

Has anyone else experienced this? Does anyone know of a fix?

I have also posted this question to StackOverflow: http://stackoverflow.com/questions/23732174/nservicebus-scheduler-stops-executing-job


John Simons

unread,
May 20, 2014, 7:44:12 AM5/20/14
to particula...@googlegroups.com
To help us to replicate this issue can you tell us how many scheduled messages do u have, the frequency and what storage are u using?

Marius Gundersen

unread,
May 20, 2014, 8:11:07 AM5/20/14
to particula...@googlegroups.com
The agent has two scheduled tasks: The main task, which runs every 5 seconds, and the heartbeat task, which runs once a minute. It uses RavenDB for storage. I haven't checked the RavenDB for the agent when it stops the scheduled task, but that is something I will do next time it happens

Mauro Servienti

unread,
May 20, 2014, 1:02:02 PM5/20/14
to particula...@googlegroups.com
Hi Marius,

I replied on SO.

Sent from my Amazing Yellow Lumia, typos are guaranteed ;-)

From: Marius Gundersen
Sent: ‎20/‎05/‎2014 13:35
To: particula...@googlegroups.com
Subject: [particularsoftware] Scheduler stops running tasks for no apparent reason

--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
Visit this group at http://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/683b4542-4b98-4f07-90eb-1b5f6c9df810%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marius Gundersen

unread,
May 21, 2014, 4:46:38 AM5/21/14
to particula...@googlegroups.com
It happened again, and this time I checked the RavenDB. Instead of having two TimeoutData entries (one for the main task and one for the heartbeat task) it had four. Two of them were from today, and two of them were from a few days ago. It may be that this is confusing NServiceBus.

I have attached the TimeoutData Document for the task that stopped running. Notice the time difference between the NServiceBus.TimeSent and NServiceBus.Timeout.Expire, which is 5 seconds. I copied this content at 08:34, so for 6 minutes it has not been updating the TimeoutData document, even though it should do it every 5 seconds.

{
  "Destination": {
    "Queue": "Flyt.TiosCtcAgent",
    "Machine": "GMBSANNTID"
  },
  "SagaId": "00000000-0000-0000-0000-000000000000",
  "State": "PD94bWwgdmVyc2lvbj0iMS4wIiA/Pg0KPE1lc3NhZ2VzIHhtbG5zOnhzaT0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEtaW5zdGFuY2UiIHhtbG5zOnhzZD0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEiIHhtbG5zPSJodHRwOi8vdGVtcHVyaS5uZXQvTlNlcnZpY2VCdXMuU2NoZWR1bGluZy5NZXNzYWdlcyI+CjxTY2hlZHVsZWRUYXNrPgo8VGFza0lkPmU2YjZlN2NmLTlmYjMtNDhiZS1iZGUwLTc0NmJhZmE3MmRlNzwvVGFza0lkPgo8TmFtZT5Cb290c3RyYXA8L05hbWU+CjxFdmVyeT5QVDVTPC9FdmVyeT4KPC9TY2hlZHVsZWRUYXNrPgo8L01lc3NhZ2VzPg0K",
  "Time": "2014-05-21T08:18:00.2114720Z",
  "CorrelationId": "2f849e22-d760-4ce6-846b-a33200a9b784\\0",
  "OwningTimeoutManager": "Flyt.TiosCtcAgent",
  "Headers": {
    "NServiceBus.MessageId": "2f849e22-d760-4ce6-846b-a33200a9b784",
    "NServiceBus.CorrelationId": "2f849e22-d760-4ce6-846b-a33200a9b784",
    "NServiceBus.OriginatingEndpoint": "Flyt.TiosCtcAgent",
    "$.diagnostics.originating.hostid": "522caf7d3f53a790fdcc328b6cb0d627",
    "NServiceBus.MessageIntent": "Send",
    "NServiceBus.Version": "4.4.2",
    "NServiceBus.TimeSent": "2014-05-21 08:17:55:211472 Z",
    "NServiceBus.OriginatingMachine": "GMBSANNTID",
    "NServiceBus.ContentType": "text/xml",
    "NServiceBus.EnclosedMessageTypes": "NServiceBus.Scheduling.Messages.ScheduledTask, NServiceBus.Core, Version=4.4.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c",
    "CorrId": "2f849e22-d760-4ce6-846b-a33200a9b784\\0",
    "WinIdName": "GMBDOMENE1\\sanntidintegrasjon",
    "NServiceBus.RelatedTo": "5da0ead5-1701-4e2e-9b43-a33200a9b16e",
    "NServiceBus.ConversationId": "a00caaba-1eab-48a7-854f-a33100a5c243",
    "NServiceBus.IsDeferredMessage": "True",
    "NServiceBus.Temporary.DelayDeliveryWith": "00:00:05",
    "NServiceBus.Timeout.Expire": "2014-05-21 08:18:00:211472 Z",
    "NServiceBus.Timeout.RouteExpiredTimeoutTo": "Flyt.TiosCtcAgent@GMBSANNTID",
    "NServiceBus.Timeout.ReplyToAddress": "Flyt.TiosCtcAgent@GMBSANNTID"
  }
}


On Tuesday, May 20, 2014 7:02:02 PM UTC+2, Mauro Servienti wrote:
Hi Marius,

I replied on SO.

Sent from my Amazing Yellow Lumia, typos are guaranteed ;-)

From: Marius Gundersen
Sent: ‎20/‎05/‎2014 13:35
To: particula...@googlegroups.com
Subject: [particularsoftware] Scheduler stops running tasks for no apparent reason

We are developing a real-time system using NServiceBus 4.4 where our agents use the Scheduler to run a task every few seconds. Once in a while though the Scheduler just stops calling the action, with no exception and no error message in the Event viewer or the logs. The agent has another scheduled task, a heartbeat, which monitors the main task. The heartbeat scheduler continues running, but (obviously) reports that the main task is not running.

I have no stacktrace, no error message and the logs are not of any use, since nothing out of the ordinary is reported. The only way to get things to work again is to manually restart the agent. I had a hypothesis that this could be caused by the system time being adjusted, such that it skips a second. I haven't been able to reproduce this when manually adjusting the time though. Another possibility I can see is that the agent runs out of resources, and is not able to create a new thread. But I have absolutely no evidence for either of these hypothesis.

Has anyone else experienced this? Does anyone know of a fix?

I have also posted this question to StackOverflow: http://stackoverflow.com/questions/23732174/nservicebus-scheduler-stops-executing-job


--
You received this message because you are subscribed to the Google Groups "Particular Software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to particularsoftware+unsub...@googlegroups.com.
To post to this group, send email to particul...@googlegroups.com.

John Simons

unread,
May 22, 2014, 8:33:23 PM5/22/14
to particula...@googlegroups.com
Hi Marius,

I'm trying to replicate your issue but I need more info.

Can you share the code where you are adding the scheduled tasks?
Also, how long do you think the code for the 5sec task would take to complete? I'm asking because could the problem be that the actual code takes longer then 5secs and overlaps with the previous task?

Cheers
John

Marius Gundersen

unread,
May 26, 2014, 4:35:29 AM5/26/14
to particula...@googlegroups.com
I have attached the logfile of the agent when it stops. At 2014-05-26 10:09:16,830 is the last time it does something useful. After that it only runs the heartbeat task (the second of the two timeouts)
Flyt.TiosCtcAgent-log.txt

John Simons

unread,
May 26, 2014, 9:23:04 PM5/26/14
to particula...@googlegroups.com
Hi Marius,

I've looked at the log file you sent,  thank you.
Unfortunately I can't figure this one out :(
Can you either email us directly or post some code here that shows how you starting the scheduled tasks ?

Cheers
John

Marius Gundersen

unread,
May 27, 2014, 4:21:20 AM5/27/14
to particula...@googlegroups.com
The Bootstrap.cs class looks like this:

using System;

using Flyt.Common.Contracts;
using Flyt.TiosCtcAgent.Properties;

using NServiceBus;
using NServiceBus.Logging;

namespace Flyt.TiosCtcAgent
{
   
public class Bootstrap : IWantToRunWhenBusStartsAndStops
   
{
       
private static readonly ILog Log = LogManager.GetLogger(typeof(Bootstrap));

       
public IBus Bus { get; set; }

       
public IHeart Heart { get; set; }

       
public IPacemaker Pacemaker { get; set; }

       
public ITiosJob TiosJob { get; set; }

       
public void Start()
       
{
           
Log.Info("TIOS CTC Agent starts");
           
Pacemaker.AttachToHeartAndStart(Heart, TimeSpan.FromMinutes(1));
           
Schedule.Every(TimeSpan.FromSeconds(Settings.Default.PollingFrequencyInSeconds)).Action(() => Pacemaker.RechargeAfterExecuting(TiosJob.Execute));
       
}

       
public void Stop()
       
{
           
TiosJob.ShutDown();
           
Log.Info("TIOS CTC Agent stopped");
       
}
   
}
}

We use the Heart to keep track of running agents. It is connected to a Pacemaker that makes sure the main scheduled task runs as scheduled. The Heart then sends a pulse to a monitoring agent once a minute, and a saga on the monitoring agent keeps track of the agent, ensuring that it produces a steady heartbeat of one beat per minute. The RechargeAfterExecuting method looks like this:

        public void RechargeAfterExecuting(Action action)
       
{
            action
.Invoke();
            _previousCharge
= _systemTime.UtcNow;
       
}

Thank you for your help with this problem

John Simons

unread,
May 27, 2014, 4:48:23 AM5/27/14
to particula...@googlegroups.com
So where is the 2nd Scheduled task?

You said you have 2 scheduled tasks, one that runs every 5secs and one that runs every 1min?

Marius Gundersen

unread,
May 27, 2014, 4:51:17 AM5/27/14
to particula...@googlegroups.com
It's in the Pacemaker:
        public void AttachToHeartAndStart(IHeart heart, TimeSpan rythm)
       
{
            _heart
= heart;
            _previousBeat
= _previousCharge = _systemTime.UtcNow;
           
var agent = Assembly.GetCallingAssembly().GetName().Name;
           
BeatHeart(agent, rythm);
           
Schedule.Every(rythm).Action(() => BeatHeart(agent, rythm));
       
}

John Simons

unread,
May 28, 2014, 2:03:51 AM5/28/14
to particula...@googlegroups.com
Hi Marius,

We'll keep u posted.

Cheers
John

Jason Butera

unread,
Jul 21, 2014, 11:07:22 AM7/21/14
to particula...@googlegroups.com
I have been encountering the same issue. I have a scheduled reporting service that pings every minute using schedule.Every() and it mysteriously just stops pinging after a few days. The service continues to run, but of course does nothing once the scheduler ceases. We are using version 4.1.2.

John Simons

unread,
Jul 21, 2014, 9:54:43 PM7/21/14
to particula...@googlegroups.com
Hi Jason and Marius,

We have just released quite a few patch releases to fix this issue.
Jason please upgrade to v4.1.3 and Marius please update to v4.4.3.

Regards
John

Tim Burkhart

unread,
Feb 10, 2015, 4:58:38 PM2/10/15
to particula...@googlegroups.com
We are also having this happen on version 5.1.2

Our code is:

schedule.Every(TimeSpan.FromDays(1), () => Bus.Send(new AirportUpdateTriggered()));
schedule.Every(TimeSpan.FromMinutes(5), () => Bus.Publish(new RefreshBookingMetricsTriggeredEvent()));
schedule.Every(TimeSpan.FromDays(1), () => Bus.SendLocal(new AllocationUpdateTriggered()));

And they will stop running for several days at a time and then start again, but rarely do they ever follow the timeframe that we set up.

-Tim

Alexander Arzumanov

unread,
Feb 11, 2015, 12:49:01 PM2/11/15
to particula...@googlegroups.com

We're having very similar issues. We have publisher and listener. The publisher impliments IWantToRunWhenBusStartsAndStops and starts a System.Timers.Timer. On timer elapse every few seconds it retrieves data from Db and publishes it. When there's no data changes, nothing gets published. The publisher also logs a heartbeat in an interval on the same timer but skipping some ticks. The consumer gets a message and writes some files. Both components are deployed as NServiceBus windows services. When we come in the morning  consumer windows service would be stopped. Publisher windows service would still be shown as running in services window while not doing anything - no publishing & no data retrieval. All logging is stopped by both components. There's nothing in any logs indicating a problem or really anything. 
We were running one of the latest versions until this morning, when we updated  all latest packages.Here'is what we have:
<packages>
  <package id="Iesi.Collections" version="4.0.1.4000" targetFramework="net45" />
  <package id="NHibernate" version="4.0.1.4000" targetFramework="net45" />
  <package id="NServiceBus" version="5.1.3" targetFramework="net45" />
  <package id="NServiceBus.Host" version="6.0.0" targetFramework="net45" />
  <package id="NServiceBus.NHibernate" version="6.0.1" targetFramework="net45" />
  <package id="NServiceBus.RabbitMQ" version="2.1.0" targetFramework="net45" />
  <package id="RabbitMQ.Client" version="3.4.3" targetFramework="net45" />
  <package id="ServiceControl.Plugin.Nsb5.Heartbeat" version="1.1.0" targetFramework="net45" />
  <package id="SlowCheetah" version="2.5.14" targetFramework="net45" />
</packages>
What additional information can I post or what diagnostic steps would you suggest?
Thanks,
Sasha

Sean Feldman

unread,
Feb 12, 2015, 8:00:30 AM2/12/15
to particula...@googlegroups.com
Tim / Sasha,

Are your endpoints scaled out by any chance?

Alexander Arzumanov

unread,
Feb 12, 2015, 9:36:01 AM2/12/15
to particula...@googlegroups.com
I'm not sure what scaled out means. The endpoints are deployed on the same server.

Sean Feldman

unread,
Feb 12, 2015, 10:51:05 AM2/12/15
to particula...@googlegroups.com
Sasha, 
What I meant was do you you have more than one instance of the same endpoint?

Alexander Arzumanov

unread,
Feb 12, 2015, 10:55:53 AM2/12/15
to particula...@googlegroups.com
No, we do not.

On Thu, Feb 12, 2015 at 9:51 AM, Sean Feldman <feldman....@gmail.com> wrote:
Sasha, 
What I meant was do you you have more than one instance of the same endpoint?

--
You received this message because you are subscribed to a topic in the Google Groups "Particular Software" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/particularsoftware/0GAI9oqmp74/unsubscribe.
To unsubscribe from this group and all its topics, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.

Alexander Arzumanov

unread,
Feb 13, 2015, 9:52:53 AM2/13/15
to particula...@googlegroups.com
Any update or tips on this?

Sean Feldman

unread,
Feb 13, 2015, 10:59:00 AM2/13/15
to particula...@googlegroups.com
Any chance to share code that can reproduce it?
Thank you

Alexander Arzumanov

unread,
Feb 13, 2015, 11:46:20 AM2/13/15
to particula...@googlegroups.com
I'm opening a support ticker right now. There's no super secret logic so I'll share the code. Should i do it there or here?

Sean Feldman

unread,
Feb 13, 2015, 12:06:21 PM2/13/15
to particula...@googlegroups.com
Support ticket is good (with a link to this thread).

Alexander Arzumanov

unread,
Feb 13, 2015, 12:08:22 PM2/13/15
to particula...@googlegroups.com
Done. I'm going to prepare a simplified version of the projects and upload them 


On Friday, February 13, 2015, Sean Feldman <feldman....@gmail.com> wrote:
Support ticket is good (with a link to this thread).

--
You received this message because you are subscribed to a topic in the Google Groups "Particular Software" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/particularsoftware/0GAI9oqmp74/unsubscribe.
To unsubscribe from this group and all its topics, send an email to particularsoftw...@googlegroups.com.
To post to this group, send email to particula...@googlegroups.com.
Visit this group at http://groups.google.com/group/particularsoftware.
To view this discussion on the web visit https://groups.google.com/d/msgid/particularsoftware/705da037-6d07-4d68-9c8d-38efc15f8899%40googlegroups.com.

Alexander Arzumanov

unread,
Feb 13, 2015, 12:09:43 PM2/13/15
to particula...@googlegroups.com
Meanwhile anything you can suggest to check?

Sean Feldman

unread,
Feb 13, 2015, 1:24:43 PM2/13/15
to particula...@googlegroups.com
Alexander, 
Your scenario is not related to Scheduler API and more related to Timer in the code. 
This is handled through support as of now.
Thank you.

Indu Alagarsamy

unread,
Feb 13, 2015, 1:58:19 PM2/13/15
to particula...@googlegroups.com
Hi Tim,

Regarding this:
What is the endpoint load?
Since the scheduler dispatches the message to the main queue of the endpoint, depending on the endpoint's current load, the scheduled task may not execute at exactly the specified interval.
We are in the process of updating the guidance on the scheduler article concerning this. 

When you say, stop running for several days, did you have to restart the service?
Also what timeout persistence are you using for your endpoint? Are you using RavenDB? 
Can you list the packages.config of your endpoint?

Thanks much,
Indu Alagarsamy
Reply all
Reply to author
Forward
0 new messages