Failed job retry (custom implementation based on Rescheduling) and disconnected network problem

Arek Czubik

Oct 24, 2013, 6:27:01 AM

to quar...@googlegroups.com

My original need was to be able to refire a failed job (with configurable retry count and delay settings) that was scheduled using the following trigger:

                TriggerBuilder.Create()
                              .WithIdentity(CreateTriggerKey(jobKey))
                              .StartNow()
                              .WithSimpleSchedule(x => x.WithMisfireHandlingInstructionNowWithRemainingCount())
                              .Build();

As I have not found any out-of-the-box retry mechanism for failed jobs in Quartz, I have implemented something that can be described with the following pseudocode:

        public void Execute(IJobExecutionContext context)
        {
            try
            {
                // some job execution code that may fail with an exception
            }
            catch (Exception exception)
            {
                var reschedulingSettings = GetReschedulingSettings(context.JobDetail.JobDataMap);
                // creates a trigger that executes this job again later, e.g. 5 minutes from now
                var newSingleExecutionTrigger = CreateNextExecutionTrigger();
                // HERE IS A PROBLEM: this call fails when the scheduler has lost its network connection
                context.Scheduler.RescheduleJob(context.Trigger.Key, newSingleExecutionTrigger);

                throw new JobExecutionException(exception) { RefireImmediately = false };
            }
        }

This solution works more or less OK. The problem occurs when the scheduler loses its network connection. It cannot reschedule the job on the line marked "HERE IS A PROBLEM" and, since RefireImmediately is set to false, the job is lost (even once the scheduler regains its network connection).
I run the scheduler in clustered mode on Oracle. I hoped this situation would be handled by the cluster recovery procedure, but it looks like a scheduler instance recovers its own jobs only during its first checkIn.
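
For reference, the retry count and delay bookkeeping behind GetReschedulingSettings and CreateNextExecutionTrigger could look roughly like the sketch below. It is only an illustration under assumptions: RetrySettings, AttemptKey and NextAttempt are hypothetical names, not part of Quartz, and a plain dictionary stands in for the job's JobDataMap so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Quartz type): decides whether a failed job
// may be retried and when the next attempt should start.
public sealed class RetrySettings
{
    // Key name is an assumption made for this sketch.
    public const string AttemptKey = "retry.attempt";

    public int MaxRetries { get; }
    public TimeSpan Delay { get; }

    public RetrySettings(int maxRetries, TimeSpan delay)
    {
        MaxRetries = maxRetries;
        Delay = delay;
    }

    // Returns the start time of the next attempt, or null when the retry
    // budget is used up. On success the stored attempt counter is incremented.
    public DateTimeOffset? NextAttempt(IDictionary<string, object> data, DateTimeOffset now)
    {
        int attempt = data.TryGetValue(AttemptKey, out var raw) ? Convert.ToInt32(raw) : 0;
        if (attempt >= MaxRetries)
        {
            return null; // give up: no further trigger should be created
        }

        data[AttemptKey] = attempt + 1;
        return now + Delay;
    }
}
```

With MaxRetries = 2 and a delay of 5 minutes, the first two calls return a time 5 minutes after `now` and the third returns null. In the job, the returned time would be the value handed to the new trigger's StartAt. This covers only the bookkeeping; it does nothing about the failed RescheduleJob call itself.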

Any ideas for a solution?
By the way, I would like to ask: how come Quartz does not have any mechanism for re-executing failed jobs other than RefireImmediately? Is that solved in some way that I have not found yet? Or is such a need so rare that it does not make sense to implement?