Job Scheduler "Stuck", due to "An error occured while marking executed job complete"

Gary German

Jan 10, 2011, 1:18:19 PM
to quar...@googlegroups.com

Recently, a few of our Quartz.NET instances have become "stuck", with no new jobs getting downloaded or executed. It looks like this is due to an internal update in the scheduler that is failing (see the exception info below).

We first tried stopping and restarting our Windows Service (which stops and then restarts the Quartz.NET scheduler). The scheduler kept throwing the same exceptions after the restart and remained "stuck" from our point of view.

Our only solution was to delete all the records in our Quartz_ tables. This freed up whatever was repeatedly blocking.

Since doing this, our servers have been stable, with no further problems. But I'm worried that this is a "data issue" that could come back to haunt us.

Any ideas?

Here's the exception we captured:

JobSchedulerListener says: Scheduler Error.
Message: An error occured while marking executed job complete (will continue attempts). job= 'admin.PhoneHomeForJobs'
Exception:
   at Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore.ExecuteSQL(ConnectionAndTransactionHolder conn, String lockName, String expandedSQL)
   at Quartz.Impl.AdoJobStore.DBSemaphore.ObtainLock(DbMetadata metadata, ConnectionAndTransactionHolder conn, String lockName)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.TriggeredJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail jobDetail, SchedulerInstruction triggerInstCode)
   at Quartz.Core.QuartzScheduler.NotifyJobStoreJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail detail, SchedulerInstruction instCode)
   at Quartz.Core.JobRunShell.CompleteTriggerRetryLoop(Trigger trigger, JobDetail jobDetail, SchedulerInstruction instCode)

Marko Lahma

Jan 10, 2011, 1:44:43 PM
to quar...@googlegroups.com
Hi,

you don't give us much to go on. What's the internal exception, the
root cause of the update failure? Are you sure that your service restart
properly kills the whole process? What database, which Quartz.NET
version? How many jobs and triggers? Are you running Quartz.NET
clustered?

-Marko

> --
> You received this message because you are subscribed to the Google Groups
> "Quartz.NET" group.
> To post to this group, send email to quar...@googlegroups.com.
> To unsubscribe from this group, send email to
> quartznet+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/quartznet?hl=en.
>

Gary German

Jan 10, 2011, 3:48:16 PM
to quar...@googlegroups.com
Re: "What's the internal exception, the root cause of update failure?"

It looks like the root cause is some sort of timeout (see more detailed
event logs below).

Exception: Failure obtaining db row lock: Timeout expired. The timeout
period elapsed prior to completion of the operation or the server is not
responding.
The statement has been terminated.

But this exception never seems to clear up on its own. Once it starts
happening, our only recourse seems to be to manually delete all entries in
our Quartz_ tables. That seems to clear up the blockage.


Re: "Are you sure that your service restart properly kills the whole process?"

The request to stop the service processed OK, as did the request to start
the service. The scheduler gets shut down during the stop procedure, and
started during the start procedure. Those both seemed to go OK, but after
restart the problem persisted.


Re: "What database, which Quartz.NET version?"

SQL Server 2008.

Quartz dll version 1.0.2.2 (upgrading to 1.0.3 shortly).


Re: "How many jobs and triggers?"

Hardly any - perhaps one repeating job, and maybe one or two other jobs.
Scheduler is set up to allow 5 threads.


Re: "Are you running Quartz.NET clustered?"

No. Single instances, and we feed jobs to them via a "PhoneHomeForJobs" job
(it's a job that phones home via a web service call).

Event log entries from when the service was failing:

JobSchedulerListener says: Scheduler Error. Message: An error occured while marking executed job complete (will continue attempts). job= 'admin.PhoneHomeForJobs'
Exception:
   at Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore.ExecuteSQL(ConnectionAndTransactionHolder conn, String lockName, String expandedSQL)
   at Quartz.Impl.AdoJobStore.DBSemaphore.ObtainLock(DbMetadata metadata, ConnectionAndTransactionHolder conn, String lockName)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.TriggeredJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail jobDetail, SchedulerInstruction triggerInstCode)
   at Quartz.Core.QuartzScheduler.NotifyJobStoreJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail detail, SchedulerInstruction instCode)
   at Quartz.Core.JobRunShell.CompleteTriggerRetryLoop(Trigger trigger, JobDetail jobDetail, SchedulerInstruction instCode)


JobSchedulerListener says: Scheduler Error. Message: An error occured while marking executed job complete (will continue attempts). job= 'admin.PhoneHomeForJobs'
Exception: Failure obtaining db row lock: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.

JobSchedulerListener says: Scheduler Error. Message: An error occured while marking executed job complete. job= 'admin.PhoneHomeForJobs'
Exception: Quartz

JobSchedulerListener says: Scheduler Error. Message: An error occured while marking executed job complete. job= 'admin.PhoneHomeForJobs'
Exception:
   at Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore.ExecuteSQL(ConnectionAndTransactionHolder conn, String lockName, String expandedSQL)
   at Quartz.Impl.AdoJobStore.DBSemaphore.ObtainLock(DbMetadata metadata, ConnectionAndTransactionHolder conn, String lockName)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.TriggeredJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail jobDetail, SchedulerInstruction triggerInstCode)
   at Quartz.Core.QuartzScheduler.NotifyJobStoreJobComplete(SchedulingContext ctxt, Trigger trigger, JobDetail detail, SchedulerInstruction instCode)
   at Quartz.Core.JobRunShell.Run()

JobSchedulerListener says: Scheduler Error. Message: An error occured while marking executed job complete. job= 'admin.PhoneHomeForJobs'
Exception: Failure obtaining db row lock: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.

JobSchedulerListener says: Scheduler Error. Message: An error occured while scanning for the next trigger to fire.
Exception: Quartz

JobSchedulerListener says: Scheduler Error. Message: An error occured while scanning for the next trigger to fire.
Exception: 400

JobSchedulerListener says: Scheduler Error. Message: An error occured while scanning for the next trigger to fire.
Exception: False

JobSchedulerListener says: Scheduler Error. Message: An error occured while scanning for the next trigger to fire.
Exception:
   at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, SchedulingContext ctxt, DateTime noLaterThan)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTriggerCallback.Execute(ConnectionAndTransactionHolder conn)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(SchedulingContext ctxt, DateTime noLaterThan)
   at Quartz.Core.QuartzSchedulerThread.Run()

JobSchedulerListener says: Scheduler Error. Message: An error occured while scanning for the next trigger to fire.
Exception: Couldn't acquire next trigger: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Trigger/CronTrigger: Error: Failure obtaining db row lock: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.
Error: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.
Stack Trace:

Exception in PhoneHomeForJobs.Excecute. Exception: Error: Failure obtaining db row lock: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.
Error: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.
Stack Trace: Stack Trace:
   at Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore.ExecuteSQL(ConnectionAndTransactionHolder conn, String lockName, String expandedSQL)
   at Quartz.Impl.AdoJobStore.DBSemaphore.ObtainLock(DbMetadata metadata, ConnectionAndTransactionHolder conn, String lockName)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreTX.ExecuteInLock(String lockName, ITransactionCallback txCallback)
   at Quartz.Impl.AdoJobStore.JobStoreSupport.RemoveJob(SchedulingContext ctxt, String jobName, String groupName)
   at Quartz.Core.QuartzScheduler.DeleteJob(SchedulingContext ctxt, String jobName, String groupName)
   at Quartz.Impl.StdScheduler.DeleteJob(String jobName, String groupName)
   at Datawise.MeasuresJobs.PhoneHomeForJobs.Execute(JobExecutionContext context) in \\DAVID\C\Documents and Settings\bruce\Local Settings\Temp\Measures\MeasuresJobs\PhoneHomeForJobs.vb:line 181
Stack Trace:
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore.ExecuteSQL(ConnectionAndTransactionHolder conn, String lockName, String expandedSQL)

Gary German
ShastaSoftware.com

Stephen Tunney

Jan 10, 2011, 3:58:27 PM
to quar...@googlegroups.com
I believe this is the same error (or at least the same symptoms) that I am seeing in Quartz 1.0.3.

Gary, I can send you an updated DLL for Quartz, and you can see if it fixes your issues. When it encounters the exception you are seeing, it takes all BLOCKED triggers and sets them back to the WAITING state in the QRTZ_TRIGGERS table. This is a very blunt solution, but it does seem to work for me (running for about 4 months now without issue).

Let me know if you want it.

Stephen
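
For readers who want to see the shape of the reset Stephen describes, here is a minimal sketch. It is not his patch and not part of Quartz.NET. It assumes the only change needed is flipping TRIGGER_STATE from BLOCKED to WAITING in QRTZ_TRIGGERS, and it uses an in-memory SQLite table as a stand-in for the real SQL Server schema. The helper names and the sample trigger rows are hypothetical.

```python
import sqlite3


def reset_blocked_triggers(conn: sqlite3.Connection) -> int:
    """Set every BLOCKED trigger back to WAITING; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = 'WAITING' "
        "WHERE TRIGGER_STATE = 'BLOCKED'"
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in holding only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE QRTZ_TRIGGERS ("
        "TRIGGER_NAME TEXT, TRIGGER_GROUP TEXT, TRIGGER_STATE TEXT)"
    )
    # Hypothetical sample rows; real tables carry many more columns.
    conn.executemany(
        "INSERT INTO QRTZ_TRIGGERS VALUES (?, ?, ?)",
        [
            ("phoneHome", "admin", "BLOCKED"),
            ("nightly", "reports", "WAITING"),
            ("cleanup", "admin", "BLOCKED"),
        ],
    )
    conn.commit()
    return conn
```

Calling `reset_blocked_triggers(make_demo_db())` changes the two BLOCKED rows and leaves the WAITING row alone. As Stephen says, this is blunt: it also releases triggers that are blocked for a legitimate reason, such as a stateful job that is still running.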

Mark G. Gillen

Jan 10, 2011, 4:11:59 PM
to quar...@googlegroups.com
Sounds like a database row (or table) locking issue. Are you using SQL
Server or Oracle? Can you monitor the database and see if there are
transactions that are not being committed or rolled back? If there is an
external process that is attempting to write or read data in the Quartz
tables, make sure that it's not the source of the locks (we use "WITH
(NOLOCK)" for all our "selects" in SQL Server).
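
To illustrate the failure mode Mark is describing, the sketch below shows how one transaction that is never committed or rolled back makes every later lock attempt time out. It is a toy reproduction, not Quartz.NET code. It assumes the lock row lives in a QRTZ_LOCKS table under the name TRIGGER_ACCESS and is taken with a self-updating UPDATE, which is what the UpdateLockRowSemaphore frames in the stack trace suggest. SQLite stands in for SQL Server here, and it locks the whole database file rather than a single row, so only the symptom is comparable. The function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_conn(path: str, timeout: float) -> sqlite3.Connection:
    # isolation_level=None: transactions are started and ended explicitly below.
    return sqlite3.connect(path, timeout=timeout, isolation_level=None)


def demo_lock_timeout() -> tuple:
    """Return (blocked_while_held, acquired_after_release)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    holder = open_conn(path, timeout=0.2)
    waiter = open_conn(path, timeout=0.2)
    try:
        holder.execute("CREATE TABLE QRTZ_LOCKS (LOCK_NAME TEXT PRIMARY KEY)")
        holder.execute("INSERT INTO QRTZ_LOCKS VALUES ('TRIGGER_ACCESS')")

        # The holder takes the write lock and then neither commits nor rolls back.
        holder.execute("BEGIN IMMEDIATE")
        holder.execute(
            "UPDATE QRTZ_LOCKS SET LOCK_NAME = LOCK_NAME "
            "WHERE LOCK_NAME = 'TRIGGER_ACCESS'"
        )

        # The waiter gives up after its 0.2 second timeout.
        try:
            waiter.execute("BEGIN IMMEDIATE")
            waiter.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:
            blocked = True

        # Once the holder ends its transaction, the waiter gets the lock.
        holder.execute("ROLLBACK")
        waiter.execute("BEGIN IMMEDIATE")
        waiter.execute("COMMIT")
        return blocked, True
    finally:
        holder.close()
        waiter.close()
        os.remove(path)
```

If something like this is happening on the real server, the thing to look for is the session that still holds the open transaction, since restarting only the waiting side does not release the lock.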

Stephen Tunney

Jan 10, 2011, 4:27:41 PM
to quar...@googlegroups.com
I'm using SQL Server 2008.

I have absolutely no other processes accessing the Quartz database. It is a single process on a remote machine, and I am simply using the Quartz API to create jobs and triggers.

This error happens without anyone running ad-hoc queries against the database; I just see the errors in my log files after running it for even just 10-15 minutes. I have not been able to see which transactions are causing the issue, because SQL Server provides no means of showing ad-hoc SQL statements in its monitoring tools (we have IDERA as well, but it only shows the state of the server once every 5 minutes). To get statistics, we would need to create stored procedures and extend the Quartz code base to use them when configured for the SQL Server driver.

I simply looked at the stack trace, narrowed it down to a manageable spot in the code, and added a try/catch block and some extra code for reverting the state of the trigger.

Stephen

Gary German

Jan 11, 2011, 11:52:03 AM
to quar...@googlegroups.com
Thanks for your explanation.

We plan to trap that exception in our code, and then attempt to reset
state in QRTZ_TRIGGERS as you suggest.

We're also going to add a periodic "job scheduler restart" to our code, to
restart the scheduler once every few hours (to try to avoid memory leaks
from Quartz or our jobs).

Unfortunately, this exception rarely occurs, and we can't seem to reproduce
it in testing, so for now these are more like "insurance" than solid
fixes.


FWIW, this intermittent glitch in no way affects my appreciation for
Quartz.Net. It's been an important component that we have installed on
nearly 50 servers. Kudos to the dev team.

Gary German
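
The "trap the exception, reset state, try again" plan can be sketched roughly as follows. This is only an outline of the control flow under stated assumptions, not Gary's code and not a Quartz.NET API. `LockTimeoutError`, `call_with_recovery`, and `make_flaky_operation` are hypothetical names, and `recover` stands for whatever reset is chosen (for example, the QRTZ_TRIGGERS state reset discussed in this thread).

```python
class LockTimeoutError(Exception):
    """Hypothetical stand-in for the 'Failure obtaining db row lock' error."""


def call_with_recovery(operation, recover, max_attempts=2):
    """Run operation(); on a lock timeout, run recover() and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except LockTimeoutError:
            if attempt == max_attempts:
                raise  # recovery did not help, so surface the error
            recover()


def make_flaky_operation(failures):
    """Return (operation, state): the operation times out `failures` times, then succeeds."""
    state = {"left": failures, "calls": 0}

    def operation():
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise LockTimeoutError("Failure obtaining db row lock")
        return "done"

    return operation, state
```

Capping the number of attempts matters here: the logs in this thread show the scheduler retrying indefinitely without the lock ever clearing, so a wrapper that retries forever would only reproduce the original "stuck" behavior.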
