Clustered Quartz Scheduler triggers the same job on two instances at the same time


Irwin Plitt

Sep 17, 2013, 12:02:57 PM
to quar...@googlegroups.com
I am using Quartz.NET to run eight similar jobs once a minute.  A Windows service runs each Quartz instance, and Quartz is set up as a two-instance cluster using SQL Server as the data store.  Each job runs for approximately one minute and then terminates.  Every few days, two instances of the same job run simultaneously, sometimes on the same node, sometimes on different nodes.  The clocks on the two nodes are synchronized to within one second, so I don't think clock drift is causing the problem.

Has anyone seen this behavior?  Is there something I'm doing wrong?

Here are the config file settings for one of the nodes.  The config file for the other node differs only in the value of quartz.scheduler.instanceId, which is "Node-Two".

<quartz>
<add key="quartz.scheduler.instanceName" value="QuartzServer" />
<add key="quartz.scheduler.instanceId" value="Node-One" />
<add key="quartz.threadPool.threadCount" value="10" />
<add key="quartz.threadPool.threadPriority" value="Normal" />
<add key="quartz.jobStore.misfireThreshold" value="2000" />
<add key="quartz.jobStore.type" value="Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" />
<add key="quartz.jobStore.useProperties" value="false" />
<add key="quartz.jobStore.dataSource" value="default" />
<add key="quartz.jobStore.tablePrefix" value="QRTZ_" />
<add key="quartz.jobStore.clustered" value="true" />
<add key="quartz.jobStore.lockHandler.type" value="Quartz.Impl.AdoJobStore.SimpleSemaphore, Quartz" />
<!-- point this at your database -->
<add key="quartz.dataSource.default.connectionString" value="..." />
<add key="quartz.dataSource.default.provider" value="SqlServer-20" />
</quartz>

Marko Lahma

Sep 17, 2013, 12:37:31 PM
to Quartz.NET
You definitely should not be using SimpleSemaphore, which is a
process-level lock. Other cluster members won't see the lock. If you
are running Quartz.NET 2.x (2.2 recommended), you can remove the
configuration option altogether. This will enable SQL Server-specific
locking, which is done using a database-level lock table.
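
For reference, here is a sketch of the configuration from the original post with the lock handler entry removed, as suggested above. It assumes Quartz.NET 2.x with the SQL Server provider. All other keys are copied unchanged from the original post, and the connection string is still elided.

```xml
<quartz>
  <add key="quartz.scheduler.instanceName" value="QuartzServer" />
  <add key="quartz.scheduler.instanceId" value="Node-One" />
  <add key="quartz.threadPool.threadCount" value="10" />
  <add key="quartz.threadPool.threadPriority" value="Normal" />
  <add key="quartz.jobStore.misfireThreshold" value="2000" />
  <add key="quartz.jobStore.type" value="Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" />
  <add key="quartz.jobStore.useProperties" value="false" />
  <add key="quartz.jobStore.dataSource" value="default" />
  <add key="quartz.jobStore.tablePrefix" value="QRTZ_" />
  <add key="quartz.jobStore.clustered" value="true" />
  <!-- quartz.jobStore.lockHandler.type is intentionally omitted so that the
       clustered job store falls back to its database-backed lock -->
  <!-- point this at your database -->
  <add key="quartz.dataSource.default.connectionString" value="..." />
  <add key="quartz.dataSource.default.provider" value="SqlServer-20" />
</quartz>
```

The same change would apply to the second node, whose file differs only in quartz.scheduler.instanceId.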

Hope this helps,

-Marko

Irwin Plitt

Sep 17, 2013, 5:25:37 PM
to quar...@googlegroups.com
I removed the configuration option and it seems to be working fine, but performance is not as good as before, which I assume we can attribute to table-level locking in SQL Server.  Is there a reason you didn't recommend row-level locking?  I'm assuming that I could get row-level locking by using the following value for quartz.jobStore.lockHandler.type:

"Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore, Quartz"

Irwin Plitt

Sep 17, 2013, 7:00:05 PM
to quar...@googlegroups.com
Now that this has been running for a few hours with the new settings, it is producing even more duplicate runs than before.  It looks like a second run of a job is starting before the first one has completed.

Marko Lahma

Sep 18, 2013, 1:06:07 AM
to Quartz.NET
Sorry, my wording was a bit off. Quartz.NET uses a table for locking.
When acquiring a lock from the table, the update uses the efficient
WITH (UPDLOCK,ROWLOCK) hint on SQL Server. If you don't want multiple
instances of the same job running because an earlier run is still in
progress when the trigger fires, you should use
DisallowConcurrentExecutionAttribute, but be aware that jobs may then
queue up.

-Marko

Irwin Plitt

Sep 18, 2013, 2:23:20 PM
to quar...@googlegroups.com
I'm already using the [DisallowConcurrentExecution] attribute in my .NET code.

I now suspect that the problem may be that each job runs for slightly longer than 60 seconds while its repeat interval is exactly 60 seconds.  Maybe the scheduler has been attempting to run the job again before the previous run completes, and there is some sort of race condition.  To test that theory I've changed the repeat interval to 70 seconds.
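
As an illustration of the change described above, here is a minimal sketch of a simple trigger with a 70-second repeat interval, written in the Quartz.NET 2.x XML job scheduling format (quartz_jobs.xml). The thread does not say how the triggers were actually defined, so the choice of file format is an assumption, and the names ExampleJob, ExampleGroup, ExampleTrigger, and the job type are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<job-scheduling-data xmlns="http://quartznet.sourceforge.net/JobSchedulingData"
                     version="2.0">
  <schedule>
    <job>
      <!-- hypothetical job name, group, and type -->
      <name>ExampleJob</name>
      <group>ExampleGroup</group>
      <job-type>Example.Jobs.ExampleJob, Example.Jobs</job-type>
      <durable>true</durable>
      <recover>false</recover>
    </job>
    <trigger>
      <simple>
        <name>ExampleTrigger</name>
        <group>ExampleGroup</group>
        <job-name>ExampleJob</job-name>
        <job-group>ExampleGroup</job-group>
        <misfire-instruction>SmartPolicy</misfire-instruction>
        <!-- -1 repeats indefinitely -->
        <repeat-count>-1</repeat-count>
        <!-- interval in milliseconds: 70 seconds instead of 60 -->
        <repeat-interval>70000</repeat-interval>
      </simple>
    </trigger>
  </schedule>
</job-scheduling-data>
```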

Darshan Udayashankar

Sep 20, 2013, 9:19:41 AM
to quar...@googlegroups.com
Hi All,

Can you please help me find a tool for generating CRON expressions
that can be integrated into a C# application?

Regards
Darshan

Irwin Plitt

Sep 20, 2013, 8:18:21 PM
to quar...@googlegroups.com
Now that I've changed the repeat interval to 70 seconds, attempted duplicate runs have become less frequent, but they have not been eliminated.  I've seen one example today of the same node scheduling the same job twice within 200 milliseconds.  Program logic is in place to abort the job if another instance is still running, so I've been able to avoid the consequences of two jobs running simultaneously, but I'd like to get to the root cause.

Has anyone else experienced this problem?

Marko Lahma

Sep 21, 2013, 2:10:00 AM
to Quartz.NET
The only thing that comes to mind off the top of my head is that the
jobs might be distinct jobs with different job keys. The
concurrency-limiting attribute actually works by checking the job key,
not the job type.

-Marko

Irwin Plitt

Oct 22, 2013, 4:57:28 PM
to quar...@googlegroups.com
The job keys are distinct.  The jobs are all in the same group but have different names.