Scheduler is_ticker and deadlock issues

Jason Solack

Jan 24, 2017, 4:59:31 PM
to web2py-users
Hello all, 

I'm having a recurring issue with the scheduler.  We are currently running multiple environments (production, beta) with several nodes in each environment.  If we have the scheduler service running on every node in each environment, we get a lot of deadlock errors.  If we drop each environment down to one node, we get no deadlock errors.  I've also noticed that the "is_ticker" field in the worker table is set for only one worker across all the workers (spanning environments).  Is that the expected behavior?  I don't see any documentation about the ticker field, so I'm not sure what to expect from it.

Also, are there any best practices for running the scheduler in an environment like the one I've described?

Thanks in advance

Jason

Dave S

Jan 24, 2017, 9:05:47 PM
to web2py-users
My scheduler experiences are a lot simpler, so I can't directly answer your questions.

But I can ask for more details!  Is this a cluster, or independent but related nodes?  What db engine is being used for the scheduler tables?  Are the deadlocks related to having a lot of work, or do they happen when the nodes are relatively idle as well?  Are tasks recurring, or are they in response to client requests?  Do tasks not get run because of the problem, or is it just a minor delay?

/dps



Niphlod

Jan 25, 2017, 3:05:37 AM
to web2py-users
You *should* have a different db for each environment. Every scheduler tied to the same db will process incoming tasks, and it doesn't matter which app actually pushes them.
This is good if you want a single scheduler (which can be composed of several workers) serving many apps, but *generally* you don't want to *merge* prod and beta apps.
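Just to illustrate (connection string and names below are made up, adapt them to your setup): each environment's model points its scheduler at its own db, roughly

    # models/scheduler.py -- rough sketch, one db (connection string) per environment
    from gluon.scheduler import Scheduler

    # prod nodes point here; beta nodes use their own connection string
    db = DAL('mssql://user:password@dbhost/prod_scheduler')
    scheduler = Scheduler(db, heartbeat=3)   # 3 seconds is the default heartbeat

so prod and beta workers never touch each other's worker/task tables.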

The is_ticker bit is fine: only one worker tied to a given db is eligible to be the ticker, which is the one process that manages assigning tasks (to itself AND to other available workers).
Locking, once in a while, can happen and is self-healed. Continuous locking is not good: either you have too many workers tied to the db OR your db isn't handling concurrency at the rate it needs to.
SQLite can handle at most 2 or 3 workers. All the other "solid" backends can manage up to 10, maybe 15 at most.
If you wanna go higher, you need to turn to the redis-backed scheduler.
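BTW, if you want to see who holds the ticker at any moment, the worker table is all there is to it - a quick sketch you can run from a web2py shell (table and field names are the stock ones the scheduler creates):

    # list registered workers and which one currently holds the ticker
    for w in db(db.scheduler_worker).select():
        print('%s status=%s is_ticker=%s' % (w.worker_name, w.status, w.is_ticker))

You should see exactly one row with is_ticker = True per database, no matter how many machines the workers live on.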

Jason Solack

Jan 26, 2017, 11:44:25 AM
to web...@googlegroups.com
So the issue is: we run 6 workers on one machine and it works.  If we run 3 workers on each of 2 machines, we get deadlocks.  That is no exaggeration - 6 records in our worker table and we're getting deadlocks.

Dave S

Jan 26, 2017, 12:03:41 PM
to web2py-users


On Thursday, January 26, 2017 at 8:44:25 AM UTC-8, Jason Solack wrote:
So the issue is: we run 6 workers on one machine and it works.  If we run 3 workers on each of 2 machines, we get deadlocks.  That is no exaggeration - 6 records in our worker table and we're getting deadlocks.


Which DB are you using?  Can you show your relevant code?

/dps

Jason Solack

Jan 26, 2017, 12:45:20 PM
to web2py-users
using mssql; the code itself is the stock gluon/scheduler.py - this happens with no interaction from the app

Dave S

Jan 26, 2017, 1:47:51 PM
to web2py-users
On Thursday, January 26, 2017 at 9:45:20 AM UTC-8, Jason Solack wrote:
using mssql; the code itself is the stock gluon/scheduler.py - this happens with no interaction from the app


How do you instantiate the Scheduler?

Is the mssql engine on the same machine as any of the web2py nodes?  Are there non-web2py connections to it?

/dps

Niphlod

Jan 26, 2017, 2:43:19 PM
to web2py-users
I think I already posted the relevant number of queries issued to the backend for a given number of workers, but I use the scheduler daily on an mssql db and it can easily handle at least 10 workers (with the default heartbeat). Locking kicks in maybe once or twice a day, which means 1 or 2 out of 28800 occasions - a pretty damn low number :P
Of course the backend *should* be able to sustain the concurrency, but even on a minimal server with very low specs, 6 or 7 workers should pose no threat at all.
For 5 workers, all that is needed is a backend able to handle 240 transactions per minute!
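For the curious, the back-of-envelope math behind those figures (assuming the default 3-second heartbeat; the exact cost per heartbeat varies a bit depending on whether the worker is also the ticker):

    # rough numbers, nothing scheduler-specific here
    heartbeat = 3                                    # seconds, the default
    per_worker_per_day = 24 * 60 * 60 // heartbeat   # 28800 heartbeats/worker/day
    per_worker_per_minute = 60 // heartbeat          # 20 heartbeats/worker/minute
    # at a couple of transactions per heartbeat, plus the ticker's own
    # bookkeeping, 5 workers end up in the low hundreds of transactions
    # per minute - roughly the 240/min figure above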

Jason Solack

Jan 27, 2017, 12:08:56 PM
to web...@googlegroups.com
In your scenario, do you have 10 workers spread across multiple machines?  E.g. 5 workers on one machine and 5 on another?  We can easily do 10 workers, but we run into issues when we split them across machines.

Niphlod

Jan 31, 2017, 2:47:17 AM
to web2py-users
It really doesn't matter: the IPC is done through the database, so whether the workers hitting it are local or remote doesn't turn into more transactions.
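If you want to convince yourself, the db is the only channel workers use to coordinate, wherever they run - e.g. task assignments are just rows in the stock scheduler_task table (a sketch from a web2py shell; field names are the ones the scheduler creates):

    # tasks carry the name of the worker they were assigned to
    for t in db(db.scheduler_task.status == 'ASSIGNED').select():
        print('%s -> %s' % (t.task_name, t.assigned_worker_name))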