Scheduler exception when setting immediate=True: 'Deadlock found when trying to get lock...'


Osman Masood

Feb 26, 2015, 6:28:28 PM
to web...@googlegroups.com
Hello all,

I keep getting this error when calling scheduler.queue_task() with immediate=True:

<class 'gluon.contrib.pymysql.err.InternalError'> (1213, u'Deadlock found when trying to get lock; try restarting transaction')


The stack trace shows this is the culprit:

  self.db(self.db.scheduler_worker.is_ticker == True).update(status=PICK)
(This is a web application that uses the scheduler pretty extensively.) It seems like multiple connections are competing for access to the same scheduler_worker row.
This looks to me like a web2py bug. Any help would be greatly appreciated. Thanks!

Niphlod

Feb 26, 2015, 6:35:41 PM
to web...@googlegroups.com
It seems to me that there is database contention. The exception **should** be trapped (i.e. try to set the status to "PICK"; if that fails, it's not a fatal error), but either something is wrong with your setup or your database is too "underpowered" for what you're asking of it.
How many workers are running, and with what heartbeat?

Osman Masood

Feb 26, 2015, 6:54:47 PM
to web...@googlegroups.com
Thanks for the quick response!

I'm using Amazon RDS (basically just MySQL) on a fairly beefy instance (db.m3.large, 1000 IOPS). I'd be surprised if that were the problem.

Running 5 scheduler workers with the default heartbeat (3s).

Osman Masood

Feb 26, 2015, 9:38:19 PM
to web...@googlegroups.com
I was actually able to reproduce this easily. Just open up two web2py consoles (both connected to the same DB) and run:

scheduler.queue_task('myfunc', pvars=dict(), immediate=True)

The first call succeeds; the second hangs for a while and then raises the deadlock exception above.

This kinda makes sense, given the DB query above and record-level locking: the update is a conditional write against all records in the scheduler_worker table, so if two connections attempt it at the same time, a deadlock follows. (Correct me if I'm wrong.) So I think the solution would be something like this (around line 1303 of scheduler.py):

             if immediate:
-                self.db(self.db.scheduler_worker.is_ticker == True).update(status=PICK)
+                scheduler_worker_tickers = self.db(self.db.scheduler_worker.is_ticker == True).select()
+                for scheduler_worker_ticker in scheduler_worker_tickers:
+                    scheduler_worker_ticker.update_record(status=PICK)
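Independently of how web2py patches this internally, a common mitigation for MySQL error 1213 is a retry with backoff, since InnoDB's advice is literally "try restarting transaction". This is a generic sketch, not web2py's API: `retry_on_deadlock` and `flaky_update` are hypothetical names, and it assumes (as pymysql does) that the exception's first argument is the numeric MySQL error code.

```python
import random
import time

# MySQL's error code for "Deadlock found when trying to get lock".
MYSQL_DEADLOCK = 1213

def retry_on_deadlock(fn, attempts=3, base_delay=0.05):
    """Run fn(); on a deadlock error, retry with a small randomized backoff.

    Assumes the driver raises an exception whose args start with the
    numeric MySQL error code, as pymysql's InternalError does.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            code = exc.args[0] if exc.args else None
            if code != MYSQL_DEADLOCK or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))

# Demo with a stub that "deadlocks" twice, then succeeds.
calls = {"n": 0}

def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise Exception(1213, "Deadlock found when trying to get lock")
    return "PICK"

print(retry_on_deadlock(flaky_update))  # → PICK
```

Note the randomized delay: if both competing connections retried on a fixed schedule, they could keep colliding.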

Niphlod

Feb 27, 2015, 2:39:45 AM
to web...@googlegroups.com
You know that on the console, after queue_task(), you need to call commit(), right? If you leave the transaction hanging, no wonder you're observing deadlocks!
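The effect of an uncommitted write transaction can be shown outside web2py. A minimal sketch using sqlite3 as a stand-in for MySQL (an assumption worth flagging: SQLite locks the whole database rather than individual rows, and raises "database is locked" rather than a true deadlock, but the shape of the problem is the same):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database, like two web2py consoles.
path = os.path.join(tempfile.mkdtemp(), "sched.db")
c1 = sqlite3.connect(path, timeout=0.2)
c1.execute("CREATE TABLE scheduler_worker (id INTEGER, status TEXT)")
c1.execute("INSERT INTO scheduler_worker VALUES (1, 'ACTIVE')")
c1.commit()

# Console 1: write but do NOT commit -- the transaction stays open.
c1.execute("UPDATE scheduler_worker SET status = 'PICK'")

# Console 2: the same write now blocks, then errors out.
c2 = sqlite3.connect(path, timeout=0.2)
try:
    c2.execute("UPDATE scheduler_worker SET status = 'PICK'")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True
print(blocked)  # → True

c1.commit()  # releasing the first transaction unblocks the second
c2.execute("UPDATE scheduler_worker SET status = 'PICK'")
c2.commit()
```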

Osman Masood

Feb 27, 2015, 2:43:31 AM
to web...@googlegroups.com
Right. The purpose of using the console was to simulate what actually happens in production.

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to a topic in the Google Groups "web2py-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/web2py/ZBv43p3w_MM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michele Comitini

Feb 27, 2015, 5:19:48 AM
to web...@googlegroups.com
Try to see if, in production, there is a transaction that stays open for a long time.
Could it be a console left open somewhere?



Osman Masood

Apr 24, 2015, 9:58:14 PM
to web...@googlegroups.com
Nope... as shown earlier, this is a web2py timing issue. It's been occurring randomly in production for some time now. I have a try/except around the call, but the sucky part is that since the transaction fails, you lose all the commits in the whole request.
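One way around losing the whole request's commits (a generic sketch, not web2py's API) is to run the failure-prone statement on its own connection, so its rollback cannot touch the request's pending work. Here sqlite3 stands in for MySQL, and `risky_update` is a hypothetical helper whose failure is simulated by updating a missing table:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
main = sqlite3.connect(path)  # the request's "main" transaction
main.execute("CREATE TABLE log (msg TEXT)")
main.commit()

def risky_update(db_path):
    """Run a failure-prone statement on its OWN connection, so a
    failure only rolls back this statement, not the caller's work."""
    side = sqlite3.connect(db_path, timeout=0.1)
    try:
        side.execute("UPDATE missing_table SET x = 1")  # simulated failure
        side.commit()
        return True
    except sqlite3.Error:
        side.rollback()
        return False
    finally:
        side.close()

# The request's own transaction survives the failed side update.
main.execute("INSERT INTO log VALUES ('work done')")
ok = risky_update(path)
main.commit()
print(ok, main.execute("SELECT count(*) FROM log").fetchone()[0])  # → False 1
```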