Ongoing issue with 'Lost connection to MySQL server during query'


Ian Ryder

Mar 13, 2015, 5:30:39 PM
to web...@googlegroups.com
Hi, we are using Web2py to do some batch processing of data but from day one have had show-stopping issues with losing connection to MySQL on PythonAnywhere.

We've refactored the code to be much lighter in terms of batch sizes etc which has made it much closer to perfect but we still have problems.

Essentially we're going over the 300 second timeout on PA and web2py won't reconnect no matter what we try. We have:

- tried pool size 10 on the connection
- tried pool size 0 on the connection
- db.commit before kicking off the longer process
- db.close before kicking off the longer process
- tried reconnecting the entire DB when it drops
- tried db._adapter.reconnect() when it drops

Same results every time...the basic steps are: 

- scheduled method called
- it calls batch task which loops through happily, constantly working on the database
- returns to the original method, which then tries to close off a status record - and that's where it bombs

Here's some code:
        self.my_pid = this_task.last_doer_pid
        new_doer = {
            'doer_name': self.get_doer_name(this_task.id),
            'task_queue': this_task.id,
            'method_name': this_task.method_name,
            'pid': self.my_pid
        }        
        this_doer_id = self.dbase.task_doer.insert(**self.dbase.task_doer._filter_fields(new_doer))
        this_doer = self.dbase.task_doer[this_doer_id]
        this_task.update_record(status=RUNNING, last_doer_id=this_doer.id, doer_name=new_doer['doer_name'])
        self.dbase.commit()

<snip socket handling>
<snip method name getting>

        if not run_method:
            this_doer.update_record(status=FAILED, status_note='Method %s not implemented' % this_task.method_name)
            self.dbase.commit()
            raise Exception('Method %s not implemented' % this_task.method_name)

        passed_args = passed_vars = None
        # close off the connection so it doesn't drop out during the processing
        self.dbase.commit()
        try:
            run_method(
                *loads(this_task.args, object_hook=self._decode_dict),
                **loads(this_task.vars, object_hook=self._decode_dict)) #, *passed_vars)
        except Exception as ee:
            this_doer.update_record(status=FAILED, status_note='Failed: ' + str(ee))
            debug_log(sys._getframe().f_code.co_name, None, 
            start=method_start, error_details=ee, severity='Fail' )                        
            raise Exception('Failure during method execution: ' + str(ee))

        print 'About to save status'
        this_doer.update_record(status=COMPLETED, status_note='Normal exit')
        self.dbase.commit()

        print 'Clean exit for doer ' + str(this_doer.pid)

Just to confirm, here's a test with 2 processes, one waiting 295 seconds and closing fine, the other waiting 305 seconds and blowing up:

Database drivers available: SQLite(sqlite3), MySQL(pymysql), MySQL(MySQLdb), MySQL(mysqlconnector), PostgreSQL(psycopg2), PostgreSQL(pg8000), MSSQL(pyodbc), DB2(pyodbc), Teradata(pyodbc), Ingres(pyodbc), MongoDB(pymongo), IMAP(imaplib)

##################################
########295########
##################################
About to save status
Clean exit for doer 2928

##################################
########305########
##################################
(2013, 'Lost connection to MySQL server during query')
About to save status
Clean exit for doer 2925
Traceback (most recent call last):
  File "/home/ianryder/yarpr/web2py/gluon/restricted.py", line 224, in restricted
    exec ccode in environment
  File "applications/yarpr/models/my_scheduler.py", line 446, in <module>
    PenguinDoer(db_app)
  File "applications/yarpr/models/my_scheduler.py", line 382, in __init__
    debug_log(sys._getframe().f_code.co_name, None, start=method_start, job_status=None)
  File "applications/yarpr/models/debug_framework.py", line 196, in debug_log
    debug_flush()
  File "applications/yarpr/models/debug_framework.py", line 200, in debug_flush
    db_usage.debug_data.bulk_insert(DebugHandler.debug_records)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 9425, in bulk_insert
    ret = self._db._adapter.bulk_insert(self, items)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1376, in bulk_insert
    return [self.insert(table, item) for item in items]
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1361, in insert
    raise e
OperationalError: (2006, 'MySQL server has gone away')
Any help appreciated!

Cheers
Ian

Ian Ryder

Mar 14, 2015, 12:50:49 AM
to web...@googlegroups.com
Sorry, formatting was all screwy on the last bit of the OP - 2 processes running together, one lasting 295 seconds and saving without issue, the other lasting 305 seconds and failing because the MySQL server has gone away / isn't reconnected:


##################################
########295########
##################################
About to save status
Clean exit for doer 2928

##################################
########305########
##################################
About to save status
Clean exit for doer 2925
(2013, 'Lost connection to MySQL server during query')
Traceback (most recent call last):
  File "/home/ianryder/yarpr/web2py/gluon/restricted.py", line 224, in restricted
    exec ccode in environment
  File "applications/yarpr/models/my_scheduler.py", line 446, in <module>
    PenguinDoer(db_app)
  File "applications/yarpr/models/my_scheduler.py", line 382, in __init__
    debug_log(sys._getframe().f_code.co_name, None, start=method_start, job_status=None)
  File "applications/yarpr/models/debug_framework.py", line 196, in debug_log
    debug_flush()
  File "applications/yarpr/models/debug_framework.py", line 200, in debug_flush
    db_usage.debug_data.bulk_insert(DebugHandler.debug_records)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 9425, in bulk_insert
    ret = self._db._adapter.bulk_insert(self, items)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1376, in bulk_insert
    return [self.insert(table, item) for item in items]
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1361, in insert
    raise e
OperationalError: (2006, 'MySQL server has gone away')

Paolo Valleri

Mar 14, 2015, 9:04:04 AM
to web...@googlegroups.com
Hi,
first of all, use the latest pydal; it contains a fix for closing a broken connection.
Afterwards, for a proper reconnect in your application you should do:
- db._adapter.close()
- db._adapter.reconnect()

If you are looking for an 'auto-reconnect' - namely one that re-executes all queries in the broken transaction - I've been working on it; have a look at https://github.com/ilvalle/pydal/tree/auto-reconnect
In that case it will re-execute all commands in the last transaction (but if you do a db.commit() before the long operation, then it is fine).
Finally, I've never tried it with MySQL, but it should work since an OperationalError is raised.
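The close-then-reconnect pattern above can be sketched as follows. `StubAdapter` and `run_with_reconnect` are hypothetical stand-ins, not web2py API; the stub only mimics the adapter's close()/reconnect() surface so the control flow is visible without a live MySQL connection:

```python
class StubAdapter:
    """Minimal stand-in for db._adapter: tracks connection state only."""
    def __init__(self):
        self.connected = True

    def close(self):
        self.connected = False

    def reconnect(self):
        self.connected = True


def run_with_reconnect(adapter, task):
    """Close the DB connection before a long task, reconnect afterwards,
    so no idle connection is held past the server's timeout."""
    adapter.close()          # release the connection before the long task
    try:
        result = task()      # may run longer than the server's idle timeout
    finally:
        adapter.reconnect()  # fresh connection for any follow-up queries
    return result


adapter = StubAdapter()
print(run_with_reconnect(adapter, lambda: "done"))  # → done
print(adapter.connected)                            # → True
```

With a real web2py `db`, the adapter would be `db._adapter` and the task the long-running batch method.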

Paolo

Ian Ryder

Mar 16, 2015, 9:41:57 AM
to web...@googlegroups.com
Thanks Paolo - sadly, same result :( I'm not using pyDAL, just the standard web2py DAL.

##################################
         ########305########
##################################
Traceback (most recent call last):
  File "/home/ianryder/yarpr/web2py/gluon/restricted.py", line 224, in restricted
    exec ccode in environment
  File "applications/yarpr/models/my_scheduler.py", line 554, in <module>
    PenguinDoer(db_app)
  File "applications/yarpr/models/my_scheduler.py", line 468, in __init__
    debug_log(sys._getframe().f_code.co_name, None, start=method_start, error_details=ee, severity='Fail')
  File "applications/yarpr/models/do_debugging.py", line 197, in debug_log
    debug_flush()
  File "applications/yarpr/models/do_debugging.py", line 201, in debug_flush
    db_usage.debug_data.bulk_insert(DebugHandler.debug_records)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 9425, in bulk_insert
    ret = self._db._adapter.bulk_insert(self, items)
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1376, in bulk_insert
    return [self.insert(table, item) for item in items]
  File "/home/ianryder/yarpr/web2py/gluon/dal.py", line 1361, in insert
    raise e
OperationalError: (2006, 'MySQL server has gone away')

Here's the code:
        self.dbase.commit()
        self.dbase._adapter.close(action=None)
        try:
            # now the long running task - in the test it's a sleep for 305 seconds.
            run_method(
                *loads(this_task.args, object_hook=self._decode_dict),
                **loads(this_task.vars, object_hook=self._decode_dict)) #, *passed_vars)
        except Exception as ee:
            self.dbase._adapter.reconnect()
            this_doer = change_status(
                this_doer,
                FAILED,
                'Failed: ' + str(ee)
            )
            this_doer.update_record()
            debug_log(sys._getframe().f_code.co_name, None, start=method_start, error_details=ee, severity='Fail')
            raise Exception('Failure during method execution: ' + str(ee))

        self.dbase._adapter.reconnect()
        this_doer = change_status(
            this_doer,
            COMPLETED,
            'Normal exit'
        )


Paolo Valleri

Mar 16, 2015, 10:07:06 AM
to web...@googlegroups.com
Your query takes too much time to execute, so the connection hits the timeout.
From the log I see that you are running a bulk_insert; you can split it into smaller batches, placing a db.commit() between them.
Mind that in this case the bulk_insert will be executed across several transactions.

Otherwise, you can run a custom bulk_insert based on a single executesql call containing all the inserts.
Let us know more about the query.
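The split-into-batches idea can be sketched like this. It uses stdlib sqlite3 (not MySQL) purely so the example is self-contained; `chunked_bulk_insert` and the `debug_data` schema are hypothetical stand-ins, not web2py code:

```python
import sqlite3


def chunked_bulk_insert(conn, rows, chunk_size=100):
    """Insert rows in chunks, committing between chunks so no single
    transaction runs long enough to hit a server-side timeout."""
    cur = conn.cursor()
    for i in range(0, len(rows), chunk_size):
        cur.executemany("INSERT INTO debug_data (msg) VALUES (?)",
                        rows[i:i + chunk_size])
        conn.commit()  # each chunk is its own transaction


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE debug_data (msg TEXT)")
chunked_bulk_insert(conn, [("row %d" % n,) for n in range(250)], chunk_size=100)
print(conn.execute("SELECT COUNT(*) FROM debug_data").fetchone()[0])  # → 250
```

Note the trade-off Paolo mentions: if a later chunk fails, the earlier chunks are already committed, so the insert is no longer atomic.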

 Paolo


Ian Ryder

Mar 17, 2015, 10:33:20 AM
to web...@googlegroups.com
The issue sits with the parent method - it calls a method which takes > 300 seconds, then whatever action is taken with the database in the parent method on return blows up.

I think I've cracked it though - and to get there I had to drop the web2py scheduler and write my own so I knew what was going on!

Anyway, for anyone else in the same situation (unless web2py handles this natively down the track), I basically have to reconnect completely, i.e.:
  • db = DAL('mysql://XXXXX...@mysql.server/xxxxxxxxxxx', fake_migrate=False, pool_size=10)
  • define all the tables again
  • reload any records I had open in the calling method
I've written the relevant methods to make it all fairly neat...it's not the end of the universe, and it may be the only way, but it seems clunky...given I've been living with this for 2 or 3 months, I'm just happy to have an answer that seems properly robust ;)
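The three steps above can be sketched as a small helper. Everything here (`full_reconnect`, the factory/callback arguments, the toy stand-ins) is a hypothetical illustration of the call shape, not web2py API - with a real app, `dal_factory` would build the `DAL('mysql://...')` object and `define_tables` would re-run every `define_table` call:

```python
def full_reconnect(dal_factory, define_tables, reload_record, open_ids):
    """Rebuild the connection from scratch: new DAL object, re-run the
    table definitions, then re-fetch any records the caller still holds.
    All four arguments are caller-supplied; nothing here is web2py API."""
    db = dal_factory()          # e.g. lambda: DAL('mysql://...', pool_size=10)
    define_tables(db)           # re-run every define_table call
    records = [reload_record(db, rid) for rid in open_ids]
    return db, records


# Toy stand-ins just to show the call shape:
db, recs = full_reconnect(
    dal_factory=lambda: {"tables": []},
    define_tables=lambda db: db["tables"].append("task_doer"),
    reload_record=lambda db, rid: {"id": rid},
    open_ids=[7, 8],
)
print(db["tables"])              # → ['task_doer']
print([r["id"] for r in recs])   # → [7, 8]
```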

Paolo Valleri

Mar 17, 2015, 11:21:35 AM
to web...@googlegroups.com
Ian, as mentioned before, with the latest trunk a reconnect can be done with the following:
- db._adapter.close()
- db._adapter.reconnect()
Without the latest trunk, try the following:

try:
    db._adapter.close()
except:
    db._adapter.connection = None
db._adapter.reconnect()

but my suggestion is to use the latest trunk.

Paolo

Ian Ryder

Mar 18, 2015, 12:50:22 PM
to web...@googlegroups.com
Thanks Paolo, the bit missing was being on the latest trunk. Much neater than what I had as I don't need to reload the open records :)

Just to confirm the solution for anyone else who gets it:

db._adapter.close()

# call long running process

db._adapter.reconnect()

And it works exactly as I'd hope so far...will give it a thorough testing and report back if any issues pop up


Andre Kozaczka

Jun 4, 2016, 7:01:31 PM
to web2py-users
Where do you put the
- db._adapter.close()
- db._adapter.reconnect()

In the Scheduler.py file?

For some reason I'm still getting the lost connection error. I'm putting the close() and reconnect() in Scheduler.py and they surround the long processes.
