retrying queries and 'Lost connection to MySQL server'

Bobby Impollonia

unread,

Dec 21, 2008, 11:24:08 PM12/21/08

to sqlal...@googlegroups.com

I occasionally have a query fail with 'Lost connection to MySQL server
during query' which gets converted into a
sqlalchemy.exceptions.OperationalError. I have not been able to figure
out why it happens, but the server should always be available.

I would like to tell sqlalchemy that if a query fails with this error,
it should wait a few seconds and then retry the query (and probably
give up if it fails again). Does SQLA provide some sort of hooks that
would allow me to do this without gnarly monkey patching?

Michael Bayer

unread,

Dec 21, 2008, 11:32:07 PM12/21/08

to sqlal...@googlegroups.com

you'd have to organize your code such that the desired operation can
be attempted again when this exception is raised.

This is a pretty tough road to travel, though, since if the connection
is lost, so is your entire transaction and everything you've loaded/
persisted within it. A better approach would be to isolate the cause
of the error. This error is commonly caused by a MySQL client
timeout (usually on a connection that's been idle for 8 hours) and is
allevated using the pool_recycle=<some number of seconds> option.

Bobby Impollonia

unread,

Dec 22, 2008, 12:22:11 AM12/22/08

to sqlal...@googlegroups.com

This code isn't using transactions so retrying a failed query should
be as simple as creating a new connection to replace the failed one
and executing the query again.

Still, I would much prefer to figure out the real cause, as you say. I
had sort of given up on that because after a little while researching
this error, I couldn't find much helpful info. It's hard to debug
because the issue happens in a daily cron job, but it happens less
than once a month and the rest of the time everything works fine. I
have no way of consistently reproducing the problem or knowing if I've
fixed it.

I'm pretty sure there is no way that 8 hours could have gone by
between the last query and the one that blew up.

The basic structure of the cron job is:
1) It start up, does some sql stuff.
2) It forks a worker process using the python processing module.
3a) The worker calls metadata.bind.dispose() so that it won't try to
reuse the connection it inherited from the parent. Worker then does
some sql stuff. Worker always finishes successfully.
3b) Parent process goes into a loop doing sql stuff. Parent usually
finishes successfully, but occasionally dies with the aforementioned
MySQL error. I can't tell from the traceback whether it happens during
the first iteration of the loop immediately after spawning the child
or if it happens later.

In principle, this structure is safe, right? 3a and 3b are happening
in parallel, so it is indeterminate whether the worker calls dispose()
before or during the sql stuff going on in the parent, but that
shouldn't mater, right? Is it possible that the call to dispose() is
somehow closing the connection in a way that sabotages the parent?

Michael Bayer

unread,

Dec 22, 2008, 9:41:55 AM12/22/08

to sqlal...@googlegroups.com

On Dec 22, 2008, at 12:22 AM, Bobby Impollonia wrote:

>
> This code isn't using transactions so retrying a failed query should
> be as simple as creating a new connection to replace the failed one
> and executing the query again.
>
> Still, I would much prefer to figure out the real cause, as you say. I
> had sort of given up on that because after a little while researching
> this error, I couldn't find much helpful info. It's hard to debug
> because the issue happens in a daily cron job, but it happens less
> than once a month and the rest of the time everything works fine. I
> have no way of consistently reproducing the problem or knowing if I've
> fixed it.
>
> I'm pretty sure there is no way that 8 hours could have gone by
> between the last query and the one that blew up.
>
> The basic structure of the cron job is:
> 1) It start up, does some sql stuff.
> 2) It forks a worker process using the python processing module.
> 3a) The worker calls metadata.bind.dispose() so that it won't try to
> reuse the connection it inherited from the parent. Worker then does
> some sql stuff. Worker always finishes successfully.

it might be better to just call create_engine() and not use bound
metadata here.

>
> 3b) Parent process goes into a loop doing sql stuff. Parent usually
> finishes successfully, but occasionally dies with the aforementioned
> MySQL error. I can't tell from the traceback whether it happens during
> the first iteration of the loop immediately after spawning the child
> or if it happens later.
>
> In principle, this structure is safe, right? 3a and 3b are happening
> in parallel, so it is indeterminate whether the worker calls dispose()
> before or during the sql stuff going on in the parent, but that
> shouldn't mater, right? Is it possible that the call to dispose() is
> somehow closing the connection in a way that sabotages the parent?

I wouldn't think so, but I'm not intricately familiar with the
mechanics of database connections passed between parent/child forks.
If its just a cron job you might want to consider using the NullPool
which doesnt pool any connections.

Reply all

Reply to author

Forward