"Can't reconnect until invalid transaction is rolled back" with repoze.who / repoze.what

Josh Kelley

unread,

Nov 13, 2010, 11:25:30 PM11/13/10

to pylons-discuss

I'm getting the dreaded "MySQL server has gone away" and "Can't
reconnect until invalid transaction is rolled back" pair of errors in
my Pylons application.

From what I've read online, the "Can't reconnect until invalid
transaction is rolled back" is supposed to be avoided by Pylons'
default lib/base.py wrapping each controller's operations in a

try:
...
finally:
Session.remove()

block. However, from what I can tell, I'm getting this error within
repoze.who's middleware (specifically, repose.who.plugins.sa's
_BaseSQLAlchemyPlugin.get_user), which I assume is executing outside
of this try / finally block, so Session is never removed or rolled
back, so the "Can't reconnect until invalid transaction is rolled
back" error is never cleared.

How do I fix this "invalid transaction" state? I've read all of the
suggestions on handling "MySQL server has gone away" and will work on
those, but before I do so, I'd like to know that if an error does
somehow slip through, it won't leave my app in a broken "invalid
transaction" state.

--
Josh Kelley

Gustavo Narea

unread,

Nov 15, 2010, 4:54:37 PM11/15/10

to pylons-discuss

Hello,

The only thing I can think of is that the repoze.who middleware was
added in the wrong order. Please have a look at this example:
https://bitbucket.org/Gustavo/whatpylonsproject/src/tip/pylonssecuredapp/config/middleware.py

I don't remember seeing that error before. Can you please paste the
traceback if the location of the middleware is not the problem?

Cheers.

Josh Kelley

unread,

Nov 15, 2010, 9:07:18 PM11/15/10

to pylons-discuss

On Nov 15, 4:54 pm, Gustavo Narea <m...@gustavonarea.net> wrote:
> The only thing I can think of is that the repoze.who middleware was

> added in the wrong order. Please have a look at this example:https://bitbucket.org/Gustavo/whatpylonsproject/src/tip/pylonssecured...

>
> I don't remember seeing that error before. Can you please paste the
> traceback if the location of the middleware is not the problem?

Thanks for the reply.

My middleware.py looks very similar to the one you linked. The middle
is slightly different (I haven't done anything with caching, and I
added ToscaWidgets; neither looks like it should be a problem):

# Routing/Session/Cache Middleware
app = RoutesMiddleware(app, config['routes.map'], singleton=False)
app = SessionMiddleware(app, config)

# CUSTOM MIDDLEWARE HERE (filtered by error handling middlewares)

app = twa.make_middleware(app, {
'toscawidgets.framework': 'pylons',
'toscawidgets.framework.default_view': 'mako',
})
app = add_auth(app)

if asbool(full_stack):
# Handle Python exceptions
app = ErrorHandler(app, global_conf,
**config['pylons.errorware'])

My add_auth uses repoze.what.plugins.quickstart.setup_sql_auth, like
the one you linked. (Actually, I think I followed your cookbook
article.)

I posted the traceback to http://pastie.org/1301390.

After investigating a bit more, here's my understanding of the
problem. Please correct me if I'm wrong:

As described at http://www.sqlalchemy.org/trac/wiki/FAQ#Thetransactionisinactiveduetoarollbackinasubtransaction,
uses of Session should make sure that they end with a call to
rollback(), close(), or remove(). Pylons does this in its
BaseController, but because repoze.who.plugins.sa is configured as
middleware, it executes outside of BaseController's try/finally block,
so any errors it encounters are never rolled back.

I can think of three solutions:

1) Give repoze.what.quickstart a Session object with autocommit=True,
so that it never has to rollback. I guess this would mean creating
two session objects (one for Pylons with autocommit=False, one for
repoze.who / repoze.what with autocommit=True); are there any caveats
with doing this?
2) Initialize and close the Session within middleware rather than
within the Pylons app (something like http://pypi.python.org/pypi/SQLAlchemyManager/0.1.0?).
3) Add try/except blocks to repoze.who.plugins.sa.

#3 seems like the correct solution, but I'm very new to Pylons,
repoze, SQLAlchemy, and WSGI middleware, so I could easily be
misunderstanding something.

--
Josh Kelley

Gustavo Narea

unread,

Nov 16, 2010, 5:45:43 PM11/16/10

to pylons-discuss

Hello, Josh.

Thanks for the information.

I couldn't find the message "Can't reconnect until invalid transaction
is rolled back" in the output you pasted and I think the link to the
FAQ refers to another type of issue.

According to the traceback, the exception is raised when the user
object is loaded. Before that, the plugin would've issued another
query to verify the username and password, and it seems like that
query did succeed (you can see it by increasing the verbosity of the
SQLAlchemy logger). So somehow, the connection is lost between those
two queries (while the repoze.who middleware is being executed).

I can't see anything on the plugin that would cause that -- in fact,
as I mentioned yesterday, this is the first time I hear about this in
2 years.

I've been reading about that error on the MySQL documentation and it
seems like all the possible causes are external to the application:
http://dev.mysql.com/doc/refman/5.0/en/gone-away.html

There's some troubleshooting information there. If you can't still
find the problem, I'd recommend writing to the SQLAlchemy mailing list
with the output of the logger at the INFO level -- You can CC me on
that email so that I can keep an eye on it and see if I can help.

You can try and tweak the Session if you want; it may or may not help.
I don't think handling the exception in repoze.who.plugins.sa is an
appropriate solution because we'd silencing a problem that should be
fixed.

HTH.

- Gustavo.

> I posted the traceback tohttp://pastie.org/1301390.

>
> After investigating a bit more, here's my understanding of the
> problem. Please correct me if I'm wrong:
>

> As described athttp://www.sqlalchemy.org/trac/wiki/FAQ#Thetransactionisinactivedueto...,

> uses of Session should make sure that they end with a call to
> rollback(), close(), or remove(). Pylons does this in its
> BaseController, but because repoze.who.plugins.sa is configured as
> middleware, it executes outside of BaseController's try/finally block,
> so any errors it encounters are never rolled back.
>
> I can think of three solutions:
>
> 1) Give repoze.what.quickstart a Session object with autocommit=True,
> so that it never has to rollback. I guess this would mean creating
> two session objects (one for Pylons with autocommit=False, one for
> repoze.who / repoze.what with autocommit=True); are there any caveats
> with doing this?
> 2) Initialize and close the Session within middleware rather than

> within the Pylons app (something likehttp://pypi.python.org/pypi/SQLAlchemyManager/0.1.0?).

Josh Kelley

unread,

Nov 16, 2010, 11:34:33 PM11/16/10

to pylons-discuss

On Nov 16, 5:45 pm, Gustavo Narea <m...@gustavonarea.net> wrote:
> Thanks for the information.
>
> I couldn't find the message "Can't reconnect until invalid transaction
> is rolled back" in the output you pasted and I think the link to the
> FAQ refers to another type of issue.

Sorry. There are actually two exceptions; the first I already posted,
and here's the second, with the "Can't reconnect until invalid
transaction is rolled back" error.

http://pastie.org/1304673

I don't know why I got two exceptions on a single request? I assumed
the Pylons / repoze stack would have aborted after the first. (Unless
the second exception was while trying to render the error page? If
that is what's happening, is there a way to keep repoze.who from
breaking rendering the error page?)

> I've been reading about that error on the MySQL documentation and it
> seems like all the possible causes are external to the application:http://dev.mysql.com/doc/refman/5.0/en/gone-away.html
>

> You can try and tweak the Session if you want; it may or may not help.
> I don't think handling the exception in repoze.who.plugins.sa is an
> appropriate solution because we'd silencing a problem that should be
> fixed.

I understand that the causes of the "MySQL server has gone away" are
external to the app and will work on that later; my concern right now
is that (as far as I can tell) repoze.who.plugins.sa isn't cleaning up
when this happens, which causes the app to get stuck in the "Can't
reconnect until invalid transaction is rolled back" state (and I have
to restart the app to get anything working again). The FAQ I linked
to does appear to be a different specific issue, but its solution of
using a try/except block to properly handle rollbacks seems to apply
here too.

--
Josh Kelley

Mike Orr

unread,

Nov 16, 2010, 11:58:28 PM11/16/10

to pylons-...@googlegroups.com

On Tue, Nov 16, 2010 at 8:34 PM, Josh Kelley <jos...@gmail.com> wrote:
> On Nov 16, 5:45 pm, Gustavo Narea <m...@gustavonarea.net> wrote:
>> Thanks for the information.
>>
>> I couldn't find the message "Can't reconnect until invalid transaction
>> is rolled back" in the output you pasted and I think the link to the
>> FAQ refers to another type of issue.
>
> Sorry. There are actually two exceptions; the first I already posted,
> and here's the second, with the "Can't reconnect until invalid
> transaction is rolled back" error.
>
> http://pastie.org/1304673

I get that sometimes but I'm not using Repoze.who. For some reason
SQLAlchemy leaves the transaction in a stuck state when an error
occurs until you explicitly do session.rollback().
Except that by the time I get the error email the transaction is
already over so there's nothing I can do about it anyway.

--
Mike Orr <slugg...@gmail.com>

Gustavo Narea

unread,

Nov 18, 2010, 5:05:25 PM11/18/10

to pylons-discuss

Hello, Josh.

If I make repoze.who.plugins.sa handle the exception, I'd be silencing
that error, which I'd rather not do as that sort of things always make
debugging harder.

If you had another WSGI middleware that uses SA, chances are you'd get
the same error, so I think a better solution for you would be to
subclass ErrorMiddleware like this:
"""
class MyErrorMiddleware(ErrorMiddleware):

def exception_handler(self, exc_info, environ):
exception_class = exc_info[0]
if exception_class in (InvalidRequestError, OperationalError):
# rollback...
return super(MyErrorMiddleware,
self).exception_handler(exc_info, environ)
"""

That should avoid the second exception, allowing the error page to be
returned without problems. And it'd also work if that error happens
within your application and you're not expecting it.

I'm happy to reconsider handling the exception if more hit the same
problem, though.

HTH,

- Gustavo.

Josh Kelley

unread,

Nov 20, 2010, 10:44:35 AM11/20/10

to pylons-discuss

On Nov 18, 5:05 pm, Gustavo Narea <m...@gustavonarea.net> wrote:
> If I make repoze.who.plugins.sa handle the exception, I'd be silencing
> that error, which I'd rather not do as that sort of things always make
> debugging harder.

Sorry if I wasn't clear enough. I was suggesting doing something like

except:
Session.rollback()
raise

rather than handling the exception entirely.

> If you had another WSGI middleware that uses SA, chances are you'd get
> the same error, so I think a better solution for you would be to
> subclass ErrorMiddleware like this:
> """
> class MyErrorMiddleware(ErrorMiddleware):
>
> def exception_handler(self, exc_info, environ):
> exception_class = exc_info[0]
> if exception_class in (InvalidRequestError, OperationalError):
> # rollback...
> return super(MyErrorMiddleware,
> self).exception_handler(exc_info, environ)
> """
>
> That should avoid the second exception, allowing the error page to be
> returned without problems. And it'd also work if that error happens
> within your application and you're not expecting it.

Perfect! Thanks for all of your help.

--
Josh Kelley

edwincheese

unread,

Dec 22, 2010, 5:48:37 AM12/22/10

to pylons-discuss

Hi

Sorry for bringing back this rather old thread. I am going to add some
more information on the discussion of this issue.

We have constantly encountered the exactly same issue mentioned by
Josh with Pylons and repoze.who 1.0 in a low traffic web application.
After analysed the check out pattern of SQLAlchemy connection pool, we
concluded that repoze.*, friendlyformplugin, and SQLAlchemyPlugin
cause this problem. (For other people having the "MySQL server gone
away" issue, I suggest you check pool_recycle directive of SQLAlachemy
first if you have not)

As Josh mentioned, SQLAlahcmey expect the session to be closed after
use. For web application, SQLAlachemy recommend closing the session at
the end of every request. Pylons followed this recommendation by
removing the session in the "finally" clause in the BaseController. So
when a request pass through the BaseController, the SQLAlachemy
session would always be properly cleaned up.

However it is not the case that every request would pass through the
BaseController when using repoze.*, friendlyformplugin and
SQLAlchemyPlugin.

Here is a simplified view of how a **normal** request use a SQLAlchemy
session (I assumed using default configuration of Pylons so sessions
are in fact SQLAlchemy's scoped_session; and the user is logged in):

== Normal Request ==
1. Request come in
2. SQLAlchemyPlugin use Session to retrieve user object.
scoped_session create the underlying session object and hence a
connection is checked out from connection pool.
3. repoze.who's PluggableAuthenticationMiddleware pass control to
downstream middleware.
4. Control pass to controller action, controller use Session to query
the database.
5. Session.remove() called in BaseController to clean up the session
and return connection to connection pool
6. Completing request and return result to user.
7. Thread wait for another request.

The problem occurred when requesting friendlyform's login_handler and
logout_handler url. Here is how the login/logout request use a
SQLAlchemy session:

== Problematic Request (Login) ==
1. Request come in
2. Friendlyform handle the request
3. Invoke SQLAlchemyPlugin to check login credential, it use Session
to query the user model. Hence, scoped_session create the underlying
session object and hence a connection is checked out from connection
pool.
4. Credential accepted, Friendlyform set
environ['repoze.who.application'] to a HTTPFound instance (for
redirecting user to the post_login_path)
5. repoze.who's PluggableAuthenticationMiddleware execute HTTPFound
instead of statically configured downstream middleware, thus Pylons
controllers are not executed
6. 302 Found send to user to redirect to post_login_path. Request
completed.
7. Thread wait for another request.

The about observation could be verified by setting the log level of
sqlalchemy.pool to DEBUG, that will show how a connection is checked
out and return to connection pool. The problem is, the session is not
properly closed when going through the login_handler or logout_handler
of friendlyform. Therefore, the same session and connection is used
again in next request of this thread. MySQL server drop connection
after several hours by default, so if next request for this thread
come in after MySQL dropping the connection, you receive the "MySQL
server gone away" error.

We solve the problem by adding a middleware warping repoze.who's
middleware, and it call Session.remove() when the request path is
login_handler or logout_handler.

Please suggest if it is better to handle this in repoze.who.

Regards
Edwin

Mike Orr

unread,

Dec 22, 2010, 1:29:07 PM12/22/10

to pylons-...@googlegroups.com

It sounds like a bug in FriendlyForm. If it's using the scoped
session, it should remove it in just before returning. I don't think
it matters if Session.remove() is called twice, once by the
application and once by the middleare. The only problem would be if a
farther-out middleware wants to do some postprocessing on some objects
it fetched during preprocessing. But that's rare, and having
Session.remove() in the application would have prevented it from
working anyway.

The other alternative is to use a separate scoped session and separate
engine for the middleware, so that it's not sharing connections with
the application.

This all does show the limitations of combining global state (a shared
scoped session) with middleware (which doesn't know whether other
middleware has removed the session).

--
Mike Orr <slugg...@gmail.com>

Richard Harding

unread,

Apr 18, 2011, 12:11:12 PM4/18/11

to pylons-...@googlegroups.com

Edwin, we're seeing a similar thing here and I've been trying to modify the form plugin to add a .remove() manually to see if it helps with little success.

Can you verify that your wsgi wrap does work and can you share the code/what it's doing? Are you just importing from the models Session and running a Session.remove()? or are you doing additional step?

Thanks.

Rick

Gustavo Narea

unread,

Apr 19, 2011, 6:28:55 PM4/19/11

to pylons-...@googlegroups.com

Hello,

I finally got some time today to investigate this further and fix it.

I think the only way to solve this at the repoze.who.plugins.sa level is by rolling back the transaction before issuing a query. I agree that, ideally, this would've been handled in the repoze.who-friendlyform plugin, but the problem is that these two plugins and independent from each other.

I'll wait 24 hours before releasing this, to allow time to get some feedback on the change:
https://github.com/repoze/repoze.who-sqlalchemy/commit/b327b426125859dff9eccd05611c1143912c554d

I've tested it and all the tests pass, but maybe I missed a reason why this solution is not a good one.

Cheers,

- Gustavo.

Richard Harding

unread,

Apr 20, 2011, 9:32:17 AM4/20/11

to pylons-...@googlegroups.com

Gustavo, thanks so much for the updates. Our initial tests on a test system seem to show that the issue is gone. We're rolling out to the production systems that are using the normal default timeout and will test if the issue is 100% gone then.

Rick

(mitechie)

Daniel Holth

unread,

Apr 20, 2011, 10:09:14 AM4/20/11

to pylons-...@googlegroups.com

I notice you begin building the query with one dbsession which must be a ScopedSession, then call dbsession.remove(), and then execute the query.

It would probably be better to put dbsession.remove() at the top of the function.

Gustavo Narea

unread,

Apr 20, 2011, 5:02:08 PM4/20/11

to pylons-...@googlegroups.com, Daniel Holth

Hello, Daniel.

Thanks, that's a good suggestion! I've just changed it.

Cheers.

On 20/04/11 15:09, Daniel Holth wrote:

I notice you begin building the query with one dbsession which must be a ScopedSession, then call dbsession.remove(), and then execute the query.

It would probably be better to put dbsession.remove() at the top of the function.

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to pylons-...@googlegroups.com.
To unsubscribe from this group, send email to pylons-discus...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en.

-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |

Reply all

Reply to author

Forward