SQLAlchemy connection errors


Matthijs Blaas

Sep 10, 2014, 10:49:02 AM9/10/14
to sqlal...@googlegroups.com
I'm experiencing some strange bugs which seem to be caused by the connections used by SQLAlchemy, and which I can't pin down exactly... I was hoping someone has a clue what's going on here.

We're working on a Pyramid (version 1.5b1) web application and use SQLAlchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the db connection or session; most of the time this would be a "cursor already closed" or "This Connection is closed" error, but we get other related exceptions too:

(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further


There is no silver bullet to reproduce them; only by refreshing many times is one bound to happen at some point. So I made a script using multi-mechanize to spam different URLs concurrently and see where and when it happens.

It appears the URL triggered doesn't really matter; the errors happen when there are concurrent requests that span a longer time (while other requests get served in between). This seems to indicate some kind of threading problem: that either the session or the connection is shared among different threads.

After googling these issues I found a lot of topics; most of them say to use scoped sessions, but the thing is, we already use them:

db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()

- We have a BaseModel for all our orm objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)

- We use the pyramid_tm tween to handle transactions during the request

- We hook db_session.remove() to the Pyramid NewResponse event (which is fired after everything has run). I also tried putting it in a separate tween running after pyramid_tm, or even not doing it at all; none of these seem to have any effect, so the response event seemed like the cleanest place to put it.

- We create the engine in the main entry point of our Pyramid project, use a NullPool, and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()

- In our app we access all db operations using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)

- We use psycopg2==2.5.2 to handle the connection to postgres with pgbouncer in between

- I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them)
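For clarity, the thread-local behaviour we rely on from scoped_session can be sketched as follows (a minimal, self-contained example against an in-memory sqlite database, without the ZopeTransactionExtension; names are illustrative):

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))

sessions = {}

def worker(name):
    # Each thread calling the registry gets its own Session instance
    sessions[name] = Session()
    Session.remove()  # discard this thread's session when done

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread that touches the registry gets its own Session, and Session.remove() at the end of a request discards only that thread's session, which is exactly why shared-session bugs shouldn't be possible here.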

I also tried the spamming test using different webservers: with waitress and cogen I got the errors very easily; with wsgiref (which is single-threaded) we unsurprisingly got no errors. Using uwsgi and gunicorn (4 workers, gevent) I didn't get any errors either.

Given the differences between the webservers used, I thought it has to do with some webservers handling requests in threads and some using new processes (maybe a forking problem)?
But to complicate matters even more, as time went on and I ran some new tests, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue how to go about debugging this...

Finally, to test what happens to the connection, I attached an attribute to the connection at the start of the cursor execute and tried to read that attribute back at the end of the execute:
import time

from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
    conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
    print(conn.pdtb_start_timer)

Surprisingly this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'

Which struck me as very strange... I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
and tried adding strategy='threadlocal' to the engine, which from what I understand should force one connection per thread. But it didn't have any effect on the errors I'm seeing... (besides some unit tests failing, because I need two different sessions/connections for some tests and this forces one connection to be associated)

Does anyone have any idea what might go on here or have some more pointers on how to attack this problem?

Thanks in advance!

Matthijs Blaas

PS: sorry for the crosspost (http://stackoverflow.com/questions/25768428/sqlalchemy-connection-errors), i hope i might catch some experts here that otherwise wouldn't see the stackoverflow post

Jonathan Vanasco

Sep 10, 2014, 12:18:42 PM9/10/14
to sqlal...@googlegroups.com
1. I had a concern a few months ago when doing an audit of code.  I ended up not having an issue, but this link might be of help:


2.  I use `add_finished_callback`, which runs unconditionally at the end of every request.  

    request.add_finished_callback(cleanup_db_session)

Actually, I think this may be caused or exacerbated by your use of `NewResponse`. I think that event can be triggered multiple times during a single request, and if you generate multiple response objects, only the first one would have an active connection.

Matthijs Blaas

Sep 11, 2014, 12:30:02 PM9/11/14
to sqlal...@googlegroups.com
Hi Jonathan,
 
Thanks for your response. I removed the NewResponse event like you mentioned and added a finished callback to the request (in a tween): request.add_finished_callback(cleanup_db_session)
 
Unfortunately the plot thickens! Since the last time I tested our project, the problem *doesn't* occur anymore using waitress, cogen or gunicorn (in sync mode); they all handle every request I spam correctly... but it now consistently raises an "(InterfaceError) cursor already closed" exception on ~80% of the requests when using gunicorn (in gevent mode). It actually doesn't seem to make any difference whether I use db_session.remove() or not!
 
Could gunicorn's gevent mode have any effect on the application layer? Any other ideas?
 
Matthijs

Matthijs Blaas

Sep 12, 2014, 6:45:22 AM9/12/14
to sqlal...@googlegroups.com
Ok, I found the issue!

When I tried to use pg8000 to connect to Postgres I got an error: "cannot insert multiple commands into a prepared statement".
Apparently multiple commands in one statement are fine with psycopg2, but not with pg8000.

We prefix all queries because of a workaround. We handle multi-tenancy using the Postgres search_path, but setting the search_path once on a new-transaction event resulted in another issue: the search_path is sometimes "lost" during requests! I can't see why exactly, but it appears that sometimes during a request another connection is used which doesn't fire a new-transaction event. To work around this, we prefix every query fired with "set search_path to <customer_schema>; <original query>" in a before_cursor_execute callback.
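The prefixing itself is done with a before_cursor_execute listener registered with retval=True, so the returned statement replaces the original. Roughly like this - sketched here against sqlite, with a harmless comment prefix and a made-up tenant name standing in for the real "set search_path to <customer_schema>; " prefix, since sqlite has no search_path:

```python
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")
seen = []

# retval=True means the tuple we return replaces (statement, parameters)
# before they reach the DBAPI cursor; in our app the prefix is
# "set search_path to <customer_schema>; " instead of a comment.
@event.listens_for(engine, "before_cursor_execute", retval=True)
def prefix_statement(conn, cursor, statement, parameters, context, executemany):
    return "/* tenant: acme */ " + statement, parameters

# Record what actually reached the driver, for inspection
@event.listens_for(engine, "after_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    seen.append(statement)

with engine.connect() as conn:
    conn.execute(text("select 1"))
```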

So if I leave out this search_path prefix query and monkey patch_all (when using gevent), everything runs without trouble.

But this still leaves the strange issue where the search_path is sometimes suddenly lost during requests... feeding strategy="threadlocal" might seem relevant here (as it should bind one connection to each thread handling a request)? Furthermore, I need this behaviour because in some requests I also rely on a FOR UPDATE lock during the request...

Michael Bayer

Sep 12, 2014, 10:05:45 AM9/12/14
to sqlal...@googlegroups.com

On Sep 12, 2014, at 6:45 AM, Matthijs Blaas <thijs...@gmail.com> wrote:

> Ok, i found the issue!
>
> When I tried to use pg8000 to connect to postgres i got an error: "cannot insert multiple commands into a prepared statement".
> Apparently multiple queries is fine with psycopg2, but not using pg8000.
>
> We prefix all queries because of a workaround. We handle multi-tenancy using the postgres search_path, but setting the search_path once on an new transaction event resulted in another issue; the search_path is sometimes "lost" during requests! I can't see why exactly, but it appears sometimes during a request another connection is used which doesn't fire a new transaction event.. To workaround this issue, we prefix every query fired with a "set search_path to <customer_schema>; <original query>" in an before_cursor_execute callback.

set the search path within the connection pool using the “connect” event; all connections used by the pool will then have this search path set the moment they connect. Don’t ever put multiple SQL expressions in a single DBAPI execute (for this very reason - driver changes, hilarity ensues. We are having a crisis in OpenStack right now because they made the same mistake).
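A minimal sketch of that pattern (sqlite stands in for Postgres here so the example is self-contained; the PRAGMA is a stand-in for a single "SET search_path TO <schema>" statement, and the schema name would come from your tenant configuration):

```python
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")

# Fires once per new DBAPI connection as the pool creates it -
# note: exactly one statement per execute() call.
@event.listens_for(engine, "connect")
def set_search_path(dbapi_conn, connection_record):
    cursor = dbapi_conn.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")  # stand-in for SET search_path
    cursor.close()

with engine.connect() as conn:
    enabled = conn.execute(text("PRAGMA foreign_keys")).scalar()
```

Every connection handed out by the pool has already run the listener, so the setting can never be "lost" mid-request the way a per-transaction event can miss a connection.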

>
> But this still leaves the strange issue where the search_path is sometimes suddenly lost during requests..

no part of your code should be changing search_path - I’d grep for it.


Jonathan Vanasco

Sep 12, 2014, 12:24:41 PM9/12/14
to sqlal...@googlegroups.com
I have no input on the issues Michael identified.

But I'm curious about a SQLAlchemy event somehow not firing. Which transaction event were you using? after_begin or after_transaction_create?

Matthijs Blaas

Sep 12, 2014, 12:26:31 PM9/12/14
to sqlal...@googlegroups.com
> set the search path within the connection pool using the “connect” event; all connections used by the pool will then have this search path setting the moment they connect. Don’t ever put multiple SQL expressions in a DBAPI execute (for this very reason - driver changes, hilarity ensues. We are having a crisis in Openstack right now because they made the same mistake).
Thanks, I'll update our code to use the NullPool's connect event.
 
> no part of your code should be changing search_path - I’d grep for it. 
 
I did, and made sure there is only one place where it's set. But I have good hopes the connect event takes care of it; I haven't been able to reproduce the error anymore.
 
I'm curious though: I was under the impression that when you start every request in a transaction (pyramid_tm in this case) and set the search_path on every transaction (using the session's after_begin event), is it even possible for a new connection to be used somewhere that isn't included in the main transaction?
 
Thanks!
 
Matthijs

Michael Bayer

Sep 12, 2014, 2:04:59 PM9/12/14
to sqlal...@googlegroups.com
SQLAlchemy never does that but I don’t know what you’d be doing in your code, it’s certainly a pattern that is supported and possible, just wouldn’t happen implicitly.

Matthijs Blaas

Sep 13, 2014, 3:58:29 PM9/13/14
to sqlal...@googlegroups.com
@Jonathan Vanasco:

> i have no input on the issues michael identified.
> but i'm curious about a sqlalchemy event somehow not firing. which transaction event were you using? after_begin or after_transaction_create ? 
 
I used the after_begin event to set the search_path, but moved it into the connect event like Michael suggested.
@Michael Bayer:
>  SQLAlchemy never does that but I don’t know what you’d be doing in your code, it’s certainly a pattern that is supported and possible, just wouldn’t happen implicitly.
 
Ok, good to know! I'm pretty sure nothing is done with connections explicitly in our code, so it probably had something to do with either the session not being cleaned up in the right event hook or the multiple commands in one query. With the adjustments made I haven't been able to reproduce any of the errors so far, so thanks for the help!