confused by results when using QueuePool, pool_size=1, max_overflow=0, pool_timeout=0


Jonathon Nelson
May 12, 2015, 9:34:32 AM
to sqlal...@googlegroups.com
I'm working on an application where I want to use one and only one connection, even in the face of errors.
Originally I used AssertionPool, but it seemed to misbehave sometimes in the face of a disconnection.
Switching to QueuePool, I somewhat surprisingly got the same result.

This is how I perform my test. Using the code below, I start it with a URI and wait for it to tell me to stop postgresql, which I then do.
I hit enter, and get an (expected) error as it fails to connect.
When I hit enter a second time, I get an *unexpected* traceback, a different one for QueuePool than for AssertionPool, but unexpected nonetheless.
If I up the pool_size to 2, everything works fine, but it leaves open the possibility of having more than one connection.



#! /usr/bin/python
import sys
import sqlalchemy as sa

def handle_checkout_event(dbapi_con, con_record, con_proxy):
    # 'checkout' listener: ping the connection with a trivial query; raising
    # DisconnectionError tells the pool to discard this connection and retry.
    try:
        with dbapi_con.cursor() as cur:
            cur.execute("SELECT 1")
            cur.fetchone()
    except Exception, e:
        raise sa.exc.DisconnectionError()

def main():
    uri = sys.argv[1]
    #engine = sa.create_engine(uri, poolclass=sa.pool.AssertionPool)
    engine = sa.create_engine(uri, pool_size=1, max_overflow=0, pool_timeout=0, pool_recycle=3600)
    sa.event.listen(engine, 'checkout', handle_checkout_event)

    with engine.begin() as conn:
        pass
    print "Now stop PG."
    raw_input('-=>')

    # should get an operational error
    try:
        with engine.begin() as conn:
            pass
        raw_input('-=>')
    except sa.exc.OperationalError, e:
        print >>sys.stderr, "Got an (expected) error: ", e

    raw_input('-=!')

    # *should* get the same thing.
    try:
        with engine.begin() as conn:
            pass
        raw_input('-=>')
    except sa.exc.OperationalError, e:
        print >>sys.stderr, "Got an (expected) error: ", e

    print "We never get here."
    raw_input('-=!')



if __name__ == '__main__':
    main()

Mike Bayer
May 12, 2015, 10:28:46 AM
to sqlal...@googlegroups.com


On 5/12/15 9:34 AM, Jonathon Nelson wrote:
> I'm working on an application where I want to use one and only one connection, even in the face of errors.
> Originally I used AssertionPool, but it seemed to misbehave sometimes in the face of a disconnection.
> Switching to QueuePool, I somewhat surprisingly got the same result.
>
> This is how I perform my test. Using the code below, I start it with a URI and wait for it to tell me to stop postgresql, which I then do.
> I hit enter, and get an (expected) error as it fails to connect.
> When I hit enter a second time, I get an *unexpected* traceback, a different one for QueuePool than for AssertionPool, but unexpected nonetheless.
it would be super helpful if you could share what this traceback is.

send both tracebacks, because "it fails to connect" is also not expected - the connection is already pooled so it should be a stale connection error.




Jonathon Nelson
May 12, 2015, 10:43:05 AM
to sqlal...@googlegroups.com
On Tue, May 12, 2015 at 9:28 AM, Mike Bayer <mik...@zzzcomputing.com> wrote:


> On 5/12/15 9:34 AM, Jonathon Nelson wrote:
>> I'm working on an application where I want to use one and only one connection, even in the face of errors.
>> Originally I used AssertionPool, but it seemed to misbehave sometimes in the face of a disconnection.
>> Switching to QueuePool, I somewhat surprisingly got the same result.
>>
>> This is how I perform my test. Using the code below, I start it with a URI and wait for it to tell me to stop postgresql, which I then do.
>> I hit enter, and get an (expected) error as it fails to connect.
>> When I hit enter a second time, I get an *unexpected* traceback, a different one for QueuePool than for AssertionPool, but unexpected nonetheless.
> it would be super helpful if you could share what this traceback is.

Sure. Below are *two* runs (separated by '*****************'), the first with AssertionPool and the second with QueuePool.


*****************
Now stop PG.
-=>
Got an (expected) error:  (psycopg2.OperationalError) could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

-=!
No handlers could be found for logger "sqlalchemy.pool.AssertionPool"
Traceback (most recent call last):
  File "t3.py", line 48, in <module>
    main()
  File "t3.py", line 36, in main
    with engine.begin() as conn:
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1903, in begin
    conn = self.contextual_connect(close_with_result=close_with_result)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2034, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2069, in _wrap_pool_connect
    return fn()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 708, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 480, in checkout
    rec = pool._do_get()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1257, in _do_get
    raise AssertionError("connection is already checked out" + suffix)
AssertionError: connection is already checked out at:
  File "t3.py", line 48, in <module>
    main()
  File "t3.py", line 26, in main
    with engine.begin() as conn:

*****************
Now stop PG.
-=>
Got an (expected) error:  (psycopg2.OperationalError) could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

-=!
No handlers could be found for logger "sqlalchemy.pool.QueuePool"
Traceback (most recent call last):
  File "t3.py", line 48, in <module>
    main()
  File "t3.py", line 36, in main
    with engine.begin() as conn:
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1903, in begin
    conn = self.contextual_connect(close_with_result=close_with_result)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2034, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2069, in _wrap_pool_connect
    return fn()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 708, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 480, in checkout
    rec = pool._do_get()
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1042, in _do_get
    (self.size(), self.overflow(), self._timeout))
sqlalchemy.exc.TimeoutError: QueuePool limit of size 1 overflow 0 reached, connection timed out, timeout 0
*****************


I hope this is helpful.

Mike Bayer
May 12, 2015, 11:06:57 AM
to sqlal...@googlegroups.com


On 5/12/15 9:34 AM, Jonathon Nelson wrote:
your script is holding onto state that prevents the QueuePool from checking the connection back in, namely the SQLAlchemy OperationalError and the exception context that is attached to the stack trace.

Add this after the first error throw:

    sys.exc_clear()

now the next invocation works as expected.
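
For reference, a minimal sketch of where that call would land in the posted script, right after the first except block (Python 2 only; sys.exc_clear() does not exist in Python 3):

# first attempt after stopping the database, as in the original script
try:
    with engine.begin() as conn:
        pass
except sa.exc.OperationalError, e:
    print >>sys.stderr, "Got an (expected) error: ", e

sys.exc_clear()  # drop the saved exception/traceback so the pooled record can be reclaimed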







Jonathon Nelson
May 14, 2015, 12:43:09 PM
to sqlal...@googlegroups.com

And indeed you are correct. However, this comes as quite a surprise to me.
As an idiom:

try:
    do_stuff()
except SomeError, e:
    make_some_noise(e)
    maybe_return_here_etc

is pretty common. I tried deleting 'e' within the except block, without any luck.
In fact, I did a search for sys.exc_clear() and found almost literally no code making use of that call.
I suggest that this is a bit of a POLA violation (principle of least astonishment). Is this due to how SQLAlchemy holds on to some measure of state? Should it maybe be using copies or weakrefs? Or is this a situation that arises only as a result of the specific configuration choices made above? (In which case I contend that it's still a POLA violation, but not one that is as likely to bite people.)


Mike Bayer
May 14, 2015, 2:13:55 PM
to sqlal...@googlegroups.com


On 5/14/15 12:43 PM, Jonathon Nelson wrote:
>
> And indeed you are correct. However, this comes as quite a surprise to me.
> As an idiom:
>
> try:
>     do_stuff()
> except SomeError, e:
>     make_some_noise(e)
>     maybe_return_here_etc
>
> is pretty common. I tried deleting 'e' within the except block, without any luck.
it's the traceback object itself that is holding onto the ConnectionFairy object that was the subject of the error; this is not in fact related to any contextual objects attached to the SQLAlchemy exception itself. The traceback is held within sys.exc_info().
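
A tiny standalone illustration (Python 2, made-up names, nothing here is SQLAlchemy's API) of how the traceback keeps such an object alive even after 'del e':

import sys

class Fairy(object):      # stand-in for the pooled connection record
    pass

def checkout():
    fairy = Fairy()
    raise RuntimeError("db is down")

try:
    checkout()
except RuntimeError, e:
    del e                  # removes the local name, nothing more

tb = sys.exc_info()[2]                 # still populated after the except block (Python 2)
print tb.tb_next.tb_frame.f_locals     # {'fairy': <Fairy ...>}, reachable via the traceback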



> In fact, I did a search for sys.exc_clear() and found almost literally no code making use of that call.

this particular failure only occurs if about five different conditions are in play at the same time. After ten years, you're the first to hit them all simultaneously (and report it):

1. a checkout event handler is in use
2. the pool is set to size 1 with no overflow
3. the database server is shut off so that reconnects cannot occur
4. the stack trace object is not cleared from memory
5. an identical operation is attempted again


> I suggest that this is a bit of a POLA violation (principle of least astonishment).

you are correct, though when the database is totally down and a connection attempt throws an OperationalError, people are usually not expecting much from subsequent connection attempts. The condition also vanishes very easily, as in the typical case where the error is caught within a function that then exits before another connection attempt occurs. So this is a very narrow corner case.
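
As a sketch of that typical case (illustrative only; try_connect is a made-up helper):

def try_connect(engine):
    try:
        with engine.begin() as conn:
            return True
    except sa.exc.OperationalError, e:
        print >>sys.stderr, "connect failed:", e
        return False

# once try_connect() returns, the interpreter restores the previous exc_info,
# the traceback is released, and a later attempt can check the connection
# back out of the pool.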


> Is this due to how SQLAlchemy holds on to some measure of state?
it is due to the mechanics of a Python stack trace object holding onto stack frames, each of which refers strongly to the contents of locals() within that frame.



> Should it maybe be using copies or weakrefs?
funny thing is that because the pool does use weakrefs to ensure things are cleaned up, this condition is very hard to trap since it vanishes once the stack trace does.
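
Continuing the little sketch from earlier in this reply (Python 2, hypothetical names): a weak reference to the stand-in object only dies once the traceback does, which is exactly when the condition stops being observable:

import weakref

ref = weakref.ref(tb.tb_next.tb_frame.f_locals['fairy'])
print ref() is None      # False: the traceback's frame still holds a strong reference
del tb                   # drop our own handle to the traceback ...
sys.exc_clear()          # ... and the interpreter's saved copy
print ref() is None      # True in CPython: the object went away with the traceback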



> Or is this a situation that arises only as a result of the specific configuration choices made above? (In which case I contend that it's still a POLA violation, but not one that is as likely to bite people.)

it's because a method called ConnectionRecord.get_connection() is responsible for calling down to the database connection process. If this method fails within the "checkout" process, it raises an exception that just propagates all the way out, and the record itself is not returned to the connection pool unless exceptions here are re-raised only after setting the record back into a checked-in state. That handling occurs at a well-known point during a normal checkout; however, get_connection() is called in one extra place in the specific case that a checkout handler is in use which itself raises a DisconnectionError. An issue has been filed at https://bitbucket.org/zzzeek/sqlalchemy/issue/3419/connection-record-not-immediately-checked, and the fix is in for 1.0.5 in 4a0e51e7d2f3. Thanks for reporting!
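
Roughly the shape of that change, as illustrative pseudocode with made-up helper names (not the actual diff in 4a0e51e7d2f3): if obtaining the DBAPI connection fails during checkout, hand the record back to the pool before letting the error propagate:

def checkout(pool):
    record = pool._acquire_record()      # hypothetical: take the single free slot
    try:
        record.get_connection()          # may raise when the database is down
    except Exception:
        pool._release_record(record)     # hypothetical: return the slot before re-raising
        raise
    return record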