On 5/14/15 12:43 PM, Jonathon Nelson wrote:
> And indeed you are correct. However, this comes as quite a
> surprise to me.
>
> As an idiom:
>
>     try:
>         do_stuff()
>     except SomeError, e:
>         make_some_noise(e)
>         maybe_return_here_etc
>
> is a pretty common idiom. I tried deleting 'e' within the
> except block without any luck.
it's the traceback object itself that is holding onto the
ConnectionFairy object that was the subject of the error; this is
not in fact related to any contextual objects attached to the
SQLAlchemy exception itself. The traceback is held within
sys.exc_info().
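To illustrate why deleting 'e' makes no difference, here is a minimal sketch. It assumes Python 3 syntax ("as e" rather than ", e"), and the function name is made up for the example. Inside the except block, sys.exc_info() still reports the traceback even after the name has been deleted:

```python
import sys


def handler_still_sees_traceback():
    """Return True if a traceback is still held after 'del e' (illustrative only)."""
    try:
        raise ValueError("boom")
    except ValueError as e:
        del e  # removes only the local name, not the exception being handled
        # The interpreter still tracks the active exception and its traceback.
        return sys.exc_info()[2] is not None
```

On Python 2 the state reported by sys.exc_info() also outlives the except block until the function returns, which is what sys.exc_clear() was for. Python 3 dropped sys.exc_clear() and clears that state when the except block ends.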
> In fact, I did a search for sys.exc_clear() and found
> almost literally no code making use of that call.
this particular failure only occurs if about five different
conditions are in play at the same time. After ten years, you're
the first to hit them all simultaneously (and report it):

1. a checkout event handler is in use
2. the pool is set to size 1 with no overflow
3. the database server is shut off so that reconnects cannot occur
4. the stack trace object is not cleared from memory
5. the identical operation is attempted again
> I suggest that this is a bit of POLA violation (principle
> of least astonishment).
you are correct, though when the database is down totally and a
connection attempt throws an OperationalError, people are usually
not expecting much from subsequent connection attempts. The
condition vanishes very easily, for example in the typical case
where the error is caught within a function that then exits before
another connection attempt occurs. So this is a very narrow corner
case.
> Is this due to how SQLAlchemy holds on to some measure of
> state?
it is due to the mechanics of a Python stack trace object holding
onto stack frames each of which refer strongly to the contents of
locals() within each frame.
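A small sketch of that mechanism, assuming Python 3 on CPython. FakeConnection and the function names are hypothetical stand-ins, not SQLAlchemy objects. A local variable in the frame that raised stays alive for as long as the traceback is referenced, and goes away once the traceback does:

```python
import gc
import sys
import weakref

observed = {}  # stores only a weak reference, so it never keeps the object alive


class FakeConnection(object):
    """Hypothetical stand-in for a pooled object such as a ConnectionFairy."""


def failing_checkout():
    conn = FakeConnection()               # lives only in this frame's locals
    observed["conn"] = weakref.ref(conn)
    raise RuntimeError("database is down")


def capture_traceback():
    try:
        failing_checkout()
    except RuntimeError:
        return sys.exc_info()[2]          # hand the traceback to the caller


def demo():
    tb = capture_traceback()
    # The traceback refers to the failed frame, whose locals refer to conn.
    alive_while_held = observed["conn"]() is not None
    del tb                                # drop the last reference to the traceback
    gc.collect()                          # not needed on CPython, harmless elsewhere
    alive_after_release = observed["conn"]() is not None
    return alive_while_held, alive_after_release
```

Calling demo() should return (True, False): the object survives while the traceback is held and is collected as soon as it is released.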
> Should it maybe be using copies or weakrefs?
funny thing is that because the pool does use weakrefs to ensure
things are cleaned up, this condition is very hard to trap since it
vanishes once the stack trace does.
> Or is this more of a situation that only arises as a result of
> the specific configuration choices made above? (in which case
> I contend that it's still a POLA violation, but not one that
> is as likely to bite people)
it's because a method called ConnectionRecord.get_connection() is
responsible for calling down to the database connection process. If
this method fails within the "checkout" process, it raises an
exception that propagates all the way out, and the record itself is
not returned to the connection pool unless the exception is reraised
only after setting the record back into a checked-in state. That
handling is in place at the well-known point where the method is
called during a normal checkout; however, the method is called in one
extra place, in the specific case that a checkout handler is in use
which itself raises a DisconnectionError. An issue has been filed at
https://bitbucket.org/zzzeek/sqlalchemy/issue/3419/connection-record-not-immediately-checked
to indicate that this handling should be added there, and it has been
fixed for 1.0.5 in 4a0e51e7d2f3. Thanks for reporting!
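The general pattern of "check the record back in, then reraise" can be sketched as below. This is a toy model under stated assumptions, not SQLAlchemy's pool code: ToyPool, ToyRecord and DatabaseDown are hypothetical names invented for the illustration, and only get_connection() borrows its name from the method discussed above.

```python
class DatabaseDown(Exception):
    """Hypothetical error raised when the database cannot be reached."""


class ToyRecord(object):
    """Hypothetical record that tracks one connection and its checkout state."""

    def __init__(self, connect):
        self._connect = connect
        self.connection = None
        self.checked_out = False

    def get_connection(self):
        # Connect lazily; this is the step that fails when the server is down.
        if self.connection is None:
            self.connection = self._connect()
        return self.connection

    def checkin(self):
        self.checked_out = False


class ToyPool(object):
    """Pool of size 1 with no overflow (illustrative only)."""

    def __init__(self, connect):
        self._record = ToyRecord(connect)

    def checkout(self):
        if self._record.checked_out:
            raise RuntimeError("pool exhausted")
        self._record.checked_out = True
        try:
            self._record.get_connection()
        except Exception:
            # Put the record back into a checked-in state first, then let
            # the original error propagate to the caller.
            self._record.checkin()
            raise
        return self._record
```

With this ordering, a second checkout() against a database that is still down fails with the same connection error instead of reporting an exhausted pool, regardless of whether the caller is still holding the first traceback.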