Rollback raises "connection already closed"

Wolfgang Schnerring

unread,

Apr 23, 2012, 4:41:52 AM4/23/12

to sqlal...@googlegroups.com

Hello,

I'm seeing a somewhat puzzling traceback, in this scenario:

- The database connection goes away (due to networking issues)
- A transaction is committed
- This fails (of course), thus tries to rollback
- The rollback fails, too, (of course) complaining that the
"connection already closed". Is that intentional?

The full traceback is attached, as well as a minimal reproduction test
case using tcpwatch to interrupt the connection. (Tested with postgres
9.1)

While I realize that on the one hand, yes, the rollback failed so
sqlalchemy should complain, I feel on the other hand that this doesn't
really help me much from a pragmatic point of view: since the commit
didn't go through, there should be no harm in the rollback failing
(right?), so why complain?

Thanks for any insights,
Wolfgang

--
Wolfgang Schnerring · w...@gocept.com · Software development
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · Tel +49 345 219401-0
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations

test_rollback.py

rollback-traceback.txt

Michael Bayer

unread,

Apr 23, 2012, 10:43:57 AM4/23/12

to sqlal...@googlegroups.com

On Apr 23, 2012, at 4:41 AM, Wolfgang Schnerring wrote:

> Hello,
>
> I'm seeing a somewhat puzzling traceback, in this scenario:
>
> - The database connection goes away (due to networking issues)
> - A transaction is committed
> - This fails (of course), thus tries to rollback
> - The rollback fails, too, (of course) complaining that the
> "connection already closed". Is that intentional?
>
> The full traceback is attached, as well as a minimal reproduction test
> case using tcpwatch to interrupt the connection. (Tested with postgres
> 9.1)
>
> While I realize that on the one hand, yes, the rollback failed so
> sqlalchemy should complain, I feel on the other hand that this doesn't
> really help me much from a pragmatic point of view: since the commit
> didn't go through, there should be no harm in the rollback failing
> (right?), so why complain?
>
> Thanks for any insights,

I will note that I'm looking at 0.7.6 here, I see you're on 0.6.7 where there are in fact issues with rollback() being hit in some situations so I'd advise at least upgrading to 0.7.6. The issue is specific to the prepare() call which is only invoked if you use the twophase=True flag, and was fixed as ticket 2317 in release 0.7.4.

Assuming current release, the rollback is not called if a disconnect situation is detected on the connection.

However, even then, in this specific case, the erroneous behavior is due to the fact that the test itself, as presented, is illustrating what is possibly an unrealistic use case. The actual error raised by the usage of tcpwatch here, then cutting it off, is:

SystemError: null argument to internal routine

which I've never seen before. This is being raised by psycopg2 directly, and technically constitutes a bug on psycopg2's part, as all errors raised by a DBAPI must be a subclass here of psycopg2.Error, which SystemError is not. SQLAlchemy intercepts all dbapi.Error instances and detects if they constitute a "disconnect", which fails here as SystemError fails that check.

If I hardcode a positive set of the flag in response to this specific error:

diff -r 5c7bea55f040 lib/sqlalchemy/engine/base.py
--- a/lib/sqlalchemy/engine/base.py Thu Apr 19 12:31:15 2012 -0400
+++ b/lib/sqlalchemy/engine/base.py Mon Apr 23 10:39:40 2012 -0400
@@ -1818,6 +1818,9 @@
is_disconnect = isinstance(e, self.dialect.dbapi.Error) and \
self.dialect.is_disconnect(e, self.__connection, cursor)

+ if not is_disconnect:
+ if isinstance(e, SystemError) and str(e) == "null argument to internal routine":
+ is_disconnect = True

if is_disconnect:
self.invalidate(e)

then the behavior proceeds as it should.

So there's no bug in current releases of SQLAlchemy, save for not handling what seems to be a bug in psycopg2. Assuming your real-world use case is not dependent on manipulation via tcpdump then upgrading is the solution.

> Wolfgang
>
> --
> Wolfgang Schnerring · w...@gocept.com · Software development
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · Tel +49 345 219401-0
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>
> <rollback-traceback.txt><test_rollback.py>

Wolfgang Schnerring

unread,

Apr 24, 2012, 3:12:15 AM4/24/12

to sqlal...@googlegroups.com

Hello Mike,

thank you very much for your detailed analysis!

* Michael Bayer <mik...@zzzcomputing.com> [2012-04-23 16:43]:

> On Apr 23, 2012, at 4:41 AM, Wolfgang Schnerring wrote:
>> - The database connection goes away (due to networking issues)
>> - A transaction is committed
>> - This fails (of course), thus tries to rollback
>> - The rollback fails, too, (of course) complaining that the
>> "connection already closed". Is that intentional?
>

> However, even then, in this specific case, the erroneous behavior is due to the
> fact that the test itself, as presented, is illustrating what is possibly an
> unrealistic use case. The actual error raised by the usage of tcpwatch here,
> then cutting it off, is:

I've seen this error in production, intermittently but persistently.
I do not know what causes it there (I suspect some kind of abnormal
restart of the database or somesuch, but that's pure speculation).
What I do know is that I can provoke the exact same behaviour (read:
traceback) with the tcpdump method.

> I will note that I'm looking at 0.7.6 here, I see you're on 0.6.7 where there
> are in fact issues with rollback() being hit in some situations so I'd advise
> at least upgrading to 0.7.6. The issue is specific to the prepare() call which
> is only invoked if you use the twophase=True flag, and was fixed as ticket 2317
> in release 0.7.4.

Sorry, that was sloppy on my part. The system I'm talking about is
indeed running 0.6.7, but 0.7.6 behaves the same way with the synthetic test.

> Assuming current release, the rollback is not called if a disconnect situation
> is detected on the connection.
>

> This is being raised by psycopg2 directly, and technically constitutes
> a bug on psycopg2's part, as all errors raised by a DBAPI must be a
> subclass here of psycopg2.Error, which SystemError is not.
>

> So there's no bug in current releases of SQLAlchemy, save for not handling what
> seems to be a bug in psycopg2. Assuming your real-world use case is not
> dependent on manipulation via tcpdump then upgrading is the solution.

So, if I understand you correctly,
1. sqlalchemy indeed detects disconnect situations on rollback
2. psycopg2 should signal those correctly
3. my synthetic test may not be appropriate

Re (1): Does that apply in 0.6.7 already or has that changed
significantly in 0.7?
(I'd like to avoid upgrading needlessly, since the application depends
on the 0.6 event system and other intricate details, so it'd be painful.)

Re (2+3): Do you happen to have an idea how to get hold of the "real"
error cause (via logging or somesuch) without patching sqlalchemy and/or
psycopg2? Or, heck, even with patching, for that matter.

Thanks,

Michael Bayer

unread,

Apr 24, 2012, 9:21:54 AM4/24/12

to sqlal...@googlegroups.com

On Apr 24, 2012, at 3:12 AM, Wolfgang Schnerring wrote:

>
> So, if I understand you correctly,
> 1. sqlalchemy indeed detects disconnect situations on rollback
> 2. psycopg2 should signal those correctly
> 3. my synthetic test may not be appropriate
>
> Re (1): Does that apply in 0.6.7 already or has that changed
> significantly in 0.7?

there's a bug still in 0.6.7 where the rollback_prepared() call is not going to appropriately check first that the "disconnect" flag was set, so the bug will persist in 0.6 unless we backport the fix.

as of 0.7.4 you should not see this issue in production, assuming the disconnects here are being raised by psycopg2 in the ways that we expect (that is, particular subclasses/messages of dbapi.Error that we know to detect).

> (I'd like to avoid upgrading needlessly, since the application depends
> on the 0.6 event system and other intricate details, so it'd be painful.)

then we'd have to look into backporting the fixes from ticket 2317 into the 0.6 series.

>
> Re (2+3): Do you happen to have an idea how to get hold of the "real"
> error cause (via logging or somesuch) without patching sqlalchemy and/or
> psycopg2? Or, heck, even with patching, for that matter.

patching is most straightforward for the moment. try the one attached.

2317_06_backport.patch

Michael Bayer

unread,

Apr 24, 2012, 10:57:52 AM4/24/12

to sqlal...@googlegroups.com

On Apr 24, 2012, at 9:21 AM, Michael Bayer wrote:

>
> patching is most straightforward for the moment. try the one attached.

I've backported this fix to the 0.6 series. The as-yet-unreleased 0.6.9 is available directly from mercurial: http://hg.sqlalchemy.org/sqlalchemy/archive/rel_0_6.tar.gz

Matteo Boscolo

unread,

Apr 24, 2012, 11:14:36 AM4/24/12

to sqlal...@googlegroups.com

Il 24/04/2012 16:57, Michael Bayer ha scritto:

On Apr 24, 2012, at 9:21 AM, Michael Bayer wrote:

patching is most straightforward for the moment.    try the one attached.

Witch one ?
do I have to install it from the following link ? or you missed the attach ?

I've backported this fix to the 0.6 series.  The as-yet-unreleased 0.6.9 is available directly from mercurial: http://hg.sqlalchemy.org/sqlalchemy/archive/rel_0_6.tar.gz

--

OmniaSolutions
Matteo Boscolo
Technical Manager

Cell. +39.327.1525045
Fax. +39.041.8623730

te...@omniasolutions.eu

Skype: matteo.boscolo

Wolfgang Schnerring

unread,

Apr 25, 2012, 3:01:15 AM4/25/12

to sqlal...@googlegroups.com

* Michael Bayer <mik...@zzzcomputing.com> [2012-04-24 15:21]:

> On Apr 24, 2012, at 3:12 AM, Wolfgang Schnerring wrote:
>> So, if I understand you correctly,
>> 1. sqlalchemy indeed detects disconnect situations on rollback
>> 2. psycopg2 should signal those correctly
>> 3. my synthetic test may not be appropriate
>>
>> Re (1): Does that apply in 0.6.7 already or has that changed
>> significantly in 0.7?
>
> there's a bug still in 0.6.7 where the rollback_prepared() call is not
> going to appropriately check first that the "disconnect" flag was set,
> so the bug will persist in 0.6 unless we backport the fix.

Ah, got it. Thanks.

> as of 0.7.4 you should not see this issue in production, assuming the
> disconnects here are being raised by psycopg2 in the ways that we
> expect (that is, particular subclasses/messages of dbapi.Error that we
> know to detect).

Since the production system doesn't feature a monkey throwing a
tcpwatch-wrench, I think that's a fair assumption. :)

>> (I'd like to avoid upgrading needlessly, since the application depends
>> on the 0.6 event system and other intricate details, so it'd be painful.)
>
> then we'd have to look into backporting the fixes from ticket 2317 into the 0.6 series.

> [later]

> I've backported this fix to the 0.6 series.

Thanks! I appreciate this a lot!

I didn't mean to cause you trouble/busywork here, since I guess me having
to upgrade would be entirely appropriate, considering that 0.8 is coming
up, making 0.6 look kinda old (not "positively ancient" yet, but still ;).