Updating pg8000 dialect following new release of pg8000


Tony Locke

Aug 31, 2017, 7:02:17 AM
to sqlalchemy
There's a new release (1.11.0) of the pg8000 driver for PostgreSQL. It's a pure-Python driver, and it already has a dialect for SQLAlchemy. This latest release is not backward compatible with the previous release, and I'm trying to modify the dialect accordingly. The main change is that connections and cursors are no longer threadsafe. In DB-API terms it has a threadsafety value of 1 (threads may share the module, but not connections).
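(For illustration, a driver at threadsafety level 1 can still be used from multiple threads by giving each thread its own connection. A minimal sketch, using a stand-in connect() rather than the real pg8000 API:)

```python
import threading

def connect():
    # stand-in for the real driver's connect(); returns a distinct
    # connection object on every call
    return object()

# thread-local storage: each thread lazily gets its own connection,
# so no connection object is ever shared between threads
_local = threading.local()

def get_connection():
    if not hasattr(_local, "conn"):
        _local.conn = connect()
    return _local.conn
```

This is essentially what a connection pool gives you for free: the module is shared, the connections are not.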

So the first problem I ran into was in the on_connect() method in the dialect. It referred to the 'unicode' built-in, which doesn't exist under Python 3. So I deleted the following:

         def on_connect(conn):
             conn.py_types[quoted_name] = conn.py_types[unicode]
         fns.append(on_connect)
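(A hypothetical, self-contained sketch of a Python-2/3-portable version of that hook: 'unicode' exists only on Python 2, so resolve the text type once up front. The quoted_name class and the connection here are stand-ins, not the real SQLAlchemy / pg8000 objects:)

```python
import sys

# 'unicode' is only defined on Python 2; the else branch is never
# evaluated under Python 3, so this line is safe on both versions
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

class quoted_name(str):
    # stand-in for sqlalchemy.sql.elements.quoted_name
    pass

class FakeConnection(object):
    # stand-in for a pg8000 connection; pg8000 keeps a mapping of
    # Python type -> wire-format handler on the connection
    def __init__(self):
        self.py_types = {text_type: "text-handler"}

def on_connect(conn):
    # reuse the plain-text handler for quoted_name values
    conn.py_types[quoted_name] = conn.py_types[text_type]

conn = FakeConnection()
on_connect(conn)
```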

just to get the tests going. The next problem is:

test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::()::test_alias_pathing

which I think fails due to a threading problem. It seems that somehow the DB-API connection is shared between threads, which isn't supported any more with pg8000. So my question is, is it the case that:

a. DB-API connections must be threadsafe to work with SQLAlchemy.
b. There's something wrong with the test.
c. Something else.

Thanks for your help!

Mike Bayer

Aug 31, 2017, 10:15:16 AM
to sqlal...@googlegroups.com
On Thu, Aug 31, 2017 at 7:02 AM, Tony Locke <tlo...@tlocke.org.uk> wrote:
> There's a new release (1.11.0) of the pg8000 driver for PostgreSQL. It's a
> pure-python driver, and it already has a dialect for SQLAlchemy. This latest
> release is not backwardly compatible with the previous release, and I'm
> trying to modify the dialect accordingly. The main change is that
> connections and cursors are no longer threadsafe. In DB-API terms it has a
> threadsafety value of 1 (Threads may share the module, but not connections).
>
> So the first problem I ran into was in the on_connect() method in the
> dialect. It referred to the 'unicode' keyword, which caused a problem under
> Python 3. So I deleted the following:
>
>          def on_connect(conn):
>              conn.py_types[quoted_name] = conn.py_types[unicode]
>          fns.append(on_connect)
>

that was a recent fix for a regression:

https://github.com/zzzeek/sqlalchemy/commit/03560c4b83308719067ec635662c35f9a437fb7f

it is fixed in

https://github.com/zzzeek/sqlalchemy/commit/d0470e296ea589620c94d8f2dd37e94b8f03842a


> just to get the tests going. The next problem is:
>
> test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::()::test_alias_pathing
>
> which I think fails due to a threading problem. It seems that somehow the
> DB-API connection is shared between threads, which isn't supported any more
> with pg8000. So my question is, is it the case that:
>
> a. DB-API connections must be threadsafe to work with SQLAlchemy.

not at all. SQLAlchemy takes the position that its own Session and
Connection, which ultimately refer to the DBAPI connection, are
themselves not threadsafe.

> b. There's a something wrong with the test.

the memusage tests are often omitted from normal testing as they are
extremely process/memory intensive, but they don't spawn any threads.
In the past few weeks the memusage suite has been altered so that
each memusage test is run in a separate *process*, however, so there is
some concurrency going on. When the new process is created, the test
makes a new connection pool so that it should not refer to any
database connections that were transferred to the child fork; however,
it also doesn't try to close them or anything else - they should be
totally ignored. But if pg8000 is tracking some kind of global
state, like a collection of prepared statements, that state needs to
travel across the process boundary as well without impacting the
parent process even if the child process ends.

The failure can be isolated by doing a pdb like this:

diff --git a/test/aaa_profiling/test_memusage.py b/test/aaa_profiling/test_memusage.py
index 3181cfe61..ff600b85d 100644
--- a/test/aaa_profiling/test_memusage.py
+++ b/test/aaa_profiling/test_memusage.py
@@ -636,6 +636,8 @@ class MemUsageWBackendTest(EnsureZeroed):
         try:
             go()
         finally:
+            import pdb
+            pdb.set_trace()
             metadata.drop_all()
             clear_mappers()

This brings me right to a clean state where "next" will produce the
error. Looking at the PostgreSQL processes within the block, there are
no open transactions to the DB. If you pdb right here, you can poke
around to see what state might be present.




> c. Something else.
>
> Thanks for your help!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Tony Locke

Sep 1, 2017, 10:57:28 AM
to sqlalchemy
Thanks Mike, fetching the latest commits from SQLAlchemy solved the 'unicode' problem.

Looking at the multiprocess problem, the pg8000 driver holds global state for remembering the prepared statements it's created. This means that it fails if multiple processes access it, even if the access is in series and not concurrent. I've done a (temporary?) fix by keying the state on the current process id. So now test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::()::test_alias_pathing passes. Yay!

So, next problem. The test test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::test_many_updates fails because pg8000 creates a new prepared statement for each unique SQL statement, which entails creating an object within the driver, and so the memory usage keeps increasing. I've marked this test with @testing.fails_on('postgresql+pg8000', 'prepared statements use memory'), and now all the test/aaa_profiling/test_memusage.py tests pass.

Now, on with the other tests!

Mike Bayer

Sep 1, 2017, 11:54:16 AM
to sqlal...@googlegroups.com
On Fri, Sep 1, 2017 at 10:57 AM, Tony Locke <tlo...@tlocke.org.uk> wrote:
> So, next problem. The test
> test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::test_many_updates
> fails because pg8000 creates a new prepared statement for each unique sql
> statement, which entails a creation of object within the driver, and so the
> memory usage keeps increasing. I've marked this test with
> @testing.fails_on('postgresql+pg8000', 'prepared statements use memory'),
> and now all the test test/aaa_profiling/test_memusage.py tests pass.

do you consider that to be a bug in pg8000?

Tony Locke

Sep 1, 2017, 12:09:50 PM
to sqlal...@googlegroups.com
No, I don't think it's a bug, because pg8000 is designed to always create a prepared statement if one doesn't already exist, and so that entails keeping some data for each prepared statement, steadily increasing memory use. There's a pg8000.max_prepared_statements parameter (set to 1000 by default) which, if exceeded, triggers the closing of all prepared statements. So memory use never goes off to infinity.
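(The capping behaviour described can be sketched as a bounded cache that, rather than evicting one entry at a time, closes everything once the limit is exceeded and starts over. Again, names are illustrative, not pg8000's actual internals:)

```python
class StatementCache(object):
    def __init__(self, max_size):
        self.max_size = max_size
        self.statements = {}

    def prepare(self, sql):
        if sql not in self.statements:
            if len(self.statements) >= self.max_size:
                # close all prepared statements at once rather than
                # evicting a single entry; memory use is sawtoothed
                # but bounded
                self.statements.clear()
            # stand-in for the real server-side Prepare
            self.statements[sql] = "prepared:%s" % sql
        return self.statements[sql]
```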



Mike Bayer

Sep 1, 2017, 12:40:23 PM
to sqlal...@googlegroups.com
On Fri, Sep 1, 2017 at 12:09 PM, Tony Locke <tlo...@tlocke.org.uk> wrote:
> No I don't think it's a bug because pg8000 is designed to always create a
> prepared statement if one doesn't already exists, and so that entails
> keeping some data for each prepared statement, steadily increasing memory
> use. There's a pg8000.max_prepared_statements parameter (set to 1000 by
> default) which, if exceeded, triggers the closing of all prepared
> statements. So memory use never goes off to infinity.

oh. So if you set that to like 100, the test here would pass for
you, right? The memory use tests try for quite a while to see if
object use tilts down again, since limited-size caches are common.

Is this value available on the connect() function or through an
environment variable?
Tony Locke

Sep 5, 2017, 10:28:45 AM
to sqlalchemy


On Friday, 1 September 2017 17:40:23 UTC+1, Mike Bayer wrote:
> On Fri, Sep 1, 2017 at 12:09 PM, Tony Locke <tlo...@tlocke.org.uk> wrote:
>> No I don't think it's a bug because pg8000 is designed to always create a
>> prepared statement if one doesn't already exists, and so that entails
>> keeping some data for each prepared statement, steadily increasing memory
>> use. There's a pg8000.max_prepared_statements parameter (set to 1000 by
>> default) which, if exceeded, triggers the closing of all prepared
>> statements. So memory use never goes off to infinity.
>
> oh. So if you set that to like 100, the test here would pass for
> you, right? the memory use tests try for quite a while to see if
> object use tilts down again since limited size caches are common.
>
> Is this value available on the connect() function or through an
> environment variable?

There's a pg8000.max_prepared_statements module attribute, and if that's 100 it passes test_many_updates. Actually, I just stuck max_prepared_statements in as a module attribute without giving it much thought, and I should probably make it a keyword argument to the connect() function. If I were to do that, are you thinking that I could then specify that parameter in the connection string when the test suite is run?
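(A hypothetical sketch of that move: make the setting a connect() keyword argument while keeping the module attribute as the fallback default, so existing code keeps working. The connect() body here is a stub, not pg8000's real implementation:)

```python
# module-level default, preserved for backwards compatibility
max_prepared_statements = 1000

def connect(host="localhost", max_prepared_statements=None, **kwargs):
    if max_prepared_statements is None:
        # the parameter name shadows the module attribute, so look the
        # default up explicitly
        max_prepared_statements = globals()["max_prepared_statements"]
    # ... open the socket, authenticate, etc. ...
    return {"host": host,
            "max_prepared_statements": max_prepared_statements}
```

With a kwarg in place, SQLAlchemy's create_engine() could forward it via connect_args={"max_prepared_statements": 100}, or the dialect could accept it as a URL query parameter, which is presumably what a test-suite connection string would use.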

Mike Bayer

Sep 5, 2017, 11:39:27 AM
to sqlal...@googlegroups.com
On Tue, Sep 5, 2017 at 10:28 AM, Tony Locke <tlo...@tlocke.org.uk> wrote:
>
>
> On Friday, 1 September 2017 17:40:23 UTC+1, Mike Bayer wrote:
>>
>> On Fri, Sep 1, 2017 at 12:09 PM, Tony Locke <tlo...@tlocke.org.uk> wrote:
>> > No I don't think it's a bug because pg8000 is designed to always create
>> > a
>> > prepared statement if one doesn't already exists, and so that entails
>> > keeping some data for each prepared statement, steadily increasing
>> > memory
>> > use. There's a pg8000.max_prepared_statements parameter (set to 1000 by
>> > default) which, if exceeded, triggers the closing of all prepared
>> > statements. So memory use never goes off to infinity.
>>
>> oh. So if you set that to like 100, the test here would pass for
>> you, right? the memory use tests try for quite a while to see if
>> object use tilts down again since limited size caches are common.
>>
>> Is this value available on the connect() function or through an
>> environment variable?
>
>
> There's a pg8000.max_prepared_statements module attribute and if that's 100
> it passes test_many_updates. Actually, I just stuck max_prepared_statements
> as a module attribute without giving it much thought, and I should probably
> put it as a keyword argument to the connect() function. If I were to do
> that, are you thinking that then I could specify that parameter in the
> connection string when the test suite is run?

that would be the idea