updated ibm_db_sa for SQLAlchemy 0.7 / 0.8

54 views
Skip to first unread message

Michael Bayer

unread,
Jan 28, 2013, 12:40:54 PM1/28/13
to ibm...@googlegroups.com
Hi all -

At the request of a client I've taken a look at the current 0.7 branch of ibm_db_sa and have produced an updated version compatible with 0.7 and 0.8, making use of SQLAlchemy's new external dialect test suite.   The code exists as a github fork at the moment and includes pep-8ing all the code as well as greatly reducing a lot of constructs and duplication that's no longer needed.

The fork is available at https://github.com/zzzeek/ibm_db_sa .   I'd like to know what approach can be taken such that this code can be published as a package on Pypi, either merging it with the existing project for IBM to continue to maintain, or creating a new project such as "sqlalchemy-db2", where myself and others in the Github community could handle maintaining it.

- mike

Mario Briggs

unread,
Jan 28, 2013, 11:49:28 PM1/28/13
to ibm_db
Hi Michael,

This is really great and all i can say is our sincere appreciation to
you. The CLA for ongoing contributors is here -
http://code.google.com/p/ibm-db/wiki/Contributions, if you can print
it out, sign and then send us a scanned copy of it via email, that is
good enough.

On the github request, i am pasting your response on the original
email from Gennadiy here, take forward that discussion

>>
>> About github, since google code supports it, shouldnt just migrating to it
>> be possible ?

>Would you want to migrate the code.google.com SVN repo to an in-place git repo ? Then the code could remain on google and github could just have a mirror of it (or vice versa). What I'd do there is to wrap up all >the changes I've made since the SVN version into a pull request against the new repository.
Yes.

>But the other part of this is whether the ibm_db_sa library itself would be broken out from the "trunk" into it's own repository. If you were to move to git entirely, the usual layout would be that each of ibm_db, >ibm_db_django, and ibm_db_sa would each be their own git repositories.
About the breaking out from trunk, we have been wanting to do it
anyways especially after Jaimy became a ongoing contributor. And its
ok that moving to git will force this.

>Having the code on git basically makes it much easier for users to develop their own bug fixes and feature additions (via forking + pull requests) as well as for users to make local adjustments if needed.
We concur

thanks in advance
Mario

Mario Briggs

unread,
Jan 29, 2013, 2:22:25 PM1/29/13
to ibm_db
Michael,

on a related note, we were chasing down huge memory consumption
problems with Jaimy's version of the adapter when it is used with
OpenStack. One pointer was that since we dont get calls to close the
cursor in the DBI layer and in the DBI layer we hold the cursor refs
in the connection object (to handle navigation from the connection),
so using this pointer we stopped holding those refs in a custom build
of the DBI wrapper and when we reran, saw that this did help reduce
the memory consumption (but stills lot more to chase down).

Another quick test we did was use the adapter from 'https://github.com/
zzzeek/ibm_db_sa', but this gave us more memory consumption problems.
Given you round of cleanup recently, did the aspect of reference
releasing strike up ?

thanks in advance
Mario

Michael Bayer

unread,
Jan 29, 2013, 2:37:55 PM1/29/13
to ibm...@googlegroups.com
SQLAlchemy's usage of the cursor includes that it gets closed explicitly in all circumstances except where a user has fetched a ResultProxy with rows pending and has not fetched all the rows. So an application that makes sure result proxies are exhausted fully (which is always the case with an ORM-oriented application), there shouldn't be any cursors left open.

In the case where a ResultProxy is left hanging open with an open cursor, when it gets garbage collected, the DBAPI should ideally be releasing this cursor. Pep249 sort of implies this here: http://www.python.org/dev/peps/pep-0249/#Cursor.close "Close the cursor now (rather than whenever __del__ is called).".

With all of that said, if all you're seeing is "memory consumption" and you don't actually know what the cause is, I will note that we had an important memory related fix as of SQLAlchemy 0.7.8, which impacted the use of SQLAlchemy's C extensions in conjunction with a DBAPI that delivers rows as an object that is not a pure Python tuple - pyodbc was the primary culprit here but it's possible ibm_db is doing this also. So if you note that removing the C extensions and/or upgrading to at least 0.7.8 fixes the issue, then it was that bug. In particular this bug would not be seen by the usual method of counting gc.get_objects(), as it was a C reference count issue.

To debug a simple in-Python leak, observing the size of gc.get_objects() will reveal if Python objects are not being released. But you'd want to observe this growth for at least a few thousand objects as there is at least one LRU cache used in the ORM which will cause a limited period of growth that then subsides periodically.
> --
> You received this message because you are subscribed to the Google Groups "ibm_db" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to ibm_db+un...@googlegroups.com.
> To post to this group, send email to ibm...@googlegroups.com.
> Visit this group at http://groups.google.com/group/ibm_db?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Mario Briggs

unread,
Jan 30, 2013, 9:21:04 AM1/30/13
to ibm_db
Michael,

thanks for the pointers, Rahul is looking further into it. Openstack
is already in 0.7.8 and even upgraded to 0.7.9, but didnt make a diff.

Anyhow coming to release plans, i thought it is fair that i update you
of the plans given we have the signed CLA from you (thanks again).
Plan would be to run the entire sqlalchemy test suite to see where we
stand, hoping Rahul can get to that after this memory consumption
issue. I know you did mention that it already passes schema generation/
reflection, INSERT mechanics, basic SQL statements, and basic data
types like strings, unicode, and dates.

On a related note, saw that you change the class name IBM_DBCompiler
=> DB2TypeCompiler etc. Some of the IBM openstack folks patched
sqlaclhemy-migrate for DB2 and used those class names, so changes for
them.... i guess that was not such a good practice anyway.

thanks
Mario

Michael Bayer

unread,
Jan 30, 2013, 11:41:05 AM1/30/13
to ibm...@googlegroups.com

On Jan 30, 2013, at 9:21 AM, Mario Briggs wrote:

>
> Anyhow coming to release plans, i thought it is fair that i update you
> of the plans given we have the signed CLA from you (thanks again).
> Plan would be to run the entire sqlalchemy test suite to see where we
> stand, hoping Rahul can get to that after this memory consumption
> issue. I know you did mention that it already passes schema generation/
> reflection, INSERT mechanics, basic SQL statements, and basic data
> types like strings, unicode, and dates.

part of the rationale for the "compliance suite" was that it is designed to test specific contracts of dialects on an individual basis. Outside of that suite, the majority of SQLAlchemy tests, like all the ORM tests and such, are emphasizing the testing of SQLAlchemy internals and usage contracts - these tests often call upon various capabilities that some dialects just don't have. So it's challenging to get the full expanse of all the ORM tests and such to run for all dialects. In theory, once the "compliance suite" is mature enough (it's halfway there), there won't be any need to get the expanse of ORM tests to run for most dialects. So if the ibm_db dialect is failing a lot across many ORM tests, those are likely not bugs, just things that the DB2 dialect doesn't support.



>
> On a related note, saw that you change the class name IBM_DBCompiler
> => DB2TypeCompiler etc. Some of the IBM openstack folks patched
> sqlaclhemy-migrate for DB2 and used those class names, so changes for
> them.... i guess that was not such a good practice anyway.

OK I fixed up all the names but what I can do, when I re-patch for the git transition, is to ensure that all the old names are still present, including the "ibm_db://" engine prefix and such. The naming scheme I used is the standard scheme that is used across all dialects, which is:

base.py: <Prefix><Component>
<dbapi>.py: <Prefix><Component>_<dbapi>

such as:

DB2Compiler, DB2Compiler_pyodbc

etc.

that scheme should be present at this point in most/all of the dialects in sqlalchemy/dialects.


Mario Briggs

unread,
Jan 31, 2013, 1:21:36 AM1/31/13
to ibm_db
>>
So if you note that removing the C extensions
<<
Can you elaborate a little by what you mean by removing C extensions.
Since the ibm_db driver is a C extension

thanks
Mario

Michael Bayer

unread,
Jan 31, 2013, 10:14:20 PM1/31/13
to ibm...@googlegroups.com
ah SQLAlchemy has its own C extensions, if you install like this:

python setup.py install --without-cextensions

they won't build....or alternatively locate the .so files where SQLAlchemy installed itself and delete them.

though if you're on 0.7.8 or greater it's unlikely this is the culprit.
Reply all
Reply to author
Forward
0 new messages