Have I found a bug with MSSQL/PyODBC identity maps?

Simon Haines

unread,

Oct 5, 2011, 10:13:10 PM10/5/11

to sqlal...@googlegroups.com

I'm fairly new to SQLAlchemy so I want to double-check this before filing what I think is a bug against SQLAlchemy 0.7.

Consider a plain, declarative-mapped object:

class Table(Base):

__tablename__ = 'table'

id = Column(Integer, primary_key=True, nullable=False)

Now consider code to retrieve this item:

function test_identity_map():

db=Session() # scoped_session(sessionmaker(...))

for x in range(5):

print "%d: %s" % (x, db.query(Table).get(1))

If I create a MSSQL engine with a connection string prefix of 'mssql+pyodbc://', set 'echo' to True, populate the table with a single item with an id of '1' and call the test function, five separate SQL statements are echoed to the console. However, if I repeat the process using the SQLite in-memory engine ('sqlite:///:memory:'), only one SQL statement is echoed to the console.

I believe the mssql/pyodbc engine is not caching the item in the identity map. Could it be that the behaviour of 'echo=True' when supplied to create_engine differs between engine types, such that SQL is always echoed for the mssql engine, and the item is retrieved from the identity map anyway (I haven't checked whether the calls are actually emitted to the database, but the timing information leads me to believe they are)? Or could there be some other configuration setting I've overlooked? Could it be that I'm using FreeTDS on linux for the ODBC layer? Stores/retrievals otherwise work just fine. I've looked through the dialect code and can't find anything that might affect this. Am I missing something? I'm going to dig into the session identity map code now to see what I can find, but I'd appreciate it if anyone already knows where I'm going wrong.

Many thanks,

Simon.

Michael Bayer

unread,

Oct 6, 2011, 9:44:16 AM10/6/11

to sqlal...@googlegroups.com

On Oct 5, 2011, at 10:13 PM, Simon Haines wrote:

> I'm fairly new to SQLAlchemy so I want to double-check this before filing what I think is a bug against SQLAlchemy 0.7.
>
> Consider a plain, declarative-mapped object:
>
> class Table(Base):
> __tablename__ = 'table'
> id = Column(Integer, primary_key=True, nullable=False)
>
> Now consider code to retrieve this item:
>
> function test_identity_map():
> db=Session() # scoped_session(sessionmaker(...))
> for x in range(5):
> print "%d: %s" % (x, db.query(Table).get(1))
>
> If I create a MSSQL engine with a connection string prefix of 'mssql+pyodbc://', set 'echo' to True, populate the table with a single item with an id of '1' and call the test function, five separate SQL statements are echoed to the console. However, if I repeat the process using the SQLite in-memory engine ('sqlite:///:memory:'), only one SQL statement is echoed to the console.
>
> I believe the mssql/pyodbc engine is not caching the item in the identity map.

The implementation for the identity map is agnostic of backend.

There are two things that come to mind that could specifically cause this behavior.

One is, the identifier of "1" as an integer does not actually match the datatype received back from the MSSQL database - such as if the database returned a string "1", that would cause the get() above to not locate integer "1" in the cache.

The other is, "x" is garbage collected in between each iteration of the loop, and by the time get() is called again, the underlying state has fallen out of the Session. The Session does not strongly reference items that have no pending changes, but ultimately relies upon weakref callbacks to implement the fact that "1" is no longer present in the identity map. The fact that accessing MSSQL over PyODBC is orders of magnitude slower than a local SQLite database may give it the time to garbage collect in one case that's not in the other.

> Could it be that the behaviour of 'echo=True' when supplied to create_engine differs between engine types, such that SQL is always echoed for the mssql engine, and the item is retrieved from the identity map anyway

if you were doing filter_by(id=1).first(), then yes, but get() specifically checks the cache first and will not emit a SELECT if the item is found.

> (I haven't checked whether the calls are actually emitted to the database, but the timing information leads me to believe they are)? Or could there be some other configuration setting I've overlooked? Could it be that I'm using FreeTDS on linux for the ODBC layer? Stores/retrievals otherwise work just fine. I've looked through the dialect code and can't find anything that might affect this. Am I missing something? I'm going to dig into the session identity map code now to see what I can find, but I'd appreciate it if anyone already knows where I'm going wrong.

in the echo=True, pay close attention to the formatting of the bound parameters - setting echo='debug' will display result rows as they come in. Make sure the "1" in the result row is of the same type as the one bound to the SELECT, and also assign each "x" above to a local list.

Simon Haines

unread,

Oct 12, 2011, 12:37:20 AM10/12/11

to sqlal...@googlegroups.com

Thanks Michael,

On Friday, October 7, 2011 12:44:16 AM UTC+11, Michael Bayer wrote:

There are two things that come to mind that could specifically cause this behavior.
One is, the identifier of "1" as an integer does not actually match the datatype received back from the MSSQL database - such as if the database returned a string "1", that would cause the get() above to not locate integer "1" in the cache.

This was exactly the issue. Many thanks for your help with this--apologies for not getting back to you sooner.

Simon.

Reply all

Reply to author

Forward