> I was getting some strange transaction isolation behavior with
> SQLAlchemy (0.7.2), psycopg2 (2.4.2), and PostgreSQL 8.4. In order to
> investigate I wrote up a usage sequence that does this:
>
> 1. starts a transaction with a session (s1)
> 2. starts another transaction with session (s2)
> 3. updates a value in s1 and commits it
> 4. reads the value back in s2 using the ORM...
> - and it does not get the updated value, although with READ COMMITTED
> it should
> 5. reads the value back in s2 using a direct s2.execute statement...
> - and it DOES get the updated value (??)
>
> I don't understand why the ORM-triggered read (which does emit SQL) is
> not getting the updated value, but the direct statement is getting the
> update.
>
> When the logger emits SQL I thought it always sent the SQL to the
> database. Is this a correct assumption?
>
> Here is my complete (and somewhat verbose) test code that shows the
> behaviour...
> http://static.inky.ws/syn/325
Thanks for the clear example case. So yeah this is basic identity map stuff. On line 88, "s2c1 =", you'll find that the s2c1 you are getting there is the same object as the one you loaded on line 77:
s2c1_a = s2.query(Counter).filter_by(name = "C1").one()
assert s2c1 is s2c1_a
So suppose you're in a series of steps, and you loaded s2c1, and then as a matter of course you looked at s2c1.count, saw that it was "1", then continued doing things based on that assumption. If a concurrent transaction were to come in from underneath and change "1" to "222" while you were still in your ongoing operation, you might be pretty upset (in this case you wouldn't be, but consider the implications if it were across the board...especially on a non-isolated system like MySQL).
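The identity-map guarantee described above can be sketched standalone. This is a hedged reconstruction, not the poster's script: it assumes an in-memory SQLite database as a stand-in for PostgreSQL, modern SQLAlchemy spellings (declarative_base from sqlalchemy.orm), and a hypothetical Counter class mirroring the linked test code:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Counter(Base):
    __tablename__ = "counter"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    count = Column(Integer)

engine = create_engine("sqlite://")  # in-memory stand-in for PostgreSQL
Base.metadata.create_all(engine)

s = Session(engine)
s.add(Counter(name="C1", count=1))
s.commit()

# Two separate queries in one session hand back the *same* Python object:
# the identity map keeps one object per row identity per session.
a = s.query(Counter).filter_by(name="C1").one()
b = s.query(Counter).filter_by(name="C1").one()
assert a is b
```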
It's only at commit() time that information like this is reconciled, and from there it follows that the ORM, since it is just a Python library, keeps things simple and behaves as though transactions were perfectly isolated - which means that no unexpired state on any object is overwritten.
The mapped attributes can't reasonably behave as "pure" proxies to the absolute state of a row in the database at all times - it wouldn't be feasible to emit SELECT statements on every access or otherwise do some kind of "live cursors everywhere" approach. In the absence of that, SQLAlchemy can never really know if a concurrent, less than purely isolated transaction has done something to that attribute which may be visible in the current transaction. So we demarcate the point of transaction end, that is commit() or rollback(), as the point when everything is expired unconditionally, and we are again willing to "look" at what's changed.
This should not be construed to mean that you should only use the ORM with SERIALIZABLE isolation. It's instead just something to be aware of. You can of course expire any object or individual attribute at any time. In this case, if you were to add s2.expire(s2c1) on line 88, you'd then get the "222" value on the next check as it would emit a SELECT.
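The expire() remedy just described can be sketched end-to-end. Again a hedged reconstruction under assumed names: a file-backed SQLite database (so the two sessions get genuinely separate connections, unlike in-memory SQLite), modern SQLAlchemy spellings, and a hypothetical Counter class:

```python
import os
import tempfile

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Counter(Base):
    __tablename__ = "counter"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    count = Column(Integer)

# File-backed SQLite so s1 and s2 use separate connections.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
engine = create_engine("sqlite:///" + path)
Base.metadata.create_all(engine)

setup = Session(engine)
setup.add(Counter(name="C1", count=1))
setup.commit()
setup.close()

s2 = Session(engine)
s2c1 = s2.query(Counter).filter_by(name="C1").one()

# A concurrent session changes the row and commits.
s1 = Session(engine)
s1.query(Counter).filter_by(name="C1").update({"count": 222})
s1.commit()
s1.close()

# Re-querying in s2 emits a SELECT, but the incoming row is discarded:
# s2c1 is in the identity map and unexpired, so its state is untouched.
assert s2.query(Counter).filter_by(name="C1").one() is s2c1
stale = s2c1.count        # still the originally loaded value

# After expire(), the next attribute access re-SELECTs and sees the
# concurrently committed value.
s2.expire(s2c1)
fresh = s2c1.count

s2.close()
engine.dispose()
os.unlink(path)
```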
>
> The postgres engine string obviously needs to be changed
> appropriately. And WATCH OUT for the drop tables code in there for
> anyone who tries this, if nutty enough to point at a live database.
>
> What is going on? I expect/hope it is something dumb on my end, but I
> just don't see it.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>
Thanks for clarifying with a full answer!

> This should not be construed to mean that you should only use the ORM
> with SERIALIZABLE isolation. It's instead just something to be aware of.
> You can of course expire any object or individual attribute at any time.
> In this case, if you were to add s2.expire(s2c1) on line 88, you'd then
> get the "222" value on the next check as it would emit a SELECT.

But it does seem to emit a SELECT! The output surrounding the line 88 query is below:

Querying s2 again (whose transaction is still open) to see what it gets for c1...
SQL >> SELECT counter.id AS counter_id, counter.name AS counter_name, counter.count AS counter_count
SQL >> FROM counter
SQL >> WHERE counter.name = %(name_1)s
SQL >> {'name_1': 'C1'}
s2 gets C1.count = 1

... so what were those SQL emissions about? They did not seem to arrive at the database (based on the result, anyway).
when you say query(Class).filter(<criterion>).one(), that always emits SQL. It's an open-ended query, and SQLAlchemy doesn't know that s2c1 is the object which corresponds to the SQL you're about to emit. When the row comes back, it then extracts the primary key from the incoming row, determines that identity is already in the identity map and is unexpired, and the row is skipped
(that's the disconnect here - incoming rows are not used if the existing identity exists and is unexpired).
if OTOH you had said query(Class).get(pk), that would pull from the identity map directly and not even hit the DB in this case.
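The filter().one()-vs-get() distinction can be shown by counting statements with a before_cursor_execute event hook. A hedged sketch: Session.get() below is the 1.4+ spelling of the query(Class).get(pk) call mentioned above, and the Counter class is a hypothetical stand-in:

```python
from sqlalchemy import create_engine, event, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Counter(Base):
    __tablename__ = "counter"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    count = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []

@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    # Record every statement actually sent to the DBAPI cursor.
    statements.append(statement)

s = Session(engine)
s.add(Counter(id=1, name="C1", count=1))
s.commit()

obj = s.query(Counter).filter_by(name="C1").one()  # SELECT; obj now unexpired
n = len(statements)

# filter().one() is open-ended criteria, so it always round-trips...
s.query(Counter).filter_by(name="C1").one()
after_filter = len(statements)

# ...while get-by-primary-key is satisfied from the identity map: no SQL.
same = s.get(Counter, 1)
after_get = len(statements)
```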
> when you say query(Class).filter(<criterion>).one(), that always emits
> SQL. It's an open-ended query, and SQLAlchemy doesn't know that s2c1 is
> the object which corresponds to the SQL you're about to emit. When the
> row comes back, it then extracts the primary key from the incoming row,
> determines that identity is already in the identity map and is
> unexpired, and the row is skipped

It ignores new data? I'm clearly going to need to be even more careful about this when I expect 'read committed' behaviour from the database.

> (that's the disconnect here - incoming rows are not used if the
> existing identity exists and is unexpired).

Is there a global, or per-session, way to change this ignore-new-data behavior (aside from manually expiring objects prior to querying for data)? Or, even more brutal, can the identity map feature be turned off altogether on a per-session basis?
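On the per-query side of that question, SQLAlchemy does provide Query.populate_existing(), which tells the ORM to overwrite already-loaded, unexpired objects with the incoming rows instead of discarding them (check the 0.7 documentation for its exact availability there). A hedged sketch under the same assumptions as before (file-backed SQLite stand-in, hypothetical Counter class):

```python
import os
import tempfile

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Counter(Base):
    __tablename__ = "counter"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    count = Column(Integer)

# File-backed SQLite so the two sessions use separate connections.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
engine = create_engine("sqlite:///" + path)
Base.metadata.create_all(engine)

setup = Session(engine)
setup.add(Counter(name="C1", count=1))
setup.commit()
setup.close()

s2 = Session(engine)
s2c1 = s2.query(Counter).filter_by(name="C1").one()

# A concurrent session changes the row and commits.
s1 = Session(engine)
s1.query(Counter).filter_by(name="C1").update({"count": 222})
s1.commit()
s1.close()

# populate_existing() overwrites the unexpired identity-mapped object
# with the incoming row, rather than skipping the row.
refreshed = s2.query(Counter).filter_by(name="C1").populate_existing().one()
same_object = refreshed is s2c1
new_value = s2c1.count

s2.close()
engine.dispose()
os.unlink(path)
```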
For those cases when 'read committed' and the associated transaction-to-transaction contamination is actually the desired behavior, having to manually tell SQLAlchemy to expire everything all the time so that it doesn't ignore new incoming data is quite an extra layer to worry about on top of the already treacherous concurrency issues around 'read committed'. I always knew the identity map was something to be wary of with concurrency, since it is effectively another layer of isolation on top of what the DB is set up to do, but I didn't consider the fact that newly read data would be ignored.

From earlier you said:

> If a concurrent transaction were to come in from underneath and change
> "1" to "222" while you were still in your ongoing operation, you might
> be pretty upset
But I shouldn't be! That is exactly how 'read committed' is supposed to behave.
If I don't want that, I should be setting "repeatable read" or "serializable". Although I guess it's true that most developers don't quite know about transaction isolation levels (I sure didn't until fairly recently) and might be upset with SQLAlchemy at the first layer, if it did happen to them...
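On that note, modern SQLAlchemy lets you request an isolation level when creating the engine, via the per-dialect isolation_level parameter (the psycopg2 dialect accepts values like "REPEATABLE READ" and "SERIALIZABLE"; whether the 0.7/2.4.2 combination in this thread supports it should be checked against those docs). A sketch using SQLite, which accepts "SERIALIZABLE":

```python
from sqlalchemy import create_engine, text

# With PostgreSQL you would use a postgresql:// URL and e.g.
# isolation_level="REPEATABLE READ" or "SERIALIZABLE".
engine = create_engine("sqlite://", isolation_level="SERIALIZABLE")
with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()
```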