SQLAlchemy 0.7.9 breaks SQLAlchemy-ORM-Tree flush behavior

87 views
Skip to first unread message

Mark Friedenbach

unread,
Oct 2, 2012, 2:14:14 PM10/2/12
to sqlal...@googlegroups.com
SQLAlchemy 0.7.9 seems to have broken SQLAlchemy-ORM-tree (http://pypi.python.org/pypi/SQLAlchemy-ORM-tree/). Specifically, SQLAlchemy-ORM-Tree has a dependency on flush behavior prior to the fix for #2566. I'm currently investigating a way to detect (and ignore) the 2nd flush.

But more generally I'm wondering what motivated #2566 in the first place? It has huge compatibility implications for anyone doing tricky things with flush and insert/update/delete events.

Michael Bayer

unread,
Oct 2, 2012, 3:07:53 PM10/2/12
to sqlal...@googlegroups.com
On Oct 2, 2012, at 2:14 PM, Mark Friedenbach wrote:

SQLAlchemy 0.7.9 seems to have broken SQLAlchemy-ORM-tree (http://pypi.python.org/pypi/SQLAlchemy-ORM-tree/). Specifically, SQLAlchemy-ORM-Tree has a dependency on flush behavior prior to the fix for #2566. I'm currently investigating a way to detect (and ignore) the 2nd flush.

But more generally I'm wondering what motivated #2566 in the first place? It has huge compatibility implications for anyone doing tricky things with flush and insert/update/delete events.

#2566 was a serious issue.  If dirty state remains in the session after a flush(), then calling commit() had the effect that *two COMMITs are emitted*, meaning, one COMMIT, then a brand new transaction, then another one, and then any dirty state left by that second commit would just be garbage.   The commit() call flushes all remaining dirty state and it is essential that the session is clean after a commit occurs.    2566's fix should only effect when commit() is called.

Feel free to send me a test case showing a valid usage that is broken by this change.


Mark Friedenbach

unread,
Oct 2, 2012, 3:49:03 PM10/2/12
to sqlal...@googlegroups.com
Well it's hard to boil it down to a specific test case, as it affects the underlying assumptions that went into the design of ORM-tree. Here's an explanation of what I'm doing, and perhaps you can tell me if I'm (ab)using the API correctly:

The meat of the code is a mapper extension whose insert, update, and delete hooks execute SQL expressions directly to update the nested-interval tree parameters. For efficiency I use the SQL expression layer and then manually update the working set of ORM objects to reflect the new state.

In essence:

    connection.execute(...update in sql expression language...)
    for obj in session.identity_map:
        ...same update, as python...

(The session.identity_map is accessible to the mapper extension because it was tucked away as a hidden attribute in the object in a session extension before_flush handler.)

As far as I can tell, the update to the session objects is now triggering a 2nd flush, even though the purpose of the update was to refresh the objects with their current database values. On the 2nd flush the SQL expression updates get executed again, resulting in a corrupt database.

Any red flags in what I'm doing?


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Mark Friedenbach

unread,
Oct 2, 2012, 3:51:27 PM10/2/12
to sqlal...@googlegroups.com
I forgot to mention, this is picked up by SQLAlchemy-ORM-tree's unit tests, if you want to see the failure:

cd sqlalchemy-orm-tree && make check

Michael Bayer

unread,
Oct 2, 2012, 4:05:57 PM10/2/12
to sqlal...@googlegroups.com

On Oct 2, 2012, at 3:49 PM, Mark Friedenbach wrote:

> Well it's hard to boil it down to a specific test case, as it affects the underlying assumptions that went into the design of ORM-tree. Here's an explanation of what I'm doing, and perhaps you can tell me if I'm (ab)using the API correctly:
>
> The meat of the code is a mapper extension whose insert, update, and delete hooks execute SQL expressions directly to update the nested-interval tree parameters. For efficiency I use the SQL expression layer and then manually update the working set of ORM objects to reflect the new state.
>
> In essence:
>
> connection.execute(...update in sql expression language...)
> for obj in session.identity_map:
> ...same update, as python...
>
> (The session.identity_map is accessible to the mapper extension because it was tucked away as a hidden attribute in the object in a session extension before_flush handler.)
>
> As far as I can tell, the update to the session objects is now triggering a 2nd flush, even though the purpose of the update was to refresh the objects with their current database values. On the 2nd flush the SQL expression updates get executed again, resulting in a corrupt database.
>
> Any red flags in what I'm doing?

Basically, when the flush is finished, session.dirty, session.new and session.deleted should be empty. If you add a quick after_flush_postexec() event, and inside the event assert that these are all empty, then you shouldn't get any subsequent flush.

If session.dirty is acquiring state due to attribute set events, as I see a lot of setattr() going on, you can set attributes without establishing history using set_committed_value(): http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#sqlalchemy.orm.attributes.set_committed_value . However, if that's in fact what's happening, I might need to look into that because the "dirty" state of attributes should be unconditionally reset before the flush completes. I just did a test to confirm that, so I'd be curious to see a test (unless I can come up with one here) that illustrates plain setattr() of a column-based attribute causing a second flush. In theory it would be only .new and .dirty that might have state after the flush completes.


Michael Bayer

unread,
Oct 2, 2012, 4:08:45 PM10/2/12
to sqlal...@googlegroups.com

On Oct 2, 2012, at 4:05 PM, Michael Bayer wrote:

> In theory it would be only .new and .dirty that might have state after the flush completes.

correction, ".new and .deleted" lists that might have any state.


Mark Friedenbach

unread,
Oct 2, 2012, 4:28:36 PM10/2/12
to sqlal...@googlegroups.com
Indeed, session.dirty is non-empty within after_flush_postexec().

I'm working on a fix for sqlalchemy-orm-tree first before I can think about doing a (smaller) regression test. Besides, it now occurs to me that in some cases I might be setting attributes on objects in the session but outside of the flush plan (child nodes of a parent that gets moved around, for example). That could legitimately cause an undesired 2nd flush. `set_committed_value` looks like what I want to fix that.

Michael Bayer

unread,
Oct 2, 2012, 4:38:01 PM10/2/12
to sqlal...@googlegroups.com
oh, I know what you're doing, you're modifying the attributes of objects that aren't even involved - so yes, the flush normally doesn't go finding those, and set_committed_value() would be your workaround for now.

However, I can modify flush to do this "reset" for everything that's in "dirty", rather than just what it knows to have changed.    I'd have to think about this as I'm not sure it's appropriate.

Mark Friedenbach

unread,
Oct 2, 2012, 6:03:37 PM10/2/12
to sqlal...@googlegroups.com
I'm not 100% sure that's what was going on (it was hard to get a debug shell open at the moment of the error, as opposed to the assertion much later), but switching those setattr() calls to set_committed_value() certainly fixed it.

I admit I don't understand the session internals well enough to grok what you're saying about resetting session.dirty or it being a deeper bug I've uncovered. If I can be of any assistance with that please let me know. Regardless, for this project it is no longer an issue as objects are no longer getting dirtied by setattr() calls in the mapper extension.

Thanks for your help in pointing me down the right track; I'm pushing out a new version of SQLAlchemy-ORM-tree momentarily.

Cheers,
Mark
Reply all
Reply to author
Forward
0 new messages