non-deterministic event execution order

48 views
Skip to first unread message

Chad Dombrova

unread,
Aug 6, 2014, 3:46:40 PM8/6/14
to sqlal...@googlegroups.com

Hi all,
I’ve recently discovered that the order in which events are fired for individual instances is not deterministic. For example, if I add several Author instances to an existing Book then commit the changes, the Book‘s post_update may run before or after the Author‘s post_insert, and the same code might produce different results on successive runs.

Here’s some example code (full gist with models is here: https://gist.github.com/chadrik/c3fff520bd41a48f0a57):

@event.listens_for(Book, 'after_update')
def book_after_update(mapper, connection, instance):
    print "after_update", instance, instance.authors

@event.listens_for(Book, 'after_insert')
def book_after_insert(mapper, connection, instance):
    print "after_insert", instance, instance.authors

@event.listens_for(Author, 'after_insert')
def author_after_insert(mapper, connection, instance):
    print "after_insert", instance

def test():
    Base.metadata.create_all(engine)

    book = Book(title='sqlalchemy for dummies')
    session = Session()
    session.add(book)
    session.commit()

    john = Author(name='John Doe')
    jane = Author(name='Jane Doe')
    book.authors = [john, jane]
    book2 = Book(title='sqlalchemy pro tips')
    book2.authors = [john]
    session.add(book2)
    session.commit()

if __name__ == '__main__':
    test()

running test() might print:

after_insert Book(1) []
after_insert Author(1)
after_insert Author(2)
after_insert Book(2) [Author(1)]
after_update Book(1) [Author(1), Author(2)]

or it might print:

after_insert Book(1) []
after_insert Book(2) [Author(None)]
after_update Book(1) [Author(None), Author(None)]
after_insert Author(1)
after_insert Author(2)

it can be tricky to consistently reproduce the behavior. sometimes successive runs change randomly with no intervention, sometimes it requires changing a single line of unrelated code, like a comment (I guess this has something to do with python’s byte-compiling and hashing). sometimes a test that will run consistently in the same order using python -m "sqlbooks" will run consistently in a different order using python -c "import sqlbooks;sqlbooks.test()"

setting __mapper_args__ = {'batch': False} did not help.

it seems that internally sqlalchemy is using unordered containers like dicts or sets when managing the instances passed to these events. my question is: is it possible to provide a meaningful and consistent order to these events? 

ideally, for my case, I’m hoping to have the Author’s post_insert run before the Book’s post_update, such that the Author’s auto-incrementing id’s are available (you’ll notice in the two output blocks that I pasted, that in the second it prints Author(None)) where None should be the integer id. Also note that the relationship between Book and Author is many-to-many and handled by a secondary table, so no data is persisted to the db for the book table. I assume the post_update is purely to notify me that the Book.authors relationship changed, and as such, it would be highly convenient if the items which were added to the relationship were already inserted and had their primary keys.

thanks,
chad

Jonathan Vanasco

unread,
Aug 6, 2014, 4:01:36 PM8/6/14
to sqlal...@googlegroups.com
try calling `session.flush()`

that will "emit sql" and pull insert ids; syncing the SqlAlchemy Session to the database.

Chad Dombrova

unread,
Aug 6, 2014, 5:30:01 PM8/6/14
to sqlal...@googlegroups.com

try calling `session.flush()`

that will "emit sql" and pull insert ids; syncing the SqlAlchemy Session to the database.

In my case, I'm trying to create event handlers that work with existing sqlalchemy code, so modifying all existing code to add flush() statements at strategic points is not an option.  also, calling flush() from inside an event handler seems like a bad idea.  

chad.

Michael Bayer

unread,
Aug 6, 2014, 10:51:56 PM8/6/14
to sqlal...@googlegroups.com
you can’t call flush() within before_insert/before_update/ etc. as you are already inside the flush process.   I think at this point it should know to raise an exception for that action.


On Aug 6, 2014, at 4:01 PM, Jonathan Vanasco <jona...@findmeon.com> wrote:

try calling `session.flush()`

that will "emit sql" and pull insert ids; syncing the SqlAlchemy Session to the database.


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Chad Dombrova

unread,
Aug 6, 2014, 11:10:44 PM8/6/14
to sqlal...@googlegroups.com
On Wed, Aug 6, 2014 at 7:51 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:
you can’t call flush() within before_insert/before_update/ etc. as you are already inside the flush process.   I think at this point it should know to raise an exception for that action.

I only care about after_insert and after_update. is it ok to call flush there?

 

Michael Bayer

unread,
Aug 6, 2014, 11:15:28 PM8/6/14
to sqlal...@googlegroups.com
On Aug 6, 2014, at 3:46 PM, Chad Dombrova <cha...@gmail.com> wrote:

Hi all,
I’ve recently discovered that the order in which events are fired for individual instances is not deterministic.

to be clear, events are always fired in a completely deterministic order.  In this case however, your events are linked to some particular activities performed by the unit of work which themselves obey an ordering that is not in line with your expectations.

There’s a big green box in the docs for all the before/after insert/update/delete events that tries to make it known that there are caveats to using these events in order to coordinate between different types of instances:  "and in general should not affect any relationship() -mapped attributes, as session cascade rules will not function properly, nor is it always known if the related class has already been handled.”   That’s not to say it can’t be done, that’s just there so that people don’t expect things to “just work” when they try coordinating between mappers at this level.  Usually I recommend things like this be done in before_flush().  But you’re looking for auto generated primary keys, so OK read on...
The actual ordering here is based on a topological sort, and a description on how the unit of work makes use of this is in section 20.9 of http://aosabook.org/en/sqlalchemy.html.

In this case, from SQLAlchemy’s point of view, it only needs to emit INSERT for Book and Author before it INSERTs into “link_books_authors”.   It’s deterministic in this regard, but as far as Book and Author themselves, there is no natural ordering to that, so ultimately it is set/dict ordering that takes effect.

If you’d like to affect the topological sort here, it’s doable by creating a dependency between Book and Author.  But what is blowing me away is that there does not seem to be a public API for this as I really recall doing this for someone.    I could show you something that does it but it would not be stable API.

The bigger question is, if Author’s primary key is needed directly for Book, why isn’t there another relationship() between these two mappings?


Michael Bayer

unread,
Aug 7, 2014, 4:51:26 PM8/7/14
to sqlal...@googlegroups.com
On Aug 6, 2014, at 11:15 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:


The actual ordering here is based on a topological sort, and a description on how the unit of work makes use of this is in section 20.9 of http://aosabook.org/en/sqlalchemy.html.

In this case, from SQLAlchemy’s point of view, it only needs to emit INSERT for Book and Author before it INSERTs into “link_books_authors”.   It’s deterministic in this regard, but as far as Book and Author themselves, there is no natural ordering to that, so ultimately it is set/dict ordering that takes effect.

If you’d like to affect the topological sort here, it’s doable by creating a dependency between Book and Author.  But what is blowing me away is that there does not seem to be a public API for this as I really recall doing this for someone.    I could show you something that does it but it would not be stable API.


Well anyway, it’s not terrible, if you want to play with it.  This API hasn’t changed in a couple of years and nothing is planned, but it isn’t guaranteed:

@event.listens_for(Session, "before_flush")
def setup_author_book_dependency(session, ctx, objects):
    from sqlalchemy.orm.unitofwork import SaveUpdateAll
    ctx.dependencies.add(
        (
            SaveUpdateAll(ctx, Author.__mapper__),
            SaveUpdateAll(ctx, Book.__mapper__)
        )
    )

I’m fine with this kind of thing becoming available via a simpler public API but it would have to be carefully planned so that we aren’t forced to change it at some point down the road.


Then within your test, to force it to randomize the order you can do this:

def test():
    from sqlalchemy.orm.util import randomize_unitofwork
    randomize_unitofwork()
    # .. rest of test

You’ll find that with the above randomize, your test case is very different 50% of the time, unless you add that dependency between Author and Book, in which case it runs identically every time.




Reply all
Reply to author
Forward
0 new messages