select with_for_update with nowait=True


Uri Okrent

May 28, 2017, 11:16:43 AM
to sqlalchemy
Background:
I have a Postgres database containing objects with 1-N (parent-child) relationships. Each object can have several of these types of relationships (i.e., more than one parent). These are connected by foreign keys with 'on delete set null', since a child can lose a parent and that is OK. Because cascading updates occur in an undefined order, it's possible for two concurrent parent modifications/removals to deadlock, since their updates may cascade to the same children in a different order. To avoid this problem I've been using select with_for_update to serialize the modifications to the children.

My issue is, I would like to use nowait=True so that I can programmatically control the number of attempts to make, and the time to wait between attempts, to obtain all the row locks before failing the command. As I understand it, nowait=False will potentially block forever, which I don't want. However, if I use nowait=True, I believe a failure to obtain the row locks on the first attempt results in a DBAPIError, which invalidates the transaction and forces a rollback. That essentially misses the point: a rollback at this point would undo any previous work the transaction had done, which means I would need to fail the command anyway, so reattempting to acquire the row locks no longer makes sense.

I have a feeling I already know the answer to this question (that I should do all my locking up front, but that is difficult for reasons I'll get into below). I just wanted to make sure I wasn't missing something, i.e., a way to attempt to lock without risking invalidating the whole transaction.
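
Roughly, the pattern I'm after looks something like this (just a sketch to illustrate; Child, parent_id, and the retry parameters are made-up names, not my real model):

import time
from sqlalchemy.exc import DBAPIError

def lock_children(session, parent, attempts=5, delay=0.1):
    # try to lock all child rows without blocking, retrying a few times
    # before giving up
    for _ in range(attempts):
        try:
            return (
                session.query(Child)
                .filter(Child.parent_id == parent.id)
                .with_for_update(nowait=True)
                .all()
            )
        except DBAPIError:
            # the problem: on Postgres this error has already aborted the
            # whole transaction, so simply retrying here doesn't work
            time.sleep(delay)
    raise RuntimeError("could not lock children of %s" % parent.id)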


Aside...
I have a legacy codebase which was written very object-centric.  I've been working on improving performance by introducing many bulk operations.  However, I've noticed a recurring pattern which essentially causes me to have two versions of every operation--the old object-centric version, and a new bulk version.  I'll illustrate with an example:

E.g., remove_object looks something like this:

# DBObject is an imaginary base class that provides SA mappings
class Foo(DBObject):

    def remove(self):
        lock_children(self)
        remove(self)

Removing multiple objects used to look like this:
def remove_objects(obj_list):
    for o in obj_list:
        o.remove()


However, that results in the following execution, where the locking of children and the removal of each object are interleaved, which is not ideal:
remove_objects([a, b, c]) ->
    lock_children(a)
    a.remove()
    lock_children(b)
    b.remove()
    lock_children(c)
    c.remove()

Ideally, I would of course like to have all the prep work done up front, so execution occurs more like this:

remove_objects([a, b, c]) ->
    lock_children(a)
    lock_children(b)
    lock_children(c)
    a.remove()
    b.remove()
    c.remove()


If I were designing the model today, I probably would do it such that single object modifications are handled as the degenerate case of multiple object updates.  Something like this:

class Foo(DBObject):

    @classmethod
    def remove_many(cls, obj_list):
        for o in obj_list:
            lock_children(o)
        for o in obj_list:
            remove(o)

    def remove(self):
        Foo.remove_many([self])


However, this seems like a really unintuitive way to have designed it without the benefit of hindsight. Just wondering if you, or anyone else who happens to be reading the mailing list, has had similar scaling-up pains and found some elegant tricks for handling situations like this.

mike bayer

May 29, 2017, 12:27:51 PM
to sqlal...@googlegroups.com


On 05/28/2017 11:16 AM, Uri Okrent wrote:
> Background:
> I have a postgres database containing objects with 1-N (parent-child)
> relationships. Each object can have several of these types of
> relationships (i.e., more than one parent). These are connected by
> foreign keys with 'on delete set null' since a child can lose a parent
> and that is OK. Because cascading updates occur with an undefined
> order, it's possible for two concurrent parent modifications/removals to
> deadlock since their updates may cascade to the same children in
> different order. To avoid this problem I've been using select
> with_for_update to serialize the modifications to the children.
>
> My issue is, i would like to use nowait=True so that i can
> programmatically control the number of attempts to make and the time to
> wait between attempts to obtain all the row locks, before failing the
> command. As I understand it, nowait=False will potentially block
> forever, which I don't want. However, if I use nowait=True, I believe a
> failure to obtain the row locks on the first attempt results in a
> DBAPIError which invalidates the transaction

I'm assuming this "invalidates the transaction" is on the PostgreSQL
side, e.g. you get "current transaction is aborted". There is a simple
solution for that, which is to use a savepoint; with the ORM that's via
session.begin_nested().

http://docs.sqlalchemy.org/en/latest/orm/session_transaction.html#session-begin-nested


The ORM only "forces a rollback" itself within the context of a flush()
which is not the case here.
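
A rough sketch of how that might look with your retry loop (untested; Child and the retry parameters are placeholders, and the failed lock shows up as an OperationalError, a DBAPIError subclass):

import time
from sqlalchemy.exc import OperationalError

def lock_children_with_retry(session, parent, attempts=5, delay=0.1):
    # each attempt runs inside a SAVEPOINT, so a failed lock only rolls
    # back to the savepoint instead of aborting the whole transaction
    for _ in range(attempts):
        session.begin_nested()  # SAVEPOINT
        try:
            children = (
                session.query(Child)
                .filter(Child.parent_id == parent.id)
                .with_for_update(nowait=True)
                .all()
            )
            session.commit()  # RELEASE SAVEPOINT; outer transaction stays open
            return children
        except OperationalError:
            session.rollback()  # ROLLBACK TO SAVEPOINT; prior work is preserved
            time.sleep(delay)
    raise RuntimeError("could not lock children")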



> I have a feeling I already know the answer to this question (that I
> should do all my locking upfront, but that is difficult for reasons
> which I'll get into below). I just wanted to make sure I wasn't missing
> something, i.e., a way to attempt to lock, without risking invalidating
> the whole transaction.

yup, use SAVEPOINT.

>
>
> Aside...
> I have a legacy codebase which was written very object-centric. I've
> been working on improving performance by introducing many bulk
> operations. However, I've noticed a recurring pattern which essentially
> causes me to have two versions of every operation--the old
> object-centric version, and a new bulk version. I'll illustrate with an
> example:
>
> eg, remove_object looks something like this:
>
> # DBObject is an imaginary base class that provides SA mappings
> class Foo(DBObject):
>
>     def remove(self):
>         lock_children(self)
>         remove(self)
>
> Removing multiple objects used to look like this:
>
> def remove_objects(obj_list):
>     for o in obj_list:
>         o.remove()


So this is the active-record style pattern, which, note, the SQLAlchemy
ORM very specifically tries to discourage. Lots of users are very used
to it:

my_object.save()
my_object.delete()


but you're then bundling a very limited notion of "persistence" onto the
object itself and losing the concept of any kind of transactional
grouping of things. Even though SQLAlchemy doesn't include these
methods, people want to add them anyway.

>
>
> However, that results in the following execution where the locking
> children and removal of each object is interleaved which is not ideal:
> remove_objects([a, b, c]) ->
>     lock_children(a)
>     a.remove()
>     lock_children(b)
>     b.remove()
>     lock_children(c)
>     c.remove()
>
> ideally, i would of course like to have all the prep-work done up front
> so execution occurs more like this:
>
> remove_objects([a, b, c]) ->
>     lock_children(a)
>     lock_children(b)
>     lock_children(c)
>     a.remove()
>     b.remove()
>     c.remove()
>
>
> If I were designing the model today, I probably would do it such that
> single object modifications are handled as the degenerate case of
> multiple object updates.

This is not really part of the design of the "model" as much as of the
application-layer API that's bolted onto your "model", i.e. a
persistence layer. I'd recommend using standard refactoring techniques
to remove all the top-level persistence methods from your mapped objects
and move them into a service layer (e.g. your remove_objects([a, b, c]) layer).



>
> However this seems like a really unintuitive way to have designed it
> without the benefit of hindsight. Just wondering if you or anyone who
> happens to be reading the mailing list has had similar scaling-up pains
> and found some elegant tricks for handling situations like this.

This is something that is addressed by refactoring. You create the
new service layer you want to use, deprecate the old API, and then
slowly move everyone to use the new system. You can also make the old
API methods implicitly make use of the service layer, so that everything
goes down through the same codepaths.
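
Something along these lines, for example (just a sketch using the names from your example; how you obtain the Session is up to you):

from sqlalchemy.orm import object_session

# service layer: the bulk operation is the primary API
def remove_objects(session, obj_list):
    # all the prep work (locking) happens up front...
    for o in obj_list:
        lock_children(o)  # your existing helper
    # ...then the actual removals
    for o in obj_list:
        session.delete(o)

class Foo(DBObject):
    # the legacy single-object API becomes a thin wrapper over the
    # service layer
    def remove(self):
        remove_objects(object_session(self), [self])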




Uri Okrent

Jun 5, 2017, 5:52:52 PM
to sqlalchemy


On Monday, May 29, 2017 at 12:27:51 PM UTC-4, Mike Bayer wrote:
I'm assuming this "invalidates the transaction" is on the Postgresql
side, e.g. you get "current transaction is aborted".   There is a simple
solution for that which is to use a savepoint, which with the ORM is via
session.begin_nested().

http://docs.sqlalchemy.org/en/latest/orm/session_transaction.html#session-begin-nested

Ah, right, SAVEPOINT. Thanks, that's perfect. Just a follow-up regarding the behavior. The docs provide the following example:

Session = sessionmaker()
session = Session()
session.add(u1)
session.add(u2)

session.begin_nested()  # establish a savepoint
session.add(u3)
session.rollback()  # rolls back u3, keeps u1 and u2

session.commit()  # commits u1 and u2

It also says each use of begin_nested() must have a corresponding commit() or rollback(). I get that the rollback in the example only rolls back to the last savepoint (the begin_nested()), but what would a commit right after session.add(u3) do? It looks like the non-error case would look like this:
Session = sessionmaker()
session = Session()
session.add(u1)
session.add(u2)

session.begin_nested()  # establish a savepoint
session.add(u3)
session.commit()  # ???

session.commit()  # ???

What does each commit actually do?

mike bayer

Jun 6, 2017, 3:18:11 AM
to sqlal...@googlegroups.com


On 06/05/2017 05:52 PM, Uri Okrent wrote:
>
>
> On Monday, May 29, 2017 at 12:27:51 PM UTC-4, Mike Bayer wrote:
>
> I'm assuming this "invalidates the transaction" is on the Postgresql
> side, e.g. you get "current transaction is aborted". There is a
> simple
> solution for that which is to use a savepoint, which with the ORM is
> via
> session.begin_nested().
>
> http://docs.sqlalchemy.org/en/latest/orm/session_transaction.html#session-begin-nested
>
>
>
> Ah, right, SAVEPOINT. Thanks, that's perfect. Just a follow up
> regarding the behavior. The docs provide the following example:
>
> Session = sessionmaker()
> session = Session()
> session.add(u1)
> session.add(u2)
>
> session.begin_nested()  # establish a savepoint
> session.add(u3)
> session.rollback()  # rolls back u3, keeps u1 and u2
>
> session.commit()  # commits u1 and u2
>
> It also says each use of begin_nested() must have a corresponding
> commit() or rollback(). I got that the rollback in the example only
> rolls back to the last savepoint (begin_nested()), but what would a
> commit right after session.add(u3) do? It looks like the non-error case
> would look like this:



The commit corresponding to begin_nested() will emit RELEASE SAVEPOINT; the outer commit that follows then emits a real COMMIT, so in the non-error case u1, u2, and u3 are all persisted.

> Session = sessionmaker()
> session = Session()
> session.add(u1)
> session.add(u2)
>
> session.begin_nested()  # establish a savepoint
> session.add(u3)
> session.commit()  # ???
>
> session.commit()  # ???
>
> what does each commit actually do?
>
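
Annotating the non-error case (a sketch; u1/u2/u3 are the mapped objects from the docs example above):

session.add(u1)
session.add(u2)

session.begin_nested()  # establishes a SAVEPOINT
session.add(u3)
session.commit()  # flushes pending objects, then RELEASE SAVEPOINT;
                  # the enclosing transaction is still open

session.commit()  # COMMIT of the whole transaction; u1, u2 and u3
                  # are now all permanent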