Practical ways for dealing with concurrency problems?

1,547 views
Skip to first unread message

Russell Warren

unread,
Nov 9, 2010, 11:51:17 PM11/9/10
to sqlalchemy
I've got several questions about dealing with concurrency issues in
SQLAlchemy. To try and help with the asking, I'll use a small example
similar to a situation I've got in an SQLAlchemy application. In this
application, there is a table that tracks objects and their order/
position, something like below...

Name Pos
---- ---
ObjA 1
ObjB 2
ObjC 3

And this can change to the following with a new object inserted
"before" (position-wise) ObjB...

Name Pos
---- ---
ObjA 1
ObjB 3
ObjC 4
NewObj 2

Some sample code is whipped up to do this is at the bottom of this
post.

Worth noting is that my application is currently database-agnostic
(SQLAlchemy enabling this is a great feature), and I'd like to keep it
that way if at all possible.

My questions revolve around concurrency issues because the insert
operation can potentially update all rows in the table. It seems that
this is very vulnerable to concurrency problems since each transaction
will only update its own rows that existed at the start of the
transaction. If a parallel transaction is started doing a similar
insertion, a record can be potentially get a duplicate position
number, which would destroy the application logic.

I'm much more of an application person than a DB person, so to solve
this my immediate thinking is to simply do it all at the application
layer and use a thread lock on the table updating function. All table
access is routed through the application (which uses scoped sessions)
so this would certainly "work", but I'm certain that those more
familiar with sqla and the DB layer would likely cringe at that
approach.

I really want to learn more about dealing with concurrency at the
database level through sqlalchemy. What is a good way to deal with
this in sqla?

One possibility seems to be to get it all done in one atomic commit.
But how can I use the ORM (together with the expression language?) to
increment by 1 all positions where the position value is > than the
position for a given object name? For a single value, the docs show
me that can get the proper SQL emitted for a record with something
like:

obj.position = obj.__table__.c.position + 1

where obj is one of my mapped objects. But how do I do it for
multiple rows and not just the one obj? Also - is going through
__table__ the most direct way for achieving this?

After the application layer blocking, the next most brutal thing that
comes to mind is locking the entire table during the operation to
avoid problems. I'm not clear this will work since it seems a read
lock is needed, and I don't yet know if you can actually do that. It
may be irrelevant, since I certainly do not see any way of doing table
locks through sqla at the moment. If locking were an option, it seems
it would require direct (potentialy DB specifc) SQL statement
execution through session.execute() or conn.execute(). Is that
correct?

Something that I've seen referenced a few times for dealing with
concurrency in sqlalchemy is to use "with_lockmode()" on a query to
lock the relevant rows. However - I'm wary of this for a few
reasons: First is that I don't think it would work in my example case
because read blocking is needed (I think). Second is that I can't
actually figure out what with_lockmode actually does. Digging in the
code I see the lock options are "read", "update", "update_nowait".
Short of finding this in query.py, where are the docs for these
options?? The third bit of wariness is uncertain support for
different dialects. Digging through my 'SQL in a Nutshell' I see
varying DB support for "SELECT FOR UPDATE" syntax (which is what
with_lockmode seems to employ?) across various RDMS so I'd rather stay
away from it if possible. What is the best way to figure out the
database dialect support for each of these options, short of trying
with all potential database types?

Although I'm certainly interested in specific ideas on dealing with
something similar to the table update described, any links or tips
anyone has for dealing with concurrency in SQLAlchemy would be
appreciated. This includes any expansions anyone has on how the UOW
pattern and ORM transactions alone help with some forms of
concurrency, as I know they do to some extent.

Thanks,
Russ

## sample code with concurrency issue...

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy import create_engine
from pprint import pprint
engine = create_engine('sqlite:///:memory:', echo = False)
metadata = MetaData()

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind = engine)

Base = declarative_base()
class OrderedObject(Base):
__tablename__ = 'objects'

id = Column(Integer, primary_key = True)
name = Column(String)
position = Column(Integer)

def __init__(self, name, name_to_bump = ""):
self._ormSess = Session() #does embedding this scare anyone?
self.name = name
if name_to_bump:
self._InsertBefore(name_to_bump)
else:
#Count below may be inaccurate with concurrent users!
newPos = self._ormSess.query(OrderedObject).count() + 1
self.position = newPos
self._ormSess.add(self)
self._ormSess.commit()

def _InsertBefore(self, name_to_bump):
#This (potentially) modifies all rows in the table. Need to
watch out
#for concurrency issues...
self._ormSess = Session()
allObjects = self._ormSess.query(OrderedObject).all()
pos = 0
for obj in allObjects:
pos += 1
if obj.name == name_to_bump:
self.position = pos
pos += 1
obj.position = pos

def __repr__(self):
return "<Object('%s' at %d)>" % (self.name, self.position)

Base.metadata.create_all(engine)

sess = Session()
initialObjects = [OrderedObject("Object_%s" % chr(65 + i)) for i in
range(5)]
print "Initial Objects..."
print "------------------"
pprint(sess.query(OrderedObject).order_by(OrderedObject.position).all())

OrderedObject("BEFORE_B", "Object_B")
print "\nBumped Objects..."
print "------------------"
pprint(sess.query(OrderedObject).order_by(OrderedObject.position).all())

Michael Bayer

unread,
Nov 10, 2010, 9:52:31 AM11/10/10
to sqlal...@googlegroups.com
Hi Russell -

There's a lot going on here, so I'll point you to a few things.


1. the "nested sets" example, which also has the need to do "update table set x=x+1" in order to make room for new entries, thats in the distro at examples/nested_sets/nested_sets.py . That example uses mapper extensions and stuff for a very specific use case which is probably more elaborate than what you have here, the main thing you're looking at there are the UPDATE statements. You definitely want to be using a single UPDATE to increment all the subsequent rows, however.

2. with_lockmode() is a gateway to "SELECT..FOR UPDATE". If you are selecting the rows first before updating them, this is a way to "lock" them, and it does in effect do "read" locking against other similar transactions, since they would be calling "SELECT..FOR UPDATE" as well and therefore would be locked out. PG's docs have the best explanation: http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE

3. you need to be mindful of transaction isolation mode, and the implications thereof. READ_COMMITTED and SERIALIZABLE are common. Take a look at MySQL's and PG's docs on this to get a feel for them. SQLA allows this to be set via a parameter to create_engine().

4. when you use the ORM and the Session, you're in a transaction. If you want to work with straight SQL statements in that transaction, you just use the execute() method. http://www.sqlalchemy.org/docs/orm/session.html#using-sql-expressions-with-sessions

hope this helps you get started.

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Russell Warren

unread,
Nov 10, 2010, 11:09:32 AM11/10/10
to sqlalchemy
Thanks very much, Michael. I'll be looking into each point in detail.

Regarding the nested sets example, I've been scouring it since it is
much closer to my real use case which is hierarchical data. As tends
to happens more often than I like, I ended up finding out that I
basically re-invented an existing pattern (nested sets), albeit in a
slightly different form. I think in ordered treelists with indent
levels instead of nests, so my nodes have 'position', and
'position_of_bottom_leaf' instead of 'left' and 'right'. But after
looking into the nested sets pattern, I'm pretty sure they net the
same darned thing (verdict still out). I do think "bottom_leaf" is
much more intuitive to understand than twisting our perspectives to
match "right", but whatever. If I had seen nested sets in advance I
could have saved myself some brain ache for figuring out how to manage
my hierarchies!!

Anyway...

Sticking with the crude list bumping example that ends up doing a
multi-row update, and using some sql syntax like in your nested sets
example, my _InsertBefore method ends up working looking like this:

def _InsertBefore(self, name_to_bump):
objects = self.__table__
insertLoc = self._ormSess. \
query(OrderedObject.position). \
filter_by(name = name_to_bump). \
subquery().as_scalar()
#set the position of the new object...
# - decrement of 1 is odd, but is a result of the scalar
insertLoc,
# which will have been bumped by the time this object is flushed
self.position = insertLoc - 1
#and bump all subsequent object positions...
update = objects.update(objects.c.position >= insertLoc). \
values(position = objects.c.position + 1)
self._ormSess.execute(update)

The "insertLoc - 1" oddity is there because of me trying to save on an
emitted SQL statement by using subquery().as_scalar(). Although I
suppose it is not needed from a concurrency perspective since I'm in a
transaction and I can be sure no one else is updating it.

I think I'm closer to getting it right down at the database level
versus a high level abomination retrieving the entire table to achieve
an update. Can I get any lower, or are there any other remaining
abominations?

Russ
Reply all
Reply to author
Forward
0 new messages