celery and race conditions


Jonathan Vanasco

Apr 14, 2014, 5:45:25 PM
to sqlal...@googlegroups.com
I just ran into an issue where it looks like I could have race conditions using SQLAlchemy and Celery. I'm wondering if anyone here has some ideas.

Here's the scenario:

A1 Process A - Pyramid - Creates SQLAlchemy session.
A2 Process A - Pyramid - Creates data
A3 Process A - Pyramid - flushes data
A4 Process A - Pyramid - fires off an async request to Celery
A5 Process A - Pyramid - more operations
A6 Process A - Pyramid - commits.

B1 Process B - Celery - gets async request
B2 Process B - Celery - creates SQLAlchemy session
B3 Process B - Celery - starts pulling data from the database
B4 Process B - Celery - starts writing data to the database
B5 Process B - Celery - Commit


The problem I foresee is that the B series of events could happen between A4 and A6. By luck, they don't appear to happen until after A6, but nothing in my code ensures that ordering.
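To make the race concrete: a flush (A3) sends the INSERT to the database but leaves the transaction open, so a worker that reads at B3, before A6, does not see the row. It either finds nothing, or, if A later rolls back, acts on data that never existed. The sketch below is an illustration under assumptions: Python's built-in `sqlite3` with two connections stands in for the Pyramid and Celery processes (the thread does not name the database). PostgreSQL behaves the same way here, since it never shows uncommitted rows to other sessions.

```python
import os
import sqlite3
import tempfile


def count_images(conn):
    # fetchall() exhausts the cursor so the read lock is released right away.
    return conn.execute("SELECT COUNT(*) FROM image").fetchall()[0][0]


def demo_visibility():
    """Return (rows the worker sees before A6, rows it sees after A6)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        web = sqlite3.connect(path)     # stand-in for Process A (Pyramid)
        worker = sqlite3.connect(path)  # stand-in for Process B (Celery)
        try:
            web.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, name TEXT)")
            web.commit()

            # A2/A3: write the row; the transaction stays open, like a flush.
            web.execute("INSERT INTO image (name) VALUES ('photo.jpg')")

            # B3 running between A4 and A6: the row is not visible yet.
            before = count_images(worker)

            # A6: commit; now the worker can see the row.
            web.commit()
            after = count_images(worker)
        finally:
            web.close()
            worker.close()
    return before, after


print(demo_visibility())  # (0, 1)
```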

Has anyone else dealt with this?

Michael Bayer

Apr 14, 2014, 6:09:25 PM
to sqlal...@googlegroups.com
Sure, you need to send your Celery task after the commit.

I usually gather up “post-commit” tasks in some kind of list and then iterate through them after the commit to run them.
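A minimal sketch of that "post-commit list", using only the standard library: tasks are recorded during the request, sent only once `commit` returns, and dropped if the commit raises. `PostCommitTasks` is a hypothetical helper; in a real app `func` might be a Celery task's `.delay` and `commit` the SQLAlchemy session's `commit`. SQLAlchemy's `after_commit` session event is another place the "run" step could be attached.

```python
class PostCommitTasks:
    """Hypothetical helper: queue work during a request, run it only after commit."""

    def __init__(self):
        self._pending = []

    def add(self, func, *args, **kwargs):
        # Nothing is sent yet; we only remember what to send.
        self._pending.append((func, args, kwargs))

    def commit_then_run(self, commit):
        """Call `commit`; if it raises, drop the queued tasks and re-raise."""
        try:
            commit()
        except Exception:
            self._pending.clear()
            raise
        pending, self._pending = self._pending, []
        for func, args, kwargs in pending:
            func(*args, **kwargs)
        return len(pending)


sent = []
tasks = PostCommitTasks()
tasks.add(sent.append, "optimize image 1")
tasks.commit_then_run(lambda: None)  # stand-in for session.commit
print(sent)  # ['optimize image 1']
```

Because the queue is emptied before anything runs, a task is never sent twice by repeated calls, and a failed commit never leaks a task for data that was rolled back.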



Jonathan Vanasco

Apr 14, 2014, 7:12:12 PM
to sqlal...@googlegroups.com
I've got that now as a stopgap; I was hoping someone had better ideas. I don't like the idea of a post-commit hook, because I fear the Celery task request itself could fail after the data has already been committed. I really don't want to build `transaction` support for Celery, but I might need to.

Wichert Akkerman

Apr 15, 2014, 4:47:24 AM
to sqlal...@googlegroups.com

That isn’t an uncommon scenario; I touched on it in http://www.wiggy.net/articles/task-queues as well. For rq I am using a variant of https://gist.github.com/wichert/10714681. One extra problem to take into account: you are likely to run into trouble when one of the arguments is a SQLAlchemy ORM instance. When your function later runs in another process, that instance won’t be associated with that process's session, so you need to merge it into that session.

Wichert.

Jonathan Vanasco

Apr 15, 2014, 12:29:25 PM
to sqlal...@googlegroups.com
If I have any time after shipping, I'll probably build transaction support for Celery and Pyramid.

I keep away from passing ORM objects around the system. GETs (primary-key lookups) are pretty cheap.

My task arguments are generally:

   int = primary key of the ORM object
   dict = "instructions" payload describing what to do

For image processing, it's generally:

   optimize_images(orm_id, {"file_b64": BLOB, "selected_resizes": []})
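The (primary key, instructions dict) convention can be sketched as a small helper that builds the task arguments. `build_optimize_request` is a hypothetical name, not from the thread. The JSON round trip shows that everything in the payload is a plain value any serializer can carry, unlike an ORM instance, which JSON cannot encode at all and which arrives detached from any session if pickled.

```python
import base64
import json


def build_optimize_request(orm_id, file_bytes, selected_resizes):
    """Return (orm_id, instructions) built only from ints, strings and lists."""
    instructions = {
        # Binary data travels as base64 text so the payload stays JSON-safe.
        "file_b64": base64.b64encode(file_bytes).decode("ascii"),
        "selected_resizes": list(selected_resizes),
    }
    return orm_id, instructions


# Simulate the trip through the broker: serialize, then deserialize.
wire = json.dumps(build_optimize_request(42, b"\x89PNG", ["800x600", "100x100"]))
orm_id, instructions = json.loads(wire)
print(orm_id, base64.b64decode(instructions["file_b64"]))  # 42 b'\x89PNG'
```

In a Celery app this might be dispatched, after the commit, as something like `optimize_images.delay(*build_optimize_request(...))`.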

Celery grabs and uses its own session, pulling data from the database. It optimizes the image, archives it to S3, and does the same for the resizes. Then it updates the database to reflect all the file sizes and locations.
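The worker side described above can be sketched like this, again with `sqlite3` standing in for a SQLAlchemy session. The `image`/`image_file` schema, `optimize_images_worker`, and the `connect`, `optimize` and `archive` callables (the latter two in place of the real image work and S3 upload) are all hypothetical. In a real app the body would sit inside a Celery task (for example one declared with `@app.task`). Raising when the row is missing makes the race from the first message visible instead of silently doing nothing; a Celery task could retry in that case (e.g. via `self.retry` on a bound task).

```python
import base64
import sqlite3


def optimize_images_worker(connect, orm_id, instructions, optimize, archive):
    """Reload the image by primary key, process every size, record the results.

    Hypothetical hooks: `connect` opens the worker's own connection,
    `optimize(data, size)` transforms the image, `archive(id, size, data)`
    uploads it and returns its location."""
    conn = connect()  # the worker's own connection, never one from the web process
    try:
        row = conn.execute("SELECT id FROM image WHERE id = ?", (orm_id,)).fetchone()
        if row is None:
            # Typical symptom of the race: the web request has not committed yet.
            raise LookupError(f"image {orm_id} not found")
        original = base64.b64decode(instructions["file_b64"])
        results = []
        # None stands for the original image, followed by each requested resize.
        for size in [None, *instructions.get("selected_resizes", [])]:
            data = optimize(original, size)
            location = archive(orm_id, size, data)
            results.append((orm_id, size, len(data), location))
        with conn:  # all bookkeeping rows commit together
            conn.executemany(
                "INSERT INTO image_file (image_id, size, bytes, location) "
                "VALUES (?, ?, ?, ?)",
                results,
            )
        return len(results)
    finally:
        conn.close()
```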
