That's exactly what's happening, and it's a big fat warning for anyone
using asynchronous index updates.
The nasty thing is that there won't be any errors - your index will
just quietly accumulate stale records...
Two options which I _don't_ think solve the problem:
1. Put a delay into the task so that the transaction containing the
save() and the post_save signal handling can finish (a minimal sketch
of this follows below the list).
    a) How long is long enough? Whenever it isn't, the index will
quietly get updated with pre-update data, i.e. it will keep stale
records.
    b) A long-running request that changes lots of objects (example:
1000) inside its transaction can keep that transaction open for quite
a while (example: 10 seconds). That has two problems: The delay given
to the task needs to be long enough to let such a long-running process
finish, so you'd need the "worst case" delay (in the example: 10
seconds) - and probably a lot more than that to be on the safe side.
But also, every task (one is spawned per instance.save()) now gets
delayed by that worst-case time, and with 1000 such tasks queued that
can add up (in the example: 1000*10 seconds = 10,000 seconds), leaving
the index out-of-date for quite a while. If you need accurate search
results, "waiting a while" will not do the trick. Any tasks from
subsequent changes will go to the back of the line and worsen the
situation.
2. Hook into a "post commit" signal.
    a) Such a signal doesn't exist in Django and requires a
monkeypatch such as
https://github.com/davehughes/django-transaction-signals
    b) For the purpose of indexing it still doesn't help much - the
transaction can contain multiple changed objects, so how would the
handler identify the "sender"?
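For reference, option 1 in code would look something like this - a
minimal sketch assuming Celery; the update_index task and its
arguments are made up for illustration:

    # Hypothetical post_save handler illustrating option 1 (the delay
    # approach). Assumes Celery; update_index is a made-up task.
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from myapp.tasks import update_index  # hypothetical Celery task

    @receiver(post_save)
    def enqueue_index_update(sender, instance, **kwargs):
        # The countdown must cover the *worst case* transaction
        # duration, which is exactly the problem described in 1b).
        update_index.apply_async(
            args=(sender._meta.app_label, sender._meta.model_name,
                  instance.pk),
            countdown=10,  # guesswork - too short leaves stale records
        )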
Therefore I'm currently playing with the following idea. Granted,
it's PostgreSQL-specific, but I guess other databases have similar
facilities:
3. Pass the transaction id to the task and let it wait until that
transaction is closed:
    a) Inside the post_save() signal handler, get the current
transaction id with raw SQL: "SELECT txid_current()" returns e.g.
24291528.
    b) When instantiating the Task, also pass it the transaction id
(not just app_name, model_name, pk).
    c) Inside the Task, don't start the index update until the locks
for that transaction have disappeared (which will happen on either
rollback or commit): "select count(*) from pg_locks where
transactionid=24291528;" - just loop until the lock(s) have
disappeared, then update the index. A sketch of a)-c) follows below
the list.
    d) This could be improved: maybe with a timeout value that aborts
the update, and maybe by differentiating between committed and
rolled-back transactions. I personally don't find it too problematic:
an additional index update from a rolled-back transaction doesn't
hurt.
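Here's a minimal sketch of 3a)-c), assuming Celery and PostgreSQL;
update_search_index() is a made-up helper:

    import time

    from celery import shared_task
    from django.db import connection
    from django.db.models.signals import post_save
    from django.dispatch import receiver


    def current_txid():
        # a) The id of the transaction this signal handler runs in.
        with connection.cursor() as cursor:
            cursor.execute("SELECT txid_current()")
            return cursor.fetchone()[0]


    @receiver(post_save)
    def enqueue_index_update(sender, instance, **kwargs):
        # b) Pass the transaction id along with the usual identifiers.
        index_update.delay(sender._meta.app_label,
                           sender._meta.model_name,
                           instance.pk,
                           current_txid())


    @shared_task
    def index_update(app_name, model_name, pk, txid):
        # c) Poll pg_locks until the originating transaction has
        # released its locks, i.e. has committed or rolled back.
        # Caveat: txid_current() returns an epoch-extended 64-bit id,
        # while pg_locks.transactionid is a plain 32-bit xid, so strip
        # the epoch before comparing.
        xid = txid % 2**32
        with connection.cursor() as cursor:
            while True:
                cursor.execute("SELECT count(*) FROM pg_locks"
                               " WHERE transactionid::text = %s",
                               [str(xid)])
                if cursor.fetchone()[0] == 0:
                    break
                time.sleep(0.1)  # d) a timeout could abort here
        update_search_index(app_name, model_name, pk)  # hypothetical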
How do others handle this problem?
Were you aware of it?
Do you think 3. is a viable approach? How else could this be done?
Cheers,
Danny
4. Read django-transaction-signals properly and wrap the Task
instantiation in a defer() as described.
Cheers,
Danny
There are quite a few of those signals around as monkeypatches.
The part I didn't get at first is that you just wrap whatever you want
executed at the end in something like this defer(), which connects a
freshly wrapped function to the new post_commit signal instead of
executing it right away:
https://github.com/davehughes/django-transaction-signals/blob/master/django_transaction_signals/__init__.py#L172
Very comfy!
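Used for the indexing case, it might look like this - a sketch
assuming django-transaction-signals is installed and its defer()
behaves as described above; update_index is again a made-up task:

    from django.db.models.signals import post_save
    from django.dispatch import receiver
    from django_transaction_signals import defer

    from myapp.tasks import update_index  # hypothetical task

    @receiver(post_save)
    def enqueue_index_update(sender, instance, **kwargs):
        # Instead of dispatching the task now (pre-commit), register
        # it to be dispatched once the surrounding transaction
        # commits. No transaction id trickery needed.
        defer(update_index.delay,
              sender._meta.app_label,
              sender._meta.model_name,
              instance.pk)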
Cheers,
Danny
Spawning print jobs, sending emails... there are a few scenarios
where this comes in handy.
Cheers,
Danny