Proposal for a transaction.on_before_commit

Raphael Michel

Oct 10, 2021, 9:55:44 AM
to django-d...@googlegroups.com
Hi everyone,

I've spent some days thinking about a use case we have internally where
we want to create some database records automatically based on some
other records, think e.g. of an audit log where whenever an instance of
model X is changed, we want to create an instance of model Y as a log
record. Of course, there are *lots* of ways and libraries that allow you
to do this with Django, mostly somehow circling around overriding
`save()` or using the `post_save` signal.

However, we want to do this in an *aggregated* way, i.e. even if you
add hundreds of model instances in the same transaction, we only want
*one* new line in our audit log to be created. This is impossible with
`save()` or `post_save` alone since we can't know which change is the
"last" that should trigger the audit log to be created.

I figured a perfect way to do this would be using `save()` or
`post_save` to add the changed model instance to some kind of
thread-local list, and then using `transaction.on_commit` to "schedule"
the aggregation and create the log entries when all changes have been
made. However, this obviously is not good enough because `on_commit`
runs *after* the `COMMIT` statement and thus we're not guaranteed that
all log entries are persisted to the database.
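
To make this concrete, roughly what I have in mind (app, model and field
names are just placeholders):

import threading

from django.apps import apps
from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

_pending = threading.local()


@receiver(post_save, sender="myapp.ModelX")  # placeholder app/model
def collect_change(sender, instance, **kwargs):
    batch = getattr(_pending, "instances", None)
    if batch is None:
        batch = _pending.instances = []
    first = not batch
    batch.append(instance)
    if first:
        # Schedule the aggregation once per batch.
        transaction.on_commit(flush_audit_log)


def flush_audit_log():
    batch, _pending.instances = _pending.instances, []
    # One aggregated log entry for the whole batch -- but this runs
    # *after* COMMIT, so it is no longer covered by the transaction
    # that wrote the changes themselves.
    audit_log_model = apps.get_model("myapp", "AuditLog")  # placeholder model
    audit_log_model.objects.create(change_count=len(batch))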

So I was wondering if there was potential for a
`transaction.on_before_commit` that works just like
`transaction.on_commit` except that it is executed before the `COMMIT`
statement. I imagine there are lots of other possible uses as well and
without looking into it much deeper, it seems easy enough to implement.
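
API-wise, I'd imagine it mirroring the existing hook, something along
these lines (purely hypothetical -- no such function exists today):

from django.db import transaction


def write_aggregated_audit_log():
    # Would run inside the still-open transaction, right before COMMIT,
    # so the log entries are persisted atomically with the changes.
    ...


# Hypothetical counterpart to transaction.on_commit():
transaction.on_before_commit(write_aggregated_audit_log)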

Does anyone have opinions on whether or not this would be a good
feature to add to Django? Unfortunately, it doesn't seem possible to do
this as third-party code (short of shipping an entire database
backend), so it would need to land in Django itself to be a viable
option.

Thanks
Raphael

Markus Holtermann

Oct 10, 2021, 11:03:27 AM
to Django developers
Hi Raphael,

This is funny. Just last week I looked into exactly the same thing for audit logging as well. Except that I'm writing multiple audit log events and want to batch them at the end of the transaction in a single bulk_create operation instead of a dozen save() calls.

I haven't gotten anywhere yet, but I was considering the thread-local approach as well. Then again, thread locals and Django are not the nicest combination.
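
Roughly what I was picturing (model name made up, and ignoring the per-thread/per-connection bookkeeping for now):

from django.apps import apps
from django.db import transaction

_events = []  # would need to be per-thread / per-connection in practice


def record_event(event):
    if not _events:
        # Flush once, after the surrounding transaction commits.
        transaction.on_commit(flush_events)
    _events.append(event)


def flush_events():
    events = list(_events)
    _events.clear()
    # One bulk INSERT instead of a dozen save() calls -- though, as
    # Raphael points out, this already runs after COMMIT.
    apps.get_model("myapp", "AuditEvent").objects.bulk_create(events)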

Maybe we can come up with an approach together?

Cheers

Markus

Shai Berger

Oct 10, 2021, 11:39:18 AM
to django-d...@googlegroups.com
Just to clarify the use-case:

Why is a before-commit signal preferable to a vanilla Python
context-manager around the code? Or, if it is in the context of
requests, a middleware?
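
That is, something along these lines (just a sketch, names invented):

import contextlib

from django.db import transaction


@contextlib.contextmanager
def audited_changes(write_log):
    """Collect changed instances and write one aggregated log entry
    just before the surrounding atomic block commits."""
    changes = []
    with transaction.atomic():
        yield changes           # the wrapped code appends to this list
        if changes:
            write_log(changes)  # still inside the transaction


# Usage:
#     with audited_changes(write_aggregated_entry) as changes:
#         obj.save()
#         changes.append(obj)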

Florian Apolloner

Oct 10, 2021, 12:41:00 PM
to Django developers (Contributions to Django itself)
Because the scoping of the transactions makes it hard, I imagine. A view could start more than one transaction, for instance, and one of those might succeed while the other gets rolled back. Thinking about this some more, a system like this could be hard to implement: even if thread locals are used, one would have to clear the list on rollbacks, or at least partially clear it on savepoint rollbacks. And all that doesn't even consider autocommit yet :)

Aymeric Augustin

Oct 10, 2021, 3:38:34 PM
to django-d...@googlegroups.com
Hello Raphael,

Oh - a use case for django-transaction-signals ;-) I'm bringing up this elaborate joke because you're essentially asking for a "pre-commit" signal here and the README contains a good list of things that can go wrong with transaction signals. (Please ignore how this package demonstrates a way to do it as third-party code *cough* *cough* *cough*)

> I figured a perfect way to do this would be using `save()` or
> `post_save` to add the changed model instance to some kind of
> thread-local list, and then using `transaction.on_commit` to "schedule"
> the aggregation and create the log entries when all changes have been
> made. However, this obviously is not good enough because `on_commit`
> runs *after* the `COMMIT` statement and thus we're not guaranteed that
> all log entries are persisted to the database.

In my opinion "saving the log entries may fail after a successful transaction" isn't the main design problem here. The bigger problem is "log may contain entries for writes that don't actually happen, because some savepoints were rolled back, typically due to atomic blocks exiting with an exception". And then you get dragged into the whole complexity that the README of django-transaction-signals outlines and that we're trying to avoid in Django.

(If you don't have any savepoint rollbacks, then your code sounds sufficiently constrained to implement logging of changes explicitly at the application layer rather than at the framework layer.)

If you run with ATOMIC_REQUESTS, I would suggest replacing it with a custom middleware that wraps get_response(request) in an atomic block. Then you know that this is the outermost transaction and you can do whatever is needed before exiting the atomic block. You also need the same in all management commands, which could be a problem if you depend on third-party management commands.
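
For the middleware part, something like this (sketch; flush_audit_log is a placeholder for however you collect the entries):

from django.db import transaction


def flush_audit_log():
    # Placeholder: write the aggregated audit entries collected so far.
    pass


class AtomicAuditMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # We own the outermost atomic block here, so we can still write
        # to the database just before it commits. (Unlike ATOMIC_REQUESTS,
        # view exceptions have already been converted to error responses
        # at this point, so rolling back on errors is up to you.)
        with transaction.atomic():
            response = self.get_response(request)
            flush_audit_log()
        return response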

Failing that, in order to log changes with transactional correctness enforced by the ACID guarantees of PostgreSQL, I'd recommend doing it at the database level with triggers, which always execute in the same transaction. I realize it may be difficult to perform the kind of aggregation you have in mind.
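
For example, as a raw SQL migration (PostgreSQL 11+ syntax; table, column and trigger names are invented, and note this logs per row rather than aggregated):

from django.db import migrations

AUDIT_TRIGGER_SQL = """
CREATE FUNCTION audit_modelx_change() RETURNS trigger AS $$
BEGIN
    -- Runs inside the same transaction as the INSERT/UPDATE itself.
    INSERT INTO myapp_auditlog (object_id, action, created_at)
    VALUES (NEW.id, TG_OP, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_modelx_change
AFTER INSERT OR UPDATE ON myapp_modelx
FOR EACH ROW EXECUTE FUNCTION audit_modelx_change();
"""


class Migration(migrations.Migration):
    dependencies = [("myapp", "0001_initial")]

    operations = [
        migrations.RunSQL(
            AUDIT_TRIGGER_SQL,
            reverse_sql=(
                "DROP TRIGGER audit_modelx_change ON myapp_modelx; "
                "DROP FUNCTION audit_modelx_change();"
            ),
        ),
    ]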

As a last resort, I'd try a custom database backend that accurately tracks transactions and savepoints and maintains an up-to-date record of the changes that will take effect when the transaction is committed.
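
A very rough sketch of that direction, assuming PostgreSQL and hooking into the private _commit() method (so it may break across releases, and the savepoint bookkeeping discussed above is still up to you):

# myproject/db_backend/base.py
# (point DATABASES["default"]["ENGINE"] at "myproject.db_backend")
from django.db.backends.postgresql import base


class DatabaseWrapper(base.DatabaseWrapper):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Callbacks to run inside the transaction, right before COMMIT.
        self.run_before_commit = []

    def _commit(self):
        callbacks, self.run_before_commit = self.run_before_commit, []
        for callback in callbacks:
            # Still part of the open transaction: committed or rolled
            # back together with everything else.
            callback()
        super()._commit()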

Hope this helps!

-- 
Aymeric.

Raphael Michel

Oct 12, 2021, 5:28:37 AM
to Aymeric Augustin, django-d...@googlegroups.com
Hi Aymeric,

thank you for the insightful reply. Indeed, I had overlooked the issue
with savepoints, which makes this much more fragile. In our case, "not
using savepoint rollbacks any more" would be a trade-off we'd happily
make (there are enough other problems with savepoints to begin with),
but I understand that this kind of disqualifies it as a first-class
feature in Django :(

A middleware is not an option for us, as we intentionally don't use
ATOMIC_REQUESTS, not all code is triggered through requests (Celery
tasks, management commands, …), and we'd like to avoid dealing with all
of these separately.

Best
Raphael

On Sun, 10 Oct 2021 21:38:13 +0200,
Aymeric Augustin <aymeric....@polytechnique.org> wrote:

> Hello Raphael,
>
> Oh - a use case for django-transaction-signals
> <https://github.com/aaugustin/django-transaction-signals> ;-) I'm

Raphael Michel

Oct 12, 2021, 5:29:30 AM
to Shai Berger, django-d...@googlegroups.com
Hi,

On Sun, 10 Oct 2021 18:38:55 +0300,
Shai Berger <sh...@platonix.com> wrote:
> Why is a before-commit signal preferable to a vanilla Python
> context-manager around the code? Or, if it is in the context of
> requests, a middleware?

Basically, mostly because you can forget to put the context manager
around the code, and because it might become very verbose to add
context managers everywhere.

Best
Raphael

Shai Berger

Oct 13, 2021, 12:06:01 PM
to Raphael Michel, django-d...@googlegroups.com
Hi Raphael,

On Tue, 12 Oct 2021 11:28:20 +0200
Raphael Michel <ma...@raphaelmichel.de> wrote:
> In our case, "not using savepoint rollbacks any more" would be a
> trade-off that we'd happily make (there are enough other problems
> with savepoints to begin with)

You seem to imply that savepoint rollbacks are a very easy feature to
avoid, but I think this is not the case -- in large part because some
of them are hidden away from user code.

Any example within a transaction of

try:
    with transaction.atomic():
        my_model.save()
except SomeException:
    handle_error()
do_something_else_touching_db()

is, explicitly, a savepoint rollback in the error case, and one you may
need because of side effects of save() (e.g. in signal handlers). But
even an innocuous

my_model.save()

can induce a savepoint rollback if you're using MTI (multi-table
inheritance).

You can, maybe, achieve something using database instrumentation --
identify when a "COMMIT" is sent to the database at the SQL level, and
do something before that. But that, also, sounds dangerous.

HTH,
Shai.

Basith

Oct 13, 2021, 5:48:09 PM
to django-d...@googlegroups.com
Congratulations! Cheers buddy :)
