[Django] #35793: Add support for atomic upserts

21 views
Skip to first unread message

Django

unread,
Sep 26, 2024, 12:18:35 PM9/26/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Type: New
| feature
Status: new | Component: Database
| layer (models, ORM)
Version: | Severity: Normal
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Hello,

I took over maintenance of a package called wagtail-ab-testing which
performs a raw SQL query to insert or update (UPSERT) a model atomically
([https://github.com/wagtail-nest/wagtail-ab-
testing/blob/1ba57e4707061abf26c3c5829a8a79138fcaa2a5/wagtail_ab_testing/models.py#L450-L460
link to relevant source code])

This raw sql query confused me, but as it turns out it is there because
this the Django ORM does not appear support atomic upserts without taking
a lock on the row, which hurts performance

Specifically, the (performant) raw SQL for PostgreSQL looks like this:

{{{#!sql
INSERT INTO %s (ab_test_id, version, date, hour, participants,
conversions)
VALUES (%ss, %ss, %%s, %%s, %%s, %%s)
ON CONFLICT (ab_test_id, version, date, hour)
DO UPDATE SET participants = %s.participants + %%s, conversions =
%s.conversions + %%s;
}}}


There are a few things to note here:
* There is an unique constraint on combination of `ab_test_id`, `version`,
`date`, and `hour` columns that prevents duplicate objects from being
created.
* The `participants` and `conversions` columns are updated atomically;
That is: these columns are ''incremented'' by the given values, not set
directly. This is important for atomicity. We don't want multiple
concurrent database calls to overwrite each other.

The more common way to do this would be to use the
[https://docs.djangoproject.com/en/5.1/ref/models/querysets/#update-or-
create `update_or_create`] method, but this
[https://github.com/django/django/blob/5865ff5adcf64da03d306dc32b36e87ae6927c85/django/db/models/query.py#L965-L969
internally takes a lock on the row in case of updates], which is not
acceptable because this is a hot code path with a lot of concurrent
requests. We can’t afford to take a lock on the row.

I think it would be great if Django ORM supported atomic upserts without
taking a lock on the row. I’m not sure what the api for that would look
like, suggestions are welcome.

As for database support, PostgreSQL and SQLite support this syntax since
PostgreSQL 9.5 and SQLite 3.24.0 respectively. MySQL and MariaDB do not
support this natively, but it can be emulated using `INSERT ... ON
DUPLICATE KEY UPDATE` syntax and creating a unique index on the columns.

Oracle apparently supports something similar using the vastly different
`MERGE` statement syntax. That might be a bit of a challenge.

Looking for feedback on this idea. Is this something that would be worth
supporting in Django? I’ll admit it’s a bit of a niche feature, but it’s a
very useful one in certain cases. I would love to get rid of the raw SQL
query in `wagtail-ab-testing` and have something that works across all
supported databases.

- Storm
--
Ticket URL: <https://code.djangoproject.com/ticket/35793>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 27, 2024, 7:40:15 PM9/27/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: closed
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* resolution: => wontfix
* status: new => closed

Comment:

Hi Storm!

I think it might be worth asking on the
[https://forum.djangoproject.com/c/internals/5 Django forum] for
help/input and then confirming if something new would need adding to
Django (and what that would look like).
We also ask contributors to get feedback from the community through a
forum discussion first, so we can confirm there is community consensus
that we should add this feature

If you get positive feedback from the forum, can you link it back here and
reopen the ticket?

It feels like what you need might be possible using a combination of
`force_insert` and
[https://docs.djangoproject.com/en/5.1/ref/models/instances/#updating-
attributes-based-on-existing-fields F expressions], but I'm not sure if I
followed completely and I could be wrong
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:1>

Django

unread,
Oct 20, 2024, 7:51:19 AM10/20/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: closed
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Storm Heg):

Thanks for the reply Sarah – and again, great meeting you at DjangoCon US
2024!

I wasn't aware that feature requests usually go through the forum. I've
moved my request over there: https://forum.djangoproject.com/t/add-
support-for-atomic-upserts/35813
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:2>

Django

unread,
Oct 23, 2024, 12:57:53 PM10/23/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: new
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Lily Foote):

* cc: Lily Foote (added)
* resolution: wontfix =>
* stage: Unreviewed => Accepted
* status: closed => new

Comment:

[https://forum.djangoproject.com/t/add-support-for-atomic-upserts/35813/5
As discussed in the forum post], the most likely way forward with this is
to update `bulk_create` to support general expressions in `update_fields`.
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:3>

Django

unread,
Oct 23, 2024, 1:55:11 PM10/23/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: new
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

As a clarification the ORM already supports "atomic upsert" through
`bulk_create` (as defined by `INSERT ON CONFLICT DO UPDATE`) but lacks a
way to set the update value to anything but the value provided by the
excluded row on conflict.

One thing that we'll need to figure out though is what field references
(e.g. `F` or any `str` passed to expressions) should resolve to. Should it
be the excluded row (the value specified for associated field of the
instance passed to `bulk_create`), should it be the value of conflicting
row, or should we allow both by requiring `F('excluded__field')` and
`F('conflicting__field')` to be specified?

In other words what should the following resolve to

{{{#!python
PostView.objects.bulk_create(
[PostView(post_id=post_id, user_id=user_id, count=0)],
unique_fields=["post_id", "user_id"],
unique_updates={
"count": F("count") + 1
}
)
}}}

{{{#!sql
INSERT INTO postview(post_id, user_id, count) VALUES(:post_id, :user_id,
:count)
ON CONFLICT (post_id, user_id) DO UPDATE SET count = postview.update + 1
}}}

Stormheg's report requested that it defaults to the conflicting row (not
the `excluded` one) and I think it makes the most sense but I'm unsure if
all backends support that. Another argument for this using the conflicting
row value over the excluded one is that the latter can be emulated with a
`Case(When(...))` if necessary (return distinct value based on unique key
match) while the other way around is not possible. It's also easier to
implement resolving to the `INSERT` table alias (while disallowing JOINs)
as no alias re-pointing needs to take place and if a need eventually arise
to support excluded references it would be straightforward to implement a
`ExcludedF(F)` class that always resolve to the backend specific
equivalent of Postgres's `excluded`.
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:4>

Django

unread,
Oct 23, 2024, 1:57:25 PM10/23/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: new
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:5>

Django

unread,
Oct 24, 2024, 7:12:01 PM10/24/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner: (none)
Type: New feature | Status: new
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by John Speno):

* cc: John Speno (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:6>

Django

unread,
Nov 4, 2024, 4:29:16 PM11/4/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner:
| YashRaj1506
Type: New feature | Status: assigned
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by YashRaj1506):

* owner: (none) => YashRaj1506
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:7>

Django

unread,
Nov 12, 2024, 10:25:20 AM11/12/24
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner:
| YashRaj1506
Type: New feature | Status: assigned
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by YashRaj1506):

One thing i have realised that the `bulk_create` doesn't supports `F`
expressions directly, so a part of this feature would also include how to
incorporate the value for the current `conflicting_row` to the value of
the dictionaries which we will be passing, cause the `bulk_create` will
throw the errors at the tests. There can be two ways either get the values
from the db in the current dictionary of `update_fields`, and when
operations are done on the python side, the batched_insert will send the
data to db or After the the bulk insert has been done , a separate code
shall run to update those field values which raised conflicts...... but i
dont think using `update` in a `bulk_create` is a nice approach? I need
your all opinions on this..
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:8>

Django

unread,
Aug 21, 2025, 2:41:44 PMAug 21
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner:
| YashRaj1506
Type: New feature | Status: assigned
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Take Weiland):

I just found this ticket after I already made
https://code.djangoproject.com/ticket/36567
I am copying the relevant info here.

- All supported DBs (all except Oracle, for which `update_conflicts` is
not implemented) support referring to the values that were about to be
inserted.
- PostgreSQL and SQLite explicitly state support for referring to the row
that is conflicting and that will be updated
(www.postgresql.org/docs/current/sql-insert.html,
https://sqlite.org/lang_upsert.html)
- MySQL does not state such support, however their examples show it being
done (`ON DUPLICATE KEY UPDATE c=c+1;` from
https://dev.mysql.com/doc/refman/9.4/en/insert-on-duplicate.html)
- MariaDB does not state such support, however considering it started from
MySQL I would be surprised if it is not supported.

My API bikeshedding was something like this:

{{{
MyModel.objects.bulk_create(
objs,
update_conflicts=True,
update_fields={
'field_b': Coalesce(F('field_b'), Inserted('field_b'))
}
)
}}}

I use `Inserted` here, but it could just as well be `Excluded`.
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:9>

Django

unread,
Aug 23, 2025, 10:32:18 AMAug 23
to django-...@googlegroups.com
#35793: Add support for atomic upserts
-------------------------------------+-------------------------------------
Reporter: Storm Heg | Owner:
| YashRaj1506
Type: New feature | Status: assigned
Component: Database layer | Version:
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

Thanks for the backends support investigation, that's quite useful!

> I use Inserted here, but it could just as well be Excluded.

I think `Excluded` would be preferable as its closer to the SQL naming and
refers to the conflicting row. I feel like `Inserted` is ambiguous as it's
effectively ''tentatively inserted but failed''.
--
Ticket URL: <https://code.djangoproject.com/ticket/35793#comment:10>
Reply all
Reply to author
Forward
0 new messages