http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-
query-using-postgresql
However, such an interface is not exposed via the Django ORM. Would this
be accepted as a patch? if so, would a list dictionary with the various
fields:values to be updated be a good way to provide the input to ORM
--
Ticket URL: <https://code.djangoproject.com/ticket/23646>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Old description:
> Database backends like Postgresql support doing multiple value update for
> the same field in different rows using a single update query. Please see
> this sample below:
>
> http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-
> query-using-postgresql
>
> However, such an interface is not exposed via the Django ORM. Would this
> be accepted as a patch? if so, would a list dictionary with the various
> fields:values to be updated be a good way to provide the input to ORM
New description:
Database backends like Postgresql support doing multiple value update for
the same field in different rows using a single update query. Please see
this sample below:
http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-
query-using-postgresql
Example:
Currently:
{{{
Books.objects.all().update(price=10)
}}}
Proposed (syntax can vary):
{{{
Books.objects.all().update({'pk': 1, 'price': 10}, {'pk': 2, 'price':
25}])
}}}
Idea is to do it in sql for performance reasons but still use the ORM and
not use a raw database connection.
However, such an interface is not exposed via the Django ORM. Would this
be accepted as a patch? if so, would a list dictionary with the various
fields:values to be updated be a good way to provide the input to ORM
--
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:1>
* status: new => closed
* resolution: => wontfix
Comment:
The link you've provided suggests that the ORM example you've provided
would be interpreted as:
{{{
UPDATE books_book
SET
price = 10 where pk = 1,
price = 25 where pk = 2
}}}
So... what in the ORM statement identifies which column is to be used in
the `WHERE` clause, and which is to be updated? As currently expressed,
it's ambiguous. You'd need to modify the statement to be something like:
{{{
Books.objects.all().update('pk', values={'pk': 1, 'price': 10}, {'pk': 2,
'price': 25})
}}}
which nominates which values in the data are for the select.
However, even with those changes, I'm not sure I see why doing this as a
single statement version would be preferable to:
{{{
Book.objects.filter(pk=1).update(price=10)
Book.objects.filter(pk=2).update(price=25)
}}}
AFAIK, it's no faster on the database side (although feel free to prove me
wrong on that), and I don't find a "one liner API" especially expressive
or clear for this feature.
For that reason, I'm marking this wontfix. If you feel the idea is still
worth pursuing, feel free to start a discussion on django-developers and
make your case. (I know you started one on django-users, but in this case,
you need the eyeballs from -dev).
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:2>
Comment (by shaib):
On an old project, ~10 years ago and not Django based, we used a "bulk
update" procedure that did:
1) Bulk-insert the new records into a temporary table
2) update in one statement using a join (SQL Server lets you do that)
On hundreds of records, this was significantly faster than updating one-
by-one (I no longer have access to any hard data, and it's irrelevant
anyways).
Anyway, I suspect a better API would involve passing a collection of
objects and selecting just the fields to update:
`Book.objects.update_many(books, 'price')`
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:3>
* status: closed => new
* resolution: wontfix =>
Comment:
Reopening even though it was closed by a core team member because another
core member has reported the significant performance improvement.
Also, there is an app called django-bulk-upate (https://github.com/aykut
/django-bulk-update) which could/should be integrated into django core
since we have a bulk_create and this corresponds well with the save (could
be called bulk save if it makes sense). App works with an identical api to
bulk_create and so should be a good candidate to be considered for
inclusion in core.
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:4>
* version: 1.7 => 1.9
Old description:
> Database backends like Postgresql support doing multiple value update for
> the same field in different rows using a single update query. Please see
> this sample below:
>
> http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-
> query-using-postgresql
>
> Example:
> Currently:
> {{{
> Books.objects.all().update(price=10)
> }}}
>
> Proposed (syntax can vary):
> {{{
> Books.objects.all().update({'pk': 1, 'price': 10}, {'pk': 2, 'price':
> 25}])
> }}}
>
> Idea is to do it in sql for performance reasons but still use the ORM and
> not use a raw database connection.
>
> However, such an interface is not exposed via the Django ORM. Would this
> be accepted as a patch? if so, would a list dictionary with the various
> fields:values to be updated be a good way to provide the input to ORM
New description:
**Update**
django-bulk-upate (https://github.com/aykut/django-bulk-update) which
could/should be integrated into django core since we have a bulk_create
and this corresponds well with the save (could be called bulk save if it
makes sense). App works with an identical api to bulk_create and so should
be a good candidate to be considered for inclusion in core.
Database backends like Postgresql support doing multiple value update for
the same field in different rows using a single update query. Please see
this sample below:
http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-
query-using-postgresql
Example:
Currently:
{{{
Books.objects.all().update(price=10)
}}}
Proposed (syntax can vary):
{{{
Books.objects.all().update({'pk': 1, 'price': 10}, {'pk': 2, 'price':
25}])
}}}
Idea is to do it in sql for performance reasons but still use the ORM and
not use a raw database connection.
However, such an interface is not exposed via the Django ORM. Would this
be accepted as a patch? if so, would a list dictionary with the various
fields:values to be updated be a good way to provide the input to ORM
--
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:5>
* stage: Unreviewed => Accepted
Comment:
Tentatively accepting, pending discussion on the DevelopersMailingList to
approve the idea and API.
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:6>
* has_patch: 0 => 1
Comment:
#29037 is a duplicate with [https://github.com/django/django/pull/9606 a
patch].
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:7>
* cc: Tom Forbes (added)
* owner: nobody => Tom Forbes
* status: new => assigned
Comment:
Sorry for the duplicate, I'm not sure how my search missed this. I've got
a patch that implements this at a basic level, there are some small
optimizations that could be done at a later stage (including adding
Postgres specific syntax).
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:8>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:9>
* needs_better_patch: 1 => 0
Comment:
I've made the changes requested by the review. I also expanded the tests a
fair bit and ran into one case where setting a model field to an
expression (e.g `F()` or a `Function`) would fail, because the field
values where passed to `Value()`.
This got me thinking, what's the 'correct' result of `Value(F('id'))`, or
`Value(Lower('field'))`? Currently it fails in weird ways when resolved
with an `Expression`, but should it perhaps return the resolved
`Expression` instead? Or should we error loudly in this case?
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:10>
* needs_better_patch: 0 => 1
Comment:
The documentation says, "A Value() object represents the smallest possible
component of an expression: a simple value. When you need to represent the
value of an integer, boolean, or string within an expression, you can wrap
that value within a Value()." I don't think `Value()` is meant to wrap
other expressions.
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:11>
* needs_better_patch: 1 => 0
Comment:
I've moved the tests around as requested and changed the documentation.
I also had an idea for a potential optimization by using simple case
statements (https://code.djangoproject.com/ticket/29710) after reading the
Oracle documentation where they make a clear distinction between the two
(simple or searched).
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:12>
Comment (by Simon Charette):
As mentioned on the PR I think it'd be worth investigating (in a follow up
ticket) if we could push the primary conditional update logic to the
compiler level (`UpdateQuery`/`SQLUpdateCompiler`) in order to use
`VALUES` instead of `CASE(WHEN))` on backends that support it. Given this
API is meant to be used for large number of objects I assume using
`VALUES` would perform significantly better because it'd be easier for
query planners to inspect the query.
FWIW `UpdateQuery` already has an `update_batch` method used by model
deletion but it doesn't support per-primary key values.
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:13>
Comment (by Tom Forbes):
Replying to [comment:13 Simon Charette]:
> As mentioned on the PR I think it'd be worth investigating (in a follow
up ticket) if we could push the primary conditional update logic to the
compiler level (`UpdateQuery`/`SQLUpdateCompiler`) in order to use
`VALUES` instead of `CASE(WHEN))` on backends that support it. Given this
API is meant to be used for large number of objects I assume using
`VALUES` would perform significantly better because it'd be easier for
query planners to inspect the query.
>
> FWIW `UpdateQuery` already has an `update_batch` method used by model
deletion but it doesn't support per-primary key values.
Yep, in #29037 I noted that Postgres has specific syntax for it and I was
going to open a ticket once (or if!) this is merged. I'm not sure what
other databases support this, and I spent today looking at how to
implement it. It requires a custom join on `VALUES` which I'm really not
sure how to implement. If you have any idea I'd love a point in the right
direction.
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:14>
Comment (by Simon Charette):
> It requires a custom join on VALUES which I'm really not sure how to
implement. If you have any idea I'd love a point in the right direction.
Hmm I'll have to dig into it a bit more as well but I was hoping it would
be possible without introducing a new type of JOIN by having
`SQLUpdateCompiler` do the heavy lifting.
I suppose having a look at how `QuerySet.extra(tables, where)` is
implemented could guide us towards implementing it at the `UpdateQuery`
level though some form of JOIN.
From my local tests it seems like we could use `UNION ALL` to replace
`VALUES` on MySQL at least.
{{{#!sql
UPDATE table, (
SELECT 1 id, 2 first, 3 second
UNION ALL SELECT 2, 3, 4
UNION ALL ...
) AS table_update
SET table.first = table_update.first, table.second = table_update.second
WHERE table.id IN (...) AND table.id = table_update.id
}}}
I haven't tested performance on SQLite and MySQL but the query seem to
perform significantly faster (3-5x) on PostgreSQL using `VALUES()` instead
of `CASE(WHEN)` when updating two columns on 1k rows of a table with 10k
rows. The performance difference was getting larger as more columns were
updated (I tried 1 to 5).
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:15>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"9cbdb44014c8027f1b4571bac701a247b0ce02a3" 9cbdb440]:
{{{
#!CommitTicketReference repository=""
revision="9cbdb44014c8027f1b4571bac701a247b0ce02a3"
Fixed #23646 -- Added QuerySet.bulk_update() to efficiently update many
models.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:16>
Comment (by Tom Forbes):
Thank you for this! I've added a ticket to support this:
https://code.djangoproject.com/ticket/29771#ticket
Replying to [comment:15 Simon Charette]:
> > It requires a custom join on VALUES which I'm really not sure how to
implement. If you have any idea I'd love a point in the right direction.
>
> Hmm I'll have to dig into it a bit more as well but I was hoping it
would be possible without introducing a new type of JOIN by having
`SQLUpdateCompiler` do the heavy lifting.
>
> I suppose having a look at how `QuerySet.extra(tables, where)` is
implemented could guide us towards implementing it at the `UpdateQuery`
level though some form of JOIN.
>
> From my local tests it seems like we could use `UNION ALL` to replace
`VALUES` on MySQL at least.
>
> {{{#!sql
> UPDATE table, (
> SELECT 1 id, 2 first, 3 second
> UNION ALL SELECT 2, 3, 4
> UNION ALL ...
> ) AS table_update
> SET table.first = table_update.first, table.second = table_update.second
> WHERE table.id IN (...) AND table.id = table_update.id
> }}}
>
> I haven't tested performance on SQLite and MySQL but the query seem to
perform significantly faster (3-5x) on PostgreSQL using `VALUES()` instead
of `CASE(WHEN)` when updating two columns on 1k rows of a table with 10k
rows. The performance difference was getting larger as more columns were
updated (I tried 1 to 5).
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:17>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"d35ce682e31ea4a86c2079c60721fae171f03d7c" d35ce682]:
{{{
#!CommitTicketReference repository=""
revision="d35ce682e31ea4a86c2079c60721fae171f03d7c"
Fixed #33506 -- Made QuerySet.bulk_update() perform atomic writes against
write database.
The lack of _for_write = True assignment in bulk_update prior to
accessing self.db resulted in the db_for_read database being used to
wrap batched UPDATEs in a transaction.
Also tweaked the batch queryset creation to also ensure they are
executed against the same database as the opened transaction under all
circumstances.
Refs #23646, #33501.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23646#comment:18>