#36526: Document memory usage of bulk_update and ways to batch updates.
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):
Replying to [comment:10 Anže Pečar]:
> Natalia Bidart, I pointed out in my initial description that the two
> issues are related, but I am still not fully convinced that they are
> duplicates. In my case I was updating a large number of objects for
> several hours, and it wouldn't have made a difference if the query took
> an extra hour or two. What did make a difference was that the script was
> killed with a SIGTERM when the container ran out of memory. :(
>
> Could we reopen this until we fully understand the performance impact of
> the code changes proposed by Jason Hall? I made a quick benchmark
> earlier today, and Jason's solution with the longer transaction ended up
> being 6% slower (29.76s vs 28s), but I also wanted to test it on a
> dataset with more columns, as in the example in #31202.
Thanks for the clarification. As Simon noted in the PR, performing the
expression resolution (which is known to take significant time) while the
transaction is open is likely to be more harmful than the memory overhead
it currently incurs.
Since manual batching already works around the memory issue, we don't plan
to pursue changes to defer expression resolution within `bulk_update`
beyond what is covered in #31202. I'll work on a patch to improve/extend
the docs.
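
For reference, manual batching can look something like the sketch below.
This is not from the ticket or the proposed patch; the `Book` model, the
`batched_bulk_update` helper, the `mutate` callback, and the batch size of
1000 are all hypothetical. The point is that each `bulk_update()` call only
holds one slice of objects (and the expressions it resolves for them) in
memory, so peak memory stays roughly proportional to the batch size rather
than to the total number of rows.
{{{#!python
from itertools import islice

def batched_bulk_update(queryset, mutate, fields, batch_size=1000):
    # Stream rows from the database instead of materialising them all at once.
    rows = queryset.iterator(chunk_size=batch_size)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        for obj in batch:
            mutate(obj)  # change the instance in memory
        # Each call runs in its own short transaction; the batch (and the
        # expressions bulk_update resolves for it) is released afterwards.
        type(batch[0]).objects.bulk_update(batch, fields, batch_size=batch_size)

# Hypothetical usage:
# batched_bulk_update(
#     Book.objects.filter(needs_reprice=True),
#     mutate=lambda book: setattr(book, "price", book.price * 2),
#     fields=["price"],
# )
}}}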
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:15>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.