#36526: bulk_update uses more memory than expected
----------------------------+-----------------------------------------
Reporter: Anže Pečar | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------+-----------------------------------------
I recently tried to update a large number of objects with:
{{{
things = list(Thing.objects.all())  # a large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}
The first line above fits into the available memory (~2GB in my case),
but the second line caused a SIGTERM, even though I had an additional
2GB of available memory. This was surprising: I didn't expect
bulk_update to use that much memory, since all the objects to update
were already loaded.
My solution was:
{{{
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
}}}
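For reference, `batched` here is `itertools.batched`, which is new in
Python 3.12; on older versions an equivalent helper is only a few
lines:
{{{
from itertools import islice

def batched(iterable, n):
    # Minimal equivalent of itertools.batched for Python < 3.12:
    # yield successive tuples of up to n items from the iterable.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
}}}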
In the first example, `bulk_update` used 2.8GB of memory; in the
second, it used only 62MB.
[https://github.com/anze3db/django-bulk-update-memory A GitHub repository that reproduces the problem with memray results.]
Looking at the source code of `bulk_update`, the issue seems to be that
Django builds the entire `updates` list before it starts executing the
queries. I'd be happy to contribute a patch that makes the `updates`
list lazy, unless there are concerns that moving this computation in
between the UPDATE statements would make the transaction longer.
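As a rough sketch of the idea (hypothetical names, not Django's actual
internals; the real code in `django/db/models/query.py` also builds the
per-field `Case()`/`When()` expressions), the change amounts to swapping
the eagerly built list for a generator:
{{{
# Simplified sketch of the shape inside QuerySet.bulk_update(); the
# hypothetical build_batch_update() stands in for the per-batch
# Case()/When() construction and returns a (pks, update_kwargs) pair.

# Current shape: every batch is materialized before the first UPDATE,
# so peak memory grows with the total number of objects.
updates = [build_batch_update(batch) for batch in batches]

# Lazy shape: each batch is built just before its UPDATE statement and
# can be garbage-collected afterwards, at the cost of doing that work
# inside the transaction.
updates = (build_batch_update(batch) for batch in batches)

with transaction.atomic(using=self.db, savepoint=False):
    for pks, update_kwargs in updates:
        rows_updated += queryset.filter(pk__in=pks).update(**update_kwargs)
}}}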
This might be related to https://code.djangoproject.com/ticket/31202,
but I decided to open a new issue because I wouldn't mind bulk_update
taking longer to complete; it was the SIGTERM that surprised me.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.