[Django] #36526: bulk_update uses more memory than expected

Django

Jul 26, 2025, 10:34:55 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
----------------------------+-----------------------------------------
Reporter: Anže Pečar | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 5.2 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------+-----------------------------------------
I recently tried to update a large number of objects with:

{{{
things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}

The first line above fits into the available memory (~2GB in my case), but
the second line caused a SIGTERM, even though I had an additional 2GB of
available memory. This was a bit surprising as I wasn't expecting
bulk_update to use this much memory since all the objects to update were
already loaded.

My solution was:

{{{
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
}}}

In the first example, `bulk_update` used 2.8GB of memory; in the second
example, it used only 62MB.
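
The workaround relies on `itertools.batched`, which is only available from Python 3.12. A minimal sketch of the same idea for an already materialized list, using plain slicing, might look like this. `bulk_update_in_chunks` and its `manager` parameter are hypothetical names introduced for illustration, and summing the return values assumes Django 4.0 or later, where `bulk_update` returns the number of matched rows.

```python
def bulk_update_in_chunks(manager, objs, fields, chunk_size=300):
    """Call manager.bulk_update() once per slice of `objs`.

    `manager` is assumed to behave like a Django model manager
    (e.g. Thing.objects); `objs` must support len() and slicing.
    Each call only has to build the update expressions for one slice.
    """
    updated = 0
    for start in range(0, len(objs), chunk_size):
        chunk = objs[start:start + chunk_size]
        # Assumption: bulk_update returns a row count (Django >= 4.0).
        updated += manager.bulk_update(chunk, fields, batch_size=chunk_size)
    return updated
```

With the model from this ticket, the call would be `bulk_update_in_chunks(Thing.objects, things, ["description"])`. One trade-off to keep in mind: each `bulk_update` call runs in its own transaction, so the overall update is only atomic if the loop is wrapped in an outer `transaction.atomic()` block.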

[https://github.com/anze3db/django-bulk-update-memory A GitHub repository
that reproduces the problem with memray results.]

Looking at the source code of `bulk_update`, the issue seems to be that
Django builds the `updates` list before starting to execute the queries.
I'd be happy to contribute a patch that makes the updates list lazy unless
there are concerns about adding more computation between each update call
and thus making the transaction longer?

This might be related to https://code.djangoproject.com/ticket/31202, but
I decided to open a new issue because I wouldn't mind waiting longer for
bulk_update to complete, but the SIGTERM surprised me.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jul 26, 2025, 11:02:04 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------+--------------------------------------
Reporter: Anže Pečar | Owner: (none)
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by Anže Pečar:

New description:

I recently tried to update a large number of objects with:

{{{
things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}

The first line above fits into the available memory (~2GB in my case), but
the second line caused a SIGTERM, even though I had an additional 2GB of
available memory. This was a bit surprising as I wasn't expecting
bulk_update to use this much memory since all the objects to update were
already loaded.

My solution was:

{{{
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
}}}

In the first example, `bulk_update` used 2.8GB of memory; in the second
example, it used only 62MB.

[https://github.com/anze3db/django-bulk-update-memory A GitHub repository
that reproduces the problem with memray results.]

This might be related to https://code.djangoproject.com/ticket/31202, but
I decided to open a new issue because I wouldn't mind waiting longer for
bulk_update to complete, but the SIGTERM surprised me.

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:1>

Django

Jul 27, 2025, 5:24:55 AM
to django-...@googlegroups.com
New description:

I recently tried to update a large number of objects with:

{{{
things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
}}}

The first line above fits into the available memory (~2GB in my case), but
the second line caused a SIGTERM, even though I had an additional 2GB of
available memory. This was a bit surprising as I wasn't expecting
bulk_update to use this much memory since all the objects to update were
already loaded.

My solution was:

{{{
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
}}}

In the first example, `bulk_update` used 2.8GB of memory; in the second
example, it used only 62MB.

[https://github.com/anze3db/django-bulk-update-memory A GitHub repository
that reproduces the problem with memray results.]

As the
[https://github.com/user-attachments/assets/dd0bdcac-682f-4e79-aa25-aa5a4a2e6b9d
memray flamegraph] shows, most of the memory in my example (2.1GB) is used
to prepare the `When` expressions for all the batches before any of them is
executed. If we generate the `When` expressions only for the current batch,
memory consumption should drop substantially. I'd be happy to contribute
this patch unless there are concerns about adding more compute between
update queries and making the transactions longer. Let me know :)
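
The difference between the two orderings can be shown with a toy model. This is only a sketch of the idea, not Django's actual implementation: `prepare` and `execute` are hypothetical stand-ins for building the per-batch expressions and running the `UPDATE` query, and the `log` list exists only to make the order of operations visible.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def prepare(batch, log):
    """Stand-in for building the update expressions of one batch."""
    log.append(("prepare", batch[0]))
    return {pk: f"description {pk}" for pk in batch}


def execute(update, log):
    """Stand-in for running one UPDATE query."""
    log.append(("execute", min(update)))


def run_eager(items, batch_size):
    """Prepare every batch first, then execute: all payloads are alive at once."""
    log = []
    updates = [prepare(batch, log) for batch in batches(items, batch_size)]
    for update in updates:
        execute(update, log)
    return log


def run_lazy(items, batch_size):
    """Prepare each batch right before executing it: one payload alive at a time."""
    log = []
    updates = (prepare(batch, log) for batch in batches(items, batch_size))
    for update in updates:
        execute(update, log)
    return log
```

In the eager variant every prepared payload exists before the first query runs, which mirrors the memory profile described above. In the lazy variant, preparation is interleaved with execution, which is also why the preparation cost moves inside the window in which the transaction is open.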

This might be related to https://code.djangoproject.com/ticket/31202, but
I decided to open a new issue because I wouldn't mind waiting longer for
bulk_update to complete, but the SIGTERM surprised me.

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:2>

Django

Jul 27, 2025, 5:25:56 AM
to django-...@googlegroups.com
transaction longer. Let me know :)

This might be related to https://code.djangoproject.com/ticket/31202, but
I decided to open a new issue because I wouldn't mind waiting longer for
bulk_update to complete, but the SIGTERM surprised me.

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:3>

Django

Jul 27, 2025, 5:26:56 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: (none)
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Anže Pečar):

* component: Uncategorized => Database layer (models, ORM)

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:4>

Django

Jul 27, 2025, 9:14:01 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Uncategorized | Status: assigned
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jason Hall):

* owner: (none) => Jason Hall
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:5>

Django

Jul 27, 2025, 11:27:02 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Uncategorized | Status: assigned
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jason Hall):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:6>

Django

Jul 27, 2025, 10:05:30 PM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Uncategorized | Status: assigned
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

I feel like we should likely close this one in favor of #31202,
particularly because the proposed solution of
[https://github.com/django/django/pull/19677#pullrequestreview-3059926789
batching the resolving of the expression] while the transaction is
open is likely going to cause more harm than good until the query
resolving and compilation of the batches is made faster.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:7>

Django

Jul 27, 2025, 10:35:04 PM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Uncategorized | Status: assigned
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jason Hall):

Got it -- thanks for the feedback.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:8>

Django

Jul 28, 2025, 9:35:42 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Cleanup/optimization | Status: closed
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* resolution: => duplicate
* status: assigned => closed
* type: Uncategorized => Cleanup/optimization

Comment:

Thank you Anže Pečar for taking the time to create this report, and thank
you Jason Hall for your interest in fixing this.

Simon, I agree with you this report is a dupe of #31202, but I also wonder
if we should consider a similar note to the one added in #28231 for
`bulk_create` and more efficient batching?
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:9>

Django

Jul 28, 2025, 10:18:41 AM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Cleanup/optimization | Status: closed
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Anže Pečar):

Natalia Bidart, I pointed out in my initial description that the two
issues are related but I am still not fully convinced that they are
duplicates. In my case I was updating a large number of objects for
several hours and it wouldn't have made a difference if the query took an
extra hour or two. What did make a difference was that the script was
killed with a SIGTERM when the container ran out of memory. :(

Could we reopen until we fully understand the performance impact of
the code changes proposed by Jason Hall? I made a quick benchmark
earlier today and Jason's solution with the longer transaction ended up
being 6% slower (29.76s vs 28s), but I also wanted to test it on a dataset
with more columns, as in the example in #31202.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:10>

Django

Jul 28, 2025, 12:15:24 PM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Cleanup/optimization | Status: closed
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution: duplicate
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> I also wonder if we should consider a similar note to the one added in
> #28231 for bulk_create and more efficient batching?

Natalia, I think we should yes.

My immediate reaction when reviewing the ticket was to have a look at the
`bulk_update` documentation and it's effectively not entirely clear what
''batching'' refers to (query batching vs objects materialization).
52aa26e6979ba81b00f1593d5ee8c5c73aaa6391 made it very clear that manual
generator slicing must be used to prevent evaluation.
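
For `bulk_update`, the manual generator slicing mentioned above could look roughly like the following. This is a sketch modeled on the `islice` pattern shown in the `bulk_create` documentation, not wording from the Django docs; `update_from_iterator` and its `manager` parameter are hypothetical names, and summing the return values assumes Django 4.0 or later.

```python
from itertools import islice


def update_from_iterator(manager, objs, fields, batch_size=300):
    """Consume `objs` lazily and call manager.bulk_update() per slice.

    `objs` may be any iterable, including a generator, so the full set of
    objects never has to be materialized in memory at once. `manager` is
    assumed to behave like a Django model manager (e.g. Thing.objects).
    """
    iterator = iter(objs)
    total = 0
    # islice pulls at most `batch_size` items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        total += manager.bulk_update(batch, fields, batch_size=batch_size)
    return total
```

Unlike slicing a list that is already loaded, this addresses object materialization as well as query batching: the input could be a generator that modifies and yields objects from `Thing.objects.iterator()`, so only one slice of objects and its update expressions are held at a time.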
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:11>

Django

Aug 28, 2025, 3:12:50 PM
to django-...@googlegroups.com
#36526: bulk_update uses more memory than expected
-------------------------------------+-------------------------------------
Reporter: Anže Pečar | Owner: Jason Hall
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 5.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* needs_better_patch: 0 => 1
* resolution: duplicate =>
* stage: Unreviewed => Accepted
* status: closed => new

Comment:

Replying to [comment:11 Simon Charette]:
> > I also wonder if we should consider a similar note to the one added in
> > #28231 for bulk_create and more efficient batching?
>
> Natalia, I think we should yes.
>
> My immediate reaction when reviewing the ticket was to have a look at
> the `bulk_update` documentation and it's effectively not entirely clear
> what ''batching'' refers to (query batching vs objects materialization).
> 52aa26e6979ba81b00f1593d5ee8c5c73aaa6391 made it very clear that manual
> generator slicing must be used to prevent evaluation.

Reopening with the goal to add the clarification in the docs similar to
what `bulk_create` has.
--
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:12>