[Django] #28231: bulk_create: avoid iterating `objs` more than necessary when `bulk_size` is provided


Django

May 23, 2017, 1:03:40 AM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `bulk_size` is
provided
-------------------------------------+-------------------------------------
Reporter: Nir | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
When `bulk_size` is provided in `bulk_create`, a user might assume (as I
did) that no more than `bulk_size` items are pulled from the `objs` iterable
at once, and that no more than roughly `bulk_size` model objects reside in
memory at any given time.

When using `bulk_create` for relatively large sets of objects provided by a
generator, it would be preferable to avoid consuming the entire generator up
front. Moreover, if avoiding that is deemed unnecessary or out of scope for
Django, it would be preferable to note the behavior in the documentation.

I suggest two possible solutions:
1. Document this behavior (`bulk_create` converts the passed `objs` iterator
to a list), or
2. Avoid doing so when `bulk_size` is given (or when the default is other
than `None`, e.g. on SQLite).

I did not research the possibility of avoiding the list conversion, but if
that solution is accepted by the community I volunteer to investigate
further and claim this ticket.
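
To make the reported behavior concrete, here is a minimal sketch in plain
Python (no Django required). `eager_batches` is a hypothetical stand-in that
only mimics the list conversion described above; it is not Django's actual
implementation. `numbered_items` is an illustrative helper that records
how many items the generator has produced.

```python
def numbered_items(total, produced):
    """Yield integers, recording each one in `produced` as it is generated."""
    for i in range(total):
        produced.append(i)
        yield i


def eager_batches(objs, batch_size):
    """Hypothetical stand-in: materialize the iterable first, then slice it."""
    objs = list(objs)  # the whole iterable is consumed here
    for start in range(0, len(objs), batch_size):
        yield objs[start:start + batch_size]


produced = []
batches = eager_batches(numbered_items(10, produced), batch_size=3)
first = next(batches)
# By the time the first batch is available, every item has been generated.
print(first, len(produced))
```

With a generator of millions of model instances, the same pattern would hold
all of them in memory before the first batch is written, which is the concern
raised in this ticket.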

--
Ticket URL: <https://code.djangoproject.com/ticket/28231>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

May 23, 2017, 1:06:42 AM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `bulk_size` is
provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nir Izraeli):

* cc: nirizr@… (added)


Comment:

CCing myself to receive updates.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:1>

Django

May 23, 2017, 1:09:24 AM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Nir Izraeli:

Old description:

> When `bulk_size` is provided in `bulk_create`, a user might assume (as I
> did) that no more than `bulk_size` items are pulled from the `objs`
> iterable at once, and that no more than roughly `bulk_size` model objects
> reside in memory at any given time.
>
> When using `bulk_create` for relatively large sets of objects provided by
> a generator, it would be preferable to avoid consuming the entire
> generator up front. Moreover, if avoiding that is deemed unnecessary or
> out of scope for Django, it would be preferable to note the behavior in
> the documentation.
>
> I suggest two possible solutions:
> 1. Document this behavior (`bulk_create` converts the passed `objs`
> iterator to a list), or
> 2. Avoid doing so when `bulk_size` is given (or when the default is other
> than `None`, e.g. on SQLite).
>
> I did not research the possibility of avoiding the list conversion, but
> if that solution is accepted by the community I volunteer to investigate
> further and claim this ticket.

New description:

When `batch_size` is provided in `bulk_create`, a user might assume (as I
did) that no more than `batch_size` items are pulled from the `objs` iterable
at once, and that no more than roughly `batch_size` model objects reside in
memory at any given time.

When using `bulk_create` for relatively large sets of objects provided by a
generator, it would be preferable to avoid consuming the entire generator up
front. Moreover, if avoiding that is deemed unnecessary or out of scope for
Django, it would be preferable to note the behavior in the documentation.

I suggest two possible solutions:
1. Document this behavior (`bulk_create` converts the passed `objs` iterator
to a list), or
2. Avoid doing so when `batch_size` is given (or when the default is other
than `None`, e.g. on SQLite).

I did not research the possibility of avoiding the list conversion, but if
that solution is accepted by the community I volunteer to investigate
further and claim this ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:2>

Django

May 23, 2017, 10:00:42 AM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

Feel free to offer a patch. Looking at the code of
`QuerySet.bulk_create()`, I don't understand how your proposal would work.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:3>

Django

May 23, 2017, 1:29:11 PM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nir Izraeli):

I'm thinking of creating a `chunks` generator method, similar to ones
implemented in, for example, the following Stack Overflow Answers:

1. https://stackoverflow.com/a/24527424/1146713
2. https://stackoverflow.com/a/8290514/1146713

And then iterate over it in `bulk_create`, essentially running the existing
`bulk_create` code for each `batch_size`-sized chunk in sequence.
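
A minimal sketch of the chunking idea described above, assuming
`itertools.islice` is used to pull items lazily. `chunks`,
`create_in_chunks`, and the `insert_batch` callable are hypothetical names
chosen for illustration; they are not taken from the linked answers or from
the pull request.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield lists of at most `size` items, pulling from `iterable` lazily."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def create_in_chunks(objs, batch_size, insert_batch):
    """Call `insert_batch` once per chunk; return the number of items handled."""
    total = 0
    for chunk in chunks(objs, batch_size):
        insert_batch(chunk)  # hypothetical stand-in for the per-batch insert
        total += len(chunk)
    return total


seen = []
count = create_in_chunks((n * n for n in range(7)), 3, seen.append)
print(count, seen)
```

Only one chunk is held at a time, so peak memory is bounded by `batch_size`
rather than by the length of the input. In a Django project the same loop
could be written on the caller's side by passing something like
`SomeModel.objects.bulk_create` as `insert_batch`, where `SomeModel` is a
placeholder for your own model.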

WDYT?

Also, if I am to create a patch should I post it here first or create a PR
for it?

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:4>

Django

May 23, 2017, 3:52:15 PM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> Also, if I am to create a patch should I post it here first or create a
> PR for it?

Create a PR for it and link it back to this ticket; that will allow you to
run CI against it and gather feedback more easily.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:5>

Django

May 24, 2017, 1:49:21 AM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nir Izraeli):

PR created and passing tests: https://github.com/django/django/pull/8540

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:6>

Django

May 24, 2017, 8:10:35 PM
to django-...@googlegroups.com
#28231: bulk_create: avoid iterating `objs` more than necessary when `batch_size`
is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Nir Izraeli
Type: Cleanup/optimization | Status: assigned
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nir Izraeli):

* status: new => assigned
* owner: nobody => Nir Izraeli
* has_patch: 0 => 1


Comment:

Claimed this issue and marked it as having a patch (I assume a PR is
considered a "patch" for tracking/reviewing purposes).

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:7>

Django

May 27, 2017, 9:06:42 PM
to django-...@googlegroups.com
#28231: Avoid iterating `objs` more than necessary in QuerySet.bulk_create() when
`batch_size` is provided
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Nir Izraeli
Type: Cleanup/optimization | Status: assigned
Component: Database layer (models, ORM) | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:8>

Django

Jun 7, 2017, 5:05:30 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list

-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Nir Izraeli
Type: Cleanup/optimization | Status: assigned
Component: Documentation | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* has_patch: 1 => 0
* component: Database layer (models, ORM) => Documentation


Comment:

I guess this is a duplicate of #26400 (closed as wontfix). François
Freitag's comment on the PR: "I would rather document the current behavior
and the reason why the iterator has to be consumed. I suggest going to the
django-developers mailing list to see if others are interested."

Tentatively reclassifying as a documentation issue in light of this.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:9>

Django

Oct 25, 2017, 1:55:05 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: assigned
Component: Documentation | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Botond Béres):

* owner: Nir Izraeli => Botond Béres


--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:10>

Django

Oct 25, 2017, 2:07:47 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: assigned
Component: Documentation | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Botond Béres):

Opened [https://github.com/django/django/pull/9286 PR 9286] to document
this behaviour, as explained in the discussion from
[https://github.com/django/django/pull/8540 PR 8540]

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:11>

Django

Oct 25, 2017, 2:08:16 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: assigned
Component: Documentation | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Botond Béres):

* cc: Botond Béres (added)
* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:12>

Django

Oct 27, 2017, 3:36:43 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: assigned
Component: Documentation | Version: 1.11
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Martin):

* cc: Tim Martin (added)
* stage: Accepted => Ready for checkin


Comment:

Looks good to me.

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:13>

Django

Jan 12, 2018, 7:56:31 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: closed
Component: Documentation | Version: 1.11
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"52aa26e6979ba81b00f1593d5ee8c5c73aaa6391" 52aa26e]:
{{{
#!CommitTicketReference repository=""
revision="52aa26e6979ba81b00f1593d5ee8c5c73aaa6391"
Fixed #28231 -- Doc'd that QuerySet.bulk_create() casts objs to a list.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:14>

Django

Jan 12, 2018, 7:56:55 PM
to django-...@googlegroups.com
#28231: Document that QuerySet.bulk_create() casts objs to a list
-------------------------------------+-------------------------------------
Reporter: Nir Izraeli | Owner: Botond Béres
Type: Cleanup/optimization | Status: closed
Component: Documentation | Version: 1.11
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"881f66bc55adb2cdbe2d52d6c2978e74c7e3a802" 881f66b]:
{{{
#!CommitTicketReference repository=""
revision="881f66bc55adb2cdbe2d52d6c2978e74c7e3a802"
[2.0.x] Fixed #28231 -- Doc'd that QuerySet.bulk_create() casts objs to a
list.

Backport of 52aa26e6979ba81b00f1593d5ee8c5c73aaa6391 from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28231#comment:15>
