[Django] #32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per formset instance when saving/validating

Django

unread,

Dec 4, 2020, 3:14:09 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database | Version: 3.1
layer (models, ORM) |
Severity: Normal | Keywords: formsets
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Conceptual summary of the issue:
Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,
and use a {{{BookFormSet}}} to add / modify / delete books that are
created by a given {{{Author}}}. The problem is when the {{{BookFormSet}}}
is validated, {{{ModelChoiceField.to_python()}}} ends up calling
{{{self.queryset.get(id=123)}}} which results in a single-object SELECT
query for **each** book in the formset. That means if I want to update 15
books, Django performs 15 separate SELECT queries, which seems incredibly
inefficient. (Our actual app is an editor that can update any number of
objects in a single formset, e.g. 50+).

My failed attempts to solve this:
1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
{{{formset = BookFormSet(data=request.POST,
queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
still does its single-object SELECT queries.
2. Then I tried to see where the {{{ModelChoiceField}}} defines its
queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
tried initiating the {{{ModelChoiceField}}} with the same queryset that I
passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead of
the original code which would be
{{{Book._default_manager.get_queryset()}}}. But this doesn't help because
I guess the new queryset I defined isn't actually linked to what was
passed to the formset (and we don't have a cache running). So the multiple
SELECT queries still happen. (Note: I realize
{{{_default_manager.get_queryset}}} is necessary for use cases where you
would want to switch one Model instance for another one which might not be
in the original queryset passed to the {{{BaseModelFormset}}}, but this is
not our use case)
3. I noticed that {{{BaseFormSet._existing_object()}}} provides a way to
check whether an object exists in the queryset that was giving to the
FormSet constructor, which means that queryset is evaluated at most once
and the results stored in {{{BaseFormSet._object_dict}}}. I thought there
might be some way to have {{{ModelChoiceField.to_python()}}} do something
similar before calling {{{self.queryset.get(id=123)}}}, but I don't think
{{{ModelChoiceField}}} is aware of {{{BaseFormSet}}}, and it would seem an
anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass {{{BaseFormSet._object_dict}}} in
some way to each {{{ModelForm}}} that's created, and then allow the
{{{ModelChoiceField}}} to check {{{_object_dict}}} before making another
SELECT query.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Dec 4, 2020, 3:17:30 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Description changed by Lushen Wu:

Old description:

New description:

Conceptual summary of the issue:
Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

and use a {{{BookFormSet}}} to add / modify / delete books that belong to

a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
validated, {{{ModelChoiceField.to_python()}}} ends up calling
{{{self.queryset.get(id=123)}}} which results in a single-object SELECT
query for **each** book in the formset. That means if I want to update 15
books, Django performs 15 separate SELECT queries, which seems incredibly
inefficient. (Our actual app is an editor that can update any number of
objects in a single formset, e.g. 50+).

My failed attempts to solve this:
1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
{{{formset = BookFormSet(data=request.POST,
queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
still does its single-object SELECT queries.
2. Then I tried to see where the {{{ModelChoiceField}}} defines its
queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
tried initiating the {{{ModelChoiceField}}} with the same queryset that I
passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead of
the original code which would be
{{{Book._default_manager.get_queryset()}}}. But this doesn't help because
I guess the new queryset I defined isn't actually linked to what was
passed to the formset (and we don't have a cache running). So the multiple
SELECT queries still happen. (Note: I realize

{{{_default_manager.get_queryset()}}} might be necessary in cases where
the formset can be used to switch one Model instance to another instance

which might not be in the original queryset passed to the
{{{BaseModelFormset}}}, but this is not our use case)
3. I noticed that {{{BaseFormSet._existing_object()}}} provides a way to
check whether an object exists in the queryset that was giving to the
FormSet constructor, which means that queryset is evaluated at most once
and the results stored in {{{BaseFormSet._object_dict}}}. I thought there
might be some way to have {{{ModelChoiceField.to_python()}}} do something
similar before calling {{{self.queryset.get(id=123)}}}, but I don't think
{{{ModelChoiceField}}} is aware of {{{BaseFormSet}}}, and it would seem an
anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass {{{BaseFormSet._object_dict}}} in
some way to each {{{ModelForm}}} that's created, and then allow the

{{{ModelChoiceField}}} to check this {{{_object_dict}}} before making
another SELECT query.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:1>

Django

unread,

Dec 4, 2020, 3:23:04 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Description changed by Lushen Wu:

Old description:

> Conceptual summary of the issue:

> Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

> and use a {{{BookFormSet}}} to add / modify / delete books that belong to

> a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
> validated, {{{ModelChoiceField.to_python()}}} ends up calling
> {{{self.queryset.get(id=123)}}} which results in a single-object SELECT
> query for **each** book in the formset. That means if I want to update 15
> books, Django performs 15 separate SELECT queries, which seems incredibly
> inefficient. (Our actual app is an editor that can update any number of
> objects in a single formset, e.g. 50+).
>
> My failed attempts to solve this:
> 1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
> {{{formset = BookFormSet(data=request.POST,
> queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
> still does its single-object SELECT queries.
> 2. Then I tried to see where the {{{ModelChoiceField}}} defines its
> queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
> tried initiating the {{{ModelChoiceField}}} with the same queryset that I
> passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead
> of the original code which would be
> {{{Book._default_manager.get_queryset()}}}. But this doesn't help because
> I guess the new queryset I defined isn't actually linked to what was
> passed to the formset (and we don't have a cache running). So the
> multiple SELECT queries still happen. (Note: I realize

> {{{_default_manager.get_queryset()}}} might be necessary in cases where

> the formset can be used to switch one Model instance to another instance

> which might not be in the original queryset passed to the
> {{{BaseModelFormset}}}, but this is not our use case)
> 3. I noticed that {{{BaseFormSet._existing_object()}}} provides a way to
> check whether an object exists in the queryset that was giving to the
> FormSet constructor, which means that queryset is evaluated at most once
> and the results stored in {{{BaseFormSet._object_dict}}}. I thought there
> might be some way to have {{{ModelChoiceField.to_python()}}} do something
> similar before calling {{{self.queryset.get(id=123)}}}, but I don't think
> {{{ModelChoiceField}}} is aware of {{{BaseFormSet}}}, and it would seem
> an anti-pattern to reach up the hierarchy like this.
>
> The easiest solution seems to me to pass {{{BaseFormSet._object_dict}}}
> in some way to each {{{ModelForm}}} that's created, and then allow the

> {{{ModelChoiceField}}} to check this {{{_object_dict}}} before making
> another SELECT query.

New description:

Conceptual summary of the issue:
Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

and use a {{{BookFormSet}}} to add / modify / delete books that belong to

a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
validated, {{{ModelChoiceField.to_python()}}} ends up calling
{{{self.queryset.get(id=123)}}} which results in a single-object SELECT
query for **each** book in the formset. That means if I want to update 15
books, Django performs 15 separate SELECT queries, which seems incredibly
inefficient. (Our actual app is an editor that can update any number of
objects in a single formset, e.g. 50+).

My failed attempts to solve this:
1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
{{{formset = BookFormSet(data=request.POST,
queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
still does its single-object SELECT queries.
2. Then I tried to see where the {{{ModelChoiceField}}} defines its
queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
tried initiating the {{{ModelChoiceField}}} with the same queryset that I
passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead of
the original code which would be
{{{Book._default_manager.get_queryset()}}}. But this doesn't help because
I guess the new queryset I defined isn't actually linked to what was
passed to the formset (and we don't have a cache running). So the multiple
SELECT queries still happen. (Note: I realize

{{{_default_manager.get_queryset()}}} might be necessary in cases where

the formset can be used to switch one Model instance to another instance

which might not be in the original queryset passed to the
{{{BaseModelFormset}}}, but this is not our use case)

3. I noticed that {{{BaseFormSet._existing_object()}}} actually provides a

way to check whether an object exists in the queryset that was giving to

the formset constructor, which means that queryset is evaluated at most

once and the results stored in {{{BaseFormSet._object_dict}}}. I thought
there might be some way to have {{{ModelChoiceField.to_python()}}} do
something similar before calling {{{self.queryset.get(id=123)}}}, but I
don't think {{{ModelChoiceField}}} is aware of {{{BaseFormSet}}}, and it
would seem an anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass {{{BaseFormSet._object_dict}}} in
some way to each {{{ModelForm}}} that's created, and then allow the

{{{ModelChoiceField}}} to check this {{{_object_dict}}} before making
another SELECT query.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:2>

Django

unread,

Dec 4, 2020, 3:29:57 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Description changed by Lushen Wu:

Old description:

> Conceptual summary of the issue:

> Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

> and use a {{{BookFormSet}}} to add / modify / delete books that belong to

> a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
> validated, {{{ModelChoiceField.to_python()}}} ends up calling
> {{{self.queryset.get(id=123)}}} which results in a single-object SELECT
> query for **each** book in the formset. That means if I want to update 15
> books, Django performs 15 separate SELECT queries, which seems incredibly
> inefficient. (Our actual app is an editor that can update any number of
> objects in a single formset, e.g. 50+).
>
> My failed attempts to solve this:
> 1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
> {{{formset = BookFormSet(data=request.POST,
> queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
> still does its single-object SELECT queries.
> 2. Then I tried to see where the {{{ModelChoiceField}}} defines its
> queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
> tried initiating the {{{ModelChoiceField}}} with the same queryset that I
> passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead
> of the original code which would be
> {{{Book._default_manager.get_queryset()}}}. But this doesn't help because
> I guess the new queryset I defined isn't actually linked to what was
> passed to the formset (and we don't have a cache running). So the
> multiple SELECT queries still happen. (Note: I realize

> {{{_default_manager.get_queryset()}}} might be necessary in cases where

> the formset can be used to switch one Model instance to another instance

> which might not be in the original queryset passed to the
> {{{BaseModelFormset}}}, but this is not our use case)

> 3. I noticed that {{{BaseFormSet._existing_object()}}} actually provides

> a way to check whether an object exists in the queryset that was giving

> to the formset constructor, which means that queryset is evaluated at

> most once and the results stored in {{{BaseFormSet._object_dict}}}. I
> thought there might be some way to have
> {{{ModelChoiceField.to_python()}}} do something similar before calling
> {{{self.queryset.get(id=123)}}}, but I don't think {{{ModelChoiceField}}}
> is aware of {{{BaseFormSet}}}, and it would seem an anti-pattern to reach
> up the hierarchy like this.
>
> The easiest solution seems to me to pass {{{BaseFormSet._object_dict}}}
> in some way to each {{{ModelForm}}} that's created, and then allow the

> {{{ModelChoiceField}}} to check this {{{_object_dict}}} before making
> another SELECT query.

New description:

Conceptual summary of the issue:
Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

and use a {{{BookFormSet}}} to add / modify / delete books that belong to

a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
validated, {{{ModelChoiceField.to_python()}}} ends up calling
{{{self.queryset.get(id=123)}}} which results in a single-object SELECT
query for **each** book in the formset. That means if I want to update 15
books, Django performs 15 separate SELECT queries, which seems incredibly
inefficient. (Our actual app is an editor that can update any number of
objects in a single formset, e.g. 50+).

My failed attempts to solve this:
1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
{{{formset = BookFormSet(data=request.POST,
queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
still does its single-object SELECT queries.
2. Then I tried to see where the {{{ModelChoiceField}}} defines its
queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
tried initiating the {{{ModelChoiceField}}} with the same queryset that I
passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead of
the original code which would be
{{{Book._default_manager.get_queryset()}}}. But this doesn't help because
I guess the new queryset I defined isn't actually linked to what was
passed to the formset (and we don't have a cache running). So the multiple
SELECT queries still happen. (Note: I realize

{{{_default_manager.get_queryset()}}} might be necessary in cases where

the formset can be used to switch one Model instance to another instance

which might not be in the original queryset passed to the
{{{BaseModelFormset}}}, but this is not our use case)

3. I noticed that {{{BaseModelFormSet._existing_object()}}} actually

provides a way to check whether an object exists in the queryset that was

giving to the formset constructor, which means that queryset is evaluated

at most once and the results stored in

{{{BaseModelFormSet._object_dict}}}. I thought there might be some way to

have {{{ModelChoiceField.to_python()}}} do something similar before
calling {{{self.queryset.get(id=123)}}}, but I don't think

{{{ModelChoiceField}}} is aware of {{{BaseModelFormSet}}}, and it would

seem an anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass

{{{BaseModelFormSet._object_dict}}} in some way to each {{{ModelForm}}}
that's created, and then allow the {{{ModelChoiceField}}} to check this

{{{_object_dict}}} before making another SELECT query.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:3>

Django

unread,

Dec 4, 2020, 3:33:41 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Description changed by Lushen Wu:

Old description:

> Conceptual summary of the issue:

> Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

> and use a {{{BookFormSet}}} to add / modify / delete books that belong to

> a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
> validated, {{{ModelChoiceField.to_python()}}} ends up calling
> {{{self.queryset.get(id=123)}}} which results in a single-object SELECT
> query for **each** book in the formset. That means if I want to update 15
> books, Django performs 15 separate SELECT queries, which seems incredibly
> inefficient. (Our actual app is an editor that can update any number of
> objects in a single formset, e.g. 50+).
>
> My failed attempts to solve this:
> 1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
> {{{formset = BookFormSet(data=request.POST,
> queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
> still does its single-object SELECT queries.
> 2. Then I tried to see where the {{{ModelChoiceField}}} defines its
> queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
> tried initiating the {{{ModelChoiceField}}} with the same queryset that I
> passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead
> of the original code which would be
> {{{Book._default_manager.get_queryset()}}}. But this doesn't help because
> I guess the new queryset I defined isn't actually linked to what was
> passed to the formset (and we don't have a cache running). So the
> multiple SELECT queries still happen. (Note: I realize

> {{{_default_manager.get_queryset()}}} might be necessary in cases where

> the formset can be used to switch one Model instance to another instance

> which might not be in the original queryset passed to the
> {{{BaseModelFormset}}}, but this is not our use case)

> 3. I noticed that {{{BaseModelFormSet._existing_object()}}} actually

> provides a way to check whether an object exists in the queryset that was

> giving to the formset constructor, which means that queryset is evaluated

> at most once and the results stored in

> {{{BaseModelFormSet._object_dict}}}. I thought there might be some way to

> have {{{ModelChoiceField.to_python()}}} do something similar before
> calling {{{self.queryset.get(id=123)}}}, but I don't think

> {{{ModelChoiceField}}} is aware of {{{BaseModelFormSet}}}, and it would

> seem an anti-pattern to reach up the hierarchy like this.
>
> The easiest solution seems to me to pass

> {{{BaseModelFormSet._object_dict}}} in some way to each {{{ModelForm}}}
> that's created, and then allow the {{{ModelChoiceField}}} to check this

> {{{_object_dict}}} before making another SELECT query.

New description:

Conceptual summary of the issue:
Let's say we have a Django app with {{{Author}}} and {{{Book}}} models,

and use a {{{BookFormSet}}} to add / modify / delete books that belong to

a given {{{Author}}}. The problem is when the {{{BookFormSet}}} is
validated, {{{ModelChoiceField.to_python()}}} ends up calling
{{{self.queryset.get(id=123)}}} which results in a single-object SELECT
query for **each** book in the formset. That means if I want to update 15
books, Django performs 15 separate SELECT queries, which seems incredibly
inefficient. (Our actual app is an editor that can update any number of
objects in a single formset, e.g. 50+).

My failed attempts to solve this:
1. First I tried passing a queryset to the {{{BookFormSet}}}, i.e.
{{{formset = BookFormSet(data=request.POST,
queryset=Book.objects.filter(author=1))}}}, but the `ModelChoiceField`
still does its single-object SELECT queries.
2. Then I tried to see where the {{{ModelChoiceField}}} defines its
queryset, which seems to be in {{{BaseModelFormSet.add_fields()}}}. I
tried initiating the {{{ModelChoiceField}}} with the same queryset that I
passed to the formset, e.g. {{{Book.objects.filter(author=1)}}} instead of
the original code which would be
{{{Book._default_manager.get_queryset()}}}. But this doesn't help because
I guess the new queryset I defined isn't actually linked to what was

passed to the formset (and already evaluated). So the multiple SELECT

queries still happen. (Note: I realize

{{{_default_manager.get_queryset()}}} might be necessary in cases where

the formset can be used to switch one Model instance to another instance

which might not be in the original queryset passed to the
{{{BaseModelFormset}}}, but this is not our use case)

3. I noticed that {{{BaseModelFormSet._existing_object()}}} actually

provides a way to check whether an object exists in the queryset that was

giving to the formset constructor, which means that queryset is evaluated

at most once and the results stored in

{{{BaseModelFormSet._object_dict}}}. I thought there might be some way to

have {{{ModelChoiceField.to_python()}}} do something similar before
calling {{{self.queryset.get(id=123)}}}, but I don't think

{{{ModelChoiceField}}} is aware of {{{BaseModelFormSet}}}, and it would

seem an anti-pattern to reach up the hierarchy like this.

The easiest solution seems to me to pass

{{{BaseModelFormSet._object_dict}}} in some way to each {{{ModelForm}}}
that's created, and then allow the {{{ModelChoiceField}}} to check this

{{{_object_dict}}} before making another SELECT query.

Even if this could be solved with some form of DB caching, it still seems
inefficient application-layer logic.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:4>

Django

unread,

Dec 4, 2020, 4:19:16 PM12/4/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Lushen Wu):

Is this a more fundamental aspect of Django ORM behavior? In that if I
have {{{qs = Book.objects.filter(author=1)}}} (let's say this gives back 3
Books with id = 1,2,3), and then evaluate it -- i.e. {{{len(list(qs))}}}
-- and then try {{{book2 = qs.get(id=2)}}} this will still run a SELECT
query even though book2 was already fetched when the original queryset was
evaluated

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:5>

Django

unread,

Dec 5, 2020, 6:35:34 PM12/5/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Lushen Wu):

I figured out a very hacky workaround that preserves the formset
functionality we need. Basically in {{{BookForm}}} you can override
{{{ModelForm._clean_fields()}}}

{{{
def _clean_fields(self):
# remove 'id' field so it's not cleaned
# (this is the ModelChoiceField that's generating an extra SELECT
query per Book object in the formset during clean)
id_field = self.fields.pop('id')

# run normal cleaning, with the form now unaware of its own 'id'
field
super(BookForm, self)._clean_fields()

# add 'id' field back and insert it into clean_data
id_value = id_field.widget.value_from_datadict(self.data,
self.files, self.add_prefix('id'))
self.cleaned_data['id'] = id_value

# add the 'id' field back into the form
self.fields['id'] = id_field
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:6>

Django

unread,

Dec 8, 2020, 3:16:32 AM12/8/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: 3.1
(models, ORM) |

Severity: Normal | Resolution: wontfix

Keywords: formsets | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Changes (by Carlton Gibson):

* status: new => closed
* resolution: => wontfix

Comment:

Hi. Thanks for the report.

It looks like you'e dug into the various aspects.

This seems to make the most sense to me:

> The easiest solution seems to me to pass BaseModelFormSet._object_dict

in some way to each ModelForm that's created, and then allow the

ModelChoiceField to check this _object_dict before making another SELECT
query.

You'd pass a reference via the form kwargs, and then need a custom form
and `ModelChoiceField` subclass to make use of it in `to_python()`.

I have to say this is probably a `wontfix`. It's not likely that this is a
performance bottleneck for most projects. Even several hundred lookups are
not going to noticeably affect the form submission in normal cases.
Projects needing to process data in a performant tight-loop will want to
make adjustments as discusses here, but the complication there probably
isn't worth it for the general case.

If you experiment in your own project and come up with a clean and simple
solution, it may be worth reviewing then, either here or on the
DevelopersMailingList.

I hope that makes sense.
Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:7>

Django

unread,

Dec 8, 2020, 2:49:10 PM12/8/20

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Lushen Wu):

Thanks for taking a look! I understand that the issue is probably not a
performance bottle neck in most projects, but if it truly is an extra SQL
query translating to millions of unnecessary roundtrips across all Django
deployments (and the fix would be relatively straightforward), I feel like
it still deserves some consideration. I'd be willing to contribute to a
patch if helpful.

Conceptually I'm not exactly sure why ModelChoiceField needs a Python
representation of the value to validate. I understand why this would be
needed for e.g. a CharField with a max_length requirement. But if the
ModelChoiceField value passed from the form is a PK, can it be validated
without loading the entire instance into memory? (If fetching the full
Python object is required for custom validators, the to_python() call and
SELECT query could be deferred until such a case is encountered.)

Currently the validation callstack goes BaseForm.full_clean() ->
BaseForm._clean_fields(value) -> Field.clean(value) ->
ModelChoiceField.to_python(value).

From what I can see, ModelChoiceField does not override Field.validate()
or Field.run_validators(). So the only validation step performed by
ModelChoiceField basically checks whether the Book instance generated by
Field.to_python() belongs to ModelChoiceField._get_choices() <-- which by
default is a ModelChoiceIterator for all Book objects.

Is there a "lighter" way of doing this that simply checks whether the
ModelChoiceField value (as a PK) exists in the corresponding model table?
If the SELECT query is simply a side effect of ModelChoiceField inheriting
Field.clean(), then I think it would be cleaner design to have
ModelChoiceField.clean() override Field.clean() and avoid calling
to_python() in the default validation loop. And as mentioned fetch the
object only if needed for custom validators.

Thanks again!

------------------------------

P.S. The kw_args workaround you suggested seems to be the cleanest in the
meantime if this stays a wontfix.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:8>

Django

unread,

Nov 15, 2021, 6:02:24 PM11/15/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by powderflask):

Thanks for this discussion - I have the same issue with an app where user
may update a formset with >100 records in a single post. Like OP, it's
not the performance issue per say that bugs me, just the wasted cycles and
DB hits.

That `ModelChoiceField` is added to the form dynamically by
`BaseModelFormSet.add_fields(self, form, index)` so that every form has a
pk field:

{{{
form.fields[self._pk_field.name] = ModelChoiceField(qs,
initial=pk_value, required=False, widget=widget)
}}}

I ** think ** what we want that field to validate here is that the pk
value in the post data matches the form's instance, (or that the pk value
is None if form has no instance) i.e., we need to verify that no one
tampered with the form's pk data on the round trip.

Might this approach work:
- define a `ModelPkField` intended specifically to define a read-only pk
field.
- validation as above, no DB hit
- ModelFormSet uses this field type to define these hidden pk fields,
and returns the form's original instance if it validates

I don't mind having a kick at this can, but looking for some advice on
whether this is even a reasonable idea?

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:9>

Django

unread,

Nov 15, 2021, 8:49:07 PM11/15/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by powderflask):

Here's a gist with a first crack at the idea I proposed above:
https://gist.github.com/powderflask/de0d3d6c6bef6c1b3890bdc1e83ff05c

It worries me that this code is so much simpler than the original.
Also, it may be that some client code relies on the private behaviour and
is expecting the pk form field to be ModelChoiceField.
So I suspect a robust solution will need to handle more use-cases, but...

This code seems to work for the use cases I tested, saving formsets
without any extra DB hits and flagging validation errors if the post data
is tampered with.

If you want to try this, simply pass the EfficientModelFormSet into
modelformset_factory:

{{{
modelformset_factory( ... , formset=EfficientModelFormSet)
}}}

If I get some feedback here, I can work up some test cases and turn this
into a pull request.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:10>

Django

unread,

Nov 16, 2021, 3:46:34 AM11/16/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

Hi @powderflask

It looks a bit like the Admin's `raw_id_fields` —
[https://github.com/django/django/blob/12fe3224f5086161462faf614cad91f3fad32e78/django/contrib/admin/widgets.py#L120
which is widget based].

I gave a [https://www.youtube.com/watch?v=e52S1SjuUeM talk at DjangoCon
Europe 2020 on optimising ModelChoiceField] — the basic idea in the
`FormSet` case being to set the `choices` directly in order to avoid
regenerating them for each form. Take a look at that: does it address your
issue?

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:11>

Django

unread,

Nov 17, 2021, 1:01:22 AM11/17/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by powderflask):

Replying to [comment:11 Carlton Gibson]:
@Carlton - thanks for the reply, for the interesting talk, and for all the
work you do maintaining django. Awesome.
I regularly run into the issues you cover in your talk, and have applied
all 3 solutions at various times - it was nice to see them all laid out,
and I'll be glad for the menu of options in future.

**But that's not at all the issue here.**

This issue is about how django validates the hidden pk field automatically
added to each form by `BaseModelFormSet`.
That hidden field is represented as a `ModelChoiceField`, so it hits the
DB during validation, but for zero effect as far as I can tell - the form
already has the instance. It's just a side-effect of the way
`ModelChoiceField` (correctly) validates data that isn't needed here.

What I ** think ** I see here is that using ModelChoiceField has several
potential drawbacks:

1. when saving a large modelformset, one redundant query is done for each
record;

2. if the form's pk data does get messed up (say by a buggy piece of JS),
the behaviour of modelforset is a bit erratic and it may do any of these 3
things, all of which are unexpected (in my mind):

a. if the form's pk data is not a valid value for the model's pk
field, form validation fails but the error is attached to the hidden pk
field, so is unlikely to be displayed to the user. Worse, the hidden pk
field now has no value , but the rest of the form data looks the same to
the end user. If the user just presses save again, a new instance is
created with the form data - creating a kind-of duplicate of the original
data.

b. if the form's pk data is a valid value, but not in the formset's
queryset, the form validates BUT the pk field for the instance is set to
None. Thus, the save logic proceeds, and again, a new instance is
created.

c. if the bad pk data value happens to duplicate a pk that **is** in
the queryset, the form validates and the save logic may overwrite one
record with the other record's data, resulting in potential ** data loss
**.

Now, i understand getting bum pk values back in a formset is a strange
edge case and means something went horribly wrong in the round-trip. BUT
the framework should provide consistent, predictable behaviour, and it
should not be making spurious DB queries for no good reason.

I posted a gist above to illustrate a concept I'm thinking of - I'm sure
I'm missing some important use-cases, just looking for advice on whether
this is even worth pursuing.
thanks again for taking an interest. Hope I made some sense here.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:12>

Django

unread,

Nov 17, 2021, 1:22:46 AM11/17/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by powderflask):

Replying to [comment:12 powderflask]:
Correcting myself - some of the unexpected behaviors described above
result from logic in BaseModelFormSet._construct_form and are not
addressed by the concept I proposed. As I investigated, scope broadened
and that logic IS logically part of validating that inserted pk field.
Don't want to create confusion though.

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:13>

Django

unread,

Nov 17, 2021, 3:22:58 AM11/17/21

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

Hi powderflask, thanks for the follow-up.

> This issue is about how django **validates** the hidden pk field
automatically
> added to each form by BaseModelFormSet. That hidden field is represented
as a
> ModelChoiceField, so it hits the DB during validation, but for zero

effect as
> far as I can tell - the form already has the instance. It's just a side-
effect
> of the way ModelChoiceField (correctly) validates data that isn't needed
here.
>

> What I think I see here is that using ModelChoiceField has several

potential
> drawbacks:
>
> 1. when saving a large modelformset, one redundant query is done for
each record;

But the lookup is **not** redundant… — it's a ModelChoiceField: it has a
QuerySet of possible valid values, and in validation we need to check that
the submitted value is in the QuerySet.

The reason it's evaluated per-row is because we go to great lengths to
avoid stale QuerySets. Hence the suggestion in the talk to generate
`choices` upfront if you want to avoid that repeated work.

> …but for zero effect as far as I can tell - the form already has the
instance…

I haven't understood exactly your point here… To wit, how about you put
together a very minimal sample project with a test case showing the exact
behaviour you think is wrong? (Here's
[https://github.com/carltongibson/issue31262 an example I did for another
issue].) It's much clearer with code in hand, and a "At this breakpoint()
the behaviour should be X".

You're always welcome to use a custom field in your own project, but I
don't suspect there's a change on the cards here. A search of the Trac
history shows ModelChoiceField has been through a lot. It's the way it is
for good reasons. However, always happy to have a look.

I hope that makes sense. :)

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:14>

Django

unread,

Oct 5, 2022, 9:54:21 PM10/5/22

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by powderflask):

Thanks Carlton for your serious consideration of this issue and deep
explanations - it's very kind of you to spend time on this.
I hit this issue for a 3rd time, so had another chance to think it
through, but still don't see where I've misunderstood anything
fundamental...

But the lookup is **not** redundant… — it's a ModelChoiceField: it has
a QuerySet of possible valid values, and in validation we need to check
that the submitted value is in the QuerySet.

You are right in general - a ModelChoiceField needs a QuerySet to
validate. What I think I'm arguing is that hidden PK field should not be
a ModelChoiceField at all - the overhead is not needed.

Here's why I think the Queryset / ModelChoiceField is redundant and could
be replaced by a simpler type of field:

1. this is a hidden field precisely because the PK field can't be edited.
In fact, it should not be valid for that hidden field to have any value
other than None (for new forms) or the PK of the form instance. Any
other value should raise a validation error.

2. so we don't need a QuerySet to do the validation.
- If the hidden PK field is empty, great, it's a new instance,
nothing to validate;
- otherwise, it's value MUST match the form's instance - if it
doesn't, something has gone horribly wrong in any case.

Perfectly possible I've just missed some crucial use case or fact about
how model formsets are validated. But I've been through this 3 times now
for 3 different use cases, and come to the same conclusion each time -
valid values for the PK are already known at validation time, so the DB
access to fetch the Queryset is just pure overhead (for reference, I have
forms where >500 entries are being submitted, so the DB lag is non-
trivial).

thanks again - sorry to be a dog with a bone :-P

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:15>

Django

unread,

Oct 6, 2022, 3:41:29 AM10/6/22

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

Hi powderflask — no problem. :)

> Here's why I think the Queryset / ModelChoiceField is redundant and

could be replaced by a simpler type of field...

I think the way forward here would be for you to provide a proof-of-
concept. Then folks could assess properly whether it were viable.

> … for reference, I have forms where >500 entries are being submitted, so

the DB lag is non-trivial

Sure. It's in this case that I provide pre-calculated `choices` to avoid
the overhead (as per above).

--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:16>

Django

unread,

Nov 17, 2025, 8:45:08 PM11/17/25

to django-...@googlegroups.com

#32244: ORM inefficiency: ModelFormSet executes a single-object SELECT query per
formset instance when saving/validating
-------------------------------------+-------------------------------------
Reporter: Lushen Wu | Owner: nobody

Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by rowleyd):

I had the same questions as powderflask and Lushen Wu, so I was about to
try to resurrect this conversation with a code example, but I found some
answers in the process. So I'm leaving a comment that will hopefully
clarify the questions being asked, and save others the trouble I went
through to find some clarity on this topic.

The questions in this ticket are probably coming from this kind of
context: we have a ModelFormSet (let's say a list of Books), and we create
a queryset that we pass in to retrieve a list of 100+ Books from a remote
database and initialize the forms and their corresponding instances.
Everything is peachy until you POST the form, and notice that right after
it runs that query, it then runs a set of queries (one for each form in
the formset) re-retrieving all of the same instances that we already have
from the first query. It's confusing where all these extra, redundant
queries are coming from, and after some investigation, it turns out that
it's not something that I did wrong - they are the result of
formset.is_valid() attempting to "validate" a hidden field in each form
that represents the PK for each form's instance. So the questions in this
ticket are basically asking 1) why is this "validation" of the PK field
necessary, 2) why not just use the queryset I provided, and 3) how to
skip/avoid the latter set of seemingly unnecessary and redundant queries?

I think the misunderstanding at the heart of these questions is a simple
oversight: when the form is POSTed, the formset is initialized from the
POST data, **NOT** the queryset. Therefore, the hidden PK field validation
is necessary because the POST data might contain stale PK values (e.g.,
checking for forms whose model/instance no longer exists in the database).
So skipping the validation (e.g., Lushen Wu's solution above) should be
avoided since it opens the door to potentially worse issues. And the
answer to resolving the redundant queries is to skip the first query
if/when the form is POSTed (i.e., don't evaluate and pass a queryset into
BaseModelFormSet because it won't be used to initialize the formset forms
anyway).

So now we're back to where we started, with the original question of
whether there is a more efficient way to do the validation on the hidden
PK field, instead of executing a separate query for every form in the
formset. I'll just throw out a few thoughts since I have to move on after
spending too much time in this rabbit hole: Carlton suggested a workaround
(see the video he mentioned), but it didn't work for me. Looking back, I
think it was probably because the suggestion was to set 'choices' on the
ModelChoiceField in the ModelForm constructor; but in this case, the
hidden PK ModelChoiceField doesn't yet exist at that point (I think it's
added later in BaseModelFormset.add_fields()?). So maybe this would work
if we iterate over the forms and set 'choices' **after** add_fields() is
finished. But even if so, I think this is taking it a bit too far (i.e.,
using somewhat obscure hooks to tweak a hidden, internal ModelChoiceField
that the user didn't create and which they have little control over).
Since this affects practically every user out there of
modelformset_factory(), it might be worth making BaseModelFormSet
implement this general workaround under the hood by default. (i.e., reuse
the same queryset/choices for all of the hidden PK fields that it
creates?) Alternatively, if the user passes a queryset to
BaseModelFormSet, maybe it would make sense to use that queryset to
validate the hidden PK fields (i.e., limit the choices of the hidden PK
field to this given queryset)? If not, it could be helpful to generate an
error/warning that the user is passing in a queryset argument that will be
ignored.
--
Ticket URL: <https://code.djangoproject.com/ticket/32244#comment:17>

Reply all

Reply to author

Forward