[Django] #31295: required ModelChoiceField makes duplicate (cursor) queries to the database

169 views
Skip to first unread message

Django

unread,
Feb 21, 2020, 5:18:58 AM2/21/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
------------------------------------------------+------------------------
Reporter: Aurélien Pardon | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Forms | Version: 2.2
Severity: Normal | Keywords: Model
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
ModelChoiceField use ModelChoiceIterator for its {{{ queryset }}}/{{{
self.choices }}}, which use {{{.iterator()}}} and doesn't cache the query
under some conditions.

If the field is required, the method use_required_attribute
(https://github.com/django/django/blob/master/django/forms/widgets.py#L689)
fetch the first choice, making a duplicate query to the database (worse
than a useless query, the data may have changed):
{{{#!python
class Select(ChoiceWidget):
[...]

def use_required_attribute(self, initial):
[...]
first_choice = next(iter(self.choices), None)
}}}


Disabling the use of {{{.iterator()}}} (by adding an arbitrary
{{{.prefetch_related}}} for example) leads to no duplicate queries.
https://github.com/django/django/blob/da79ee472d803963dc3ea81ee67767dc06068aac/django/forms/models.py#L1152
:
{{{#!python
class ModelChoiceIterator:
[...]

def __iter__(self):
[...]
# Can't use iterator() when queryset uses prefetch_related()
if not queryset._prefetch_related_lookups:
queryset = queryset.iterator()
}}}


One solution would be to add another test to the previous piece of code :
(https://github.com/django/django/blob/da79ee472d803963dc3ea81ee67767dc06068aac/django/forms/models.py#L1152)
:
{{{#!python
if not queryset._prefetch_related_lookups and not
self.field.required:
queryset = queryset.iterator()
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31295>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 21, 2020, 5:21:27 AM2/21/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution:

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Aurélien Pardon:

Old description:

New description:

{{{#!python
if not queryset._prefetch_related_lookups and not
self.field.required:
queryset = queryset.iterator()
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:1>

Django

unread,
Feb 25, 2020, 4:12:57 AM2/25/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: wontfix

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => wontfix


Comment:

Hi Aurélien,

Using the iterator was a deliberate design choice. (Placing your
suggestion in causes tests failure, including `test_choices_freshness` in
line with this.)

The use to point to in `Select.use_required_attribute()` shouldn't be
problematic, since
[https://github.com/django/django/blob/271fdab8b78af558238df51c64b4d1c8dd0792bb/django/forms/widgets.py#L555-L560
`self.choices` is fully evaluated in `ChoiceWidget.__init__()`], as per
the comment there, precisely to allow repeated iteration.

I think the underlying issue here is probably #22841.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:2>

Django

unread,
Feb 25, 2020, 4:07:58 PM2/25/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
---------------------------------+--------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: Bug | Status: new
Component: Forms | Version: 2.2
Severity: Normal | Resolution:

Keywords: Model | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------
Changes (by Aurélien Pardon):

* status: closed => new
* type: Cleanup/optimization => Bug
* resolution: wontfix =>


Comment:

Hi Carlton and thanks for your answer but, all due respect, I think you
overlooked this bug.

First, this simple example on an empty project/database that I just tested
(Python 3.8.1, Django 2.2.10):
{{{#!python
class MyModel(models.Model):
my_field = models.CharField(max_length=1)

class MyForm(forms.Form):
my_form_field = forms.ModelChoiceField(MyModel.objects.all(), ,
empty_label=None)

my_form = MyForm()
type(my_form.widget.choices)
# >>> return <class 'django.forms.models.ModelChoiceIterator'>

my_form.as_ul()
len(connecton.queries)
# >>> return 2 (and it's two times "SELECT * from my_app.my_model")
}}}

Here is what I understand:
1) The code you sent, where {{{ChoiceWidget.choices}}} is evaluated and
saved into a {{{list}}} is executed at the initialization of the widget
(which is executed at the initialization of the field at the declaration
of the form).
2) But! a {{{ModelChoiceField}}} override {{{self.choices}}}
[https://github.com/django/django/blob/271fdab8b78af558238df51c64b4d1c8dd0792bb/django/forms/models.py#L1225-L1227
here] (it makes perfect sense: the choices have to be evaluated everytime
the form is instanciated).
3) As the choices are bundled in a
{{{ModelChoiceIterator}}}/{{{.iterator()}}}, when rendering the form,
there is a test about the {{{required}}} attribute that try to
[https://github.com/django/django/blob/ffcf1a8ebfbdc8e3afac84aed88d6ed29a57c72b/django/forms/widgets.py#L699
fetch the first value of the queryset]:


{{{#!python
class Select(ChoiceWidget):
"""[...]"""
def use_required_attribute(self, initial):
"""[...]"""
first_choice = next(iter(self.choices), None)
}}}

4) {{{ModelChoiceIterator.__iter__}}}
[https://github.com/django/django/blob/ffcf1a8ebfbdc8e3afac84aed88d6ed29a57c72b/django/forms/models.py#L1148-L1156
is called]. If {{{empty_label}}} is {{{not None}}}, everything is fine, as
the {{{next()}}} grabs the empty label, without additional request to the
database. But if {{{empty_label}}} is {{{None}}} (and if there is no
prefetch to the queryset), an additional request is made (code is changed
lightly to illustrate the bug):
{{{#!python
def __iter__(self):
if self.field.empty_label is not None:
yield ("", self.field.empty_label)
for obj in self.queryset.iterator():
yield self.choice(obj)
}}}

Now that I have made some tests on an empty project/database, I'm
confident enough to state that it's a **bug** because there is a useless
and additional request to the DB.

''Thanks for all the work you put into the projects in and around Django.
I hope I tested correctly, I do not want to waste your time.''

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:3>

Django

unread,
Feb 26, 2020, 4:01:26 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed

* type: Bug => Cleanup/optimization
* resolution: => needsinfo


Comment:

Hi Aurélien,

Thanks for the follow-up. Really not a problem. Better to ''measure twice,
cut once'' and all that. :)

The iterator behaviour — refetching each time — was deliberate, and is
correct. I can't see an adjustment there that doesn't entail a fix to
#22841.
(Duplicate then...)

''Maybe you don't want that behaviour and a ModelChoiceIterator subclass
would work for you?'' But as soon as I think that, better would be a
Select widget subclass.

#27370 fixed the `Select.use_required_attribute()` behaviour. It
necessarily needs the first item for the `_choice_has_empty_value` check.
So, give the design of ModelChoiceIterator, fetching it isn't a Bug. (Yes
there's an extra query but correctness trumps performance.)

''Maybe'' there's a way of performing that
`Select.use_required_attribute()` check without fetching the first
item...? Would that be better than using a subclass for this kind of case?
(If I know my dataset I can just return `False` from
`use_required_attribute()`.)

I guess we could look at a patch there. I'm going to call this needsinfo
pending such though. I hope that makes sense.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:4>

Django

unread,
Feb 26, 2020, 4:42:58 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aurélien Pardon):

Replying to [comment:4 Carlton Gibson]:


> (Yes there's an extra query but correctness trumps performance.)

How is that correct if the queries yield different results ?
It only works if the two requests are inside a transaction, for example
when {{{ATOMIC_REQUESTS}}} is set to {{{True}}} (which is not the
default).


For example, imagine this model (the primary key is important, it's the
value that the choice will use):
{{{#!python
class MyModel(models.Model):
my_field = models.CharField(max_length=1, primary_key=True)
}}}
If the first query (for the actual choices) returns {{{['a', 'b']}}}, but
the second one (for the required attribute) returns {{{['', 'b', 'c']}}},
then the required attribute will be used leading to invalid HTML.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:5>

Django

unread,
Feb 26, 2020, 5:12:58 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

> How is that correct if the queries yield different results ?

I don't know what else I can say. It's part of the design of
ModelChoiceField that it re-evaluates it's queryset each time it's
iterated.
See #3534 for example, 13 years ago:

> ModelChoiceField and ModelMultipleChoiceField cache the output of their
queryset the first time self.choices is accessed. This is bad for long-
running processes, such as mod_python, because the cache gets stale. Plus,
it's bad saving all of those choices in memory. The attached unit tests
illustrate the problem.

The exact same memory usage considerations were the motivation for the
move to iterator() in #23623. I can't see that being removed.

If the kind of race conditions you're talking about are a factor for your
app then, at least until #22841 is addressed, maybe you do need a
ModelChoiceIterator subclass for your field. (Always use the QuerySet
rather than the iterator...)

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:6>

Django

unread,
Feb 26, 2020, 5:56:36 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aurélien Pardon):

Replying to [comment:6 Carlton Gibson]:


> I don't know what else I can say. It's part of the design of
ModelChoiceField that it re-evaluates it's queryset each time it's
iterated.

{{{ModelChoiceField}}} iterates over his {{{ModelChoiceIterator}}} when
rendering its widget (and its options).
{{{ModelChoiceIterator}}} does not necessarily re-evaluates it's queryset
each time it's iterated: sometimes the query is cached (for example when
the query involves a {{{prefetch_related}}}), sometimes it is not cached.

The reason {{{ModelChoiceIterator}}} use, sometimes, {{{.iterator()}}} is
for '''performance''' (less memory usage, see #3534) and because, at the
times, the queryset was evaluated/fetched only once when the the field was
rendered.
After [https://code.djangoproject.com/ticket/27370 Django's Select widget
adds a required="required" attribute, even if created with
empty_label=True], the queryset is evaluated/fetched twice because of the
new {{{.use_required_attribute}}} method.


> See #3534 for example, 13 years ago:
> > ModelChoiceField and ModelMultipleChoiceField cache the output of
their queryset the first time self.choices is accessed. This is bad for

long-running processes, such as mod_python, because the cache gets stale.


Plus, it's bad saving all of those choices in memory. The attached unit
tests illustrate the problem.
> The exact same memory usage considerations were the motivation for the
move to iterator() in #23623. I can't see that being removed.

You said that "correctness trumps performance" but since
[https://code.djangoproject.com/ticket/27370], the usage of
{{{.iterator()}}} in ModelChoiceIterator '''with''' the new
{{{.use_required_attribute}}} when rendering the form leads to invalid
HTML. How is this not a bug?
Moreover, how duplicate requests to the database (the second one is
useless and leads to bug) do not fall under the same performance
considerations of #3534?


I think the solution is to decide to use the required attribute after
having rendered the field options: while iterating the choices/options, we
can check if the first option allows us to use the required attribute.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:7>

Django

unread,
Feb 26, 2020, 7:40:06 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

If you can provide a test case demonstrating invalid HTML then that we be
a big we could look at.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:8>

Django

unread,
Feb 26, 2020, 11:11:34 AM2/26/20
to django-...@googlegroups.com
#31295: required ModelChoiceField makes duplicate (cursor) queries to the database
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo

Keywords: Model | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Aurélien Pardon):

The fact that the following code makes '''two''' requests to the database
:
{{{#!python
class MyModel(models.Model):
my_field = models.CharField(max_length=1)

class MyForm(forms.Form):
my_form_field = forms.ModelChoiceField(MyModel.objects.all(),

empty_label=None)

my_form.as_ul()
}}}
is absolutely not working as intended. Show this to any other core dev and
ask them, please.


This whole "invalid HTML code" is not an important issue. It happens yes,
but under very rare circumstances (a blank primary key and an {{{INSERT}}}
between the two '''identical''' requests to the database), and it also
won't bother any browser. I can work hard and show you the invalid HTML
code (using database trigger to emulate race conditions?) but honestly
it's not the problem.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:9>

Django

unread,
Feb 27, 2020, 7:53:35 AM2/27/20
to django-...@googlegroups.com
#31295: Avoid Select widget triggering additional query in ModelChoiceIterator.
-------------------------------------+-------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Forms | Version: 2.2
Severity: Normal | Resolution: needsinfo
Keywords: Model | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

Hi Aurélien,

> Maybe there's a way of performing that Select.use_required_attribute()
check without fetching the first item...?

It strikes me that we might be able to do this. This patch passes the
existing tests:

{{{
diff --git a/django/forms/widgets.py b/django/forms/widgets.py
index 40ac1d3162..5bc040065a 100644
--- a/django/forms/widgets.py
+++ b/django/forms/widgets.py
@@ -696,7 +696,16 @@ class Select(ChoiceWidget):
if self.allow_multiple_selected:
return use_required_attribute

- first_choice = next(iter(self.choices), None)
+ iterator = iter(self.choices)
+ # Avoid an extra query if we have a ModelChoiceIterator.
+ try:
+ empty_label = iterator.field.empty_label
+ except AttributeError:
+ pass
+ else:
+ if empty_label is not None:
+ return use_required_attribute
+ first_choice = next(iterator, None)
return use_required_attribute and first_choice is not None and
self._choice_has_empty_value(first_choice)
}}}

So with a further test asserting number of queries, a think about any edge
cases, and a tidy up, I think we can evaluate a patch on that basis.

Thanks for your input!

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:10>

Django

unread,
Feb 27, 2020, 7:54:16 AM2/27/20
to django-...@googlegroups.com
#31295: Avoid Select widget triggering additional query in ModelChoiceIterator.
--------------------------------------+------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Forms | Version: 2.2
Severity: Normal | Resolution:

Keywords: Model | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Carlton Gibson):

* status: closed => new
* resolution: needsinfo =>


--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:11>

Django

unread,
Feb 27, 2020, 7:56:17 AM2/27/20
to django-...@googlegroups.com
#31295: Avoid Select widget triggering additional query when using
ModelChoiceIterator.
--------------------------------------+------------------------------------

Reporter: Aurélien Pardon | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Forms | Version: 2.2
Severity: Normal | Resolution:
Keywords: Model | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:12>

Django

unread,
Feb 27, 2020, 12:15:37 PM2/27/20
to django-...@googlegroups.com
#31295: Avoid Select widget triggering additional query when using
ModelChoiceIterator.
--------------------------------------+------------------------------------
Reporter: Aurélien Pardon | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Forms | Version: 2.2
Severity: Normal | Resolution:
Keywords: Model | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by Aurélien Pardon):

Hi Carlton,

Thanks for your answer. I'm afraid the code you proposed won't work, I
just tested it. I will try to think about a solution.

The difficulty is that when rendering the widget, we need '''before
rendering the options''' (and thus iterate over the
ModelChoiceFieldIterator/fetch the data from the DB) if the HTML attribute
{{{required}}} should be used.
And we also don't want to cache the query, for memory usage performance.

I think of two different solutions :
* defer the presence of the {{{required}}} attribute after the iteration
for the <options>. During this iteration, we store the fact that the first
option allows or not the {{{required}}} attribute. After the rendering of
the options, we put afterwards (or not) the {{{required}}} attribute.
Maybe with a placeholder replaced by a regexp? I find this quite ugly.
* Maybe the ModelChoiceIterator may cache it's first value? It should
not take a lot of memory. I don't think it will work because I assume
that, for efficiency, multiple rows are fetched from the database, if you
want to cache the first value for an ulterior iteration, you also have to
cache the rest of the rows until <options> are iterated.

--
Ticket URL: <https://code.djangoproject.com/ticket/31295#comment:13>

Reply all
Reply to author
Forward
0 new messages