[Django] #24614: Symmetry for select_related/prefetch_related for handling non-related fields

35 views
Skip to first unread message

Django

unread,
Apr 9, 2015, 4:54:50 PM4/9/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
----------------------------------------------+--------------------
Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
I've been looking at the generated queries from some of our code and
noticed a few things about select_related and prefetch_related. To start
with, I want to provide some code examples for reference:

{{{#!python

from django.db import models

class City(models.Model):
name = models.CharField(max_length = 50)
population = models.PositiveIntegerField()
mayor = models.ForeignKey('Person', null=True, default=None)

class Person(models.Model):
name = models.CharField(max_length = 50)
hometown = models.ForeignKey(City)

def initial_data():
c1 = City(name='City1', population=2)
c1.save()
p1 = Person(name='Citizen One', hometown=c1)
p1.save()
p2 = Person(name='Mayor', hometown=c1)
p2.save()
c1.mayor = p2
c1.save(update_fields=['mayor'])
}}}

{{{
# ==== Current behaviour (with abbreviated sql) ====

# --- Select related at depth=1 ---
>>> p = Person.objects.select_related('hometown').get(id=1)
# SELECT p.*, c.* FROM app_person p INNER JOIN app_city c ON
p.hometown_id = c.id WHERE p.id=1;
>>> print(p.hometown.name) # No query
City1
>>> print(p.hometown.population) # No query
2
>>> print(p.hometown.mayor)
# SELECT p.* FROM app_person p WHERE p.id = 2
Person(Mayor)

# --- Select related on a non-rel field ---
p = Person.objects.select_related('hometown__name').get(id=1)
# SELECT p.*, c.* FROM app_person p INNER JOIN app_city c ON
p.hometown_id = c.id WHERE p.id=1;
>>> print(p.hometown.name) # No query
City1
>>> print(p.hometown.population) # No query
2
>>> print(p.hometown.mayor)
# SELECT p.* FROM app_person p WHERE p.id = 2
Person(Mayor)

# --- Select related on a rel-rel field ---
p = Person.objects.select_related('hometown__mayor').get(id=1)
# SELECT p1.*, c.*, p2.* FROM app_person p1 INNER JOIN app_city c ON
p1.hometown_id = c.id LEFT OUTER JOIN app_person p2 ON c.mayor_id = p2.id
WHERE p1.id = 1
>>> print(p.hometown.name) # No query
City1
>>> print(p.hometown.population) # No query
2
>>> print(p.hometown.mayor) # No query
Person(Mayor)

# --- Prefetch related on a related field ---
>>> p = Person.objects.prefetch_related('hometown').get(id=1)
# SELECT p.* FROM app_person p WHERE p.id = 1
# SELECT c.* FROM app_city c WHERE c.id IN (...)
>>> print(p.hometown.name) # No query
City1
>>> print(p.hometown.population) # No query
2
>>> print(p.hometown.mayor)
# SELECT p.* FROM app_person p WHERE p.id = 2
Person(Mayor)

# --- Prefetch related on a non-rel field ---
>>> p = Person.objects.prefetch_related('hometown__name').get(id=1)
# SELECT p.* FROM app_person p WHERE p.id = 1
# SELECT c.* FROM app_city c WHERE c.id IN (...)
ValueError: 'hometown__name' does not resolve to an item that supports
prefetching - this is an invalid parameter to prefetch_related().

# --- Prefetch related on a rel-rel field ---
>>> p = Person.objects.prefetch_related('hometown__name').get(id=1)
# SELECT p.* FROM app_person p WHERE p.id = 1
# SELECT c.* FROM app_city c WHERE c.id IN (...)
# SELECT p.* FROM app_person p WHERE p.id = 2
>>> print(p.hometown.name) # No query
City1
>>> print(p.hometown.population) # No query
2
>>> print(p.hometown.mayor) # No query
Person(Mayor)
}}}

There are two things in the example I want to bring up:

The first is that `select_related` and `prefetch_related` differ in how
they handle their arguments if they're invalid (ie, when they are pointing
to non-related fields), `select_related` will happily accept the argument
and just fetch the table parts, whereas `prefetch_related` will throw a
`ValueError`. I think both methods should either throw the `ValueError`,
or both sanitize the input :- by the same reason given in
> django/db/models/query.py line ~1508
> Last one, this *must* resolve to something that supports prefetching,
otherwise there is no point adding it and the developer asking for it has
made a mistake."

The second is that I believe there is some semantic overloading going on
for the argument format for
`prefetch_related`/`select_related`/`only`/`defer`, I think it's best
explained in a table:

|| `"model__attr"` ||= `prefetch_related`
=||= `select_related` =||=
`only/defer` =||= `only/defer` + `select_related` =||
||attr is Related field ||Extra query (expected), selects all
fields||Joins field (expected), selects all fields ||affects what is
in the select clause (expected), but ignores "attr" and only selects
"model" ||affects what is in the select clause (expected), and only
selects "attr" from "model" (expected) ||
||attr is non-relational field||ValueError
||Ignores the field, selects all fields on table ||affects what is in the
select clause (expected), but ignores "attr" and only selects "model" ||
affects what is in the select clause (expected), and only selects "attr"
from "model" (expected) ||

There seems to be a two semantic overlap issues. First is with the name of
`select_related` in that it doesn't really affect which fields are
"selected", only really which tables are joined, second is the semantic
overlap between the argument formats which `selected_related` accepts and
which formats `only/defer` accept. The main problem I perceive is that
both accept arguments in the format `"model__attr"` yet treat them
differently. On one hand, `prefetch_related` and `select_related` are
supposed to only deal with arguments where `"__attr"` is a relational
field (prefetch does correctly), on the other hand, `only/defer` accept
arguments where `"__attr"` is both a relational field, or just a normal
attribute field. To me, at least, these are two distinct data types, yet
when I'm looking through code they are represented in exactly the same
manner. Obviously, there isn't any other practical way to differentiate
the two data types, so there should probably be changes made to how
they're handled.

Originally, I was going to propose that if a non-relational field was
detected in `prefetch_related` or `select_related`, then it is passed to
an `only()`, since when I see the name "select_related", I associate it
with the fields "SELECT"ed, thus `A.objects.select_related('b__attr')`
would behave the same as `A.objects.select_related('b').only('b__attr',
'aAttr1', 'aAttr2').defer('b__otherattr')`, however, after trying to
reason about it, I'm not sure it's a good idea since it further
exacerbates the symmetry issues.
Pros:
- More succinct code for optimizing which fields are selected
- `select_related` accepts the same field types as `only/defer`
- Allows more fine grained control to which fields are `SELECT`ed (fields
in the `select_related` will be the ones actually selected for those
related tables)
Cons/Concerns:
- Behaviour is less like `prefetch_related`
- Possibly breaking change for `select_related` on relational fields.
- Doesn't really make sense for `prefetch_related`, thus the symmetry
issue between `prefetch_related` and `select_related`.
- What is selected for `A.objects.select_related('b__c')` where `c` is
also a related field? Is it all fields on `c` or just its id?

Perhaps a better approach, although probably most controversial, would be
to rename `select_related` to `join_related` and have it throw a
`ValueError` on non-relational fields. This would resolve the semantic
overloading of the method name and implications of its implementations,
however, it doesn't resolve the second - although admittedly more minor -
the previously mentioned argument/data types/semantics issue.

Anyway, I'm hoping a discussion can be started on this, but at very
minimum I think `select_related` should match the behaviour of
`prefetch_related` when it encounters an argument that does not resolve to
a relational field. Let me know if a separate issue should be reported.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 10, 2015, 5:28:44 AM4/10/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
-------------------------------------+-------------------------------------

Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by shaib):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Hello, and thanks for the long and detailed explanation and discussion. I
appreciate that creating this ticket must have taken significant effort.
However, I tend to disagree with your conclusions.

For the symmetry in validation of non-related fields, I tend to think that
the benefit does not justify breaking backwards-compatibility. If someone
`select_relatred`s an extra field -- either by mistake, or because a
field was once a relation and changed -- and their code works to their
satisfaction, I see no reason to break it for them on Django upgrade. If
`select_related` were a new feature, you'd be right on. Unfortunately, it
isn't.

On the "semantic overload" issue, I am not convinced that is a real
problem in common use, and even if it is, I'm not sure if it should be
solved by (again, backwards-incompatible) behavior change or by improving
documentation.

I am leaving this ticket open, as I consider the work you put into it
deserves another eye looking at it, but if you really want to start a
discussion, the place for it is not the issue tracker, but the
DevelopersMailingList.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614#comment:1>

Django

unread,
Apr 10, 2015, 6:16:01 PM4/10/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
-------------------------------------+-------------------------------------

Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by shaib):

So... there was a little discussion on IRC about this, and it was pointed
out that [https://docs.djangoproject.com/en/1.8/releases/1.8/#select-
related-now-checks-given-fields this was already done]. Are you sure you
see the behavior you described on master?

Some participants in the IRC discussion favored turning it from error to a
deprecation warning -- so it might be changed.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614#comment:2>

Django

unread,
Apr 10, 2015, 7:15:45 PM4/10/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
-------------------------------------+-------------------------------------

Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Naddiseo):

Thanks for your responses @shaib, I really do appreciate you taking the
time to read it and being open to discussing it, I'll move the discussion
and a reply into the mailing list, as you suggest, over the weekend (does
the mailing list handle the table and code formatting?).

As for your second response: I've just double checked in dj 1.8, and the
validation only occurs if the field is on the immediate model, or if it's
invalid:

{{{#!python

>>> Person.objects.select_related('foo')
django.core.exceptions.FieldError: Invalid field name(s) given in
select_related: 'foo'. Choices are: hometown
>>> Person.objects.select_related('hometown__foo')
django.core.exceptions.FieldError: Invalid field name(s) given in
select_related: 'foo'. Choices are: mayor
>>> Person.objects.select_related('hometown__name')
# query = select ... blah
[ Person object ...]
>>> Person.objects.select_related('name')
django.core.exceptions.FieldError: Non-relational field given in
select_related: 'name'. Choices are: hometown

}}}

Notice how it will happily accept the third one. Would you like me to file
a separate bug report for this issue?

--
Ticket URL: <https://code.djangoproject.com/ticket/24614#comment:3>

Django

unread,
Apr 13, 2015, 12:49:52 PM4/13/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
-------------------------------------+-------------------------------------

Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

Yes, a separate bug sounds appropriate.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614#comment:4>

Django

unread,
Apr 22, 2015, 2:00:45 PM4/22/15
to django-...@googlegroups.com
#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
-------------------------------------+-------------------------------------
Reporter: Naddiseo | Owner: nobody
Type: Uncategorized | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* status: new => closed
* resolution: => needsinfo


Comment:

I created a ticket for the separate issue: #24687.

Closing this issue pending a mailing discussion as discussed above.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614#comment:5>

Reply all
Reply to author
Forward
0 new messages