{{{
class WorkSet(models.Model):
name = models.TextField(blank=True)
class Work(models.Model):
name = models.TextField(blank=True)
work_set = models.ForeignKey(WorkSet, on_delete=models.CASCADE,
related_name='works')
extra_data = JSONField(default=dict, blank=True)
}}}
when I iterate over the Works related to a WorkSet {{{ ws }}} like this:
{{{
for work in ws.works.all().only('name'):
print(work.name)
}}}
Not only one query is made to the database, but 1+N where N is the number
of Works related to the WorkSet. In {{{ connection.queries }}} these
queries look like this
{{{
'SELECT "test_work"."id", "test_work"."work_set_id" FROM "test_work" WHERE
"test_work"."id" = 677931'
}}}
etc. That is - the ORM refetches the {{{ work_set_id }}} for each of the
Works iterated over. If I change the loop to explicitly include {{{
work_set_id }}}, only one query is run:
{{{
for work in ws.works.all().only('name', 'work_set_id'):
print(work.name)
}}}
While this behavior makes some sense, I find it quite unexpected and a
potential cause of nasty performance 'leaks'. It think that either the
documentation should explicitly mention this case and how to work around
it or better the ORM should automatically add the relationship key into
the fetched fields.
Please let me know if anything is unclear about this issue and if there is
anything I can improve.
--
Ticket URL: <https://code.djangoproject.com/ticket/30124>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
Comment:
I'm not sure if `QuerySet.only()` is meant to work with reverse
relationships (see #23051 for another issue) but as I said there, a
helpful message should be raised if the decision is not to support it.
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:1>
* status: new => assigned
* owner: nobody => Dan Davis
Comment:
I'd like some feedback of whether it is worth fixing this rather than
documenting the issue. My inclination is to fix it - the query object has
an alias_map for the join at the moment that only is called, and so
whether that's the best way to detect the reverse relationship or not,
there is certainly some way:
{{{
qs1 = ws.works.get_queryset()
qs1.query.alias_map['queries_workset'] # I put them in
tests/queries/models.py for a start
}}}
I'm not sure quite how to select the reverse relationship, here is the
join.__dict__:
{{{
{'table_name': 'queries_workset',
'parent_alias': 'queries_work',
'table_alias': 'queries_workset',
'join_type': 'INNER JOIN',
'join_cols': (('work_set_id', 'id'),),
'join_field': <django.db.models.fields.related.ForeignKey: work_set>,
'nullable': False,
'filtered_relation': None}
}}}
Least interesting part for me - I've validated this by writing a failing
test case. I'm counting queries using @override_settings(DEBUG=True), and
I'm not sure that's a good implementation strategy. But anyway, I'm more
interested in understanding how Django structures joins now that I've
gotten so good at annotate as a user ;)
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:2>
Comment (by Dan Davis):
I'd guess the key is in understanding how the alias_map will look under
different circumstances. When the WorkSet is the base table, and I
examine this:
{{{
WorkSet.objects.filter(works__name='Flying
Car').query.alias_map['queries_work'].__dict__
}}}
I see this:
{{{
{'table_name': 'queries_work',
'parent_alias': 'queries_workset',
'table_alias': 'queries_work',
'join_type': 'INNER JOIN',
'join_cols': (('id', 'work_set_id'),),
'join_field': <ManyToOneRel: queries.wolrk>,
'nullable': True,
'filtered_relation': None}
}}}
The other problem is that even if beginner Django Dev me can fix this
particular problem, getting the treatment of only/defer in the presence of
joins completely consistent will be a bit of a challenge. I know
annotations will not excluded by only() or defer().
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:3>
Comment (by Dan Davis):
There seem to be two general approaches here:
=== only is intended for fields of the base table
So, if the alias_map has cardinality > 1, then detect fields that are not
from the base table and raise a warning. Since only is an optimization,
the warning alerts the user to the problem, but his/her webapp simply
continues to work, albeit slower. In a reverse relationship, the
parent's key is a field of the base table - add it automatically. This
doesn't necessarily mean that the pk of the parent table will be added.
If the ForeignKey is on something other than the parent pk, then the
parent pk is not what is added to the response. This would fix this
issue, #30124, and in the case of #23051, a warning would be raised and
the call to only would not modify the query. Since only is an
optimization, this may be acceptable.
=== only is not intended for joins
So, if the table_map has cardinality > 1, then raise some warning so that
we can gain feedback from users, and if reasonable, deprecate at a later
time. The code in #30124 would have to be done with defer() or values(),
which could in a real-world case be a lot of fields and a great loss of
utility. #23051 would similarly cause a warning, but here the warning
might be better received ;).
Are there implications of this on other query types, such as
union/difference, etc.?
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:4>
Comment (by Dan Davis):
New idea - in the case of a select_related(), then the natural expectation
is that only will also apply to fields in the related model. To finish
this, I will need to understand better how the select tuple in
`django.db.models.sql.query.Query` is populated, and when. I know it is
deferred to the end. I will check.
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:5>
Comment (by Simon Charette):
Having spent a bit of time in refactoring `only` and `defer` for #21204
([https://github.com/django/django/pull/15985 PR]) I think the proper
solution is likely not to introspect aliases and try to make the best of
it but to add a new attribute to `Query` that denotes fields that must
always be present when the select mask is generated.
We already do so for all primary keys of models involved in
`select_related`/`only` or `defer` combinations so if this attribute was
present and the `get_queryset` method of reverse manager set it as
`queryset.query.select_mask = {rel}` and then a copy of this value could
be used in ` _get_defer_select_mask` and ` _get_only_select_mask`.
--
Ticket URL: <https://code.djangoproject.com/ticket/30124#comment:6>