[Django] #30158: Subquery Expressions Incorrectly Added to Group by

32 views
Skip to first unread message

Django

unread,
Feb 5, 2019, 10:56:31 AM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: | Owner: nobody
JonnyWaffles |
Type: | Status: new
Uncategorized |
Component: Database | Version: 2.1
layer (models, ORM) |
Severity: Normal | Keywords: subquery, group_by
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Hi friends,

My Django/SQL skills are not good enough to properly isolate the problem
independently of my use case detailed below. I believe the problem is
Subqueries being forced in to the group_by clause because they are select
expressions. Per the below if I remove the subqueries manually from sql
group_by, my query runs perfectly. I tried to manually edit the
qs.query.group_by, but because it is created by compiler.get_group_by() I
cannot fix the ORM query group by clause.

Are Subquery expressions always supposed to be included in group_by? If
this is desired behavior is it possible to toggle it off so the ORM can
produce the accurate query?

{{{
"""
Problem Statement: The individual annotations work fine when run
independently, but when chained the query takes 5 minutes. This is due to
the final group by clause unexpectedly receiving the Subquery as extra
fields.
"""

# relevant models and querysets
class ClaimQuerySet(models.QuerySet):
def annotate_all(self):
"""Adds ``results``, ``latest_note_text``, and
``latest_assessment_text`` to the queryset."""
return
self.annotate_latest_results().annotate_most_recent_note().annotate_most_recent_assessment()

def prefetch_all(self, annotate_sum=True):
return self.prefetch_notes().prefetch_latest_results(annotate_sum)

def prefetch_latest_results(self, annotate_sum: bool=True):
"""Prefetches the most result :class:`RulesEngineResult` object
and optionally
preload its :attr:`RulesEngineResult.score`.

Args:
annotate_sum:
"""
latest_runs = self.latest_runs
if annotate_sum:
latest_runs =
latest_runs.annotate(_score=Sum('results__value'))
return self.prefetch_related(Prefetch(
'rules_engine_results', queryset=latest_runs,
to_attr='_last_run')
)

def prefetch_notes(self):
"""Prefetches all related notes and assessments."""
return self.prefetch_related('notes', 'assessments')

@property
def latest_runs(self):
"""Shortcut for
:attr:`RulesEngineResultQuerySet.get_latest_runs`"""
return RulesEngineResult.objects.get_latest_runs()

def annotate_latest_results(self) -> 'ClaimQuerySet':
"""Annotates the queryset with a new field ``results`` whose value
is the Sum
of the last :attr:`RulesEngineResult.results` for the claim.
"""
# Only Sum on runs in the above set.
filter_q = Q(rules_engine_results__in=self.latest_runs)
# noinspection PyTypeChecker
return
self.annotate(results=Sum('rules_engine_results__results__value',
filter=filter_q))

def annotate_most_recent_note(self) -> 'ClaimQuerySet':
"""Annotates the queryset with a field ``latest_note_text`` whose
value is the last
entered :attr:`Note.text` for the claim or ``None`` if there are
no associated notes.
"""
return self._annotate_most_recent_basenote(Note,
'latest_note_text')

def annotate_most_recent_assessment(self) -> 'ClaimQuerySet':
"""Annotates the queryset with a field ``latest_assessment_text``
whose value is the last
entered :attr:`Assessment.text` for the claim or ``None`` if there
are no associated assessments.
"""
return self._annotate_most_recent_basenote(Assessment,
'latest_assessment_text')

def _annotate_most_recent_basenote(self, model: Type['BaseNote'],
field_name: str) -> 'ClaimQuerySet':
newest =
model.objects.filter(claim=OuterRef('id')).order_by('-created')
annotate_kwargs = {
field_name: Subquery(newest.values('text')[:1])
}
# noinspection PyTypeChecker
return self.annotate(**annotate_kwargs)


class Claim(BaseClaim):
"""Concrete :class:`~mkl.fraud_django.models.BaseClaim` for
:mod:`~mkl.fraud_django.workers_comp` claims.
"""
objects = ClaimQuerySet.as_manager()
first_rules_engine_run = models.DateField()

@property
def latest_note(self) -> 'Note':
"""Returns the latest :class:`Note`."""
return self.notes.latest()

@property
def latest_assessment(self) -> 'Assessment':
"""Retrieves the latest :class:`Assessment`."""
return self.assessments.latest()

@property
def latest_rulesengine_run(self) -> 'RulesEngineResult':
"""Returns the most most recent run.

.. note::

Use :meth:`ClaimQuerySet.prefetch_latest_results` to
prefetch the last_run, falls back on querying for the latest
value.

Note, if used in a prefetched queryset the value could be
stale.
"""
return self._get_latest(RulesEngineResult, '_last_run')

def _get_latest(self, model: Type[models.Model], cache_attr: str):
"""Handler to return None if a latest related object does not
exist,
checks the cache first."""
if hasattr(self, cache_attr):
try:
return getattr(self, cache_attr)[0]
except IndexError:
return None
try:
return model.objects.filter(claim=self).latest()
except model.DoesNotExist:
return None

def __unicode__(self):
return self.claim_number


class BaseNote(models.Model):
"""Abstract Base Model for both Notes and Assessments.

Use this base for any claim related editable field whose
historical data is important.

On the claim we can write functions to retrieve the latest.

.. note:: The related name will be the class name lower case with an
's'.

Attributes:
text (str): The user provided content
created (datetime.datetime): Created time stamp
claim (:class:`Claim`): The claim related to the note.
"""
id = models.AutoField(primary_key=True)
text = models.TextField(max_length=1000)
created = models.DateTimeField(auto_now_add=True)
claim = models.ForeignKey('Claim', on_delete=models.PROTECT,
related_name='%(class)ss')

class Meta:
abstract = True
get_latest_by = 'created'
ordering = ('-created',)


class Note(BaseNote):
"""Concrete class for Notes, related_name will become ``notes``."""


class Assessment(BaseNote):
"""Concrete class for Assessment, related_name will become
``assessments``."""
CHOICES = (
('01', 'Will open a case'),
('02', 'Will not open a case'),
('03', 'Previously opened'),
('04', "Appears suspicious but won't open"),
('05', 'Not enough info to determine'),
('06', 'Existing vendor request'),
)
text = models.CharField(max_length=1000, choices=CHOICES)

def get_choice_value(self) -> str:
"""Returns the value as the choice human readable text."""
db_text = self.text
return dict(self.CHOICES)[db_text]


class RuleResult(models.Model):
"""The result of running the engine for a particular claim against a
:class:`Rule`.

Attributes:
rule: The rule to be checked
value: The numeric weight of the result
result: The rules engine result of all rules run against the claim
"""
id = models.AutoField(primary_key=True)
rule = models.ForeignKey('Rule', on_delete=models.PROTECT)
value = models.IntegerField()
result = models.ForeignKey('RulesEngineResult',
on_delete=models.PROTECT, related_name='results')


class RulesEngineResultQuerySet(models.QuerySet):
def get_latest_runs(self):
"""Filters to only the most recent
:class:`RulesEngineResult`\s."""
annotated = self.annotate(
latest=Max('claim__rules_engine_results__created')
)
return annotated.filter(created=F('latest'))


class RulesEngineResult(models.Model):
"""
RulesEngine run result.

Attributes:
claim (:class:`Claim`): The claim run through the RulesEngine.
results (List[:class:`RuleResult`]): Collection of results for
each rule.
"""
id = models.AutoField(primary_key=True)
created = models.DateTimeField(auto_now_add=True)
claim = models.ForeignKey('Claim', on_delete=models.PROTECT,
related_name='rules_engine_results')
objects = RulesEngineResultQuerySet.as_manager()

class Meta:
get_latest_by = 'created'

@property
def score(self) -> int:
"""Returns the aggregate score of all related results. Checks
prefetched cache first."""
if hasattr(self, '_score'):
return self._score
d = self.results.aggregate(score=models.Sum('value'))
return d['score']


"""
Individual Query rendering
"""
# Recent Note
qs = models.Claim.objects.annotate_most_recent_note()

SELECT "workers_comp_claim"."id",
"workers_comp_claim"."claim_number",
"workers_comp_claim"."first_rules_engine_run",
(SELECT U0."text"
FROM "workers_comp_note" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1) AS "latest_note_text"
FROM "workers_comp_claim"

# Recent Assessment
qs = models.Claim.objects.annotate_most_recent_assessment()

SELECT "workers_comp_claim"."id",
"workers_comp_claim"."claim_number",
"workers_comp_claim"."first_rules_engine_run",
(SELECT U0."text"
FROM "workers_comp_assessment" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1) AS "latest_assessment_text"
FROM "workers_comp_claim"

# Latest Results (Run)
qs = models.Claim.objects.annotate_latest_results()

SELECT "workers_comp_claim"."id",
"workers_comp_claim"."claim_number",
"workers_comp_claim"."first_rules_engine_run",
SUM("workers_comp_ruleresult"."value") FILTER (WHERE
"workers_comp_rulesengineresult"."id" IN (SELECT U0."id"
FROM "workers_comp_rulesengineresult" U0
INNER JOIN "workers_comp_claim" U1 ON (U0."claim_id" = U1."id")
LEFT OUTER JOIN "workers_comp_rulesengineresult" U2 ON (U1."id" =
U2."claim_id")
GROUP BY U0."id"
HAVING U0."created" = (MAX(U2."created")))) AS "results"
FROM "workers_comp_claim"
LEFT OUTER JOIN "workers_comp_rulesengineresult"
ON ("workers_comp_claim"."id" =
"workers_comp_rulesengineresult"."claim_id")
LEFT OUTER JOIN "workers_comp_ruleresult"
ON ("workers_comp_rulesengineresult"."id" =
"workers_comp_ruleresult"."result_id")
GROUP BY "workers_comp_claim"."id"


"""
When chained the query renders incorrectly like this
"""
qs =
models.Claim.objects.annotate_latest_results().annotate_most_recent_note().annotate_most_recent_assessment()

SELECT "workers_comp_claim"."id",
"workers_comp_claim"."claim_number",
"workers_comp_claim"."first_rules_engine_run",
SUM("workers_comp_ruleresult"."value") FILTER (WHERE
"workers_comp_rulesengineresult"."id" IN (SELECT U0."id"
FROM "workers_comp_rulesengineresult" U0
INNER JOIN "workers_comp_claim" U1 ON (U0."claim_id" = U1."id")
LEFT OUTER JOIN "workers_comp_rulesengineresult" U2 ON (U1."id" =
U2."claim_id")
GROUP BY U0."id"
HAVING U0."created" = (MAX(U2."created")))) AS "results",
(SELECT U0."text"
FROM "workers_comp_note" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1)
AS "latest_note_text",
(SELECT U0."text"
FROM "workers_comp_assessment" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1)
AS "latest_assessment_text"
FROM "workers_comp_claim"
LEFT OUTER JOIN "workers_comp_rulesengineresult"
ON ("workers_comp_claim"."id" =
"workers_comp_rulesengineresult"."claim_id")
LEFT OUTER JOIN "workers_comp_ruleresult"
ON ("workers_comp_rulesengineresult"."id" =
"workers_comp_ruleresult"."result_id")
GROUP BY "workers_comp_claim"."id", (SELECT U0."text"
FROM "workers_comp_note" U0
WHERE U0."claim_id" =
("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1), (SELECT U0."text"
FROM
"workers_comp_assessment" U0
WHERE U0."claim_id" =
("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1)


"""
Why is Django performing the group by with the Subqueries? How do I make
it render correctly like this:
"""
SELECT "workers_comp_claim"."id",
"workers_comp_claim"."claim_number",
"workers_comp_claim"."first_rules_engine_run",
SUM("workers_comp_ruleresult"."value") FILTER (WHERE
"workers_comp_rulesengineresult"."id" IN (SELECT U0."id"
FROM "workers_comp_rulesengineresult" U0
INNER JOIN "workers_comp_claim" U1 ON (U0."claim_id" = U1."id")
LEFT OUTER JOIN "workers_comp_rulesengineresult" U2 ON (U1."id" =
U2."claim_id")
GROUP BY U0."id"
HAVING U0."created" = (MAX(U2."created")))) AS "results",
(SELECT U0."text"
FROM "workers_comp_note" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1)
AS "latest_note_text",
(SELECT U0."text"
FROM "workers_comp_assessment" U0
WHERE U0."claim_id" = ("workers_comp_claim"."id")
ORDER BY U0."created" DESC
LIMIT 1)
AS "latest_assessment_text"
FROM "workers_comp_claim"
LEFT OUTER JOIN "workers_comp_rulesengineresult"
ON ("workers_comp_claim"."id" =
"workers_comp_rulesengineresult"."claim_id")
LEFT OUTER JOIN "workers_comp_ruleresult"
ON ("workers_comp_rulesengineresult"."id" =
"workers_comp_ruleresult"."result_id")
GROUP BY "workers_comp_claim"."id";
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 5, 2019, 11:28:22 AM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 2.1
(models, ORM) |
Severity: Normal | Resolution: invalid

Keywords: subquery, group_by | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* status: new => closed
* resolution: => invalid


Comment:

See TicketClosingReasons/UseSupportChannels for places to ask "is it a
bug/how do I?" questions. Most likely you'll need to simplify your
problem. If you find Django at fault, please create a more concise,
minimal ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:1>

Django

unread,
Feb 5, 2019, 1:32:29 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 2.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Additional details about which database engine and Django version you are
using would be also appreciated.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:2>

Django

unread,
Feb 5, 2019, 2:07:02 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 2.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I suspect `Subquery.get_group_by_cols` needs to be adjusted to return an
empty list. Since #30099 (2.2a1+ only)
[https://github.com/django/django/blob/f021c110d02fd7ca32ae56f511b46e5d138b6c73/django/db/models/expressions.py#L334-L336
it returns] `[self]` but before this change [it returned all the filter
conditions with a left hand
side](https://github.com/django/django/blob/f021c110d02fd7ca32ae56f511b46e5d138b6c73/django/db/models/expressions.py#L1047-L1053)
which is even worst.

Could you try subclassing `Subquery` and overriding `get_group_by_cols` to
`return []`, use the subclass with your query, and see if it helps?

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:3>

Django

unread,
Feb 5, 2019, 2:55:00 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 2.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by JonnyWaffles):

Replying to [comment:2 Simon Charette]:


> Additional details about which database engine and Django version you
are using would be also appreciated.

Hi Simon. I am using Django 2.1.5 and Postgre.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:4>

Django

unread,
Feb 5, 2019, 2:59:29 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Uncategorized | Status: closed
Component: Database layer | Version: 2.1
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by JonnyWaffles):

Replying to [comment:3 Simon Charette]:


> I suspect `Subquery.get_group_by_cols` needs to be adjusted to return an
empty list. Since #30099 (2.2a1+ only)
[https://github.com/django/django/blob/f021c110d02fd7ca32ae56f511b46e5d138b6c73/django/db/models/expressions.py#L334-L336
it returns] `[self]` but before this change

[https://github.com/django/django/blob/f021c110d02fd7ca32ae56f511b46e5d138b6c73/django/db/models/expressions.py#L1047-L1053
it returned all the filter conditions with a left hand side] which is even


worst.
>
> Could you try subclassing `Subquery` and overriding `get_group_by_cols`
to `return []`, use the subclass with your query, and see if it helps?

Wow I don't know what black magic that did but it works!

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:5>

Django

unread,
Feb 5, 2019, 3:32:43 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master

(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* has_patch: 0 => 1
* version: 2.1 => master
* type: Uncategorized => Bug
* needs_tests: 0 => 1
* stage: Unreviewed => Accepted


Comment:

I just ran the full suite with the `get_group_by_cols` override detailed
and it worked fine, this case is simply untested.

Jonny, would you be interested in submitting a Github PR with the changes
suggested above with the addition of a regression test to make sure the
`GROUP BY` doesn't occur? I'd be happy to offer assistance like I did for
#30099 in https://github.com/django/django/pull/10846.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:6>

Django

unread,
Feb 5, 2019, 3:33:29 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:7>

Django

unread,
Feb 5, 2019, 4:40:19 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by JonnyWaffles):

Replying to [comment:6 Simon Charette]:


> I just ran the full suite with the `get_group_by_cols` override detailed
and it worked fine, this case is simply untested.
>
> Jonny, would you be interested in submitting a Github PR with the
changes suggested above with the addition of a regression test to make
sure the `GROUP BY` doesn't occur? I'd be happy to offer assistance like I
did for #30099 in https://github.com/django/django/pull/10846.

Sure thing Simon. I just need to figure out how to write a generic test
case...

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:8>

Django

unread,
Feb 5, 2019, 6:09:08 PM2/5/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: JonnyWaffles | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* status: closed => new
* resolution: invalid =>


--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:9>

Django

unread,
Feb 26, 2019, 11:31:58 PM2/26/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
| Charette
Type: Bug | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* owner: nobody => Simon Charette
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:10>

Django

unread,
Feb 26, 2019, 11:32:22 PM2/26/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
| Charette
Type: Bug | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

This happens to be a bit related to #29542.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:11>

Django

unread,
Feb 27, 2019, 12:43:31 AM2/27/19
to django-...@googlegroups.com
#30158: Subquery Expressions Incorrectly Added to Group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* type: Bug => Cleanup/optimization


Comment:

Jonny, by playing a bit with the code I noticed that it's a bit more
complex than simply returning an empty list in
`Subquery.get_group_by_cols`.

Can you confirm that both queries were returning the appropriate results
but that the one where subqueries were added to the `GROUP BY` was
performing significantly slower?

If that's the case then this is more of an optimization problem where
subqueries can sometimes be removed from the `GROUP BY` and sometimes not
but not in all cases.

e.g.

{{{#!python
Publisher.objects.annotate(
has_long_books=Exists(
Book.objects.filter(
publisher=OuterRef('pk'),
pages__gt=800,
),
),
).values_list('has_long_books').annotate(
total=Count('*'),
)
}}}

When there's an explicit grouping by a subquery (or exists) then it must
be honoured but I believe in other cases it's not.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:12>

Django

unread,
Mar 18, 2019, 12:26:34 AM3/18/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by

-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I renamed the ticket because I'm pretty confident that the generated query
doesn't cause the wrong results to be returned.

The fact that the ORM doesn't have the introspection abilities to
determine if some annotated expressions can be trimmed from the `GROUP BY`
clause on aggregation
[https://github.com/django/django/blob/5c17c273ae2d7274f1fa78218b3b74690efddb86/django/db/models/sql/query.py#L407-L409
is a known limitation] and there's probably other existing issues for
other type of expressions.

Since the initial approach of making `get_group_by_cols` return an empty
list is not viable for the aforementioned reason I think we should begin
by merging [https://github.com/django/django/pull/11030 regression tests
to prevent this edge case from being overlooked in the future].

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:13>

Django

unread,
Mar 19, 2019, 2:24:14 AM3/19/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* needs_tests: 1 => 0


Comment:

Finally came up with an elegant solution. By changing
`Expression.get_group_by_cols`'s signature to accept an optional `alias`
parameter that is only provided when the annotated expression is being
explicitly grouped against. That allows `Subquery.get_group_by_cols` to
either return a reference to it's selected alias when grouped against or
no columns at all when it's not the case which is what this ticket is
tracking.

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:14>

Django

unread,
Mar 21, 2019, 7:38:18 PM3/21/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"9dc367dc10594ad024c83d398a8e3c3f8f221446" 9dc367d]:
{{{
#!CommitTicketReference repository=""
revision="9dc367dc10594ad024c83d398a8e3c3f8f221446"
Refs #30158 -- Added alias argument to Expression.get_group_by_cols().
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:15>

Django

unread,
Mar 21, 2019, 7:38:18 PM3/21/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham <timograham@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"fb3f034f1c63160c0ff13c609acd01c18be12f80" fb3f034f]:
{{{
#!CommitTicketReference repository=""
revision="fb3f034f1c63160c0ff13c609acd01c18be12f80"
Fixed #30158 -- Avoided unnecessary subquery group by on aggregation.

Subquery annotations can be omitted from the GROUP BY clause on
aggregation
as long as they are not explicitly grouped against.

Thanks Jonny Fuller for the report.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:16>

Django

unread,
Mar 21, 2019, 7:38:18 PM3/21/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"e595a713cc5ce66dfc5e22f85d671c06d842e99b" e595a713]:
{{{
#!CommitTicketReference repository=""
revision="e595a713cc5ce66dfc5e22f85d671c06d842e99b"
Refs #29542, #30158 -- Enabled a HAVING subquery filter test on Oracle.

Now that subquery annotations aren't included in the GROUP BY unless
explicitly grouped against, the test works on Oracle.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:17>

Django

unread,
Sep 5, 2019, 9:57:52 AM9/5/19
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by GitHub <noreply@…>):

In [changeset:"350123f38c2b6217c38d70bfbd924a9ba3df1289" 350123f]:
{{{
#!CommitTicketReference repository=""
revision="350123f38c2b6217c38d70bfbd924a9ba3df1289"
Moved release note for refs #30158 from deprecated to backwards
incompatible changes.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:18>

Django

unread,
Jan 14, 2021, 2:12:17 PM1/14/21
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: closed
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"5e33ec80d153416d3693e89138ed21decf042672" 5e33ec80]:
{{{
#!CommitTicketReference repository=""
revision="5e33ec80d153416d3693e89138ed21decf042672"
Refs #30158 -- Made alias argument required in signature of
Expression.get_group_by_cols() subclasses.

Per deprecation timeline.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:19>

Django

unread,
Oct 6, 2022, 7:54:11 AM10/6/22
to django-...@googlegroups.com
#30158: Subquery expressions unnecessarily added to group by
-------------------------------------+-------------------------------------
Reporter: Jonny Fuller | Owner: Simon
Type: | Charette
Cleanup/optimization | Status: closed
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: subquery, group_by | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"c6350d594c359151ee17b0c4f354bb44f28ff69e" c6350d5]:
{{{
#!CommitTicketReference repository=""
revision="c6350d594c359151ee17b0c4f354bb44f28ff69e"
Refs #30158 -- Removed alias argument for Expression.get_group_by_cols().

Recent refactors allowed GROUP BY aliasing allowed for aliasing to be
entirely handled by the sql.Query.set_group_by and compiler layers.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30158#comment:20>

Reply all
Reply to author
Forward
0 new messages