[Django] #28296: Subquery with annotation fails without tweaking

43 views
Skip to first unread message

Django

unread,
Jun 10, 2017, 10:16:51 AM6/10/17
to django-...@googlegroups.com
#28296: Subquery with annotation fails without tweaking
-------------------------------------+-------------------------------------
Reporter: László | Owner: nobody
Károlyi |
Type: Bug | Status: new
Component: Database | Version: 1.11
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Hey guys,

after having spent a couple of hours figuring out why a subquery annotated
with `Count` fails, I came to the conclusion that `Count` in a query
causes `GROUP BY` clauses added to it, which in turn renders the SQL
subquery useless.

Please investigate this example which I just added to Stack Overflow as a
resolution, and see what I'm talking about:

https://stackoverflow.com/a/44474268/1067833

--
Ticket URL: <https://code.djangoproject.com/ticket/28296>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 10, 2017, 8:19:09 PM6/10/17
to django-...@googlegroups.com
#28296: Subquery with annotation fails without tweaking
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Hello László,

It's not clear to me why you not expecting a `GROUP BY` statement to be
present in the subquery when you are explicitly grouping by `id` by using
`values('id')` before annotating. Please see the section about aggregation
and the usage of `values()` to `GROUP BY` in the
[https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#order-of-
annotate-and-values-clauses aggregation documentation].

Also, is there a reason you are not simply using
[https://docs.djangoproject.com/en/1.11/ref/models/conditional-expressions
/#conditional-aggregation conditional aggregation] for this query instead?

{{{#!python
Performance.objects.annotate(
reserved_seats=Count(
Case(When(seat_reservations__status__in=TAKEN_TYPES,
then='seat_reservations')),
),
)
}}}

Note that I'm assuming `SeatReservation.performance.related_name ==
'seat_reservations'` above.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:1>

Django

unread,
Jun 11, 2017, 8:41:17 AM6/11/17
to django-...@googlegroups.com
#28296: Subquery with annotation fails without tweaking
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by László Károlyi):

Hello Simon, thanks for getting back to me.

Replying to [comment:1 Simon Charette]:


> It's not clear to me why you not expecting a `GROUP BY` statement to be
present in the subquery when you are explicitly grouping by `id` by using
`values('id')` before annotating. Please see the section about aggregation
and the usage of `values()` to `GROUP BY` in the
[https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#order-of-
annotate-and-values-clauses aggregation documentation].

The reason for why I needed to get rid of the `GROUP BY` clause was
because it broke the `COUNT()` on the SQL level. I recognized this after
going down to the SQL level and fiddling around with native SQL queries to
find out why I'm getting `NULL` values at the counts. So here's a
simplified version how the clause with `GROUP BY` looks, and what results
it brings, copying from the MariaDB command line:

{{{#!sql
MariaDB [ticketshop]> select ticketshop_event_performance.id, (select
count(u0.`id`) as count from ticketshop_booking_seatreservation u0 where
(ticketshop_event_performance.id=(u0.performance_id)) and status='xxx'
group by u0.id) as count from ticketshop_event_performance limit 1;
+-----+-------+
| id | count |
+-----+-------+
| 134 | NULL |
+-----+-------+
1 row in set (0.00 sec)
}}}

As you can see, it returns `NULL` in the count column. And so I removed
the `GROUP BY` to see if it works that way:

{{{#!sql
MariaDB [ticketshop]> select ticketshop_event_performance.id, (select
count(u0.`id`) as count from ticketshop_booking_seatreservation u0 where
(ticketshop_event_performance.id=(u0.performance_id)) and status='xxx') as
count from ticketshop_event_performance limit 1;
+-----+-------+
| id | count |
+-----+-------+
| 134 | 0 |
+-----+-------+
1 row in set (0.00 sec)
}}}

See, it's returning proper values now. So that is why I went back to the
ORM level and played around with it until I managed to remove the `GROUP
BY` clause from the subquery. Basically, empirical testing.

Also it's worth mentioning that when I managed to remove the `GROUP BY` on
the ORM level, executing the query without specifying the output field
resulted in an exception when evaluating the query the first time, but not
the second time:

{{{#!python
In [8]: from ticketshop.booking.choices import TAKEN_TYPES

In [9]: rq = SeatReservation.objects.filter(
...: performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES
...: ).values('id').annotate(count=Count('id')).values('count')

In [10]: rq.query.group_by = []

In [11]: a = Performance.objects.annotate(count=Subquery(rq))

In [12]: a[0].count
### LOTS OF TRACEBACK CUT
# ~/Work/venv/lib/python3.6/site-packages/django/db/models/expressions.py
in _resolve_output_field(self) line 280:
# FieldError: Expression contains mixed types. You must set output_field
In [13]: a[0].count
Out[13]: 0
}}}

> Also, is there a reason you are not simply using
[https://docs.djangoproject.com/en/1.11/ref/models/conditional-expressions
/#conditional-aggregation conditional aggregation] for this query instead?
>
> {{{#!python
> Performance.objects.annotate(
> reserved_seats=Count(
> Case(When(seat_reservations__status__in=TAKEN_TYPES,
then='seat_reservations')),
> ),
> )
> }}}
>
> Note that I'm assuming `SeatReservation.performance.related_name ==
'seat_reservations'` above.

Thanks for the hint, I wasn't aware of this method, I'll look into it
later.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:2>

Django

unread,
Jun 11, 2017, 8:54:41 PM6/11/17
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries

-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master

(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* version: 1.11 => master
* type: Bug => New feature
* stage: Unreviewed => Accepted


Comment:

Hello László, thanks for taking the time to provide all these details.

I think that what you'd like to do use is the `aggregate` method instead
of the `annotate` one to avoid the group by. Something along the lines of

{{{#!python
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).aggregate(Count('pk'))
),
)
}}}


Unfortunately the `Subquery` API
[https://github.com/django/django/pull/6478#issuecomment-217759006 doesn't
allow that yet].

It would require to either make `aggregate` return a lazy object that
`Subquery` can deal with (right now it returns a `dict` on call) which has
the potential of breaking backward compatibility or introducing a new kind
of expression to deal with this case (e.g. `AggregateSubquery(query,
Count('pk'))`).

In an ideal world we'd be able to call `count()` directly and avoid the
`Subquery` wrapper


{{{#!python
Performance.objects.annotate(
reserved_seats=SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).count()
)
}}}

I'm converting the ticket to a feature request for subquery aggregation.
It has a bit of overlap with #10060 as it would add a way to work around
the issue by explicitly using subqueries as opposed to having the ORM do
automatic pushdown but I still consider this a feature on it's own.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:3>

Django

unread,
Jun 11, 2017, 9:43:24 PM6/11/17
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Matthew Schinckel):

I attempted to comment in the SO post, but comments are hardly conducive
to code blocks.

I've written several `Subquery` subclasses to do aggregation of the
subquery: perhaps that will work here.


{{{
class Count(Subquery):
template = "(SELECT COUNT(*) FROM (%(subquery)s) _count)"

@property
def output_field(self):
return models.IntegerField()
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:4>

Django

unread,
Jul 16, 2018, 1:26:10 PM7/16/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Sylvain Zimmer):

Hello !

I solved this with a custom class:

{{{#!python

class SubQueryCount(Subquery):
output_field = models.IntegerField()
def __init__(self, *args, **kwargs):
Subquery.__init__(self, *args, **kwargs)
self.queryset =
self.queryset.annotate(cnt=Count("*")).values("cnt")
self.queryset.query.set_group_by() # values() adds a GROUP BY we
don't want here

subquery = Media.objects.filter(artist=OuterRef('id'))
artists_with_count_medias =
Artist.objects.all().annotate(count_medias=SubQueryCount(subquery))

}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:5>

Django

unread,
Jul 22, 2018, 2:34:57 PM7/22/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by S. Dole Melipone):

I would like to offer the following solution using the new SQL `Window`
class. This allows the aggregation of annotated values by calculating the
aggregate over partitions based on the outer query model (in the GROUP BY
clause), then annotating that data to every row in the subquery queryset.
The subquery can then use the aggregated data from the first row returned
and ignore the other rows.

I'd like some opinions and feedback on whether this is the optimal
solution for aggregating arbitrary annotated data in a subquery. If so I'd
be happy to write something up for the documentation. I couldn't find any
other way to do this in the ORM! This also works well for calling `Sum`,
`Avg` etc. rather than `Count` on annotated values on the subquery
queryset.

{{{


Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,

).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:6>

Django

unread,
Sep 18, 2018, 7:33:15 AM9/18/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Diederik van der Boor):

> ... either make aggregate return a lazy object that Subquery can deal


with (right now it returns a dict on call)

I second that! It would make subqueries much easier to use.
A lazy dict or `AggregateResult` doesn't have to be too complicated.
Providing `__getitem__`, `__contains__`, etc.. like `UserDict` would do
the trick.
The same could happen with `.count()`. By letting it return a lazy
`CountResult` object that casts to an int, it doesn't change the existing
code, but also offers some `.query` attribute that `Subquery()` can use.

This would allow things like:

{{{
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES

).aggregate(Count('pk'))
)
)
}}}

Or even:

{{{
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES

).count()
)
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:7>

Django

unread,
Nov 27, 2018, 3:43:42 AM11/27/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> A lazy dict or AggregateResult doesn't have to be too complicated.

Providing __getitem__, __contains__, etc.. like UserDict would do the
trick.

The main challenge here is that it changes the nature of `count()` and
`aggregate()` methods so that they no longer perform a query
automatically. This is certainly going to break a few tests but it should
be possible to provide shims through `__int__` for `count()` and `dict`
subclassing for `aggregate`.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:8>

Django

unread,
Nov 27, 2018, 3:43:51 AM11/27/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:9>

Django

unread,
Nov 27, 2018, 5:56:21 AM11/27/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I did a bit of investigation to determine the length of changes this would
require. I started with trying to make `count()` return a lazy object as
it's a bit easier to implement than `aggregate()`. I didn't implement the
`resolve_expression` or `as_sql` for now.

As expected making `count()` lazy breaks a few tests that expect the query
to be immediately executed but but only a few adjustments to `LazyObject`
are required to get most of the suite passing.

Still this breaks backward compatiblity and would certainly break more
than just test code out there. For example, a view a could be getting a
count, perfoming some alterations and retrieving a new count and that
would break if the first query is not immediately executed.

The work I've done can be found here
https://github.com/django/django/compare/master...charettes:ticket-28296

What I suggest we do here instead is to introduce a new
`AggregateSubQuery(queryset, aggregate)` expression instead. It's
certainly not as elegant as using `count()` or `aggregate()` but it does
maintain backward compatiblity.

e.g.

{{{#!python
Performance.objects.annotate(
reserved_seats=AggregateSubQuery(SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
), Count('pk')),
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:10>

Django

unread,
Nov 28, 2018, 9:08:55 AM11/28/18
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Another API alternative could be allow queries to be passed directly to
aggregation functions

{{{#!python
# Implicit Count(*)
Performance.objects.annotate(
reserved_seats=Count(SeatReservation.objects.filter(
performance=OuterRef('pk'),
status__in=TAKEN_TYPES,
))
)

# Explicit Count('pk')
Performance.objects.annotate(
reserved_seats=Count(SeatReservation.objects.filter(
performance=OuterRef('pk'),
status__in=TAKEN_TYPES,
).values('pk'))
)

# Sum('seat__amount')
Performance.objects.annotate(
reserved_seats_total_amount=Sum(SeatReservation.objects.filter(
performance=OuterRef('pk'),
status__in=TAKEN_TYPES,
).values('seat__amount')
)
}}}

Or to allow passing a `subquery` kwarg to aggregation expressions. It can
either be `True` or a query

{{{#!python
# subquery=True
Performance.objects.annotate(
reserved_seats=Count(
'seat_reservations',
filter=Q(status__in=TAKEN_TYPES),
subquery=True,
),
)

# explicit subquery
Performance.objects.annotate(
reserved_seats=Count(
'seat_reservations',
subquery=SeatReservation.objects.filter(
status__in=TAKEN_TYPES,
),
),
)

# Sum('seat_reservations__seat__amount')
Performance.objects.annotate(
reserved_seats_total_amount=Sum(
'seat_reservations__seat__amount'',
subquery=True,
),
)
}}}

I think I like the`subquery` kwarg alternative better. It blends
naturally with the newly added `filter` kwarg and result in less verbose
expressions as the `subquery` shouldn't have to be provided most of the
time.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:11>

Django

unread,
Apr 2, 2019, 1:47:55 PM4/2/19
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I looks like someone went ahead [https://github.com/martsberger/django-
sql-utils and implemented] the `AggregateSubquery` API.

I'd like to throw another one into the mix which I believe would be more
suitable than the `AggregateSubquery` and `subquery` kwarg idea; an
`as_subquery()` method that returns a `Subquery` instance. It would offer
a nicer interface through composition and allow encapsulation of most of
the subquery creation logic instead of doing it at `Aggregate` expression
resolving and compiling time. The `as_*` pattern is also coherent with the
`Queryset.as_manager` interface so it shouldn't be completely alien to
most of our users.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:12>

Django

unread,
Sep 27, 2019, 9:52:13 AM9/27/19
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by B Martsberger):

* owner: nobody => B Martsberger
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:13>

Django

unread,
Sep 27, 2019, 1:23:11 PM9/27/19
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/11841 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:14>

Django

unread,
Oct 24, 2019, 1:21:34 PM10/24/19
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Evgeny Arshinov):

* cc: Evgeny Arshinov (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:15>

Django

unread,
Jan 20, 2020, 3:16:36 AM1/20/20
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* needs_better_patch: 0 => 1


Comment:

Marking as "needs improvement" due to Simon's comments.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:16>

Django

unread,
Jan 1, 2021, 11:04:02 AM1/1/21
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Matt Hegarty):

I wanted to comment in case it helps anyone else.

My use case was to use Django's subquery API to retrieve a count of
"active" days. That is, the number of days on which events were recorded.

The (postgres) SQL for this query would be:

{{{
SELECT "player_player"."id",
(SELECT count(1) from (select date_trunc('day', evt.timestamp)
"day"
FROM "player_event" evt
WHERE (evt."player_id" = "player_player"."id")
GROUP BY evt."player_id", "day") a) AS "day_count"

FROM "player_player"
}}}

I had to solve this using a subclass of Subquery:

{{{
class EventDayCountSubquery(Subquery):
template = "(SELECT COUNT(1) FROM (%(subquery)s) _count)"
output_field = IntegerField()
}}}

I could then use this as a regular annotation:

{{{
event_day_count = (
Event.objects.filter(player=OuterRef("pk"))
.annotate(day=TruncDay("timestamp")).values("day")
.annotate(count=Count("pk"))
)

Player.objects.annotate(
event_day_count_90_day=EventDayCountSubquery(event_day_count)
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:17>

Django

unread,
Jan 1, 2021, 10:47:52 PM1/1/21
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by B Martsberger):

Matt, I'm not sure your use case will be improved by this ticket due to
the need to group by a computed field. But it also looks like it's not
necessary to subclass Subquery with a new template.


{{{
event_day_subquery = Subquery(
Event.objects.annotate(day=TruncDay('timestamp'))
.values('project_id', 'day')
.annotate(cnt=Count(Value(1)))
.values('cnt')
.filter(project_id=OuterRef('id'))

Player.objects.annotate(day_count=event_day_count).values('id',
'day_count')
}}}

Produces SQL like

{{{
SELECT "player_player"."id",

(SELECT COUNT(1) AS "cnt"
FROM "player_event" U0
WHERE U0."player_id" = "player_player"."id"
GROUP BY U0."player_id",
DATE_TRUNC('day', U0."timestamp" )) AS "day_count"
FROM "player_player"
}}}

Replying to [comment:17 Matt Hegarty]:

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:18>

Django

unread,
Aug 13, 2022, 10:11:07 PM8/13/22
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

#10621 was a 13 years old duplicate.

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:19>

Django

unread,
Dec 20, 2023, 7:19:14 AM12/20/23
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by şuayip üzülmez):

* cc: şuayip üzülmez (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:20>

Django

unread,
Jan 25, 2024, 8:36:54 AMJan 25
to django-...@googlegroups.com
#28296: Add support for aggregation through subqueries
-------------------------------------+-------------------------------------
Reporter: László Károlyi | Owner: B
| Martsberger
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ionel Cristian Mărieș):

* cc: Ionel Cristian Mărieș (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/28296#comment:21>

Reply all
Reply to author
Forward
0 new messages