[Django] #27719: Add queryset.alias() to mimic .annotate() for aggregations without loading data

39 views
Skip to first unread message

Django

unread,
Jan 10, 2017, 3:14:06 PM1/10/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc | Owner: nobody
Tamlyn |
Type: New | Status: new
feature |
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
At the moment the Expressions API currently does not allow creating a
value for later use in the query without returning that data from the
database. In some cases this can be quite expensive over the pipe, and can
also influence query execution where functional indexes are involved. I
would like to introduce an API like `alias()`, which does the same thing
as `annotate()` to allow reuse in later querysets but doesn't return the
value.

To be explicit, we are changing a query from one of the form:

`SELECT to_tsvector('english', "some_field"), "some", "other", "fields"
FROM "table" WHERE to_tsvector('english', "some_field") @@
plainto_tsquery("search term") ... LIMIT 10;`

to one of the form:

`SELECT "some", "other", "fields" FROM "table" WHERE
to_tsvector('english', "some_field") @@ plainto_tsquery("search term") ...
LIMIT 10;`

--
Ticket URL: <https://code.djangoproject.com/ticket/27719>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jan 10, 2017, 4:25:52 PM1/10/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Christophe Pettus):

Doing a quick example query: In a table with a relatively large text
field, but just 2,400 rows, the difference between returning all rows and
all fields without the ts_vector() call and returning all with it was
12.7ms vs 24,000ms.

Doing a full sequential scan (no index) with both calculating the
to_tsvector() for return ''and'' in query predicate is 27,000ms without
recalculating the to_tsvector for return, 46,700ms with recalculating it.

After having created a functional index on the to_tsvector() call, and
using a relatively common search term, the time to return without
recalculating the to_tsvector was 1.7ms, vs 21,200ms with recalculating
the ts_vector.

So, in any scenario, not having to recalcuate the to_tsvector() is a win.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:1>

Django

unread,
Jan 11, 2017, 10:52:16 AM1/11/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Could this be solved by allowing expressions to be passed to `filter()`
instead? #25367.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:2>

Django

unread,
Jan 11, 2017, 11:06:00 AM1/11/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marc Tamlyn):

Whilst support in all the places which can use an annotated name would
solve this kind of performance problem, there are loads of ways you can
use a field name after it's defined.

`.filter(name__lookup=value)`
`.order_by('name')`
`.annotate(SomeOtherExpression('name'))`
`.values('name')` (and related APIs)`
`.distinct()`
`.date()` (and `datetimes`)
`.defer('name')` (and `only`)

I know that some of these places already support expressions. For very
complex expressions there's utility in being able to "name" the expression
as some kind of virtual field. If you're within the same function then a
python reference is usually sufficient (except for the `__lookup` case),
but if you have for example a manager method which would annotate a
certain field and you use it elsewhere then things get messy quickly.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:3>

Django

unread,
Jan 11, 2017, 11:31:43 AM1/11/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted


Comment:

That makes sense, thanks for the clarification.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:4>

Django

unread,
Jan 12, 2017, 7:20:13 AM1/12/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Josh Smeaton):

I'm kind of on the fence with this. The 90% case is expressions in WHERE
clauses. We **need** expression support in filter/exclude, including the
ability to filter without a left-hand-side (column reference) for things
such as EXISTS( subq ).

Having `alias()` feels like a fairly big hack.

Let's look at your list of example usages of aliases:

.filter(name__lookup=value) - Yes, Yes, Yes. But solved by expressions-in-
filters.
.order_by('name') - Already supports expressions
.annotate(SomeOtherExpression('name')) - Already supported.
.values('name') (and related APIs)` - If you want the values, then it's
already in the select list and should be kept there. But expressions in
values works now, no?
.distinct() - Already in the select list, refer by name.
.date() (and datetimes) - unsure about this, fairly uncommon?
.defer('name') (and only) - it's already in the select list

Out of all of the above, except for `filter` we always want the expression
in the select list anyway. And when we use defer, we still want it in the
annotations list so we can use it later (I think).

Am I missing something? The only time you might not want the value in the
select list is if you're running a filter and possibly an order by.
**maybe** an annotation over another expression where the other expression
shouldn't be selected, but I feel that'd be a very small set of users,
just to avoid creating two objects that are effectively the same.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:5>

Django

unread,
Jan 12, 2017, 9:48:32 AM1/12/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marc Tamlyn):

I take your point about the select list, most of those APIs (e.g. values,
distinct) do indeed require the field to be selected making alias slightly
redundant.

For me it's very explicitly not a hack, and the fact we can use annotate
is the hack. But that's bikeshedding.

More practically, consider this code:

{{{
query = SearchQuery('some term')
return qs.alias(
main_vector=SearchVector('title', 'body'),
secondary_vector=SearchVector('parent__title', 'parent__body'),
).alias(
vector=F('main_vector') + F('secondard_vector'),
).alias(
rank=Case(
When(vector__search=query, then=SearchRank('vector', query)),
When(secondary_vector__search=query,
then=SearchRank('secondary_vector', query)),
# other cases here
)
).order_by('-rank')
}}}

(I'm aware this example is contrived, but I have some pretty complex
annotations for reporting, and they're even more fun with `SubQuery`... In
most of these cases it doesn't matter that this field is in the
select_list, although it doesn't need to be.

Here I've actually used the "alias approach" and the "variable approach" -
one for vectors and one for queries.

The alias approach has allowed me to:
- Incrementally build up expressions in the query
- Name values and potentially reuse them at a later time in the query
creation phase - perhaps I have some other function which takes the
returned queryset from this method and counts the records where the rank
is > 0.7 to give an indication of strong matching results, then without
the naming from annotate or alias I'd have to access or recreate the
component expressions. Rank is an int, so again it doesn't really matter
to have it queryable.

I think #25367 is an important piece of work, and it would help solve the
performance problem by allowing unselected values. I think though that
there is a merit in being able to create a "virtual field" using an
expression, and use it in subsequent querysets. This would be an easier
solution to this particular performance issue, and it also gives the user
explicit API control over the select list where needed, rather than
implicitly by how you used your expression. It's not an either/or
scenario.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:6>

Django

unread,
Nov 24, 2017, 8:55:37 AM11/24/17
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Cristiano Coelho):

In case it helps, I have got an issue using annotate to simply select some
values and performance related to it and the way it interacts with count:
https://groups.google.com/forum/#!topic/django-users/q6XdfyK29HA

Looks like alias might help this issue.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:7>

Django

unread,
Sep 26, 2018, 7:57:36 PM9/26/18
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

FWIW `alias(...)` was initially proposed in
https://code.djangoproject.com/ticket/16187#comment:28 but was later
removed.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:8>

Django

unread,
Jun 25, 2019, 10:05:33 AM6/25/19
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

[https://groups.google.com/d/msg/django-developers/OmAtiBbq-
EM/8ArMxdKiBQAJ There was a recent post to django-developers] where a user
was trying to use `annotate(foo=expr).defer('foo')` for this exact
purpose. Their use case was more about avoiding the double evaluation of
an `EXISTS` expression which is probably fixed on the master branch by
1ca825e4dc186da2b93292b5c848a3e5445968d7 and could be addressed by #25367
which was previously mentioned but it remains that adding annotation
support to `defer` and `only` could be another solution for this issue.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:9>

Django

unread,
May 24, 2020, 3:18:17 PM5/24/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Alexandr Tatarinov):

* owner: nobody => Alexandr Tatarinov
* status: new => assigned


Comment:

I am working on this one. I've had a lot of cases where unneeded
expressions ended up in a select clause.
When someone builds complex queries spanning multiple methods and modules,
this can save a lot of CPU (for ORM to build the query) and query
execution time.
My suggestion is to support only the usages below, values(), distinct()
and aggregate() will require to use .annotate() to add the expression into
the select list.

{{{
.alias(alias=SomeExpression()).annotate(SomeOtherExpression('alias'))
.alias(alias=SomeExpression()).filter(alias__lookup=value)
.alias(alias=SomeExpression()).order_by('alias')
.alias(alias=SomeExpression()).update(field=F('alias'))
}}}

Also automatically supported: earliest/latest and dates/datetimes.

And if you want to use values() or distinct(), promote alias to annotation
{{{
.alias(alias=SomeExpression()).annotate(alias=F('alias')).distinct('alias')
}}}


The implementation relies on annotation_mask to remove an expression from
the select list, all other logic remains the same: despite the expression
is not selected, the joins and group by are the same as for annotation.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:10>

Django

unread,
May 25, 2020, 4:04:01 AM5/25/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Alexandr Tatarinov):

[https://github.com/django/django/pull/12971 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:11>

Django

unread,
May 25, 2020, 10:48:06 AM5/25/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Alexandr Tatarinov):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:12>

Django

unread,
Jul 19, 2020, 11:30:09 PM7/19/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* needs_better_patch: 0 => 1


Comment:

Left some comments for improvements on the PR but pending that the
proposed changes LGTM.

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:13>

Django

unread,
Jul 23, 2020, 11:15:47 AM7/23/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Alexandr Tatarinov):

* needs_better_patch: 1 => 0


Comment:

Made requested changes

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:14>

Django

unread,
Jul 30, 2020, 5:52:03 AM7/30/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* needs_better_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:15>

Django

unread,
Jul 30, 2020, 9:01:10 AM7/30/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Alexandr Tatarinov):

* needs_better_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:16>

Django

unread,
Jul 31, 2020, 7:21:55 AM7/31/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:17>

Django

unread,
Jul 31, 2020, 4:03:51 PM7/31/20
to django-...@googlegroups.com
#27719: Add queryset.alias() to mimic .annotate() for aggregations without loading
data
-------------------------------------+-------------------------------------
Reporter: Marc Tamlyn | Owner: Alexandr
| Tatarinov
Type: New feature | Status: closed

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"f4ac167119e8897c398527c392ed117326496652" f4ac1671]:
{{{
#!CommitTicketReference repository=""
revision="f4ac167119e8897c398527c392ed117326496652"
Fixed #27719 -- Added QuerySet.alias() to allow creating reusable aliases.

QuerySet.alias() allows creating reusable aliases for expressions that
don't need to be selected but are used for filtering, ordering, or as
a part of complex expressions.

Thanks Simon Charette for reviews.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/27719#comment:18>

Reply all
Reply to author
Forward
0 new messages