[Django] #24462: Using .distinct().order_by() on a queryset produces unexpected results

120 views
Skip to first unread message

Django

unread,
Mar 7, 2015, 11:08:14 PM3/7/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.7
(models, ORM) | Keywords: distinct order_by
Severity: Normal | subquery
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
When I try to order a queryset that uses {{{distinct(...)}}}, e.g.:
{{{
#!python
qs2 = qs.order_by('a', 'b').distinct('a')
qs3 = qs2.order_by('c')
}}}
then Django seems to forget about the first {{{order_by}}} and apply the
second {{{order_by}}} before applying {{{distinct}}}. I would expect the
second {{{order_by}}} to be applied after applying {{{distinct}}}.

This is particularly troublesome when one wants to create a custom manager
on top of {{{qs2}}}.

I don't know if this is the intended behavior, but it is very unexpected
to me (see the section "Rationale").

=== Steps to reproduce ===

Use the Postgre SQL backend.

In models.py:
{{{
#!python
from django.db import models

class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
return super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')

class Human(models.Model):
nick = models.CharField(max_length=20)
name = models.CharField(max_length=20)
age = models.IntegerField()

objects = models.Manager()
youngest_among_namesakes = YoungestAmongNamesakes()
}}}

In the interactive shell:
{{{
#!python
Human.objects.create(nick='foo', name='Helen', age=20)
Human.objects.create(nick='bar', name='Helen', age=23)
Human.objects.create(nick='baz', name='Jennifer', age=15)

for human in Human.youngest_among_namesakes.all():
print human.nick

for human in Human.youngest_among_namesakes.order_by('name', 'nick'):
print human.nick

for human in Human.youngest_among_namesakes.order_by('nick'):
print human.nick
}}}

=== Actual results ===

The first iteration outputs:
{{{
foo
baz
}}}

The second iteration outputs:
{{{
bar
baz
}}}

The third iteration raises
{{{
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
}}}

{{{Human.youngest_among_namesakes.all()}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC
}}}

{{{Human.youngest_among_namesakes.order_by('name', 'nick')}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}

Similarly, {{{Human.youngest_among_namesakes.order_by('nick')}}}
generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."nick" ASC
}}}

=== Expected results ===

I would expect the second iteration to output:
{{{
foo
baz
}}}

I would expect the third iteration to output:
{{{
baz
foo
}}}

I would expect {{{Human.youngest_among_namesakes.order_by('name',
'nick')}}} to generate the following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}

Similarly, I would expect
{{{Human.youngest_among_namesakes.order_by('nick')}}} to generate the
following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."nick" ASC
}}}

=== Rationale ===

I would expect any queryset {{{qs}}} to contain the same objects as
{{{qs.order_by(...)}}}, just in a possibly different order.

=== Workaround ===

As a (suboptimal) workaround, one might use a subquery instead:

{{{
#!python
class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
qs = super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')
return super(YoungestAmongNamesakes,
self).get_queryset().filter(pk__in=qs)
}}}

This however generates unnecessarily complex SQL query for
{{{Human.youngest_among_namesakes.all()}}}.

=== Suggestion ===

I would suggest to rewrite {{{django.db.models.query.QuerySet.order_by}}}
not to unconditionally clear the previous ordering, but to first check
whether {{{distinct(...)}}} is used in the present queryset and use a
subquery in such a case.

Something like this (the code is untested and written quickly just to
document my thought):

{{{
#!python
def order_by(self, *field_names):
"""
Returns a new QuerySet instance with the ordering changed.
"""
assert self.query.can_filter(), \
"Cannot reorder a query once a slice has been taken."
obj = self._clone()

if self.query.distinct and self.query.distinct_fields:
from django.db.models.sql.subqueries import AggregateQuery
subquery = obj

obj = AggregateQuery(obj.model)
try:
obj.add_subquery(subquery, using=self.db)
except EmptyResultSet:
obj = subquery
else:
obj.query.clear_ordering(force_empty=False)

obj.query.add_ordering(*field_names)
return obj
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Mar 7, 2015, 11:12:19 PM3/7/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by trosos):

* cc: trosos (added)
* needs_docs: => 0
* type: Uncategorized => Bug
* needs_tests: => 0
* needs_better_patch: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:1>

Django

unread,
Mar 11, 2015, 3:43:25 PM3/11/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

Not sure the proposal is feasible or a good idea, but attaching a doc
patch about `order_by()` clearing previous ordering which we can commit in
the meantime.

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:2>

Django

unread,
Mar 11, 2015, 3:43:48 PM3/11/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* Attachment "24462.diff" added.

Django

unread,
Mar 13, 2015, 2:29:31 AM3/13/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

I don't find changing the way .order_by().distinct().order_by() works a
good idea. It *will* break a lot of existing code. For example, if you
have automatic Meta ordering for model, what should
`Human.objects.distinct('name').order_by('name', 'age')` return?

Instead I think we need a new queryset operation to force Django use the
current results as a subquery. There are various cases where you'd want
this. Simplest one is
`SomeModel.objects.order_by('posted')[0:10].subquery().order_by('-posted')`.

This feature is likely pretty hard to implement correctly. Generating the
SQL for the subquery is easy, but generating references to the subquery's
available fields correctly is likely going to be hard.

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:3>

Django

unread,
Mar 13, 2015, 3:41:55 PM3/13/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"b9d9ab23bdcc404708aada664e718a9d56415ca3" b9d9ab23]:
{{{
#!CommitTicketReference repository=""
revision="b9d9ab23bdcc404708aada664e718a9d56415ca3"
Refs #24462 -- Emphasized that order_by() clears previous ordering.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:4>

Django

unread,
Mar 13, 2015, 3:42:19 PM3/13/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"f8ed647435e13d62d4df0bbc76d37b91d0ad81fb" f8ed647]:
{{{
#!CommitTicketReference repository=""
revision="f8ed647435e13d62d4df0bbc76d37b91d0ad81fb"
[1.7.x] Refs #24462 -- Emphasized that order_by() clears previous
ordering.

Backport of b9d9ab23bdcc404708aada664e718a9d56415ca3 from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:5>

Django

unread,
Mar 13, 2015, 3:42:19 PM3/13/15
to django-...@googlegroups.com
#24462: Using .distinct().order_by() on a queryset produces unexpected results
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage:
subquery | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"5b91802718b61346967e7991cf3585554c4c97fa" 5b91802]:
{{{
#!CommitTicketReference repository=""
revision="5b91802718b61346967e7991cf3585554c4c97fa"
[1.8.x] Refs #24462 -- Emphasized that order_by() clears previous
ordering.

Backport of b9d9ab23bdcc404708aada664e718a9d56415ca3 from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:6>

Django

unread,
Mar 13, 2015, 3:43:10 PM3/13/15
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery

-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master

(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* type: Bug => New feature
* version: 1.7 => master
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:7>

Django

unread,
May 28, 2021, 10:33:38 AM5/28/21
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by John Speno):

* cc: John Speno (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:8>

Django

unread,
May 31, 2021, 2:28:36 PM5/31/21
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Jameel Al-Aziz):

I believe we have a similar need, but for difference reasons.

We have a JSONB postgres column and want to `jsonb_array_elements`
followed by filtering. Because `jsonb_array_elements` expands into a set
of columns, you cannot apply a `where` clause filter on the function
column. A simple solution is to use `jsonb_array_elements` in a subquery
and then filter the resulting subquery.

Another use case for sub-querying is being able to apply filters on union
queries. Our actual use case involves both `jsonb_array_elements` and
`UNION`s.

A sample query that I'm trying to write with querysets (but so far have
been unable to) is:
{{{
select *
from (
select id, jsonb_array_elements(json_data->'some_array') as elem
from foo as foo1
union
select id, jsonb_array_elements(json_data->'other_array') as elem
from foo as foo2
) as foo_w_elems
where (elem->>'subfield')::int in (
select id
from bar
where expires_at >= CURRENT_TIMESTAMP
)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:9>

Django

unread,
Jun 8, 2021, 5:25:32 AM6/8/21
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jameel A.):

* cc: Jameel A. (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:10>

Django

unread,
Jul 14, 2021, 5:53:37 AM7/14/21
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Paolo Melchiorre):

* cc: Paolo Melchiorre (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:11>

Django

unread,
Dec 13, 2021, 11:31:51 AM12/13/21
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Dave Johansen):

* cc: Dave Johansen (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:12>

Django

unread,
Jan 4, 2022, 2:35:58 AM1/4/22
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: Ayush
| Joshi
Type: New feature | Status: assigned

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ayush Joshi):

* owner: nobody => Ayush Joshi
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:13>

Django

unread,
Jan 4, 2022, 3:04:34 AM1/4/22
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: (none)
Type: New feature | Status: new

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ayush Joshi):

* owner: Ayush Joshi => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:14>

Django

unread,
May 10, 2023, 11:36:28 PM5/10/23
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Sanders):

For posterity, another PR related to this topic:
https://github.com/django/django/pull/16814

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:13>

Django

unread,
Jan 26, 2024, 10:42:50 AM1/26/24
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

Another one relating to aggregation over a union
https://code.djangoproject.com/ticket/35146
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:14>

Django

unread,
Dec 26, 2024, 12:58:05 PM12/26/24
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Stephen):

* cc: Stephen (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:15>

Django

unread,
May 31, 2025, 8:22:00 AMMay 31
to django-...@googlegroups.com
#24462: Add a new QuerySet operation to use current results as a subquery
-------------------------------------+-------------------------------------
Reporter: trosos | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: distinct order_by | Triage Stage: Accepted
subquery |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by ldeluigi):

* cc: ldeluigi (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:16>
Reply all
Reply to author
Forward
0 new messages