This is particularly troublesome when one wants to create a custom manager
on top of {{{qs2}}}.
I don't know if this is the intended behavior, but it is very unexpected
to me (see the section "Rationale").
=== Steps to reproduce ===
Use the Postgre SQL backend.
In models.py:
{{{
#!python
from django.db import models
class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
return super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')
class Human(models.Model):
nick = models.CharField(max_length=20)
name = models.CharField(max_length=20)
age = models.IntegerField()
objects = models.Manager()
youngest_among_namesakes = YoungestAmongNamesakes()
}}}
In the interactive shell:
{{{
#!python
Human.objects.create(nick='foo', name='Helen', age=20)
Human.objects.create(nick='bar', name='Helen', age=23)
Human.objects.create(nick='baz', name='Jennifer', age=15)
for human in Human.youngest_among_namesakes.all():
print human.nick
for human in Human.youngest_among_namesakes.order_by('name', 'nick'):
print human.nick
for human in Human.youngest_among_namesakes.order_by('nick'):
print human.nick
}}}
=== Actual results ===
The first iteration outputs:
{{{
foo
baz
}}}
The second iteration outputs:
{{{
bar
baz
}}}
The third iteration raises
{{{
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER
BY expressions
}}}
{{{Human.youngest_among_namesakes.all()}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC
}}}
{{{Human.youngest_among_namesakes.order_by('name', 'nick')}}} generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}
Similarly, {{{Human.youngest_among_namesakes.order_by('nick')}}}
generates:
{{{
SELECT DISTINCT ON ("myapp_human"."name") "myapp_human"."id",
"myapp_human"."nick", "myapp_human"."name", "myapp_human"."age" FROM
"myapp_human" ORDER BY "myapp_human"."nick" ASC
}}}
=== Expected results ===
I would expect the second iteration to output:
{{{
foo
baz
}}}
I would expect the third iteration to output:
{{{
baz
foo
}}}
I would expect {{{Human.youngest_among_namesakes.order_by('name',
'nick')}}} to generate the following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."name" ASC, "myapp_human"."nick" ASC
}}}
Similarly, I would expect
{{{Human.youngest_among_namesakes.order_by('nick')}}} to generate the
following query:
{{{
SELECT "myapp_human"."id", "myapp_human"."nick", "myapp_human"."name",
"myapp_human"."age" FROM "myapp_human" WHERE "myapp_human"."id" IN (SELECT
DISTINCT ON ("myapp_human"."name") "myapp_human"."id" FROM "myapp_human"
ORDER BY "myapp_human"."name" ASC, "myapp_human"."age" ASC) ORDER BY
"myapp_human"."nick" ASC
}}}
=== Rationale ===
I would expect any queryset {{{qs}}} to contain the same objects as
{{{qs.order_by(...)}}}, just in a possibly different order.
=== Workaround ===
As a (suboptimal) workaround, one might use a subquery instead:
{{{
#!python
class YoungestAmongNamesakes(models.Manager):
def get_queryset(self):
qs = super(YoungestAmongNamesakes, self).get_queryset().\
order_by('name', 'age').distinct('name')
return super(YoungestAmongNamesakes,
self).get_queryset().filter(pk__in=qs)
}}}
This however generates unnecessarily complex SQL query for
{{{Human.youngest_among_namesakes.all()}}}.
=== Suggestion ===
I would suggest to rewrite {{{django.db.models.query.QuerySet.order_by}}}
not to unconditionally clear the previous ordering, but to first check
whether {{{distinct(...)}}} is used in the present queryset and use a
subquery in such a case.
Something like this (the code is untested and written quickly just to
document my thought):
{{{
#!python
def order_by(self, *field_names):
"""
Returns a new QuerySet instance with the ordering changed.
"""
assert self.query.can_filter(), \
"Cannot reorder a query once a slice has been taken."
obj = self._clone()
if self.query.distinct and self.query.distinct_fields:
from django.db.models.sql.subqueries import AggregateQuery
subquery = obj
obj = AggregateQuery(obj.model)
try:
obj.add_subquery(subquery, using=self.db)
except EmptyResultSet:
obj = subquery
else:
obj.query.clear_ordering(force_empty=False)
obj.query.add_ordering(*field_names)
return obj
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* cc: trosos (added)
* needs_docs: => 0
* type: Uncategorized => Bug
* needs_tests: => 0
* needs_better_patch: => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:1>
Comment (by timgraham):
Not sure the proposal is feasible or a good idea, but attaching a doc
patch about `order_by()` clearing previous ordering which we can commit in
the meantime.
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:2>
* Attachment "24462.diff" added.
Comment (by akaariai):
I don't find changing the way .order_by().distinct().order_by() works a
good idea. It *will* break a lot of existing code. For example, if you
have automatic Meta ordering for model, what should
`Human.objects.distinct('name').order_by('name', 'age')` return?
Instead I think we need a new queryset operation to force Django use the
current results as a subquery. There are various cases where you'd want
this. Simplest one is
`SomeModel.objects.order_by('posted')[0:10].subquery().order_by('-posted')`.
This feature is likely pretty hard to implement correctly. Generating the
SQL for the subquery is easy, but generating references to the subquery's
available fields correctly is likely going to be hard.
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:3>
Comment (by Tim Graham <timograham@…>):
In [changeset:"b9d9ab23bdcc404708aada664e718a9d56415ca3" b9d9ab23]:
{{{
#!CommitTicketReference repository=""
revision="b9d9ab23bdcc404708aada664e718a9d56415ca3"
Refs #24462 -- Emphasized that order_by() clears previous ordering.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:4>
Comment (by Tim Graham <timograham@…>):
In [changeset:"f8ed647435e13d62d4df0bbc76d37b91d0ad81fb" f8ed647]:
{{{
#!CommitTicketReference repository=""
revision="f8ed647435e13d62d4df0bbc76d37b91d0ad81fb"
[1.7.x] Refs #24462 -- Emphasized that order_by() clears previous
ordering.
Backport of b9d9ab23bdcc404708aada664e718a9d56415ca3 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:5>
Comment (by Tim Graham <timograham@…>):
In [changeset:"5b91802718b61346967e7991cf3585554c4c97fa" 5b91802]:
{{{
#!CommitTicketReference repository=""
revision="5b91802718b61346967e7991cf3585554c4c97fa"
[1.8.x] Refs #24462 -- Emphasized that order_by() clears previous
ordering.
Backport of b9d9ab23bdcc404708aada664e718a9d56415ca3 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:6>
* type: Bug => New feature
* version: 1.7 => master
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:7>
* cc: John Speno (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:8>
Comment (by Jameel Al-Aziz):
I believe we have a similar need, but for difference reasons.
We have a JSONB postgres column and want to `jsonb_array_elements`
followed by filtering. Because `jsonb_array_elements` expands into a set
of columns, you cannot apply a `where` clause filter on the function
column. A simple solution is to use `jsonb_array_elements` in a subquery
and then filter the resulting subquery.
Another use case for sub-querying is being able to apply filters on union
queries. Our actual use case involves both `jsonb_array_elements` and
`UNION`s.
A sample query that I'm trying to write with querysets (but so far have
been unable to) is:
{{{
select *
from (
select id, jsonb_array_elements(json_data->'some_array') as elem
from foo as foo1
union
select id, jsonb_array_elements(json_data->'other_array') as elem
from foo as foo2
) as foo_w_elems
where (elem->>'subfield')::int in (
select id
from bar
where expires_at >= CURRENT_TIMESTAMP
)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:9>
* cc: Jameel A. (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:10>
* cc: Paolo Melchiorre (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:11>
* cc: Dave Johansen (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:12>
* owner: nobody => Ayush Joshi
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:13>
* owner: Ayush Joshi => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:14>
Comment (by David Sanders):
For posterity, another PR related to this topic:
https://github.com/django/django/pull/16814
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:13>
Another one relating to aggregation over a union
https://code.djangoproject.com/ticket/35146
--
Ticket URL: <https://code.djangoproject.com/ticket/24462#comment:14>