results in:
{{{
======================================================================
FAIL: test_ordering_with_null_first
(queries.test_qs_combinators.QuerySetSetOperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/home/florian/sources/django.git/tests/queries/test_qs_combinators.py",
line 171, in test_ordering_with_null_first
self.assertQuerysetEqual(qs, [1, 2, 3, 4, 6, 7, 8, 9, 10, None])
File "/home/florian/sources/django.git/django/test/testcases.py", line
1328, in assertQuerysetEqual
return self.assertEqual(list(items), values, msg=msg)
AssertionError: Lists differ: [None, 1, 2, 3, 4, 6, 7, 8, 9, 10] != [1, 2,
3, 4, 6, 7, 8, 9, 10, None]
First differing element 0:
None
1
- [None, 1, 2, 3, 4, 6, 7, 8, 9, 10]
+ [1, 2, 3, 4, 6, 7, 8, 9, 10, None]
----------------------------------------------------------------------
(0.000) UPDATE `queries_number` SET `other_num` = NULL WHERE
`queries_number`.`other_num` = 5; args=(5,); alias=default
(0.000) (SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1) UNION (SELECT `queries_number`.`other_num`
FROM `queries_number` WHERE `queries_number`.`num` >= 2) ORDER BY (1) ASC;
args=(1, 2); alias=default
(0.000) (SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1) UNION (SELECT `queries_number`.`other_num`
FROM `queries_number` WHERE `queries_number`.`num` >= 2) ORDER BY (1) IS
NULL, (1) ASC; args=(1, 2); alias=default
----------------------------------------------------------------------
}}}
Due to the `UNION` we are rewriting queries to use column numbers so we
can somewhat reliably target the columns. This breaks down when
`nulls_first/last` is used because `(1) IS NULL` is interpreted as
expression in mysql and not as the colname 1 being NULL. Which brings me
back to https://twitter.com/fapolloner/status/1533512493208936450 -- I
think that the rewriting of thise expression should happen in the compiler
and not in `OrderBy` itself. This way we could duplicate the `OrderBy`
into two and push them into the relevant select clauses like we already do
in
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L410
for other things.
Sadly it is a mess… Thoughts? Paging Adam since he was the resident MySQL
specialist IIRC :)
--
Ticket URL: <https://code.djangoproject.com/ticket/33768>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
results in:
----------------------------------------------------------------------
}}}
think that the rewriting of this expression should happen in the compiler
and not in `OrderBy` itself. This way we could duplicate the `OrderBy`
into two and push them into the relevant select clauses like we already do
in
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L410
for other things.
Sadly it is a mess… Thoughts? Paging Adam since he was the resident MySQL
specialist IIRC :)
--
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:1>
* cc: Simon Charette (added)
Comment:
What if we performed a subquery pushdown instead to make compilation more
predictable so we can have the order by clause refer to the union columns
by name?
{{{#!sql
SELECT `other_num`
FROM (
(SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1)
UNION
(SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` >= 2)
)
ORDER BY `other_num` IS NULL
}}}
This is a strategy that is used successfully for aggregation for example.
I don't see `OrderBy.as_sql` usage of `template` for compilation is an
abuse at all; it's in charge of it's own compilation unit. As mentioned on
#33767 I'd prefer if we errored out on constant order by usage instead of
making leaps to support it because it has ambiguous evaluation rules.
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:2>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:3>
Comment (by Florian Apolloner):
> What if we performed a subquery pushdown instead to make compilation
more predictable so we can have the order by clause refer to the union
columns by name?
Maybe, in hindsight that would probably be have the way to implement
unions (etc) in the first place. Maybe now is the time to switch to that
:) What does a subquery mean in terms of performance -- I'd hope the
database wouldn't care to much…
> I don't see `OrderBy.as_sql` usage of template for compilation is an
abuse at all; it's in charge of it's own compilation unit.
That is fair, but then again it still requires help from the outside
because we have to pull it into the select clause when used with distinct.
Which then results in horrible queries like the one outlined by Tim in
https://github.com/django/django/pull/15687#issuecomment-1133798092 (see
"The old SQL") where we have the initial select value and then once with
`IS NULL` and once without because we can't match it up to the initial
select value that is aliased to `last_date` anymore. In practice part of
this is caused by
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L466
where we use a regex to parse the generated SQL from `OrderBy` and match
it back to the select clause, which then obviously fails because the
compilation of `OrderBy` generated two clauses. Imo this is all rather
fragile…
Also, and this is where #33767 comes back into play, we should try as hard
as possible to reuse aliases where possible. Tim's old SQL example
executed on PG nowadays (his is CockroachDB) looks like this:
{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY (SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE
Upper('%Article%') )
GROUP BY U0."author_id") DESC NULLS FIRST
}}}
when it could be
{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY "last_date" DESC NULLS FIRST
}}}
Reusing aliases and references seems like a good idea. It makes the query
imo more readable and also means less parsing overhead for the server
(probably negligible but still).
Last but not least, the reason I discovered this is the new code for
psycopg3 support. Psycopg3 makes use of server side param binding which
introduces a whole new class of issues. To be fair this is not postgres
alone but also informix and others. When actually using server side params
you run into funny things like this: Assume Django ends up generating a
query like this:
{{{
cursor.execute("""
SELECT left(name, %s) AS prefix, count(*) AS number
FROM people
GROUP BY left(name, %s)
ORDER BY left(name, %s)
""",
[2, 2, 2])
}}}
This works perfectly fine with client side merging (simply replace %s with
2), but breaks done once you use server side binding because the server
gets `$1 to $3` for the params and doesn't know that they are actually the
same and as such fails because it no longer knows if the group by is part
of the select clause or not. Mind you, this example is rather simplistic
but Django actually generates those like shown in Tim's example above.
To fix this we have two options:
* Switch to named parameters and introduce even more logic to always
write %(same_param)s in the previous example.
* Switch our usage of order by / group by to use aliases and column
numbers whenever possible.
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:4>
* owner: nobody => a1nn1997
* status: new => assigned
Comment:
Hello Everyone, I am new to open source contribution. But had used Django
for the last 3 years. I want to contribute to the resolution of this bug
so that I can start my journey as an open-source contributor to Django.
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:5>
Comment (by Simon Charette):
Aniruddh, we appreciate your interest but this issue is quite complex so
you might want to focus on another issue instead for your first
contributions.
----
Thanks for the great summary Florian!
> Maybe, in hindsight that would probably should have been the way to
implement unions (etc) in the first place. Maybe now is the time to switch
to that :) What does a subquery mean in terms of performance -- I'd hope
the database wouldn't care to much…
From some limited local testing on SQLite, PostgreSQL, and MySQL and
looking at the generate `EXPLAIN` plan it doesn't seem like they care too
much. But it might not be necessary after some more thoughts as the `ORDER
BY` clause allows for ordering by aliases as well.
> Reusing aliases and references seems like a good idea. It makes the
query imo more readable and also means less parsing overhead for the
server (probably negligible but still).
I completely agree with this statement but I don't understand how this
relates to using selected field indexes instead of aliases?
From my understanding and testing engines locally it seems that all
engines allow to refer to the first query member of the union by column
aliases just like by index so once we've addressed the issue with ordering
by selected references not being used we could entirely drop the
specialized combinator logic that uses `RawSQL` to order by column index?
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:6>
Comment (by Florian Apolloner):
Hi Simon, sorry I lost track of this one in my mails.
> From my understanding and testing engines locally it seems that all
engines allow to refer to the first query member of the union by column
aliases just like by index
Interesting, that would be great indeed. I think I might have had issues
with UNION at the time I implemented it but if aliases work around the
board, that would be preferably imo.
> we could entirely drop the specialized combinator logic that uses RawSQL
to order by column index?
Hopefully yes :)
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:7>
Comment (by Simon Charette):
Submitted a [https://github.com/django/django/pull/15831 PR] for further
discussion on the subject.
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:8>
* owner: Aniruddh Singh => Simon Charette
* has_patch: 0 => 1
* stage: Accepted => Ready for checkin
Comment:
[https://github.com/django/django/pull/16150 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:9>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"c58a8acd413ccc992dd30afd98ed900897e1f719" c58a8acd]:
{{{
#!CommitTicketReference repository=""
revision="c58a8acd413ccc992dd30afd98ed900897e1f719"
Fixed #33768 -- Fixed ordering compound queries by nulls_first/nulls_last
on MySQL.
Columns of the left outer most select statement in a combined query
can be referenced by alias just like by index.
This removes combined query ordering by column index and avoids an
unnecessary usage of RawSQL which causes issues for backends that
specialize the treatment of null ordering.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:10>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"f47fec31f8b51ad8fdc0a5d6805d0130f8700ca8" f47fec3]:
{{{
#!CommitTicketReference repository=""
revision="f47fec31f8b51ad8fdc0a5d6805d0130f8700ca8"
Refs #33768 -- Fixed ordering compound queries by NULLs on SQLite < 3.30.
The lack of support for native nulls last/first on SQLite 3.28 and 3.29
requires the compound query to be wrapped for emulation layer to work
properly.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:11>