[Django] #33768: MySQL ordering of nulls last/first is broken in combination with UNION

132 views
Skip to first unread message

Django

unread,
Jun 5, 2022, 3:40:19 PM6/5/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian | Owner: nobody
Apolloner |
Type: Bug | Status: new
Component: Database | Version: dev
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
putting the following into test_qs_combinators.py:
{{{
def test_ordering_with_null_first(self):
Number.objects.filter(other_num=5).update(other_num=None)
qs = (
Number.objects.filter(num__lte=1)
.union(Number.objects.filter(num__gte=2))
.order_by(F("other_num").asc(nulls_first=True))
.values_list("other_num", flat=True)
)
self.assertQuerysetEqual(qs, [None, 1, 2, 3, 4, 6, 7, 8, 9, 10])
qs = (
Number.objects.filter(num__lte=1)
.union(Number.objects.filter(num__gte=2))
.order_by(F("other_num").asc(nulls_last=True))
.values_list("other_num", flat=True)
)
self.assertQuerysetEqual(qs, [1, 2, 3, 4, 6, 7, 8, 9, 10, None])
}}}

results in:

{{{
======================================================================
FAIL: test_ordering_with_null_first
(queries.test_qs_combinators.QuerySetSetOperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/home/florian/sources/django.git/tests/queries/test_qs_combinators.py",
line 171, in test_ordering_with_null_first
self.assertQuerysetEqual(qs, [1, 2, 3, 4, 6, 7, 8, 9, 10, None])
File "/home/florian/sources/django.git/django/test/testcases.py", line
1328, in assertQuerysetEqual
return self.assertEqual(list(items), values, msg=msg)
AssertionError: Lists differ: [None, 1, 2, 3, 4, 6, 7, 8, 9, 10] != [1, 2,
3, 4, 6, 7, 8, 9, 10, None]

First differing element 0:
None
1

- [None, 1, 2, 3, 4, 6, 7, 8, 9, 10]
+ [1, 2, 3, 4, 6, 7, 8, 9, 10, None]

----------------------------------------------------------------------
(0.000) UPDATE `queries_number` SET `other_num` = NULL WHERE
`queries_number`.`other_num` = 5; args=(5,); alias=default
(0.000) (SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1) UNION (SELECT `queries_number`.`other_num`
FROM `queries_number` WHERE `queries_number`.`num` >= 2) ORDER BY (1) ASC;
args=(1, 2); alias=default
(0.000) (SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1) UNION (SELECT `queries_number`.`other_num`
FROM `queries_number` WHERE `queries_number`.`num` >= 2) ORDER BY (1) IS
NULL, (1) ASC; args=(1, 2); alias=default

----------------------------------------------------------------------
}}}

Due to the `UNION` we are rewriting queries to use column numbers so we
can somewhat reliably target the columns. This breaks down when
`nulls_first/last` is used because `(1) IS NULL` is interpreted as
expression in mysql and not as the colname 1 being NULL. Which brings me
back to https://twitter.com/fapolloner/status/1533512493208936450 -- I
think that the rewriting of thise expression should happen in the compiler
and not in `OrderBy` itself. This way we could duplicate the `OrderBy`
into two and push them into the relevant select clauses like we already do
in
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L410
for other things.

Sadly it is a mess… Thoughts? Paging Adam since he was the resident MySQL
specialist IIRC :)

--
Ticket URL: <https://code.djangoproject.com/ticket/33768>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 5, 2022, 3:47:08 PM6/5/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Florian Apolloner:

Old description:

New description:

results in:

----------------------------------------------------------------------
}}}

think that the rewriting of this expression should happen in the compiler


and not in `OrderBy` itself. This way we could duplicate the `OrderBy`
into two and push them into the relevant select clauses like we already do
in
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L410
for other things.

Sadly it is a mess… Thoughts? Paging Adam since he was the resident MySQL
specialist IIRC :)

--

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:1>

Django

unread,
Jun 5, 2022, 7:39:27 PM6/5/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


Comment:

What if we performed a subquery pushdown instead to make compilation more
predictable so we can have the order by clause refer to the union columns
by name?

{{{#!sql
SELECT `other_num`
FROM (


(SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` <= 1)
UNION
(SELECT `queries_number`.`other_num` FROM `queries_number` WHERE
`queries_number`.`num` >= 2)

)
ORDER BY `other_num` IS NULL
}}}

This is a strategy that is used successfully for aggregation for example.
I don't see `OrderBy.as_sql` usage of `template` for compilation is an
abuse at all; it's in charge of it's own compilation unit. As mentioned on
#33767 I'd prefer if we errored out on constant order by usage instead of
making leaps to support it because it has ambiguous evaluation rules.

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:2>

Django

unread,
Jun 6, 2022, 4:57:15 AM6/6/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:3>

Django

unread,
Jun 6, 2022, 5:25:34 AM6/6/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Florian Apolloner):

> What if we performed a subquery pushdown instead to make compilation
more predictable so we can have the order by clause refer to the union
columns by name?

Maybe, in hindsight that would probably be have the way to implement
unions (etc) in the first place. Maybe now is the time to switch to that
:) What does a subquery mean in terms of performance -- I'd hope the
database wouldn't care to much…

> I don't see `OrderBy.as_sql` usage of template for compilation is an
abuse at all; it's in charge of it's own compilation unit.

That is fair, but then again it still requires help from the outside
because we have to pull it into the select clause when used with distinct.
Which then results in horrible queries like the one outlined by Tim in
https://github.com/django/django/pull/15687#issuecomment-1133798092 (see
"The old SQL") where we have the initial select value and then once with
`IS NULL` and once without because we can't match it up to the initial
select value that is aliased to `last_date` anymore. In practice part of
this is caused by
https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/compiler.py#L466
where we use a regex to parse the generated SQL from `OrderBy` and match
it back to the select clause, which then obviously fails because the
compilation of `OrderBy` generated two clauses. Imo this is all rather
fragile…

Also, and this is where #33767 comes back into play, we should try as hard
as possible to reuse aliases where possible. Tim's old SQL example
executed on PG nowadays (his is CockroachDB) looks like this:

{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY (SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE
Upper('%Article%') )
GROUP BY U0."author_id") DESC NULLS FIRST
}}}

when it could be

{{{
SELECT DISTINCT "ordering_author"."id",
"ordering_author"."name",
"ordering_author"."editor_id",
(SELECT Max(U0."pub_date") AS "last_date"
FROM "ordering_article" U0
WHERE ( U0."author_id" = ( "ordering_author"."id" )
AND Upper(U0."headline" :: text) LIKE Upper(
'%Article%') )
GROUP BY U0."author_id") AS "last_date",
FROM "ordering_author"
ORDER BY "last_date" DESC NULLS FIRST
}}}

Reusing aliases and references seems like a good idea. It makes the query
imo more readable and also means less parsing overhead for the server
(probably negligible but still).

Last but not least, the reason I discovered this is the new code for
psycopg3 support. Psycopg3 makes use of server side param binding which
introduces a whole new class of issues. To be fair this is not postgres
alone but also informix and others. When actually using server side params
you run into funny things like this: Assume Django ends up generating a
query like this:
{{{
cursor.execute("""
SELECT left(name, %s) AS prefix, count(*) AS number
FROM people
GROUP BY left(name, %s)
ORDER BY left(name, %s)
""",
[2, 2, 2])
}}}

This works perfectly fine with client side merging (simply replace %s with
2), but breaks done once you use server side binding because the server
gets `$1 to $3` for the params and doesn't know that they are actually the
same and as such fails because it no longer knows if the group by is part
of the select clause or not. Mind you, this example is rather simplistic
but Django actually generates those like shown in Tim's example above.

To fix this we have two options:
* Switch to named parameters and introduce even more logic to always
write %(same_param)s in the previous example.
* Switch our usage of order by / group by to use aliases and column
numbers whenever possible.

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:4>

Django

unread,
Jun 6, 2022, 11:56:30 AM6/6/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: a1nn1997
Type: Bug | Status: assigned

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by a1nn1997):

* owner: nobody => a1nn1997
* status: new => assigned


Comment:

Hello Everyone, I am new to open source contribution. But had used Django
for the last 3 years. I want to contribute to the resolution of this bug
so that I can start my journey as an open-source contributor to Django.

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:5>

Django

unread,
Jun 20, 2022, 12:46:50 AM6/20/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Aniruddh
| Singh

Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Aniruddh, we appreciate your interest but this issue is quite complex so
you might want to focus on another issue instead for your first
contributions.
----

Thanks for the great summary Florian!

> Maybe, in hindsight that would probably should have been the way to


implement unions (etc) in the first place. Maybe now is the time to switch
to that :) What does a subquery mean in terms of performance -- I'd hope
the database wouldn't care to much…

From some limited local testing on SQLite, PostgreSQL, and MySQL and
looking at the generate `EXPLAIN` plan it doesn't seem like they care too
much. But it might not be necessary after some more thoughts as the `ORDER
BY` clause allows for ordering by aliases as well.

> Reusing aliases and references seems like a good idea. It makes the
query imo more readable and also means less parsing overhead for the
server (probably negligible but still).

I completely agree with this statement but I don't understand how this
relates to using selected field indexes instead of aliases?

From my understanding and testing engines locally it seems that all
engines allow to refer to the first query member of the union by column
aliases just like by index so once we've addressed the issue with ordering
by selected references not being used we could entirely drop the
specialized combinator logic that uses `RawSQL` to order by column index?

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:6>

Django

unread,
Jul 1, 2022, 2:37:22 AM7/1/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Aniruddh
| Singh
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Florian Apolloner):

Hi Simon, sorry I lost track of this one in my mails.

> From my understanding and testing engines locally it seems that all
engines allow to refer to the first query member of the union by column
aliases just like by index

Interesting, that would be great indeed. I think I might have had issues
with UNION at the time I implemented it but if aliases work around the
board, that would be preferably imo.

> we could entirely drop the specialized combinator logic that uses RawSQL
to order by column index?

Hopefully yes :)

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:7>

Django

unread,
Jul 7, 2022, 9:49:22 AM7/7/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Aniruddh
| Singh
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Submitted a [https://github.com/django/django/pull/15831 PR] for further
discussion on the subject.

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:8>

Django

unread,
Oct 5, 2022, 3:16:45 AM10/5/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Simon
| Charette

Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* owner: Aniruddh Singh => Simon Charette
* has_patch: 0 => 1
* stage: Accepted => Ready for checkin


Comment:

[https://github.com/django/django/pull/16150 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:9>

Django

unread,
Oct 5, 2022, 8:30:01 AM10/5/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Simon
| Charette
Type: Bug | Status: closed

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"c58a8acd413ccc992dd30afd98ed900897e1f719" c58a8acd]:
{{{
#!CommitTicketReference repository=""
revision="c58a8acd413ccc992dd30afd98ed900897e1f719"
Fixed #33768 -- Fixed ordering compound queries by nulls_first/nulls_last
on MySQL.

Columns of the left outer most select statement in a combined query
can be referenced by alias just like by index.

This removes combined query ordering by column index and avoids an
unnecessary usage of RawSQL which causes issues for backends that
specialize the treatment of null ordering.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:10>

Django

unread,
Nov 1, 2022, 1:51:30 AM11/1/22
to django-...@googlegroups.com
#33768: MySQL ordering of nulls last/first is broken in combination with UNION
-------------------------------------+-------------------------------------
Reporter: Florian Apolloner | Owner: Simon
| Charette
Type: Bug | Status: closed
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"f47fec31f8b51ad8fdc0a5d6805d0130f8700ca8" f47fec3]:
{{{
#!CommitTicketReference repository=""
revision="f47fec31f8b51ad8fdc0a5d6805d0130f8700ca8"
Refs #33768 -- Fixed ordering compound queries by NULLs on SQLite < 3.30.

The lack of support for native nulls last/first on SQLite 3.28 and 3.29
requires the compound query to be wrapped for emulation layer to work
properly.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/33768#comment:11>

Reply all
Reply to author
Forward
0 new messages