Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

#28560 - distinct() on ordered queryset with restricted list of columns returns incorrect result

175 views
Skip to first unread message

Tobias McNulty

unread,
Oct 28, 2020, 3:04:33 PM10/28/20
to django-developers
Hi all,

Starting a thread on ticket #28560, "distinct() on ordered queryset with restricted list of columns returns incorrect result." In a nutshell:

$ cat testapp/models.py 

from django.db import models



class School(models.Model):

    name = models.CharField(max_length=255)

    county = models.CharField(max_length=255)


    class Meta:

        ordering = ("name",)


$ python manage.py shell

...

>>> from testapp.models import School

>>> str(School.objects.values("county").distinct().query)

'SELECT DISTINCT "testapp_school"."county", "testapp_school"."name" FROM "testapp_school" ORDER BY "testapp_school"."name" ASC'


Note the "name" column is added implicitly to the SELECT clause due to the default ordering on the model (I believe with good reason, since #7070). This is not specific to a default ordering; it's also possible to generate unintended results with a mismatching list of columns between an explicit order_by() and values().

It's possible to fix with an empty order_by():

>>> str(School.objects.values("county").order_by().distinct().query)

'SELECT DISTINCT "testapp_school"."county" FROM "testapp_school"'


But, this still feels like a case where we could do better. Some potential options:
  1. It looks like there was an initial attempt to fix the issue with a subquery, but from what I can tell it was not possible to preserve ordering in the outer query.
  2. My colleague Dmitriy pointed out that there may be a precedent for excluding the default ordering for queries like this.
  3. An option I suggested on the ticket is to raise an error if the list of columns in values() is insufficient to run the requested query (i.e., never add a column implicitly if the user specified a list of columns via values() or values_list()).
What do others think? What are other potential fixes I'm not thinking of?

Cheers,

Tobias McNulty
Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com

Adam Johnson

unread,
Oct 28, 2020, 3:38:11 PM10/28/20
to django-d...@googlegroups.com
I’d vote for option 2. I don’t think it can be expected the ordering will be obeyed when not selecting the columns it includes.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMGFDKRQsbUz%3DeYKxX2U%2B892AgQbXr%3DMA9AiWw9vfiPcM%2BnAXw%40mail.gmail.com.
--
Adam

Tobias McNulty

unread,
Oct 28, 2020, 3:46:54 PM10/28/20
to django-developers
On Wed, Oct 28, 2020 at 3:38 PM Adam Johnson <m...@adamj.eu> wrote:
I’d vote for option 2. I don’t think it can be expected the ordering will be obeyed when not selecting the columns it includes.

Thanks! To make sure I understand, are you suggesting we should leave the behavior with an explicit order_by() as-is, or not add "name" to the SELECT in this case either?

>>> str(School.objects.values("county").order_by("name").distinct().query)

'SELECT DISTINCT "testapp_school"."county", "testapp_school"."name" FROM "testapp_school" ORDER BY "testapp_school"."name" ASC'

Fran Hrženjak

unread,
Oct 28, 2020, 5:38:30 PM10/28/20
to django-d...@googlegroups.com
Personally I would prefer to get a ProgrammingError, so option 3. 

Explicit is better then implicit 😁 

Here is a great explanation of this exact issue, cleared it up for me:



On 28.10.2020., at 20:04, Tobias McNulty <tob...@caktusgroup.com> wrote:


--

Tobias McNulty

unread,
Oct 28, 2020, 6:09:10 PM10/28/20
to django-developers
I tend to agree, though I'll note that was also a bug at one point: https://code.djangoproject.com/ticket/5321

charettes

unread,
Oct 28, 2020, 8:46:46 PM10/28/20
to Django developers (Contributions to Django itself)
I'm also a fan of option 3. to require an explicit opt-in or raise an error.

Not a lot of folks are familiar with this requirement imposed by the usage of DISTINCT and even if Model.Meta.ordering is the most common reason but unexpected ordering it can also be caused by the dynamic creation of a queryset through multiple abstractions (e.g. A DRF API filter backend that applies ordering).

I think this is a similar problem to https://code.djangoproject.com/ticket/14357#comment:11

Cheers,
Simon

Michael Manfre

unread,
Oct 29, 2020, 8:52:44 AM10/29/20
to Django developers (Contributions to Django itself)
Having run in to this issue in the past, automatically changing the meaning and/or breaking the query can turn in to a debug time sink. Option 3 could save people a bit of time.

Cheers,
Michael Manfre

Adam Johnson

unread,
Oct 30, 2020, 5:45:25 AM10/30/20
to django-d...@googlegroups.com
I misunderstood the bug, the blog post Fran linked to has cleared it up for me. I think option 3 makes sense now.



--
Adam
Reply all
Reply to author
Forward
0 new messages