using .values().distinct().count(), it return 2k rows
{{{
In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
...: .distinct().count()
Out[6]: 2247
}}}
When I loop through the distinct query
{{{
for resource in VirtualMachineResources.objects.all().values('machine',
'cluster').distinct():
print(resource['machine'], resource['cluster'])
}}}
I observed it return 100k rows, with repeating rows that the same
'machine` with/without the cluster.
Here is the corresponding stackoverflow question
https://stackoverflow.com/questions/54354462/django-distinct-returns-more-
records-than-count
--
Ticket URL: <https://code.djangoproject.com/ticket/30130>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
> I have a table virtualmachineresources, which has 100k+ rows, it has
> columns 'machine` and `cluster`, some rows the cluster field is empty. it
> has repeating rows of machine + with/without cluster, hence I want to use
> the distinct() method.
>
> using .values().distinct().count(), it return 2k rows
>
> {{{
> In [6]: VirtualMachineResources.objects.all().values('machine',
> 'cluster')
> ...: .distinct().count()
> Out[6]: 2247
> }}}
>
> When I loop through the distinct query
>
> {{{
> for resource in VirtualMachineResources.objects.all().values('machine',
> 'cluster').distinct():
> print(resource['machine'], resource['cluster'])
> }}}
>
> I observed it return 100k rows, with repeating rows that the same
> 'machine` with/without the cluster.
>
> Here is the corresponding stackoverflow question
> https://stackoverflow.com/questions/54354462/django-distinct-returns-
> more-records-than-count
New description:
I have a table virtualmachineresources, which has 100k+ rows, it has
columns `machine` and `cluster`, some rows the cluster field is empty. it
has repeating rows of machine + with/without cluster, hence I want to use
the distinct() method.
using .values().distinct().count(), it return 2k rows
{{{
In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
...: .distinct().count()
Out[6]: 2247
}}}
When I loop through the distinct query
{{{
for resource in VirtualMachineResources.objects.all().values('machine',
'cluster').distinct():
print(resource['machine'], resource['cluster'])
}}}
I observed it return 100k rows, with repeating rows that the same
'machine` with/without the cluster.
Here is the corresponding stackoverflow question
https://stackoverflow.com/questions/54354462/django-distinct-returns-more-
records-than-count
--
--
Ticket URL: <https://code.djangoproject.com/ticket/30130#comment:1>
Old description:
> I have a table virtualmachineresources, which has 100k+ rows, it has
> columns `machine` and `cluster`, some rows the cluster field is empty. it
> has repeating rows of machine + with/without cluster, hence I want to use
> the distinct() method.
>
> using .values().distinct().count(), it return 2k rows
>
> {{{
> In [6]: VirtualMachineResources.objects.all().values('machine',
> 'cluster')
> ...: .distinct().count()
> Out[6]: 2247
> }}}
>
> When I loop through the distinct query
>
> {{{
> for resource in VirtualMachineResources.objects.all().values('machine',
> 'cluster').distinct():
> print(resource['machine'], resource['cluster'])
> }}}
>
> I observed it return 100k rows, with repeating rows that the same
> 'machine` with/without the cluster.
>
> Here is the corresponding stackoverflow question
> https://stackoverflow.com/questions/54354462/django-distinct-returns-
> more-records-than-count
New description:
I have a table virtualmachineresources, which has 100k+ rows, it has
columns `machine` and `cluster`, some rows the cluster field is empty. it
has repeating rows of machine + with/without cluster, hence I want to use
the distinct() method.
using .values().distinct().count(), it returned 2k rows
{{{
In [6]: VirtualMachineResources.objects.all().values('machine', 'cluster')
...: .distinct().count()
Out[6]: 2247
}}}
When I loop through the distinct query
{{{
for resource in VirtualMachineResources.objects.all().values('machine',
'cluster').distinct():
print(resource['machine'], resource['cluster'])
}}}
I observed it returned 100k rows, with repeating rows that the same
'machine` with/without the cluster.
Here is the corresponding stackoverflow question
https://stackoverflow.com/questions/54354462/django-distinct-returns-more-
records-than-count
--
--
Ticket URL: <https://code.djangoproject.com/ticket/30130#comment:2>
Comment (by Simon Charette):
Do you happen to have a `VirtualMachineResources.Meta.ordering` defined?
--
Ticket URL: <https://code.djangoproject.com/ticket/30130#comment:3>
Comment (by Simon Charette):
I'm pretty sure that's what's at play here as I managed to reproduce
locally. You have a defined `Meta.ordering` (or are performing an
`order_by`) which makes your non-count query something along the lines of
{{{#!sql
SELECT DISTINCT "machine", "cluster", "ordering" FROM "resources" ORDER BY
"ordering" ASC
}}}
The `ordering` column is added to the `SELECT` clause to prevent the
generation of invalid SQL.
{{{#!sql
SELECT DISTINCT "machine", "cluster" FROM "resources" ORDER BY "ordering"
ASC
> ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select
list
}}}
The `.count()` query happens to be the following
{{{#!sql
SELECT COUNT(*) FROM (SELECT DISTINCT "ticket_30130_foo"."bat" AS Col1,
"ticket_30130_foo"."baz" AS Col2 FROM "ticket_30130_foo") subquery
}}}
It looks like the issue is that
[https://github.com/django/django/blob/7e978fdc4228eb44cf97cb4243adc7b7bfd586c7/django/db/models/sql/query.py#L439-L443
we clear the ordering when we shouldn't] on the `.count()`. I think the
logic needs to be adjusted to branch of `not self.distinct` instead of
`self.distinct_fields`.
--
Ticket URL: <https://code.djangoproject.com/ticket/30130#comment:4>
* status: new => closed
* resolution: => duplicate
Comment:
There doesn't seem to be a consensus on how to deal with these type of
queries right now so I'm just going to close as a duplicate of #14357,
#16025, #28560.
--
Ticket URL: <https://code.djangoproject.com/ticket/30130#comment:5>