Queryset .count() breaks when counting distinct values generated by .extra()

45 views
Skip to first unread message

Mattias Linnap

unread,
Jan 18, 2015, 3:24:49 PM1/18/15
to django...@googlegroups.com
Hi all,

I think I've found a strange case where QuerySet.count() does not match len(queryset), and the behaviour is certainly unexpected.
But I'm not sure whether this is a bug, some mistake on my part, or a known limitation of combining .extra(), .distinct() and .count().
I am aware of the default ordering interfering with .distinct(), and already add .order_by() to get rid of it.

I have a model called RadioSignal, with an integer field "rssi". I'm interested in finding out how many distinct values for "rssi / 10" there are (-10, -20, -30, etc).

Here is a commented "./manage.py shell" record:

>>> RadioSignal.objects.count()
523
>>> RadioSignal.objects.order_by().values('rssi').distinct().count()
49
>>> connection.queries[-1]['sql']
'SELECT COUNT(DISTINCT "maps_radiosignal"."rssi") FROM "maps_radiosignal"'

Looks okay so far. But I'm interested in each distinct tens of RSSI values, not every single value. I can compute these with .extra():

>>> len(RadioSignal.objects.order_by().extra({'tens': 'rssi / 10'}).values('tens').distinct())
6
>>> RadioSignal.objects.order_by().extra({'tens': 'rssi / 10'}).values('tens').distinct()
[{'tens': -8}, {'tens': -4}, {'tens': -5}, {'tens': -9}, {'tens': -6}, {'tens': -7}]
>>> connection.queries[-1]['sql']
'SELECT DISTINCT (rssi / 10) AS "tens" FROM "maps_radiosignal" LIMIT 21'

Also looks good so far. But running len() on a queryset is unnecessary if I only need the count.

>>> RadioSignal.objects.order_by().extra({'tens': 'rssi / 10'}).values('tens').distinct().count()
523
>>> connection.queries[-1]['sql']
'SELECT COUNT(DISTINCT "maps_radiosignal"."id") FROM "maps_radiosignal"'

Uhoh. Somehow .count() keeps the .distinct() part, but replaces the .extra() and .values() parts with counting primary keys?

I tried it with values('tens'), values_list('tens'), and values_list('tens', flat=True), as well no change.
So far I've tested Django 1.6 with Python 2.7 and PostgreSQL 9.3, and Django 1.7 with Python 3.4 and PostgreSQL 9.1, all seem to behave the same.

I can work around this, since the possible resulting querysets are pretty small, and evaluating them with len() isn't too slow. But I'm wondering if it's a bug in Django, or something else I've missed?

Mattias


Collin Anderson

unread,
Jan 21, 2015, 10:06:03 PM1/21/15
to django...@googlegroups.com
Hi,

I don't use extra() a lot, but it could be that count() clears out extra. This doesn't exactly answer your question, but you may want to try your query on the 1.8 alpha using the new available expressions. You might be able to do this without needing extra() at all in 1.8.

Collin
Reply all
Reply to author
Forward
0 new messages