Admin list view counting

145 views
Skip to first unread message

Josh Smeaton

unread,
Aug 7, 2017, 2:45:18 AM8/7/17
to Django developers (Contributions to Django itself)
The admin list page performs a count(*) (twice!) when viewing the list page of a model. One of these counts can be disabled, but the pagination count can not be. On huge tables, this can be a massive performance issue. https://code.djangoproject.com/ticket/8408 was created to disable the count, but it only allowed disabling one of them. I've reopened the ticket with Matthew who has implemented the suggestion to disable pagination features that require count.


Does anyone have strong views one way or another here? Would like to hear if so.

Adam Johnson

unread,
Aug 7, 2017, 3:03:18 AM8/7/17
to django-d...@googlegroups.com
+1 from me, I've spent long enough on those COUNT(*) queries at work.

Is it worth putting this logic in Paginator so applications can reuse it?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/d093b4dd-4bc4-4c16-910d-53c4740ec5ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Aymeric Augustin

unread,
Aug 7, 2017, 7:51:09 AM8/7/17
to django-d...@googlegroups.com
Yes, the ability to disable all COUNT(*) is important (at least on Postgres).

-- 
Aymeric.



To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.

Tom Forbes

unread,
Aug 7, 2017, 8:03:30 AM8/7/17
to django-d...@googlegroups.com
This is a great idea. A related issue I've come across with the paginator is not being able to pass an explicit count into it. 

If you have a query set with expensive annotations that don't effect the count in any way it's very wasteful to include them in the count SQL statement (on top of general PG count slowness), which Django does. Being able to pass in 'num_objects=x' would be great.

Shai Berger

unread,
Aug 7, 2017, 4:53:46 PM8/7/17
to django-d...@googlegroups.com
On PG we can and should do much better -- there is a way to get a very fast,
though not accurate, count of records in a table:

https://wiki.postgresql.org/wiki/Count_estimate

I think we should expose an API along the lines of

queryset.table_count(estimate_above=2000)

which actually performs the COUNT(*) only if the estimate is under the
threshold, at least where such facilities are available. We could use that by
default for the admin count queries, if the queryset has no filtering or
limiting joins. IMO, this alone would solve a whole lot of the problem; and
allowing it to be also used explicitly (e.g. by adding Tom's idea to allow the
user to specify the number of objects) would solve a whole lot of the rest of
the problem.
> > email to django-develop...@googlegroups.com.
> > To post to this group, send email to django-d...@googlegroups.com.
> > Visit this group at https://groups.google.com/group/django-developers.
> > To view this discussion on the web visit https://groups.google.com/d/ms
> > gid/django-developers/d093b4dd-4bc4-4c16-910d-53c4740ec5ea%
> > 40googlegroups.com
> > <https://groups.google.com/d/msgid/django-developers/d093b4dd-4bc4-4c16-9
> > 10d-53c4740ec5ea%40googlegroups.com?utm_medium=email&utm_source=footer> .

Tom Forbes

unread,
Aug 7, 2017, 5:46:19 PM8/7/17
to django-d...@googlegroups.com
Mysql, Oracle and Postgres expose estimated rows in a pretty similar manner (selecting the table name from a database-specific table). Perhaps a more generic 'estimate_table_rows' function could be added for all backends? I guess for sqlite it would have to use an actual count() as a fallback. This could cause issues if people develop on sqlite and find the function returns accurate row counts and this changes in production.

The problem is these methods only return estimates for the entire table, not including any filters, which narrows down the usefulness a lot. The postgres wiki suggests 'scraping' the output of `EXPLAIN ANALYZE` which sounds rather too hacky to add?

> > email to django-developers+unsubscribe@googlegroups.com.
> > To post to this group, send email to django-developers@googlegroups.com.

> > Visit this group at https://groups.google.com/group/django-developers.
> > To view this discussion on the web visit https://groups.google.com/d/ms
> > gid/django-developers/d093b4dd-4bc4-4c16-910d-53c4740ec5ea%
> > 40googlegroups.com

Josh Smeaton

unread,
Aug 8, 2017, 5:40:30 PM8/8/17
to Django developers (Contributions to Django itself)
We use the explain analyze method at work, but I don't think it's an appropriate thing to include in core.

I agree that these changes should be a function of the paginator and that being able to pass in a could also be very useful. 

I'm not so sure about providing count estimates in core, but I don't fully grok how the `estimate_above` would work in practise. Would that execute two queries if the estimate returned a value below the threshold? I think this could be tackled in a separate patch because it would then feed into the expansion of the paginator.
> > email to django-develop...@googlegroups.com.
> > To post to this group, send email to django-d...@googlegroups.com.

> > Visit this group at https://groups.google.com/group/django-developers.
> > To view this discussion on the web visit https://groups.google.com/d/ms
> > gid/django-developers/d093b4dd-4bc4-4c16-910d-53c4740ec5ea%
> > 40googlegroups.com

Shai Berger

unread,
Aug 8, 2017, 6:13:02 PM8/8/17
to django-d...@googlegroups.com
On Wednesday 09 August 2017 00:40:30 Josh Smeaton wrote:
> We use the explain analyze method at work, but I don't think it's an
> appropriate thing to include in core.
>

Agreed.

> I'm not so sure about providing count estimates in core, but I don't fully
> grok how the `estimate_above` would work in practise. Would that execute
> two queries if the estimate returned a value below the threshold?

That's what I had in mind. But two queries do not have to mean two database
roundtrips: Since the estimation queries are backend-specific anyway, we might
as well implement `table_count(estimate_above)` using blocks of procedural
SQL.

I am, again, not suggesting a change to count() at the ORM level -- this
should be a new API; I *am* suggesting that the admin would use this new API
where appropriate (when the admin queryset has no filters or joins which could
affect the count).

> I think
> this could be tackled in a separate patch because it would then feed into
> the expansion of the paginator.
>

Yes, the idea of controlling whether any count is done, while not entirely
orthogonal, is certainly of value regardless of this idea, and can (and
should) be dealt with separately.

Shai.

Adam Johnson

unread,
Aug 9, 2017, 5:58:51 PM8/9/17
to django-d...@googlegroups.com
FYI Django-MySQL supplies an approx_count() method: https://django-mysql.readthedocs.io/en/latest/queryset_extensions.html#django_mysql.models.approx_count . It only works on QuerySets without filters. From what I understand it's possible to get an estimate for filtered queries in MySQL in *some* circumstances, but you need to be filtering on only one index and not involve any joins.

I'm in favour of adding this (by whatever name) to core.
--
Adam
Reply all
Reply to author
Forward
0 new messages