Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

93 views
Skip to first unread message

Jakub Kleň

unread,
Jun 29, 2018, 5:00:33 PM6/29/18
to Django developers (Contributions to Django itself)
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

Adam Johnson

unread,
Jun 30, 2018, 7:50:48 AM6/30/18
to django-d...@googlegroups.com
it takes really long to process by MySQL.

Which version? Subquery performance gets better in MySQL 5.6 and even more so on MariaDB.

 for count the query would be really simple, and wouldn't need the prefetches.

When counting, prefetches aren't actaully executed. Check the actual SQL with e.g. django-debug-toolbar .

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Adam

Tom Forbes

unread,
Jun 30, 2018, 11:40:32 AM6/30/18
to django-d...@googlegroups.com
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.


--

Jakub Kleň

unread,
Jun 30, 2018, 4:08:43 PM6/30/18
to Django developers (Contributions to Django itself)
After some more investigation yesterday, I found out that it isn't the prefetches and the subquery, and exactly as you write, it's annotations which generate joins, and for count() the difference is it has to go thru the whole db table, and that is slowing it down a lot. For my queryset, the normal .all() query takes 0.006s, and the count query takes 16.6s. And I noticed, that even for queries that didn't seem problematic (1 join or so), it actually slows them down a lot, so basically every queryset in a paginated view is getting slowed down if it's a bit more complex, and you're totally true that the real solution would be to optimize them directly for the .count() calls. That would be an awesome tweak! :)

Yesterday I was looking into Django, because I needed to implement some workaround for my app, and created a PR which doesn't seem to me like a good change now.
I basically implemented the stuff mentioned in my first message: https://github.com/django/django/pull/10114
I guess I can reply to the PR and close it now, sorry for the SPAM, I thought it was a good idea to have it there.

For now I have a workaround which basically adds the functionality in the PR to my views using a mixin, so It's fine, but I would be really happy to see the tweak you mentioned in Django some day :)

Thank you for your time,
Jakub.

Jakub Kleň

unread,
Jun 30, 2018, 5:26:15 PM6/30/18
to Django developers (Contributions to Django itself)
Hi, thank you for your response.

I'm currently running MySQL 5.7.17, and the time difference between the queries is really big (0.006s vs 16.6s).
I'm also wondering that something like this isn't handled by MySQL.

Jakub Kleň

unread,
Jul 1, 2018, 7:40:53 PM7/1/18
to Django developers (Contributions to Django itself)
I checked MySQL 8 today, and it's the same story. Does anyone know how this performs in MariaDB or Postgresql?

Ramiro Morales

unread,
Jul 17, 2018, 2:00:03 PM7/17/18
to django-d...@googlegroups.com
On Sat, Jun 30, 2018 at 12:40 PM Tom Forbes <t...@tomforb.es> wrote:
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.

FYI: This had already been reported in #23771 (https://code.djangoproject.com/ticket/23771) (and a couple of similar/duplicates tickets)

Regards,

--
Ramiro Morales
@ramiromorales

Tom Forbes

unread,
Jul 17, 2018, 3:24:35 PM7/17/18
to django-d...@googlegroups.com
Indeed, I had an attempt at doing this here (https://github.com/django/django/pull/8928/), but it seems a hard problem. I think there is huge potential here but I have no time to work on this anymore.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
Reply all
Reply to author
Forward
0 new messages