[Django] #25375: Paginator generates suboptimal queries

13 views
Skip to first unread message

Django

unread,
Sep 10, 2015, 9:53:42 AM9/10/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
----------------------------------------------+--------------------
Reporter: alexei | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.7
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
Hello,

I'm using Django Rest Framework and have the following situation:

I have two models, `Article` and `Comment`:

{{{
class Article(models.Model):
# ...

class Comment(models.Model):
article = models.ForeignKey(Article, related_name='comments')
# ...

}}}

And an articles viewset:

{{{
class Articles(viewsets.ModelViewSet):
model = Article
serializer_class = ArticleSerializer
paginate_by = 50
paginate_by_param = 'limit'

def get_queryset(self):
return Article.objects.annotate(comments_count=Count('comments'))
}}}

When doing a simple listing, the database will be queried two times: one
to count the result set, the other to fetch the results:

{{{
SELECT
COUNT(*)
FROM (
SELECT
`article`.`id` AS `id`,
COUNT(`article_comment`.`id`) AS `comments_count`
FROM `article`
LEFT OUTER JOIN `article_comment` ON (`article`.`id` =
`article_comment`.`article_id`)
GROUP BY `article`.`id`
ORDER BY NULL
)
subquery;
}}}

And

{{{
SELECT
`article`.`id` AS `id`,
COUNT(`article_comment`.`id`) AS `comments_count`
FROM `article`
LEFT OUTER JOIN `article_comment` ON (`article`.`id` =
`article_comment`.`article_id`)
GROUP BY `article`.`id`
ORDER BY `article`.`id` DESC
LIMIT 50;
}}}
respectively.

If you look at (and `EXPLAIN`) the first query, the one that does the
counting, you'll notice it loads up *all* records and then starts counting
them, a process which may take up all resources and eventually crash the
machine. It did so in my case, with only 50000 records.

I figured this has something to do with Django, not DRF.

--
Ticket URL: <https://code.djangoproject.com/ticket/25375>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 10, 2015, 10:21:57 AM9/10/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------

Reporter: alexei | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Could you please add a test case that demonstrates the problem without
DRF?

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:1>

Django

unread,
Sep 10, 2015, 11:34:15 AM9/10/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------

Reporter: alexei | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by charettes):

Wonder if this can be related to the changes that caused #25230?

@alexei, are you using MySQL?

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:2>

Django

unread,
Sep 14, 2015, 2:20:39 PM9/14/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------
Reporter: alexei | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* status: new => closed
* resolution: => needsinfo


--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:3>

Django

unread,
Sep 15, 2015, 6:41:50 AM9/15/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------
Reporter: alexei | Owner: nobody

Type: Bug | Status: closed
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by adamchainz):

I think DRF is probably mutating your QuerySet to use a subquery to get
the count when it's not necessary. If you are using MySQL, these tend to
be slow - newer versions of MySQL and MariaDB have improvements that make
them faster.

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:4>

Django

unread,
Sep 20, 2015, 2:04:09 PM9/20/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------

Reporter: alexei | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by alexei):

* status: closed => new
* cc: alexei (added)
* resolution: needsinfo =>


Comment:

Hello,

Sorry for the delay. I didn't get notified when you guys replied, and I
completely forgot about it.

Yes, in my previous example I was using MySQL.

Hoewever, I created a test app to see how it goes using the default
settings (including SQLite db). The result is as follows:

{{{
$ python manage.py shell
>>> from django.core.paginator import Paginator
>>> from django.db import connections
>>> from django.db.models import Count
>>> from bug.models import Article
>>> a = Article.objects.annotate(comments_count=Count('comments'))
>>> p = Paginator(a, 25)
>>> print p.object_list
[]
>>> print connections['default'].queries
[{u'time': u'0.001', u'sql': u'QUERY = u\'SELECT "bug_article"."id",
"bug_article"."title", "bug_article"."body", COUNT("bug_comment"."id") AS
"comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON (
"bug_article"."id" = "bug_comment"."article_id" ) GROUP BY
"bug_article"."id", "bug_article"."title", "bug_article"."body" LIMIT 21\'
- PARAMS = ()'}]
>>> print p.count
0
>>> print connections['default'].queries
[{u'time': u'0.001', u'sql': u'QUERY = u\'SELECT "bug_article"."id",
"bug_article"."title", "bug_article"."body", COUNT("bug_comment"."id") AS
"comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON (
"bug_article"."id" = "bug_comment"."article_id" ) GROUP BY
"bug_article"."id", "bug_article"."title", "bug_article"."body" LIMIT 21\'
- PARAMS = ()'}, {u'time': u'0.000', u'sql': u'QUERY = u\'SELECT COUNT(*)
FROM (SELECT "bug_article"."id" AS "id", "bug_article"."title" AS "title",
"bug_article"."body" AS "body", COUNT("bug_comment"."id") AS
"comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON (
"bug_article"."id" = "bug_comment"."article_id" ) GROUP BY
"bug_article"."id", "bug_article"."title", "bug_article"."body")
subquery\' - PARAMS = ()'}]
>>>

}}}

This time I'm again on Django 1.7.1.

For a second look, I updated Django to 1.8.4. The result is the same.

As you can see, it's Paginator's `count` that wraps the query resulting
from the QuerySet in a subquery.

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:5>

Django

unread,
Sep 24, 2015, 5:45:06 PM9/24/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------

Reporter: alexei | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by timgraham):

The queries in your latest comment are difficult to read given they are
not formatted as nicely as the ticket description. Perhaps you could edit
the description and move your latest comment there.

After that, could you explain what the expected query is? Are you able to
propose a fix? Not knowing these particular internal details of Django
offhand, it's a bit difficult for me to triage the ticket otherwise.
Thanks!

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:6>

Django

unread,
Sep 30, 2015, 2:29:18 PM9/30/15
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------
Reporter: alexei | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by timgraham):

* status: new => closed
* resolution: => needsinfo


--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:7>

Django

unread,
Jun 22, 2017, 1:16:41 PM6/22/17
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------
Reporter: Alexandru | Owner: nobody
Mărășteanu |

Type: Bug | Status: closed
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Stefan Wehrmeyer):

The Paginator needs to make two queries: the count query and the query for
the page objects.

If the queryset you use in your Paginator satisfies
[https://github.com/django/django/blob/0af14b2eaa0bf0821a8aacc8486489a1eb348397/django/db/models/sql/query.py#L412-L413
one of these conditions] the count query will run as a subquery, so this
query:

{{{
SELECT COUNT(*) FROM (SELECT ...);
}}}

instead of
{{{
SELECT COUNT(*) AS "__count" FROM ...;
}}}

In my case the first query takes double the amount of time of the second
query.

The row count can be independent of annotations. So if you make the
Paginator do the count query on a queryset without annotations (resulting
in a more performant count query) and then doing the actual pagination
with the annotated queryset may result in a small performance win.

Annotations can influence row count (e.g. when filtering by annotation)
and you still may have `groupby()` or `distinct()` on your queryset that
influences counts and forces a count with a subquery. So I'm not sure if
there's a general way for the Paginator (or any `.count()` call) to run
the more performant query.

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:8>

Django

unread,
Jul 17, 2018, 12:12:01 PM7/17/18
to django-...@googlegroups.com
#25375: Paginator generates suboptimal queries
-------------------------------------+-------------------------------------
Reporter: Alexandru | Owner: nobody
Mărășteanu |
Type: Bug | Status: closed
Component: Database layer | Version: 1.7
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Ramiro Morales):

This is similar to #23771

--
Ticket URL: <https://code.djangoproject.com/ticket/25375#comment:9>

Reply all
Reply to author
Forward
0 new messages