[Django] #35733: Page.__len__ could skip a query if self.paginator.count == 0


Django

Sep 5, 2024, 9:29:01 AM
to django-...@googlegroups.com
#35733: Page.__len__ could skip a query if self.paginator.count == 0
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: Jacob Walls
Type: Cleanup/optimization | Status: assigned
Component: Core (Other) | Version: dev
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
With code like this,
{{{
paginator = Paginator(unevaluated_queryset, 25)
page = paginator.get_page(1)  # causes count query via validate_number()
if not page:  # causes select query
    return {}
}}}
I get an additional query versus:
{{{
paginator = Paginator(unevaluated_queryset, 25)
if not paginator.count:  # causes count query
    return {}
}}}

----
In the case of no data, the second SELECT query is unnecessary if we
already know the paginator is empty. Since `paginator.count` is cached,
would it be worth optimizing out this additional query? Otherwise, you
have to dig into the internals to discover that the two examples above do
not perform equivalently.

{{{#!diff
diff --git a/django/core/paginator.py b/django/core/paginator.py
index 7b3189cc8b..334166636d 100644
--- a/django/core/paginator.py
+++ b/django/core/paginator.py
@@ -188,6 +188,8 @@ class Page(collections.abc.Sequence):
         return "<Page %s of %s>" % (self.number, self.paginator.num_pages)

     def __len__(self):
+        if self.paginator.count == 0:
+            return 0
         return len(self.object_list)

     def __getitem__(self, index):
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35733>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Sep 5, 2024, 10:34:46 AM
to django-...@googlegroups.com
#35733: Page.__len__ could skip a query if self.paginator.count == 0
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: Jacob Walls
Type: Cleanup/optimization | Status: assigned
Component: Core (Other) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

If we are going to do this, I think we should use
`paginator.__dict__.get('count')` instead, to avoid causing the inverse
problem if `count` is not already cached.
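
To illustrate why peeking into `__dict__` avoids the inverse problem, here is a minimal sketch. It uses the standard library's `functools.cached_property` as a stand-in for Django's own `cached_property`, on the assumption that both store the computed value in the instance `__dict__` under the attribute name. `FakePaginator` and its `queries` counter are hypothetical names used only for illustration.

```python
from functools import cached_property


class FakePaginator:
    """Hypothetical stand-in for Paginator; counts simulated COUNT queries."""

    def __init__(self, items):
        self.items = items
        self.queries = 0

    @cached_property
    def count(self):
        # Simulates the COUNT query; runs only on the first attribute access.
        self.queries += 1
        return len(self.items)


paginator = FakePaginator([])

# Peeking into __dict__ does not trigger the computation.
peeked_before = paginator.__dict__.get("count")
queries_after_peek = paginator.queries

# Normal attribute access computes the value and caches it in __dict__.
first_access = paginator.count
peeked_after = paginator.__dict__.get("count")
```

In this sketch, `__dict__.get("count")` returns `None` until the attribute has been accessed once, so a check against `0` only succeeds when the count is already cached.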
--
Ticket URL: <https://code.djangoproject.com/ticket/35733#comment:1>

Django

Sep 5, 2024, 7:34:55 PM
to django-...@googlegroups.com
#35733: Page.__len__ could skip a query if self.paginator.count == 0
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: Jacob Walls
Type: Cleanup/optimization | Status: assigned
Component: Core (Other) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

> Or maybe it's not necessary because a page cannot be created without
> count being cached in the first place?

That's more or less what I found, except that it is ''possible'' to
manually instantiate a `Page`, just unlikely enough that it may not be
worth coding around?

The recommended interface is to create the `Page` via `Paginator.page()` (or
`Paginator.get_page()` and `Paginator.__iter__()`, which call it), any of
which in turn caches the count via `validate_number()`:

{{{
You usually won't construct ``Page`` objects by hand -- you'll get them by
iterating :class:`Paginator`, or by using :meth:`Paginator.page`.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35733#comment:2>

Django

Sep 5, 2024, 8:48:19 PM
to django-...@googlegroups.com
#35733: Page.__len__ could skip a query if self.paginator.count == 0
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: Jacob Walls
Type: Cleanup/optimization | Status: assigned
Component: Core (Other) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted

Comment:

In this case I think it's safe to move forward with the optimization. The
closest ticket I could find is #23771.

Note that a similar optimization could have been done the other way around,
though: when a page is requested and
`len(page.object_list) < paginator.per_page`, `count` can be deduced as
`self.paginator.count = (self.number - 1) * self.paginator.per_page + len(self.object_list)`.

{{{#!diff
diff --git a/django/core/paginator.py b/django/core/paginator.py
index 7b3189cc8b..6b0aafc6d0 100644
--- a/django/core/paginator.py
+++ b/django/core/paginator.py
@@ -89,7 +89,7 @@ def page(self, number):
         number = self.validate_number(number)
         bottom = (number - 1) * self.per_page
         top = bottom + self.per_page
-        if top + self.orphans >= self.count:
+        if self.orphans and top + self.orphans >= self.count:
             top = self.count
         return self._get_page(self.object_list[bottom:top], number, self)

@@ -188,7 +188,14 @@ def __repr__(self):
         return "<Page %s of %s>" % (self.number, self.paginator.num_pages)

     def __len__(self):
-        return len(self.object_list)
+        if (paginator_count := self.paginator.__dict__.get("count")) == 0:
+            return 0
+        object_list_len = len(self.object_list)
+        if paginator_count is None and object_list_len < self.paginator.per_page:
+            self.paginator.count = (
+                self.number - 1
+            ) * self.paginator.per_page + object_list_len
+        return object_list_len

     def __getitem__(self, index):
         if not isinstance(index, (int, slice)):
}}}

Unfortunately, because `Paginator.get_page` validates against
`self.num_pages`, which triggers the calculation of `self.count`, this is
not feasible. On the bright side though, this means that `get_page` will
systematically trigger the `self.count` calculation, which strengthens the
assumption that it's safe to use directly in `Page.__len__`.
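
As a worked example of the deduction formula quoted above, here is a minimal sketch on a plain list. `deduce_count` is a hypothetical helper, not part of Django, and it assumes the requested page number actually exists (which is what `validate_number()` would otherwise guarantee).

```python
def deduce_count(number, per_page, object_list_len):
    """Return the total count implied by a short page, else None.

    Mirrors the expression from the diff in this comment:
    (number - 1) * per_page + len(object_list)
    Assumes `number` refers to a page that really exists.
    """
    if object_list_len < per_page:
        return (number - 1) * per_page + object_list_len
    return None


items = list(range(53))
per_page = 25

# Page 3 holds items 50..52, so it is short and the total can be deduced.
page_3 = items[50:75]
deduced = deduce_count(3, per_page, len(page_3))

# Page 2 is full, so its length alone says nothing about the total.
page_2 = items[25:50]
undetermined = deduce_count(2, per_page, len(page_2))
```

Only a page shorter than `per_page` reveals the total; a full page could be followed by any number of further pages.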
--
Ticket URL: <https://code.djangoproject.com/ticket/35733#comment:3>

Django

Sep 7, 2024, 8:38:37 AM
to django-...@googlegroups.com
#35733: Page.__len__ could skip a query if self.paginator.count == 0
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: Jacob Walls
Type: Cleanup/optimization | Status: closed
Component: Core (Other) | Version: dev
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* resolution: => invalid
* status: assigned => closed

Comment:

In writing a test case, I discovered that I had failed to confirm that the
no-data case actually performs an additional select query. In the no-data
case, a slice of the queryset from [0:0] is taken, which the ORM knows not
to perform a query for. So there is nothing to optimize here.


In `Paginator.page()`, `bottom` and `top` are both 0 if the count is 0:
{{{
return self._get_page(self.object_list[bottom:top], number, self)
}}}
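
To make the arithmetic concrete, here is a minimal sketch that reproduces only the bounds calculation from the `Paginator.page()` lines quoted in the diff earlier in this thread. `page_bounds` is a hypothetical helper, not part of Django, and it does not touch the ORM.

```python
def page_bounds(number, per_page, orphans, count):
    """Compute (bottom, top) slice bounds like the quoted Paginator.page() lines."""
    bottom = (number - 1) * per_page
    top = bottom + per_page
    # When the page would reach or pass the end, clamp top to the total count.
    if top + orphans >= count:
        top = count
    return bottom, top


# No data: top is clamped to count == 0, so the slice is [0:0].
empty_bounds = page_bounds(number=1, per_page=25, orphans=0, count=0)

# With data: a full first page and a clamped last page.
full_bounds = page_bounds(number=1, per_page=25, orphans=0, count=60)
last_bounds = page_bounds(number=3, per_page=25, orphans=0, count=60)
```

With `count == 0` both bounds collapse to 0, which gives the `[0:0]` slice described above.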
--
Ticket URL: <https://code.djangoproject.com/ticket/35733#comment:4>