The problem is that for slicing data, current solution needs to compute
whole list to render data `bottom:top`. The reason is that, current script
will generate for example `SELECT ID, COL_1, ..., COL_N, ... WHERE ...`
which has huge burden for database to slice the data. To overcome this
problem we can instead select primary keys and do the slicing step and
then fetch records that their pk are in that sliced list. Very simple but
very efficient solution. I improved the performance for our project
significantly using this approach. So form 30 seconds to only 2-3 seconds
for 8 million records.
In this ticket I propose same approach to improve django Paginator class.
--
Ticket URL: <https://code.djangoproject.com/ticket/30207>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => assigned
* owner: nobody => M. Javidan Darugar
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:1>
Comment (by Simon Charette):
Hello there,
Could you provide more details about which database backend is sped up
with this technique? I'm a bit surprised two queries perform faster than
one in this case because even if the first one would allows the usage of
an index only scan the tuples still have to be fetched in the end. The
query planner should be smart enough to chain the operations in a single
query. I suspect something else might be at play here such as the query to
''count'' the number of results.
By the way, the issue title should be changed to reflect what the
optimization technique is as right now it's pretty vague.
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:2>
Comment (by Nikolas):
1. i too use paginator hack for admin, work for Postgres only if not use
any filters, this is can be inbox good optimization
{{{
class LargeTablePaginator(Paginator):
"""
Warning: Postgresql only hack
Overrides the count method of QuerySet objects to get an estimate
instead of actual count when not filtered.
However, this estimate can be stale and hence not fit for situations
where the count of objects actually matter.
"""
def _get_count(self):
if getattr(self, '_count', None) is not None:
return self._count
query = self.object_list.query
if not query.where:
try:
cursor = connection.cursor()
cursor.execute("SELECT reltuples FROM pg_class WHERE
relname = %s",
[query.model._meta.db_table])
self._count = int(cursor.fetchone()[0])
except:
self._count = super(LargeTablePaginator,
self)._get_count()
else:
self._count = super(LargeTablePaginator, self)._get_count()
return self._count
count = property(_get_count)
}}}
maximum speed on any sized table
2. if use only pk order query can serios optimize when have max pk and
wont get page-1, in this case just need select ..... where pk<last_max_id.
In this case will be use index and have maximum speed
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:3>
* status: assigned => closed
* resolution: => needsinfo
Comment:
The technical justification seems lacking both here and on the
[https://github.com/django/django/pull/11016 PR].
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:4>
* status: closed => new
* resolution: needsinfo =>
Comment:
I reopen this ticket because it is not solved yet, if I am replying to
messages by delay it is because I have a full time job and rarely have
time to work on a side project but still love to contribute to the django
community.
Some of the comments in the github branch was about naming for variable or
function arguments which means there is no standards for naming for
variables because everyone can suggest something new.
Another comment was too irrelevant about what I proposed. If postgres has
a new feature it does not mean that it is good for django because django
has an ORM which should work with other databases as well.
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:5>
Comment (by M. Javidan Darugar):
What do you mean by index? Tables index? for a table with such huge data
index table can be very huge. In our case indexing used almost 18GB of
disk memory. Indexing is very useful method but not always it depends on
the problem and the situation.
Replying to [comment:3 Nikolas]:
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:6>
Comment (by M. Javidan Darugar):
Well this is a bit surprising for me as well that many developers don't
know about this, because it is a very basic fact in database core, maybe
because most developers focused on code base rather than raw queries.
Replying to [comment:2 Simon Charette]:
> Hello there,
>
> Could you provide more details about which database backend is sped up
with this technique? I'm a bit surprised two queries perform faster than
one in this case because even if the first one would allows the usage of
an index only scan the tuples still have to be fetched in the end. The
query planner should be smart enough to chain the operations in a single
query. I suspect something else might be at play here such as the query to
''count'' the number of results.
>
> By the way, the issue title should be changed to reflect what the
optimization technique is as right now it's pretty vague.
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:7>
* cc: Can Sarıgöl (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:8>
* status: new => closed
* resolution: => needsinfo
Comment:
Hi. At this point, the issue tracker here is not the appropriate place for
this discussion.
Please see TicketClosingReasons/DontReopenTickets.
Rather, a message to the DevelopersMailingList explaining the issue and
the proposed change would be needed to progress.
A couple of points to make it easier:
* This kind of thing has come up previously: search here and the mailing
list history to find the discussions. Summarizing that history may help.
* Wrapping your suggestion in a custom paginator class and providing a
sample project, with some benchmarking data maybe, so people can explore
your solution, would be useful.
* In general, the proficiency level of contributors to the ORM is very
high. They know how databases work. Assume that.
Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/30207#comment:9>