[Django] #23242: slow date_hierarchy on a big tables

Django

unread,

Aug 6, 2014, 10:03:59 AM8/6/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master
Component: contrib.admin | Keywords: date_hierarchy
Severity: Normal | datetimes dates
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------------+-------------------------------------
Because of using DISTINCT query date_hierarchy incredibly slows down big
tables rendering (~10^6 records),
May be it is more convenient to simply iterate between min and max values
rather than using dates or datetimes queryset methods?

--
Ticket URL: <https://code.djangoproject.com/ticket/23242>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Aug 27, 2014, 9:09:05 AM8/27/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody

* status: new => closed
* needs_docs: => 0
* resolution: => needsinfo
* needs_tests: => 0
* needs_better_patch: => 0

Comment:

I'm not sure where the distinct query is that you're referring to. I also
don't understand your proposed solution. Please reopen if you can provide
more details (or even a patch).

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:1>

Django

unread,

Aug 27, 2014, 9:16:36 AM8/27/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody

Type: | Status: closed
Cleanup/optimization | Version: master

Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* easy: 1 => 0

Comment:

When the `date_hierarchy` option is used, the admin shows a list of
available years, months or days. The implementation is designed to show
only dates for which objects exist.

It provides a better UX for sparsely populated tables, where iterating
between min and max dates would show links to pages with no objects, but
performs badly on large tables.

In general, the admin isn't designed for dealing with large amounts of
data. This choice can be traced back to Django being designed for
editorial sites.

Certainly, the situation could be improved, but how?

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:2>

Django

unread,

Sep 18, 2014, 11:35:27 AM9/18/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master

Component: contrib.admin | Resolution:

Changes (by herve):

* status: closed => new
* cc: herve (added)
* resolution: needsinfo =>

Comment:

Here is an example of such a request:

{{{#!sql
SELECT DISTINCT DATE_TRUNC('day', "messages_message"."sent_at" AT TIME
ZONE E'Europe/Paris') FROM "messages_message" WHERE (EXTRACT('month' FROM
"messages_message"."sent_at" AT TIME ZONE E'Europe/Paris') = 9 AND
"messages_message"."sent_at" BETWEEN
'2014-01-01T00:00:00+01:00'::timestamptz and
'2014-12-31T23:59:59.999999+01:00'::timestamptz AND
"messages_message"."sent_at" IS NOT NULL) ORDER BY 1 ASC
}}}

The explain/analyze on ~6 million entries:

{{{
Sort (cost=62488.59..62488.60 rows=1 width=8) (actual
time=18791.784..18791.784 rows=1 loops=1)
Sort Key: (date_trunc('day'::text, timezone('Europe/Paris'::text,
sent_at)))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=62488.57..62488.58 rows=1 width=8) (actual
time=18791.465..18791.466 rows=1 loops=1)
-> Bitmap Heap Scan on messages_message (cost=856.42..62488.18
rows=153 width=8) (actual time=1276.004..17066.794 rows=6135296 loops=1)
Recheck Cond: ((sent_at >= '2014-01-01
00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31
23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
Filter: (date_part('month'::text,
timezone('Europe/Paris'::text, sent_at)) = 9::double precision)
-> Bitmap Index Scan on messages_message_sent_at_id_idx
(cost=0.00..856.38 rows=30655 width=0) (actual time=1274.039..1274.039
rows=6136069 loops=1)
Index Cond: ((sent_at >= '2014-01-01
00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31
23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
Total runtime: 18791.933 ms
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:3>

Django

unread,

Sep 18, 2014, 12:23:33 PM9/18/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master

Comment (by timgraham):

@herve, do you have ideas for how to improve the situation? Absent that, I
don't see a reason to keep this ticket open besides maybe documenting
"Don't use `date_hierarchy` if you have millions of entries."

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:4>

Django

unread,

Sep 18, 2014, 4:05:39 PM9/18/14

to django-...@googlegroups.com

#23242: slow date_hierarchy on a big tables
-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master

Comment (by aaugustin):

The alternative would be an option to show all months and days regardless
of whether items exist on these dates.

{{{
date_hierarchy_existing_only = True # default
}}}

The year part can be handled with range(min, max) which will be fast if
the field is indexed.

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:5>

Django

unread,

Sep 18, 2014, 4:30:09 PM9/18/14

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables

-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody

Type: New feature | Status: new
Component: contrib.admin | Version: master
Severity: Normal | Resolution:
Keywords: date_hierarchy | Triage Stage: Accepted

datetimes dates | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Changes (by timgraham):

* type: Cleanup/optimization => New feature
* stage: Unreviewed => Accepted

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:6>

Django

unread,

Aug 31, 2015, 1:02:29 PM8/31/15

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables

-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by gavinwahl):

* has_patch: 0 => 1

Comment:

https://github.com/django/django/pull/5213

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:7>

Django

unread,

Sep 2, 2015, 2:26:32 PM9/2/15

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables

-------------------------------------+-------------------------------------
Reporter: meteozond | Owner: nobody

Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Changes (by timgraham):

* needs_better_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:8>

Django

unread,

Dec 17, 2017, 5:03:55 AM12/17/17

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables
-------------------------------------+-------------------------------------
Reporter: Alexander Klimenko | Owner: nobody

Comment (by Aymeric Augustin):

As discussed in this [https://groups.google.com/d/msg/django-
developers/puw2kK1IX0o/j4gaCdDyBAAJ django-developers discussion], there's
a third-party solution in [https://github.com/hakib/django-admin-
lightweight-date-hierarchy#quickstart django-admin-lightweight-date-
hierarchy].

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:9>

Django

unread,

Dec 17, 2017, 11:03:02 AM12/17/17

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

While it's still possible that getting rid of the the `DISTINCT` generated
by `dates()` might help I wonder, by looking at the provided PostgreSQL
plan in comment:3, if the actual culprit of the slow query was the
`EXTRACT('month')` all along as related in #28933.

I also wonder if `LIMIT`'ing the results to `12` in in the case of
`year&month` might help the database engine engine figure out that it can
stop the `DISTINCT` processing once it retrieved 12 results.

Another option would to stop using `dates` (or allow passing a range to
it) for `year&month` and `year&month&day` filtering and use a combination
of `generate_series` and `EXISTS` to speed up the query.

e.g. for the `year&month` case (on a `DateField` for simplicitiy but
nothing prevent us from using the same mechanism on `DateTimeField` with a
few adjustments)

{{{#!sql
SELECT EXTRACT('month' from date)
FROM generate_series('2017-01-01'::date, '2017-12-01'::date, '1 month') AS
serie(data)
WHERE EXISTS (
SELECT 1
FROM event
WHERE event.datetime >= month AND event.datetime < (month + interval
'1' month)
)
}}}

This seems to perform very well on a large ''events'' table with an
indexed `datetime` column containing 17M rows.

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:10>

Django

unread,

Dec 17, 2017, 11:47:14 AM12/17/17

to django-...@googlegroups.com

#23242: Add an option to avoid slow date_hierarchy queries on a big tables
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

FWIW it looks like the `SELECT DISTINCT column FROM large_result_set LIMIT
small_threshold` optimization [https://www.postgresql.org/message-
id/gh8m5v%247oj%241%40news.hub.org was suggested on pgsql-hackers a few
years ago] but [https://www.postgresql.org/message-
id/gi94g7%24be%241%40news.hub.org never made it to the TODO].

--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:11>

Reply all

Reply to author

Forward