--
Ticket URL: <https://code.djangoproject.com/ticket/23242>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => closed
* needs_docs: => 0
* resolution: => needsinfo
* needs_tests: => 0
* needs_better_patch: => 0
Comment:
I'm not sure where the distinct query is that you're referring to. I also
don't understand your proposed solution. Please reopen if you can provide
more details (or even a patch).
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:1>
* easy: 1 => 0
Comment:
When the `date_hierarchy` option is used, the admin shows a list of
available years, months or days. The implementation is designed to show
only dates for which objects exist.
It provides a better UX for sparsely populated tables, where iterating
between min and max dates would show links to pages with no objects, but
performs badly on large tables.
In general, the admin isn't designed for dealing with large amounts of
data. This choice can be traced back to Django being designed for
editorial sites.
Certainly, the situation could be improved, but how?
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:2>
* status: closed => new
* cc: herve (added)
* resolution: needsinfo =>
Comment:
Here is an example of such a request:
{{{#!sql
SELECT DISTINCT DATE_TRUNC('day', "messages_message"."sent_at" AT TIME
ZONE E'Europe/Paris') FROM "messages_message" WHERE (EXTRACT('month' FROM
"messages_message"."sent_at" AT TIME ZONE E'Europe/Paris') = 9 AND
"messages_message"."sent_at" BETWEEN
'2014-01-01T00:00:00+01:00'::timestamptz and
'2014-12-31T23:59:59.999999+01:00'::timestamptz AND
"messages_message"."sent_at" IS NOT NULL) ORDER BY 1 ASC
}}}
The explain/analyze on ~6 million entries:
{{{
Sort (cost=62488.59..62488.60 rows=1 width=8) (actual
time=18791.784..18791.784 rows=1 loops=1)
Sort Key: (date_trunc('day'::text, timezone('Europe/Paris'::text,
sent_at)))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=62488.57..62488.58 rows=1 width=8) (actual
time=18791.465..18791.466 rows=1 loops=1)
-> Bitmap Heap Scan on messages_message (cost=856.42..62488.18
rows=153 width=8) (actual time=1276.004..17066.794 rows=6135296 loops=1)
Recheck Cond: ((sent_at >= '2014-01-01
00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31
23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
Filter: (date_part('month'::text,
timezone('Europe/Paris'::text, sent_at)) = 9::double precision)
-> Bitmap Index Scan on messages_message_sent_at_id_idx
(cost=0.00..856.38 rows=30655 width=0) (actual time=1274.039..1274.039
rows=6136069 loops=1)
Index Cond: ((sent_at >= '2014-01-01
00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31
23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
Total runtime: 18791.933 ms
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:3>
Comment (by timgraham):
@herve, do you have ideas for how to improve the situation? Absent that, I
don't see a reason to keep this ticket open besides maybe documenting
"Don't use `date_hierarchy` if you have millions of entries."
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:4>
Comment (by aaugustin):
The alternative would be an option to show all months and days regardless
of whether items exist on these dates.
{{{
date_hierarchy_existing_only = True # default
}}}
The year part can be handled with range(min, max) which will be fast if
the field is indexed.
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:5>
* type: Cleanup/optimization => New feature
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:6>
* has_patch: 0 => 1
Comment:
https://github.com/django/django/pull/5213
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:7>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:8>
Comment (by Aymeric Augustin):
As discussed in this [https://groups.google.com/d/msg/django-
developers/puw2kK1IX0o/j4gaCdDyBAAJ django-developers discussion], there's
a third-party solution in [https://github.com/hakib/django-admin-
lightweight-date-hierarchy#quickstart django-admin-lightweight-date-
hierarchy].
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:9>
Comment (by Simon Charette):
While it's still possible that getting rid of the the `DISTINCT` generated
by `dates()` might help I wonder, by looking at the provided PostgreSQL
plan in comment:3, if the actual culprit of the slow query was the
`EXTRACT('month')` all along as related in #28933.
I also wonder if `LIMIT`'ing the results to `12` in in the case of
`year&month` might help the database engine engine figure out that it can
stop the `DISTINCT` processing once it retrieved 12 results.
Another option would to stop using `dates` (or allow passing a range to
it) for `year&month` and `year&month&day` filtering and use a combination
of `generate_series` and `EXISTS` to speed up the query.
e.g. for the `year&month` case (on a `DateField` for simplicitiy but
nothing prevent us from using the same mechanism on `DateTimeField` with a
few adjustments)
{{{#!sql
SELECT EXTRACT('month' from date)
FROM generate_series('2017-01-01'::date, '2017-12-01'::date, '1 month') AS
serie(data)
WHERE EXISTS (
SELECT 1
FROM event
WHERE event.datetime >= month AND event.datetime < (month + interval
'1' month)
)
}}}
This seems to perform very well on a large ''events'' table with an
indexed `datetime` column containing 17M rows.
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:10>
Comment (by Simon Charette):
FWIW it looks like the `SELECT DISTINCT column FROM large_result_set LIMIT
small_threshold` optimization [https://www.postgresql.org/message-
id/gh8m5v%247oj%241%40news.hub.org was suggested on pgsql-hackers a few
years ago] but [https://www.postgresql.org/message-
id/gi94g7%24be%241%40news.hub.org never made it to the TODO].
--
Ticket URL: <https://code.djangoproject.com/ticket/23242#comment:11>