Optimizing Prefetch for Postgres IN Limit

278 views
Skip to first unread message

Ram Jayaraman

unread,
Jul 24, 2018, 11:00:48 PM7/24/18
to Django users
Is there a prescribed pattern to optimize Prefetches when the expected Prefetch list is known to exceed 100 ? We are on Django 1.8 + Postgres 10. PG has a limit of 100 for values in IN queries after which the Index on the said column is not used. For ex: a typical Prefetch generating an IN query on the PK will be turned into a full table scan if the IN list exceeds 100.

Appreciate your suggestions!


Jason

unread,
Jul 25, 2018, 5:59:26 AM7/25/18
to Django users
Where do you get that in the pg documentation? I can't find that anywhere (google-fu may be failing me), and we do have some queries with more than 100 values using IN.

Christophe Pettus

unread,
Jul 26, 2018, 10:35:15 PM7/26/18
to django...@googlegroups.com

> On Jul 25, 2018, at 02:59, Jason <jjohn...@gmail.com> wrote:
>
> Where do you get that in the pg documentation? I can't find that anywhere (google-fu may be failing me), and we do have some queries with more than 100 values using IN.

It's slightly more complicated than that. Above 100 entries, the PostgreSQL optimizer won't try to do optimizations of the form changing i IN (1, 2 ..., 101) to (i = 1) OR (i = 2) OR (i = 3)... to see if there's a better way of executing the query. It *can* still do an index scan in those cases, although the more entries in the IN list, the less efficient that will be. In general, large IN clauses aren't a great idea; they're better replaced with a join.

--
-- Christophe Pettus
x...@thebuild.com

Ram J

unread,
Jul 27, 2018, 1:10:35 AM7/27/18
to django...@googlegroups.com
Hi Xof,

The issue is that, when you do a naive Prefetch you are left with nothing but Django’s auto generated IN query. I was asking about he recommended way to make this into a JOIN, which is not obvious how to do with Django

Also in my case the naive IN query on the PK *did not* generate a Index scan and ended up being a full table scan

I could have however used a Prefetch object to force PG to use a different index apart from the PK but a naive Prefetch of the form queryset.prefetch(‘table__table2’) does not use any index beyond 100 values

Also it’s hard to estimate how many values will be there in a nested  prefetch, to state the obvious, to do anything different on a query by query basis


--
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/y9sVr9Pbr-o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/8A21227D-94A9-459F-89D0-771D3052C0C0%40thebuild.com.
For more options, visit https://groups.google.com/d/optout.

Jason

unread,
Jul 27, 2018, 7:05:17 AM7/27/18
to Django users
well, prefetch explicitly does joining in python, as in the docs


prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python.

since it doesn't use joins, there's nothing really to generate a match on an index.  FWIW, I'm facing a similar issue when optimizing a slow django query.

Simon Charette

unread,
Jul 27, 2018, 11:52:37 AM7/27/18
to Django users
There's two open tickets to work around this issue.

https://code.djangoproject.com/ticket/25464 which allows passing queryset override to be used for retrieval and another one that I can't currently find that allows specifying that a subquery should be used instead of a an IN clause.

Simon

Ram J

unread,
Jul 27, 2018, 2:39:33 PM7/27/18
to django...@googlegroups.com
Hi Xof,

I realized what you meant regd the optimizer and understood why my query was doing a full table scan. The issue was that column through which I prefetching has a lot of NULLs so we have a partial index on NOT NULL and the optimizer is getting fooled by that. Adding a IS NOT NULL in the same query makes the optimizer use the index again.

Thanks
Ram

--
Ram J

Reply all
Reply to author
Forward
0 new messages