[Django] #28128: Fulltext search very slow with annotate(SearchVector)

29 views

Skip to first unread message

Django

unread,

Apr 25, 2017, 6:55:40 PM4/25/17

to django-...@googlegroups.com

#28128: Fulltext search very slow with annotate(SearchVector)
--------------------------------------------+------------------------
Reporter: Gavin Wahl | Owner: (none)
Type: Uncategorized | Status: new
Component: contrib.postgres | Version: 1.11
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
--------------------------------------------+------------------------
I have this model:

{{{#!python
class Bug(models.Model):
# .. fields ...

# This field is managed by Django, adding stuff to it that should be
# included in the fulltext search.
fulltext = models.TextField()
}}}

The `Bug` table has many rows, so I need to add an index on the fulltext
column for fulltext search:

{{{#!sql
CREATE INDEX ON buggy_bug USING GIN (to_tsvector('english'::regconfig,
COALESCE("fulltext", '')));
}}}

Note that postgres requires the `english` config argument to `to_tsvector`
for it be allowed to be used in an index.

Now, at query time, the expression Django uses must exactly match the
expression in the functional index. By default,
`filter(fulltext__search='foo')` leaves out the config argument to
`to_tsvector`, so my index isn't used. The documentation shows us how to
fix this, by annotating a search field:

{{{#!sql
Bug.objects.annotate(
search=SearchVector('fulltext', config='english')
).filter(
search='foo'
)
}}}

Now the expressions match and the index is used. However, the query is
still crazy slow. This is the query Django does:

{{{#!sql
SELECT "buggy_bug"."id",
to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) AS "search"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
}}}

Which takes 2500 milliseconds on my data, and EXPLAIN confirms it is using
the index. With some playing around with it, I find that removing the
annotated search column gives a huge speedup:

{{{#!sql
SELECT "buggy_bug"."id"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
}}}

This query only takes 11 milliseconds.

What's the point of selecting the result of to_tsvector? We don't actually
use it in Django, we just had to do it to specify the config to use for
to_tsvector.

There needs to be a way to specify the config for to_tsvector in order to
use an index without a huge slowdown.

--
Ticket URL: <https://code.djangoproject.com/ticket/28128>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Apr 25, 2017, 9:10:14 PM4/25/17

to django-...@googlegroups.com

#28128: Fulltext search very slow with annotate(SearchVector)

----------------------------------+--------------------------------------

Reporter: Gavin Wahl | Owner: (none)

Type: Uncategorized | Status: closed
Component: contrib.postgres | Version: 1.11
Severity: Normal | Resolution: duplicate

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0

----------------------------------+--------------------------------------
Changes (by Simon Charette):