[Django] #30637: Django is unable to combine SearchVectorField and SearchVector

33 views
Skip to first unread message

Django

unread,
Jul 14, 2019, 2:59:09 AM7/14/19
to django-...@googlegroups.com
#30637: Django is unable to combine SearchVectorField and SearchVector
-------------------------------------+-------------------------------------
Reporter: Dani | Owner: (none)
Hodovic |
Type: | Status: new
Cleanup/optimization |
Component: | Version: 2.2
contrib.postgres | Keywords: db, postgres, full-
Severity: Normal | text, search
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
When using `django.contrib.postgres` to perform full text search it's not
possible to combine SearchVectors and SearchVectorfields. Doing so impacts
the performance of the query.

Suppose we have a model with a small "role" field and a much larger "text"
field. The text field is large enough that it warrants indexing in a
separate column as a tsvector with a Gin index to ensure our queries are
fast.

{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
class JobPosting(models.Model):
role = models.CharField(max_length=170, null=True)
text = models.TextField(max_length=8000, default="")
# Large field optimized for full text search
text_search = SearchVectorField(null=True)

class Meta:
indexes = [GinIndex(fields=["text_search"])]
}}}
}}}

If we need to perform a search on all columns we need to combine them into
a common tsvector. The problem is that Django casts the large and search
optimized text_search field to text and then back into a tsvector. This
results in Postgres not using the existing Gin index and having to cast
between types which makes for a very slow query.

{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
JobPosting.objects.annotate(full_text=(SearchVector("role") +
SearchVector("text_search"))).filter(full_text=SearchQuery("foo"))
}}}
}}}


{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!sql
EXPLAIN ANALYZE SELECT "jobs_jobposting"."id",
"jobs_jobposting"."role",
"jobs_jobposting"."text",
(to_tsvector(COALESCE("jobs_jobposting"."role", '')) ||
to_tsvector(COALESCE(("jobs_jobposting"."text_search")::text, ''))) AS
"full_text"
FROM "jobs_jobposting"
WHERE (to_tsvector(COALESCE("jobs_jobposting"."role", '')) ||
to_tsvector(COALESCE(("jobs_jobposting"."text_search")::text, ''))) @@
(plainto_tsquery('foo')) = true
LIMIT 21;
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..35.54 rows=1 width=809) (actual time=40.085..40.085
rows=0 loops=1)
-> Seq Scan on jobs_jobposting (cost=0.00..35.54 rows=1 width=809)
(actual time=40.082..40.082 rows=0 loops=1)
Filter: ((to_tsvector((COALESCE(role, ''::character
varying))::text) || to_tsvector(COALESCE((text_search)::text, ''::text)))
@@ plainto_tsquery('foo'::text))
Rows Removed by Filter: 42
Planning Time: 3.140 ms
Execution Time: 40.273 ms
}}}
}}}


If you compare this to using the text_search field directly we can see
that the query is much faster, presumably due to the use of index and lack
of casting to `::text`

{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
JobPosting.objects.filter(text_search=SearchQuery("foo"))
}}}
}}}

{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!sql
EXPLAIN ANALYZE SELECT "jobs_jobposting"."id",
"jobs_jobposting"."text",
"jobs_jobposting"."role"
FROM "jobs_jobposting"
WHERE "jobs_jobposting"."text_search" @@ (plainto_tsquery('foo')) =
true
LIMIT 21;
------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..15.24 rows=1 width=371) (actual time=1.165..1.166
rows=0 loops=1)
-> Seq Scan on jobs_jobposting (cost=0.00..15.24 rows=1 width=371)
(actual time=1.163..1.163 rows=0 loops=1)
Filter: (text_search @@ plainto_tsquery('foo'::text))
Rows Removed by Filter: 42
Planning Time: 0.699 ms
Execution Time: 1.209 ms
}}}
}}}


Compare the execution times: from **40.3ms to 1.2ms**.

You could technically concatenate the role and the text field into one
SearchVectorField, but then you would be unable to search rank different
fields differently. Perhaps we would like to rank the information in the
role column as A, but the text column as C.

I have tried to use F expressions to concatenate the columns, but then
Django complains.

{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
JobPosting.objects.annotate(full_text=(SearchVector("role") +
F("text_search"))).filter(full_text=SearchQuery("foo"))
TypeError: SearchVector can only be combined with other SearchVectors
}}}
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30637>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 14, 2019, 3:04:09 AM7/14/19
to django-...@googlegroups.com
#30637: Django is unable to combine SearchVectorField and SearchVector
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: contrib.postgres | Version: 2.2
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage:
text, search | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Dani Hodovic:

Old description:

New description:

I have also tried casting the "text_search" field into a SearchVector, but
that also fails.


{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
JobPosting.objects.annotate(full_text=(SearchVector("role") +

Cast("text_search", SearchVector()))).filter(full_text=SearchQuery("foo"))


TypeError: SearchVector can only be combined with other SearchVectors
}}}
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:1>

Django

unread,
Jul 15, 2019, 3:43:09 AM7/15/19
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.

-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* type: Cleanup/optimization => Bug
* version: 2.2 => master
* stage: Unreviewed => Accepted


Comment:

Thanks for the report. Agreed, `SearchVector` and `SearchVectorField`
should be combinable, e.g.
{{{


JobPosting.objects.annotate(full_text=(SearchVector('role') +
F('text_search'))).filter(full_text=SearchQuery('foo'))
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:2>

Django

unread,
Aug 1, 2019, 2:00:34 PM8/1/19
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: hsnprsd
Type: Bug | Status: assigned

Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by hsnprsd):

* status: new => assigned
* owner: (none) => hsnprsd


--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:3>

Django

unread,
May 25, 2020, 3:12:29 PM5/25/20
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: Ehsan
| Poursaeed

Type: Bug | Status: assigned
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Alexandr Tatarinov):

I have taken a look, and not sure about the desired result. Should we
a) avoid ::text cast
b) allow SearchVectorField and SearchVector combination without changing
the produced query
c) both?

I doubt the database can use the index when two fields are combined, and I
can't see the index being used anyway in both queries provided.
So the performance problems should come from ::text cast?
Because SearchVectorField can be null, we still need to wrap it with
Coalesce. So the resulting query is something like this, correct me if I'm
wrong.
Please, test whether this will change the execution time.


{{{


EXPLAIN ANALYZE SELECT "jobs_jobposting"."id",
"jobs_jobposting"."role",
"jobs_jobposting"."text",
(to_tsvector(COALESCE("jobs_jobposting"."role", '')) ||

COALESCE("jobs_jobposting"."text_search", to_tsvector(''))) AS "full_text"


FROM "jobs_jobposting"
WHERE (to_tsvector(COALESCE("jobs_jobposting"."role", '')) ||

COALESCE("jobs_jobposting"."text_search", to_tsvector(''))) @@


(plainto_tsquery('foo')) = true
LIMIT 21;
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:4>

Django

unread,
Mar 12, 2024, 1:49:37 AM3/12/24
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ülgen Sarıkavak):

* cc: Ülgen Sarıkavak (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:5>

Django

unread,
Apr 20, 2025, 6:00:17 PM4/20/25
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: Ahmed
| Nassar
Type: Bug | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ahmed Nassar):

* owner: (none) => Ahmed Nassar
* status: new => assigned

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:6>

Django

unread,
Jul 12, 2025, 3:22:48 PM7/12/25
to django-...@googlegroups.com
#30637: SearchVectorField and SearchVector should be combinable.
-------------------------------------+-------------------------------------
Reporter: Dani Hodovic | Owner: Ahmed
| Nassar
Type: Bug | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: db, postgres, full- | Triage Stage: Accepted
text, search |
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Dmytro Litvinov):

* cc: Dmytro Litvinov (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/30637#comment:7>
Reply all
Reply to author
Forward
0 new messages