[Django] #31304: PostgreSQL full-text search employs coalesce function for non-null single-column searches with SearchVector

67 views
Skip to first unread message

Django

unread,
Feb 24, 2020, 6:12:10 AM2/24/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: uiopaubo | Owner: nobody
Type: | Status: new
Uncategorized |
Component: | Version: 2.2
Uncategorized | Keywords: PostgreSQL text
Severity: Normal | search FTS coalesce SearchVector
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
When following the PostgreSQL full-text search documentation for Django
(https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/search/), the
search lookup...

{{{
Table.objects.filter(column__search="keyword")
}}}

...produces SQL of the following form:

{{{
to_tsvector(column) @@ plainto_tsquery('keyword')
}}}

However, the PostgreSQL documentation notes that such expressions will be
unable to take advantage of indexes created on the column:

"Only text search functions that specify a configuration name can be used
in expression indexes [...] Because the two-argument version of
to_tsvector was used in the index above, only a query reference that uses
the 2-argument version of to_tsvector with the same configuration name
will use that index."

[https://www.postgresql.org/docs/11/textsearch-tables.html]

Introducing a SearchQuery object employing the config parameter...

{{{
Table.objects.filter(column__search=SearchQuery("keyword",
config="simple"))
}}}

...produces SQL of the following form:

{{{
to_tsvector(column) @@ plainto_tsquery('simple', 'keyword')
}}}

The Django documentation suggests using an annotation employing a
SearchVector as follows:

{{{
Table.objects.annotate(search=SearchVector("column",
config="simple")).filter(search=SearchQuery("keyword", config="simple"))
}}}

The resulting SQL generated by Django is then as follows:

{{{
to_tsvector('simple', coalesce(column, '')) @@ plainto_tsquery('simple',
'keyword')
}}}

Unfortunately, the use of coalesce now blocks any application of an index
on the column.

What seems to be possible, however, is to modify the SQL generation to
avoid using coalesce where it can be determined that the operand given to
to_tsvector will not yield a null value. This should produce the following
more desirable SQL:

{{{
to_tsvector('simple', column) @@ plainto_tsquery('simple', 'keyword')
}}}

A patch is provided as a suggestion of how this issue might be fixed.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 24, 2020, 6:12:36 AM2/24/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: uiopaubo | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 2.2
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage:
search FTS coalesce SearchVector | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by uiopaubo):

* Attachment "patch-django-2.2.9-postgres-search.diff" added.

Django

unread,
Feb 24, 2020, 11:01:56 AM2/24/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |

Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage:
search FTS coalesce SearchVector | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Paul Boddie):

* cc: Paul Boddie (added)
* component: Uncategorized => Database layer (models, ORM)


--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:1>

Django

unread,
Feb 24, 2020, 1:26:18 PM2/24/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage:
search FTS coalesce SearchVector | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Hey Paul,

> Unfortunately, the use of coalesce now blocks any application of an
index on the column.

Could you elaborate on that? What's preventing you from creating a GIN
functional index on `to_tsvector('simple', coalesce(column, ''))` instead?

I'd suggest you submit your patch as a Github PR so CI can run the full
suite against it.

I suspect the `.null` check won't be enough as some column values mapped
to `null=False` fields can end up being ''null'' when outer joins are
involved and it won't be reflected in `output_field`. In short we can't
drop the `Coalesce` wrapping unless we can guarantee we don't break
backward compatibility here; I wish we forced the usage of `Coalesce` from
the beginning instead but that's not the case.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:2>

Django

unread,
Feb 24, 2020, 1:39:21 PM2/24/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage:
search FTS coalesce SearchVector | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

I guess this ticket could also re-purposed as a new feature to make
`__search=SearchQuery` force the usage of the rhs's `config` for the lhs.

Something along

{{{#!diff
diff --git a/django/contrib/postgres/lookups.py
b/django/contrib/postgres/lookups.py
index cc5bc022c6..1cb03b9510 100644
--- a/django/contrib/postgres/lookups.py
+++ b/django/contrib/postgres/lookups.py
@@ -1,7 +1,9 @@
from django.db.models import Lookup, Transform
from django.db.models.lookups import Exact, FieldGetDbPrepValueMixin

-from .search import SearchVector, SearchVectorExact, SearchVectorField
+from .search import (
+ SearchQuery, SearchVector, SearchVectorExact, SearchVectorField
+)


class PostgresSimpleLookup(FieldGetDbPrepValueMixin, Lookup):
@@ -57,7 +59,10 @@ class SearchLookup(SearchVectorExact):

def process_lhs(self, qn, connection):
if not isinstance(self.lhs.output_field, SearchVectorField):
- self.lhs = SearchVector(self.lhs)
+ config = None
+ if isinstance(self.rhs, SearchQuery):
+ config = self.rhs.config
+ self.lhs = SearchVector(self.lhs, config=config)
lhs, lhs_params = super().process_lhs(qn, connection)
return lhs, lhs_params
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:3>

Django

unread,
Feb 25, 2020, 4:06:36 AM2/25/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage:
search FTS coalesce SearchVector | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Paul Boddie):

The motivation for eliminating coalesce is that it is superfluous when
searching a non-null column already indexed appropriately. I accept that
in the general case, where people may be combining columns, there may be a
need to coalesce null values, but I would regard it as pretty inelegant to
have to create an index coalescing values that would never be null. (One
might argue that PostgreSQL could itself eliminate any redundant coalesce
operation specified for an index, and I don't know why it doesn't in this
case.)

The suggestion to make the search lookup employ any indicating
configuration from the operand is a good one. Indeed, it surprises me that
this isn't already done because it appears to be an easy mistake when
manually preparing queries that could be eliminated in a framework like
this.

I'll try and make a pull request on GitHub.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:4>

Django

unread,
Feb 25, 2020, 4:19:38 AM2/25/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new

Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* type: Uncategorized => New feature
* stage: Unreviewed => Accepted


Comment:

Accepting based on the discussion so far. Thanks both.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:5>

Django

unread,
Feb 25, 2020, 9:50:01 AM2/25/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Paul, I did more investigation and it looks like we can't also drop the
`Coalesce` even in cases where there's a single expression and no
concatenation needs to take place. The reason for that is that
`to_tsvector('')` and `to_tsvector(null)` are not equivalent as the latter
will result in SQL `Unknown` when used with the `@@` operator. That's
problematic when negating expressions as both `to_tsvector(null) @@
to_tsquery('test')` and `NOT to_tsvector(null) @@ to_tsquery('test')`
result in the same value effectively breaking `~Q` and `exclude`.

Regarding the `.null` handling I also did a bit of investigation and
here's a case where checking for `output_field.null` would be a false
negative


{{{#!python
class Manager(models.Model):
name = models.TextField()

class Store(models.Model):
manager = models.ForeignKey(Manager)
second_manager = models.ForeignKey(Manager, null=True)


Store.objects.annotate(
search=SearchVector('manager__name', 'second_manager__name')
).filter(search='Victor')
}}}

In this particular case the `Col` that `'second_manager__name'` resolves
to would have `Manager.name` as `output_field` but it's not `null=True`
even it's ''nullable'' if `store.second_manager is None`. In order to
properly determine whether or not the expression should be coalesced we
need to annotate `Col` with a flag that takes the whole relationship
nullability in account.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:6>

Django

unread,
Feb 25, 2020, 10:51:47 AM2/25/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Paul Boddie):

Thanks for the comment, Simon.

I wouldn't expect to_tsvector to treat an empty string and null in the
same way - this being SQL after all - but the aim would be to avoid
coalesce where null could never occur. This then becomes an issue of
knowing when this might be guaranteed.

I'm still getting familiar with the ORM, but I imagine that there could
easily be cases where column values are null even if the table columns are
declared not null. For instance, any query involving an outer join could
produce null values for a "not null" column. In such cases, the
characteristics of the output column would need to be defined by
identifying its role in the query, pretty much as you say.

I am inclined to think that making my own lookup would be the safest thing
to do for now, with the lookups provided also considering the issue of
transferring the configuration.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:7>

Django

unread,
Feb 25, 2020, 6:25:45 PM2/25/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: 2.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)


Comment:

> I wouldn't expect to_tsvector to treat an empty string and null in the
same way - this being SQL after all - but the aim would be to avoid
coalesce where null could never occur. This then becomes an issue of
knowing when this might be guaranteed.

Right the main issue here is that `SearchVector` (just like `Concat` for
example) has been coalescing nulls to empty strings forever so we can't
just change it now without breaking backward compatibility.

> I'm still getting familiar with the ORM, but I imagine that there could
easily be cases where column values are null even if the table columns are
declared not null. For instance, any query involving an outer join could
produce null values for a "not null" column. In such cases, the
characteristics of the output column would need to be defined by
identifying its role in the query, pretty much as you say.

Right comment:6 provides another example of that. The `JOIN` resolution
logic knows about these things but it's not surfaced at the `Expression`
introspection level right now so we can't even start deprecating the
current behaviour if we wanted to. I plan on working on patch that exposes
an `Expression.nullable` property that gets assigned on field reference
resolution but that's not an easy thing to get right as you might suspect.

> I am inclined to think that making my own lookup would be the safest
thing to do for now, with the lookups provided also considering the issue
of transferring the configuration.

I tend to agree, are you still interested in submitting a PR that does a
`config` assignment in `SearchLookup.process_lhs`?

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:8>

Django

unread,
Feb 26, 2020, 3:11:18 AM2/26/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master

(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* has_patch: 1 => 0
* version: 2.2 => master


--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:9>

Django

unread,
Feb 26, 2020, 4:10:33 AM2/26/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Paul Boddie):

I can probably look into it, given that I have managed to write a lookup
that seems to have the desired behaviour. Should it be a different issue,
though?

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:10>

Django

unread,
Feb 26, 2020, 9:43:09 AM2/26/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> Should it be a different issue, though?

Yeah I think it should.

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:11>

Django

unread,
Mar 6, 2020, 4:45:32 AM3/6/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"7c8b043a036dd0eab91131800ed104bc17956b22" 7c8b043a]:
{{{
#!CommitTicketReference repository=""
revision="7c8b043a036dd0eab91131800ed104bc17956b22"
Refs #31304 -- Made __search lookup default to its rhs' config.

This make the SearchLookup lookup more coherent with its
SearchVectorExact base which configures its rhs SearchQuery with its
lhs' config.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:12>

Django

unread,
Mar 13, 2020, 3:37:44 PM3/13/20
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"e2417010daf697d5a58bc2057a24b7785c4cef03" e2417010]:
{{{
#!CommitTicketReference repository=""
revision="e2417010daf697d5a58bc2057a24b7785c4cef03"
Refs #31304 -- Added tests for using __search lookup with combined
SearchQuerys with the same config.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:13>

Django

unread,
May 28, 2021, 2:05:31 PM5/28/21
to django-...@googlegroups.com
#31304: PostgreSQL full-text search employs coalesce function for non-null single-
column searches with SearchVector
-------------------------------------+-------------------------------------
Reporter: Paul Boddie | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: PostgreSQL text | Triage Stage: Accepted
search FTS coalesce SearchVector |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Until this is addressed, assuming you know expressions you are dealing
with are `NOT NULL`, a limited replacement for `SearchVector` is the
following

{{{#!python
from django.contrib.postgres.search import SearchConfig, SearchVectorField
from django.db.models import Func
from django.db.models.expressions import ExpressionList


class SearchVector(Func):
"""
Replacement of `django.contrib.postgres.search.SearchVector` that
works around limitations of the later with regards to indexing.

See https://code.djangoproject.com/ticket/31304#comment:6
"""

function = 'to_tsvector'
output_field = SearchVectorField()

def __init__(self, *expressions, config=None):
expressions = (
SearchConfig.from_parameter(config),
ExpressionList(*expressions, arg_joiner=" || ' ' || "),
)
super().__init__(*expressions)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31304#comment:14>

Reply all
Reply to author
Forward
0 new messages