Re: [Django] #35539: SearchVector GinIndex raises IMMUTABLE error

13 views
Skip to first unread message

Django

unread,
Jun 21, 2024, 3:56:15 AM6/21/24
to django-...@googlegroups.com
#35539: SearchVector GinIndex raises IMMUTABLE error
----------------------------------+------------------------------------
Reporter: Alastair D'Silva | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: 5.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Changes (by Sarah Boyce):

* has_patch: 1 => 0

--
Ticket URL: <https://code.djangoproject.com/ticket/35539#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jun 21, 2024, 9:15:19 AM6/21/24
to django-...@googlegroups.com
#35539: SearchVector GinIndex raises IMMUTABLE error
----------------------------------+------------------------------------
Reporter: Alastair D'Silva | Owner: (none)
Type: Bug | Status: closed
Component: contrib.postgres | Version: 5.0
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Changes (by Simon Charette):

* resolution: => invalid
* status: new => closed

Comment:

Blaming the surrounding code leads to #30488 and #30385 which have plenty
of context. It also has ties to #34955.

The summary is that `SearchVector` was designed to support any field and
expression thrown at it even the ones that are not of textual type and /
or are nullable. This encouraged the usage of Postgres `CONCAT` function
over the `||` operator as the former properly deals with both

{{{#!sql
psql (16.3, server 15.6 (Debian 15.6-1.pgdg120+2))
Type "help" for help.

django=# SELECT CONCAT('foo', NULL, 1);
concat
--------
foo1
(1 row)

django=# SELECT 'foo' || NULL || 1;
?column?
----------

(1 row)
}}}

The problem with `CONCAT` however is that it produces non-`IMMUTABLE`
output (the resulting data can change based on some connection settings
that relate to collation for example) and thus is not suitable in index
creation which creates a pickle about how `NULL`-able values should be
dealt with.

----

With all that said this ticket has little to do with this complexity. If
you comment out the code reported to ''solve the issue'' by the reporter
your test still fails Sarah and the reason is simple: `to_tsvector` is not
`IMMUTABLE` unless a `regconfig` is specified
[https://www.postgresql.org/docs/current/textsearch-tables.html
#TEXTSEARCH-TABLES-INDEX which is covered at length in the PostgreSQL
documentation].

> Notice that the 2-argument version of `to_tsvector` is used. '''Only
text search functions that specify a configuration name can be used in
expression indexes''' (Section 11.7). This is because the index contents
must be unaffected by `default_text_search_config`. If they were affected,
the index contents might be inconsistent because different entries could
contain `tsvectors` that were created with different text search
configurations, and there would be no way to guess which was which. It
would be impossible to dump and restore such an index correctly.

So for the same reason `Concat` cannot be used in indices because it
relies on alterable global configuration `SearchVector` cannot be used in
indices without specifying a `config` because they do not use the same
''stop-words''

{{{#!sql
django=# SELECT to_tsvector('english', 'The Web framework for
perfectionists with deadlines.');
to_tsvector
-----------------------------------------------------
'deadlin':7 'framework':3 'perfectionist':5 'web':2
(1 row)

django=# SELECT to_tsvector('french', 'The Web framework for
perfectionists with deadlines.');
to_tsvector
------------------------------------------------------------------------------
'deadlin':7 'for':4 'framework':3 'perfectionist':5 'the':1 'web':2
'with':6
(1 row)
}}}

I'm not sure how the reporter got that removing these particular lines of
code would help but in its current form the report appears to me to be
invalid.
--
Ticket URL: <https://code.djangoproject.com/ticket/35539#comment:3>

Django

unread,
Jul 9, 2024, 1:12:50 AM7/9/24
to django-...@googlegroups.com
#35539: SearchVector GinIndex raises IMMUTABLE error
----------------------------------+------------------------------------
Reporter: Alastair D'Silva | Owner: (none)
Type: Bug | Status: closed
Component: contrib.postgres | Version: 5.0
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Comment (by Alastair D'Silva):

Having the NULL propagate doesn't seem to big deal when searching, it
still means that items that don't have a populated search vector won't be
found.

I have the following patch which is working for me, but does need a bit
more effort, in particular, the bit which removes to_tsvector if we
already have a ts_vector (you can't call to_tsvector on a ts_vector).

{{{
--- search.py.orig 2024-07-09 15:10:14.092901831 +1000
+++ search.py 2024-07-09 09:46:15.962716177 +1000
@@ -4,6 +4,7 @@
Field,
FloatField,
Func,
+ JSONField,
Lookup,
TextField,
Value,
@@ -113,19 +114,26 @@

def as_sql(self, compiler, connection, function=None, template=None):
clone = self.copy()
- clone.set_source_expressions(
- [
- Coalesce(
+
+ new_expressions = []
+ for expression in clone.get_source_expressions():
+ if isinstance(expression.output_field, SearchVectorField):
+ new_expressions.append(expression)
+ function = ''
+ elif isinstance(expression.output_field, JSONField):
+ new_expressions.append(Coalesce(expression,
Value("{}"))),
+ else:
+ new_expressions.append(Coalesce(
(
expression
if isinstance(expression.output_field,
(CharField, TextField))
else Cast(expression, TextField())
),
Value(""),
- )
- for expression in clone.get_source_expressions()
- ]
- )
+ ))
+
+ clone.set_source_expressions(new_expressions)
+
config_sql = None
config_params = []
if template is None:

}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35539#comment:4>

Django

unread,
Jul 9, 2024, 10:05:34 AM7/9/24
to django-...@googlegroups.com
#35539: SearchVector GinIndex raises IMMUTABLE error
----------------------------------+------------------------------------
Reporter: Alastair D'Silva | Owner: (none)
Type: Bug | Status: closed
Component: contrib.postgres | Version: 5.0
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Comment (by Simon Charette):

You don't need this patch at all of the `save()` shenanigans. All you are
doing here is circumventing the fact that must specify a language
configuration for `ts_vector` to be immutable by running the
`update(search_vector=self.search_vector)` that will use the session
configured `default_text_search_config`.

You can simply do

{{{#!python
class Document(models.Model):
title = models.CharField(max_length=255)
metadata = models.JSONField(null=True)

class Meta:
indexes = [
GinIndex(
SearchVector('title', Coalesce('metadata', Value('')),
config='english'),
name='document_search',
),
]
}}}

Or use a `GeneratedField` if you want to materialize the search vector

{{{#!python
class Document(models.Model):
title = models.CharField(max_length=255)
metadata = models.JSONField(null=True)
search_vector = GeneratedField(
SearchVector('title', Coalesce('metadata', Value('')),
config='english')
output_field=SearchVectorField()
)

class Meta:
indexes = [
GinIndex(
'search_vector', name='search_vector_idx'
),
]
}}}

The key part here is that **a `config` must be specified if you want to
use `SearchVector` in an index or generated field to make it `IMMUTABLE`**
--
Ticket URL: <https://code.djangoproject.com/ticket/35539#comment:5>
Reply all
Reply to author
Forward
0 new messages