[Django] #27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField references

35 views
Skip to first unread message

Django

unread,
Jan 13, 2017, 5:31:57 AM1/13/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
--------------------------------------------+------------------------
Reporter: matt-hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: 1.10
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
--------------------------------------------+------------------------
For search performance I'm using SearchVectorField on a model. However if
I reference that field in the vector argument to SearchRank I get an
error. This is because SearchRank always wraps a vector argument that
lacks a "resolve_expression" attribute as a SearchVector even if it would
end up resolving to a SearchVectorField.

If I modify tests/postgres_tests/test_search.py and add the following to
the SearchVectorFieldTest class:
{{{#!python
def test_existing_vector_ranking(self):
Line.objects.update(dialogue_search_vector=SearchVector('dialogue'))
searched = Line.objects.filter(character=self.minstrel).annotate(
rank=SearchRank('dialogue_search_vector', SearchQuery('brave
sir robin')),
).order_by('rank')
self.assertSequenceEqual(searched, [self.verse2, self.verse1,
self.verse0])
}}}

Then this results in the following error:
{{{
ProgrammingError: function to_tsvector(tsvector) does not exist
LINE 1: ... "postgres_tests_line"."dialogue_config", ts_rank(to_tsvecto...
^
HINT: No function matches the given name and argument types. You might
need to add explicit type casts.
}}}

As the enforced wrapping of the vector argument in a SearchVector is then
causing the sql to wrap the tsvector field in a coalesce (which doesn't
cause an issue) and then a to_tsvector call.

I've not delved into the Expressions code before (my last rummaging around
in the bowels of the query code was a few years ago) but from my quick
look at existing code and the APIs then perhaps a solution is for
SearchRank to not wrap the vector argument as it comes into __init__ but
instead have a resolve_expression call which checks the first resolved
expression on the result of calling super's resolve_expression is or isn't
of type SearchVectorField and then if it isn't then resolve a wrapped
version.

I've attached a patch which both adds the test I mention above (it may be
that it should live in the rank test rather than the search vector field
test, or have its own test class which covers the combination of both) and
also modifies SearchRank in the way I've suggested above. The postgres
tests pass on my test system with this patch.

--
Ticket URL: <https://code.djangoproject.com/ticket/27732>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jan 13, 2017, 5:32:44 AM1/13/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+--------------------------------------

Reporter: matt-hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: 1.10
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+--------------------------------------
Changes (by matt-hoskins):

* Attachment "searchrankvectorfieldfix.diff" added.

Django

unread,
Jan 13, 2017, 9:09:45 AM1/13/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+--------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+--------------------------------------
Changes (by Simon Charette):

* cc: Marc Tamlyn (added)
* version: 1.10 => master


Comment:

Marc, any thoughts on that?

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:1>

Django

unread,
Jan 13, 2017, 10:16:55 AM1/13/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+--------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+--------------------------------------

Comment (by Marc Tamlyn):

I had to make a choice as to the "default" behaviour of SearchRank (and
for that matter SearchVector). I made the latter match the former - so
that the query `SearchRank('body_text', SearchQuery('django'))` worked. I
think supporting the simple string version in both cases (when it is or is
not a `SearchVector`) already is somewhat confusing personally, especially
as I believe the "work around" is:

`SearchRank(F('dialogue_search_vector'), SearchQuery('brave sir robin'))`

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:2>

Django

unread,
Jan 13, 2017, 10:53:48 AM1/13/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+--------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+--------------------------------------

Comment (by Matt Hoskins):

The example for SearchRank in the documentation does only shows passing in
a SearchVector() as the first argument to SearchRank and a SearchQuery()
for the second so I was a little surprised when looking at the code to
find it was behaving such that if you passed in a string for the first
argument that it assumed you'd just missed off wrapping it in a
SearchVector() rather than it potentially being a reference to a
SearchVectorField.

If SearchRank were left as is (i.e. not supporting passing in a string
which references a SearchVectorField) then I think it would be useful for
the documentation for SearchVectorField to set out the F() "work around"
as the way to use a SearchVectorField in SearchRank. E.g. just have a bit
saying "To use a SearchVectorField in SearchRank then do..." and given an
example too. It takes a little bit of figuring out otherwise (particularly
as the documentation makes no mention that you can pass in a string as the
first argument and it'll wrap it in a SearchVector for you) and I have to
admit wrapping it in F() to prevent that behaviour is not a leap I'd made
yet as most database functions in django just take field references as
strings if the type of the field is already suitable.

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:3>

Django

unread,
Jan 16, 2017, 3:53:35 AM1/16/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+--------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+--------------------------------------

Comment (by Matt Hoskins):

After further musing and taking into account:
* The documentation describes passing in a vector and a query into
SearchRank
* The example included with the above shows passing in SearchVector and
a SearchQuery objects only (and not strings)
* Marc's point "I think supporting the simple string version in both
cases (when it is or is not a SearchVector) already is somewhat confusing
personally"
* My feeling that the intuitive way to reference a SearchVectorField as
the first argument to SearchRank would be to pass in a string (given other
functions, like the DateTime ones, accept a string if you want to
reference a field of the correct type, although I note that they do list
their first argument as "expression").
* The DateTime functions use resolve_expression to validate the type
that their first argument resolves into
* It's possible in postgresql to have a column of type tsquery - I can't
think why anyone would ever want to (no use case seems likely), but there
would be some logical consistency in allowing the second argument of
SearchRank to be a reference to a related query column via the potential
to implement SearchQueryField

... then one course of action that springs to mind is to deprecate passing
in strings to SearchRank except where a string for vector resolves into a
SearchVectorField:
* Perhaps apply my patch in a point release (although it's possible, as
Marc suggested, to work around using "F()" for now)
* In the next version (minor/major) use my patch but also emit a
deprecation warning if the resolving of the vector argument doesn't result
in a SearchVectorField type result and also emit a deprecation warning if
the query argument isn't a SearchQuery
* For the documentation of SearchVectorField perhaps mention that a
string can be passed in to reference one in SearchRank to make it clear
that's how to do it
* Once deprecation is complete perhaps check the types of what the
vector and query arguments resolve into and, like with the DateTime
functions, raise ValueError if they don't resolve into the correct type.

There may be good reasons for adopting a different approach of course
(such as the simplest case of just documenting Marc's "work around" as the
official way to reference a SearchVectorField as the vector argument to
SearchRank), these are just my musings!

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:4>

Django

unread,
Jan 20, 2017, 7:18:48 PM1/20/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new

Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Accepted


Comment:

Tentatively accepting, even though it seems some design decisions remain.
If no code changes are made, perhaps the documentation could be clarified.

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:5>

Django

unread,
May 11, 2017, 3:14:26 PM5/11/17
to django-...@googlegroups.com
#27732: django.contrib.postgres.search SearchRank doesn't handle SearchVectorField
references
----------------------------------+------------------------------------
Reporter: Matt Hoskins | Owner: (none)
Type: Bug | Status: new

Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------

Comment (by Andrii Soldatenko):

What's the next steps, because I've found also this issue, and I reinvent
the same workaround as Marc showed with wrapping using `F`.
I think we should at least fix documentation to not confuse users or fix
behavior. Any thoughts?

--
Ticket URL: <https://code.djangoproject.com/ticket/27732#comment:6>

Reply all
Reply to author
Forward
0 new messages