[Django] #32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search.

21 views
Skip to first unread message

Django

unread,
Feb 28, 2021, 10:52:51 PM2/28/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-----------------------------------------+------------------------
Reporter: Taneli | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 3.1
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
Use case: filter queryset by an inexact substring. For example, retrieve
an object with the following headline ' Dogecoin is following bitcoin in
its dramatic rise' when searching for 'dogge'.

Currently it is possible to filter a queryset on the basis of trigram
similarity between the search string and full text stored in a column.
{{{#!python
Author.objects.annotate(similarity=TrigramSimilarity('name',
test),).filter(similarity__gt=0.3)
}}}

This is a wrapper around the '''similarity''' function of the
'''pg_trgm''' extension. While it allows comparing full strings, i.e.
searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy
searching of substrings.

{{{
SELECT similarity('dogge', 'doge');
---------
0.57

SELECT similarity('dogge', 'dogecoin is following bitcoin');
------------
0.1
}}}

'''word_similarity''' does take into account the word boundaries

{{{
SELECT word_similarity('doge', 'dogecoin is following bitcoin');
--------------
0.5
}}}

Adding a django API to '''word_similarity''' would allow for better fuzzy
fulltext search without a need to use either raw SQL or external tools
like elasticsearch.

--
Ticket URL: <https://code.djangoproject.com/ticket/32492>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Feb 28, 2021, 10:54:06 PM2/28/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------+--------------------------------------

Reporter: Taneli | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 3.1
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Description changed by Taneli:

Old description:

New description:

--

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:1>

Django

unread,
Feb 28, 2021, 10:54:22 PM2/28/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------+--------------------------------------
Reporter: Taneli | Owner: Taneli
Type: Uncategorized | Status: assigned
Component: Uncategorized | Version: 3.1
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by Taneli):

* owner: nobody => Taneli
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:2>

Django

unread,
Mar 1, 2021, 1:42:44 AM3/1/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
----------------------------------+------------------------------------
Reporter: Taneli | Owner: Taneli
Type: New feature | Status: assigned
Component: contrib.postgres | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Changes (by Mariusz Felisiak):

* type: Uncategorized => New feature
* version: 3.1 => master
* component: Uncategorized => contrib.postgres
* stage: Unreviewed => Accepted


Comment:

Thanks. We should add also `TrigramWordDistance()` with the `<<->`
operator.

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:3>

Django

unread,
Jul 29, 2021, 6:21:18 AM7/29/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
----------------------------------+------------------------------------
Reporter: Taneli | Owner: Taneli
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev

Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------

Comment (by Matthew Schinckel):

FWIW, I use pg_trgm, and I also have a `connection_created` receiver that
sets `pg_trgm.similarity_threshold`. It might be nice to include the
ability to set this in some way, so that other operations that use this
can be included.

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:4>

Django

unread,
Sep 2, 2021, 7:03:54 AM9/2/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
----------------------------------+------------------------------------
Reporter: Taneli | Owner: (none)

Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------+------------------------------------
Changes (by Nikita Marchant):

* owner: Taneli => (none)


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:5>

Django

unread,
Sep 2, 2021, 7:04:56 AM9/2/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant

Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nikita Marchant):

* owner: (none) => Nikita Marchant


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:6>

Django

unread,
Sep 3, 2021, 5:49:21 AM9/3/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Nikita Marchant):

Hi 👋

I've open a pull request (https://github.com/django/django/pull/14833)
implementing the `trigram_word_similar` lookup and updating the
documentation.

I also wrote a test, but for it to be meaningful, i needed a longer string
that the 16 chars available on `CharFieldModel.field` in the tests. I did
increase the limit to 64 and updated the migration at the same time. Is
this the right approach or should it create a new model or a new migration
? (As the test database is recreated almost each time, i guessed that
modifying a migration is not a big deal but i could be wrong).
I ran the tests on PostgreSQL 13.4 on Mac with Python 3.9.1

I also have some code (adding `TrigramWordSimilarity` and
`TrigramWordDistance`) almost ready for the rest of the ticket but i still
have some troubles/questions because the order of the arguments of
`WORD_SIMILARITY()` is meaningful, unlike for `SIMILARITY()`. Should we
discuss it here or on GitHub ?

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:7>

Django

unread,
Sep 3, 2021, 5:49:54 AM9/3/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nikita Marchant):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:8>

Django

unread,
Sep 6, 2021, 5:31:45 AM9/6/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 0 => 1
* needs_docs: 0 => 1


Comment:

[https://github.com/django/django/pull/14833 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:9>

Django

unread,
Sep 15, 2021, 7:36:31 AM9/15/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nikita Marchant):

* needs_better_patch: 1 => 0
* needs_docs: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:10>

Django

unread,
Sep 17, 2021, 7:07:41 AM9/17/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:11>

Django

unread,
Sep 17, 2021, 10:24:03 AM9/17/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: assigned
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Paolo Melchiorre):

* cc: Paolo Melchiorre (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:12>

Django

unread,
Sep 17, 2021, 10:43:04 AM9/17/21
to django-...@googlegroups.com
#32492: Add django API for Postgres pg_trgm word_similarity, fuzzy full-text
search.
-------------------------------------+-------------------------------------
Reporter: Taneli | Owner: Nikita
| Marchant
Type: New feature | Status: closed
Component: contrib.postgres | Version: dev
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"4e4082f9396e21de0bd88dbfc651da9ad01c7c0c" 4e4082f9]:
{{{
#!CommitTicketReference repository=""
revision="4e4082f9396e21de0bd88dbfc651da9ad01c7c0c"
Fixed #32492 -- Added TrigramWordSimilarity() and TrigramWordDistance() on
PostgreSQL.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32492#comment:13>

Reply all
Reply to author
Forward
0 new messages