Fuzzy Search?


Stephen

Jul 26, 2012, 12:42:59 PM
to django...@googlegroups.com
Is there a way in Watson to perform a fuzzy search that returns close matches to words, as Solr does?

For example, if I search for "cot", I would want it to return "coat" and so forth.


Thanks,
Stephen

Dave Hall

Jul 26, 2012, 12:56:49 PM
to django...@googlegroups.com
Good question.

Watson is at the mercy of its search backend, so its capabilities in this regard can vary.

In your case, I'd use the postgres backend (which is best), and take a look at the postgres docs for the capabilities of the full text search engine. It applies a slight fuzziness in terms of word stemming, but it may have greater capabilities than this.

If you do manage to configure postgres to use a more fuzzy search configuration, then you can tell watson to use this by setting the search_config parameter on the PostgresSearchBackend.

Good luck, and let me know how you get on!

--
You received this message because you are subscribed to the Google Groups "django-watson discussion group" group.
To view this discussion on the web, visit https://groups.google.com/d/msg/django-watson/-/IhV3GDYOWx8J.
To post to this group, send an email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-watso...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-watson?hl=en-GB.

Stephen

Aug 20, 2012, 1:42:51 PM
to django...@googlegroups.com
Following up on this:

It seems Postgres is capable of doing this using Levenshtein distance. However, I'm having difficulty finding out where exactly to put this addition. Basically, I want to add

( levenshtein(SEARCH_WORD, WORD_THAT_IS_BEING_COMPARED_TO) < 3 )
to the where clause and change the query so that it uses this to filter the results.
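
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of what a `levenshtein(...) < 3` filter computes. It is an illustration only, not how Postgres or Watson implement it; the helper `fuzzy_filter` and the sample word list are hypothetical names introduced for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # At the start of each outer iteration, prev[j] is the distance between
    # the first i-1 characters of a and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters into an empty prefix costs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


def fuzzy_filter(search_word, candidates, max_distance=3):
    """Hypothetical helper: keep candidates with distance < max_distance, closest first."""
    scored = sorted((levenshtein(search_word, w), w) for w in candidates)
    return [w for d, w in scored if d < max_distance]


# Sample data made up for this illustration.
words = ["coat", "cot", "cart", "elephant"]
print(fuzzy_filter("cot", words))
```

Sorting by the distance before filtering also gives a crude relevance order, with exact matches first.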

Help with this would be much appreciated :)

Thanks,
Stephen

Dave Hall

Aug 21, 2012, 5:25:52 AM
to django...@googlegroups.com
So that looks like it'd be used in a "where" clause. Presumably you could order by that value too for relevance ranking?

That appears to only be for comparing one word to another. How about paragraphs of text?

And can it use indexes?


Stephen

Aug 23, 2012, 6:26:07 PM
to django...@googlegroups.com
Hmmm, I did some more digging into postgres, as I'm still pretty new to it. I tried out the levenshtein function and it does what I want it to, but I don't know how to use it in conjunction with the search_tsv tsvector. Is there a way to do that?

Dave Hall

Aug 24, 2012, 4:30:54 AM
to django...@googlegroups.com
I'm not sure that it does do what you want it to.

From the postgres documentation:

This function calculates the Levenshtein distance between two strings.

It doesn't sound like this would be suitable for determining if a word is present in a paragraph of text. It's hard for me to test this, however, because it's not installed on my postgres server.

The docs certainly make no mention of using it on tsvector columns.

Unless it can do fuzzy search within a paragraph of text, I'm not sure it's really suitable for a watson backend. This is particularly so if it can't operate on an index.
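
The concern about paragraphs can be illustrated with a small Python sketch. It assumes a plain edit-distance function equivalent in spirit to the Postgres one, and the helper `closest_word` and the sample sentence are made up for this example. Comparing the search word against the whole paragraph gives a distance dominated by the length difference, so a usable match only appears when the text is split into words and every word is compared in turn, which is a full scan rather than an index lookup.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def closest_word(query: str, text: str):
    """Hypothetical helper: return (distance, word) for the word in text nearest to query."""
    words = re.findall(r"\w+", text.lower())
    # Every word is compared, so the cost grows with the length of the text.
    return min((levenshtein(query, w), w) for w in words)


# Sample text made up for this illustration.
paragraph = "She hung her winter coat by the door."

# Whole-string comparison: the distance is at least the difference in length.
whole = levenshtein("cot", paragraph.lower())
print(whole >= len(paragraph) - len("cot"))

# Word-by-word comparison finds the near match.
print(closest_word("cot", paragraph))
```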
