Fuzzy search not working


Yasin

Nov 24, 2022, 10:41:05 AM
to dotCMS User Group
Hi,

Fuzzy search isn't working in dotCMS. We are using version 22.03.2.

We have a content type FAQ (Faq) with a "question" field.

The query below is returning multiple results.

+contentType:Faq +(Faq.question:*pl*)

However, the fuzzy queries do not return anything.

Fuzzy query 1:
+contentType:Faq +(Faq.question:pl~)

Fuzzy query 2 (escaped):
+contentType:Faq +(Faq.question:pl\~)

Fuzzy query 3 (most powerful one):
{
    "query" : {
        "fuzzy" : { "faq.question" : "pl" }
    },
    "size":  10,
    "from": 0
}

Can someone assist, please?

Thank you

Yasin

Nov 28, 2022, 7:57:54 AM
to dotCMS User Group
Anyone?
On Thursday, November 24, 2022 at 16:41:05 UTC+1, Yasin wrote:

Will Ezell

Nov 28, 2022, 12:21:44 PM
to dot...@googlegroups.com
I am not sure fuzzy searching works against tokenized fields. You should try searching against a raw (not analyzed) field. dotCMS automatically creates a raw version of every text field that you index and appends _dotraw to its name. Knowing this, I would try something like:

+contentType:Faq +(faq.question_dotraw:*pl*)


Yasin

Nov 30, 2022, 4:26:30 AM
to dotCMS User Group
Thanks for the response, but according to the documentation, fuzzy search should work by appending '~' to the search term.

Please see the link below. Could you please assist?

https://www.dotcms.com/docs/latest/content-search-syntax#:~:text=firstName%3AR*%20%26%26%20Employee.lastName%3AJ*)-,Fuzzy%20Search,-dotCMS%20search%20queries

On Monday, November 28, 2022 at 18:21:44 UTC+1, Will Ezell wrote:

Jameson Mauro

Nov 30, 2022, 1:02:48 PM
to dotCMS User Group
I was testing this out, and the fuzzy search seems to work with the default operator. 

E.g., on Demo, I tried the three queries below: 
  • +contentType:Blog +Blog.title:Everythi*
  • +contentType:Blog +Blog.title:Everythi~
  • +contentType:Blog +Blog.title:Everythi
The first two, the wildcard and fuzzy queries, both successfully returned a blog post with the word "Everything" in its title; the third one returns nothing.

It should be noted that fuzzy searching is often less generous than wildcard searching; it's based on edit distance, the minimum number of single-character additions, deletions, or edits needed to find a match. So while +Faq.question:pl* may match a word like pleistocene, the fuzzy version of that same search, +Faq.question:pl~, will not. However, the fuzzy version should easily find ply, or even pi, which the wildcard would miss.
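
To make the edit-distance point concrete, here is a minimal Python sketch of a plain Levenshtein distance. This is only an illustration and an assumption on my part, not the code Elasticsearch or dotCMS actually runs (Lucene's fuzzy matching also treats a swap of two adjacent characters as a single edit by default); the function name edit_distance is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b.
    Illustrative only; not the implementation used by the search engine."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


# Compare the search term "pl" with the words mentioned above.
for word in ("ply", "pi", "pleistocene"):
    print(word, edit_distance("pl", word))
```

ply and pi are each one edit away from pl, while pleistocene needs nine insertions, which is far outside what a fuzzy query tolerates even though the wildcard pl* matches it.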

So, the result depends on the indexed content being searched. What are some of the words the wildcard is detecting that the fuzzy search is not?

(Note: The optional numeric parameter mentioned in the documentation, such as ~0.5, is currently not behaving as expected; I've reported this to the devs.)

Will Ezell

Nov 30, 2022, 1:19:13 PM
to dot...@googlegroups.com
To follow up on this - because we use elasticsearch under the covers, the caveats found here will be (should be) true as well:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#avoid-widlcards-fuzzy-searches

Jameson Mauro

Nov 30, 2022, 2:14:07 PM
to dotCMS User Group
After reviewing the ES doc linked above, I'm amending my remark about the numeric parameter: our docs had erroneously described it as a fraction from 0 to 1 (possibly based on an older version of ES?), when in fact it expects an integer representing edit distance. I've updated our own doc to reflect this.
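
For anyone building these queries in code, a rough Python sketch of the integer form follows. The helper name fuzzy_clause is hypothetical, and the 0 to 2 range check is my assumption based on the edit distances Lucene-style fuzzy queries accept; it only formats the query string and does not call dotCMS.

```python
def fuzzy_clause(field: str, term: str, max_edits: int = 1) -> str:
    """Build a required fuzzy clause such as '+Faq.question:pl~1'.
    Hypothetical helper: it only assembles the query text."""
    # Assumption: the engine accepts whole-number edit distances 0, 1, or 2.
    if isinstance(max_edits, bool) or not isinstance(max_edits, int):
        raise TypeError("max_edits must be an integer edit distance, not a fraction")
    if not 0 <= max_edits <= 2:
        raise ValueError("max_edits is expected to be 0, 1, or 2")
    return f"+{field}:{term}~{max_edits}"


# Combine with the content type filter used earlier in this thread.
query = "+contentType:Faq " + fuzzy_clause("Faq.question", "pl", 1)
print(query)
```

Passing 0.5 here raises a TypeError, which mirrors the correction above: the value after ~ is an edit distance, not a similarity fraction.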