Changed behaviour of suggestions in v4?


Hristo Kostov

Mar 21, 2018, 5:46:28 AM
to RavenDB - 2nd generation document database
Are there any changes to the way suggestions are supposed to work in version 4.0?

In my code I use suggestions on misspelled names. These are full names, so they contain spaces.

This is the index definition:

public class People : AbstractIndexCreationTask<Person>
{
    public People()
    {
        Map = people => from p in people
                        select new
                        {
                            p.Name
                        };

        Index(x => x.Name, FieldIndexing.Default);
        Store(x => x.Name, FieldStorage.Yes);
        Suggestion(x => x.Name);
    }
}

There is no tokenization, so I expect the suggested terms to be the original names. That is the result with RavenDB 3.5. With RavenDB 4.0, however, the result looks as if the field were tokenized.

The v3.5 code:

    private static void GetSuggestions(IDocumentSession session, string term)
    {
        var suggestionResult = session.Query<Person, People>()
                                      .Suggest(new SuggestionQuery
                                      {
                                          Field = "Name",
                                          Term = term,
                                          Accuracy = 0.5f,
                                          MaxSuggestions = 10,
                                          Distance = StringDistanceTypes.Levenshtein
                                      });

        var suggestions = suggestionResult.Suggestions;
        Console.WriteLine($"{suggestions.Length} suggestion(s) found.");
        Array.ForEach(suggestions, x => Console.WriteLine($"\t{x}"));
    }

The v4.0 code:

    private static void GetSuggestions(IDocumentSession session, string term)
    {
        var suggestionResult = session.Query<Person, People>()
                                      .SuggestUsing(builder => builder.ByField("Name", term)
                                                                      .WithOptions(new SuggestionOptions
                                                                      {
                                                                          Accuracy = 0.5f,
                                                                          PageSize = 10,
                                                                          Distance = StringDistanceTypes.Levenshtein
                                                                      }))
                                      .Execute();

        var suggestions = suggestionResult["Name"].Suggestions;
        Console.WriteLine($"{suggestions.Count} suggestion(s) found.");
        suggestions.ForEach(x => Console.WriteLine($"\t{x}"));
    }

The data:

    session.Store(new Person { Name = "Erich Maria Remarque" });
    session.Store(new Person { Name = "John Steinbeck" });
    session.Store(new Person { Name = "Jerome David Salinger" });
    session.Store(new Person { Name = "Fyodor Dostoevsky" });
    session.Store(new Person { Name = "Ernest Hemingway" });
    session.Store(new Person { Name = "Gabriel Garcia Marquez" });

Executing

    GetSuggestions(session, "John Steinback");
    GetSuggestions(session, "Jonh Steinbeck");

against v3.5 gives us:

1 suggestion(s) found.
        john steinbeck
1 suggestion(s) found.
        john steinbeck

However, against v4.0 the result is:

1 suggestion(s) found.
        steinbeck
1 suggestion(s) found.
        steinbeck

Please note that the second suggestion is not "john", so the result is not the same as it would be if the Name field were tokenized in the index.
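
To make the whole-name versus per-word distinction concrete, here is a minimal sketch that does not use RavenDB at all. The class SuggestionSketch and its methods are hypothetical names made up for this illustration, and picking the closest term by plain Levenshtein distance is an assumption; it is not a description of how RavenDB or Lucene actually rank suggestions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of RavenDB or Lucene.
public static class SuggestionSketch
{
    // Plain Levenshtein distance: inserts, deletes and substitutions each cost 1.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++)
            prev[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the most recently computed row.
            var tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.Length];
    }

    // Returns the term with the smallest distance to the input.
    public static string Closest(string input, IEnumerable<string> terms) =>
        terms.OrderBy(t => Levenshtein(input, t)).First();

    // One lowercased term per name, spaces kept: an untokenized field.
    public static List<string> WholeNameTerms(IEnumerable<string> names) =>
        names.Select(n => n.ToLowerInvariant()).ToList();

    // One lowercased term per word: what a tokenizing analyzer would produce.
    public static List<string> WordTerms(IEnumerable<string> names) =>
        names.SelectMany(n => n.ToLowerInvariant().Split(' ')).Distinct().ToList();
}
```

With the six names from the data above, Closest("john steinback", WholeNameTerms(names)) picks "john steinbeck", while Closest("steinback", WordTerms(names)) picks "steinbeck". Suggestions made of single words can therefore only come from per-word terms.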

Decreasing the accuracy to 0.1 in the v4 code and executing

GetSuggestions(session, "Erich Maria Remarch");

outputs

5 suggestion(s) found.
        maria
        erich
        remarque
        marquez
        garcia


This is a breaking change for me. Am I doing something wrong in my v4 code? Or do suggestions work in a different way now? How can I achieve the old result when using RavenDB 4?

Tal Weiss

Mar 21, 2018, 8:51:25 AM
I'm looking into this.

--
Tal Weiss, Core Team Developer
Hibernating Rhinos Ltd
RavenDB paving the way to "Data Made Simple" http://ravendb.net/

Tal Weiss

Mar 21, 2018, 9:24:33 AM
Message has been deleted

Hristo Kostov

Mar 21, 2018, 9:36:27 AM
So it's a bug. OK, thank you for your quick reaction. I'll be waiting for the fix.

Tal Weiss

Mar 21, 2018, 10:04:29 AM
It seems that in v3.5 we did run an analyzer on the term. I'm still verifying which one, but it seems that the wrong analyzer is being invoked.
