[Possible Bug] Can't override existing field analyzer


Ricardo Brandão

Jul 6, 2016, 5:53:59 AM
to RavenDB - 2nd generation document database
Hi,

I'm trying to use an AbstractAnalyzerGenerator to set the proper analyzer for a given index. However, while I can set the analyzer for indexes without any defined analyzer, I'm unable to override analyzers defined in the index.

A failing test is attached.

Thanks,
Ricardo Brandão
AnalyzerGeneratorTests.cs

Oren Eini (Ayende Rahien)

Jul 6, 2016, 5:58:38 AM
to ravendb
AbstractAnalyzerGenerator is meant for dynamic analyzer selection. If the index has a static analyzer configuration, that configuration overrides it.


Ricardo Brandão

Jul 6, 2016, 6:02:43 AM
to RavenDB - 2nd generation document database
That's a shame, because I was trying to mark some fields (with a "mock" custom analyzer) so that I could apply a language-specific analyzer to them, depending on the language of the document.

I assume that this is not possible then, right?

Oren Eini (Ayende Rahien)

Jul 6, 2016, 6:28:06 AM
to ravendb
Have the analyzer generator supply the default analyzer, rather than defining it in the index.

Ricardo Brandão

Jul 6, 2016, 6:44:05 AM
to RavenDB - 2nd generation document database
But I still don't know which properties I should apply the language analyzer to. In the analyzer generator I was trying to do something like the following:

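// Only handle the per-field wrapper; any other analyzer type is left alone.
var perFieldAnalyzerWrapper = previousAnalyzer as RavenPerFieldAnalyzerWrapper;
if (perFieldAnalyzerWrapper != null)
{
    // Read the private field-to-analyzer map via reflection (position-based, so brittle).
    var fieldsProp = previousAnalyzer.GetType().GetRuntimeFields().ToList()[1];
    var analyzersMap = (IDictionary<string, Analyzer>)fieldsProp.GetValue(previousAnalyzer);
    // Collect the fields that were marked with the placeholder MultiLanguageAnalyzer.
    var multiLanguageProperties = analyzersMap.Where(x => x.Value is MultiLanguageAnalyzer).Select(x => x.Key).ToList();

    // Try to register the language analyzer for each marked field.
    foreach (var multiLanguageProperty in multiLanguageProperties)
    {
        perFieldAnalyzerWrapper.AddAnalyzer(multiLanguageProperty, SOME_LANGUAGE_ANALYZER);
    }
}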

P.S. I'm using reflection because analyzerMap is a private field on RavenPerFieldAnalyzerWrapper and I have no other way of getting the property names.

Ricardo Brandão

Jul 6, 2016, 6:51:51 AM
to RavenDB - 2nd generation document database
Another option is to add a property to the index that lists the names of the fields where the language analyzer should be applied, but I was trying to avoid this.

Oren Eini (Ayende Rahien)

Jul 6, 2016, 8:52:44 AM
to ravendb
You have the document instance, and you can get the fields from there.

Ricardo Brandão

Jul 6, 2016, 9:02:39 AM
to RavenDB - 2nd generation document database
For one thing, I only have the document instance when indexing a document, not when querying. For another, I still need to somehow mark the fields to analyze.

Oren Eini (Ayende Rahien)

Jul 6, 2016, 9:09:26 AM
to ravendb
System.Collections.Generic.IList<IFieldable> Lucene.Net.Documents.Document.GetFields

Ricardo Brandão

Jul 6, 2016, 9:23:56 AM
to RavenDB - 2nd generation document database
I know how to get the fields. What I am trying to emphasize is that I need to mark the fields that the custom analyzer should apply to. For instance, that can be done by adding a property to the index and reading that property in the analyzer generator (which I was trying to avoid):

public Posts_Index()
{
    Map = posts => from post in posts
        select new
        {
            Id = post.Id,
            Title = post.Title,
            Body = post.Body,
            LanguageName = post.LanguageName,
            SearchContent = string.Join(" ", post.Title, post.Body),
            MultiLanguagePropertyNames = new [] { "SearchContent" }
        };   
}

public class MyAnalyzerGenerator : AbstractAnalyzerGenerator
{
    private readonly IDictionary<string, string[]> _multiLanguagePropertiesPerIndex = new Dictionary<string, string[]>();

    public override Analyzer GenerateAnalyzerForIndexing(string indexName, Lucene.Net.Documents.Document document, Analyzer previousAnalyzer)
    {
        if (!_multiLanguagePropertiesPerIndex.ContainsKey(indexName))
        {
            _multiLanguagePropertiesPerIndex.Add(indexName, document.GetValues("MultiLanguagePropertyNames"));
        }

        AnalyzeMultilanguageFields(indexName, previousAnalyzer, document.Get("LanguageName"));
            
        return previousAnalyzer;
    }

    public override Analyzer GenerateAnalyzerForQuerying(string indexName, string query, Analyzer previousAnalyzer)
    {
        var languageName = SOME_LANGUAGE;
        AnalyzeMultilanguageFields(indexName, previousAnalyzer, languageName);
            
        return previousAnalyzer;
    }

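    private void AnalyzeMultilanguageFields(string indexName, Analyzer previousAnalyzer, string languageName)
    {
        // Nothing is registered for this index until one of its documents has been indexed,
        // so look the entry up defensively instead of risking a KeyNotFoundException.
        string[] multiLanguageProperties;
        if (!_multiLanguagePropertiesPerIndex.TryGetValue(indexName, out multiLanguageProperties))
        {
            return;
        }

        var perFieldAnalyzerWrapper = previousAnalyzer as RavenPerFieldAnalyzerWrapper;
        if (perFieldAnalyzerWrapper != null)
        {
            foreach (var multiLanguageProperty in multiLanguageProperties)
            {
                // LANGUAGE_ANALYZER is a placeholder for the analyzer chosen from languageName.
                perFieldAnalyzerWrapper.AddAnalyzer(multiLanguageProperty, LANGUAGE_ANALYZER);
            }
        }
    }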
}

P.S. I've omitted some code to simplify.

Oren Eini (Ayende Rahien)

Jul 6, 2016, 1:10:20 PM
to ravendb
Have an analyzer per index that knows the structure of the index.

You can also have a config file that maps index names to field names.
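
For illustration only, here is a minimal sketch of what such an index-name-to-field-names mapping could look like. Nothing in it is a RavenDB API: the `MultiLanguageFieldConfig` class, the `IndexName=Field1,Field2` line format, and the case-insensitive index name comparison are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of RavenDB: maps index names to the
// fields that should get a language-specific analyzer.
public static class MultiLanguageFieldConfig
{
    // Parses lines of the form "IndexName=Field1,Field2" (an assumed format).
    public static IDictionary<string, string[]> Parse(IEnumerable<string> lines)
    {
        // Case-insensitive index names are an assumption of this sketch.
        var map = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
        foreach (var rawLine in lines)
        {
            var line = rawLine.Trim();
            if (line.Length == 0 || line[0] == '#')
                continue; // skip blank lines and comments

            var separator = line.IndexOf('=');
            if (separator <= 0)
                continue; // skip malformed lines rather than throw

            var indexName = line.Substring(0, separator).Trim();
            var fields = new List<string>();
            foreach (var part in line.Substring(separator + 1).Split(','))
            {
                var field = part.Trim();
                if (field.Length > 0)
                    fields.Add(field);
            }

            map[indexName] = fields.ToArray();
        }
        return map;
    }

    // Returns the configured fields, or an empty array for unknown indexes.
    public static string[] FieldsFor(IDictionary<string, string[]> map, string indexName)
    {
        string[] fields;
        return map.TryGetValue(indexName, out fields) ? fields : new string[0];
    }
}
```

An analyzer generator could then call `FieldsFor(map, indexName)` in both the indexing and the querying path, so the field list no longer has to be carried inside the indexed document.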

Ricardo Brandão

Jul 7, 2016, 5:46:01 AM
to RavenDB - 2nd generation document database
Thanks Oren, I'll consider your suggestions.

Ricardo Brandão

Jul 21, 2016, 11:19:06 AM
to RavenDB - 2nd generation document database
In order to read from a config file I need to access the database from the AbstractAnalyzerGenerator instance. What's your opinion on providing such access?

If we all agree I can create a PR for that.

Oren Eini (Ayende Rahien)

Jul 21, 2016, 4:13:52 PM
to ravendb
Create an IStartupTask instance that reads the values, then pass them to the analyzer generator that way.

Ricardo Brandão

Jul 22, 2016, 5:16:55 AM
to RavenDB - 2nd generation document database
Hi Oren,

That won't do the trick for me, because we have multiple (client) databases on each server and the properties vary per database (for example, the clients might be on different releases). Although I can use the IStartupTask to initialize a property with each database's configuration, I still need to know which database I'm dealing with inside the AbstractAnalyzerGenerator. I was thinking of something like this.
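
To make the limitation concrete, here is a rough sketch of a per-database lookup. It is not the linked proposal and it uses no RavenDB types: `MultiLanguageFieldRegistry`, `Register`, and `FieldsFor` are hypothetical names. The point is only that the lookup needs a `databaseName` argument, which is exactly the piece of information the generator does not have.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical registry, not a RavenDB type: holds per-database settings
// that a startup step could fill in once for each database.
public static class MultiLanguageFieldRegistry
{
    private static readonly ConcurrentDictionary<string, IDictionary<string, string[]>> PerDatabase =
        new ConcurrentDictionary<string, IDictionary<string, string[]>>(StringComparer.OrdinalIgnoreCase);

    // Called once per database with that database's index-to-fields map.
    public static void Register(string databaseName, IDictionary<string, string[]> fieldsByIndex)
    {
        PerDatabase[databaseName] = fieldsByIndex;
    }

    // The caller must know which database it is working for;
    // without databaseName the right map cannot be selected.
    public static string[] FieldsFor(string databaseName, string indexName)
    {
        IDictionary<string, string[]> fieldsByIndex;
        string[] fields;
        if (PerDatabase.TryGetValue(databaseName, out fieldsByIndex) &&
            fieldsByIndex.TryGetValue(indexName, out fields))
        {
            return fields;
        }
        return new string[0]; // unknown database or index
    }
}
```

Two databases can then register different field lists for the same index name, and each lookup returns the list for the database it names.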

Oren Eini (Ayende Rahien)

Jul 22, 2016, 5:31:56 AM
to ravendb
Send the PR.

Ricardo Brandão

Jul 22, 2016, 5:47:56 AM
to RavenDB - 2nd generation document database
Done (issue here).

Oren Eini (Ayende Rahien)

Jul 22, 2016, 1:40:02 PM
to ravendb
Merged; it will be in the next build.

Ricardo Brandão

Jul 22, 2016, 1:49:53 PM
to RavenDB - 2nd generation document database
Thanks