Adding Analyzer on Field produces gibberish result

Alexander Nuta

Mar 27, 2020, 5:05:34 AM
to RavenDB - 2nd generation document database
I'm trying to add an analyzer to an indexed field like so:

Analyzers.Add(x => x.Name, "Raven.Server.Documents.Indexes.Persistence.Lucene.Analyzers.Collation.Cultures.EnCollationAnalyzer, Raven.Server");
or
Analyzers.Add(x => x.Name, "Raven.Server.Documents.Indexes.Persistence.Lucene.Analyzers.Collation.Cultures.ElCollationAnalyzer, Raven.Server");

But the result for the Name field is "܁ΜDŽშ偳瀺㸝ᄎ䢇 恁悐焈㠈ᰔฬ΁΋Ǐ䃩䡱йဂĒā€䁀†တ䠈ЄȂऀ䁀\u0000\u0000"
No matter which analyzer I use, even English letters are transformed into gibberish.

Can you please help me figure out what I'm missing?

Thanks in advance.

Egor Shamanaev

Mar 29, 2020, 6:02:49 AM
to rav...@googlegroups.com
Hi,

It is hard to tell without a reproduction. Please send a failing test for this:
https://ravendb.net/docs/article-page/4.2/csharp/start/test-driver#unittest  

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/5f5f68bc-2826-47c0-bcd0-5ac56365d245%40googlegroups.com.


--
Egor
Developer   /   Hibernating Rhinos LTD
Support:  sup...@ravendb.net
  

Oren Eini (Ayende Rahien)

Mar 30, 2020, 2:00:17 AM
to ravendb
These analyzers exist so that you can generate appropriate *sorting* based on culture rules; that is all.
They work by transforming the text into culture-appropriate, lexically sortable binary data.
That is fine and expected. If you want to do actual textual analysis of the data, that is a different thing.
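A minimal sketch of keeping the two concerns apart, assuming the RavenDB 4.2 C# client (`AbstractIndexCreationTask`, `Indexes`, `Analyzers`, `FieldIndexing.Search`): index the name twice, once as a normal full-text field for searching and once, under a second field, with the collation analyzer, and use that second field only for ordering. The `User` class and the `NameForSorting` field are hypothetical names chosen for illustration, not something from this thread.

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Hypothetical document class, used for illustration only.
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Users_ByName : AbstractIndexCreationTask<User, Users_ByName.Result>
{
    public class Result
    {
        public string Name { get; set; }

        // Hypothetical second field that only carries the collation sort key.
        public string NameForSorting { get; set; }
    }

    public Users_ByName()
    {
        Map = users => from user in users
                       select new Result
                       {
                           Name = user.Name,
                           NameForSorting = user.Name
                       };

        // Full-text field: terms stay readable and can be searched.
        Indexes.Add(x => x.Name, FieldIndexing.Search);

        // Sort-only field: terms become culture-specific binary sort keys,
        // so they will look like gibberish if you inspect them.
        Analyzers.Add(x => x.NameForSorting,
            "Raven.Server.Documents.Indexes.Persistence.Lucene.Analyzers.Collation.Cultures.ElCollationAnalyzer, Raven.Server");
    }
}
```

Queries would then search on one field and sort on the other, for example `session.Query<Users_ByName.Result, Users_ByName>().Search(x => x.Name, "alexander").OrderBy(x => x.NameForSorting)`. This assumes that ordering on `NameForSorting` uses its indexed terms (the collation keys); it is worth confirming against the RavenDB documentation for your version.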


Note that the point here is to allow different sorting rules per culture. For example, French has this rule: cote < côte < coté < côté.
In German, however, they are treated as equivalent.
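To make the "lexically sortable binary data" idea concrete, here is a small standalone sketch that uses only .NET's own `System.Globalization` (no RavenDB involved). It assumes, without confirmation in this thread, that RavenDB's collation analyzers produce the same kind of culture sort key as `CompareInfo.GetSortKey` and store it as the indexed term, which would explain the unreadable value in the question. Sorting the words by a plain byte comparison of their keys should reproduce the culture's order. The exact bytes and orders depend on the machine's collation data (ICU on Linux/macOS, NLS on Windows); with recent CLDR data, as far as I know, the backward accent ordering from the French example is tied to Canadian French (fr-CA) rather than fr-FR, so the sketch prints several cultures instead of promising a particular result.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CollationDemo
{
    // Plain byte-by-byte comparison of two sort keys; it knows nothing about
    // letters or accents, which is what makes the key "lexically sortable".
    public static int CompareBytes(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            int diff = a[i].CompareTo(b[i]);
            if (diff != 0)
                return diff;
        }
        return a.Length.CompareTo(b.Length);
    }

    // Computes each word's culture-specific sort key once, then orders the
    // words by comparing only those key bytes.
    public static string[] SortByKey(string[] words, string cultureName)
    {
        CompareInfo compareInfo = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        return words
            .Select(w => (Word: w, Key: compareInfo.GetSortKey(w).KeyData))
            .OrderBy(p => p.Key, Comparer<byte[]>.Create(CompareBytes))
            .Select(p => p.Word)
            .ToArray();
    }

    public static void Main()
    {
        string[] words = { "côté", "coté", "côte", "cote" };

        // Results depend on the collation data available on the machine.
        foreach (string culture in new[] { "fr-CA", "fr-FR", "de-DE" })
            Console.WriteLine($"{culture}: {string.Join(" < ", SortByKey(words, culture))}");

        // The raw key is opaque binary data, not readable text.
        byte[] key = CultureInfo.GetCultureInfo("de-DE").CompareInfo.GetSortKey("cote").KeyData;
        Console.WriteLine("sort key for \"cote\": " + BitConverter.ToString(key));
    }
}
```

Comparing raw bytes instead of calling a culture-aware comparer is deliberate: it mirrors an index that only stores the keys and has no idea which culture produced them.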


--
Oren Eini
CEO   /   Hibernating Rhinos LTD
Skype:  ayenderahien
Support:  sup...@ravendb.net
  

Alexander Nuta

Mar 30, 2020, 2:12:51 AM
to RavenDB - 2nd generation document database
Thank you very much for the clarification.

Have a nice day!

Alexander Nuta

Mar 30, 2020, 2:23:13 AM
to RavenDB - 2nd generation document database
Thank you for your reply!!

