LUCENE - Accent-insensitive search

Lenny Kruger

Sep 16, 2015, 11:32:20 AM

to OrientDB

Hi all,

I have a Person vertex class in schema with a firstname property, indexed with Lucene.

I'm looking for a Person called Ève.

I'd like to retrieve my record with any of these queries:

SELECT FROM Person WHERE firstname LUCENE 'è*'

SELECT FROM Person WHERE firstname LUCENE 'é*'

SELECT FROM Person WHERE firstname LUCENE 'e*'

SELECT FROM Person WHERE firstname LUCENE 'E*'

What would be the right analyzer, or the right settings, to get this behaviour?

Thanks !

Roberto Franchini

Sep 21, 2015, 4:02:13 AM

to orient-database

On Wed, Sep 16, 2015 at 5:17 PM, Lenny Kruger <lenny...@gmail.com> wrote:
> Hi all,
>
>
> I have a Person vertex class in schema with a firstname property, indexed
> with Lucene.
>

[cut]

>
>
> What'd be the right analyzer or the settings to apply for this behaviour ?
>

Hi,
Orient currently supports all the analyzers listed here:

https://lucene.apache.org/core/5_3_0/analyzers-common/index.html

Your use case is covered by the ASCIIFoldingFilter:

https://lucene.apache.org/core/5_3_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
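
To make the idea of folding concrete, here is a minimal sketch that uses only the JDK. It is a stand-in, not Lucene's implementation: ASCIIFoldingFilter relies on its own character mapping table and covers more characters than this Unicode-decomposition trick. The class and method names (FoldingDemo, fold, prefixMatches) are hypothetical. The sketch assumes the same folding is applied to the indexed value and to the query prefix.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldingDemo {
    // Rough stand-in for lower-casing plus ASCII folding: decompose accented
    // letters (NFD), drop the combining marks, then lower-case.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Prefix match as a query like 'e*' would behave, ASSUMING the same
    // folding is applied to both the indexed value and the query prefix.
    public static boolean prefixMatches(String indexedValue, String queryPrefix) {
        return fold(indexedValue).startsWith(fold(queryPrefix));
    }

    public static void main(String[] args) {
        String indexed = "\u00c8ve"; // E with grave accent, then "ve"
        // The four prefixes from the question: e-grave, e-acute, e, E.
        String[] prefixes = {"\u00e8", "\u00e9", "e", "E"};
        for (String prefix : prefixes) {
            System.out.println(prefix + "* matches: " + prefixMatches(indexed, prefix));
        }
    }
}
```

One caveat: Lucene's classic QueryParser does not pass wildcard terms through the analyzer, so whether an accented prefix such as 'è*' is folded at query time depends on how the query gets built. It is worth testing all four queries against a real index.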

So you need to write a custom analyzer that includes the
ASCIIFoldingFilter, put the jar on the Orient classpath, and then
create the index with your new analyzer.
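
A custom analyzer along those lines could look like the sketch below. It is written against the Lucene 5.3 API linked above. The Lucene version bundled with a given Orient release may differ, in which case the createComponents signature and the constructor Orient expects may differ too. The class name AccentFoldingAnalyzer is hypothetical, and the choice of StandardTokenizer plus LowerCaseFilter in front of the folding filter is an assumption, not something prescribed in this thread.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Hypothetical analyzer: tokenize, lower-case, then fold accented
// characters to their ASCII equivalents (e.g. e-grave becomes e).
public class AccentFoldingAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Lucene 5.x: the tokenizer takes no Reader or Version argument.
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        result = new ASCIIFoldingFilter(result);
        return new TokenStreamComponents(source, result);
    }
}
```

Once the jar is on the classpath, the index would be created with the analyzer's fully qualified class name. Assuming the METADATA option described in the Orient Lucene documentation, that would be something like CREATE INDEX Person.firstname ON Person (firstname) FULLTEXT ENGINE LUCENE METADATA {"analyzer": "AccentFoldingAnalyzer"}. Check the exact syntax against the documentation for your release.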

On Stack Overflow I found this (see the second answer):

http://stackoverflow.com/questions/3354948/how-do-i-use-asciifoldingfilter-in-my-lucene-app

Another solution, if you know the language of the text, is
to use a language-specific analyzer, such as FrenchAnalyzer:

https://lucene.apache.org/core/5_3_0/analyzers-common/org/apache/lucene/analysis/fr/FrenchAnalyzer.html

Note that this is more of a Lucene question than an Orient question:
full-text search is a very big topic.

Best regards,
RF

--
Roberto Franchini
"L'impossibile è inevitabile"
jabber:ro.fra...@gmail.com skype:ro.franchini
