OrientDB Lucene: fuzzy matching / compound words


Uncharted

Apr 3, 2016, 5:52:21 AM
to OrientDB
Hello

I'm testing the full-text search feature with Lucene and trying to figure out how to use it.
I ran a few small tests:

Search with the exact word:
select from color where name lucene 'silber'                             : OK
select from color where name lucene 'test silber test'                : OK

Fuzzy search with the tilde "~":
select from color where name lucene 'siilbeer~0.7'                    : OK
select from color where name lucene 'test siilbeer~0.7 test'       : OK

Search with a compound word:
select from color where name lucene '"mineralsilber"'                 : KO
select from color where name lucene "mineralsilber~0.2"               : KO


As you can see, the basic search does not handle compound words well, nor words whose separating whitespace has been stripped.

How can I search with compound words?
From what I found on Google, n-grams are the way to go in these cases. Is it possible to use them with OrientDB/Lucene, and if so, how?





Roberto Franchini

Apr 4, 2016, 3:38:54 AM
to orient-...@googlegroups.com
First of all, I need to know which version of OrientDB you are using.
Search behaviour depends on the analyzer configured. The default
analyzer is the StandardAnalyzer:

http://lucene.apache.org/core/5_5_0/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html

Compound words analysis is supported by
https://lucene.apache.org/core/5_1_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html

At the moment we don't support dictionaries, but it is possible to
configure the analyzer for the index:
http://orientdb.com/docs/last/Full-Text-Index.html#analyzer

So, it is possible to create your own analyzer, add the jar to the
OrientDB lib dir and configure the index to use it.
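
To make the idea of dictionary-based compound analysis concrete, here is a minimal plain-Java sketch of the decomposition step. It does not use Lucene and is not the DictionaryCompoundWordTokenFilter implementation; the class name, the method name, the tiny dictionary and the minimum subword length are all assumptions chosen for illustration. It only shows the general principle: the compound is kept, and any dictionary word found inside it is emitted as an extra token, so the compound and its parts end up sharing tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class, not part of Lucene or OrientDB.
public class CompoundSplitSketch {

    // Returns the lowercased token followed by every dictionary word
    // (of at least minSubwordLength characters) found inside it.
    public static List<String> decompose(String token, Set<String> dictionary, int minSubwordLength) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower); // always keep the original compound
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + minSubwordLength; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (!candidate.equals(lower) && dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed two-word dictionary, just enough for the example from the question.
        Set<String> dictionary = new HashSet<>(Arrays.asList("mineral", "silber"));
        // prints [mineralsilber, mineral, silber]
        System.out.println(decompose("Mineralsilber", dictionary, 3));
    }
}
```

With such a decomposition applied, 'mineralsilber' and 'silber' have the token "silber" in common, which is what a plain whole-word analysis lacks.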


--
Best regards,

Roberto Franchini

OrientDB LTD - http://orientdb.com

Uncharted

Apr 4, 2016, 8:58:02 AM
to OrientDB
I was using the latest version I found on your site (2.1.13).
Thanks for the answer: I didn't know where to start with adding a custom analyzer, and I had missed the information about putting the jar in the lib folder.


Roberto Franchini

Apr 5, 2016, 2:37:42 AM
to orient-...@googlegroups.com
We have just released 2.1.15.
I don't know how skilled you are in Java, but basically it's a
matter of implementing 2 or 3 classes:

http://www.citrine.io/blog/2015/2/14/building-a-custom-analyzer-in-lucene

At the end, package the jar and put it in the lib dir of OrientDB.

Then configure the analyzer in your index definition, as stated in the doc:

http://orientdb.com/docs/last/Full-Text-Index.html#analyzer

orientdb> CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
{"analyzer": "my.own.package.MyAnalyzer"}

In 2.2 it will be possible to define analyzers per field, and even to
configure analyzers with stopwords.
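
As an illustration of why an n-gram based analyzer is often suggested for the compound-word case from the original question, here is a minimal plain-Java sketch. It has no Lucene dependency and is not a Lucene Analyzer; the class name, the method names and the gram size of 3 are assumptions made for the example. It only shows which character trigrams two words would share if both were indexed and queried as n-grams.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class, not part of Lucene or OrientDB.
public class NGramSketch {

    // Returns the distinct character n-grams of the lowercased text, in order of first appearance.
    public static Set<String> ngrams(String text, int n) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    // Returns the n-grams that both inputs have in common.
    public static Set<String> shared(String a, String b, int n) {
        Set<String> common = new LinkedHashSet<>(ngrams(a, n));
        common.retainAll(ngrams(b, n));
        return common;
    }

    public static void main(String[] args) {
        // prints [sil, ilb, lbe, ber]
        System.out.println(ngrams("silber", 3));
        // prints [sil, ilb, lbe, ber]
        System.out.println(shared("mineralsilber", "silber", 3));
    }
}
```

Every trigram of 'silber' also occurs in 'mineralsilber', so the two terms overlap at the token level even though neither matches the other as a whole word. A real custom analyzer would produce such grams inside its token stream; the trade-off is a larger index and more loosely related matches.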

Uncharted

Apr 5, 2016, 5:12:54 PM
to OrientDB
Thanks, your answer is very clear to me
Can't wait for the 2.2 features :) 
 