Lucene database: parameters have no effect


Nigel Vivian

Jun 21, 2016, 11:09:41 AM
to duke
<database class="no.priv.garshol.duke.databases.LuceneDatabase">
    <param name="min-relevance" value="0.75"/>
    <param name="max-search-hits" value="200"/>
    <param name="path" value="lucene-index"/>
    <!-- must turn off fuzzy search, or it will take forever -->
    <!-- <param name="fuzzy-search" value="false"/> -->
</database>

On my dataset, changing min-relevance and max-search-hits to dramatically different values has no effect on the number of matches (I get 6077). When I use the in-memory database I get 9886 for the same data, but it takes 30 minutes as opposed to 3 seconds.

What am I doing wrong?

Regards
Nigel

Lars Marius Garshol

Jun 26, 2016, 9:57:32 AM
to duke

* Nigel Vivian

On my dataset, changing min-relevance and max-search-hits to dramatically different values has no effect on the number of matches (I get 6077). When I use the in-memory database I get 9886 for the same data, but it takes 30 minutes as opposed to 3 seconds.

The reason changing the values has no effect is probably that Lucene just doesn't return these records at all. Have you tried turning on fuzzy search? 

If that doesn't work either, you can look into using the blocking database. That will require you to write a couple of Java methods to produce the blocking keys, but the result should be much faster.

--Lars Marius

Nigel Vivian

Jun 30, 2016, 9:11:20 AM
to duke
Am I correct in thinking that fuzzy search is on by default?  I came to the same conclusion about Lucene not returning these records - but why?  We don't need to run the search often I am glad to say, so I have some time before I have to look at a solution other than the in-memory DB.

Nigel

Nigel Vivian

Jun 30, 2016, 9:16:54 AM
to duke
Is there documentation on writing keys in Java?



Nigel Vivian

Jun 30, 2016, 10:14:06 AM
to duke
I am trying to build Duke from a fresh clone. When I package the project and run it from the expanded zip, I get the following exception. The jars appear to be in the directory referenced. I am a noob to Maven, so I presume I am doing something dumb...

java -cp "./duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --singlematch --showmatches --progress --linkfile=matches.csv --threads=4 gem.xml

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:62)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:42)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
    at org.apache.lucene.codecs.PostingsFormat.<clinit>(PostingsFormat.java:44)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:53)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:62)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:42)
    at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
    at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
    at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:118)
    at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:145)
    at no.priv.garshol.duke.databases.LuceneDatabase.openIndexes(LuceneDatabase.java:340)
    at no.priv.garshol.duke.databases.LuceneDatabase.init(LuceneDatabase.java:314)
    at no.priv.garshol.duke.databases.LuceneDatabase.index(LuceneDatabase.java:146)
    at no.priv.garshol.duke.Processor.index(Processor.java:477)
    at no.priv.garshol.duke.Processor.link(Processor.java:336)
    at no.priv.garshol.duke.Duke.main_(Duke.java:175)
    at no.priv.garshol.duke.Duke.main(Duke.java:35)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 27 more


Lars Marius Garshol

Jun 30, 2016, 1:00:03 PM
to duke

* Nigel Vivian
Am I correct in thinking that fuzzy search is on by default?  I came to the same conclusion about Lucene not returning these records - but why?  We don't need to run the search often I am glad to say, so I have some time before I have to look at a solution other than the in-memory DB.

I thought not, but the documentation says yes:

 https://github.com/larsga/Duke/wiki/DatabaseConfig


I came to the same conclusion about Lucene not returning these records - but why?


Without knowing more about the data it’s impossible to say, unfortunately.

 
--Lars Marius

Lars Marius Garshol

Jun 30, 2016, 1:00:28 PM
to duke

* Nigel Vivian
Is there documentation on writing keys in Java?

No, unfortunately. There ought to be, but I haven’t had time. 


This may be helpful, though:

 https://github.com/larsga/Duke/blob/master/duke-core/src/main/java/no/priv/garshol/duke/databases/AbstractKeyFunction.java
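
For readers who just want the general idea of a blocking key, here is a minimal, self-contained sketch. It does not use Duke's classes: the class name `BlockingKeySketch`, the map-based record, and the `NAME` property are all assumptions made for illustration, and a real key function would need to follow whatever `AbstractKeyFunction` in the link above requires. The idea is only that records which should be compared with each other end up with the same short key.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

public class BlockingKeySketch {
    /**
     * Hypothetical key function: lower-cases the NAME value, strips
     * everything except ASCII letters and digits, and keeps at most
     * the first four characters as the blocking key.
     */
    public static String makeKey(Map<String, String> record) {
        String name = record.getOrDefault("NAME", "");
        String normalized = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        return normalized.length() <= 4 ? normalized : normalized.substring(0, 4);
    }

    public static void main(String[] args) {
        // "Acme Ltd." and "ACME Limited" fall into the same block.
        System.out.println(makeKey(Collections.singletonMap("NAME", "Acme Ltd.")));
        System.out.println(makeKey(Collections.singletonMap("NAME", "ACME Limited")));
    }
}
```

A short prefix like this is fast but coarse: records whose names differ in the first few characters never share a block, so the choice of key trades recall for speed.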


--Lars Marius 

Lars Marius Garshol

Jun 30, 2016, 1:10:45 PM
to duke

* Nigel Vivian
I am trying to build Duke from a fresh clone. When I package the project and run it from the expanded zip, I get the following exception. The jars appear to be in the directory referenced. I am a noob to Maven, so I presume I am doing something dumb...

java -cp "./duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --singlematch --showmatches --progress --linkfile=matches.csv --threads=4 gem.xml

The trouble is that this doesn't work: ./duke-dist-1.3-SNAPSHOT/lib/*

If you're on unix, try "echo ./duke-dist-1.3-SNAPSHOT/lib/*" and you'll see why. Basically you have to do the expansion manually.

--Lars Marius

Nigel Vivian

Jul 1, 2016, 10:13:59 AM
to duke
Ordinarily I would agree with you, but I have a version of Duke SNAPSHOT built a while ago and this command works fine with it; it's how I generated the output. (If I run the echo command on mine, it prints all the jar filenames. Is that what you were expecting?)

I was wondering whether Duke is now looking for org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat while the version of lucene-core in the lib directory is 4.0!
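
One way to test that suspicion is to ask the JVM which jar, if any, a given class is loaded from. The sketch below uses only standard Java APIs; the class name `ClassLocator` and the strings it returns are my own choices, not part of Duke or Lucene.

```java
import java.security.CodeSource;

public class ClassLocator {
    /**
     * Reports where a class would be loaded from on the current classpath.
     * Returns the jar/directory URL, "(JDK runtime)" for classes without a
     * code source, or "NOT FOUND" if the class cannot be loaded.
     */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(JDK runtime)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Compiled into the current directory, it could be run with the same classpath as Duke plus `.`, for example `java -cp "./duke-dist-1.3-SNAPSHOT/lib/*:." ClassLocator org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat org.apache.lucene.codecs.lucene40.Lucene40Codec`. If the first class comes back as NOT FOUND while the second resolves to a lucene-core jar, that would point towards a Lucene version mismatch in the lib directory rather than a classpath expansion problem.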

Regards
Nigel