Lucene database: parameters have no effect

Nigel Vivian

21 June 2016, 11:09:41
to duke
<database class="no.priv.garshol.duke.databases.LuceneDatabase">
    <param name="min-relevance" value="0.75"/>
    <param name="max-search-hits" value="200"/>
    <param name="path" value="lucene-index"/>
    <!-- must turn off fuzzy search, or it will take forever -->
    <!-- <param name="fuzzy-search" value="false"/> -->
</database>

On my dataset, changing min-relevance and max-search-hits to dramatically different values has no effect on the number of matches (I get 6077). When I use the in-memory database I get 9886 matches for the same data, but it takes 30 minutes as opposed to 3 seconds.

What am I doing wrong?

Regards
Nigel

Lars Marius Garshol

26 June 2016, 9:57:32
to duke

* Nigel Vivian

On my dataset, changing min-relevance and max-search-hits to dramatically different values has no effect on the number of matches (I get 6077). When I use the in-memory database I get 9886 matches for the same data, but it takes 30 minutes as opposed to 3 seconds.

The reason changing the values has no effect is probably that Lucene just doesn't return these records at all. Have you tried turning on fuzzy search? 

If that doesn't work either, you can look into using the blocking database. That will require you to write a couple of Java methods to actually produce the blocking keys, but the result should be much faster.

--Lars Marius

Nigel Vivian

30 June 2016, 9:11:20
to duke
Am I correct in thinking that fuzzy search is on by default? I came to the same conclusion about Lucene not returning these records - but why? We don't need to run the search often, I am glad to say, so I have some time before I have to look at a solution other than the in-memory DB.

Nigel

Nigel Vivian

30 June 2016, 9:16:54
to duke
Is there documentation on writing keys in Java?

Nigel Vivian

30 June 2016, 10:14:06
to duke
I am trying to build Duke from a fresh clone. When I package the project and run it from an expanded zip, I get the following exception. The jars appear to be in the directory referenced. I am a noob to Maven, so I presume I am doing something dumb...

java -cp "./duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --singlematch --showmatches --progress --linkfile=matches.csv --threads=4 gem.xml

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:62)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:42)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
at org.apache.lucene.codecs.PostingsFormat.<clinit>(PostingsFormat.java:44)
at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:53)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:62)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:42)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:118)
at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:145)
at no.priv.garshol.duke.databases.LuceneDatabase.openIndexes(LuceneDatabase.java:340)
at no.priv.garshol.duke.databases.LuceneDatabase.init(LuceneDatabase.java:314)
at no.priv.garshol.duke.databases.LuceneDatabase.index(LuceneDatabase.java:146)
at no.priv.garshol.duke.Processor.index(Processor.java:477)
at no.priv.garshol.duke.Processor.link(Processor.java:336)
at no.priv.garshol.duke.Duke.main_(Duke.java:175)
at no.priv.garshol.duke.Duke.main(Duke.java:35)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 27 more

Lars Marius Garshol

30 June 2016, 13:00:03
to duke

* Nigel Vivian
Am I correct in thinking that fuzzy search is on by default? I came to the same conclusion about Lucene not returning these records - but why? We don't need to run the search often, I am glad to say, so I have some time before I have to look at a solution other than the in-memory DB.

I thought not, but the documentation says yes:

 https://github.com/larsga/Duke/wiki/DatabaseConfig


I came to the same conclusion about Lucene not returning these records - but why?


Without knowing more about the data it’s impossible to say, unfortunately.

 
--Lars Marius

Lars Marius Garshol

30 June 2016, 13:00:28
to duke

* Nigel Vivian
Is there documentation on writing keys in Java?

No, unfortunately. There ought to be, but I haven’t had time. 


This may be helpful, though:

 https://github.com/larsga/Duke/blob/master/duke-core/src/main/java/no/priv/garshol/duke/databases/AbstractKeyFunction.java


--Lars Marius 
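
To make the idea of a blocking key concrete, here is a minimal standalone sketch. It does not use Duke's key-function API (see the AbstractKeyFunction source linked above for that); the class name, the method names and the choice of a normalized four-character prefix as the key are all assumptions made for illustration. The point is only that records producing the same key land in the same block and get compared with each other, so the key has to survive the kinds of variation present in the data.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration of a blocking key; not part of Duke.
public class BlockingKeySketch {

    // Lowercase, strip accents, and drop everything except a-z and 0-9.
    public static String normalize(String value) {
        if (value == null) {
            return "";
        }
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed
            .replaceAll("\\p{M}", "")        // remove combining accent marks
            .toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]", "");    // remove spaces and punctuation
    }

    // The blocking key: the first n characters of the normalized value.
    public static String prefixKey(String value, int n) {
        String norm = normalize(value);
        return norm.length() <= n ? norm : norm.substring(0, n);
    }

    public static void main(String[] args) {
        // Two spellings of the same (made-up) name yield the same key,
        // so they would end up in the same block.
        System.out.println(prefixKey("Acme Corp.", 4));
        System.out.println(prefixKey("  ACME, Inc", 4));
    }
}
```

A short prefix gives large blocks (more comparisons, fewer missed matches); a long one gives small blocks (faster, but typos near the start of the value split true duplicates apart). Using more than one key per record is a common way to soften that trade-off.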

Lars Marius Garshol

30 June 2016, 13:10:45
to duke

* Nigel Vivian
I am trying to build Duke from a fresh clone. When I package the project and run it from an expanded zip, I get the following exception. The jars appear to be in the directory referenced. I am a noob to Maven, so I presume I am doing something dumb...

java -cp "./duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --singlematch --showmatches --progress --linkfile=matches.csv --threads=4 gem.xml

The trouble is that this doesn't work: ./duke-dist-1.3-SNAPSHOT/lib/*

If you're on unix, try "echo ./duke-dist-1.3-SNAPSHOT/lib/*" and you'll see why. Basically you have to do the expansion manually.

--Lars Marius
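
If the expansion does have to be done by hand, a small helper can build the explicit classpath string. This is a sketch under assumptions: the class name ClasspathExpander is made up, and the default directory is simply the one from the command above. It lists the .jar files in a directory and joins them with the platform's path separator, producing a value that can be passed to java -cp.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of Duke.
public class ClasspathExpander {

    // Returns all *.jar files directly inside libDir, sorted and joined
    // with the platform path separator (":" on Unix, ";" on Windows).
    public static String expand(Path libDir) throws IOException {
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory taken from the command in the post; override via args[0].
        String dir = args.length > 0 ? args[0] : "./duke-dist-1.3-SNAPSHOT/lib";
        System.out.println(expand(Paths.get(dir)));
    }
}
```

Printing the expanded list also makes it easy to spot two versions of the same library sitting side by side in the lib directory.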

Nigel Vivian

1 July 2016, 10:13:59
to duke
Ordinarily I would agree with you, but I have a SNAPSHOT version of Duke built a while ago and this command works fine with it - it's how I generated the output. (If I run the echo command on mine it prints all the jar filenames. Is that what you were expecting?)

I was wondering whether Duke is now looking for org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat while the version of lucene-core is 4.0!

Regards
Nigel
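
One way to test the version-mismatch suspicion is to ask the JVM which jar, if any, each Lucene class is loaded from. The sketch below is an assumption-laden diagnostic, not something from Duke: the class name ClassLocator is made up, and the two class names it checks are copied from the stack trace above. It has to be run with the same -cp value as the failing command.

```java
import java.security.CodeSource;

// Hypothetical diagnostic; not part of Duke.
public class ClassLocator {

    // Returns the location (usually a jar URL) a class is loaded from,
    // or a marker string if it cannot be loaded from the current classpath.
    public static String locate(String className) {
        try {
            // initialize=false: find the class without running static initializers
            Class<?> c = Class.forName(className, false,
                                       ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap / JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this thread.
        String[] names = {
            "org.apache.lucene.codecs.lucene40.Lucene40Codec",
            "org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the first class resolves to a lucene-core jar while the second prints NOT FOUND, the jars on the classpath do not all come from the same Lucene release.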