Mavenised Luke for latest version of Lucene


Neil Ireson

Dec 6, 2012, 5:16:04 PM
to luke-d...@googlegroups.com
Hi all,

Just in case it is useful to others, I have attached my mavenised version of the Luke code, which works with the latest versions of Lucene (both 4.0.0 and 4.1-SNAPSHOT).

Simply download and extract the file (tar -xf luke.tgz), which creates a luke directory. In the pom.xml file, change the <version> to the desired value and run "mvn", assuming you have Maven (>2.2.1) installed (available from http://maven.apache.org/).

The Luke jars are created in the target directory; luke-VERSION-jar-with-dependencies.jar is equivalent to lukeall.jar.

Share and enjoy

Neil
luke.tgz

Neil Ireson

Dec 7, 2012, 4:57:06 AM
to luke-d...@googlegroups.com
PS

The code produced errors when I ran it with my default Java (1.6.0_37) settings:

Exception in thread "Thread-4" java.lang.OutOfMemoryError: PermGen space
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.util.RamUsageEstimator
    at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.getFrame(BlockTreeTermsReader.java:1411)
    at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.pushFrame(BlockTreeTermsReader.java:1445)
    at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2079)
    at org.getopt.luke.HighFreqTerms.fillQueue(HighFreqTerms.java:204)
    at org.getopt.luke.HighFreqTerms.getHighFreqTerms(HighFreqTerms.java:132)
    at org.getopt.luke.Luke$5.execute(Luke.java:1751)
    at org.getopt.luke.Luke.actionTopTerms(Luke.java:1808)
    at org.getopt.luke.Luke$4.run(Luke.java:1231)
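One plausible reading of these two traces (a hedged interpretation, not something confirmed in this thread): when a class's static initializer fails with an Error such as OutOfMemoryError, the JVM marks that class as failed, and every later attempt to use it in the same JVM throws NoClassDefFoundError "Could not initialize class ...". The self-contained sketch below reproduces that pattern. FragileHelper is a hypothetical stand-in for a class like RamUsageEstimator, and the OutOfMemoryError is thrown by hand purely to simulate the PermGen failure.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticInitFailureDemo {

    // Hypothetical stand-in for a class whose static initialization fails.
    static class FragileHelper {
        static int value = computeValue();

        // Simulates running out of PermGen while the class is being initialized.
        private static int computeValue() {
            throw new OutOfMemoryError("simulated: PermGen space");
        }

        static int get() {
            return value;
        }
    }

    // Touches FragileHelper twice and records the simple name of each error raised.
    public static List<String> touchTwice() {
        List<String> errors = new ArrayList<String>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                FragileHelper.get();
            } catch (Throwable t) {
                errors.add(t.getClass().getSimpleName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(touchTwice());
    }
}
```

Run in a fresh JVM, this prints [OutOfMemoryError, NoClassDefFoundError]: the first access sees the original Error (an Error is rethrown as-is rather than wrapped in ExceptionInInitializerError), while the second only sees a class already marked as failed, which matches the Thread-4 / Thread-5 ordering above.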

There was also a more worrying segmentation fault:

Invalid memory access of location 0x18 rip=0x1010aef82

I increased the memory available to Luke and the problems have not recurred so far, but I will try to investigate the cause if I find time. My current settings are:

java -Xmx1024m -Xms512m -XX:MaxPermSize=512m -jar luke-4.1-SNAPSHOT-jar-with-dependencies.jar
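To confirm that flags such as -Xmx and -XX:MaxPermSize actually reached the JVM, a small check along these lines can be launched with the same flags on a Java 6/7 JVM (e.g. java -Xmx1024m -Xms512m -XX:MaxPermSize=512m MemorySettingsCheck). This is a sketch using the standard java.lang.management API; the class and method names are hypothetical, and matching the pool by the substring "Perm Gen" is an assumption based on HotSpot pool names such as "PS Perm Gen". Java 8 removed PermGen, so there the method simply returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemorySettingsCheck {
    private static final long MB = 1024L * 1024L;

    // Upper bound of the heap in MB as this JVM sees it (driven by -Xmx);
    // -1 if the JVM reports no limit.
    public static long maxHeapMb() {
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / MB;
    }

    // Max size in MB of the first memory pool whose name contains "Perm Gen".
    // Returns -1 if there is no such pool (e.g. Java 8+, which uses Metaspace)
    // or if the pool does not define a maximum.
    public static long maxPermGenMb() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Perm Gen")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                if (usage == null || usage.getMax() < 0) {
                    return -1;
                }
                return usage.getMax() / MB;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):    " + maxHeapMb());
        System.out.println("max PermGen (MB): " + maxPermGenMb());
    }
}
```

The heap figure can come out somewhat below the -Xmx value, because some collectors exclude part of the young generation from maxMemory(), so treat it as approximate.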

N

dmitry

Dec 9, 2012, 4:22:20 PM
to luke-d...@googlegroups.com
Hello!

In case it is of interest to anyone, here is a Luke source + binary package compiled against the Lucene / SOLR trunk (version 5.0-SNAPSHOT). I am replying to this thread because this effort was based on the mavenised Luke project attached by Neil.

changelog:
* ehcache.version 2.6.0 --> 2.6.2
* project version 4.1-SNAPSHOT --> 5.0-SNAPSHOT
* code refactoring to adapt to Lucene API trunk changes
* fixed an NPE for an empty index

Java environment: 1.6.0_20-b02

Tested against a Lucene / SOLR trunk (5.0-SNAPSHOT) index.

Regards,

Dmitry Kan
luke-5.0-SNAPSHOT.tgz

Neil Ireson

Dec 10, 2012, 6:01:41 AM
to luke-d...@googlegroups.com
Hi Dmitry,

When I try to compile your code, it produces a number of errors (appended below).

The code does not seem to be compatible with version 5.0-SNAPSHOT, which is odd, as you seem to have a compiled version in the target directory.

N



[INFO] Compilation failure

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/plugins/FsDirectory.java:[323,11] cannot find symbol
symbol  : method seek(long)
location: class org.apache.lucene.store.BufferedIndexOutput

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/DocReconstructor.java:[65,6] cannot find symbol
symbol  : class FieldsEnum
location: class org.getopt.luke.DocReconstructor

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/DocReconstructor.java:[95,36] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/DocReconstructor.java:[151,40] cannot find symbol
symbol  : method docsAndPositions(org.apache.lucene.util.Bits,org.apache.lucene.index.DocsAndPositionsEnum,boolean)
location: class org.apache.lucene.index.TermsEnum

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[1177,69] getSequentialSubReaders() has protected access in org.apache.lucene.index.BaseCompositeReader

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[1180,28] cannot find symbol
symbol  : method getTopReaderContext()
location: class org.apache.lucene.index.AtomicReader

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[1181,65] cannot find symbol
symbol  : method getTopReaderContext()
location: class org.apache.lucene.index.AtomicReader

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[2491,31] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[3632,26] cannot find symbol
symbol  : method getTopReaderContext()
location: class org.apache.lucene.index.AtomicReader

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[3735,38] cannot find symbol
symbol  : method getTopReaderContext()
location: class org.apache.lucene.index.AtomicReader

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[4762,30] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[4779,23] cannot find symbol
symbol  : method stringValue()
location: interface org.apache.lucene.index.IndexableField

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[4871,27] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/Luke.java:[4898,36] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/IndexInfo.java:[55,4] cannot find symbol
symbol  : class FieldsEnum
location: class org.getopt.luke.IndexInfo

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/XMLExporter.java:[130,37] incompatible types
found   : org.apache.lucene.index.StoredDocument
required: org.apache.lucene.document.Document

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/XMLExporter.java:[235,38] cannot find symbol
symbol  : method docsAndPositions(org.apache.lucene.util.Bits,org.apache.lucene.index.DocsAndPositionsEnum,boolean)
location: class org.apache.lucene.index.TermsEnum

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/TermVectorMapper.java:[22,38] cannot find symbol
symbol  : method docsAndPositions(<nulltype>,org.apache.lucene.index.DocsAndPositionsEnum,boolean)
location: class org.apache.lucene.index.TermsEnum

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/plugins/VocabAnalysisPlugin.java:[109,28] cannot find symbol
symbol  : method docs(<nulltype>,<nulltype>,boolean)
location: class org.apache.lucene.index.TermsEnum

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/HighFreqTerms.java:[136,6] cannot find symbol
symbol  : class FieldsEnum
location: class org.getopt.luke.HighFreqTerms

Dmitry Kan

Dec 10, 2012, 6:18:36 AM
to luke-d...@googlegroups.com
Hi Neil,

Thanks for trying to compile the code!

One possibility is that the trunk version I compiled against was older than the current trunk (against which you probably compiled the code). I will need to look into this further.

Dmitry

On Monday, December 10, 2012, 13:01:41 UTC+2, Neil Ireson wrote:

Dmitry Kan

Dec 10, 2012, 4:58:55 PM
to luke-d...@googlegroups.com
Hi Neil,

Attached is the updated package that should fix these compile errors. The source code has been compiled against today's trunk checkout. Does it fix the errors on your end?

A quick test in the UI shows that at least the index stats and document browsing / search work OK.

Regards,

Dmitry

On Monday, December 10, 2012, 13:18:36 UTC+2, Dmitry Kan wrote:
luke-5.0-SNAPSHOT.tgz

Neil Ireson

Dec 11, 2012, 4:57:53 AM
to luke-d...@googlegroups.com
Hi Dmitry,

I compiled against the latest trunk version and am still getting errors:

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/NoScoringScorer.java:[7,7] org.getopt.luke.NoScoringScorer is not abstract and does not override abstract method freq() in org.apache.lucene.search.Scorer

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/NoScoringScorer.java:[40,15] freq() in org.getopt.luke.NoScoringScorer cannot override freq() in org.apache.lucene.search.Scorer; attempting to use incompatible return type
found   : int
required: float

/Users/nsi/Downloads/luke/src/main/java/org/getopt/luke/NoScoringScorer.java:[39,4] method does not override or implement a method from a supertype

However, given that you are chasing a moving target, unless Luke is taken on board by Lucene/Solr, compiling against the trunk will always carry the possibility of compilation errors.

N

PS A couple of things about your upload. Personally, I wouldn't include the target directory in the upload; also, when I extract the archive, the files have no access permissions for anyone. Obviously not big issues, but I thought I'd mention them.

PPS If the project owner is listening in, it might be nice to work out how to get this code into the project.

Dmitry Kan

Dec 11, 2012, 7:23:45 AM
to luke-d...@googlegroups.com
Hi Neil,

Thanks for compiling it once again. It seems trunk is changing really fast: just yesterday (my time) the freq() method was required to return int, and now it is back to float.
It would certainly make more sense to adapt Luke's code once a pre-release tag is available for the Lucene trunk.
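The NoScoringScorer errors quoted earlier come down to a Java rule: when the overridden method's return type is primitive, the overriding method must declare exactly the same type (covariant returns only apply to reference types). So once the superclass's abstract freq() was declared as float, an int freq() could no longer override it, even though int widens to float. The sketch below illustrates this with hypothetical stand-in classes (ScorerBase, NoScoringScorerSketch); they are not Lucene's Scorer or Luke's NoScoringScorer, and the returned 1.0f is an arbitrary placeholder.

```java
public class OverrideReturnTypeDemo {

    // Hypothetical stand-in for a base class whose abstract freq() returns float,
    // matching the "required: float" in the compiler message. Not Lucene's Scorer.
    static abstract class ScorerBase {
        public abstract float freq();
    }

    // Hypothetical stand-in for a scorer that does no real scoring.
    static class NoScoringScorerSketch extends ScorerBase {
        // Declaring "public int freq()" here would not compile: with a primitive
        // return type, the override must use exactly the same type (float).
        @Override
        public float freq() {
            return 1.0f; // placeholder value chosen arbitrarily for the sketch
        }
    }

    public static void main(String[] args) {
        ScorerBase scorer = new NoScoringScorerSketch();
        System.out.println(scorer.freq()); // prints 1.0
    }
}
```

Switching the override to int freq() reproduces the "attempting to use incompatible return type" error, which is why each flip of the trunk signature broke the build.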

I agree that including the target binaries isn't a great idea; I just thought it would make life easier for anyone who only wanted to test the tool right away. I'll remove it.
As for the permissions, the issue may be that the code was compiled and packaged under Windows and inflated under Cygwin. I will still take a look at the permissions for the next package.

There was another issue, which I haven't yet mentioned: the seek method seems to be gone from the BufferedIndexOutput class (see line 322 of the FsDirectory class). I wasn't sure whether this affects anything.

Regards,

Dmitry

On Tuesday, December 11, 2012, 11:57:53 UTC+2, Neil Ireson wrote:

b...@sonar.me

Feb 2, 2013, 2:05:19 PM
to luke-d...@googlegroups.com
Thanks, Dmitry, for all the great work!
I've put the changes into a github repo and ported some changes back so that Luke compiles against the Lucene 4.1 release that just came out.

-Ben

Neil Ireson

Mar 15, 2013, 5:37:37 AM
to Luke - Lucene Index Toolbox
Hi Ben/Dmitry/All,

The 4.2.0 release breaks much of the Luke code due to the DocValues changes.

What would be really nice would be to get some community action on the code. The code in this Google repository no longer seems to be maintained, as there have been no changes in the last 8 months, so if the Luke repository is going to move again, to the one on GitHub, then it needs to be one that is actively maintained and developed. I'm fine if you wish to take the lead on this, Ben, but there is little point in creating a code repository unless it is active.

That said, if I get time I will try to port the code to 4.2.0, although it does not look like a straightforward process.

Neil

Dmitry Kan

Mar 18, 2013, 4:54:36 PM
to luke-d...@googlegroups.com
You are welcome, Ben!

Great to see the project on github.

On Saturday, February 2, 2013, 21:05:19 UTC+2, b...@sonar.me wrote:

Ľuboš Koščo

Mar 19, 2013, 5:27:53 AM
to luke-d...@googlegroups.com
Have a look here:

https://issues.apache.org/jira/browse/LUCENE-2562

It seems there is/was an effort to make Luke part of the Lucene suite;
maybe that's where the changes should go.

--
L

Ľuboš Koščo

Mar 25, 2013, 6:34:41 AM
to luke-d...@googlegroups.com
fwiw
https://github.com/tarzanek/luke/commit/4d635382576b13d9d1864d53afa65419cd7480b5

quick and dirty fix for 4.2 (contains 4.1 changes as well, so pick what you need)

(also, I am not claiming it's bug-free; I might have missed some stuff :( )

--
L

Dmitry Kan

Apr 9, 2013, 1:42:51 AM
to luke-d...@googlegroups.com
Great work, Ľuboš!

Thanks for the LUCENE JIRA link too.

Dmitry 

On Monday, March 25, 2013, 12:34:41 UTC+2, Ľuboš Koščo wrote:

Mathias Lux

Apr 11, 2013, 5:52:47 AM
to luke-d...@googlegroups.com
Thanks a lot! 

Cheers,
  Mathias
