(88%) Indexing [tss/8-1/tss.div.8.1.99.xml] ... (9 stored keys) ...
Done.
*** Error: class java.io.FileNotFoundException
java.io.FileNotFoundException: /home/capistrano/trunk/xtf/apache-tomcat-6.0.10/webapps/xtf/index/_jnz.cfs (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:497)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:522)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:434)
    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:63)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:154)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:140)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:121)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1473)
    at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:1415)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1352)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:588)
    at org.cdlib.xtf.textIndexer.XMLTextProcessor.close(XMLTextProcessor.java:641)
    at org.cdlib.xtf.textIndexer.SrcTreeProcessor.close(SrcTreeProcessor.java:192)
    at org.cdlib.xtf.textIndexer.TextIndexer.main(TextIndexer.java:330)
Unfortunately, I didn't know about this when our nightly cron job
reindexed. That run always does a -clean. It threw this error:
TextIndexer v2.1
Indexing New/Updated Documents:
Index: "default"
*** Error: class java.lang.IllegalStateException
java.lang.IllegalStateException: doc counts differ for segment _jo1: fieldsReader shows 1 but segmentInfo shows 100
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:164)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:140)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:121)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:166)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:579)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:147)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:131)
    at org.cdlib.xtf.textIndexer.XMLTextProcessor.openIdxForReading(XMLTextProcessor.java:3656)
    at org.cdlib.xtf.textIndexer.XMLTextProcessor.open(XMLTextProcessor.java:517)
    at org.cdlib.xtf.textIndexer.SrcTreeProcessor.open(SrcTreeProcessor.java:150)
    at org.cdlib.xtf.textIndexer.TextIndexer.main(TextIndexer.java:328)
Indexing Process Aborted.
Finished clean index of all data
What I'm curious about is why a -clean run would fail. Doesn't a -clean
just remove the indexed files and start over?
Jamie
I can't think of an explanation for why -clean didn't work properly. I just
tried making a corrupt index (by renaming one of the segment files) and it
successfully blew it away when I indexed with -clean.
Perhaps there is a difference in file ownership that prevented it from being
able to remove the old index? I'm grasping at straws here...
--Martin
In the last couple of days we stumbled on a possible way for the textIndexer
to fail during a -clean index, and I thought I should share it.
As you pointed out, when called with -clean, the indexer tries to blow
away the old index directory. What I didn't realize is that the code
silently ignores errors during this step. So if a file or directory can't
be removed, the indexer just carries on with the indexing process. Of course
this can come back to bite us later in the process when filenames conflict.
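This failure mode is easy to reproduce in isolation: java.io.File.delete()
signals failure only through its boolean return value, never an exception, so
code that discards the return carries on as if the directory were gone. A
minimal standalone demo (not XTF's actual code; the directory and file names
are made up):

```java
import java.io.File;
import java.io.IOException;

public class SilentDeleteDemo {
    public static void main(String[] args) throws IOException {
        // Build a stand-in "old index" directory containing one segment file.
        File dir = new File(System.getProperty("java.io.tmpdir"), "xtf-delete-demo");
        dir.mkdirs();
        new File(dir, "_jnz.cfs").createNewFile();

        // Deleting a non-empty directory fails, but quietly: delete()
        // returns false and throws nothing. Code that ignores the return
        // value proceeds as if the old index had been removed.
        boolean removed = dir.delete();
        System.out.println("removed=" + removed + " stillExists=" + dir.exists());
        // prints: removed=false stillExists=true
    }
}
```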
I just checked in a change to throw an exception and abort indexing if the
old directory can't be deleted.
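A fail-loudly version of the cleanup can be sketched like this (a hypothetical
illustration of the approach, not the actual checked-in XTF code; the class
and method names are invented):

```java
import java.io.File;
import java.io.IOException;

public class CleanIndex {
    // Recursively delete an old index directory, aborting with an
    // exception on the first file that can't be removed, rather than
    // silently continuing with a partially cleaned index.
    static void deleteRecursive(File f) throws IOException {
        File[] children = f.listFiles();  // null when f is a plain file
        if (children != null) {
            for (File child : children)
                deleteRecursive(child);
        }
        // Check delete()'s boolean result and turn failure into an error.
        if (!f.delete())
            throw new IOException("Cannot delete " + f.getAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "xtf-clean-demo");
        new File(dir, "sub").mkdirs();
        new File(dir, "sub/_seg.cfs").createNewFile();
        deleteRecursive(dir);
        System.out.println("deleted: " + !dir.exists());
        // prints: deleted: true
    }
}
```

With this shape, a permissions or ownership problem surfaces immediately at
the start of the -clean run instead of as a confusing Lucene error later on.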
We'll release this and a few other bug fixes next week as XTF 2.1.1.
--Martin
Jamie