lucene index corruption

VineetP

Jun 29, 2016, 6:43:39 PM
to Neo4j
We are trying out Neo4j Community Edition 2.3.1.
We have one thread that periodically loads/updates data into Neo4j using LOAD CSV, plus several concurrent read requests against the db.
As far as I understand, we use schema indexes.
However, we quite frequently receive two types of error messages that point to Lucene indexes:


1) WARNING: The index is in a failed state: 'File not found: /home/neo4jsupp/neo4j-community-2.3.1/data/INM_06252016/schema/index/lucene/7/_27.cfs (No such file or directory)'.
 
2)
MATCH (n:VSID) WHERE n.code="ABC" RETURN n;

Node with id 5338
Neo.ClientError.Statement.EntityNotFound

There are no error messages in console.log or messages.log.
We have no choice but to shut down the db, perform the ritual of removing the contents of the lucene folder, and restart the db.
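
A minimal sketch of the cleanup step described above, assuming the database is already shut down. Instead of deleting the index files it moves them aside, so they can still be inspected or attached to a bug report. The function name and the backup location are hypothetical; only the `schema/index/lucene` layout comes from the paths in this post.

```python
import shutil
import time
from pathlib import Path


def quarantine_lucene_indexes(store_dir, backup_root):
    """Move everything under <store_dir>/schema/index/lucene into a
    timestamped folder below backup_root, leaving the lucene folder empty.

    Only call this while the database is stopped.
    Returns the backup folder, or None if there is no lucene folder.
    """
    lucene_dir = Path(store_dir) / "schema" / "index" / "lucene"
    if not lucene_dir.is_dir():
        return None

    # Hypothetical naming scheme for the backup folder.
    target = Path(backup_root) / ("lucene-" + time.strftime("%Y%m%d-%H%M%S"))
    target.mkdir(parents=True)

    # Snapshot the listing first so we do not iterate while moving.
    for child in list(lucene_dir.iterdir()):
        shutil.move(str(child), str(target / child.name))
    return target
```

It would be called between stopping and starting the db, for example `quarantine_lucene_indexes("/home/neo4jsupp/neo4j-community-2.3.1/data/INM_06252016", "/tmp/index-backup")`, where the second path is a placeholder.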

Q1) Can someone shed light on this behavior of Neo4j? Why and when would the lucene index get corrupted?
Q2) Why is there a reference to a Lucene index when we are using schema indexes?
Q3) How can we restrict Neo4j from creating Lucene indexes?

Thanks in advance.

Michael Hunger

Jun 30, 2016, 6:11:16 AM
to ne...@googlegroups.com
1. Can you raise a GitHub issue, if possible with your db and the messages.log / neo4j.log files attached?

2. It should not get corrupted; one cause could be a broken file system or running out of disk space.
3. The schema indexes are implemented using Lucene.
4. You can't.
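
One of the causes mentioned above, running out of disk space, can be ruled out with a check before each load. This is a minimal sketch using only the Python standard library; the function name and the threshold are assumptions, not anything Neo4j provides.

```python
import shutil


def free_space_ok(path, min_free_bytes):
    """Return True if the volume holding `path` has at least
    min_free_bytes of free space."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes
```

For example, `free_space_ok("/home/neo4jsupp/neo4j-community-2.3.1/data", 5 * 1024**3)` would require 5 GiB free on the data volume; the 5 GiB figure is only a placeholder to tune for the actual store size.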


VineetP

Jul 18, 2016, 5:16:12 PM
to Neo4j
Hi,
More Lucene index errors this week...
We started using the utility below to check the index right after the LOAD CSV operation completes.
java -cp /home/neo4j-community-2.3.1/lib/lucene-core-3.6.2.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex 

We receive several error messages like the one below:
--------------------------------------------------
An error has occurred on index at folder /home/neo4jsupp/neo4j-community-2.3.1/data/INM_06282016/schema/index/lucene/5, please check http://hln2080p:7474/browser.
Opening index @ /home/neo4jsupp/neo4j-community-2.3.1/data/INM_06282016/schema/index/lucene/5
Segments file=segments_9 numSegments=2 version=3.6.2 format=FORMAT_3_1 [Lucene 3.1+] userData={status=online}
  1 of 2: name=_0 docCount=28809
    compound=true
    hasProx=false
    numFiles=1
    size (MB)=1.061
    diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_71, lucene.version=3.6.2 1423725 - rmuir - 2012-12-18 19:45:40, os.arch=amd64, source=flush, os.version=3.10.0-327.13.1.el7.x86_64}
    no deletions
    test: open reader.........OK
    test: fields..............OK [2 fields]
    test: field norms.........OK [0 fields]
    test: terms, freq, prox...OK [57618 terms; 57618 terms/docs pairs; 57618 tokens]
    test: stored fields.......OK [28809 total field count; avg 1 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 2: name=_1h docCount=27
    compound=true
    hasProx=false
    numFiles=1
FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.io.FileNotFoundException: _1h.cfs
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:284)
        at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:303)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:494)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1064)

WARNING: 1 broken segments (containing 27 documents) detected
WARNING: would write new segments file, and 27 documents would be lost, if -fix were specified
--------------------------------------------------
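
To act on a report like the one above automatically after each load, the CheckIndex output can be scanned for the two lines that matter here. This is a minimal sketch, assuming the output has already been captured as text; the function name is hypothetical and the patterns are taken only from the lines shown in this post, so other failure modes may print something different.

```python
import re

# Patterns copied from the CheckIndex 3.6.2 output quoted in this thread.
BROKEN_RE = re.compile(
    r"WARNING: (\d+) broken segments \(containing (\d+) documents\) detected"
)
MISSING_RE = re.compile(r"java\.io\.FileNotFoundException: (\S+)")


def summarize_checkindex(output):
    """Extract broken-segment counts and missing file names from
    captured CheckIndex output."""
    broken = BROKEN_RE.search(output)
    return {
        "broken_segments": int(broken.group(1)) if broken else 0,
        "lost_documents": int(broken.group(2)) if broken else 0,
        "missing_files": MISSING_RE.findall(output),
    }
```

A zero count only means these specific warnings were absent, so it should not be treated as proof that the index is healthy.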

Q1) Is this an issue with Neo4j's use of Lucene indexes, or an issue with Lucene 3.6.2 itself?

Q2) We notice that Neo4j 3.0.3 Community uses Lucene 5.5.0. Will moving to Neo4j 3.0 avoid these kinds of errors?


A sample LOAD CSV invocation is below:

./bin/neo4j-shell -host <HOSTNAME> -port <PORTNUM> -c "USING PERIODIC COMMIT LOAD CSV FROM 'file:////home//tmp//<FILENAME>.csv' AS line MATCH (v1:VSID {code:line[0]}) MATCH (v2:VSID {code:line[4]}) MERGE (v1)-[r:CONNECT]->(v2) ON CREATE SET r.transit_time = toFloat(line[3]) ON MATCH SET r.transit_time = toFloat(line[3]);"


Thanks in advance.