Issues with failed index and restart (M06/2.0.0) cannot await population on recovering index

151 views
Skip to first unread message

symo...@gmail.com

unread,
Dec 15, 2013, 6:44:38 AM12/15/13
to ne...@googlegroups.com
I had an instance that shutdown uncleanly. On trying to bring it back up I got:

Dec 15, 2013 10:11:03 AM org.neo4j.server.logging.Logger log
SEVERE:
org.neo4j.server.ServerStartupException: Starting Neo4j Server failed: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /opt/neo4j-enterprise-2.0.0-M06/data/graph.db
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:193)
        at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:86)
        at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49)
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /opt/neo4j-enterprise-2.0.0-M06/data/graph.db
        at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:333)
        at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:100)
        at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
        at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:197)
        at org.neo4j.kernel.impl.recovery.StoreRecoverer.recover(StoreRecoverer.java:115)
        at org.neo4j.server.preflight.PerformRecoveryIfNecessary.run(PerformRecoveryIfNecessary.java:59)
        at org.neo4j.server.preflight.PreFlightTasks.run(PreFlightTasks.java:70)
        at org.neo4j.server.AbstractNeoServer.runPreflightTasks(AbstractNeoServer.java:303)
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:135)
        ... 2 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.transaction.XaDataSourceManager@40160f3d' was successfully initialized, but failed to st
art. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:504)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
        at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:310)
        ... 10 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaDataSource@a0aa211' was successfully initialized, but failed to star
t. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:504)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
        at org.neo4j.kernel.impl.transaction.XaDataSourceManager.start(XaDataSourceManager.java:165)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:498)
        ... 12 more
Caused by: java.lang.UnsupportedOperationException: cannot await population on a recovering index
        at org.neo4j.kernel.impl.api.index.RecoveringIndexProxy.awaitStoreScanCompleted(RecoveringIndexProxy.java:46)
        at org.neo4j.kernel.impl.api.index.AbstractDelegatingIndexProxy.awaitStoreScanCompleted(AbstractDelegatingIndexProxy.java:101)
        at org.neo4j.kernel.impl.api.index.AbstractDelegatingIndexProxy.awaitStoreScanCompleted(AbstractDelegatingIndexProxy.java:101)
        at org.neo4j.kernel.impl.api.index.AbstractDelegatingIndexProxy.awaitStoreScanCompleted(AbstractDelegatingIndexProxy.java:101)
        at org.neo4j.kernel.impl.api.index.IndexingService.activateIndex(IndexingService.java:499)
        at org.neo4j.kernel.impl.nioneo.xa.Command$SchemaRuleCommand.execute(Command.java:1221)
        at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.applyCommit(WriteTransaction.java:801)
        at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.doCommit(WriteTransaction.java:711)
        at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:321)
        at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.injectOnePhaseCommit(XaResourceManager.java:385)
        at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.applyOnePhaseCommitEntry(XaLogicalLog.java:514)
        at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.applyEntry(XaLogicalLog.java:444)
        at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.doInternalRecovery(XaLogicalLog.java:782)
        at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.open(XaLogicalLog.java:210)
        at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.open(XaLogicalLog.java:165)
        at org.neo4j.kernel.impl.transaction.xaframework.XaContainer.openLogicalLog(XaContainer.java:64)
        at org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaDataSource.start(NeoStoreXaDataSource.java:345)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:498)
        ... 15 more

Try what I might I couldn't get the instance to come back up under M06. I read something similar that was fixed in 2.0.0 so I  downloaded 2.0.0 and linked the graph.db database and started up with that which did bring up the database. I then shut it down cleanly and switched back to M06. However, when I did a schema in the neo4j shell, I saw the following:

neo4j-sh (0)$ schema
Indexes
  ON
:DEGREE_OF_HARM(degree_of_harm)           ONLINE                            
  ON
:INCIDENT(incident_description)           ONLINE                            
  ON
:INCIDENT(incident_timestamp)             ONLINE                            
  ON
:INCIDENT(incident_id)                    ONLINE (for uniqueness constraint)
  ON
:INCIDENT_CATEGORY(category_level_01)     ONLINE                            
  ON
:INCIDENT_REPORTER(reporter_level_01)     ONLINE                            
  ON
:INCIDENT_SPECIALITY(speciality_level_01) ONLINE                            
  ON
:NHS_TRUST(name)                          FAILED (for uniqueness constraint)
  ON
:NHS_TRUST_LOCATION(location_level_01)    ONLINE                            
  ON
:NRLS_DATA_TYPE(code)                     ONLINE                            
  ON
:PATIENT(patient_age)                     ONLINE                            
  ON
:PATIENT(patient_sex)                     ONLINE                            
  ON
:PATIENT(patient_ethnicity)               ONLINE                            


Constraints
  ON
(incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
  ON
(nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE

So I tried to delete the CONSTRAINT and the failed index and ended up with TM failure:

neo4j-sh (0)$ DROP CONSTRAINT ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE;
+-------------------+
| No data returned. |
+-------------------+
Constraints removed: 1
285 ms
SystemException: TM has encountered some problem, please perform neccesary action (tx recovery/restart)
neo4j-sh (0)$ schema 
SystemException: TM has encountered some problem, please perform neccesary action (tx recovery/restart)

Looking in the messages I found an error regarding the lock file for the lucene index (which the index directory only had the failure log in it):

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/opt/neo4j-enterprise-2.0.0-M06/data/graph.db/schema/index/lucene/3/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1098)
at org.neo4j.kernel.api.impl.index.IndexWriterFactories$1.create(IndexWriterFactories.java:44)
at org.neo4j.kernel.api.impl.index.LuceneIndexPopulator.create(LuceneIndexPopulator.java:65)
at org.neo4j.kernel.impl.api.index.IndexPopulationJob.run(IndexPopulationJob.java:105)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

[root@miyu graph.db]# find schema/index/lucene/3
schema/index/lucene/3
schema/index/lucene/3/failure-message

In order to fix this I stopped neo4j, moved schema/index/lucene/3 to schema/index/lucene/3.bad and started the database again. This recreated the index on startup in a non-failed state which I could then delete and recreate as below:

[root@miyu graph.db]# ls schema/index/lucene/3
_0.fdt  _0.fdx  _0.fnm  _0.frq  _0.nrm  _0.tii  _0.tis  failure-message  segments_1  segments.gen  write.lock

neo4j-sh (0)$ schema
Welcome to the Neo4j Shell! Enter 'help' for a list of commands
[Reconnected to server]
Indexes
  ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE                             
  ON :INCIDENT(incident_description)           ONLINE                             
  ON :INCIDENT(incident_timestamp)             ONLINE                             
  ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint) 
  ON :INCIDENT_CATEGORY(category_level_01)     ONLINE                             
  ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE                             
  ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE                             
  ON :NHS_TRUST(name)                          ONLINE (for uniqueness constraint) 
  ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE                             
  ON :NRLS_DATA_TYPE(code)                     ONLINE                             
  ON :PATIENT(patient_age)                     ONLINE                             
  ON :PATIENT(patient_sex)                     ONLINE                             
  ON :PATIENT(patient_ethnicity)               ONLINE                             

Constraints
  ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
  ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE

neo4j-sh (0)$ DROP CONSTRAINT ON (n:NHS_TRUST) ASSERT n.name IS UNIQUE;                
+-------------------+
| No data returned. |
+-------------------+
Constraints removed: 1
349 ms
neo4j-sh (0)$ schema
Indexes
  ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE                             
  ON :INCIDENT(incident_description)           ONLINE                             
  ON :INCIDENT(incident_timestamp)             ONLINE                             
  ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint) 
  ON :INCIDENT_CATEGORY(category_level_01)     ONLINE                             
  ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE                             
  ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE                             
  ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE                             
  ON :NRLS_DATA_TYPE(code)                     ONLINE                             
  ON :PATIENT(patient_age)                     ONLINE                             
  ON :PATIENT(patient_sex)                     ONLINE                             
  ON :PATIENT(patient_ethnicity)               ONLINE                             

Constraints
  ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
neo4j-sh (0)$ CREATE CONSTRAINT ON (n:NHS_TRUST) ASSERT n.name IS UNIQUE;
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
1956 ms
neo4j-sh (0)$ schema
Indexes
  ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE                             
  ON :INCIDENT(incident_description)           ONLINE                             
  ON :INCIDENT(incident_timestamp)             ONLINE                             
  ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint) 
  ON :INCIDENT_CATEGORY(category_level_01)     ONLINE                             
  ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE                             
  ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE                             
  ON :NHS_TRUST(name)                          ONLINE (for uniqueness constraint) 
  ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE                             
  ON :NRLS_DATA_TYPE(code)                     ONLINE                             
  ON :PATIENT(patient_age)                     ONLINE                             
  ON :PATIENT(patient_sex)                     ONLINE                             
  ON :PATIENT(patient_ethnicity)               ONLINE                             

Constraints
  ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
  ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE


The incident was originally caused by issuing neo4j stop in M06. Once I tried to start after the stop I ended up in this state and I'm curious as to whether or not I could have fixed the issue in the first place by just moving the lucene index out of the way instead of having to open it with 2.0.0 and then move the lucene index. If you need to me to tar up all the logfiles I can do that, though they might be a bit confused between M06/2.0.0.

Michael Hunger

unread,
Dec 15, 2013, 4:34:32 PM12/15/13
to ne...@googlegroups.com
So are you good now?

Michael

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

JDS

unread,
Dec 15, 2013, 5:49:16 PM12/15/13
to ne...@googlegroups.com
I'm back up and running yes, and I've made some changes to remove the duplicate relationships I was creating before as I had to delete about 10 million of them across 208 nodes which should help things a bit. My concern here is:

a) reproducing the problem to see if it's an issue that happens only with M06 or also on 2.0.0
b) it seems that the db doesn't do any preemptive check for running transactions before shutdown?
c) being told by the shell that I have simply a transaction problem when in reality I had to hunt down to the actual index level failure-message to find the real problem

I now have the following schema and I still need to tweak a few queries to take full advantage of them, though at this point if 2.0.0 can do MERGE on multiple node patterns, i.e. something like:

MERGE (nt:NHS_TRUST { name : 'FOO' })-[r:HAS_NHS_TRUST_LOCATION]->(ntl:NHS_TRUST_LOCATION { location_level_01 : 'BAR' }) ON CREATE ntl SET ntl.location_level_01 = 'FOOBAR' RETURN r;

Then I might have to switch to that because I'm losing time at the moment splitting my CREATE UNIQUE statements into MERGE for the nodes and CREATE UNIQUE for the relationships.

Schema is now (no unique violations as of yet during loading):

neo4j-sh (0)$ schema
Indexes
  ON :DEGREE_OF_HARM(degree_of_harm)           ONLINE (for uniqueness constraint) 
  ON :INCIDENT(incident_description)           ONLINE                             
  ON :INCIDENT(incident_timestamp)             ONLINE                             
  ON :INCIDENT(incident_id)                    ONLINE (for uniqueness constraint) 
  ON :INCIDENT_CATEGORY(category_level_01)     ONLINE (for uniqueness constraint) 
  ON :INCIDENT_REPORTER(reporter_level_01)     ONLINE (for uniqueness constraint) 
  ON :INCIDENT_SPECIALITY(speciality_level_01) ONLINE (for uniqueness constraint) 
  ON :NHS_TRUST(name)                          ONLINE (for uniqueness constraint) 
  ON :NHS_TRUST_LOCATION(location_level_01)    ONLINE (for uniqueness constraint) 
  ON :NRLS_DATA_TYPE(code)                     ONLINE (for uniqueness constraint) 
  ON :PATIENT(patient_age)                     ONLINE                             
  ON :PATIENT(patient_sex)                     ONLINE                             
  ON :PATIENT(patient_ethnicity)               ONLINE                             

Constraints
  ON (nhs_trust:NHS_TRUST) ASSERT nhs_trust.name IS UNIQUE
  ON (incident:INCIDENT) ASSERT incident.incident_id IS UNIQUE
  ON (nhs_trust_location:NHS_TRUST_LOCATION) ASSERT nhs_trust_location.location_level_01 IS UNIQUE
  ON (nrls_data_type:NRLS_DATA_TYPE) ASSERT nrls_data_type.code IS UNIQUE
  ON (degree_of_harm:DEGREE_OF_HARM) ASSERT degree_of_harm.degree_of_harm IS UNIQUE
  ON (incident_category:INCIDENT_CATEGORY) ASSERT incident_category.category_level_01 IS UNIQUE
  ON (incident_reporter:INCIDENT_REPORTER) ASSERT incident_reporter.reporter_level_01 IS UNIQUE
  ON (incident_speciality:INCIDENT_SPECIALITY) ASSERT incident_speciality.speciality_level_01 IS UNIQUE


~ Joshua
Reply all
Reply to author
Forward
0 new messages