issue getting slave to re-sync to master


Ben

May 8, 2014, 3:32:27 PM5/8/14
to ne...@googlegroups.com
I've got a two-host cluster on 2.0.2 with an arbitrator instance sharing one of the hosts. I shut down the slave service, shut down the arbitrator, and shut down the master. Then I deleted data/graph.db on both db instances and copied my new graph.db backup to the master node. I restarted the master, restarted the arbitrator, then restarted the slave.

I keep getting the following errors.

console.log =====

2014-05-08 19:00:38.256+0000 INFO  [Cluster] ServerId 3, moving to slave for master ha://10.69.55.186:6001?serverId=1
2014-05-08 19:00:44.251+0000 INFO  [Cluster] Copying store from master
2014-05-08 19:00:44.311+0000 INFO  [Cluster] Copying index/lucene-store.db
2014-05-08 19:00:44.312+0000 INFO  [Cluster] Copied  index/lucene-store.db 40.00 B
2014-05-08 19:00:44.312+0000 INFO  [Cluster] Copying neostore.schemastore.db
2014-05-08 19:00:44.313+0000 INFO  [Cluster] Copied  neostore.schemastore.db 1.44 kB
2014-05-08 19:00:44.313+0000 INFO  [Cluster] Copying neostore.propertystore.db.strings
2014-05-08 19:00:51.762+0000 INFO  [Cluster] Copied  neostore.propertystore.db.strings 1020.32 MB
.....
2014-05-08 19:01:29.947+0000 INFO  [Cluster] Copying nioneo_logical.log.v10
2014-05-08 19:01:29.948+0000 INFO  [Cluster] Copied  nioneo_logical.log.v10 16.00 B
2014-05-08 19:01:29.948+0000 INFO  [Cluster] Done, copied 522 files

Then it repeats the process.

graph.db/messages.log
2014-05-08 19:23:23.812+0000 DEBUG [o.n.k.h.c.HighAvailabilityModeSwitcher]: Attempting to switch to slave in 300s
2014-05-08 19:28:23.812+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: ServerId 3, moving to slave for master ha://10.69.55.186:6001?serverId=1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type NODE with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type RELATIONSHIP with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type PROPERTY with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type STRING_BLOCK with highid 1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type ARRAY_BLOCK with highid 1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type PROPERTY_KEY_TOKEN with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type PROPERTY_KEY_TOKEN_NAME with highid 1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type RELATIONSHIP_TYPE_TOKEN with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type RELATIONSHIP_TYPE_TOKEN_NAME with highid 1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type LABEL_TOKEN with highid 0
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type LABEL_TOKEN_NAME with highid 1
2014-05-08 19:28:23.816+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type NEOSTORE_BLOCK with highid 7
2014-05-08 19:28:23.817+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type SCHEMA with highid 1
2014-05-08 19:28:23.817+0000 DEBUG [o.n.k.h.i.HaIdGeneratorFactory]: Instantiated slave delegate SlaveIdGenerator[IdRangeIterator[start:0, length:0, position:0, defrag:[]]] of type NODE_LABELS with highid 1
2014-05-08 19:28:23.817+0000 INFO  [o.n.k.i.t.TxManager]: TM shutting down
2014-05-08 19:28:25.071+0000 INFO  [o.n.k.h.MasterClient201]: MasterClient201 communication channel created towards 10.69.55.186:6001
2014-05-08 19:28:25.074+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying store from master
2014-05-08 19:28:25.076+0000 DEBUG [o.n.k.h.MasterClient201]: Opened a new channel to /10.69.55.186:6001
2014-05-08 19:28:25.076+0000 DEBUG [o.n.k.h.MasterClient201]: ResourcePool create resource ([id: 0x7f95808a, /10.69.55.187:40389 => /10.69.55.186:6001], DynamicChannelBuffer(ridx=0, widx=0, cap=256), java.nio.HeapByteBuffer[pos=0 lim=1048576 cap=1048576])
2014-05-08 19:28:25.182+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying index/lucene-store.db
2014-05-08 19:28:25.182+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copied  index/lucene-store.db 40.00 B
2014-05-08 19:28:25.182+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying neostore.schemastore.db
2014-05-08 19:28:25.183+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copied  neostore.schemastore.db 1.44 kB
2014-05-08 19:28:25.183+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying neostore.propertystore.db.strings
2014-05-08 19:28:34.444+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copied  neostore.propertystore.db.strings 1020.32 MB
2014-05-08 19:28:34.444+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying neostore.propertystore.db.arrays
2014-05-08 19:28:34.444+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copied  neostore.propertystore.db.arrays 128.00 B
2014-05-08 19:28:34.445+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copying neostore.nodestore.db
2014-05-08 19:28:35.362+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Copied  neostore.nodestore.db 123.78 MB
......
2014-05-08 19:29:03.391+0000 INFO  [o.n.k.h.c.HighAvailabilityModeSwitcher]: Done, copied 529 files
2014-05-08 19:29:03.505+0000 DEBUG [o.n.k.h.MasterClient201]: Closing channel: [id: 0x7f95808a, /10.69.55.187:40389 => /10.69.55.186:6001]. Channel pool size is now 0
2014-05-08 19:29:03.509+0000 DEBUG [o.n.k.h.MasterClient201]: MasterClient201[/10.69.55.186:6001] shutdown
2014-05-08 19:29:03.510+0000 ERROR [o.n.k.h.c.HighAvailabilityModeSwitcher]: Error while trying to switch to slave
java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /opt/software/neo4j/neo4j_2/data/graph.db/temp-copy.tmp
        at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:351) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:56) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:90) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:199) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.com.storecopy.RemoteStoreCopier.newTempDatabase(RemoteStoreCopier.java:210) ~[neo4j-com-2.0.2.jar:2.0.2]
        at org.neo4j.com.storecopy.RemoteStoreCopier.createLogWriters(RemoteStoreCopier.java:190) ~[neo4j-com-2.0.2.jar:2.0.2]
        at org.neo4j.com.storecopy.RemoteStoreCopier.writeTransactionsToActiveLogFile(RemoteStoreCopier.java:144) ~[neo4j-com-2.0.2.jar:2.0.2]
        at org.neo4j.com.storecopy.RemoteStoreCopier.copyStore(RemoteStoreCopier.java:107) ~[neo4j-com-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.ha.cluster.HighAvailabilityModeSwitcher.copyStoreFromMaster(HighAvailabilityModeSwitcher.java:672) ~[neo4j-ha-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.ha.cluster.HighAvailabilityModeSwitcher.access$1400(HighAvailabilityModeSwitcher.java:119) ~[neo4j-ha-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.ha.cluster.HighAvailabilityModeSwitcher$3.run(HighAvailabilityModeSwitcher.java:440) ~[neo4j-ha-2.0.2.jar:2.0.2]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_51]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.extension.KernelExtensions@602c9c83' failed to initialize. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:489) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:72) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:106) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:328) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        ... 17 common frames omitted
Caused by: org.neo4j.kernel.extension.KernelExtensions$UnsatisfiedDepencyException: java.lang.IllegalArgumentException: Could not resolve dependency of type:org.neo4j.kernel.guard.Guard
        at org.neo4j.kernel.extension.KernelExtensions$KernelExtensionHandler.invoke(KernelExtensions.java:249) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at com.sun.proxy.$Proxy7.getGuard(Unknown Source) ~[na:na]
        at org.neo4j.extension.querykiller.QueryRegistryExtensionFactory.newKernelExtension(QueryRegistryExtensionFactory.java:23) ~[neo4j-querykiller-community-1.0.0-SNAPSHOT.jar:na]
        at org.neo4j.extension.querykiller.QueryRegistryExtensionFactory.newKernelExtension(QueryRegistryExtensionFactory.java:8) ~[neo4j-querykiller-community-1.0.0-SNAPSHOT.jar:na]
        at org.neo4j.kernel.extension.KernelExtensions.init(KernelExtensions.java:103) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:483) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        ... 20 common frames omitted
Caused by: java.lang.IllegalArgumentException: Could not resolve dependency of type:org.neo4j.kernel.guard.Guard
        at org.neo4j.graphdb.DependencyResolver$Adapter$1.select(DependencyResolver.java:84) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.extension.KernelExtensions.resolveDependency(KernelExtensions.java:205) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.InternalAbstractGraphDatabase$DependencyResolverImpl.resolveDependency(InternalAbstractGraphDatabase.java:1458) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.graphdb.DependencyResolver$Adapter.resolveDependency(DependencyResolver.java:92) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        at org.neo4j.kernel.extension.KernelExtensions$KernelExtensionHandler.invoke(KernelExtensions.java:245) ~[neo4j-kernel-2.0.2.jar:2.0.2]
        ... 25 common frames omitted
2014-05-08 19:29:03.511+0000 DEBUG [o.n.k.h.c.HighAvailabilityModeSwitcher]: Attempting to switch to slave in 300s
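
For context on the trace: the failure occurs while the slave starts a throwaway EmbeddedGraphDatabase on temp-copy.tmp to replay the copied transaction logs, and the querykiller extension's factory asks the kernel for an org.neo4j.kernel.guard.Guard through its Dependencies proxy (the com.sun.proxy.$Proxy7.getGuard frame above). The sketch below only illustrates that dependency shape against the Neo4j 2.0.x KernelExtensionFactory API; the class names not visible in the trace are assumptions, and the idea that the temporary store-copy database simply has no Guard registered (because execution_guard_enabled is not applied to it) is an inference, not something the logs confirm.

// Sketch only: a kernel extension factory with a hard Guard dependency, which is the
// shape implied by the stack trace. Not the querykiller extension's actual source.
import org.neo4j.kernel.extension.KernelExtensionFactory;
import org.neo4j.kernel.guard.Guard;
import org.neo4j.kernel.lifecycle.Lifecycle;
import org.neo4j.kernel.lifecycle.LifecycleAdapter;

public class GuardDependentExtensionFactory
        extends KernelExtensionFactory<GuardDependentExtensionFactory.Dependencies>
{
    public interface Dependencies
    {
        Guard getGuard(); // backed by a dynamic proxy at runtime ($Proxy7 in the trace)
    }

    public GuardDependentExtensionFactory()
    {
        super( "guard-dependent-example" ); // hypothetical extension key
    }

    @Override
    public Lifecycle newKernelExtension( Dependencies deps ) throws Throwable
    {
        // Resolution happens here: if the database has no Guard component registered,
        // the proxy call throws IllegalArgumentException ("Could not resolve dependency
        // of type:org.neo4j.kernel.guard.Guard"), which aborts startup of the
        // temp-copy.tmp database and therefore the whole store copy.
        Guard guard = deps.getGuard();
        return new LifecycleAdapter(); // a real extension would wire 'guard' into its own Lifecycle
    }
}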

Michael Hunger

May 9, 2014, 9:27:26 PM5/9/14
to ne...@googlegroups.com
What does your config look like?

Do you have the database guard configured?

It seems something is amiss with your Java libraries; perhaps you accidentally deleted a file there?

Cheers,

Michael


Ben

May 12, 2014, 9:31:05 AM5/12/14
to ne...@googlegroups.com
I'm not sure what you're referring to by database guard. Do you mean the execution_guard_enabled parameter? Here are the non-default settings we've made.

##neo4j-wrapper.conf
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
wrapper.user=neo4j
wrapper.user.created=false

##neo4j.properties
keep_logical_logs=15 days
online_backup_server=127.0.0.1:6366
ha.server_id=1
ha.initial_hosts=10.69.55.186:5001,10.69.55.186:5002,10.69.55.187:5001
ha.server=10.69.55.186:6001
ha.cluster_server=10.69.55.186:5001
execution_guard_enabled=true  ## this is set for the query killer extension

##neo4j-server.properties
org.neo4j.server.webserver.address=0.0.0.0
org.neo4j.server.webserver.https.port=7484
org.neo4j.server.database.mode=HA
org.neo4j.server.webserver.maxthreads=200
org.neo4j.server.transaction.timeout=40
org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.extension.querykiller.server=/querykiller


No Java libraries were deleted. I ended up shutting down the cluster, copying the backup to both data directories, and restarting. It appears to be working now, but I'm concerned that the clustering functionality doesn't work as it should.

Ben

May 28, 2014, 8:17:07 AM5/28/14
to ne...@googlegroups.com
Just wanted to post a further update: I disabled the querykiller extension and that fixed the replication issue. I reproduced the same problem in a three-host cluster as well; it's the same issue.
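
If keeping the extension is a requirement, one possibility (a sketch only, not the querykiller extension's actual code and not a verified fix) is to resolve the Guard defensively so the extension degrades to a no-op on databases where no Guard is registered, such as the temporary store-copy instance. This assumes the Neo4j 2.0.x KernelExtensionFactory API; the class names are hypothetical.

// Sketch: treat the Guard as an optional dependency instead of a hard one.
import org.neo4j.kernel.extension.KernelExtensionFactory;
import org.neo4j.kernel.guard.Guard;
import org.neo4j.kernel.lifecycle.Lifecycle;
import org.neo4j.kernel.lifecycle.LifecycleAdapter;

public class OptionalGuardExtensionFactory
        extends KernelExtensionFactory<OptionalGuardExtensionFactory.Dependencies>
{
    public interface Dependencies
    {
        Guard getGuard();
    }

    public OptionalGuardExtensionFactory()
    {
        super( "optional-guard-example" ); // hypothetical extension key
    }

    @Override
    public Lifecycle newKernelExtension( Dependencies deps ) throws Throwable
    {
        try
        {
            Guard guard = deps.getGuard();
            // Real extension logic that needs the guard would be wired up here.
            return new LifecycleAdapter();
        }
        catch ( IllegalArgumentException noGuardRegistered )
        {
            // No Guard in this database (e.g. execution_guard_enabled not set on the
            // temporary copy): register nothing rather than failing the store copy.
            return new LifecycleAdapter();
        }
    }
}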

Chris Vest

May 28, 2014, 10:45:15 AM5/28/14
to ne...@googlegroups.com

On 28 May 2014, at 14:17, Ben <ben....@sni-infinite.mygbiz.com> wrote:
I disabled the querykiller extension and this fixed the replication issue.

What is this querykiller extension? It sounds like a dangerous thing to hack in from the outside, given the current architecture.

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]

Ben

May 28, 2014, 10:49:24 AM5/28/14
to ne...@googlegroups.com
It's an extension that a support contractor, whom Neo Technology supplied to us, brought in for our implementation of your product. It's supposedly supported, though given that you're asking what it is, I'm having my doubts.

Chris Vest

May 28, 2014, 10:58:44 AM5/28/14
to ne...@googlegroups.com
Well, it’s not part of the main product, which is what I work on. Our field engineers and support contractors have more tools in their belts for the field and support work they do. This might be one of those things.


--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]

