Re: [Neo4j] Unable to start slave and complete replication in HA mode

168 views

Skip to first unread message

Michael Hunger

unread,

Feb 25, 2013, 6:33:17 PM2/25/13

to ne...@googlegroups.com

John,

can you verify that the network connection between the machines is stable?

Cheers

Michael

(neo4j.org) <-[:WORKS_ON]- (@mesirii) -[:TAKES_CARE_OF]-> (you) -[:WORKS_WITH]->(@Neo4j)

Am 25.02.2013 um 23:02 schrieb John Lewis:

Hello All,

I currently have Neo4J enterprise installed in a 3 node HA cluster. Following the HA documentation (http://docs.neo4j.org/chunked/stable/ha-setup-tutorial.html), I have the Coordinators setup and communicating properly.

===== conf/coord.cfg =====
server.1=10.0.100.55:2888:3888
server.2=10.0.100.19:2889:3889
server.3=10.0.100.20:2890:3890

I have verified in there logs that they are communicating without any errors. Next I started the Master node and it start just fine. Verifying in the shell and the web browser, I can see that all my
43 661 873 nodes
3 746 123 688 properties
3 701 950 371 relationships

are recognized. The problem is that when I start the slaves everything starts up fine. They say that it is "Copying store from master" but after about 20 to 30GB of the data store has been copied it throws an "Unexpected Exception". (see below for details)

==== data/graph.db/message.log ====

2013-02-25 16:06:54,916 INFO [neo4j]: Starting up highly available graph database '/usr2/neo4j/data/graph.db'
2013-02-25 16:06:54,919 INFO [neo4j]: Writing at flush-requested: 2
2013-02-25 16:06:55,032 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:06:55,036 INFO [neo4j]: org.neo4j.kernel.ha.zookeeper.ZooClient$WatcherImpl@55595f65, Mon Feb 25 16:06:55 EST 2013 Got event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/1361715122199_1945648888875130468/flush-requested (path=/1361715122199_1945648888875130468/flush-requested)
2013-02-25 16:06:55,037 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1], MachineInfo[ID:2, sequence:44, last tx:1, server:(10.0.100.19, 6002), master for last tx:-1]]
2013-02-25 16:06:55,037 INFO [neo4j]: Got event data -6
2013-02-25 16:06:55,040 INFO [neo4j]: Read HA server:10.0.100.19:6002 (for machineID 2) from zoo keeper
2013-02-25 16:06:55,042 INFO [neo4j]: Read HA server:10.0.100.55:6001 (for machineID 1) from zoo keeper
2013-02-25 16:06:55,043 INFO [neo4j]: Checking compatibility mode, read 2 as all machines, 2 as myVersion machines. Based on that I return false
2013-02-25 16:06:55,095 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:06:55,096 INFO [neo4j]: MasterClient18[/10.0.100.55:6001] shutdown
2013-02-25 16:06:55,208 INFO [neo4j]: Writing at flush-requested: -1
2013-02-25 16:06:55,320 INFO [neo4j]: Read HA server:10.0.100.55:6001 (for machineID 1) from zoo keeper
2013-02-25 16:06:55,322 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:06:55,328 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1]]
2013-02-25 16:06:55,331 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:06:55,337 ERROR [neo4j]: Internal shutdown of HA db[2] reference=HighlyAvailableGraphDatabase[/usr2/neo4j/data/graph.db, ha.server_id:2], masterServer=null
org.neo4j.kernel.InformativeStackTrace: Internal shutdown
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.internalShutdown(HighlyAvailableGraphDatabase.java:1249) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:491) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:607) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:286) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:210) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase$DatabaseMode$2.createDatabase(EnterpriseDatabase.java:52) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:73) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:124) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:85) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49) [neo4j-server-1.8.1.jar:1.8.1]
2013-02-25 16:06:56,395 INFO [neo4j]: Copying store from master
2013-02-25 16:06:56,460 INFO [neo4j]: Opened a new channel to /10.0.100.55:6001
2013-02-25 16:08:44,531 ERROR [neo4j]: Problems copying store from master
java.lang.RuntimeException: Unexpected Exception
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:78) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:53) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:237) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:180) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.ha.MasterClient18.copyStore(MasterClient18.java:284) ~[neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.copyStoreFromMaster(HighlyAvailableGraphDatabase.java:811) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:508) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:607) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:286) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:210) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase$DatabaseMode$2.createDatabase(EnterpriseDatabase.java:52) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:73) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:124) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:85) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49) [neo4j-server-1.8.1.jar:1.8.1]
Caused by: java.io.IOException: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:72) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:95) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:78) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:208) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 12 common frames omitted
Caused by: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:75) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunk(DechunkingChannelBuffer.java:93) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunkIfNeeded(DechunkingChannelBuffer.java:87) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readUnsignedByte(DechunkingChannelBuffer.java:519) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readNextBlock(BlockLogReader.java:58) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readAsMuchAsPossible(BlockLogReader.java:114) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.read(BlockLogReader.java:91) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:53) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 15 common frames omitted
Caused by: org.jboss.netty.handler.queue.BlockingReadTimeoutException: null
    at org.jboss.netty.handler.queue.BlockingReadHandler.readEvent(BlockingReadHandler.java:236) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.jboss.netty.handler.queue.BlockingReadHandler.read(BlockingReadHandler.java:167) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:66) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 22 common frames omitted
2013-02-25 16:08:45,535 INFO [neo4j]: Writing at flush-requested: -1
2013-02-25 16:08:45,649 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:08:45,656 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1]]
2013-02-25 16:08:45,659 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:08:46,962 INFO [neo4j]: Copying store from master
2013-02-25 16:08:46,967 INFO [neo4j]: Opened a new channel to /10.0.100.55:6001
2013-02-25 16:09:34,488 ERROR [neo4j]: Problems copying store from master
java.lang.RuntimeException: Unexpected Exception
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:78) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:53) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:237) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:180) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.ha.MasterClient18.copyStore(MasterClient18.java:284) ~[neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.copyStoreFromMaster(HighlyAvailableGraphDatabase.java:811) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:508) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:607) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:286) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:210) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase$DatabaseMode$2.createDatabase(EnterpriseDatabase.java:52) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:73) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:124) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:85) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49) [neo4j-server-1.8.1.jar:1.8.1]
Caused by: java.io.IOException: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:72) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:95) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:78) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:208) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 12 common frames omitted
Caused by: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:75) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunk(DechunkingChannelBuffer.java:93) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunkIfNeeded(DechunkingChannelBuffer.java:87) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readUnsignedByte(DechunkingChannelBuffer.java:519) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readNextBlock(BlockLogReader.java:58) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readAsMuchAsPossible(BlockLogReader.java:114) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.read(BlockLogReader.java:91) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:53) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 15 common frames omitted
Caused by: org.jboss.netty.handler.queue.BlockingReadTimeoutException: null
    at org.jboss.netty.handler.queue.BlockingReadHandler.readEvent(BlockingReadHandler.java:236) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.jboss.netty.handler.queue.BlockingReadHandler.read(BlockingReadHandler.java:167) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:66) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 22 common frames omitted
2013-02-25 16:09:35,493 INFO [neo4j]: Writing at flush-requested: -1
2013-02-25 16:09:35,605 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:09:35,611 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1]]
2013-02-25 16:09:35,614 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:09:35,848 INFO [neo4j]: Copying store from master
2013-02-25 16:09:35,853 INFO [neo4j]: Opened a new channel to /10.0.100.55:6001
2013-02-25 16:11:16,501 ERROR [neo4j]: Problems copying store from master
java.lang.RuntimeException: Unexpected Exception
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:78) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:53) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:237) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:180) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.ha.MasterClient18.copyStore(MasterClient18.java:284) ~[neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.copyStoreFromMaster(HighlyAvailableGraphDatabase.java:811) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:508) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:607) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:286) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:210) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase$DatabaseMode$2.createDatabase(EnterpriseDatabase.java:52) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:73) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:124) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:85) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49) [neo4j-server-1.8.1.jar:1.8.1]
Caused by: java.io.IOException: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:72) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:95) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:78) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:208) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 12 common frames omitted
Caused by: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:75) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunk(DechunkingChannelBuffer.java:93) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunkIfNeeded(DechunkingChannelBuffer.java:87) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readUnsignedByte(DechunkingChannelBuffer.java:519) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readNextBlock(BlockLogReader.java:58) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readAsMuchAsPossible(BlockLogReader.java:114) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.read(BlockLogReader.java:91) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:53) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 15 common frames omitted
Caused by: org.jboss.netty.handler.queue.BlockingReadTimeoutException: null
    at org.jboss.netty.handler.queue.BlockingReadHandler.readEvent(BlockingReadHandler.java:236) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.jboss.netty.handler.queue.BlockingReadHandler.read(BlockingReadHandler.java:167) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:66) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 22 common frames omitted
2013-02-25 16:11:17,505 INFO [neo4j]: Writing at flush-requested: -1
2013-02-25 16:11:17,617 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:11:17,624 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1]]
2013-02-25 16:11:17,626 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:11:17,637 INFO [neo4j]: org.neo4j.kernel.ha.zookeeper.ZooClient$WatcherImpl@30518bfc, Mon Feb 25 16:11:17 EST 2013 Got event: WatchedEvent state:SyncConnected type:None path:null (path=null)
2013-02-25 16:11:17,682 INFO [neo4j]: Wrote HA server 10.0.100.19:6002 to zoo keeper
2013-02-25 16:11:17,690 INFO [neo4j]: Did setup, seq=0000000046 new sessionId=161306044786868229
2013-02-25 16:11:17,694 INFO [neo4j]: Read HA server:10.0.100.19:6002 (for machineID 2) from zoo keeper
2013-02-25 16:11:17,695 INFO [neo4j]: Read HA server:10.0.100.55:6001 (for machineID 1) from zoo keeper
2013-02-25 16:11:17,696 INFO [neo4j]: Checking compatibility mode, read 2 as all machines, 2 as myVersion machines. Based on that I return false
2013-02-25 16:11:17,881 INFO [neo4j]: Writing at flush-requested: 2
2013-02-25 16:11:17,994 INFO [neo4j]: Writing at flush-requested: -6
2013-02-25 16:11:17,997 INFO [neo4j]: org.neo4j.kernel.ha.zookeeper.ZooClient$WatcherImpl@30518bfc, Mon Feb 25 16:11:17 EST 2013 Got event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/1361715122199_1945648888875130468/flush-requested (path=/1361715122199_1945648888875130468/flush-requested)
2013-02-25 16:11:17,999 INFO [neo4j]: getMaster 1 based on [MachineInfo[ID:1, sequence:33, last tx:1, server:(10.0.100.55, 6001), master for last tx:-1], MachineInfo[ID:2, sequence:46, last tx:1, server:(10.0.100.19, 6002), master for last tx:-1]]
2013-02-25 16:11:17,999 INFO [neo4j]: Got event data -6
2013-02-25 16:11:18,002 INFO [neo4j]: Read HA server:10.0.100.19:6002 (for machineID 2) from zoo keeper
2013-02-25 16:11:18,002 INFO [neo4j]: MasterClient18 communication started and bound to 10.0.100.55:6001
2013-02-25 16:11:18,003 ERROR [neo4j]: Problems copying store from master
java.lang.RuntimeException: Gave up trying to copy store from master
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:523) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:607) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:286) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:210) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase$DatabaseMode$2.createDatabase(EnterpriseDatabase.java:52) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:73) [neo4j-server-enterprise-1.8.1.jar:1.8.1]
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:124) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:85) [neo4j-server-1.8.1.jar:1.8.1]
    at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:49) [neo4j-server-1.8.1.jar:1.8.1]
Caused by: java.lang.RuntimeException: Unexpected Exception
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:78) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.helpers.Exceptions.launderedException(Exceptions.java:53) ~[neo4j-kernel-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:237) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:180) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.ha.MasterClient18.copyStore(MasterClient18.java:284) ~[neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.copyStoreFromMaster(HighlyAvailableGraphDatabase.java:811) [neo4j-ha-1.8.1.jar:1.8.1]
    at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:508) [neo4j-ha-1.8.1.jar:1.8.1]
    ... 8 common frames omitted
Caused by: java.io.IOException: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:72) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:95) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Protocol$FileStreamsDeserializer.read(Protocol.java:78) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.Client.sendRequest(Client.java:208) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 12 common frames omitted
Caused by: org.neo4j.com.ComException: org.jboss.netty.handler.queue.BlockingReadTimeoutException
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:75) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunk(DechunkingChannelBuffer.java:93) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readNextChunkIfNeeded(DechunkingChannelBuffer.java:87) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.DechunkingChannelBuffer.readUnsignedByte(DechunkingChannelBuffer.java:519) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readNextBlock(BlockLogReader.java:58) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.readAsMuchAsPossible(BlockLogReader.java:114) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.BlockLogReader.read(BlockLogReader.java:91) ~[neo4j-com-1.8.1.jar:1.8.1]
    at org.neo4j.com.ToFileStoreWriter.write(ToFileStoreWriter.java:53) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 15 common frames omitted
Caused by: org.jboss.netty.handler.queue.BlockingReadTimeoutException: null
    at org.jboss.netty.handler.queue.BlockingReadHandler.readEvent(BlockingReadHandler.java:236) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.jboss.netty.handler.queue.BlockingReadHandler.read(BlockingReadHandler.java:167) ~[org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar:na]
    at org.neo4j.com.DechunkingChannelBuffer.readNext(DechunkingChannelBuffer.java:66) ~[neo4j-com-1.8.1.jar:1.8.1]
    ... 22 common frames omitted
2013-02-25 16:11:18,004 INFO [neo4j]: Read HA server:10.0.100.55:6001 (for machineID 1) from zoo keeper
2013-02-25 16:11:18,009 INFO [neo4j]: Checking compatibility mode, read 2 as all machines, 2 as myVersion machines. Based on that I return false

==== END OF LOG ====

==== data/log/neo4j-zookeeper.log ====

25-02-2013 16:06:54,300 EST INFO main-SendThread(10.0.100.55:2181) org.apache.zookeeper.ClientCnxn - Socket connection established to 10.0.100.55/10.0.100.55:2181, initiating session
25-02-2013 16:06:54,314 EST INFO main-SendThread(10.0.100.55:2181) org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.0.100.55/10.0.100.55:2181, sessionid = 0x13d12feeb020008, negotiated timeout = 6000
25-02-2013 16:06:54,488 EST INFO main org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.0.100.55:2181,10.0.100.19:2182,10.0.100.20:2183 sessionTimeout=59 watcher=org.neo4j.kernel.ha.zookeeper.ZooClient$WatcherImpl@55595f65
25-02-2013 16:06:54,495 EST INFO main-SendThread() org.apache.zookeeper.ClientCnxn - Opening socket connection to server /10.0.100.19:2182
25-02-2013 16:06:54,496 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-6.GCE2000.COM:2182) org.apache.zookeeper.ClientCnxn - Socket connection established to GCE-CLOUD-RND-SERVER-6.GCE2000.COM/10.0.100.19:2182, initiating session
25-02-2013 16:06:54,496 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Accepted socket connection from /10.0.100.19:16246
25-02-2013 16:06:54,497 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Client attempting to establish new session at /10.0.100.19:16246
25-02-2013 16:06:54,502 EST INFO CommitProcessor:2 org.apache.zookeeper.server.NIOServerCnxn - Established session 0x23d12fe3ba50004 with negotiated timeout 6000 for client /10.0.100.19:16246
25-02-2013 16:06:54,502 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-6.GCE2000.COM:2182) org.apache.zookeeper.ClientCnxn - Session establishment complete on server GCE-CLOUD-RND-SERVER-6.GCE2000.COM/10.0.100.19:2182, sessionid = 0x23d12fe3ba50004, negotiated timeout = 6000
25-02-2013 16:06:54,520 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x23d12fe3ba50004 type:create cxid:0x4 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/1361715122199_1945648888875130468/ha-servers Error:KeeperErrorCode = NodeExists for /1361715122199_1945648888875130468/ha-servers
25-02-2013 16:06:54,536 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x23d12fe3ba50004 type:create cxid:0x5 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/1361715122199_1945648888875130468/compatibility-1.8 Error:KeeperErrorCode = NodeExists for /1361715122199_1945648888875130468/compatibility-1.8
25-02-2013 16:06:55,097 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x23d12fe3ba50004
25-02-2013 16:06:55,102 EST INFO main org.apache.zookeeper.ZooKeeper - Session: 0x23d12fe3ba50004 closed
25-02-2013 16:06:55,102 EST INFO main-EventThread org.apache.zookeeper.ClientCnxn - EventThread shut down
25-02-2013 16:06:55,102 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /10.0.100.19:16246 which had sessionid 0x23d12fe3ba50004
25-02-2013 16:06:55,102 EST INFO main org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.0.100.55:2181,10.0.100.19:2182,10.0.100.20:2183 sessionTimeout=5000 watcher=org.neo4j.kernel.ha.zookeeper.ZooKeeperClusterClient$WatcherImpl@2ffe55fd
25-02-2013 16:06:55,104 EST INFO main-SendThread() org.apache.zookeeper.ClientCnxn - Opening socket connection to server /10.0.100.20:2183
25-02-2013 16:06:55,106 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-7.GCE2000.COM:2183) org.apache.zookeeper.ClientCnxn - Socket connection established to GCE-CLOUD-RND-SERVER-7.GCE2000.COM/10.0.100.20:2183, initiating session
25-02-2013 16:06:55,112 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-7.GCE2000.COM:2183) org.apache.zookeeper.ClientCnxn - Session establishment complete on server GCE-CLOUD-RND-SERVER-7.GCE2000.COM/10.0.100.20:2183, sessionid = 0x33d12fe3bad0002, negotiated timeout = 6000
25-02-2013 16:11:17,627 EST INFO main org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.0.100.55:2181,10.0.100.19:2182,10.0.100.20:2183 sessionTimeout=59 watcher=org.neo4j.kernel.ha.zookeeper.ZooClient$WatcherImpl@30518bfc
25-02-2013 16:11:17,628 EST INFO main-SendThread() org.apache.zookeeper.ClientCnxn - Opening socket connection to server /10.0.100.19:2182
25-02-2013 16:11:17,630 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-6.GCE2000.COM:2182) org.apache.zookeeper.ClientCnxn - Socket connection established to GCE-CLOUD-RND-SERVER-6.GCE2000.COM/10.0.100.19:2182, initiating session
25-02-2013 16:11:17,630 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Accepted socket connection from /10.0.100.19:16271
25-02-2013 16:11:17,630 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Client attempting to establish new session at /10.0.100.19:16271
25-02-2013 16:11:17,636 EST INFO main-SendThread(GCE-CLOUD-RND-SERVER-6.GCE2000.COM:2182) org.apache.zookeeper.ClientCnxn - Session establishment complete on server GCE-CLOUD-RND-SERVER-6.GCE2000.COM/10.0.100.19:2182, sessionid = 0x23d12fe3ba50005, negotiated timeout = 6000
25-02-2013 16:11:17,636 EST INFO CommitProcessor:2 org.apache.zookeeper.server.NIOServerCnxn - Established session 0x23d12fe3ba50005 with negotiated timeout 6000 for client /10.0.100.19:16271
25-02-2013 16:11:17,640 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x23d12fe3ba50005 type:create cxid:0x4 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/1361715122199_1945648888875130468/ha-servers Error:KeeperErrorCode = NodeExists for /1361715122199_1945648888875130468/ha-servers
25-02-2013 16:11:17,646 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x23d12fe3ba50005 type:create cxid:0x5 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/1361715122199_1945648888875130468/compatibility-1.8 Error:KeeperErrorCode = NodeExists for /1361715122199_1945648888875130468/compatibility-1.8
25-02-2013 16:11:18,636 EST WARN NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x23d12fe3ba50005, likely client has closed socket
25-02-2013 16:11:18,637 EST INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182 org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /10.0.100.19:16271 which had sessionid 0x23d12fe3ba50005
25-02-2013 16:11:24,001 EST INFO SessionTracker org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x33d12fe3bad0002, timeout of 6000ms exceeded
25-02-2013 16:11:24,001 EST INFO SessionTracker org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x13d12feeb020008, timeout of 6000ms exceeded
25-02-2013 16:11:24,001 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x33d12fe3bad0002
25-02-2013 16:11:24,001 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x13d12feeb020008
25-02-2013 16:11:27,000 EST INFO SessionTracker org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x23d12fe3ba50005, timeout of 6000ms exceeded
25-02-2013 16:11:27,001 EST INFO ProcessThread:-1 org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x23d12fe3ba50005

==== END OF LOG ====

Has anyone come across this problem before? If so, how did they solve it?

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

John Lewis

unread,

Mar 1, 2013, 3:26:08 PM3/1/13

to ne...@googlegroups.com

Hey Michael,

I have moved the data to another server and have no problems scp a copy of the database and starting it that way but whenever I delete the database from the slave and start the salve up. I get this exception after about 25G of the directory has been transferred. I used several utilities like ping, iperf and iftop to monitor the network connection but none showed a slow connection, or a bandwdth bottleneck, or any packages being lost.

So question, what is this exception really saying? Is the connection really timing out? If so, is there a way I can increase the time out value? Also is a 523GB database too big to replicate in one start of the slave? All the configurations being used are default . The master stops and starts just fine. It is just a problem starting the slave with an empty datastore directory.

Sincerely,
John Lewis

Peter Neubauer

unread,

Mar 2, 2013, 1:41:47 PM3/2/13

to Neo4j User

John,

this should be investigated. Is there any chance we can look at that server? Otherwise, can you replicate it with fake data?

/peter

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

Graph database introduction book for the uninitiated - http://graphdatabases.com

Peter Neubauer

unread,

Mar 4, 2013, 3:04:27 AM3/4/13

to ne...@googlegroups.com

John,

any progress on this?

/peter

John Lewis

unread,

Mar 4, 2013, 9:05:26 AM3/4/13

to ne...@googlegroups.com

Hey Mr. Peter,

No progress yet. Don't know if I can give access to my servers yet. I need to talk to my managers. But here some addition information:

The data we are working with is the DBpedia NTriple Files. We initially inserted them into the Neo4J Community Edition using the bpedia4neo utility by Claudio Martella.

The datasets inserted are:

Ontology Infobox Types
Ontology Infobox Properties
Ontology Infobox Properties (Specific)
Titles
Short Abstracts
Extended Abstracts
Geographic Coordinates
Persondata
Articles Categories
Categories (Labels)
Categories (Skos)
IRI-same-as-URI links

We then moved the data store to the Neo4J 1.8.1 Enterprise Edition. Everything worked fined. But after some testing we decide to start a cluster and that is when we came across problems. A fake data set of about 1GB replicated fine but our dbpedia data set keeps failing during slave boot.

I ask again. Is there a configurable timeout value when it comes to replication? If so, how can I try to change it?

Sincerely,
John Lewis

John Lewis

unread,

Mar 4, 2013, 1:50:29 PM3/4/13

to ne...@googlegroups.com

Hey Mr. Peter & Mr. Michael,

We currently still don't know the the exact problem but we were able to start the slave and complete the replication by switching the switch the the servers were using. We will continue to investigating on our end to see what is occurring with that network switch that might be cause this problem. We might just use this work around for our testing purposes and possibly whenever we to production. Sorry for the inconvenience. Thanks for all y'all help.

Sincerely,
John Lewis

Michael Hunger

unread,

Mar 4, 2013, 10:11:32 PM3/4/13

to ne...@googlegroups.com

Great to know,

Any indication on the network/switch level what might be the root cause? Faulty hw?

Cheers

Sent from mobile device

--

Reply all

Reply to author

Forward

0 new messages