I am having trouble setting up a simple Neo4j HA master-slave relationship between two computers over a VPN. I understand that three machines are the recommended minimum, but I don't believe that is the issue I am facing.
Any advice on what I am doing wrong would be very much appreciated!
Step 0)
Master(1) can ping Slave(2) on 10.0.0.2
Slave(2) can ping Master(1) on 10.0.0.1
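Ping only exercises ICMP, so a TCP-level check of the ports used in the configurations in this post may be a useful extra step. The following is a minimal sketch, not part of ZooKeeper or Neo4j: the helper names `can_connect` and `check_ports` are made up for illustration, and which ports are actually expected to be open can depend on the ZooKeeper version and on which node is currently the leader, so an "unreachable" result is a hint rather than proof.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(targets, timeout=3.0):
    """Return {(host, port): bool} for every (host, port) pair in targets."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}


# Client and election ports taken from the two coord.cfg files in this post.
# (Hypothetical selection; adjust to whatever ports matter in your setup.)
TARGETS = [
    ("10.0.0.1", 2181),
    ("10.0.0.1", 3888),
    ("10.0.0.2", 2182),
    ("10.0.0.2", 3889),
]
```

Running `check_ports(TARGETS)` on each machine in turn shows which of the other machine's ports accept connections across the VPN.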
Step 1) I start ZooKeeper on the Master(1) using this configuration:
===conf/coord.cfg===
tickTime=3000
initLimit=20
syncLimit=10
server.1=127.0.0.1:2888:3888
server.2=10.0.0.2:2889:3889
dataDir=data/coordinator/
clientPort=2181
===data/coordinator/myid===
1
Step 2) I start ZooKeeper on the Slave(2) using this configuration:
===conf/coord.cfg===
tickTime=3000
initLimit=20
syncLimit=10
server.1=10.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
dataDir=data/coordinator/
clientPort=2182
===data/coordinator/myid===
2
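The two coord.cfg files differ only in their `server.N` lines and `clientPort`, so it can help to compare the `server.N` entries programmatically. The sketch below is illustrative only: `parse_servers` and `loopback_ids` are hypothetical helper names, and the script merely reports which entries use a loopback address. Whether that has any bearing on the failure described here is not established.

```python
import ipaddress


def parse_servers(cfg_text):
    """Map server id -> (host, quorum_port, election_port) from server.N lines."""
    servers = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        sid = int(key.strip()[len("server."):])
        host, quorum_port, election_port = value.strip().split(":")
        servers[sid] = (host, int(quorum_port), int(election_port))
    return servers


def loopback_ids(servers):
    """Return the sorted ids of servers whose host is a loopback IP literal."""
    ids = []
    for sid, (host, _, _) in sorted(servers.items()):
        try:
            if ipaddress.ip_address(host).is_loopback:
                ids.append(sid)
        except ValueError:
            pass  # host is a name rather than an IP literal; skip it
    return ids


# server.N lines copied from the two configuration files above.
MASTER_CFG = "server.1=127.0.0.1:2888:3888\nserver.2=10.0.0.2:2889:3889\n"
SLAVE_CFG = "server.1=10.0.0.1:2888:3888\nserver.2=127.0.0.1:2889:3889\n"

for name, cfg in (("Master(1)", MASTER_CFG), ("Slave(2)", SLAVE_CFG)):
    servers = parse_servers(cfg)
    print(name, servers, "loopback entries:", loopback_ids(servers))
```

In both files the entry that uses 127.0.0.1 is the machine's own id (1 on the Master, 2 on the Slave), so each machine sees a different address list for the same ensemble.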
Step 3) I confirm that ZooKeeper is communicating correctly:
Master(1):
===data/log/neo4j-zookeeper.log===
08-08-2012 23:42:21,771 PDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir data/coordinator/version-2 snapdir data/coordinator/version-2
08-08-2012 23:42:21,782 PDT INFO WorkerReceiver Thread org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 2 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 2 (n.sid), FOLLOWING (my state)
08-08-2012 23:42:21,875 PDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.quorum.Learner - Getting a snapshot from leader
08-08-2012 23:42:21,878 PDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.quorum.Learner - Setting leader epoch 1
08-08-2012 23:42:21,891 PDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.persistence.FileTxnSnapLog - Snapshotting: 100000000
Slave(2):
===data/log/neo4j-zookeeper.log===
09-08-2012 01:42:17,813 CDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2182 org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir data/coordinator/version-2 snapdir data/coordinator/version-2
09-08-2012 01:42:17,817 CDT INFO QuorumPeer:/0:0:0:0:0:0:0:0:2182 org.apache.zookeeper.server.persistence.FileTxnSnapLog - Snapshotting: 0
09-08-2012 01:42:17,886 CDT INFO LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.LearnerHandler - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@594bfbf1
09-08-2012 01:42:17,887 CDT WARN LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.LearnerHandler - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x100000000
09-08-2012 01:42:17,936 CDT WARN LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.Leader - Commiting zxid 0x100000000 from /10.0.0.2:2889 not first!
09-08-2012 01:42:17,936 CDT WARN LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.Leader - First is 0
09-08-2012 01:42:17,936 CDT INFO LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.Leader - Have quorum of supporters; starting up and setting last processed zxid: 4294967296
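The two log excerpts are stamped in different time zones (PDT on the Master, CDT on the Slave), which makes them awkward to line up by eye. The sketch below converts the first timestamp of each excerpt to UTC. It rests on two assumptions that the logs themselves do not state: that the dates are day-first (dd-MM-yyyy), and that PDT and CDT mean UTC-7 and UTC-5. The helper name `to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for the zone abbreviations that appear in the logs.
OFFSETS = {
    "PDT": timezone(timedelta(hours=-7)),
    "CDT": timezone(timedelta(hours=-5)),
}


def to_utc(stamp):
    """Convert 'dd-MM-yyyy HH:mm:ss,SSS ZZZ' (assumed day-first) to UTC."""
    date_part, time_part, zone = stamp.split()
    local = datetime.strptime(f"{date_part} {time_part}", "%d-%m-%Y %H:%M:%S,%f")
    return local.replace(tzinfo=OFFSETS[zone]).astimezone(timezone.utc)


# First timestamp of each excerpt above.
master_first = to_utc("08-08-2012 23:42:21,771 PDT")
slave_first = to_utc("09-08-2012 01:42:17,813 CDT")

print("Master(1):", master_first.isoformat())
print("Slave(2): ", slave_first.isoformat())
print("difference in seconds:", (master_first - slave_first).total_seconds())
```

Under those assumptions the two excerpts start about four seconds apart (06:42:17 and 06:42:21 UTC on 9 August 2012), so they plausibly describe the same leader election seen from both sides.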
Step 4) I start the Neo4j (Embedded!) server on the Master(1) using this configuration:
===neo4j.properties===
enable_online_backup=true
ha.server_id=1
ha.cluster_name = blockviewer.cluster
ha.pull_interval = 10
enable_remote_shell=true
Step 5) I attempt to start the Neo4j (Stand-Alone!) server on the Slave(2) using this configuration. Please note I have forced this node to be a slave:
===conf/neo4j.properties===
enable_online_backup=true
ha.server_id=2
ha.slave_coordinator_update_mode=none
ha.cluster_name = blockviewer.cluster
ha.pull_interval = 10
Step 6) After I try to start the Slave(2), the Slave(2) fails to start and the Master(1) prints errors to its console.
Slave(2):
===data/log/neo4j.0.0.log===
Aug 09, 2012 1:23:32 AM org.neo4j.server.logging.Logger log
SEVERE:
java.lang.RuntimeException: Tried to join the cluster, but was unable to
at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:603)
at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:265)
at org.neo4j.kernel.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:185)
at org.neo4j.server.enterprise.EnterpriseNeoServerBootstrapper$DatabaseMode$2.createDatabase(EnterpriseNeoServerBootstrapper.java:54)
at org.neo4j.server.database.Database.createDatabase(Database.java:81)
at org.neo4j.server.database.Database.<init>(Database.java:64)
at org.neo4j.server.NeoServerWithEmbeddedWebServer.startDatabase(NeoServerWithEmbeddedWebServer.java:175)
at org.neo4j.server.NeoServerWithEmbeddedWebServer.start(NeoServerWithEmbeddedWebServer.java:93)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:87)
at org.neo4j.server.advanced.AdvancedNeoServerBootstrapper.start(AdvancedNeoServerBootstrapper.java:37)
at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:52)
Caused by: java.lang.RuntimeException: Gave up trying to copy store from master
at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:493)
at org.neo4j.kernel.HighlyAvailableGraphDatabase.start(HighlyAvailableGraphDatabase.java:578)
... 10 more
Caused by: org.neo4j.kernel.ha.zookeeper.NoMasterException: No master
at org.neo4j.kernel.ha.zookeeper.AbstractZooKeeperManager$1.noMasterException(AbstractZooKeeperManager.java:474)
at org.neo4j.kernel.ha.zookeeper.AbstractZooKeeperManager$1.copyStore(AbstractZooKeeperManager.java:504)
at org.neo4j.kernel.HighlyAvailableGraphDatabase.copyStoreFromMaster(HighlyAvailableGraphDatabase.java:776)
at org.neo4j.kernel.HighlyAvailableGraphDatabase.getFreshDatabaseFromMaster(HighlyAvailableGraphDatabase.java:478)
... 11 more
Aug 09, 2012 1:23:32 AM org.neo4j.server.logging.Logger log
SEVERE: Failed to start Neo Server on port [7474]
Aug 09, 2012 1:24:30 AM org.neo4j.server.logging.Logger log
INFO: Store files missing, or not in suitable state for upgrade. Leaving this problem for main server process to resolve.
Aug 09, 2012 1:24:31 AM org.neo4j.server.logging.Logger log
INFO: Starting Neo Server on port [7474] with [80] threads available
Aug 09, 2012 1:24:31 AM org.neo4j.server.logging.Logger log
INFO: Enabling HTTPS on port [7473]
Aug 09, 2012 1:24:31 AM org.mortbay.log.Slf4jLog info
INFO: Logging to org.slf4j.impl.JDK14LoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
Aug 09, 2012 1:24:31 AM org.neo4j.server.logging.Logger log
INFO: Using database at /opt/neo4j-enterprise-1.7.2/data/graph.db
Master(1):
(Printed to console)
java.nio.channels.ClosedChannelException
at org.jboss.netty.channel.socket.nio.NioWorker.cleanUpWriteBuffer(NioWorker.java:623)
at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:599)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:119)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
at org.jboss.netty.channel.Channels.close(Channels.java:720)
at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:200)
at org.neo4j.com.Server$ServerHandler.exceptionCaught(Server.java:237)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:331)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
at sun.nio.ch.IOUtil.read(IOUtil.java:186)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:321)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
In short, the Slave(2) fails to start and the Master(1) throws the exceptions shown above. Any thoughts would be very much appreciated!