I am trying to setup 5 node zookeeper ensemble manage through Exhibitor. I have 5 machines and on each machine I will be running exhibitor and zookeeper. Below is my zoo.cfg file which is generated by exhibitor.
#Auto-generated by Exhibitor - Mon Feb 09 10:18:35 GMT-07:00 2015
#Mon Feb 09 10:18:35 GMT-07:00 2015
server.3=
machineC.host.com\:2888\:3888
server.2=
machineB.host.com\:2888\:3888
server.1=
machineA.host.com\:2888\:3888
initLimit=10
syncLimit=5
maxClientCnxns=21000
clientPort=2181
tickTime=2000
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/data
server.5=
machineD.host.com\:2888\:3888
server.4=
machineE.host.com\:2888\:3888
As soon as I am starting zookeeper throughg Exhibitor config pannel, I can see all the five machines in my control panel but they all are yellow which means "ZooKeeper is running, but can’t communicate with the rest of the ensemble" and in my Exhbitor logs, I am seeing these which has some ERROR in it.
dev
INFO com.netflix.exhibitor.core.activity.ActivityLog Exhibitor started [main]
INFO org.mortbay.log Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [main]
INFO org.mortbay.log jetty-6.1.x [main]
INFO org.mortbay.log Started
SocketC...@0.0.0.0:8080 [main]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: not serving [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Kill attempted result: 0 [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30005 of 40000 ms before restarting [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Kill attempted result: 0 [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Restarting down/not-serving ZooKeeper after 60014 ms pause [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Kill attempted result: 0 [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-3]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-3]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30005 of 40000 ms before restarting [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to stop instance [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Kill attempted result: 0 [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: JMX enabled by default [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-3]
ERROR com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-2]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-3]
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]
And in my zookeeper logs, I am seeing these -
2015-02-09 00:11:19,355 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
2015-02-09 00:11:19,365 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2015-02-09 00:11:19,368 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2015-02-09 00:11:19,368 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2015-02-09 00:11:19,369 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2015-02-09 00:11:19,379 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2015-02-09 00:11:19,397 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port
0.0.0.0/0.0.0.0:2181 2015-02-09 00:11:19,414 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2015-02-09 00:11:19,414 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2015-02-09 00:11:19,414 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2015-02-09 00:11:19,414 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 10
2015-02-09 00:11:19,431 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port:
machineA.host.com/127.0.1.1:3888 2015-02-09 00:11:19,440 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer@714] - LOOKING
2015-02-09 00:11:19,441 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2015-02-09 00:11:19,443 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2015-02-09 00:11:19,445 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address
machineB.host.com/10.52.81.211:3888 java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:679)
2015-02-09 00:11:19,449 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 3 at election address
machineC.host.com/10.57.78.941:3888 java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:679)
2015-02-09 00:11:19,450 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 4 at election address
machineD.host.com/10.59.576.12:3888 I am running Exhibitor 1.5.3 and Zookeeper 3.4.6. Is there anything wrong I am doing? I have googled it for this ERROR and I was not able to find anything concrete. I have also verified that it is able to generate myid successfully in each machine.