Cannot open channel to 2 at election address ERROR while starting zookeeper

2,448 views
Skip to first unread message

Check Peck

unread,
Feb 9, 2015, 12:47:26 PM2/9/15
to Jordan Zimmerman, exhibit...@googlegroups.com
I am trying to setup 5 node zookeeper ensemble manage through Exhibitor. I have 5 machines and on each machine I will be running exhibitor and zookeeper. Below is my zoo.cfg file which is generated by exhibitor.

    #Auto-generated by Exhibitor - Mon Feb 09 10:18:35 GMT-07:00 2015
    #Mon Feb 09 10:18:35 GMT-07:00 2015
    server.3=machineC.host.com\:2888\:3888
    server.2=machineB.host.com\:2888\:3888
    server.1=machineA.host.com\:2888\:3888
    initLimit=10
    syncLimit=5
    maxClientCnxns=21000
    clientPort=2181
    tickTime=2000
    dataDir=/opt/zookeeper/data
    dataLogDir=/opt/zookeeper/data
    server.5=machineD.host.com\:2888\:3888
    server.4=machineE.host.com\:2888\:3888
   
As soon as I am starting zookeeper throughg Exhibitor config pannel, I can see all the five machines in my control panel but they all are yellow which means "ZooKeeper is running, but can’t communicate with the rest of the ensemble" and in my Exhbitor logs, I am seeing these which has some ERROR in it.

    dev
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Exhibitor started [main]
    INFO  org.mortbay.log  Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog [main]
    INFO  org.mortbay.log  jetty-6.1.x [main]
    INFO  org.mortbay.log  Started SocketC...@0.0.0.0:8080 [main]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  State: not serving [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-1]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 30005 of 40000 ms before restarting [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-1]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Restarting down/not-serving ZooKeeper after 60014 ms pause [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-3]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-2]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-3]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 30005 of 40000 ms before restarting [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Restarting down/not-serving ZooKeeper after 60008 ms pause [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/zookeeper/zookeeper-3.4.6/bin/zkServer.sh [ActivityQueue-0]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: -Xmx2048m -Djava.net.preferIPv4Stack=true [pool-2-thread-3]
    ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg [pool-2-thread-2]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-3]
    INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 30004 of 40000 ms before restarting [ActivityQueue-0]

And in my zookeeper logs, I am seeing these -

    2015-02-09 00:11:19,355 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /opt/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
    2015-02-09 00:11:19,365 [myid:] - INFO  [main:QuorumPeerConfig@340] - Defaulting to majority quorums
    2015-02-09 00:11:19,368 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    2015-02-09 00:11:19,368 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
    2015-02-09 00:11:19,369 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
    2015-02-09 00:11:19,379 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
    2015-02-09 00:11:19,397 [myid:1] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
    2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@959] - tickTime set to 2000
    2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@979] - minSessionTimeout set to -1
    2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@990] - maxSessionTimeout set to -1
    2015-02-09 00:11:19,414 [myid:1] - INFO  [main:QuorumPeer@1005] - initLimit set to 10
    2015-02-09 00:11:19,431 [myid:1] - INFO  [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: machineA.host.com/127.0.1.1:3888
    2015-02-09 00:11:19,440 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer@714] - LOOKING
    2015-02-09 00:11:19,441 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x0
    2015-02-09 00:11:19,443 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
    2015-02-09 00:11:19,445 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address machineB.host.com/10.52.81.211:3888
    java.net.ConnectException: Connection refused
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
            at java.net.Socket.connect(Socket.java:546)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
            at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
            at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
            at java.lang.Thread.run(Thread.java:679)
    2015-02-09 00:11:19,449 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 3 at election address machineC.host.com/10.57.78.941:3888
    java.net.ConnectException: Connection refused
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
            at java.net.Socket.connect(Socket.java:546)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
            at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
            at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
            at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
            at java.lang.Thread.run(Thread.java:679)
    2015-02-09 00:11:19,450 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 4 at election address machineD.host.com/10.59.576.12:3888
   
I am running Exhibitor 1.5.3 and Zookeeper 3.4.6. Is there anything wrong I am doing? I have googled it for this ERROR and I was not able to find anything concrete. I have also verified that it is able to generate myid successfully in each machine.

Jordan Zimmerman

unread,
Feb 9, 2015, 12:48:16 PM2/9/15
to Check Peck, exhibit...@googlegroups.com
Sounds like a firewall/security group issue. Make sure 2181/2888/3888 are all open.

Check Peck

unread,
Feb 9, 2015, 12:51:12 PM2/9/15
to Jordan Zimmerman, exhibit...@googlegroups.com
I see. The only thing which zookeeper is having issues is with election port which is 3888 I guess by looking at the logs. Let me talk to security folks in our company to open the firewall on these port if it is indeed blocked.

Check Peck

unread,
Feb 9, 2015, 7:15:36 PM2/9/15
to Jordan Zimmerman, exhibit...@googlegroups.com
Hi Jordan,

It looks like all those ports are opened on my five machines and there is no firewall issue. I got the confirmation from our security team. Anything else should I try out?

Check Peck

unread,
Feb 10, 2015, 12:10:56 PM2/10/15
to Jordan Zimmerman, exhibit...@googlegroups.com
Hi Jordan,

Is this known issue? I have seen other people also having same issue after I search on the google?

Jordan Zimmerman

unread,
Feb 10, 2015, 12:14:21 PM2/10/15
to Check Peck, exhibit...@googlegroups.com
You might ask on the ZK user group. I don’t think this is an Exhibitor specific issue.
Reply all
Reply to author
Forward
0 new messages