Plugin stops connecting to ZooKeeper during partition


Bruno Bonacci

Feb 11, 2016, 10:20:43 AM
to elasticsearch-zookeeper

Hi,
I ran into the following problem running ELS 1.6.2 with ZooKeeper 3.4.6 and the els-zookeeper plugin 1.6.1.

It looks like the plugin lost its connection to ZooKeeper because of a network partition. When it tried to reconnect, it got an UnknownHostException, presumably caused by the same partition,
but rather than pausing and retrying after a while, the plugin just gave up and left the cluster.

I've included the relevant logs below.
 
---
[2016-02-10 06:15:35,746][INFO ][org.apache.zookeeper.ClientCnxn] Client session timed out, have not heard from server in 26679ms for sessionid 0x34f3ad0a3294272, closing socket connection and attempting reconnect
[2016-02-10 06:15:35,943][INFO ][org.apache.zookeeper.ClientCnxn] Opening socket connection to server ip-10-10-1-5.eu-west-1.compute.internal/10.10.1.5:2181. Will not attempt to authenticate using SASL (unknown error)
[2016-02-10 06:15:35,944][INFO ][org.apache.zookeeper.ClientCnxn] Socket connection established to ip-10-10-1-5.eu-west-1.compute.internal/10.10.1.5:2181, initiating session
[2016-02-10 06:15:35,953][INFO ][org.apache.zookeeper.ClientCnxn] Unable to reconnect to ZooKeeper service, session 0x34f3ad0a3294272 has expired, closing socket connection
[2016-02-10 06:15:35,961][INFO ][com.sonian.elasticsearch.zookeeper.client.ZooKeeperClientService] [Omen] Restarting ZooKeeper discovery
[2016-02-10 06:15:35,961][INFO ][org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=zookeeper.service.consul:2181 sessionTimeout=60000 watcher=com.sonian.elasticsearch.zookeeper.client.ZooKeeperClientService$1@463dc8
[2016-02-10 06:15:35,999][ERROR][org.apache.zookeeper.ClientCnxn] Caught unexpected throwable
org.elasticsearch.ElasticsearchException: Cannot start ZooKeeper
        at com.sonian.elasticsearch.zookeeper.client.ZooKeeperFactory.newZooKeeper(ZooKeeperFactory.java:61)
        at com.sonian.elasticsearch.zookeeper.client.ZooKeeperClientService.doStart(ZooKeeperClientService.java:91)
        at com.sonian.elasticsearch.zookeeper.client.ZooKeeperClientService$19.processResult(ZooKeeperClientService.java:517)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.UnknownHostException: zookeeper.service.consul: unknown error
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1255)
        at java.net.InetAddress.getAllByName(InetAddress.java:1171)
        at java.net.InetAddress.getAllByName(InetAddress.java:1105)
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
        at com.sonian.elasticsearch.zookeeper.client.ZooKeeperFactory.newZooKeeper(ZooKeeperFactory.java:59)
        ... 4 more
[2016-02-10 06:15:35,999][INFO ][org.apache.zookeeper.ClientCnxn] EventThread shut down

---

The main reason for using this plugin is to get a robust solution in the presence of network partitions,
and it seems to me that this bug negates the benefits of the tool itself.
Is there any chance of getting this issue fixed, even if only in a recent ELS version?
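
To illustrate the behaviour I would expect instead, here is a minimal sketch in plain Java. It is not the plugin's code: RetryingConnect, connectWithRetry and its parameters are hypothetical names I made up, and the Callable only stands in for whatever creates the ZooKeeper client (the ZooKeeperFactory.newZooKeeper call in the trace above). The idea is simply to treat UnknownHostException as transient and retry with a capped backoff before giving up.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the plugin or of ZooKeeper.
final class RetryingConnect {

    /**
     * Calls connect until it succeeds. An UnknownHostException is treated as
     * transient: sleep, double the delay (capped at maxDelayMs) and try again.
     * The exception is rethrown only after maxAttempts failed attempts.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts,
                                  long initialDelayMs, long maxDelayMs) throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.call();
            } catch (UnknownHostException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up only after the configured number of attempts
                }
                Thread.sleep(delay);                      // pause before retrying
                delay = Math.min(delay * 2, maxDelayMs);  // capped exponential backoff
            }
        }
    }
}
```

With something like this around the client creation, a DNS failure during the partition would delay the reconnect instead of ending discovery. The attempt limit and delays are arbitrary here and would presumably need to be configurable.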

Bruno
