Hi Team,
We are seeing a large number of exceptions about the ZooKeeper connection in the logs of one of our Druid historical nodes. We first noticed this issue a few days ago, and after we restarted the problematic host it has now happened again. The other historical node, which has the same configuration, shows no such issue.
The exceptions look like this:
2014-11-02 19:44:44,607 INFO [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2014-11-02 19:44:44,607 WARN [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-11-02 19:44:44,607 INFO [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: LOST
2014-11-02 19:44:44,607 WARN [ServerInventoryView-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-11-02 19:44:44,607 ERROR [ServerInventoryView-0-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:695)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:593)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
The ZooKeeper version we are running is 3.4.6. Is this a Druid issue or a ZooKeeper/Curator one?
These exceptions look similar to the ones in
https://groups.google.com/forum/#!topic/druid-development/BtOyYgwcDjQ but they are not exactly the same, and I could not find any 'reinstating' messages in the log files for the past 12+ hours.
Could you please help? A zipped log file is attached.
Thanks,
xulu