Hi,
When I run an intensive query, I notice the following messages scrolling in the compute node's log:
2013-10-31 20:32:34,188 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
2013-10-31 20:32:34,188 INFO [ZkCoordinator-0] com.metamx.druid.coordination.ZkCoordinator - Ignoring event[PathChildrenCacheEvent{type=CONNECTION_RECONNECTED, data=null}]
2013-10-31 20:32:34,304 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2013-10-31 20:32:34,304 INFO [ZkCoordinator-0] com.metamx.druid.coordination.ZkCoordinator - Ignoring event[PathChildrenCacheEvent{type=CONNECTION_SUSPENDED, data=null}]
and
2013-10-31 20:40:04,557 ERROR [CuratorFramework-0] org.apache.curator.ConnectionState - Connection timed out for connection string (10.5.11.24:2181) and timeout (15000) / elapsed (23693)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:108)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:455)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.performBackgroundOperation(GetChildrenBuilderImpl.java:175)
at org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:57)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:659)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:651)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:54)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:242)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
May I ask what this means? Is the latency too high to connect to ZooKeeper? One thing worth mentioning is that our Druid cluster is in California and the ZooKeeper instance is in Virginia. Would the fix be to set up a new ZooKeeper closer to the cluster, or am I missing something else in the config?
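To help narrow this down, here is a minimal sketch of how the raw network latency from a compute node to the ZooKeeper host could be measured and compared against the timeout from the log. The helper names connect_latency_ms and exceeds_timeout are my own, not part of Druid or Curator, and I am assuming the timeout and elapsed values in the log are in milliseconds. A TCP connect only takes roughly one network round trip, so this says nothing about ZooKeeper session handling itself.

```python
import socket
import time


def connect_latency_ms(host, port, timeout_s=5.0):
    """Return the time in milliseconds needed to open a TCP connection.

    This is only a rough proxy for network round-trip time; it does not
    speak the ZooKeeper protocol.
    """
    start = time.monotonic()
    # create_connection raises OSError if the host is unreachable
    # or the connect does not finish within timeout_s.
    with socket.create_connection((host, port), timeout=timeout_s):
        pass
    return (time.monotonic() - start) * 1000.0


def exceeds_timeout(elapsed_ms, timeout_ms):
    """True when the elapsed time is larger than the configured timeout."""
    return elapsed_ms > timeout_ms
```

Running connect_latency_ms("10.5.11.24", 2181) a few times from a compute node, both idle and during an intensive query, would show whether the cross-country link alone comes anywhere near the 15000 timeout. With the values from the log, exceeds_timeout(23693, 15000) is true.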
Thanks,
Rui