I've noticed that I can reproduce the cascading connection timeouts very reliably by setting breakpoints in my code, running in debug mode, and pausing execution on one of them for a surprisingly short time. I've also noticed that this can happen during periods of high load on my servers, and I'm wondering whether the two are related.
Thankfully, since 1.3.0 it recovers when the load goes down or when I resume from a breakpoint.
Now, I do the vast majority of my curatorFramework.getData() or getChildren() operations synchronously, and I'm wondering whether this is the best way to go. Could I avoid the timeout issues during times of high load by using asynchronous calls with a callback function, or even by making the call out to ZooKeeper in its own thread? Could a higher syncLimit setting diminish these issues? What are appropriate values for syncLimit? Is the default sufficient for most situations?
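To make the "own thread" idea concrete, here is a rough sketch of what I have in mind. It uses only java.util.concurrent, not Curator itself: blockingRead is a hypothetical stand-in for a synchronous call such as curatorFramework.getData().forPath(path), and the class name, method name, and fallback value are made up for illustration. The caller waits a bounded time for the read and gives up if it does not finish. I don't know whether this would actually help with the timeouts, since it only keeps the calling thread from blocking indefinitely.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRead {

    // Runs a blocking read on a worker thread and waits at most timeoutMs for it.
    // Returns the fallback value if the read times out or throws.
    // blockingRead is a hypothetical stand-in for a synchronous ZooKeeper read.
    public static <T> T readWithTimeout(ExecutorService pool,
                                        Callable<T> blockingRead,
                                        long timeoutMs,
                                        T fallback) throws InterruptedException {
        Future<T> future = pool.submit(blockingRead);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on this read and interrupt the worker so it does not linger.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The read itself failed (for example with a connection loss).
            return fallback;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A read that returns promptly.
            String fast = readWithTimeout(pool, () -> "data", 1000, "fallback");

            // A read that hangs longer than the caller is willing to wait.
            String slow = readWithTimeout(pool, () -> {
                Thread.sleep(5000);
                return "data";
            }, 100, "fallback");

            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The alternative I mentioned, asynchronous calls with a callback, would presumably go through Curator's inBackground() variants instead of a thread pool of my own.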
I'm interested in any thoughts anyone might have about this.
Thanks,
Matthew
PS - This is an example of the kind of log statement I sometimes see repeatedly in the logs for my app:
...
2013-01-25 13:09:04 | ERROR | | main-EventThread | com.netflix.curator.ConnectionState | Connection timed out for connection string (11.120.101.157:2181,11.120.101.219:2181,11.120.101.155:2181) and timeout (60000) / elapsed (3833841)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:101)
    at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:413)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:301)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:290)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:286)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:278)
    at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:40)