ConnectionTimeout and debugging


mattd...@gmail.com

Jan 28, 2013, 4:43:21 PM
to curato...@googlegroups.com
I've noticed that I can reproduce the cascading connection timeouts very reliably by setting breakpoints in my code, running in debug mode, and pausing execution at one of them for a surprisingly short time. I've also noticed that this can happen during periods of high load on my servers, and I am wondering whether the two are related.

Thankfully, since 1.3.0 it recovers when the load goes down or when I resume from a breakpoint.

Now, I do the vast majority of my curatorFramework.getData() and getChildren() operations synchronously. I'm wondering if this is the best way to go. Could I avoid the timeout issues during times of high load by using asynchronous calls with a callback function, or even by making the call out to ZooKeeper in its own thread? Could a higher syncLimit setting diminish these issues? What are appropriate values for syncLimit? Is the default sufficient for most situations?

I'm interested in any thoughts anyone might have about this.

Thanks,

Matthew

PS: This is an example of the kind of log statement I sometimes see repeated in my app's logs:

...
2013-01-25 13:09:04 | ERROR | | main-EventThread | com.netflix.curator.ConnectionState | Connection timed out for connection string (11.120.101.157:2181,11.120.101.219:2181,11.120.101.155:2181) and timeout (60000) / elapsed (3833841)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:101)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:107)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:413)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:301)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:290)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:286)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:278)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:40)

Jordan Zimmerman

Jan 28, 2013, 5:10:38 PM
to curato...@googlegroups.com
The ZooKeeper client maintains heartbeats with the server. When execution is paused at a breakpoint, the heartbeat can fail and the server will kill the connection. The heartbeat timing is a function of the session timeout. From the ZK code:

readTimeout = sessionTimeout * 2 / 3;

So, you can increase sessionTimeout to a large value if you want to set breakpoints safely.
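To make the formula above concrete, here is a minimal Java sketch that tabulates the client read timeout for a few session timeouts (all values in milliseconds, integer division as in the quoted line). The class name `ZkTimeoutMath` and the helper `pauseLikelyDropsConnection` are hypothetical illustrations, not ZooKeeper or Curator API, and the pause check is only a rough rule of thumb that assumes the pause blocks heartbeats for its whole length.

```java
/**
 * Hypothetical helper (not ZooKeeper/Curator API) mirroring the quoted
 * client formula: readTimeout = sessionTimeout * 2 / 3, in milliseconds.
 */
public class ZkTimeoutMath {

    /** Client read timeout derived from the session timeout (integer division). */
    static int readTimeoutMs(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    /**
     * Rough rule of thumb (an assumption): a pause at least as long as the
     * read timeout, during which no heartbeat can be sent, is likely to
     * cost the connection.
     */
    static boolean pauseLikelyDropsConnection(int sessionTimeoutMs, long pauseMs) {
        return pauseMs >= readTimeoutMs(sessionTimeoutMs);
    }

    public static void main(String[] args) {
        int[] sessionTimeouts = {10000, 60000, 180000};
        for (int s : sessionTimeouts) {
            // Print how long a breakpoint pause can last before the read timeout is hit.
            System.out.println("session " + s + " ms -> read timeout " + readTimeoutMs(s) + " ms");
        }
    }
}
```

With a 60000 ms session timeout, for example, a pause of roughly 40 seconds or more is already enough to trip the read timeout.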

-Jordan 

--
You received this message because you are subscribed to the Google Groups "curator-users" group.
To post to this group, send email to curato...@googlegroups.com.
To unsubscribe from this group, send email to curator-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/curator-users/-/T8TRp3FUWaIJ.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

mattd...@gmail.com

Jan 28, 2013, 5:12:38 PM
to curato...@googlegroups.com
Does that also explain the connection timeouts during times of high/peak load on the server? Is it your standard practice to make your calls asynchronously or in their own thread? Also, what if any are the deleterious effects of setting a ridiculous value for connection timeout?

Jordan Zimmerman

Jan 28, 2013, 5:17:49 PM
to curato...@googlegroups.com
No, that wouldn't explain it. We've seen connection timeouts on servers that were thrashing (GC). Anything that prevents the ZK connection from sending its heartbeat is a problem.
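One way to check whether a server is stalling like this is a simple pause detector: a daemon thread sleeps for a fixed interval and reports whenever it wakes up much later than requested. Stop-the-world GC, CPU starvation, and a suspended JVM all show up as late wake-ups, and these are the same kinds of stalls that can keep the ZooKeeper client from heartbeating. The sketch below is a hypothetical diagnostic written for illustration; `PauseDetector`, its interval, and its warning threshold are assumptions, not part of Curator or ZooKeeper.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical pause detector (not Curator/ZooKeeper API). It sleeps for a
 * fixed interval and warns when the actual elapsed time overshoots the
 * requested sleep by at least warnThresholdMs.
 */
public class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long warnThresholdMs;

    PauseDetector(long intervalMs, long warnThresholdMs) {
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Extra delay beyond the requested sleep, never negative. */
    static long stallMs(long requestedMs, long observedMs) {
        return Math.max(0, observedMs - requestedMs);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop
                return;
            }
            long observedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long stall = stallMs(intervalMs, observedMs);
            if (stall >= warnThresholdMs) {
                System.err.println("JVM stalled for ~" + stall + " ms");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed thresholds: with a 60000 ms session timeout the client read
        // timeout is 40000 ms, so warning at 5000 ms flags stalls well before that.
        Thread t = new Thread(new PauseDetector(100, 5000), "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(1000); // let it run briefly; a healthy JVM prints nothing
        t.interrupt();
    }
}
```

Logging these warnings next to the Curator errors would show whether the connection timeouts line up with JVM-wide stalls rather than with network problems.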

> Also, what if any are the deleterious effects of setting a ridiculous value for connection timeout?
You shouldn't do it. We set the connection timeout to 30000 ms and the session timeout to 180000 ms.
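One caveat when requesting a long session timeout: the ZooKeeper server does not simply accept whatever value the client asks for. It clamps the request to the range [minSessionTimeout, maxSessionTimeout], which default to 2 * tickTime and 20 * tickTime. With the common tickTime of 2000 ms, a requested 180000 ms session would be cut to 40000 ms unless maxSessionTimeout is raised in the server configuration. The sketch below mirrors that clamping; the class and method names are hypothetical.

```java
/**
 * Hypothetical helper (not ZooKeeper API) mirroring how a ZooKeeper server
 * clamps a client's requested session timeout. Defaults assumed when
 * min/maxSessionTimeout are not configured: 2 * tickTime and 20 * tickTime.
 */
public class SessionTimeoutNegotiation {

    /** Clamp the requested timeout into [minMs, maxMs]. */
    static int negotiate(int requestedMs, int minMs, int maxMs) {
        return Math.max(minMs, Math.min(requestedMs, maxMs));
    }

    /** Same clamp using the default bounds derived from tickTime. */
    static int negotiateWithDefaults(int requestedMs, int tickTimeMs) {
        return negotiate(requestedMs, 2 * tickTimeMs, 20 * tickTimeMs);
    }

    public static void main(String[] args) {
        int tickTime = 2000;
        // Default bounds: the 180000 ms request is capped at 20 * 2000 = 40000 ms.
        System.out.println(negotiateWithDefaults(180000, tickTime));
        // With maxSessionTimeout raised (here to an assumed 200000 ms) the request is honored.
        System.out.println(negotiate(180000, 2 * tickTime, 200000));
    }
}
```

If the session timeout you observe is shorter than the one you configured in Curator, this server-side cap is a likely reason.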

-JZ
