Immediately after pushing out the new version we noticed the open Thrift connections to the C* rings began climbing (see attachment). Prior to this we were seeing a maximum of about 75 connections to a single ring. Before we were able to roll this back, the connection count shot up to 250 and was still climbing.
I did run this a bit longer on just one node in our app tier and found that it did grow to the maxConnectionsPerHost level: 8 connections per host, for a total of 264 connections across our 33-node ring (the actual value was 265, but I figure there's a buffer of one somewhere).
I haven't changed any other pooling options with this deploy. Is it possible the idle connection handling is causing the connection count to grow instead of idle connections being reclaimed?
How can I monitor the total number of requests? Will the request timer give me that? http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Metrics.html#getRequestsTimer()
> I haven't changed any other pooling options with this deploy. Is it possible the idle connection handling is causing the connection count to grow instead of idle connections being reclaimed?

Possibly, but my thought is that if you are running into errors and the driver is reacting poorly (not decrementing its in-flight request count), connections could appear busier than they actually are, causing more to be created. I don't expect this to be happening (I'm running a stress scenario against 2.0.8 right now and am not observing it at the moment). I'm assuming you and Jared Kuolt are working together on this; he mentioned in his post that you are encountering timeouts caused by a spike in traffic.
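For reference, here is a minimal sketch of pinning those pool limits explicitly with 2.0's PoolingOptions. The numbers shown are what I believe the 2.0 defaults to be (2 core / 8 max LOCAL connections per host), which would line up with the 8-per-host ceiling you observed, so treat them as illustration rather than a recommendation:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

public class PoolingConfigSketch {
    // Builds a Cluster with the per-host pool limits spelled out explicitly.
    public static Cluster build(String contactPoint) {
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)  // pool starts at 2 per host
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 8);  // and may grow to 8 per host
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withPoolingOptions(pooling)
                .build();
    }
}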
> How can I monitor the total number of requests? Will the request timer give me that? http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Metrics.html#getRequestsTimer()

Good catch, the requestsTimer is a great resource for this. A new request context is created for each request handler and is completed whenever the request completes or fails, so the timer gives you both the response latency percentiles and the rate of requests. If you have a way to compare this before and after, that would be really interesting to see. If the metrics are similar between 2.0.6 and 2.0.8 for both latencies and rate, that would indicate to me that something is going askew in the driver; if they differ, that could indicate a change in behavior, whether better application, driver, or Cassandra ring performance (being able to push more requests through).
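A rough sketch of sampling it, assuming the 2.0 Metrics API (getRequestsTimer() plus the open-connections gauge) and that the timer's snapshot values are in the Timer's native nanoseconds:

import com.codahale.metrics.Snapshot;
import com.codahale.metrics.Timer;
import com.datastax.driver.core.Cluster;

public class DriverMetricsSample {
    // Logs one sample of request count, rate, latency percentiles and open connections.
    // Call periodically (or wire the metrics into a reporter) to compare runs.
    public static void logOnce(Cluster cluster) {
        Timer requests = cluster.getMetrics().getRequestsTimer();
        Snapshot latencies = requests.getSnapshot();  // latency distribution, in nanoseconds
        System.out.printf("requests=%d rate(1m)=%.1f/s p50=%.1fms p99=%.1fms openConnections=%d%n",
                requests.getCount(),
                requests.getOneMinuteRate(),
                latencies.getMedian() / 1000000.0,            // ns -> ms
                latencies.get99thPercentile() / 1000000.0,    // ns -> ms
                cluster.getMetrics().getOpenConnections().getValue());
    }
}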
I reverted the driver bump and pushed out a version just tracking this metric. I'll let that run overnight and then try a deploy of the new version again to compare the request latency/rate before and after.
--
Olivier Michallat
Driver & tools engineer, DataStax