I'm having some problems with Hector, which seems to keep trying to connect to nodes that are marked as dead (see the logs below).
Specifically, one node is marked down and then, immediately afterwards, "discovered" as a new node of the cluster. Connections also seem to keep trying to use that node anyway, which causes many connection timeouts in the logs. This is how I set up the cluster and keyspace:
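import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.FailoverPolicy;
import me.prettyprint.hector.api.ClockResolution;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public Keyspace createKeyspace(String hosts, String keyspaceName, int numRetries) {
    CassandraHostConfigurator chc = new CassandraHostConfigurator(hosts);
    chc.setClockResolution(ClockResolution.MICROSECONDS_SYNC);
    chc.setAutoDiscoverHosts(true);                 // periodically query the ring for new nodes
    chc.setRetryDownedHosts(true);                  // queue failed hosts and retry them later
    chc.setRetryDownedHostsQueueSize(8);
    chc.setRetryDownedHostsDelayInSeconds(120);
    chc.setMaxActive(20);
    chc.setCassandraThriftSocketTimeout(2000);      // milliseconds
    Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", chc);
    ConfigurableConsistencyLevel cp = new ConfigurableConsistencyLevel();
    cp.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
    cp.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
    FailoverPolicy failoverPolicy = new FailoverPolicy(numRetries, 100); // 100 ms sleep between hosts
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster, cp, failoverPolicy);
    return keyspace;
}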
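With both setAutoDiscoverHosts(true) and setRetryDownedHosts(true) enabled, the logs are consistent with a discovery pass that decides "we don't know about" a host by looking only at the active pool, so a host sitting in the retry queue looks new and is added straight back. This is only a hypothesis. The toy model below is NOT Hector's code: the `HostPool` class and its methods are hypothetical and only illustrate the difference between a naive check and one that also consults the downed-host queue.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a client-side host pool; NOT Hector's implementation.
class HostPool {
    final Set<String> active = new LinkedHashSet<>();
    final Set<String> downed = new LinkedHashSet<>(); // hosts waiting in a retry queue

    HostPool(List<String> seeds) { active.addAll(seeds); }

    // A failed connection moves the host from the active pool to the retry queue.
    void markDown(String host) {
        if (active.remove(host)) downed.add(host);
    }

    // Naive discovery: only the active pool is checked, so a downed host
    // looks "unknown" and is put back into rotation immediately.
    Set<String> discoverNaive(List<String> ringEndpoints) {
        Set<String> added = new LinkedHashSet<>();
        for (String h : ringEndpoints) {
            if (!active.contains(h)) { active.add(h); added.add(h); }
        }
        return added;
    }

    // Guarded discovery: hosts already in the retry queue are skipped.
    Set<String> discoverGuarded(List<String> ringEndpoints) {
        Set<String> added = new LinkedHashSet<>();
        for (String h : ringEndpoints) {
            if (!active.contains(h) && !downed.contains(h)) { active.add(h); added.add(h); }
        }
        return added;
    }
}

public class HostPoolDemo {
    public static void main(String[] args) {
        List<String> ring = List.of("xxx.xxx.20.61", "xxx.xxx.20.62", "xxx.xxx.20.63", "xxx.xxx.20.64");

        HostPool naive = new HostPool(ring);
        naive.markDown("xxx.xxx.20.64");
        System.out.println("naive re-added:   " + naive.discoverNaive(ring));   // [xxx.xxx.20.64]

        HostPool guarded = new HostPool(ring);
        guarded.markDown("xxx.xxx.20.64");
        System.out.println("guarded re-added: " + guarded.discoverGuarded(ring)); // []
    }
}
```

One way to test the hypothesis is to run for a while with chc.setAutoDiscoverHosts(false) and see whether the timeouts against the downed node stop.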
10:04:05,827 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:151646312376237217242810867221430285223, end_token:24040424780885293444045389434517205927, endpoints:[xxx.xxx.20.61, xxx.xxx.20.64, xxx.xxx.20.62], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.61, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.62, datacenter:datacenter1, rack:rack1)])
10:04:05,828 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:109111016511119909309889041292459258791, end_token:151646312376237217242810867221430285223, endpoints:[xxx.xxx.20.63, xxx.xxx.20.61, xxx.xxx.20.64], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.63, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.61, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1)])
10:04:05,828 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:24040424780885293444045389434517205927, end_token:66575720646002601376967215363488232359, endpoints:[xxx.xxx.20.64, xxx.xxx.20.62, xxx.xxx.20.63], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.62, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.63, datacenter:datacenter1, rack:rack1)])
10:04:05,828 INFO NodeAutoDiscoverService:70 - Found 1 new host(s) in Ring
10:04:05,829 INFO NodeAutoDiscoverService:72 - Addding found host xxx.xxx.20.64(xxx.xxx.20.64):9160 to pool
10:04:07,830 ERROR HConnectionManager:119 - Transport exception host to HConnectionManager: xxx.xxx.20.64(xxx.xxx.20.64):9160
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to xxx.xxx.20.64(xxx.xxx.20.64):9160 , java.net.SocketTimeoutException: connect timed out
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:144)
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:26)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:147)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.<init>(ConcurrentHClientPool.java:53)
at me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy.createConnection(RoundRobinBalancingPolicy.java:67)
at me.prettyprint.cassandra.connection.HConnectionManager.addCassandraHost(HConnectionManager.java:112)
at me.prettyprint.cassandra.connection.NodeAutoDiscoverService.doAddNodes(NodeAutoDiscoverService.java:74)
at me.prettyprint.cassandra.connection.NodeAutoDiscoverService$QueryRing.run(NodeAutoDiscoverService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:138)
... 16 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
at java.net.Socket.connect(Socket.java:529)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
... 18 more
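The exception is a connect timeout rather than "Connection refused", and the gap between "Addding found host" (10:04:05,829) and the error (10:04:07,830) is about 2 s, which appears to match setCassandraThriftSocketTimeout(2000). A timeout usually means packets to xxx.xxx.20.64 are being dropped (host down or filtered) rather than the port being closed. A minimal sketch for checking reachability of the Thrift port from the client machine with the same timeout, using only java.net.Socket (the class name `ThriftPortProbe` is hypothetical):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ThriftPortProbe {
    // Returns true if a TCP connection to host:port is established within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // SocketTimeoutException ("connect timed out"), ConnectException, UnknownHostException, ...
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        System.out.println(host + ":" + port + " reachable within 2000 ms: " + canConnect(host, port, 2000));
    }
}
```

Running it against each node listed in the ring, for example `java ThriftPortProbe xxx.xxx.20.64 9160` with the real address, shows whether the problem is network reachability or the client re-adding the host.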