Hector keeps retrying connections to dead nodes

Filippo Diotalevi

Jun 6, 2013, 9:10:29 AM

to hector...@googlegroups.com

Hi,

I'm having some problems with Hector, which seems to keep trying to connect to nodes that are marked as dead (see the logs below the signature).

Specifically, one node is marked down and then, immediately afterwards, "discovered" as a new node of the cluster. Connections also seem to try to use the node anyway, causing many timeouts in the logs.

Cassandra version: 1.2

Hector version: 1.0-5

Hector initialisation:

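// hosts, numRetries and keyspaceName are defined elsewhere in the enclosing method
CassandraHostConfigurator chc = new CassandraHostConfigurator(hosts);
chc.setClockResolution(ClockResolution.MICROSECONDS_SYNC); // microsecond-resolution timestamps
chc.setAutoDiscoverHosts(true);             // periodically read the ring and add hosts not yet in the pool
chc.setRetryDownedHosts(true);              // retry downed hosts in the background
chc.setRetryDownedHostsQueueSize(8);        // size of the downed-host retry queue
chc.setRetryDownedHostsDelayInSeconds(120); // wait 120 s between retry runs
chc.setMaxActive(20);                       // at most 20 active connections per host
chc.setCassandraThriftSocketTimeout(2000);  // Thrift socket timeout, in milliseconds
Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", chc);

ConfigurableConsistencyLevel cp = new ConfigurableConsistencyLevel();
cp.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
cp.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

FailoverPolicy failoverPolicy = new FailoverPolicy(numRetries, 100); // numRetries attempts, 100 ms sleep between hosts
Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster, cp, failoverPolicy);
return keyspace;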

Can anyone shed some light?

Thanks,

--

Filippo

--- Logs follow ------

10:03:59,156 WARN CassandraHostRetryService:213 - Downed cassandra-04.stag.vvvvvvv.com(xxx.xxx.20.64):9160 host still appears to be down: Unable to open transport to cassandra-04.stag.vvvvvvv.com(xxx.xxx.20.64):9160 , java.net.SocketTimeoutException: connect timed out

10:03:59,156 INFO CassandraHostRetryService:159 - Downed Host retry status false with host: cassandra-04.stag.vvvvvvv.com(xxx.xxx.20.64):9160

10:04:05,827 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:151646312376237217242810867221430285223, end_token:24040424780885293444045389434517205927, endpoints:[xxx.xxx.20.61, xxx.xxx.20.64, xxx.xxx.20.62], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.61, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.62, datacenter:datacenter1, rack:rack1)])

10:04:05,828 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:109111016511119909309889041292459258791, end_token:151646312376237217242810867221430285223, endpoints:[xxx.xxx.20.63, xxx.xxx.20.61, xxx.xxx.20.64], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.63, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.61, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1)])

10:04:05,828 INFO NodeAutoDiscoverService:108 - Found a node we don't know about xxx.xxx.20.64(xxx.xxx.20.64):9160 for TokenRange TokenRange(start_token:24040424780885293444045389434517205927, end_token:66575720646002601376967215363488232359, endpoints:[xxx.xxx.20.64, xxx.xxx.20.62, xxx.xxx.20.63], rpc_endpoints:[0.0.0.0, 0.0.0.0, 0.0.0.0], endpoint_details:[EndpointDetails(host:xxx.xxx.20.64, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.62, datacenter:datacenter1, rack:rack1), EndpointDetails(host:xxx.xxx.20.63, datacenter:datacenter1, rack:rack1)])

10:04:05,828 INFO NodeAutoDiscoverService:70 - Found 1 new host(s) in Ring

10:04:05,829 INFO NodeAutoDiscoverService:72 - Addding found host xxx.xxx.20.64(xxx.xxx.20.64):9160 to pool

10:04:07,830 ERROR HConnectionManager:119 - Transport exception host to HConnectionManager: xxx.xxx.20.64(xxx.xxx.20.64):9160
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to xxx.xxx.20.64(xxx.xxx.20.64):9160 , java.net.SocketTimeoutException: connect timed out
    at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:144)
    at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:26)
    at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:147)
    at me.prettyprint.cassandra.connection.ConcurrentHClientPool.<init>(ConcurrentHClientPool.java:53)
    at me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy.createConnection(RoundRobinBalancingPolicy.java:67)
    at me.prettyprint.cassandra.connection.HConnectionManager.addCassandraHost(HConnectionManager.java:112)
    at me.prettyprint.cassandra.connection.NodeAutoDiscoverService.doAddNodes(NodeAutoDiscoverService.java:74)
    at me.prettyprint.cassandra.connection.NodeAutoDiscoverService$QueryRing.run(NodeAutoDiscoverService.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
    at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:138)
    ... 16 more
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
    ... 18 more
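The timestamps in the logs line up with the configured Thrift socket timeout: the auto-discovery thread logs "Addding found host" at 10:04:05,829 and the transport error at 10:04:07,830, roughly 2 seconds later, which matches `chc.setCassandraThriftSocketTimeout(2000)`. That is consistent with the stack trace, where `TSocket.open` calls `Socket.connect` and the connect times out. The small helper below only does that arithmetic on the two log lines; the class name `LogGap` and its methods are hypothetical illustrations, not part of Hector.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Hector): measures the gap between two log lines.
class LogGap {
    // Log lines in this thread begin with an "HH:mm:ss,SSS" timestamp.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Parses the leading timestamp (the first space-separated token) of a log line.
    static LocalTime timestampOf(String logLine) {
        return LocalTime.parse(logLine.split(" ", 2)[0], TS);
    }

    // Milliseconds elapsed between the timestamps of two log lines from the same day.
    static long gapMillis(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later)).toMillis();
    }

    public static void main(String[] args) {
        String added = "10:04:05,829 INFO NodeAutoDiscoverService:72 - Addding found host xxx.xxx.20.64(xxx.xxx.20.64):9160 to pool";
        String failed = "10:04:07,830 ERROR HConnectionManager:119 - Transport exception host to HConnectionManager: xxx.xxx.20.64(xxx.xxx.20.64):9160";
        long configuredTimeoutMillis = 2000; // value passed to chc.setCassandraThriftSocketTimeout(...)
        System.out.println("gap = " + gapMillis(added, failed)
                + " ms, configured socket timeout = " + configuredTimeoutMillis + " ms");
    }
}
```

Running it prints `gap = 2001 ms, configured socket timeout = 2000 ms`, i.e. the discovery thread spends about one full connect timeout each time it tries to re-add the unreachable host.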

Patricio Echagüe

Jun 6, 2013, 11:36:45 AM

to hector-users

That is expected. Is the node decommissioned?

If not, it will still be discoverable.
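A toy model can make this concrete. The logs suggest that auto-discovery builds its candidate list from the ring's token ranges and compares it against the hosts it currently holds connections for: at 10:03:59 the retry service still reports the host as down, yet at 10:04:05 discovery calls the same IP "a node we don't know about". The sketch below assumes exactly that rule; `RingModel` and its methods are hypothetical, and Hector's actual `NodeAutoDiscoverService` logic may differ (for example in how it treats hosts waiting in the downed-host retry queue).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical toy model of ring membership vs. client-side pool state.
// It is NOT Hector's NodeAutoDiscoverService; it only mirrors what the logs suggest.
class RingModel {
    // Endpoints that still own token ranges (what a describe_ring call would list).
    private final Set<String> ringEndpoints = new LinkedHashSet<>();
    // Hosts the client currently has an active connection pool for.
    private final Set<String> activePool = new LinkedHashSet<>();

    RingModel(Collection<String> endpoints) {
        ringEndpoints.addAll(endpoints);
        activePool.addAll(endpoints);
    }

    // The node stops responding: the client drops it from its pool,
    // but the node keeps its token ranges, so it is still part of the ring.
    void markDown(String host) {
        activePool.remove(host);
    }

    // nodetool decommission: the node streams its data to others and leaves the ring.
    void decommission(String host) {
        ringEndpoints.remove(host);
        activePool.remove(host);
    }

    // Assumed discovery rule: every ring endpoint missing from the active pool
    // is reported as "a node we don't know about".
    Set<String> discoverNewHosts() {
        Set<String> found = new LinkedHashSet<>(ringEndpoints);
        found.removeAll(activePool);
        return found;
    }

    public static void main(String[] args) {
        RingModel ring = new RingModel(Arrays.asList(
                "xxx.xxx.20.61", "xxx.xxx.20.62", "xxx.xxx.20.63", "xxx.xxx.20.64"));

        ring.markDown("xxx.xxx.20.64");
        System.out.println("down, still in ring: " + ring.discoverNewHosts());

        ring.decommission("xxx.xxx.20.64");
        System.out.println("after decommission:  " + ring.discoverNewHosts());
    }
}
```

Under these assumptions the down node is reported as `[xxx.xxx.20.64]` on every discovery pass, and the list becomes `[]` only once the node has left the ring.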

Rafael Neves

Jun 6, 2013, 1:16:28 PM

to hector...@googlegroups.com

Use Firebrand, man. It's the best framework for the Hector client.

2013/6/6 Patricio Echagüe <patr...@gmail.com>

--
Regards,
Rafael Neves
Graduating in Systems Analysis and Development
Taking a postgraduate course in Software Engineering
Junior Analyst!!!
Contact: Ranev...@gmail.com

Filippo Diotalevi

Jun 6, 2013, 5:14:38 PM

to hector...@googlegroups.com

On Thursday, June 6, 2013 4:36:45 PM UTC+1, Patricio Echague wrote:

> That is expected. Is the node decommissioned?
>
> If not, it will still be discoverable.

Thanks.

However, that is a temporary failure. I was under the impression that nodetool decommission was only for permanently removing a node.

Is that also the best practice for temporarily removing a node from the ring?

--

Filippo

Patricio Echagüe

Jun 6, 2013, 7:15:38 PM

to hector-users

Not at all. What I was saying is that Hector prints the exception (and that is OK) because it is trying to connect to a node that is in the ring but happens to be down.

If you decommission the node, it will leave the ring permanently.

If what you want is to silence the autoDiscoveryService, you can set a different logger threshold in log4j so that messages from that service are not shown.
