Request timeout and Mark host down

Daning Wang

unread,

Apr 10, 2012, 3:10:27 PM4/10/12

to hector...@googlegroups.com

Hi all, We are using Hector(v1.0.4) and often we see lots of timeout exceptions in the log, I know that the hector can failover to other node, but I'd like to reduce the number of timeouts. any hector parameter I should change to reduce this error? We have already set CasssandraHostCOnfigurator.cassandraThriftSocketTimeout to 3000ms, and I think it is already pretty high. Thanks in advance. Here is the exceptionme.prettyprint.hector.api.

exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
        at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99)
        at com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
        at com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
        at com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
        at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
        at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
        at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
        at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
        at com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
        at com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
        at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
        at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698)
        at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682)
        at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106)
        ... 21 more

Here is the error in log 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.123.83.114(10.123.83.114):9160 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.6.115.239(10.6.115.239):9160 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 12/04/04 15:13:49 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.120.205.48(10.120.205.48):9160 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.20.200(10.28.20.200):9160 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18

Nate McCall

unread,

Apr 10, 2012, 3:21:01 PM4/10/12

to hector...@googlegroups.com

How big are the queries you are executing?

HTimeoutException is a very thin wrapper around TimeoutException
coming out of Cassandra. It ususally means you're slice size is too
big and/or your cluster is under-provisioned.

Daning Wang

unread,

Apr 10, 2012, 3:51:56 PM4/10/12

to hector...@googlegroups.com

Thanks Nate! we do column slice "select first 1 ver0..ver99999 from....", column slice start will ver0, and end with a very big number(ver999999). but we really only keep 3 columns, so it should not cause performance issue, right?

and return value is a blob, it is between 2k-4k.

Thanks,

Daning

Reply all

Reply to author

Forward