Sweet. I think things are indeed a little better with the 0.8.0-3-SNAPSHOT!
In the log, it looks like NodeAutoDiscoverService and
CassandraHostRetryService are fighting each other. I filed
https://github.com/rantav/hector/issues/301 for that.
I do still see sporadic, recurring exceptions surfaced to my app: it
is periodically handed clients for a host that Hector should already
know is down. Eventually I see the following, and then everything
calms down:
HConnectionManager 2011-10-13 00:11:49,344 -- ERROR -- MARK HOST AS DOWN TRIGGERED for host 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:49,344 -- ERROR -- Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 0; NumBeforeExhausted: 49
ConcurrentHClientPool 2011-10-13 00:11:49,344 -- INFO -- Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}
ConcurrentHClientPool 2011-10-13 00:11:49,344 -- INFO -- Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}
I'd rather reach that point within a couple of seconds than after a
couple of minutes. Is there a knob I can twiddle in
CassandraHostConfigurator?
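For scale, here's the arithmetic on timestamps copied from the log below, as a quick Java sketch (LogTiming and its methods are just my own names for illustration, nothing from Hector). It measures from the first app-visible failure for 10.183.1.254 (the WARN at 00:09:09,224) to MARK HOST AS DOWN, plus the spacing between the NodeAutoDiscoverService lines that re-add that host.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class LogTiming {
    // Log timestamps use "HH:mm:ss,SSS" (comma before the milliseconds).
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Time between two same-day log timestamps.
    static Duration elapsed(String from, String to) {
        return Duration.between(LocalTime.parse(from, FMT), LocalTime.parse(to, FMT));
    }

    // Milliseconds between each pair of consecutive timestamps.
    static List<Long> gapsMillis(String... stamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < stamps.length; i++) {
            gaps.add(elapsed(stamps[i - 1], stamps[i]).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // First failure surfaced to the app -> "MARK HOST AS DOWN TRIGGERED".
        Duration toDown = elapsed("00:09:09,224", "00:11:49,344");
        System.out.println("time to mark down: " + toDown.toMillis() + " ms");

        // NodeAutoDiscoverService "Addding found host 10.183.1.254..." timestamps.
        System.out.println("re-add gaps (ms): " + gapsMillis(
            "00:09:17,594", "00:09:47,597", "00:10:17,600", "00:10:47,604",
            "00:11:17,607", "00:11:47,611", "00:12:17,614"));
    }
}
```

That prints 160120 ms (about 2 min 40 s) to mark-down, and re-add gaps of [30003, 30003, 30004, 30003, 30004, 30003], i.e. auto-discovery puts the dead host back roughly every 30 s the whole time.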
Briefly looking at the code, here's a hypothesis:
* HostTimeoutTracker is not relevant, because in my simulation the node
isn't timing out.
* The right thing happens when HConnectionManager gets a
HectorTransportException. (Wave hands re: why this eventually happens.)
* Most of the time I receive java.net.NoRouteToHostException, which
ExceptionsTranslatorImpl does not (but should?) map to
HectorTransportException.
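If the third point holds, the fix would be a mapping along these lines. This is a standalone sketch, not Hector's code: TransportFailure and OtherFailure are hypothetical stand-ins for HectorTransportException and the other Hector exception types, and I'm assuming the socket exception shows up somewhere in the cause chain.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

public class TransportExceptionClassifier {

    // Hypothetical stand-ins; NOT Hector's real exception classes.
    static class TransportFailure extends RuntimeException {
        TransportFailure(Throwable cause) { super(cause); }
    }

    static class OtherFailure extends RuntimeException {
        OtherFailure(Throwable cause) { super(cause); }
    }

    // Walk the cause chain: a socket-level "can't reach this host" error means the
    // host (not the request) is the problem, so classify it as a transport failure.
    static RuntimeException translate(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof NoRouteToHostException
                    || c instanceof ConnectException
                    || c instanceof SocketTimeoutException) {
                return new TransportFailure(t);
            }
        }
        return new OtherFailure(t);
    }

    public static void main(String[] args) {
        // Simulate the NoRouteToHostException arriving wrapped in another exception.
        Exception wrapped = new IOException("Unable to open transport",
                new NoRouteToHostException("Network is unreachable"));
        System.out.println(translate(wrapped).getClass().getSimpleName()); // TransportFailure
        System.out.println(translate(new IllegalStateException("bad request"))
                .getClass().getSimpleName()); // OtherFailure
    }
}
```

Checking the cause chain rather than only the top-level type seems worth it here, since the "Unable to open transport to ... , java.net.NoRouteToHostException" message in the log suggests the socket exception arrives wrapped.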
What do you think? Here's the log with the stack traces stripped. I've
also stripped some of the NodeAutoDiscoverService lines for brevity.
I'm confused by the contradictory suspend/unsuspend messages.
HConnectionManager 2011-10-13 00:09:07,442 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.2.0:9160-247>
HConnectionManager 2011-10-13 00:09:09,223 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:09,223 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:09,224 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-347>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
HConnectionManager 2011-10-13 00:09:10,447 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.2.0:9160-30>
HConnectionManager 2011-10-13 00:09:14,491 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.253:9160-11>
NodeAutoDiscoverService 2011-10-13 00:09:17,594 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:09:17,594 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:27,779 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:30,837 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:30,837 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:30,838 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-348>
HConnectionManager 2011-10-13 00:09:31,817 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:31,817 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:31,818 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-349>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
NodeAutoDiscoverService 2011-10-13 00:09:47,597 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:09:47,597 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:47,780 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:51,165 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:51,165 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:51,166 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-350>
HConnectionManager 2011-10-13 00:09:52,000 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:52,000 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:09:52,001 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-351>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
HConnectionManager 2011-10-13 00:10:07,780 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:10,935 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:10,935 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:10,936 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-336>
HConnectionManager 2011-10-13 00:10:12,184 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:12,184 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:12,185 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-337>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
NodeAutoDiscoverService 2011-10-13 00:10:17,600 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:10:17,600 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
Gossiper 2011-10-13 00:10:21,135 -- INFO -- InetAddress /10.183.1.254 is now dead.
HConnectionManager 2011-10-13 00:10:27,781 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:30,895 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:30,895 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:30,896 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-338>
HConnectionManager 2011-10-13 00:10:32,372 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:32,372 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:32,373 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-339>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
NodeAutoDiscoverService 2011-10-13 00:10:47,604 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:10:47,604 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:47,781 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:50,893 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:50,893 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:50,894 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-340>
HConnectionManager 2011-10-13 00:10:51,809 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:51,809 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:10:51,810 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-341>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
HConnectionManager 2011-10-13 00:11:07,782 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:10,846 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:10,846 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:10,847 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-342>
HConnectionManager 2011-10-13 00:11:11,972 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:11,973 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:11,974 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-343>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
NodeAutoDiscoverService 2011-10-13 00:11:17,607 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:11:17,607 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:27,782 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:31,314 -- INFO -- Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:31,315 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:31,316 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-344>
--->Exception thrown to app: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
HConnectionManager 2011-10-13 00:11:31,735 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:31,735 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:31,736 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-345>
NodeAutoDiscoverService 2011-10-13 00:11:47,611 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:11:47,611 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:47,782 -- INFO -- UN-Suspend operation status was true for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:49,344 -- ERROR -- MARK HOST AS DOWN TRIGGERED for host 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:49,344 -- ERROR -- Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 0; NumBeforeExhausted: 49
ConcurrentHClientPool 2011-10-13 00:11:49,344 -- INFO -- Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}
ConcurrentHClientPool 2011-10-13 00:11:49,344 -- INFO -- Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{10.183.1.254(10.183.1.254):9160}
CassandraHostRetryService 2011-10-13 00:11:49,344 -- INFO -- Host detected as down was added to retry queue: 10.183.1.254(10.183.1.254):9160
CassandraHostRetryService 2011-10-13 00:11:49,345 -- WARN -- Downed 10.183.1.254(10.183.1.254):9160 host still appears to be down: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
HConnectionManager 2011-10-13 00:11:49,345 -- WARN -- Could not fullfill request on this host null
HConnectionManager 2011-10-13 00:11:50,800 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:50,800 -- INFO -- Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:11:50,801 -- WARN -- Could not fullfill request on this host CassandraClient<10.183.1.254:9160-346>
HConnectionManager 2011-10-13 00:11:50,802 -- INFO -- Client CassandraClient<10.183.1.254:9160-346> released to inactive or dead pool. Closing.
CassandraHostRetryService 2011-10-13 00:11:50,841 -- WARN -- Downed 10.183.1.254(10.183.1.254):9160 host still appears to be down: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
CassandraHostRetryService 2011-10-13 00:11:50,841 -- INFO -- Downed Host retry status false with host: 10.183.1.254(10.183.1.254):9160
CassandraHostRetryService 2011-10-13 00:12:00,842 -- WARN -- Downed 10.183.1.254(10.183.1.254):9160 host still appears to be down: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
CassandraHostRetryService 2011-10-13 00:12:00,842 -- INFO -- Downed Host retry status false with host: 10.183.1.254(10.183.1.254):9160
HConnectionManager 2011-10-13 00:12:07,783 -- INFO -- UN-Suspend operation status was false for CassandraHost 10.183.1.254(10.183.1.254):9160
CassandraHostRetryService 2011-10-13 00:12:10,842 -- WARN -- Downed 10.183.1.254(10.183.1.254):9160 host still appears to be down: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
CassandraHostRetryService 2011-10-13 00:12:10,842 -- INFO -- Downed Host retry status false with host: 10.183.1.254(10.183.1.254):9160
NodeAutoDiscoverService 2011-10-13 00:12:17,614 -- INFO -- Addding found host 10.183.1.254(10.183.1.254):9160 to pool
HConnectionManager 2011-10-13 00:12:17,614 -- ERROR -- Transport exception host to HConnectionManager: 10.183.1.254(10.183.1.254):9160
CassandraHostRetryService 2011-10-13 00:12:20,843 -- WARN -- Downed 10.183.1.254(10.183.1.254):9160 host still appears to be down: Unable to open transport to 10.183.1.254(10.183.1.254):9160 , java.net.NoRouteToHostException: Network is unreachable
CassandraHostRetryService 2011-10-13 00:12:20,843 -- INFO -- Downed Host retry status false with host: 10.183.1.254(10.183.1.254):9160
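My best guess at the true/false pairs, assuming "suspend" is an idempotent move of the host's pool from an active map to a suspended map, and "un-suspend" is the reverse move. This is a toy model with hypothetical names (SuspendModel, add, suspend, unsuspend), not HConnectionManager's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SuspendModel {
    // Hypothetical model: host -> pool, split into "active" and "suspended".
    // Object stands in for a per-host connection pool.
    private final ConcurrentHashMap<String, Object> active = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> suspended = new ConcurrentHashMap<>();

    void add(String host) {
        active.put(host, new Object());
    }

    // True only if this call actually moved the pool from active to suspended.
    boolean suspend(String host) {
        Object pool = active.remove(host);
        if (pool == null) {
            return false; // already suspended (or never active): nothing to move
        }
        suspended.put(host, pool);
        return true;
    }

    // True only if this call actually moved the pool back to active.
    boolean unsuspend(String host) {
        Object pool = suspended.remove(host);
        if (pool == null) {
            return false;
        }
        active.put(host, pool);
        return true;
    }

    public static void main(String[] args) {
        SuspendModel m = new SuspendModel();
        String h = "10.183.1.254:9160";
        m.add(h);
        System.out.println(m.suspend(h));   // true: first failure moves the pool
        System.out.println(m.suspend(h));   // false: nothing left to move
        System.out.println(m.unsuspend(h)); // true: periodic un-suspend puts it back
        System.out.println(m.suspend(h));   // true: next failure suspends it again
    }
}
```

Under that model, the pattern in the log (one "true", then only "false" until the next UN-Suspend, then "true" again) is what you'd expect rather than a contradiction.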
On Oct 12, 3:57 pm, Patricio Echagüe <patric...@gmail.com> wrote:
> yes.
>
> http://rantav.github.com/hector/build/html/index.html -> (Cloudbees maven
> repo with nightly snapshots <https://repository-hector-dev.forge.cloudbees.com/snapshot>)
> ...