It can be turned on via:
cassandraHostConfigurator.setAutoDiscoverHosts(true);
The retry issue would only be triggered if you configured the failover
policy for failing fast (the default is to keep retrying).
If thrift traffic were running on a different hardware interface (it's
not unless you explicitly configured it that way), thrift would try to
connect to the machines retrieved from the API. Those would be the
gossip addresses, so it would get a connection refused (thrift trying
to connect to port 9160 on the gossip IP address, which is only
listening on port 7000).
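
If you want to rule that case out quickly, a plain socket probe against
each discovered address is enough. This is only a rough sketch using
the JDK, not anything from Hector; the class name ThriftPortCheck and
the canConnect helper are made up for illustration, and the ports
assume the defaults (9160 for thrift, 7000 for gossip):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hector: checks whether a TCP port
// accepts connections on a given address.
public class ThriftPortCheck {

    // Returns true if a TCP connection to host:port succeeds within the
    // timeout, false on connection refused, timeout, or any other I/O error.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass one of the addresses the client discovered; defaults to localhost.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // Assumed default ports: 9160 for thrift, 7000 for gossip.
        System.out.println(host + ":9160 (thrift) reachable: " + canConnect(host, 9160, 2000));
        System.out.println(host + ":7000 (gossip) reachable: " + canConnect(host, 7000, 2000));
    }
}
```

If 7000 answers on an address but 9160 does not, the client was handed
a gossip-only address.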
Regardless, if you have multiple hosts becoming unavailable within the
cluster, you should look at throttling your jobs or adjusting
read_repair_chance (or turning RR off completely) and
dynamic_snitch_badness_threshold (see the comments on the latter in
cassandra.yaml for an explanation).
I'll investigate and get a fix out for the above in the morning. Thanks
for bringing this up.
Also, having monitoring or at the least taking a couple of samples
from "nodetool tpstats" for the nodes having issues while this is
happening will get us some extra data on where to look.
I do know of other folks successfully bulk loading from Hector though.
(Anybody else seeing similar issues?)
One more thing: what version of hector are you on? Update to the latest,
0.7.0-22, if you have not already, as there was some screwiness with the
default least active load balancing policy.
Upgrade to rc4 if you have not already.
There are some details on monitoring hector in this document:
http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf