Hi,
While testing failover, I ran into a write error. Here's
my scenario: I have 4 nodes with 1 seed. I start writing 25000 records
using 2 concurrent threads. While the write is in progress, I bring
down one of the non-seed nodes. This is what I noticed:
1. Writing pauses.
2. The 3 remaining nodes print the following messages:
INFO 02:45:30,577 error writing to /10.192.223.15
INFO 02:45:32,596 InetAddress /10.192.223.15 is now dead.
No activity is recorded after that.
3. The errors from Hector:
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.CassandraClientFactory createThriftClient
SEVERE: Unable to open transport to 10.192.223.15:9160
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at me.prettyprint.cassandra.service.CassandraClientFactory.createThriftClient(CassandraClientFactory.java:89)
    at me.prettyprint.cassandra.service.CassandraClientFactory.create(CassandraClientFactory.java:71)
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.FailoverOperator operate
WARNING: Skip-host failed
me.prettyprint.cassandra.service.SkipHostException: org.apache.thrift.transport.TTransportException: Unable to open transport to 10.192.223.15:9160 , java.net.ConnectException: Connection refused
    at me.prettyprint.cassandra.service.FailoverOperator.skipToNextHost(FailoverOperator.java:244)
    at me.prettyprint.cassandra.service.FailoverOperator.operateSingleIteration(FailoverOperator.java:192)
    at me.prettyprint.cassandra.service.FailoverOperator.operate(FailoverOperator.java:84)
    at me.prettyprint.cassandra.service.KeyspaceImpl.operateWithFailover(KeyspaceImpl.java:151)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at java.net.Socket.connect(Socket.java:469)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.FailoverOperator operateSingleIteration
WARNING: Got a TTransportException from 174.129.112.50. Num of retries: 3 (thread=http-8090-2)
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.FailoverOperator skipToNextHost
INFO: Skipping to next host (thread=http-8090-2). Current host is: 174.129.112.50
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.FailoverOperator invalidate
INFO: Invalidating client CassandraClient<174.129.112.50:9160-7> (thread=http-8090-2)
Aug 15, 2010 2:46:55 AM me.prettyprint.cassandra.service.CassandraClientFactory createThriftClient
SEVERE: Unable to open transport to 10.192.223.15:9160
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at me.prettyprint.cassandra.service.CassandraClientFactory.createThriftClient(CassandraClientFactory.java:89)
    at me.prettyprint.cassandra.service.CassandraClientFactory.create(CassandraClientFactory.java:71)
    at me.prettyprint.cassandra.service.CassandraClientFactory.makeObject(CassandraClientFactory.java:141)
    at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
I ran Cassandra with debug logging, but after the one node died, the
remaining nodes stopped responding after the messages in point 2.
I'm a bit puzzled, since I thought failover is built in and
Cassandra responds quickly to this situation through the other running
nodes.
I'm using CL=1 and RF=1.
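
To make the CL=1 / RF=1 setup concrete, here is a minimal sketch of what it could imply when one of 4 nodes is down. It is only a toy model under stated assumptions, not Cassandra's or Hector's actual code: the node names are hypothetical, tokens are assumed to be evenly spaced, keys are assumed to be placed by an MD5 hash, and a write is assumed to need at least CL live replicas out of the RF nodes that own the key.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this toy model


def token_for(key: str) -> int:
    """Map a key to a position on the ring (assumption: MD5-based placement)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def build_ring(nodes):
    """Assign evenly spaced tokens to the nodes (assumption: balanced ring)."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas(ring, key, rf):
    """Return the rf nodes responsible for key, walking clockwise on the ring."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def writable(ring, key, rf, cl, down):
    """A write is modelled as succeeding if at least cl replicas are alive."""
    live = [n for n in replicas(ring, key, rf) if n not in down]
    return len(live) >= cl


# Hypothetical node names; the real cluster uses IP addresses.
nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = build_ring(nodes)
keys = [f"record-{i}" for i in range(25000)]
down = {"node-d"}

failed = sum(not writable(ring, k, rf=1, cl=1, down=down) for k in keys)
print(f"{failed} of {len(keys)} writes have no live replica")
```

In this model, with RF=1 every key lives on exactly one node, so the keys owned by the downed node cannot be written at CL=1 no matter which node the client connects to. Calling `writable` with `rf=2` instead leaves every key with at least one live replica when a single node is down.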
Am I missing something in the config?
Any pointers would be highly appreciated.
Thanks