I am running into problems when a seed node fails. I originally
sent an email asking for help here:
http://www.mail-archive.com/u...@cassandra.apache.org/msg09731.html
I will not repeat much of the background here; instead I will
give more details on the problem. Thanks.
3 machines are in the cluster:
10.1.4.221
10.1.4.223
10.1.4.224
with 10.1.4.221 and 10.1.4.223 marked as "seeds" in /conf/cassandra.yaml as:
- 10.1.4.221
- 10.1.4.223
I use Pelops to initialize the cluster like this:
new Cluster("10.1.4.221,10.1.4.223,10.1.4.224" /* nodeIps */,
9160 /* nodePort */, true /* dynamicNodeDiscovery */)
Below I describe the different problem situations I ran into. It will be
a bit long, please forgive me.
situation 1)
- all 3 machines are up, with the keyspace and column family already
created
- the client repeatedly runs a loop (get a mutator, write an update to
the same row/column at consistency level ONE, then get a selector and
query the column value to print it out)
- while the loop is running, 10.1.4.221 goes down
- exceptions occur in the client and the client quits:
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Chose node '10.1.4.221'...
Attempting to borrow free connection for node '10.1.4.221'
Borrowing connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Operation failed as result of network exception. Connection is being marked as corrupt (and will probably be be destroyed). See cause for details...
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:906)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:46)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:56)
at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:51)
at
Returned connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]' has been closed or is marked as corrupt
Destroying connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Attempting to honor the notNodeHint '10.1.4.221', skipping node
Chose node '10.1.4.224'...
Attempting to borrow free connection for node '10.1.4.224'
Borrowing connection 'Connection[Keyspace1][10.1.4.224:9160][32477527]'
Operation failed as result of network exception. Connection is being marked as corrupt (and will probably be be destroyed). See cause for details...
TimedOutException()
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:46)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:56)
at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:51)
at
Returned connection 'Connection[Keyspace1][10.1.4.224:9160][32477527]' has been closed or is marked as corrupt
Destroying connection 'Connection[Keyspace1][10.1.4.224:9160][32477527]'
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Chose node '10.1.4.221'...
Attempting to borrow free connection for node '10.1.4.221'
Borrowing connection 'Connection[Keyspace1][10.1.4.221:9160][16805237]'
Operation failed as result of network exception. Connection is being marked as corrupt (and will probably be be destroyed). See cause for details...
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:906)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:46)
at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:56)
at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:51)
at
Returned connection 'Connection[Keyspace1][10.1.4.221:9160][16805237]' has been closed or is marked as corrupt
Destroying connection 'Connection[Keyspace1][10.1.4.221:9160][16805237]'
situation 2)
- similar to situation 1, but I shut down 10.1.4.223 and 10.1.4.224
alternately; this does not affect the client
- the client just keeps "choosing" 10.1.4.221:
Chose node '10.1.4.221'...
Attempting to borrow free connection for node '10.1.4.221'
Borrowing connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Returning connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Chose node '10.1.4.221'...
Attempting to borrow free connection for node '10.1.4.221'
Borrowing connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Returning connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Chose node '10.1.4.221'...
Attempting to borrow free connection for node '10.1.4.221'
Borrowing connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Returning connection 'Connection[Keyspace1][10.1.4.221:9160][17977639]'
Determining which node is the least loaded
Node '10.1.4.221' has 0 active connections
Node '10.1.4.224' has 0 active connections
Node '10.1.4.223' has 0 active connections
Chose node '10.1.4.221'...
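To make my guess about the node selection concrete: the logs in situations 1 and 2 look consistent with a rule of "fewest active connections, and on a tie the first node in iteration order wins". The sketch below is plain Java written only to illustrate that guess. It is not Pelops source code, and the class name LeastLoadedSketch and the method choose are hypothetical. The node order 221, 224, 223 is taken from the log lines above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a least-loaded choice; not taken from Pelops. */
final class LeastLoadedSketch {

    /**
     * Returns the node with the fewest active connections, skipping notNodeHint
     * (which may be null). Because the comparison is strict, a tie is won by the
     * node that comes first in the map's iteration order.
     */
    static String choose(Map<String, Integer> activeConnections, String notNodeHint) {
        String chosen = null;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : activeConnections.entrySet()) {
            if (entry.getKey().equals(notNodeHint)) {
                continue; // honor the hint, as in "skipping node" in the log
            }
            if (entry.getValue() < fewest) {
                chosen = entry.getKey();
                fewest = entry.getValue();
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Same order and counts as the "has 0 active connections" log lines.
        Map<String, Integer> active = new LinkedHashMap<>();
        active.put("10.1.4.221", 0);
        active.put("10.1.4.224", 0);
        active.put("10.1.4.223", 0);

        System.out.println(choose(active, null));         // prints 10.1.4.221
        System.out.println(choose(active, "10.1.4.221")); // prints 10.1.4.224
    }
}
```

If the real logic works like this, a single-threaded client that returns each connection before borrowing the next would always see a tie at 0 and always land on 10.1.4.221, which would match what I observe.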
situation 3)
- before starting the client, I shut down 10.1.4.221
- I then start the client and get the following exception stack traces
(with custom and JUnit code removed), and the client quits:
Dynamic node discovery is enabled, detecting initial list of nodes from [10.1.4.221, 10.1.4.223, 10.1.4.224]
Failed to open transport. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.scale7.cassandra.pelops.Connection.open(Connection.java:70)
at org.scale7.cassandra.pelops.ManagerOperand.openClient(ManagerOperand.java:54)
at org.scale7.cassandra.pelops.ManagerOperand.tryOperation(ManagerOperand.java:94)
at org.scale7.cassandra.pelops.KeyspaceManager.getKeyspaceNames(KeyspaceManager.java:35)
at org.scale7.cassandra.pelops.Cluster.refreshInternal(Cluster.java:145)
at org.scale7.cassandra.pelops.Cluster.refresh(Cluster.java:118)
at org.scale7.cassandra.pelops.Cluster.refresh(Cluster.java:136)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:56)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:42)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:34)
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 31 more
Failed to discover nodes dynamically, using existing list of nodes. See cause for details...
org.apache.thrift.transport.TTransportException: Cannot write to null outputStream
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:142)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
at org.apache.cassandra.thrift.Cassandra$Client.send_describe_keyspaces(Cassandra.java:1019)
at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspaces(Cassandra.java:1009)
at org.scale7.cassandra.pelops.KeyspaceManager$1.execute(KeyspaceManager.java:32)
at org.scale7.cassandra.pelops.KeyspaceManager$1.execute(KeyspaceManager.java:29)
at org.scale7.cassandra.pelops.ManagerOperand.tryOperation(ManagerOperand.java:97)
at org.scale7.cassandra.pelops.KeyspaceManager.getKeyspaceNames(KeyspaceManager.java:35)
at org.scale7.cassandra.pelops.Cluster.refreshInternal(Cluster.java:145)
at org.scale7.cassandra.pelops.Cluster.refresh(Cluster.java:118)
at org.scale7.cassandra.pelops.Cluster.refresh(Cluster.java:136)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:56)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:42)
at org.scale7.cassandra.pelops.Cluster.<init>(Cluster.java:34)
Failed to open transport. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.scale7.cassandra.pelops.Connection.open(Connection.java:70)
at org.scale7.cassandra.pelops.ManagerOperand.openClient(ManagerOperand.java:54)
at org.scale7.cassandra.pelops.ManagerOperand.tryOperation(ManagerOperand.java:94)
at org.scale7.cassandra.pelops.KeyspaceManager.getKeyspaceSchema(KeyspaceManager.java:55)
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 32 more
Failed to open transport. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.scale7.cassandra.pelops.Connection.open(Connection.java:70)
at org.scale7.cassandra.pelops.ManagerOperand.openClient(ManagerOperand.java:54)
at org.scale7.cassandra.pelops.ManagerOperand.tryOperation(ManagerOperand.java:94)
at org.scale7.cassandra.pelops.KeyspaceManager.getKeyspaceSchema(KeyspaceManager.java:55)
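One workaround I am considering for this situation, assuming the startup failure comes from Pelops contacting the first listed node (10.1.4.221) while it is down: probe each node's Thrift port myself before building the node string, and list the reachable nodes first. The following is only a sketch in plain Java. NodeProbe, canConnect and reachableFirst are hypothetical helper names (not Pelops API), and I have not verified that reordering the list is enough for the Cluster constructor to succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper, not part of Pelops: puts reachable nodes first. */
final class NodeProbe {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out or unresolvable: treat as down
        }
    }

    /** Stable reorder: nodes that pass the check first, the others last. */
    static List<String> reachableFirst(List<String> nodes, Predicate<String> isReachable) {
        List<String> up = new ArrayList<>();
        List<String> down = new ArrayList<>();
        for (String node : nodes) {
            (isReachable.test(node) ? up : down).add(node);
        }
        up.addAll(down);
        return up;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("10.1.4.221", "10.1.4.223", "10.1.4.224");
        List<String> ordered = reachableFirst(nodes, node -> canConnect(node, 9160, 1000));
        // The joined string could then be passed as nodeIps to new Cluster(...).
        System.out.println(String.join(",", ordered));
    }
}
```

Nodes that are down stay in the list (at the end) so that the pool still knows about them once they come back. A TCP probe only shows that the port accepts connections, not that Cassandra on that node is healthy.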
situation 4)
- before starting the client, I shut down either 10.1.4.223 or
10.1.4.224
- the client shows exceptions when connecting to the node that is down,
but it eventually starts, runs, and connects to 10.1.4.221 and
whichever of 223 or 224 is still working
- the exceptions during startup:
Adding node '10.1.4.223' to the pool...
MBean 'com.scale7.cassandra.pelops.pool:type=CommonsBackedPoolPooledNode-Keyspace1' is already registered, removing...
Registering MBean 'com.scale7.cassandra.pelops.pool:type=CommonsBackedPoolPooledNode-Keyspace1'...
Made new connection 'Connection[Keyspace1][10.1.4.223:9160][2773808]'
Failed to open transport. See cause for details...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.scale7.cassandra.pelops.Connection.open(Connection.java:70)
at org.scale7.cassandra.pelops.pool.CommonsBackedPool$ConnectionFactory.makeObject(CommonsBackedPool.java:785)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.addObject(GenericKeyedObjectPool.java:1685)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.ensureMinIdle(GenericKeyedObjectPool.java:2058)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.preparePool(GenericKeyedObjectPool.java:1722)
at org.scale7.cassandra.pelops.pool.CommonsBackedPool.addNode(CommonsBackedPool.java:373)
at org.scale7.cassandra.pelops.pool.CommonsBackedPool.<init>(CommonsBackedPool.java:104)
at org.scale7.cassandra.pelops.pool.CommonsBackedPool.<init>(CommonsBackedPool.java:64)
at org.scale7.cassandra.pelops.pool.CommonsBackedPool.<init>(CommonsBackedPool.java:52)
at org.scale7.cassandra.pelops.Pelops.addPool(Pelops.java:24)
to sum up:
- if the cluster does not have any keyspace, we can't use "auto
discover = true" (the dynamicNodeDiscovery flag)
- in fact, setting "auto discover = true" does not seem to help with
automatic failover
- it seems that Pelops always uses the first IP in the node list/
cluster?
- it seems that Pelops always fails when 10.1.4.221 (the first seed
node) is down, in the call to:
org.scale7.cassandra.pelops.KeyspaceManager.getKeyspaceNames
- how can we use Pelops to get transparent, automatic failover as long
as at least one (non-seed) machine of the cluster is up?
- I am new to Cassandra and Pelops, so if there is anything I am
doing wrong, please feel free to let me know. I would be happy
to provide more details.
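To make the failover behaviour I am hoping for concrete, here is a minimal sketch in plain Java. It is not Pelops code and makes no claim about how Pelops works internally; FailoverClient and execute are hypothetical names, and the operation is just a function from a node address to a result. It only covers connection-level failures, so it would not help when a healthy node itself returns an error such as the TimedOutException in situation 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical failover wrapper; not thread-safe, for illustration only. */
final class FailoverClient {
    private final List<String> nodes;
    private final Set<String> suspected = new LinkedHashSet<>();

    FailoverClient(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /**
     * Tries the operation on nodes that have not failed recently first, then on
     * suspected nodes (they may have recovered). Throws only if every node fails.
     */
    <T> T execute(Function<String, T> operation) {
        List<String> order = new ArrayList<>();
        for (String node : nodes) {
            if (!suspected.contains(node)) order.add(node);
        }
        for (String node : nodes) {
            if (suspected.contains(node)) order.add(node);
        }
        RuntimeException lastFailure = null;
        for (String node : order) {
            try {
                T result = operation.apply(node);
                suspected.remove(node); // node works again
                return result;
            } catch (RuntimeException e) {
                suspected.add(node); // try it last next time
                lastFailure = e;
            }
        }
        throw new IllegalStateException("operation failed on every node: " + nodes, lastFailure);
    }

    public static void main(String[] args) {
        FailoverClient client =
                new FailoverClient(Arrays.asList("10.1.4.221", "10.1.4.223", "10.1.4.224"));
        // Simulate 10.1.4.221 being down: the operation throws for that node only.
        String servedBy = client.execute(node -> {
            if (node.equals("10.1.4.221")) throw new IllegalStateException("connection refused");
            return node;
        });
        System.out.println(servedBy); // prints 10.1.4.223
    }
}
```

With this shape, the client would keep running as long as any one of the three nodes accepts the request, regardless of whether the node that is down is a seed.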
Thanks for your time.