I'm running a Cassandra cluster with 24 nodes.
The Cassandra server version is 2.1.2.
Yesterday one node went down because of a hardware fault.
About 10 hours later, all the servers hung with "Too many open files".
In my Cassandra system.log there was a huge number of handshake attempts, repeated very frequently.
X.X.X.X is the IP of the server that went down with the hardware fault.
In the end this led to "Too many open files" and the servers hung.
I think it is probably caused by unclosed client sockets.
Is this a Cassandra bug, or a mistake in my config file or in how I use it?
Can someone help me?
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,463 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,463 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,463 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,463 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,463 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,464 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,464 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,464 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,464 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,464 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,465 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,466 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,466 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,466 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:16,466 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
....
..... these two lines repeat millions of times, and then finally
.....
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,469 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,470 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,471 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,472 OutboundTcpConnection.java:429 - Handshaking version with /X.X.X.X
INFO [HANDSHAKE-/X.X.X.X] 2015-02-05 04:56:22,472 OutboundTcpConnection.java:438 - Cannot handshake version with /X.X.X.X
WARN [SharedPool-Worker-25] 2015-02-05 04:56:22,525 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-25,5,main]: {}
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/dev1/lib/cassandra2/data/user1/user_key-0febb330962c11e4b3d39dcaec8ca56f/user1-user_key-ka-4488-Data.db (Too many open files)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2084) ~[apache-cassandra-2.1.2.jar:2.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.2.jar:2.1.2]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
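To put a number on how fast the handshake retries are happening, a small script along these lines could count the "Handshaking version with" lines per peer and per second. This is only a sketch: the regular expression is an assumption based on the log format pasted above, and `count_handshake_attempts` is a hypothetical helper name, not part of Cassandra.

```python
import re
from collections import Counter

# Assumed line format, taken from the system.log excerpt above:
# INFO [HANDSHAKE-/<peer>] <date> <time>,<millis> OutboundTcpConnection.java:<n> - Handshaking version with /<peer>
HANDSHAKE_RE = re.compile(
    r"^\s*INFO\s+\[HANDSHAKE-/(?P<peer>[^\]]+)\]\s+"
    r"(?P<second>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+"
    r"OutboundTcpConnection\.java:\d+ - Handshaking version with"
)


def count_handshake_attempts(lines):
    """Return a Counter mapping (peer, second) to the number of handshake attempts."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.match(line)
        if match:
            counts[(match.group("peer"), match.group("second"))] += 1
    return counts


# Example use (the path is a placeholder):
# with open("/path/to/system.log", errors="replace") as log:
#     for (peer, second), n in sorted(count_handshake_attempts(log).items()):
#         print(second, peer, n)
```

Only the "Handshaking version with" lines are counted, so each retry is counted once rather than twice.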
Here's the ulimit output.
[de...@csdr001.u1 logs]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 385578
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 32768
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
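Note that `ulimit -a` reports the limits of the shell it is run in, which are not necessarily the limits of the running Cassandra process. To check the suspicion about unclosed sockets, a sketch like the following could read the process's own limit from `/proc/<pid>/limits` and classify its open descriptors from `/proc/<pid>/fd`. It assumes Linux; the function names are hypothetical, and the Cassandra PID has to be supplied by hand.

```python
import os
from collections import Counter


def _to_limit(value):
    """Convert a limits field to int, or None when it is 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' row found")


def classify_fd_targets(targets):
    """Count descriptors by kind, given the symlink targets found in /proc/<pid>/fd."""
    kinds = Counter()
    for target in targets:
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds


def inspect_process(pid):
    """Return (soft limit, hard limit, descriptor counts by kind) for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_open_files_limit(f.read())
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # the descriptor was closed while the directory was being listed
    return soft, hard, classify_fd_targets(targets)
```

Running `inspect_process(<cassandra pid>)` as the same user as Cassandra (or as root) on a node while the handshake loop is active would show whether the "socket" count keeps growing toward the soft limit, or whether the descriptors are mostly data files.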