[AbstractNioSelector.warn] - Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.7.0_80]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[na:1.7.0_80]
at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100) [netty-3.9.4.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.4.Final.jar:na]

The OpenTSDB cluster is separate from our HBase cluster and has very low request traffic (< 400 requests per hour). I'm not sure how the file descriptor limit was reached with this kind of traffic.
Has anyone seen this issue before?
Thanks,
David
$ ulimit -H -n
4096
$ lsof -a -p 28147
...
java 28147 opentsdb 391u sock 0,7 0t0 130995 can't identify protocol
java 28147 opentsdb 392u sock 0,7 0t0 131043 can't identify protocol
java 28147 opentsdb 393u sock 0,7 0t0 131061 can't identify protocol
java 28147 opentsdb 394u sock 0,7 0t0 131027 can't identify protocol
java 28147 opentsdb 395u sock 0,7 0t0 132157 can't identify protocol
java 28147 opentsdb 396u sock 0,7 0t0 132103 can't identify protocol
java 28147 opentsdb 397u sock 0,7 0t0 132121 can't identify protocol
java 28147 opentsdb 398u sock 0,7 0t0 132139 can't identify protocol
java 28147 opentsdb 399u sock 0,7 0t0 132196 can't identify protocol
...
There are a whole bunch of these open.
Any chance TSDB is leaking file descriptors? Or does it have anything to do with how the httpClient queries the OpenTSDB server (not closing Response, etc)?
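To check whether the descriptor count keeps climbing even with no traffic, one option is to poll the process's fd directory over time. A minimal sketch, assuming a Linux host (it reads /proc/&lt;pid&gt;/fd; PID 28147 is from the lsof listing above, and the class name is mine, not from the thread):

```java
import java.io.File;

public class FdCount {
    // Counts entries in /proc/<pid>/fd, i.e. the file descriptors the process holds open.
    static int openFds(String pid) {
        String[] entries = new File("/proc/" + pid + "/fd").list();
        return entries == null ? -1 : entries.length; // -1: no such PID, or not a Linux /proc
    }

    public static void main(String[] args) {
        String pid = args.length > 0 ? args[0] : "self"; // e.g. "28147" for the TSD process
        System.out.println(openFds(pid));
    }
}
```

Run it periodically (e.g. `java FdCount 28147` from cron): a count that rises monotonically while no clients are connected would point at a leak in the TSD itself rather than in the querying clients.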
Are your clients connecting, sending a tiny bit, and disconnecting?
Or do they connect, stay connected, stream lots-o-metrics?
How do the writes happen?
Other nodes I guess?
I will look and see if we have a similar symptom. Our default ulimit is 10k, so we might not have noticed it.
The clients just do simple queries through POST requests. No writes:

public List<OpenTsdbResult> queryOpenTsdb(OpenTsdbQuery query) throws WebApplicationException {
    Response response = rootWebTarget.path(QUERY_URL)
            .request()
            .post(Entity.entity(query, MediaType.APPLICATION_JSON));
    if (response.getStatus() != Response.Status.OK.getStatusCode()) {
        throw new WebApplicationException(response);
    }
    return response.readEntity(new GenericType<List<OpenTsdbResult>>() {});
}
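One thing worth ruling out on the client side is whether the JAX-RS Response is released on every path: if I read the Jersey docs right, readEntity() consumes and closes it on the happy path, but the non-OK branch hands an unread Response to the exception, so wrapping the whole method in try { ... } finally { response.close(); } would be the safe pattern. The sketch below shows the same read-then-release discipline using only the JDK, against a throwaway local server standing in for the TSD's /api/query endpoint; all class and method names here are mine, not from the thread:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CloseDemo {
    // POSTs a query body and always releases the connection, even on error paths.
    static String runQuery(URL url, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));
            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("HTTP " + conn.getResponseCode());
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } finally {
            conn.disconnect(); // release the socket/fd no matter what happened above
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny local stand-in for the TSD's /api/query endpoint.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/api/query", ex -> {
            byte[] resp = "[]".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, resp.length);
            ex.getResponseBody().write(resp);
            ex.close();
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/api/query");
            System.out.println(runQuery(url, "{\"queries\":[]}")); // prints "[]"
        } finally {
            server.stop(0);
        }
    }
}
```

The key line is the finally { conn.disconnect(); }: in the Jersey snippet above, the equivalent would be closing the Response in a finally block around the status check.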
On my client side, everything looks healthy. I issued the "netstat" command on the client machines, and all of them have just a handful of CLOSE_WAIT connections (which should turn to CLOSED eventually).

On the OpenTSDB side, however, I'm seeing thirty-something CLOSE_WAIT connections. These OpenTSDB servers have been taken off the ELB (for investigation), so they receive no incoming connections; the CLOSE_WAIT connections appear to be stuck in that state.

Our clients do seem to get exceptions while sending POST requests:

java.net.SocketException: Unexpected end of file from server
at org.glassfish.jersey.client.HttpUrlConnector
....

This is thrown before closing the connections.

At this point, we only serve static data from OpenTSDB, which we bulk imported. So there are no writes to the server at all.