[2016-09-14 11:46:26,674] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
I've tried netstat and other tools to check if socket related and so far appears they are far below the 100,000 open file limit set for BOTH root and * in /etc/security/limits.conf
I've dumped my lsof output and even sorted and the "java" process is the biggest offender but hard to tell if Cassandra or Kafka. I even removed that Cassandra node from cluster and left disabled and Kafka still failed on that node so I believe I ruled it out. It's unclear how to diagnose or find the cause and this is delaying our go-live with new Kafka-based stream pipeline architecture (compete rewrite of back end architecture/apps).
Here are dumps of LSOF:
http://hive.goomzee.com/lsof_logs.tar.gz
-------------
These errors occur when starting Python-based processors on app server that uses Confluent-Kafka-Python. Each of approximately 30 scripts instantiate a Consumer that polls for new messages in a topic and also instantiates a Producer for sending the edited message to another topic along the chain. Consumer/Producer classes were built using kafka-confluent-python library that wraps librdkafka
Configs attached to this ticket.
-------------
I have set /etc/security/limits.conf
* soft nofile 100000
* hard nofile 100000
root soft nofile 100000
root hard nofile 100000
----------------
I start kafka server (and zookeeper, etc.) using Supervisor with the following configuration (template for Ansible playbook with jinja variables):
[program:kafka-server]
command=/usr/bin/kafka-server-start {{ kafka_config_dir }}/server.properties
stdout_logfile=/var/log/supervisor/kafka.log
redirect_stderr=true
autostart=true
;directory=
stopsignal=QUIT
;user=kafka
environment=KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:{{ kafka_config_dir }}/log4j.properties"
[2016-09-15 09:07:25,873] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at kafka.network.Acceptor.accept(SocketServer.scala:323)
at kafka.network.Acceptor.run(SocketServer.scala:268)
at java.lang.Thread.run(Thread.java:745)
[2016-09-15 09:07:27,887] INFO KafkaConfig values:
advertised.host.name = null
metric.reporters = []
quota.producer.default = 9223372036854775807
offsets.topic.num.partitions = 50
log.flush.interval.messages = 9223372036854775807
auto.create.topics.enable = true
controller.socket.timeout.ms = 30000
log.flush.interval.ms = null
principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
replica.socket.receive.buffer.bytes = 65536
min.insync.replicas = 1
replica.fetch.wait.max.ms = 500
num.recovery.threads.per.data.dir = 1
ssl.keystore.type = JKS
sasl.mechanism.inter.broker.protocol = GSSAPI
default.replication.factor = 1
ssl.truststore.password = null
log.preallocate = false
sasl.kerberos.principal.to.local.rules = [DEFAULT]
fetch.purgatory.purge.interval.requests = 1000
ssl.endpoint.identification.algorithm = null
replica.socket.timeout.ms = 30000
message.max.bytes = 1000012
num.io.threads = 8
offsets.commit.required.acks = -1
log.flush.offset.checkpoint.interval.ms = 60000
delete.topic.enable = true
quota.window.size.seconds = 1
ssl.truststore.type = JKS
offsets.commit.timeout.ms = 5000
quota.window.num = 11
zookeeper.connect = queue1:2181,queue2:2181,queue3:2181,queue4:2181
num.replica.fetchers = 1
log.retention.ms = null
log.roll.jitter.hours = 0
log.cleaner.enable = true
offsets.load.buffer.size = 5242880
log.cleaner.delete.retention.ms = 86400000
ssl.client.auth = none
controlled.shutdown.max.retries = 3
queued.max.requests = 500
offsets.topic.replication.factor = 3
log.cleaner.threads = 1
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
socket.request.max.bytes = 104857600
ssl.trustmanager.algorithm = PKIX
zookeeper.session.timeout.ms = 6000
log.retention.bytes = -1
log.message.timestamp.type = CreateTime
sasl.kerberos.min.time.before.relogin = 60000
zookeeper.set.acl = false
connections.max.idle.ms = 600000
offsets.retention.minutes = 1440
replica.fetch.backoff.ms = 1000
inter.broker.protocol.version = 0.10.0-IV1
log.retention.hours = 336
num.partitions = 1
broker.id.generation.enable = true
listeners = PLAINTEXT://:9092
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
log.roll.ms = null
log.flush.scheduler.interval.ms = 9223372036854775807
ssl.cipher.suites = null
log.index.size.max.bytes = 10485760
ssl.keymanager.algorithm = SunX509
security.inter.broker.protocol = PLAINTEXT
replica.fetch.max.bytes = 1048576
advertised.port = null
log.cleaner.dedupe.buffer.size = 134217728
replica.high.watermark.checkpoint.interval.ms = 5000
log.cleaner.io.buffer.size = 524288
sasl.kerberos.ticket.renew.window.factor = 0.8
zookeeper.connection.timeout.ms = 6000
controlled.shutdown.retry.backoff.ms = 5000
log.roll.hours = 168
log.cleanup.policy = delete
host.name = db4
log.roll.jitter.ms = null
max.connections.per.ip = 2147483647
offsets.topic.segment.bytes = 104857600
background.threads = 10
quota.consumer.default = 9223372036854775807
request.timeout.ms = 30000
log.message.format.version = 0.10.0-IV1
log.index.interval.bytes = 4096
log.dir = /tmp/kafka-logs
log.segment.bytes = 1073741824
log.cleaner.backoff.ms = 15000
offset.metadata.max.bytes = 4096
ssl.truststore.location = null
group.max.session.timeout.ms = 300000
ssl.keystore.password = null
zookeeper.sync.time.ms = 2000
port = 9092
log.retention.minutes = null
log.segment.delete.delay.ms = 60000
log.dirs = /var/lib/kafka
controlled.shutdown.enable = true
compression.type = producer
max.connections.per.ip.overrides =
log.message.timestamp.difference.max.ms = 9223372036854775807
sasl.kerberos.kinit.cmd = /usr/bin/kinit
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
auto.leader.rebalance.enable = true
leader.imbalance.check.interval.seconds = 300
log.cleaner.min.cleanable.ratio = 0.5
replica.lag.time.max.ms = 10000
num.network.threads = 3
ssl.key.password = null
reserved.broker.max.id = 1000
metrics.num.samples = 2
socket.send.buffer.bytes = 102400
ssl.protocol = TLS
socket.receive.buffer.bytes = 102400
ssl.keystore.location = null
replica.fetch.min.bytes = 1
broker.rack = null
unclean.leader.election.enable = true
sasl.enabled.mechanisms = [GSSAPI]
group.min.session.timeout.ms = 6000
log.cleaner.io.buffer.load.factor = 0.9
offsets.retention.check.interval.ms = 600000
producer.purgatory.purge.interval.requests = 1000
metrics.sample.window.ms = 30000
broker.id = 4
offsets.topic.compression.codec = 0
log.retention.check.interval.ms = 300000
advertised.listeners = null
leader.imbalance.per.broker.percentage = 10
(kafka.server.KafkaConfig)
[2016-09-15 09:07:27,932] WARN The support metrics collection feature ("Metrics") of Proactive Support is disabled. (io.confluent.support.metrics.SupportedServerStartable)
[2016-09-15 09:07:27,932] INFO starting (kafka.server.KafkaServer)
[2016-09-15 09:07:27,938] INFO Connecting to zookeeper on queue1:2181,queue2:2181,queue3:2181,queue4:2181 (kafka.server.KafkaServer)
[2016-09-15 09:07:27,947] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2016-09-15 09:07:27,952] INFO Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:host.name=queue4 (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.version=1.8.0_101 (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.home=/usr/lib/jvm/java-8-oracle/jre (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.class.path=:/usr/bin/../share/java/kafka/jackson-databind-2.6.3.jar:/usr/bin/../share/java/kafka/jetty-security-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/commons-digester-1.8.1.jar:/usr/bin/../share/java/kafka/zkclient-0.8.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-core-2.22.2.jar:/usr/bin/../share/java/kafka/jackson-annotations-2.6.0.jar:/usr/bin/../share/java/kafka/support-metrics-client-3.0.1.jar:/usr/bin/../share/java/kafka/lz4-1.3.0.jar:/usr/bin/../share/java/kafka/jersey-guava-2.22.2.jar:/usr/bin/../share/java/kafka/aopalliance-repackaged-2.4.0-b34.jar:/usr/bin/../share/java/kafka/jetty-servlet-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/jetty-io-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka.jar:/usr/bin/../share/java/kafka/commons-collections-3.2.1.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1-test-sources.jar:/usr/bin/../share/java/kafka/jetty-servlets-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/slf4j-api-1.7.21.jar:/usr/bin/../share/java/kafka/scala-library-2.11.8.jar:/usr/bin/../share/java/kafka/commons-logging-1.2.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-2.22.2.jar:/usr/bin/../share/java/kafka/commons-beanutils-1.8.3.jar:/usr/bin/../share/java/kafka/commons-codec-1.9.jar:/usr/bin/../share/java/kafka/javax.inject-1.jar:/usr/bin/../share/java/kafka/jersey-common-2.22.2.jar:/usr/bin/../share/java/kafka/httpcore-4.4.3.jar:/usr/bin/../share/java/kafka/osgi-resource-locator-1.0.1.jar:/usr/bin/../share/java/kafka/connect-runtime-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/jackson-mapper-asl-1.9.13.jar:/usr/bin/../share/java/kafka/kafka-streams-examples-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/avro-1.7.7.jar:/usr/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.6.3.jar:/usr/bin/../share/java/kafka/kafka-streams-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/jackson-core-2.6.3.jar:/usr/bin/../share/java/kafka/commons-validator-1.4.1.jar:/usr/bin/../share/java/kafka/xz-1.0.jar:/usr/bin/../share/java/kafka/jersey-media-jaxb-2.22.2.jar:/usr/bin/../share/java/kafka/jopt-simple-4.9.jar:/usr/bin/../share/java/kafka/connect-api-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/jetty-server-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/hk2-utils-2.4.0-b34.jar:/usr/bin/../share/java/kafka/paranamer-2.3.jar:/usr/bin/../share/java/kafka/jetty-http-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1-scaladoc.jar:/usr/bin/../share/java/kafka/javax.annotation-api-1.2.jar:/usr/bin/../share/java/kafka/validation-api-1.1.0.Final.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1-javadoc.jar:/usr/bin/../share/java/kafka/rocksdbjni-4.8.0.jar:/usr/bin/../share/java/kafka/jersey-server-2.22.2.jar:/usr/bin/../share/java/kafka/slf4j-log4j12-1.7.21.jar:/usr/bin/../share/java/kafka/connect-file-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/commons-lang3-3.1.jar:/usr/bin/../share/java/kafka/snappy-java-1.1.2.6.jar:/usr/bin/../share/java/kafka/jetty-util-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/jackson-core-asl-1.9.13.jar:/usr/bin/../share/java/kafka/scala-parser-combinators_2.11-1.0.4.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/kafka-clients-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.6.3.jar:/usr/bin/../share/java/kafka/httpmime-4.5.1.jar:/usr/bin/../share/java/kafka/guava-18.0.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1-test.jar:/usr/bin/../share/java/kafka/jetty-continuation-9.2.15.v20160210.jar:/usr/bin/../share/java/kafka/argparse4j-0.5.0.jar:/usr/bin/../share/java/kafka/connect-json-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/log4j-1.2.17.jar:/usr/bin/../share/java/kafka/javassist-3.18.2-GA.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-base-2.6.3.jar:/usr/bin/../share/java/kafka/javax.inject-2.4.0-b34.jar:/usr/bin/../share/java/kafka/metrics-core-2.2.0.jar:/usr/bin/../share/java/kafka/kafka_2.11-0.10.0.1-cp1-sources.jar:/usr/bin/../share/java/kafka/jersey-client-2.22.2.jar:/usr/bin/../share/java/kafka/reflections-0.9.10.jar:/usr/bin/../share/java/kafka/hk2-locator-2.4.0-b34.jar:/usr/bin/../share/java/kafka/httpclient-4.5.1.jar:/usr/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/usr/bin/../share/java/kafka/support-metrics-common-3.0.1.jar:/usr/bin/../share/java/kafka/commons-compress-1.4.1.jar:/usr/bin/../share/java/kafka/kafka-tools-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/javax.ws.rs-api-2.0.1.jar:/usr/bin/../share/java/kafka/kafka-log4j-appender-0.10.0.1-cp1.jar:/usr/bin/../share/java/kafka/hk2-api-2.4.0-b34.jar:/usr/bin/../share/java/kafka/zookeeper-3.4.6.jar:/usr/bin/../share/java/confluent-support-metrics/support-metrics-fullcollector-3.0.1.jar:/usr/share/java/confluent-support-metrics/support-metrics-fullcollector-3.0.1.jar (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:os.version=3.19.0-68-generic (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,952] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,953] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,953] INFO Initiating client connection, connectString=queue1:2181,queue2:2181,queue3:2181,queue4:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@5c30a9b0 (org.apache.zookeeper.ZooKeeper)
[2016-09-15 09:07:27,965] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2016-09-15 09:07:27,968] INFO Opening socket connection to server queue4/10.0.1.24:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-09-15 09:07:28,040] INFO Socket connection established to queue4/10.0.1.24:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-09-15 09:07:28,047] INFO Session establishment complete on server queue4/10.0.1.24:2181, sessionid = 0x4572ae031840006, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2016-09-15 09:07:28,049] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2016-09-15 09:07:28,611] INFO Loading logs. (kafka.log.LogManager)
[2016-09-15 09:07:28,640] WARN Found a corrupted index file, /var/lib/kafka/social.activitystream.UserEvents-2/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-09-15 09:07:28,646] INFO Recovering unflushed segment 0 in log social.activitystream.UserEvents-2. (kafka.log.Log)
[2016-09-15 09:07:28,648] INFO Completed load of log social.activitystream.UserEvents-2 with log end offset 0 (kafka.log.Log)
[2016-09-15 09:07:28,653] WARN Found a corrupted index file, /var/lib/kafka/social.activitystream.UserEvents-1/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-09-15 09:07:28,653] INFO Recovering unflushed segment 0 in log social.activitystream.UserEvents-1. (kafka.log.Log)
[2016-09-15 09:07:28,654] INFO Completed load of log social.activitystream.UserEvents-1 with log end offset 0 (kafka.log.Log)
[2016-09-15 09:07:28,661] WARN Found a corrupted index file, /var/lib/kafka/rets.gacar.OpenHouse-0/00000000000000003653.index, deleting and rebuilding index... (kafka.log.Log)
IT WILL RUN FOR A WHILE AND THEN LOG STARTS FILLING UP WITH Too many open files errors and we crash. ???
--
You received this message because you are subscribed to a topic in the Google Groups "Confluent Platform" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/confluent-platform/ViX98ylVTSU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/74a9bbb0-3a44-4a12-b767-55152e241128%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
To unsubscribe from this group and all its topics, send an email to confluent-platform+unsub...@googlegroups.com.