Kafka Node crashes giving ERROR Uncaught exception in scheduled task 'kafka-log-retention'

290 views
Skip to first unread message

prateek shekhar

unread,
Dec 10, 2019, 5:55:57 PM12/10/19
to Confluent Platform
Hello,

We have installed a version of Confluent Kafka 5.2.1 for our 3 node Kafka cluster and 3 node Zookeeper cluster.

We have a  number of topics in the Kafka cluster and processes which continuously write data into some of these topics. Along with these topics we have some test topic which are rarely used. But for the past couple of months we are seeing some weird behavior in these test topics which leads to crashing of Kafka node. It seems that the Kafka Scheduler throws Exception and crashes the service when it could not find a log to be deleted. I am not able to understand that even when there is no data, why does Kafka Scheduler looks for deleting a particular log segment from a topic's log directory.

Attaching logged error details where the original topic names have been replaced with different names. Also I have copied and pasted the error below.

[2019-12-02 21:25:59,269] ERROR Error while deleting segments for test-debug-0 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/test-debug-0/00000000000000000000.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/test-debug-0/00000000000000000000.log -> /tmp/kafka-logs/test-debug-0/00000000000000000000.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2019-12-02 21:25:59,271] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-debug-0 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException: /tmp/kafka-logs/test-debug-0/00000000000000000000.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/test-debug-0/00000000000000000000.log -> /tmp/kafka-logs/test-debug-0/00000000000000000000.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2019-12-02 21:25:59,272] INFO [ReplicaManager broker=1] Stopping serving replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager:66)
[2019-12-02 21:25:59,287] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(test-debug-0,  <other topics.....>) (kafka.server.ReplicaAlterLogDirsManager:66)
[2019-12-02 21:25:59,326] INFO [KafkaApi-1] Closing connection due to error during produce request with correlation id 81507 from client id  with ack=0
Topic and partition to exceptions: topic_a-0 -> org.apache.kafka.common.errors.KafkaStorageException (kafka.server.KafkaApis:66)
[2019-12-02 21:25:59,409] INFO [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions test-debug-0, ,  <other topics.....> and stopped moving logs for partitions  because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager:66)
[2019-12-02 21:25:59,409] INFO Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager:66)
[2019-12-02 21:25:59,411] INFO [KafkaApi-1] Closing connection due to error during produce request with correlation id 57755 from client id  with ack=0
Topic and partition to exceptions: topic_-2 -> org.apache.kafka.common.errors.KafkaStorageException (kafka.server.KafkaApis:66)
[2019-12-02 21:25:59,428] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager:143)

Kindly guide me through how to overcome this error. Any help would be much appreciated.

Best,
Prateek Shekhar
kafka-log-error.log
Reply all
Reply to author
Forward
0 new messages