Error on HDFS Connector


JPS

Aug 17, 2017, 11:17:01 PM
to Confluent Platform
Hi

We have been using the Confluent 3.2.0 HDFS Connector for quite some time now. Sometimes there is no data loss while writing from Kafka to HDFS, even for 70 million records; at other times we lose data with just 500,000 records. We do, however, see the error below at times, across multiple threads.


2017-08-16 12:05:04,211 [pool-1-thread-75] ERROR (WorkerTask.java:141) - Task HDFS-Dev2-00-64 threw an uncaught and unrecoverable exception
java.lang.NullPointerException
        at io.confluent.connect.hdfs.DataWriter.close(DataWriter.java:296)
        at io.confluent.connect.hdfs.HdfsSinkTask.close(HdfsSinkTask.java:121)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:317)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:480)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:152)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:744)
2017-08-16 12:05:04,229 [pool-1-thread-75] ERROR (WorkerTask.java:142) - Task is being killed and will not recover until manually restarted


We are using com.qubole.streamx.ByteArrayConverter as both the key and value converter.

Confluent version - 3.2.0
Kafka Version - 0.10.2.0
Hadoop Version - 2.7.3

We are writing data from one topic with 400 partitions, and tasks.max is set to 200 in the connector configuration request.
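For reference, a connector configuration along these lines would carry the converter and task settings described above. This is only a sketch: the Connect host, connector name, topic, HDFS URL, and flush.size value below are illustrative placeholders, not values taken from this thread.

```shell
# Illustrative submission of an HDFS sink connector config to the
# Kafka Connect REST API. Host, connector name, topic, hdfs.url and
# flush.size are placeholders, not the poster's actual values.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hdfs-sink",
    "config": {
      "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      "tasks.max": "200",
      "topics": "my-topic",
      "hdfs.url": "hdfs://namenode:8020",
      "key.converter": "com.qubole.streamx.ByteArrayConverter",
      "value.converter": "com.qubole.streamx.ByteArrayConverter",
      "flush.size": "10000"
    }
  }'
```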

Why does this error occur only at times, and what is its root cause?

Thanks

JPS

Aug 25, 2017, 3:16:36 AM
to Confluent Platform
The same error came again last night, along with the error below:

org.apache.kafka.common.errors.WakeupException
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:411)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:239)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:188)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:578)
        at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1125)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:255)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:274)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:348)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:480)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:152)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:744)
2017-08-24 21:52:41,390 [pool-1-thread-196] ERROR (WorkerTask.java:142) - Task is being killed and will not recover until manually restarted

Interestingly, the log says the task is being killed and will not recover until manually restarted, and this appeared for roughly 20-30 tasks. Yet when I check the status of those tasks, all of them are reported as "RUNNING".
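When a task has died but the reported status disagrees, the Kafka Connect REST API's status and restart endpoints (available since Kafka 0.10.x) can be used to inspect and recover individual tasks. The connector name and task id below are placeholders for illustration:

```shell
# Check the reported state of the connector and its tasks
# (host and connector name are illustrative):
curl -s http://localhost:8083/connectors/hdfs-sink/status

# Manually restart one specific task, e.g. task 64:
curl -s -X POST http://localhost:8083/connectors/hdfs-sink/tasks/64/restart
```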

These errors and exceptions led to major data loss last night.

Why do these errors occur intermittently, what is their root cause, and how can we avoid them?

Thanks