Hello,
I am seeing some log lines I do not understand from the HDFS connector:
[2017-01-27 15:42:37,256] ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED (io.confluent.connect.hdfs.TopicPartitionWriter:229)
org.apache.kafka.connect.errors.ConnectException: java.io.EOFException
at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:131)
at io.confluent.connect.hdfs.TopicPartitionWriter.applyWAL(TopicPartitionWriter.java:484)
at io.confluent.connect.hdfs.TopicPartitionWriter.recover(TopicPartitionWriter.java:212)
at io.confluent.connect.hdfs.TopicPartitionWriter.write(TopicPartitionWriter.java:256)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:234)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:103)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:384)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:240)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at io.confluent.connect.hdfs.wal.WALFile$Reader.init(WALFile.java:584)
at io.confluent.connect.hdfs.wal.WALFile$Reader.initialize(WALFile.java:552)
at io.confluent.connect.hdfs.wal.WALFile$Reader.<init>(WALFile.java:529)
at io.confluent.connect.hdfs.wal.FSWAL.apply(FSWAL.java:107)
... 16 more
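From the trace, the EOFException is thrown while recovery reads the WAL file back. In case it helps, this is roughly how I looked at the WAL file of the affected partition (a minimal sketch with the Hadoop FileSystem API; the <logs.dir>/<topic>/<partition>/log layout and the exact path are my guesses, not something I found documented):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalCheck {
    public static void main(String[] args) throws Exception {
        // Assumed WAL layout: <logs.dir>/<topic>/<partition>/log -- the
        // path below is a guess; the host is redacted as in the logs above.
        Path wal = new Path("hdfs://xxxx:8020/kafka-connect/logs/tstperf/1/log");
        try (FileSystem fs = FileSystem.get(wal.toUri(), new Configuration())) {
            if (!fs.exists(wal)) {
                System.out.println("no WAL file at " + wal);
            } else {
                FileStatus st = fs.getFileStatus(wal);
                // A zero-length or truncated file here would explain an
                // EOFException while the header is read back during recovery.
                System.out.println(wal + " length=" + st.getLen());
            }
        }
    }
}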
I have gotten this exception many times. The line before it varies, e.g.:
INFO Committed hdfs:/xxxx:8020//kafka-connect/topics/tstperf/year=2017/month=01/day=27/hour=14//tstperf+1+0011992516+0011992520.avro for tstperf-1 (io.confluent.connect.hdfs.TopicPartitionWriter:625)
or
INFO Starting commit and rotation for topic partition open-0 with start offsets {year=2017/month=01/day=27/hour=14/=58337131} and end offsets {year=2017/month=01/day=27/hour=14/=58338674, year=2017/month=01/day=27/hour=13/=58274188} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
When the line before the exception is the first example, the file it points to does not exist in HDFS.
I could not find any other common pattern.
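This is how I checked that the file is missing (a minimal sketch; the path is the one from the "Committed" log line above, with the doubled slashes collapsed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CommittedFileCheck {
    public static void main(String[] args) throws Exception {
        // Path copied from the "Committed" log line above.
        Path avro = new Path("hdfs://xxxx:8020/kafka-connect/topics/tstperf/"
                + "year=2017/month=01/day=27/hour=14/tstperf+1+0011992516+0011992520.avro");
        try (FileSystem fs = FileSystem.get(avro.toUri(), new Configuration())) {
            System.out.println(avro + " exists=" + fs.exists(avro)); // prints false for me
        }
    }
}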
I am running the connector in distributed mode on 3 servers, and only one of them shows this issue.
Data still arrives in HDFS, presumably written by the other 2 workers.
The issues started after a restart of all the connectors (they were stopped during HDFS maintenance).
There are no weird empty files, and nothing about 'not an avro data file' in the logs...
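For the empty-file check I scanned the topic directory recursively, roughly like this (a minimal sketch; the directory is assumed from the "Committed" path above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class EmptyFileScan {
    public static void main(String[] args) throws Exception {
        Path topicDir = new Path("hdfs://xxxx:8020/kafka-connect/topics/tstperf");
        try (FileSystem fs = FileSystem.get(topicDir.toUri(), new Configuration())) {
            // Walk the topic directory recursively and report zero-length files.
            RemoteIterator<LocatedFileStatus> it = fs.listFiles(topicDir, true);
            while (it.hasNext()) {
                LocatedFileStatus st = it.next();
                if (st.getLen() == 0) {
                    System.out.println("empty: " + st.getPath());
                }
            }
        }
    }
}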
I have 2 questions related to that:
- What is happening, and how can it be fixed?
- Can I be confident that all data is still being written to HDFS thanks to the other instances of the connector, or am I losing 1/3 of it?
Thanks,