It took me a while to follow your suggestion, but here we are.
I tried using standalone connectors instead, but the result is the same. I did manage to observe two things, though:
- Sometimes the lease is acquired, and I can see the modification time of the file ${logs.dir}/${topic}/${partition}/log in HDFS being updated. No actual data arrives in HDFS, though. In the log output, I have a lot of lines like these:
[2016-12-22 13:55:43,292] INFO Successfully acquired lease for hdfs://ip-10-0-0-239.eu-west-1.compute.internal:8020//kafka-connect/wal/sent/0/log (io.confluent.connect.hdfs.wal.FSWAL:75)
[2016-12-22 13:55:43,292] INFO Successfully acquired lease for hdfs://ip-10-0-0-239.eu-west-1.compute.internal:8020//kafka-connect/wal/sent/2/log (io.confluent.connect.hdfs.wal.FSWAL:75)
[2016-12-22 13:55:43,292] INFO Successfully acquired lease for hdfs://ip-10-0-0-239.eu-west-1.compute.internal:8020//kafka-connect/wal/sent/1/log (io.confluent.connect.hdfs.wal.FSWAL:75)
[2016-12-22 13:56:33,080] INFO WorkerSinkTask{id=sent-connector-standalone-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSinkTask:262)
[2016-12-22 13:56:33,206] INFO WorkerSinkTask{id=sent-connector-standalone-1} Committing offsets (org.apache.kafka.connect.runtime.WorkerSinkTask:262)
[2016-12-22 13:56:33,225] INFO WorkerSinkTask{id=sent-connector-standalone-2} Committing offsets (org.apache.kafka.connect.runtime.WorkerSinkTask:262)
[2016-12-22 13:56:43,225] INFO Starting commit and rotation for topic partition sent-1 with start offsets {} and end offsets {} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
[2016-12-22 13:56:43,237] INFO Starting commit and rotation for topic partition sent-2 with start offsets {} and end offsets {} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
[2016-12-22 13:56:43,316] INFO Starting commit and rotation for topic partition sent-0 with start offsets {} and end offsets {} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
The empty offsets look wrong to me. Indeed, with the distributed connector, if the offsets are empty I get no data, and if they are not empty I do get data. I know data is arriving in Kafka in the meantime, so the connector should be picking it up.
- After a few start / ^C cycles, I now get this message:
[2016-12-22 14:26:14,077] INFO Cannot acquire lease on WAL hdfs://ip-10-0-0-239.eu-west-1.compute.internal:8020//kafka-connect/wal/sent/2/log (io.confluent.connect.hdfs.wal.FSWAL:80)
[2016-12-22 14:26:21,835] ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED (io.confluent.connect.hdfs.TopicPartitionWriter:229)
org.apache.kafka.connect.errors.ConnectException: Cannot acquire lease after timeout, will retry.
at io.confluent.connect.hdfs.wal.FSWAL.acquireLease(FSWAL.java:95)
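
To check the empty-offset pattern from the first point over a longer log, a small script along these lines could help. It is only a sketch: `summarize_commits` is a hypothetical helper name, the regex is built from the "Starting commit and rotation" lines pasted above, and I am assuming that non-empty offsets are printed as some text between the braces without nested braces.

```python
import re

# Built from the TopicPartitionWriter lines pasted above. Assumption: when the
# offsets are not empty, they appear as text between the braces (no nesting).
COMMIT_RE = re.compile(
    r"Starting commit and rotation for topic partition (?P<tp>\S+) "
    r"with start offsets \{(?P<start>[^}]*)\} and end offsets \{(?P<end>[^}]*)\}"
)


def summarize_commits(lines):
    """Count commits with empty and non-empty offsets per topic partition."""
    summary = {}
    for line in lines:
        match = COMMIT_RE.search(line)
        if match is None:
            continue  # not a commit-and-rotation line
        counts = summary.setdefault(match.group("tp"), {"empty": 0, "non_empty": 0})
        # A commit counts as empty only if both offset maps are empty.
        is_empty = not match.group("start").strip() and not match.group("end").strip()
        counts["empty" if is_empty else "non_empty"] += 1
    return summary
```

Passing it an open log file (any iterable of lines works) returns a dict keyed by topic partition, which makes it easy to see whether a partition ever committed with non-empty offsets.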
I would like to give more info, but that's all I have for the moment.