Kafka Connect unable to write data into HDFS


Nishant Verma

Jan 12, 2017, 4:47:45 AM
to Confluent Platform
I will explain the scenario here.

I have a standalone Kafka 0.10 broker and ZooKeeper running on one server, and a Hadoop NameNode on a different machine. The standalone Kafka instance has some 7-8 topics. I am trying to pull one of these topics into HDFS through the Confluent Kafka Connect HDFS connector. I downloaded Confluent 3.1.1 on the standalone server and set hdfs.url to the NameNode's address (hdfs://ip-10-16-37-124:9000). I start Schema Registry with the command below:

sudo ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

I start Kafka Connect with the command below:

sudo ./bin/connect-standalone etc/kafka/connect-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties



After starting Kafka Connect, I get the log output below, and no data is written to HDFS:


[2017-01-12 15:09:20,860] INFO Fetch offset 0 is out of range for partition Prd_IN_GeneralEvents-99, resetting offset (org.apache.kafka.clients.consumer.internals.Fetcher:708)

[2017-01-12 15:09:20,862] INFO Fetch offset 0 is out of range for partition Prd_IN_GeneralEvents-10, resetting offset (org.apache.kafka.clients.consumer.internals.Fetcher:708)

[2017-01-12 15:09:20,863] INFO Fetch offset 0 is out of range for partition Prd_IN_GeneralEvents-175, resetting offset (org.apache.kafka.clients.consumer.internals.Fetcher:708)

[2017-01-12 15:09:20,865] INFO Fetch offset 0 is out of range for partition Prd_IN_GeneralEvents-343, resetting offset (org.apache.kafka.clients.consumer.internals.Fetcher:708)

[2017-01-12 15:09:24,951] INFO Reflections took 7628 ms to scan 263 urls, producing 12034 keys and 80062 values  (org.reflections.Reflections:229)


I have created an empty file named connect.offsets inside the Confluent directory and set the following in connect-standalone.properties:

offset.storage.file.filename=/opt/confluent-3.0.0/connect.offsets


Why is the data not getting written to HDFS?



Ewen Cheslack-Postava

Jan 13, 2017, 2:10:15 AM
to Confluent Platform
Are there any other logs? The messages you listed are just INFO and indicate that the connector is being assigned some partitions and is starting to consume from them.

The offsets setting shouldn't be relevant -- nothing around offsets management would cause the connector to not even write any data to HDFS. (In fact, in this case the Kafka Connect offset tracking is not even used.) Some more log info might help reveal the source of the problem.

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/620336fe-d4eb-4741-81d9-e5d231267cd5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nishant Verma

Jan 13, 2017, 3:38:39 AM
to Confluent Platform
Right now, after starting the HDFS connector, I see no errors in the logs.
In HDFS, under /topics/+tmp, there is a directory named Prd_IN_GeneralEvents (one of the Kafka topics). Inside it there is just one partition directory (partition=219), and it is empty. These directories are all under /topics/+tmp. Shouldn't they be moved to /topics/Prd_IN_GeneralEvents, with all the partitions inside it?

Nishant Verma

Jan 13, 2017, 3:46:13 AM
to Confluent Platform
Which log file should I focus on here? 

Nishant


On Thursday, January 12, 2017 at 3:17:45 PM UTC+5:30, Nishant Verma wrote:

Ewen Cheslack-Postava

Jan 15, 2017, 6:37:39 PM
to Confluent Platform
The Connect log file. The temp directory is used to store files before they are "committed", i.e. moved to their final location. This is necessary since we a) don't know the full filename up front and b) need to protect against failed writers, since we want to guarantee exactly-once delivery.
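
To make the temp-then-commit idea concrete, here is a rough local-filesystem analogy in shell. It is only a sketch: the real connector does this against HDFS, and commit_file, its arguments, and the topic+partition+start+end naming used here are illustrative assumptions, not the connector's actual code.

```shell
#!/bin/sh
# Local-filesystem analogy only; the real connector works against HDFS.
# commit_file and its arguments are hypothetical names for illustration.

commit_file() {
    # $1: temp file being written
    # $2: final directory
    # $3: topic, $4: partition
    # $5: first offset in the file, $6: last offset in the file
    mkdir -p "$2" || return 1
    # The final name includes the last offset, which is only known once
    # writing has stopped, so the file cannot be created under it up front.
    mv "$1" "$2/$3+$4+$5+$6.avro"
}
```

For example, commit_file /tmp/part.tmp /tmp/final Prd_IN_GeneralEvents 219 0 2 would leave /tmp/final/Prd_IN_GeneralEvents+219+0+2.avro and nothing at the temp path. A writer that dies before the move leaves only a temp file behind, which is why readers of the final directory never see partial data.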

If the Connect log never has a log message about "Starting commit and rotation for topic partition" at INFO level, that means you're never hitting a condition where it'll try closing the temp files and moving them to their final locations. If this is the case, you should look at the configs for rotating files as well as checking how much traffic you have flowing through the topic.
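
A minimal sketch of those two checks in shell. It assumes the Connect worker's output has been captured to a file (in standalone mode it may only be going to the console, so you might need to redirect it first); the function names are made up for this example. flush.size and rotate.interval.ms are the HDFS connector's rotation settings.

```shell
#!/bin/sh
# Sketch only: the log path and properties path passed in are assumptions,
# adjust them to wherever your Connect output and connector config live.

# Count log lines showing that the HDFS connector started committing files.
count_commits() {
    # $1: path to a file holding the Connect worker's log output
    grep -c "Starting commit and rotation for topic partition" "$1"
}

# Print the rotation-related settings explicitly set in a connector config.
show_rotation_settings() {
    # $1: path to the connector properties file
    grep -E '^(flush\.size|rotate\.interval\.ms)=' "$1"
}
```

Usage would look like count_commits connect.log and show_rotation_settings etc/kafka-connect-hdfs/quickstart-hdfs.properties. A count of 0 means no temp file has been closed and moved yet. If the topic is spread over many partitions with little traffic on each, a given partition may simply not have reached the configured flush.size.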

-Ewen
