Kafka HDFS Connector: Topic directory not created in HDFS /topics


danoomistmatiste

Apr 4, 2017, 3:35:36 PM
to Confluent Platform
Hello, I have a very fundamental problem. The topic directory for my topic (connect-test) is not getting created under /topics in HDFS; all I see is /topics/+tmp, which is empty. My command and configs are below. I have installed the Confluent 3.0 platform, and all the processes (Kafka, connect-standalone, etc.) are running from this distribution, connecting to an Apache Hadoop 2.7.3 cluster on my Mac OS X machine.

./bin/connect-standalone etc/kafka/connect-standalone.properties etc/kafka/connect-file-source.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties


connect-standalone.properties


bootstrap.servers=localhost:9092

key.converter=org.apache.kafka.connect.json.JsonConverter

value.converter=org.apache.kafka.connect.json.JsonConverter

key.converter.schemas.enable=false

value.converter.schemas.enable=false

internal.key.converter=org.apache.kafka.connect.json.JsonConverter

internal.value.converter=org.apache.kafka.connect.json.JsonConverter

internal.key.converter.schemas.enable=false

internal.value.converter.schemas.enable=false

offset.storage.file.filename=/tmp/connect.offsets

offset.flush.interval.ms=1000



connect-file-source.properties


name=local-file-source

connector.class=FileStreamSource

tasks.max=1

file=/Users/hadoop/confluent-3.0.0/test.txt

topic=connect-test


quickstart-hdfs.properties


name=hdfs-sink

connector.class=io.confluent.connect.hdfs.HdfsSinkConnector

tasks.max=1

topics=connect-test

key.ignore=true

topic.schema.ignore=true

hdfs.url=hdfs://localhost:8020

flush.size=3

hadoop.conf.dir=/Users/hadoop/hadoop-2.7.3/etc/hadoop/conf

hadoop.home=/Users/hadoop/hadoop-2.7.3

locale=US

rotate.interval.ms=1000

offset.flush.interval.ms=1000

Ewen Cheslack-Postava

Apr 7, 2017, 2:06:52 AM
to Confluent Platform
I see that you're using JsonConverter. Currently there are no output formats for the HDFS connector that handle schemaless data, so that may be the source of your problem. Since you haven't specified format.class in your config, the connector should be using the default format, Avro, which requires schema information to serialize records properly. I would expect an error to be reported in the log, and the REST API to report a failed status for the connector's tasks.
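The failed-task status mentioned here can be pulled from the Connect worker's REST API (the connector name hdfs-sink comes from the config above; 8083 is the default worker port, adjust if yours differs):

```shell
# Query the Connect worker for the sink's status; a FAILED task
# includes the stack trace explaining why nothing reached /topics.
curl -s http://localhost:8083/connectors/hdfs-sink/status
```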

We plan to add better support for schemaless data in an upcoming version of the HDFS connector, although you'll need to override the format in that case (e.g. to store one JSON record per line instead of converting to Avro).
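(For readers on later connector versions: that JSON override landed as a format.class setting on the sink. A hedged sketch, assuming a kafka-connect-hdfs release that ships io.confluent.connect.hdfs.json.JsonFormat — it is not available in CP 3.0:)

```properties
# Sketch for a later kafka-connect-hdfs release with a JSON output
# format; writes one JSON record per line instead of Avro files.
format.class=io.confluent.connect.hdfs.json.JsonFormat
```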

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/0dd5c84a-f33f-43a2-97d6-c961f7471cdc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

danoomistmatiste

Apr 17, 2017, 6:26:30 PM
to Confluent Platform
Ewen, thank you for the response. I had to start the schema-registry process; once I did, the messages started flowing into HDFS.
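(For anyone hitting the same wall: the worker-level converter settings that typically pair with a running Schema Registry look like the sketch below — 8081 is the registry's default port, adjust to your setup:)

```properties
# Sketch of connect-standalone.properties settings that serialize
# records as Avro via Schema Registry (default port 8081).
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```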

Zhan Progaman

Jan 21, 2018, 3:30:55 PM
to Confluent Platform
Hi @Ewen,
I am trying to use the Kafka HDFS sink with Azure Blob Storage through "wasb". I have added this property to core-site.xml:
<property>
<name>fs.wasbs.impl</name>
<value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
</property>
and added HADOOP_CLASSPATH, but Hadoop cannot find the class NativeAzureFileSystem.

Could you please help with this? Is it possible?
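(Note: NativeAzureFileSystem ships in the hadoop-azure jar and depends on the azure-storage client jar; both need to be on the classpath of the JVM running Kafka Connect, not just Hadoop's. Also check that the property name matches the URL scheme you use: fs.wasb.impl for wasb:// URLs, fs.wasbs.impl for wasbs://. A sketch with illustrative paths and jar versions:)

```shell
# Illustrative paths/versions -- adjust to your distribution. Both jars
# sit under share/hadoop/tools/lib in Apache Hadoop 2.7.x.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HADOOP_HOME/share/hadoop/tools/lib/hadoop-azure-2.7.3.jar:$HADOOP_HOME/share/hadoop/tools/lib/azure-storage-2.0.0.jar"
# The Connect worker scripts pick up extra jars from CLASSPATH:
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$HADOOP_CLASSPATH"
```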

On Friday, April 7, 2017 at 9:06:52 AM UTC+3, Ewen Cheslack-Postava wrote: