KStream Avro to Parquet conversion and store on HDFS


omkar....@gslab.com

Sep 1, 2016, 6:19:47 AM9/1/16
to Confluent Platform
Hi,

In our use case we are currently using the Kafka JDBC connector and Kafka Streams to process data.
The requirement is to put the processed data into HDFS in Parquet format. The data flow we have is as follows:

1. The JDBC connector reads data from a MySQL database.
2. The JDBC connector publishes the data to a Kafka topic, say 'test-mysql-kafka'.
3. A Kafka Streams application consumes the data from the 'test-mysql-kafka' topic.
4. The data we receive in Kafka Streams is in Avro format.

We want to convert this data to Parquet format and store it in HDFS.

Is there a feature of Kafka Streams, or any other library, that we can use both to convert the data and to store it in HDFS?

Thanks,
OmkarSabane

Dustin Cote

Sep 1, 2016, 8:34:43 AM9/1/16
to confluent...@googlegroups.com
Hi Omkar,

The HDFS Sink Connector can be used with a Parquet output format.  Here's an example of setting up Parquet as the output format:
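A minimal sketch of a standalone worker's connector properties, assuming the Confluent HDFS Sink Connector is installed; the topic name matches your example, while `hdfs.url` and `flush.size` are placeholder values you would adjust for your cluster:

```
name=hdfs-parquet-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test-mysql-kafka
# Placeholder NameNode address -- point this at your HDFS cluster
hdfs.url=hdfs://namenode:8020
# Number of records to accumulate before writing a file; tune as needed
flush.size=1000
# Writes the Avro records out as Parquet files
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
```

Because the connector reads Avro from the topic and converts it on write, no extra conversion step is needed in your Streams application.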

Regards,





--
Dustin Cote
Customer Operations Engineer | Confluent
Follow us: Twitter | blog