from kafka to hbase


Chen Wang

unread,
Jun 13, 2014, 7:56:11 AM6/13/14
to camu...@googlegroups.com
Hey Folks,
I am trying to use MapReduce to read data from Kafka, do some processing in parallel, and then commit the results to an HBase table. I am trying to modify the Camus source code to achieve that. Is there any guidance on how this might work? I know it currently goes from Kafka to HDFS, but I could use some help on changing the 'HDFS' side to 'HBase'.
Thanks in advance!!!
Chen

Ken Goodhope

unread,
Jun 13, 2014, 4:18:22 PM6/13/14
to Chen Wang, camu...@googlegroups.com
Hi Chen,

I would start here:


You could create an HBase implementation that uses an HBase client, and then use the following property to specify your implementation:

etl.record.writer.provider.class=
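To make Ken's suggestion concrete, here is a minimal sketch of what such a provider might look like, assuming the `RecordWriterProvider` interface from the Camus source and the HBase 0.94/0.98-era client API. The class name, table name, column family, and row-key scheme below are all placeholders, not anything defined by Camus:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

import com.linkedin.camus.coders.CamusWrapper;
import com.linkedin.camus.etl.IEtlKey;
import com.linkedin.camus.etl.RecordWriterProvider;

// Hypothetical HBase-backed provider; wire it in with
// etl.record.writer.provider.class=com.example.HBaseRecordWriterProvider
public class HBaseRecordWriterProvider implements RecordWriterProvider {

    @Override
    public String getFilenameExtension() {
        return ""; // no file is produced; records go to HBase instead
    }

    @Override
    public RecordWriter<IEtlKey, CamusWrapper> getDataRecordWriter(
            TaskAttemptContext context, String fileName,
            CamusWrapper data, FileOutputCommitter committer)
            throws IOException, InterruptedException {

        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        final HTable table = new HTable(conf, "my_table"); // placeholder table

        return new RecordWriter<IEtlKey, CamusWrapper>() {
            @Override
            public void write(IEtlKey key, CamusWrapper value) throws IOException {
                // Placeholder row key -- pick a real scheme for your data.
                Put put = new Put(Bytes.toBytes(key.getTopic() + ":" + key.getOffset()));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                        Bytes.toBytes(value.getRecord().toString()));
                table.put(put);
            }

            @Override
            public void close(TaskAttemptContext context) throws IOException {
                table.close(); // flushes buffered Puts
            }
        };
    }
}
```

This is a sketch, not a drop-in implementation; you would also want to think about write buffering (`table.setAutoFlush(false)`) and error handling in each task.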

Ken

--
You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Felix GV

unread,
Jun 13, 2014, 5:42:41 PM6/13/14
to Ken Goodhope, Chen Wang, camu...@googlegroups.com
Hello,

I am curious why you would want to use MapReduce to write data from Kafka to HBase.

Since HBase supports random writes, why not just write a Kafka consumer that pushes the data to HBase in real-time? What's the point of writing in batch?

Unless your intent is to write HFiles and load them while skipping the overhead of the HBase WAL?
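For the record, the bulk-load path Felix alludes to usually looks like the following in that era of HBase. The job class and paths here are hypothetical; only the `LoadIncrementalHFiles` tool invocation is standard HBase:

```shell
# 1. An MR job writes HFiles directly with HFileOutputFormat;
#    HFileOutputFormat.configureIncrementalLoad(job, table) wires in a
#    partitioner so output files line up with region boundaries.
#    No WAL, no region-server writes on this path.
#    "KafkaToHFilesJob", the jar, and the paths are placeholders.
hadoop jar my-etl.jar com.example.KafkaToHFilesJob /tmp/hfiles my_table

# 2. The stock HBase tool then moves the HFiles into the table's regions:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles my_table
```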

--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv


Chen Wang

unread,
Jun 13, 2014, 5:53:12 PM6/13/14
to Felix GV, Ken Goodhope, camu...@googlegroups.com
Felix,
Kafka will contain many topics, and each topic contains a large number of records that I need to commit to HBase as fast as possible. That part by itself isn't the interesting bit: I already have a Storm topology reading from Kafka with the high-level API in real time and populating HBase. However, I have a requirement that certain Kafka topics need to be read at certain times, which makes it difficult to use the Storm topology. I therefore need a stand-alone, distributable application that can read a topic at a specified time and commit it to HBase, hence the MapReduce approach. (Originally I was thinking of just writing a high-level consumer with the same group id and deploying it to multiple machines, but that essentially re-does the job of MapReduce, so I did some research and found that Camus might suit my need.)

Please do let me know if this makes sense to you, or if there is a better solution.
Thanks,
Chen

Chen Wang

unread,
Jun 13, 2014, 6:07:23 PM6/13/14
to Felix GV, Ken Goodhope, camu...@googlegroups.com
On second thought, it might make sense to just land the topic content in HDFS first, rather than loading HBase straight from Kafka, and then have a scheduled MapReduce job read from the files and load them into HBase...
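For what it's worth, that second approach is a fairly standard pattern: a map-only job reads the files Camus wrote to HDFS and emits HBase `Put`s through `TableOutputFormat`. A sketch, assuming tab-separated records with the row key in the first field; the record layout, table name, and column names are placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Hypothetical scheduled job: HDFS (Camus output) -> HBase.
public class HdfsToHBaseJob {

    public static class PutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumed layout: rowkey \t payload -- adapt to your records.
            String[] fields = value.toString().split("\t", 2);
            Put put = new Put(Bytes.toBytes(fields[0]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                    Bytes.toBytes(fields[1]));
            ctx.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hdfs-to-hbase");
        job.setJarByClass(HdfsToHBaseJob.class);
        job.setMapperClass(PutMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Wires TableOutputFormat and the HBase jars onto the job.
        TableMapReduceUtil.initTableReducerJob(args[1], null, job);
        job.setNumReduceTasks(0); // map-only: Puts go straight to HBase
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run it from the scheduler of your choice (cron, Oozie, etc.) against whatever directory Camus last wrote.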

Dzmitry Hancharou

unread,
May 7, 2015, 3:07:21 AM5/7/15
to camu...@googlegroups.com, chen.apa...@gmail.com, kengo...@gmail.com, fvil...@linkedin.com
Hi Chen,

What was the final solution you used, and how successful was it?

Thanks,
Dzmitry

Michael Taluc

unread,
Aug 26, 2015, 6:54:29 PM8/26/15
to Camus - Kafka ETL for Hadoop, chen.apa...@gmail.com, kengo...@gmail.com, fvil...@linkedin.com