I have a CSV data file that contains line-separator characters inside some records. Because of this, the Hadoop line reader incorrectly breaks records during the map phase.
I worked around this problem by writing a custom FileInputFormat (class CustomFileInputFormat extends FileInputFormat&lt;LongWritable, Text&gt;) with a corresponding LineReader and RecordReader. This works with plain map/reduce Java code. In my driver I have:
    Job job = new Job(getConf(), "Clean up UCC records");
    job.setJarByClass(getClass());
    job.setInputFormatClass(CustomFileInputFormat.class);
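For context, the core of my custom RecordReader is that a newline only ends a record when it is not inside a double-quoted field. A minimal, Hadoop-free sketch of that splitting logic (the class and method names here are hypothetical, just to illustrate the idea):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the quote-aware splitting a custom RecordReader must do.
// A '\n' ends a record only when we are not inside a double-quoted field.
public class QuoteAwareSplitter {
    public static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '"') {
                // An escaped "" pair toggles twice (close then reopen),
                // so embedded newlines still count as quoted.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,note\n1,\"line one\nline two\"\n2,plain\n";
        // Yields 3 records: the header, the record with an embedded
        // newline kept intact, and the plain record.
        for (String r : splitRecords(csv)) {
            System.out.println("[" + r + "]");
        }
    }
}
```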
But I want to use Kite to describe my data source and load the data through Kite's Dataset API. Is this possible? How can I tell Kite to use my CustomFileInputFormat?
Thanks