Using custom FileInputFormat with Kite


Eugene Dvorkin

Feb 10, 2015, 8:08:18 AM
to cdk...@cloudera.org
I have a data file (CSV) that has line separator characters inside some records. Because of this, the Hadoop line reader incorrectly breaks records during the map phase.
I overcame this problem by writing a custom FileInputFormat (class CustomFileInputFormat extends FileInputFormat<LongWritable, Text>) and a corresponding LineReader and RecordReader. This works with plain MapReduce Java code. In my driver I have:
Job job = new Job(getConf(), "Clean up UCC records");
job.setJarByClass(getClass());
job.setInputFormatClass(CustomFileInputFormat.class);
...
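
For readers following along, here is a minimal sketch of the kind of record-splitting logic that a custom RecordReader like the one described above might delegate to. It is not the code from this post. It assumes that embedded line separators occur only inside double-quoted CSV fields, which the post does not state, and the class name QuoteAwareRecordReader and its methods are hypothetical. It uses only the Java standard library, so it runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop or Kite): reads CSV records and
 * treats a newline as a record boundary only when it is outside double quotes.
 * ASSUMPTION: embedded newlines appear only inside double-quoted fields.
 */
final class QuoteAwareRecordReader {
    private final Reader in;

    QuoteAwareRecordReader(Reader in) {
        this.in = new BufferedReader(in);
    }

    /** Returns the next record without its terminating newline, or null at end of input. */
    String nextRecord() throws IOException {
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            sawAny = true;
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                // Drop a trailing carriage return from a CRLF line ending.
                int len = record.length();
                if (len > 0 && record.charAt(len - 1) == '\r') {
                    record.setLength(len - 1);
                }
                return record.toString();
            }
            record.append((char) c);
        }
        // Final record without a trailing newline, or null if nothing was read.
        return sawAny ? record.toString() : null;
    }

    /** Convenience for small inputs: splits a whole string into records. */
    static List<String> readAll(String text) {
        List<String> records = new ArrayList<>();
        QuoteAwareRecordReader reader = new QuoteAwareRecordReader(new StringReader(text));
        try {
            String rec;
            while ((rec = reader.nextRecord()) != null) {
                records.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

A real Hadoop RecordReader also has to deal with input split boundaries, which this sketch ignores. One common simplification is to override isSplitable in the FileInputFormat subclass to return false, so that each file is read by a single mapper from the start.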

However, I want to use Kite to describe my data source and use Kite's Dataset API to describe and load my data. Is that possible? How can I tell Kite to use my CustomFileInputFormat?
Thanks

Ryan Blue

Feb 17, 2015, 5:41:23 PM
to Eugene Dvorkin, cdk...@cloudera.org
Hi Eugene!

Sorry it has taken me so long to respond! I ended up thinking that this
was a great question and added the answer to our team's blog. The post
is here:


http://ingest.tips/2015/02/17/kite-0-18-0-adds-custom-inputformat-support/

Thanks for the great question!

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.