I have a CSV data file that contains line-separator characters inside some records. Because of this, the Hadoop line reader incorrectly breaks records during the map phase.
I worked around this problem by writing a custom FileInputFormat (class CustomFileInputFormat extends FileInputFormat&lt;LongWritable, Text&gt;) with a corresponding LineReader and RecordReader. This works with plain map/reduce Java code. In my driver I have:
    Job job = new Job(getConf(), "Clean up UCC records");
    job.setJarByClass(getClass());
    job.setInputFormatClass(CustomFileInputFormat.class);
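For context, the core of my custom RecordReader is that a newline only ends a record when it is not inside a double-quoted field. A minimal, Hadoop-free sketch of that splitting logic (the class and method names here are hypothetical, just to illustrate the idea):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the quote-aware splitting a custom RecordReader must do.
// A '\n' ends a record only when we are not inside a double-quoted field.
public class QuoteAwareSplitter {
    public static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '"') {
                // An escaped "" pair toggles twice (close then reopen),
                // so embedded newlines still count as quoted.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,note\n1,\"line one\nline two\"\n2,plain\n";
        // Yields 3 records: the header, the record with an embedded
        // newline kept intact, and the plain record.
        for (String r : splitRecords(csv)) {
            System.out.println("[" + r + "]");
        }
    }
}
```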
But I want to use Kite to describe my data source and load the data through Kite's Dataset API. Is this possible? How can I tell Kite to use my CustomFileInputFormat?
Thanks