Hi all,
I'm writing a KeyValueParser for an initial scan of an HDFS file. In my case, each input record should not be a single line but a group of four lines (or, more generally, some multiple of four lines).
In Hadoop, I can specify that each mapper be sent a certain number of lines:
conf.setInputFormat(NLineInputFormat.class);
conf.setInt("mapred.line.input.format.linespermap", 2000000); // each mapper receives this many lines to work on
Is there a way to override the default one-line-per-key/value-pair behavior when writing the IKeyValueParser, so that each key/value pair is built from a group of four lines?
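To make the question concrete, here is roughly the grouping I mean, as a standalone plain-Java sketch. It doesn't use the IKeyValueParser interface at all (I'm not assuming anything about its methods). `LineGroupReader` is just a hypothetical name, and treating the first line of each group as the key and the rest as the value is only an example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads consecutive lines and returns them in fixed-size groups.
class LineGroupReader {
    private final BufferedReader reader;
    private final int linesPerRecord;

    LineGroupReader(BufferedReader reader, int linesPerRecord) {
        if (linesPerRecord <= 0) {
            throw new IllegalArgumentException("linesPerRecord must be positive");
        }
        this.reader = reader;
        this.linesPerRecord = linesPerRecord;
    }

    // Returns the next group of linesPerRecord lines, or null at a clean end of input.
    // Throws if the input ends partway through a group.
    List<String> next() throws IOException {
        List<String> record = new ArrayList<>(linesPerRecord);
        for (int i = 0; i < linesPerRecord; i++) {
            String line = reader.readLine();
            if (line == null) {
                if (i == 0) {
                    return null; // no more records
                }
                throw new IOException("input ended after " + i + " of "
                        + linesPerRecord + " lines in a record");
            }
            record.add(line);
        }
        return record;
    }

    public static void main(String[] args) throws IOException {
        String input = "k1\nv1a\nv1b\nv1c\nk2\nv2a\nv2b\nv2c\n";
        LineGroupReader groups =
                new LineGroupReader(new BufferedReader(new StringReader(input)), 4);
        List<String> record;
        while ((record = groups.next()) != null) {
            // Example only: first line as the key, remaining lines as the value.
            System.out.println(record.get(0) + " -> " + record.subList(1, record.size()));
        }
    }
}
```

Run on the sample input, this prints one line per four-line group, e.g. `k1 -> [v1a, v1b, v1c]`. What I'm missing is how to get the parser itself to hand me records shaped like this instead of single lines.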
Thanks!