NLineInputFormat equivalent for Hyracks?

Jacob Biesinger

Jul 25, 2013, 2:47:20 PM
to hyrack...@googlegroups.com
Hi all,

I'm writing a KeyValueParser for an initial scan of an HDFS file. In my case, I need the input to be split not by a single line but grouped by four lines (actually, some multiple of four lines).

In Hadoop, I can specify that each mapper be sent a certain number of lines:

conf.setInputFormat(NLineInputFormat.class);
conf.setInt("mapred.line.input.format.linespermap", 2000000); // each mapper receives this many lines to work on

Is there a way to change the default one-line-per-key-value-pair behavior as I write the IKeyValueParser?

Thanks!

Yingyi Bu

Jul 25, 2013, 3:19:05 PM
to hyrack...@googlegroups.com
You can pass whatever conf to the constructor of HDFSReadOperatorDescriptor.
The splits will be obtained in the way your conf wants.

Yingyi


Jacob Biesinger

Jul 25, 2013, 4:42:13 PM
to hyrack...@googlegroups.com
Great, thanks, Yingyi.  Guess I should have tried first.  Works like a charm.

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine

Yingyi Bu

Jul 25, 2013, 4:43:25 PM
to hyrack...@googlegroups.com
No problem.  Let me know if you run into further issues:-)

Yingyi

Jacob Biesinger

Jul 25, 2013, 5:04:18 PM
to hyrack...@googlegroups.com
I've set the conf up properly AFAICT and it seems the splits are respected, but the parser I'm writing still only receives a single line of text as its value.  

I am overriding `createKeyValueParser` and returning a closure class which implements `parse(LongWritable key, Text value, IFrameWriter writer)`.  Is there a different parser to extend or some hyracks-specific option to set?

--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine


Yingyi Bu

Jul 25, 2013, 5:10:35 PM
to hyrack...@googlegroups.com
Oh, right, that line parameter is not respected because the IKeyValueParser processes one key-value pair at a time.
You need to implement the IKeyValueParserFactory to parse your data. The key and value types are generic, so you will be able to process keys and values of different types.

Yingyi
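
For anyone landing on this thread later: since the parser is handed one line per call, one way to get four-line records is to buffer lines inside the parser and emit a group whenever four have arrived. Below is a minimal sketch of that buffering in plain Java. `LineGrouper` and its methods are hypothetical names, not part of Hyracks or Hadoop. It assumes lines reach the parser in file order within a split and that the lines-per-split setting is a multiple of four, so no group straddles a split boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Hyracks or Hadoop class): collects lines that
 * arrive one at a time and passes them on in fixed-size groups.
 */
final class LineGrouper {
    private final int groupSize;
    private final Consumer<List<String>> sink;
    private final List<String> buffer;

    LineGrouper(int groupSize, Consumer<List<String>> sink) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        this.groupSize = groupSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(groupSize);
    }

    /** Call once per incoming line, e.g. from inside parse(key, value, writer). */
    void accept(String line) {
        buffer.add(line);
        if (buffer.size() == groupSize) {
            // Hand over a copy so the internal buffer can be reused.
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Number of buffered lines still waiting for their group to fill up. */
    int pending() {
        return buffer.size();
    }

    /** Call at the end of a split; fails loudly if a group was cut off. */
    void finish() {
        if (!buffer.isEmpty()) {
            throw new IllegalStateException(
                buffer.size() + " leftover line(s); split size is not a multiple of " + groupSize);
        }
    }

    public static void main(String[] args) {
        List<List<String>> groups = new ArrayList<>();
        LineGrouper grouper = new LineGrouper(4, groups::add);
        for (int i = 1; i <= 8; i++) {
            grouper.accept("line" + i);
        }
        grouper.finish();
        System.out.println(groups); // two groups of four lines each
    }
}
```

In a real parser the sink would build a tuple from the four lines and push it to the IFrameWriter, and `accept` would be called with `value.toString()`. Whether the parser interface offers an end-of-input hook from which to call `finish()` is something to check against your Hyracks version; I am assuming one is available.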