Is there a TypedBytesInputFormat?

Igor Gatis

Nov 13, 2013, 11:17:28 AM
to dumbo...@googlegroups.com
I've populated a typedbytes file with key-value pairs. Is there a way to make Hadoop read it without converting it to, say, a sequence file?

Side question: say the typedbytes file is gzipped. How can I make Hadoop decompress it on the fly?

Klaas Bosteels

Nov 18, 2013, 2:25:27 AM
to dumbo...@googlegroups.com
You could write an input format for reading plain files containing typed bytes, but it wouldn't be that useful, since it wouldn't be able to split the files. You'll probably want to put the typed bytes in a container file such as SequenceFile, which supports splitting (and also compression). That is basically the file format Dumbo outputs by default.

-K
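
To illustrate the splitting point, here is a minimal sketch of reading a plain typed bytes file sequentially in Python, optionally through gzip. It is not part of Dumbo or Hadoop: the function names (write_value, read_pairs, open_typedbytes) are hypothetical, and it assumes the usual typed bytes layout of a one-byte type code followed by a big-endian payload, handling only a subset of the type codes. The stream carries no sync markers, so a reader has to start at byte 0 to know where each record begins, and a gzipped file likewise has to be decompressed from the start.

```python
import gzip
import io
import struct

# Assumed typed bytes type codes (subset only): one code byte, then the payload.
TYPE_BYTES, TYPE_INT, TYPE_LONG, TYPE_DOUBLE, TYPE_STRING = 0, 3, 4, 6, 7


def write_value(out, value):
    """Serialize one value as typed bytes (hypothetical helper, subset of types)."""
    if isinstance(value, bytes):
        out.write(struct.pack(">Bi", TYPE_BYTES, len(value)) + value)
    elif isinstance(value, int):
        if -2**31 <= value < 2**31:
            out.write(struct.pack(">Bi", TYPE_INT, value))
        else:
            out.write(struct.pack(">Bq", TYPE_LONG, value))
    elif isinstance(value, float):
        out.write(struct.pack(">Bd", TYPE_DOUBLE, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">Bi", TYPE_STRING, len(data)) + data)
    else:
        raise TypeError("unsupported type in this sketch: %r" % type(value))


def _read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means a truncated record."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated typed bytes stream")
    return data


def _decode(stream, code):
    """Decode the payload that follows an already-consumed type code."""
    if code == TYPE_INT:
        return struct.unpack(">i", _read_exact(stream, 4))[0]
    if code == TYPE_LONG:
        return struct.unpack(">q", _read_exact(stream, 8))[0]
    if code == TYPE_DOUBLE:
        return struct.unpack(">d", _read_exact(stream, 8))[0]
    if code in (TYPE_BYTES, TYPE_STRING):
        (length,) = struct.unpack(">i", _read_exact(stream, 4))
        data = _read_exact(stream, length)
        return data if code == TYPE_BYTES else data.decode("utf-8")
    raise ValueError("type code %d not handled in this sketch" % code)


def read_pairs(stream):
    """Yield (key, value) pairs; must start at the beginning of the stream."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of file between records
        key = _decode(stream, first[0])
        value = _decode(stream, _read_exact(stream, 1)[0])
        yield key, value


def open_typedbytes(path):
    """Open a file for reading, decompressing on the fly if it ends in .gz."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


if __name__ == "__main__":
    buf = io.BytesIO()
    for key, value in [("apple", 3), ("pear", 1.5)]:
        write_value(buf, key)
        write_value(buf, value)
    # Compress in memory and read the pairs back through gzip.
    compressed = io.BytesIO(gzip.compress(buf.getvalue()))
    for pair in read_pairs(gzip.GzipFile(fileobj=compressed)):
        print(pair)
```

Because a reader like this can only consume a file front to back, each file would end up as a single input split, which is why a container such as SequenceFile is the more practical choice for large inputs.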