Sequence Files with Dumbo


Ryan

Nov 9, 2009, 3:41:26 AM
to dumbo-user
Hello,

Does Dumbo support sequence files?

-Ryan

Klaas Bosteels

Nov 9, 2009, 5:19:02 AM
to dumbo...@googlegroups.com
Yes, it does. Sequence files are Dumbo's default output format when
running on Hadoop, and the "short tutorial" on the wiki explains what
happens when you take sequence files as input:

"For sequence files, the type of the keys and values can differ from
file to file. Most common writables are converted to suitable Python
types, and the remaining writables are converted to a string by means
of their toString() method. Hadoop records are converted to lists
consisting of the values of their attributes."

http://wiki.github.com/klbostee/dumbo/short-tutorial#input_formats
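
To illustrate the conversion described in that passage, here is a minimal
sketch of a mapper that copes with values of different Python types. It
assumes only what the tutorial text says: common writables arrive as
native Python values and Hadoop records arrive as lists. The names SAMPLE
and run_locally are hypothetical helpers for trying the logic without a
cluster; they are not part of Dumbo. The only Dumbo call used is
dumbo.run(mapper, reducer).

```python
from collections import defaultdict


def mapper(key, value):
    """Emit (type name, 1) for each value, whatever type it arrived as."""
    # Per the tutorial text, a Hadoop record arrives as a list of its
    # attribute values, so each field is counted separately.
    fields = value if isinstance(value, (list, tuple)) else [value]
    for field in fields:
        yield type(field).__name__, 1


def reducer(key, values):
    """Sum the counts for one type name."""
    yield key, sum(values)


# Hypothetical sample input standing in for converted sequence-file pairs.
SAMPLE = [(0, "some text"), (1, 42), (2, [1, "a", 2.5])]


def run_locally(pairs):
    """Hypothetical stand-in for a Hadoop run: map, group by key, reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    try:
        import dumbo
    except ImportError:
        # Dumbo is not installed: just exercise the logic on the sample.
        print(run_locally(SAMPLE))
    else:
        dumbo.run(mapper, reducer)
```

Because the key and value types can differ from file to file, the mapper
checks the type at runtime rather than assuming one fixed schema.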

Hope this helps,
-Klaas