I now have a single-node setup (I downloaded Hadoop 1.2.1 into a directory). I tried a couple of command lines; here are the promising ones:
dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -input input.seq00 -overwrite yes -output test
This fails with:
INFO: consuming file:/home/ubuntu/input.seq00
INFO: inputting typed bytes
13/10/29 13:06:02 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
INFO: buffersize = 168960
INFO: outputting typed bytes
Traceback (most recent call last):
...
File "typedbytes.py", line 93, in _read
return self.handler_table[t](self)
File "typedbytes.py", line 182, in invalid_typecode
raise StructError("Invalid type byte: " + str(self.t))
struct.error: Invalid type byte: 50
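One sanity check before blaming dumbo: Hadoop SequenceFiles start with the magic bytes `SEQ` followed by a one-byte version number, so a quick header peek (just a sketch, run on the local copy of the file) tells whether `input.seq00` really is a SequenceFile or raw typed bytes:

```python
import os

def looks_like_sequence_file(path):
    """Return True if the file starts with the SequenceFile magic b'SEQ'."""
    with open(path, "rb") as f:
        # The first three bytes of every Hadoop SequenceFile are b"SEQ";
        # the fourth byte is the format version.
        return f.read(3) == b"SEQ"

# Guarded so the snippet runs even where input.seq00 does not exist.
if os.path.exists("input.seq00"):
    print(looks_like_sequence_file("input.seq00"))
```

If this prints `True`, the first command was presumably misparsing a SequenceFile as a raw typed-bytes stream, which would explain the bogus type byte in the traceback.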
I then changed the command to specify the input format explicitly:
dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat' -input input.seq00 -overwrite yes -output test
This runs, but not correctly: the mapper's key parameter receives the initial bytes of the input file itself (e.g. SEQ...), as if the SequenceFile header were being passed through as raw data instead of being parsed.
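For reference, a minimal diagnostic mapper (a sketch, not the real merge_people.py) is how I'm inspecting what dumbo hands the mapper:

```python
def mapper(key, value):
    # Emit the key's Python type name and a short prefix of its repr,
    # to make it obvious whether the key is a real SequenceFile key
    # or the raw leading bytes of the file (e.g. the "SEQ" header).
    yield type(key).__name__, repr(key)[:40]

if __name__ == "__main__":
    try:
        import dumbo  # available on the cluster; may not be installed locally
        dumbo.run(mapper)
    except ImportError:
        pass  # run via `dumbo start ...` in practice
```

With the second command above, the keys this emits start with the file's header bytes rather than record keys.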
Any clues?