Using non-text input in local runs


Igor Gatis

Oct 29, 2013, 7:47:57 AM
to dumbo...@googlegroups.com
I noticed that only text files are supported for local runs. Is there any way to make local runs work with binary files?

Gilles

Oct 29, 2013, 8:51:22 AM
to dumbo...@googlegroups.com
Local run is really too limited and is not usable beyond the simple word count example. I recommend setting up a single-node Hadoop cluster on your dev box. This is easy to do, and it gives you the real Hadoop features for developing with dumbo.

-Gilles

Igor Gatis

Oct 29, 2013, 9:10:51 AM
to dumbo...@googlegroups.com
I now have a single-node setup (I downloaded Hadoop 1.2.1 to a directory). I tried a couple of command lines; here are the promising ones:

dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -input input.seq00 -overwrite yes -output test

This fails with:
INFO: consuming file:/home/ubuntu/input.seq00
INFO: inputting typed bytes
13/10/29 13:06:02 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
INFO: buffersize = 168960
INFO: outputting typed bytes
Traceback (most recent call last):
  ...
  File "typedbytes.py", line 93, in _read
    return self.handler_table[t](self)
  File "typedbytes.py", line 182, in invalid_typecode
    raise StructError("Invalid type byte: " + str(self.t))
struct.error: Invalid type byte: 50
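
One possible reading of this error, offered as an assumption rather than something confirmed in this thread: typed bytes values start with a one-byte type code followed by a payload, and Hadoop appears to reserve code 50 for a raw Writable that has no built-in typed bytes equivalent. A reader that only knows the built-in codes would then reject byte 50. The sketch below is a minimal stand-in for that dispatch, not the actual typedbytes.py code; `read_typed` is a hypothetical helper and it handles only a few type codes.

```python
import io
import struct

# Built-in type codes as assumed for Hadoop typed bytes (subset only).
BYTES, BYTE, BOOL, INT, LONG, FLOAT, DOUBLE, STRING = range(8)


def read_typed(stream):
    """Read one typed-bytes value from a binary stream (hypothetical helper).

    Returns None at end of input and raises ValueError on an unknown type code.
    """
    head = stream.read(1)
    if not head:
        return None
    code = head[0]
    if code == INT:
        # 4-byte big-endian signed integer
        return struct.unpack(">i", stream.read(4))[0]
    if code == LONG:
        # 8-byte big-endian signed integer
        return struct.unpack(">q", stream.read(8))[0]
    if code in (BYTES, STRING):
        # 4-byte big-endian length, then the payload itself
        (length,) = struct.unpack(">i", stream.read(4))
        payload = stream.read(length)
        return payload.decode("utf-8") if code == STRING else payload
    # Anything else (including 50) is not understood by this reader.
    raise ValueError("Invalid type byte: %d" % code)


if __name__ == "__main__":
    print(read_typed(io.BytesIO(b"\x03\x00\x00\x00\x2a")))  # an INT value
    try:
        read_typed(io.BytesIO(b"\x32"))  # 0x32 == 50
    except ValueError as exc:
        print(exc)
```

If that reading is right, the failure would come from the key or value type stored in the sequence file rather than from the file being binary as such.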

I changed it to:

dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat' -input input.seq00 -overwrite yes -output test

It's not working properly. The mapper's key parameter contains the initial bytes of the input file (e.g. SEQ...).

Any clues?





Igor Gatis

Oct 29, 2013, 9:32:01 AM
to dumbo...@googlegroups.com
I was using NullWritable as the key. I changed it to IntWritable and everything worked just fine. I wonder whether NullWritable should be supported. Using an empty key saves space.
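
With the workaround above, the integer key is only a placeholder, so the mapper can simply ignore it. A minimal sketch of that pattern follows. It is not the actual merge_people.py: the mapper and reducer bodies are hypothetical, and the only thing they illustrate is that the key argument goes unused.

```python
def mapper(key, value):
    # key is the placeholder IntWritable (assumed to arrive as an int); ignored.
    yield value, 1


def reducer(key, values):
    # Count how many times each value was seen.
    yield key, sum(values)

# With Dumbo installed, the job would be started with:
#   import dumbo
#   dumbo.run(mapper, reducer)
```

The cost of this workaround is the extra integer stored with every record, which is the space NullWritable would have saved.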

Klaas Bosteels

Oct 29, 2013, 1:39:44 PM
to dumbo...@googlegroups.com
Not sure how easy it'd be to add, but supporting NullWritable could be useful yeah. Patches welcome.. :)

-K