I now have a single-node setup (I downloaded Hadoop 1.2.1 into a directory). I tried a couple of command lines; here are the promising ones:
dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -input input.seq00 -overwrite yes -output test
This fails with:
INFO: consuming file:/home/ubuntu/input.seq00
INFO: inputting typed bytes
13/10/29 13:06:02 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
INFO: buffersize = 168960
INFO: outputting typed bytes
Traceback (most recent call last):
...
File "typedbytes.py", line 93, in _read
return self.handler_table[t](self)
File "typedbytes.py", line 182, in invalid_typecode
raise StructError("Invalid type byte: " + str(self.t))
struct.error: Invalid type byte: 50
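One sanity check before blaming dumbo: Hadoop SequenceFiles start with the magic bytes `SEQ` followed by a one-byte version number, so a quick header peek (just a sketch, run on the local copy of the file) tells whether `input.seq00` really is a SequenceFile or raw typed bytes:

```python
import os

def looks_like_sequence_file(path):
    """Return True if the file starts with the SequenceFile magic b'SEQ'."""
    with open(path, "rb") as f:
        # The first three bytes of every Hadoop SequenceFile are b"SEQ";
        # the fourth byte is the format version.
        return f.read(3) == b"SEQ"

# Guarded so the snippet runs even where input.seq00 does not exist.
if os.path.exists("input.seq00"):
    print(looks_like_sequence_file("input.seq00"))
```

If this prints `True`, the first command was presumably misparsing a SequenceFile as a raw typed-bytes stream, which would explain the bogus type byte in the traceback.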
I then changed the command to specify the input format explicitly:
dumbo start merge_people.py -hadoop tmp/hadoop-1.2.1 -inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat' -input input.seq00 -overwrite yes -output test
This runs, but not correctly: the mapper's key parameter receives the initial bytes of the input file itself (e.g. SEQ...), as if the SequenceFile header were being passed through as raw data instead of being parsed.
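For reference, a minimal diagnostic mapper (a sketch, not the real merge_people.py) is how I'm inspecting what dumbo hands the mapper:

```python
def mapper(key, value):
    # Emit the key's Python type name and a short prefix of its repr,
    # to make it obvious whether the key is a real SequenceFile key
    # or the raw leading bytes of the file (e.g. the "SEQ" header).
    yield type(key).__name__, repr(key)[:40]

if __name__ == "__main__":
    try:
        import dumbo  # available on the cluster; may not be installed locally
        dumbo.run(mapper)
    except ImportError:
        pass  # run via `dumbo start ...` in practice
```

With the second command above, the keys this emits start with the file's header bytes rather than record keys.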
Any clues?