Invalid type byte error

tshead

Mar 26, 2012, 6:37:51 PM

to dumbo-user

Folks:

I'm trying to run the wordcount.py example using hadoop on OSX Snow
Leopard. My configuration:

* Hadoop 1.0.1 installed using MacPorts
* Python 2.7.2 installed using MacPorts
* Dumbo 0.21.32 installed using easy_install.

When I run the test as local UNIX processes, it works fine. When I
run it on Hadoop:

$ sudo dumbo start test.py -hadoop /opt/local/share/java/hadoop-1.0.1/ \
    -input /books/input.txt -inputformat text -output /books/output.txt \
    -overwrite yes -python /opt/local/bin/python

all of the map tasks fail, with similar output in the stderr logs:

INFO: consuming hdfs://localhost:9000/books/input.txt
INFO: inputting typed bytes
INFO: outputting typed bytes
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/local/var/hadoop/cache/mapred/local/taskTracker/root/jobcache/job_201203261358_0007/attempt_201203261358_0007_m_000000_2/work/test.py", line 10, in <module>
    dumbo.run(mapper, reducer)
  File "dumbo/core.py", line 315, in run
    typedbytes.PairedOutput(sys.stdout).writes(outputs)
  File "typedbytes.py", line 397, in writes
    self._writes(flatten(iterable))
  File "typedbytes.py", line 256, in _writes
    for obj in iterable:
  File "typedbytes.py", line 229, in flatten
    for i in iterable:
  File "dumbo/core.py", line 439, in mapfunc_iter
    for (key, value) in data:
  File "typedbytes.py", line 381, in reads
    key = next()
  File "typedbytes.py", line 104, in _reads
    yield r()
  File "typedbytes.py", line 93, in _read
    return self.handler_table[t](self)
  File "typedbytes.py", line 182, in invalid_typecode
    raise StructError("Invalid type byte: " + str(self.t))
struct.error: Invalid type byte: 89
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 255
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

The "Invalid type byte: 89" makes me wonder whether this is some sort of
compatibility issue between versions.
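A hedged reading of the number: 89 is the ASCII code for "Y". That fits a mapper that expects typed bytes (the log says "inputting typed bytes") but is actually handed plain text, so an ordinary character lands where a type code should be. The reader below is a minimal, hypothetical sketch of how a typed bytes decoder dispatches on that leading byte. It is not the `typedbytes` module's code, it handles only three type codes from Hadoop's typed bytes format (1 = byte, 3 = int, 7 = string), and the sample text is invented.

```python
import io
import struct

# Subset of the Hadoop typed bytes type codes (assumption: this sketch
# handles only these three; the real format defines more).
TYPE_BYTE = 1    # one signed byte
TYPE_INT = 3     # 4-byte big-endian signed integer
TYPE_STRING = 7  # 4-byte big-endian length, then UTF-8 bytes


def read_typed(stream):
    """Read one typed bytes object from a binary stream (hypothetical helper)."""
    header = stream.read(1)
    if not header:
        raise EOFError("no more typed bytes objects")
    code = header[0]  # indexing a bytes object yields an int
    if code == TYPE_BYTE:
        return struct.unpack(">b", stream.read(1))[0]
    if code == TYPE_INT:
        return struct.unpack(">i", stream.read(4))[0]
    if code == TYPE_STRING:
        (length,) = struct.unpack(">i", stream.read(4))
        return stream.read(length).decode("utf-8")
    # Any other leading byte is rejected, mirroring the error in the log.
    raise struct.error("Invalid type byte: " + str(code))


# A well-formed typed bytes string: code 7, length 5, then b"hello".
good = io.BytesIO(bytes([TYPE_STRING]) + struct.pack(">i", 5) + b"hello")
print(read_typed(good))  # hello

# Plain text handed to the same reader: the first byte, "Y", is 89.
raw = io.BytesIO(b"Yes, plain text rather than typed bytes\n")
try:
    read_typed(raw)
except struct.error as exc:
    print(exc)  # Invalid type byte: 89
```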

Suggestions welcome,
Tim Shead

Piers Harding

Mar 26, 2012, 6:45:30 PM

to dumbo...@googlegroups.com

Hi -

I think it looks like you don't have the typed bytes patches (etc.) built into Hadoop; see https://github.com/klbostee/dumbo/wiki/Building-and-installing
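Put differently, Dumbo's mapper reads its input pipe as typed bytes, so Hadoop has to send typed bytes rather than raw lines. As a rough illustration, here is how a (byte offset, line) record might be encoded compared with the raw line. The mapping of the offset to a typed bytes long (code 4) and the line to a string (code 7) is an assumption made for this sketch, and the helper names are hypothetical.

```python
import struct

TYPE_LONG = 4    # 8-byte big-endian signed integer
TYPE_STRING = 7  # 4-byte big-endian length, then UTF-8 bytes


def encode_long(value):
    """Encode an int as a typed bytes long (hypothetical helper)."""
    return bytes([TYPE_LONG]) + struct.pack(">q", value)


def encode_string(text):
    """Encode a str as a typed bytes string (hypothetical helper)."""
    data = text.encode("utf-8")
    return bytes([TYPE_STRING]) + struct.pack(">i", len(data)) + data


def encode_pair(key, value):
    # A key/value pair is written as two typed bytes objects back to back.
    return encode_long(key) + encode_string(value)


raw_line = b"Yes, plain text\n"
typed = encode_pair(0, "Yes, plain text")
print(raw_line[0])  # 89, not a type code the reader accepts
print(typed[0])     # 4, a valid type code
```

The encoded pair starts with a recognizable type code, whereas the unencoded line starts with an ordinary character.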

Cheers.

--
mail/xmpp: pi...@ompka.net

http://www.piersharding.com

Timothy Shead

Mar 26, 2012, 8:36:11 PM

to dumbo...@googlegroups.com

Piers:

Thanks for the quick response. My impression from those instructions was that they only applied to versions of Hadoop < 0.21. Since all three patches are marked closed, I assumed they were already applied in Hadoop 1.0.1. Maybe I'm just too optimistic?

Cheers,
Tim

Klaas Bosteels

Mar 27, 2012, 6:03:19 AM

to dumbo...@googlegroups.com

It's a bit confusing, but Hadoop 1.0 is actually a rename of 0.20 (and thus 0.21 is basically a "higher" Hadoop version number than 1.0). However, as far as I know all patches that Dumbo requires will be included in 1.0.2.
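This numbering is exactly what trips up a plain numeric comparison: 1.0.1 sorts above 0.21 even though it continues the older 0.20 line. The sketch below takes the statements in this thread at face value (the build instructions target versions below 0.21, and the 1.0 line is expected to pick up the patches in 1.0.2). The helper names are hypothetical, and this is no substitute for checking the release notes.

```python
def parse(version):
    """Turn "1.0.1" into (1, 0, 1) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def dumbo_patches_expected(version):
    """Rough guess based only on the statements in this thread (hypothetical)."""
    v = parse(version)
    if v[0] >= 1:
        # 1.x continues the 0.20 line; per the thread, the patches
        # are expected from 1.0.2 onwards.
        return v >= (1, 0, 2)
    # 0.x line: the build instructions target versions below 0.21.
    return v >= (0, 21)


print(parse("1.0.1") > parse("0.21"))    # True: the naive ordering
print(dumbo_patches_expected("1.0.1"))   # False
print(dumbo_patches_expected("0.21.0"))  # True
```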

Setting up Hadoop and its related projects can be rather tricky in general these days. I highly recommend using a distribution such as CDH instead.

-K

Timothy Shead

Mar 27, 2012, 11:43:57 AM

to dumbo...@googlegroups.com

Klaas:

Ahhhh, that clears it up ... ya gotta love working in a field where .21 > 1.0 :)

Many thanks!
Tim
