tshead
Mar 26, 2012, 6:37:51 PM
to dumbo-user
Folks:
I'm trying to run the wordcount.py example using Hadoop on OS X Snow
Leopard. My configuration:
* Hadoop 1.0.1 installed using MacPorts
* Python 2.7.2 installed using MacPorts
* Dumbo 0.21.32 installed using easy_install.
When I run the test as local UNIX processes, it works fine. When I
run it on Hadoop:
$ sudo dumbo start test.py -hadoop /opt/local/share/java/hadoop-1.0.1/ -input /books/input.txt -inputformat text -output /books/output.txt -overwrite yes -python /opt/local/bin/python
all of the map tasks fail, with similar output in the stderr logs:
INFO: consuming hdfs://localhost:9000/books/input.txt
INFO: inputting typed bytes
INFO: outputting typed bytes
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/local/var/hadoop/cache/mapred/local/taskTracker/root/jobcache/job_201203261358_0007/attempt_201203261358_0007_m_000000_2/work/test.py", line 10, in <module>
    dumbo.run(mapper, reducer)
  File "dumbo/core.py", line 315, in run
    typedbytes.PairedOutput(sys.stdout).writes(outputs)
  File "typedbytes.py", line 397, in writes
    self._writes(flatten(iterable))
  File "typedbytes.py", line 256, in _writes
    for obj in iterable:
  File "typedbytes.py", line 229, in flatten
    for i in iterable:
  File "dumbo/core.py", line 439, in mapfunc_iter
    for (key, value) in data:
  File "typedbytes.py", line 381, in reads
    key = next()
  File "typedbytes.py", line 104, in _reads
    yield r()
  File "typedbytes.py", line 93, in _read
    return self.handler_table[t](self)
  File "typedbytes.py", line 182, in invalid_typecode
    raise StructError("Invalid type byte: " + str(self.t))
struct.error: Invalid type byte: 89
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 255
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
The "Invalid type byte: 89" error makes me think that this is some sort
of compatibility issue between versions.
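For what it's worth, 89 is the ASCII code for 'Y', so one possible reading is that the mapper is being handed plain text rather than a typed bytes stream. That is only a guess. Below is a minimal sketch for checking a byte value against the type codes 0 through 10 that Hadoop's typed bytes format predefines; the helper name describe_type_byte is hypothetical and is not part of dumbo or typedbytes.

```python
# Type codes predefined by the Hadoop typed bytes format (0-10).
KNOWN_TYPE_CODES = {
    0: "bytes", 1: "byte", 2: "boolean", 3: "integer", 4: "long",
    5: "float", 6: "double", 7: "string", 8: "vector", 9: "list", 10: "map",
}


def describe_type_byte(code):
    """Hypothetical helper: describe what a leading byte value might be."""
    if code in KNOWN_TYPE_CODES:
        return "typed bytes code for %s" % KNOWN_TYPE_CODES[code]
    if 32 <= code < 127:
        # Printable ASCII hints that the stream may be plain text.
        return "not a standard type code; printable ASCII %r" % chr(code)
    return "not a standard type code"


print(describe_type_byte(89))
```

If the first byte the reader sees is a printable character, that would be consistent with the streaming layer sending text lines instead of typed bytes, but it does not prove it.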
Suggestions welcome,
Tim Shead