Error running MultipleOutputFiles example

Felix H

Jan 23, 2013, 10:05:07 AM

to dumbo...@googlegroups.com

Hi,

I am struggling to get the MultipleOutputFiles example to work. I am running a Hadoop cluster with CDH4 (Hadoop 2.0.0) and a working dumbo installation.

I was able to compile 'feathers.jar' and wanted to try out the example given at http://dumbotics.com/2009/06/08/multiple-outputs/ .

Running splitwordcount.py locally returns the results in a single file, just as explained in the tutorial.

When I run it on Hadoop, the map phase runs smoothly, but during the reduce phase every reducer attempt hits the following error and the job fails:

java.io.IOException: subprocess still running
R/W/S=82260/24/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |9812|
Date: Wed Jan 23 15:48:35 CET 2013
Broken pipe
	at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:131)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:492)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

Klaas Bosteels

Jan 23, 2013, 12:29:26 PM

to dumbo...@googlegroups.com

Can you try looking for the stderr logs of the failing tasks? The broken pipe error that Hadoop typically throws when something goes wrong in a dumbo script is pretty useless...

-K

Felix H

Jan 24, 2013, 4:27:24 AM

to dumbo...@googlegroups.com

This is the stderr log of the failing task:

INFO: inputting typed bytes
INFO: outputting typed bytes
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/var/hadoop/tmp/mapred/local/taskTracker/hieber/jobcache/job_201301111424_1442/attempt_201301111424_1442_r_000000_0/work/splitwordcount.py", line 12, in <module>
    run(mapper, reducer, combiner=sumreducer)
  File "build/bdist.linux-x86_64/egg/dumbo/core.py", line 380, in run
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 397, in writes
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 257, in _writes
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 250, in _write
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 312, in write_vector
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 257, in _writes
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 250, in _write
  File "build/bdist.linux-x86_64/egg/typedbytes.py", line 299, in write_string
IOError: [Errno 32] Broken pipe

However, I used custom input. When I try it with the small example file "brian.txt", it works without problems. I guess it's because of my input, but I would like to understand what causes the error.

Thanks for your help!
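
A note for readers on the "[Errno 32] Broken pipe" in the traceback above: errno 32 is EPIPE, which the operating system reports when a process writes to a pipe whose reading end has already been closed. Here the script was writing its typed bytes output when that happened, so the error says the reader had gone away, not why. The sketch below is only an illustration of that mechanism using a plain OS pipe; it is not taken from dumbo or Hadoop, and the function name write_to_closed_pipe is made up for this example.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed and return the errno raised.

    Hypothetical helper for illustration only; it does not involve dumbo,
    typedbytes, or Hadoop streaming.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away before the writer is done
    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as an exception instead of silently killing the process.
        os.write(write_fd, b"word\t1\n")
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    code = write_to_closed_pipe()
    print(code == errno.EPIPE, os.strerror(code))
```

Since the write only fails after the reader has stopped, the underlying cause has to be found on the reading side, for example in the task's other logs.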
